Analyzing Random numbers

MrAl

Joined Jun 17, 2014
13,704
In order for N tosses of a fair coin to have a mean of exactly 0.5, you must have N/2 heads. This rules out any sequence with an odd number of tosses. In the case of an even number of tosses, the probability of getting exactly N/2 heads is

nCr(N, N/2) / 2^N​

As 2^N grows faster than nCr(N, N/2), the probability of any particular sequence having exactly N/2 heads diminishes as the length of the sequence grows. Indeed, the probability approaches zero as N → ∞.

Recall that you were talking about a practical test for determining whether a given (finite) sequence is random. I've given you "my theory" as to why it is a terrible test for randomness; if you disagree, please explain how my analysis fails.
Because a practical test shows clearly otherwise, the result 'settles' to 1/2 or rather the average of the samples.
To say that settling to the average is antithetical means we must not be talking about the same things or else you have NEVER EVER NEVER EVER done a REAL LIFE practical test to prove or disprove your theoretical 'result'.
DO A GOD DANG TEST WILL YA ?? ha ha :)
 

bogosort

Joined Sep 24, 2011
696
So just because one compression algorithm does very poorly on a particular data set does not really say much about whether that data set is truly random; it could simply be a data set that is too poor a match for the kind of data that the algorithm was designed for. What we CAN say is that if we DO get any significant compression (and we could probably put a threshold on it), then we have high confidence that the data is NOT truly random since all compression algorithms exploit nonrandomness in the data to achieve compression. But if we don't get any significant compression, about all we can say is that we haven't ruled out the possibility that the data is truly random and, perhaps, we have some reason to suspect that it might be.
Agreed, randomness cannot be proven. It may not even really exist, except as a measure of our own ignorance.
 

bogosort

Joined Sep 24, 2011
696
Because a practical test shows clearly otherwise, the result 'settles' to 1/2 or rather the average of the samples.
To say that settling to the average is antithetical means we must not be talking about the same things or else you have NEVER EVER NEVER EVER done a REAL LIFE practical test to prove or disprove your theoretical 'result'.
DO A GOD DANG TEST WILL YA ?? ha ha :)
I gave you a practical test, a really easy one to perform yourself. Flip a coin 10 times; did you get 5 heads? Likely not. Flip a coin 1,000 times; did you get 500 heads? I'll bet you $100 you didn't. How much more practical of an experiment do you need?
 

MrAl

Joined Jun 17, 2014
13,704
[1] The details are in the fine print.

[2] The odds that it will be within an arbitrarily small range centered on 1/2 will approach unity.

[3] The odds that it will BE 1/2 will approach zero.
Hi,

Not sure what you mean by the last 3rd statement. Why do you make this sound significant?
 

MrAl

Joined Jun 17, 2014
13,704
I gave you a practical test, a really easy one to perform yourself. Flip a coin 10 times; did you get 5 heads? Likely not. Flip a coin 1,000 times; did you get 500 heads? I'll bet you $100 you didn't. How much more practical of an experiment do you need?
Hello,

I dont think you can rely on so few samples. There are runs in these experiments. You can see a run of 10 heads for example.
However, it looks like you either dont want to do a good test or you believe so strongly that it wont work that you feel strongly it would be a waste of time, so then i'll have to repeat the test myself and show you the results. I hope you are not wearing a hat today because from what i hear they dont taste very good :)

Thanks for being a good sport about this though too. I appreciate that you discuss this mostly with intelligence and not with an over emphasis on emotion.
 

bogosort

Joined Sep 24, 2011
696
I dont think you can rely on so few samples. There are runs in these experiments. You can see a run of 10 heads for example.
However, it looks like you either dont want to do a good test or you believe so strongly that it wont work that you feel strongly it would be a waste of time, so then i'll have to repeat the test myself and show you the results.
Did you see the part above about the probability of getting N/2 heads in N samples diminishing as N gets larger? How many samples is good enough for you? How about a billion? The chances of you getting exactly 500 million heads in a billion throws is less than three-thousandths of a percent.

I hope you are not wearing a hat today because from what i hear they dont taste very good
I'm happy to eat my hat if it means that I'm able to correct a mistake in the way I've been thinking. But, so far, it feels like I've been providing quantifiable evidence for my case, while you haven't provided any. Show me the money, as it were.
 

djsfantasi

Joined Apr 11, 2010
9,237
Did you see the part above about the probability of getting N/2 heads in N samples diminishing as N gets larger? How many samples is good enough for you? How about a billion? The chances of you getting exactly 500 million heads in a billion throws is less than three-thousandths of a percent.

I'm happy to eat my hat if it means that I'm able to correct a mistake in the way I've been thinking. But, so far, it feels like I've been providing quantifiable evidence for my case, while you haven't provided any. Show me the money, as it were.
Does anyone else feel that this discussion is pedantic and has no resolution. IMHO, bogosort is taking an untenable position, with which he is claiming undeniable truth. WBahn has stated the problem succinctly in post#77. I agree with bogosort that finding a probability of EXACTLY 0.5 is unlikely.

That’s irrelevant.

The limit of the process is the real story. And it WILL be 0.5, no matter what he says.

The experiment will approach a value of 0.5 with a sample size that approaches infinity. As sample size increases, the range of values around 0.5 decreases decrease to a sufficiently small value. And I don’t know about you, but there comes a point where that range is PRACTICALLY zero.

There’s an old joke we told at Georgia Tech. There was a ball and at the end of the night, they had a special dance. All women were on one side of the ballroom; the men were on the other. At each verse, each approached the opposite side by half of the distance. The mathematicians left. The Engineers stayed, because they new they could get close enough for all practical purposes.

And that is that!
 
Last edited:

Ya’akov

Joined Jan 27, 2019
10,235
There’s an old joke we told at Georgia Tech. There was a ball and at the end of the night, they had a special dance. All women were on one side of the ballroom; the men were on the other. At each verse, each approached the opposite side by half of the distance. The mathematicians left. The Engineers stayed, because they new they could get close enough for all practical purposes.
And the physicists stayed by going home which was an order of magnitude approximation of staying.
 

bogosort

Joined Sep 24, 2011
696
Does anyone else feel that this discussion is pedantic and has no resolution. IMHO, bogosort is taking an untenable position, with which he is claiming undeniable truth. WBahn has stated the problem succinctly in post#77. I agree with bogosort that finding a probability of EXACTLY 0.5 is unlikely.

That’s irrelevant.
On the contrary, the context of this "pedantic" discussion is a practical test for randomness, namely, that if you sum the extreme values in a sequence and divide by 2, the sequence is random if it equals the mean value. In other words, MrAl's test requires that a sequence of coin flips have a mean of 0.5 EXACTLY. My analysis of this test is based on the very practical fact that the vast majority of actual random sequences will not have this property. How this can be interpreted as theoretical pedantry is beyond me.
 

djsfantasi

Joined Apr 11, 2010
9,237
On the contrary, the context of this "pedantic" discussion is a practical test for randomness, namely, that if you sum the extreme values in a sequence and divide by 2, the sequence is random if it equals the mean value. In other words, MrAl's test requires that a sequence of coin flips have a mean of 0.5 EXACTLY. My analysis of this test is based on the very practical fact that the vast majority of actual random sequences will not have this property. How this can be interpreted as theoretical pedantry is beyond me.
Sorry, I just don’t get it. I’ve reread every one of MrAl’s posts and disagree with your statement on his conclusion.
 

WBahn

Joined Mar 31, 2012
32,834
Sorry, I just don’t get it. I’ve reread every one of MrAl’s posts and disagree with your statement on his conclusion.
Look at Post #60: "One test that i've used in the past that i remember is the mean. The mean of an infinite number of samples of a true random variable is 1/2 of the sum of the two extremes (given every number between is generated also). So if you generate numbers 3, 4, and 5, the mean is 4 so after an infinite number of generations you better get 4 exactly."

So here's my PRNG that will pass this test.

C:
int prng()
{
   static int n = 3;
   n = (n - 2) % 3 + 3;
   return n;
}
It generates every number between 3 and 5, inclusive, and after an infinite number of generations the average of the extremes is exactly the mean. (provided infinity is divisible by 3 :D).

Or consider a prng that uses a true rng -- which we will assume exists and is accessible -- that returns a random integer between 0 and one less than the number passed to it as follows:

C:
int prng()
{
   static int history[] = {0,0,0};

   n = 3 + trueRandom(3);

   if (3 * history[n] < 1)
   {
      history[n]++;
      return n + 3;
   }

   history[(n + 1) % 3]++;
   return 3 + ((n + 1) % 3) ;
}
The idea here is that you preferentially return a number if that number is presently underrepresented in the sequence thus far.

Note that you can easily tweak this to prevent overflow in the history array (well, make it astronomically unlikely).

You can also easily tweak it so that the likelihood of the least represented number is increased just slightly while the likelihood of most represented number is decreased just slightly and by the same amount (leaving the likelihood of the third unchanged).

By definition the result is NOT a truly random sequence precisely because the trials are not independent -- the outcome probabilities for the next trial is dependent on the results of prior trials.

If I make the shift in the probabilities small enough, this prng will pass any test you devise that is based on looking at the results matching the expected distributions close enough. The tests that it might fail will be the ones that look at the likelihood that the results match the expected distributions too closely.
 

djsfantasi

Joined Apr 11, 2010
9,237
Look at Post #60: "One test that i've used in the past that i remember is the mean. The mean of an infinite number of samples of a true random variable is 1/2 of the sum of the two extremes (given every number between is generated also). So if you generate numbers 3, 4, and 5, the mean is 4 so after an infinite number of generations you better get 4 exactly."

So here's my PRNG that will pass this test.

C:
int prng()
{
   static int n = 3;
   n = (n - 2) % 3 + 3;
   return n;
}
It generates every number between 3 and 5, inclusive, and after an infinite number of generations the average of the extremes is exactly the mean. (provided infinity is divisible by 3 :D).

Or consider a prng that uses a true rng -- which we will assume exists and is accessible -- that returns a random integer between 0 and one less than the number passed to it as follows:

C:
int prng()
{
   static int history[] = {0,0,0};

   n = 3 + trueRandom(3);

   if (3 * history[n] < 1)
   {
      history[n]++;
      return n + 3;
   }

   history[(n + 1) % 3]++;
   return 3 + ((n + 1) % 3) ;
}
The idea here is that you preferentially return a number if that number is presently underrepresented in the sequence thus far.

Note that you can easily tweak this to prevent overflow in the history array (well, make it astronomically unlikely).

You can also easily tweak it so that the likelihood of the least represented number is increased just slightly while the likelihood of most represented number is decreased just slightly and by the same amount (leaving the likelihood of the third unchanged).

By definition the result is NOT a truly random sequence precisely because the trials are not independent -- the outcome probabilities for the next trial is dependent on the results of prior trials.

If I make the shift in the probabilities small enough, this prng will pass any test you devise that is based on looking at the results matching the expected distributions close enough. The tests that it might fail will be the ones that look at the likelihood that the results match the expected distributions too closely.
Ok. You found the smoking gun. But that was the only strict statement. The posts I remember state that the mean should be within a small but finite range of the mean. Focusing on this one statement, IMHO, had diminished the rest of the discussion.
 

WBahn

Joined Mar 31, 2012
32,834
Ok. You found the smoking gun. But that was the only strict statement. The posts I remember state that the mean should be within a small but finite range of the mean. Focusing on this one statement, IMHO, had diminished the rest of the discussion.
The claim repeated elsewhere, as well.

But note that the majority of my post isn't about that and so let's assume we are talking about getting close to the expected values of whatever metric we are using. I put forth a prng which is demonstrably nonrandom but which I claim (and I could be wrong, but it's what I believe right now) will pass any suite of tests that is based on this approach. The most likely way to detect the nonrandomness is probably to test for it getting too close too quickly to those expected values.

It's interesting that some (many? most?) of the more widely used suites, including diehard, do not check the data for whether all of the possible values occurred with (about) equal frequency. Presumably this is (or at least might be) because such tests are of such poor validity and reliability that they aren't worth performing even though they are far simply and faster to perform than the other tests in the suite.
 

MrAl

Joined Jun 17, 2014
13,704
Because you've stated that the results go to EXACTLY the expected value.
Hello again,

Well that is only when the limit of the number of samples approaches infinity. So would you rather state it as the difference between 1/2 and 1/infinity?
So in other words, it approaches:
0.5+1/inf
or:
0.5-1/inf

which looks like exactly 0.5 to me.
 

MrAl

Joined Jun 17, 2014
13,704
On the contrary, the context of this "pedantic" discussion is a practical test for randomness, namely, that if you sum the extreme values in a sequence and divide by 2, the sequence is random if it equals the mean value. In other words, MrAl's test requires that a sequence of coin flips have a mean of 0.5 EXACTLY. My analysis of this test is based on the very practical fact that the vast majority of actual random sequences will not have this property. How this can be interpreted as theoretical pedantry is beyond me.
Hi again,

Before i produce the actual test results let me run one more thing by you.

Let's use the numbers 1,2,3 as the possible samples. So either of those three numbers can come out of the generator.
The simple formula i suggested was the mean is (1+3)/2=2 but the full fledged formula would be:
mean=(1+2+3)/3=2
and since we dont want to leave out any samples let's use that last one.


Now the probability that 1 would be generated is just as likely as 2 and just as likely as 3.
What that means is that 1 is not systematically generated and neither is 2 or 3, so they must all be generated with the same frequency over a large number of samples. That's because if 1 is generated then 1 suddenly has less probabliity than 2 or 3 otherwise 2 and 3 would not have equal probability of being generated with 1 being systematic.
What this means is that for a large number of samples we would see 1 as many times as 2 and as many times as 3 being generated.
Reducing the set for simplicity, we might see:
1 generated 1 million times
2 generated 1 million times
3 generated 1 million times.
Now of course this is optimal, but we might see let's say 2 generated 1 million +10 times or 1000010 times.
The mean of this is 2.00000666666666 with repeating 6.
Now since 2 was generated 10 more times than 1 or 3, we would expect 1 and 3 to be generated more times in the future than 2 which would then bring the count closer to maybe 2000020 times for all three which brings the mean back to 2 again.
The reason we would see 1 and 3 generated more times is again because 2 being generated can not be systematic because if it was then the generator could not be claimed to be generating numbers with equal probability.

Next i will produce actual test values and what we will see (because i've done this numerous times) is the mean getting closer and close to the presumed value which is the mean as the number of samples goes up. Thus i cant help but think that as the number of samples approaches infinity the mean would be "mean+1/inf" which is just the mean.
If this requirement is not met then one of the numbers must be being produced systematically and that's not random. As we see from above if 2 is produced more times than 1 or 3 we dont get exactly 2.000000000000000 we get 2+err where err is some small error.
Note that this is not the ONLY test in that it is not a "pass or fail" test on the RNG, it's just one test that might be called a first test because if it passes that does not mean that it is definitely random, it's just that if it fails then it cant be. The reason of course is that (2+2+2)/6 is also equal to 2, so a RNG that always produces "2" will also pass the test obviously, but (2+2+3)/6 will not (with 1 never being generated).
 

bogosort

Joined Sep 24, 2011
696
That's because if 1 is generated then 1 suddenly has less probabliity than 2 or 3 otherwise 2 and 3 would not have equal probability of being generated with 1 being systematic.
If I'm understanding you correctly -- that the probabilities change depending on previous outcomes -- then you're not describing a random process. The reason that we do see runs of 10 heads in a row on coin flips is because each flip is independent of the previous; the probability of heads is always 0.5 (for a fair coin). A process that always generates exactly N/2 heads for N flips is the opposite of random.

Next i will produce actual test values and what we will see (because i've done this numerous times) is the mean getting closer and close to the presumed value which is the mean as the number of samples goes up.
If your "big reveal" is that the mean is approached when you take the average mean of an ensemble of sequences, don't bother, as that's not surprising in the least. Likewise, if your big reveal is that the mean is approached as N → ∞, again, don't bother.

You billed this method as a real-world randomness test for a sequence of numbers. This of course implies finite sequences. You acknowledge that your test is susceptible to false positives (a string of "2"s will pass the test), and say that if a sequence fails the test then it "cant be" random. My challenge to you is to produce ten sequences, any length you want, generated by established random or pseudo-random methods, and see how many pass your test. Flip coins, use rand(), count Geiger clicks and transform the results to a uniform distribution, whatever. If you don't have the time, tell me the properties the sequences should have and I can quickly whip something up in MATLAB.
 

bogosort

Joined Sep 24, 2011
696
The probability of flipping heads is 0.5.

However, knowing that the previous outcome was heads, the probability of flipping heads is now 0.25.
This is both false and depressing if you really believe this. I'm hoping that what you really meant to say was that the probability of flipping two heads in a row in two flips is 0.25.
 

WBahn

Joined Mar 31, 2012
32,834
The probability of flipping heads is 0.5.

However, knowing that the previous outcome was heads, the probability of flipping heads is now 0.25.

This is called conditional probability.

Reference:
https://en.wikipedia.org/wiki/Conditional_probability
Are you really trying to claim that the probability of flipping heads on the next flip is determined by what the outcome on the previous flip was?

What will the probability be of flipping heads on the tenth flip if we know the prior nine flips were all heads?

I can't tell for sure how you are getting the 0.25 value. Is this your line of thinking:

  • If I flip a coin twice then there are four possible outcomes: HH, HT, TH, TT.
  • So the probability of getting HH after two flips is 0.25
  • If I flip the coin once and it comes out heads, then on the next flip if it comes out heads I will have HH and that can only happen 25% of the time, so my probability of getting heads on the second flip must therefore be 0.25
If so, then follow that same line of reasoning to determine the probability of getting tails on the next flip. It will also be 0.25 since if you flip tails you then have HT and that, too, can only happen 25% of the time.

Now let's do it correctly.

The probability that you will have HH after two flips given that I have H after one flip is 0.50.
The probability that you will have HT after two flips given that I have H after one flip is 0.50.
The probability that you will have TH after two flips given that I have H after one flip is 0.00.
The probability that you will have TT after two flips given that I have H after one flip is 0.00.
The probability that you will have HH after two flips given that I have T after one flip is 0.00.
The probability that you will have HT after two flips given that I have T after one flip is 0.00.
The probability that you will have TH after two flips given that I have T after one flip is 0.50.
The probability that you will have TT after two flips given that I have T after one flip is 0.50.
 
Top