Analyzing Random Numbers

xox

Joined Sep 8, 2017
936
The probability of flipping heads is 0.5.

However, knowing that the previous outcome was heads, the probability of flipping heads is now 0.25.

This is called conditional probability.

Reference:
https://en.wikipedia.org/wiki/Conditional_probability
Conditional probability applies to interdependent variables, not to things like fair coin flipping. Again, it just boils down to combinatorics. There are simply MORE outcomes that contain a roughly equal number of heads and tails, and THAT is the only reason why we usually observe it to be so. That doesn't mean that a gazillion coin flips in a row couldn't possibly all be heads. It could definitely happen, and that's clearly implied by the laws of probability. Just like you could fling some paint on a wall and the result could be a perfect resemblance of some Van Gogh painting. The only reason we don't generally observe that sort of thing is that there are WAY more outcomes which do not lead to that result.
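The point about independence can be checked by brute force. The following is a minimal Python sketch, assuming a fair coin and independent flips; the function name `flip_probabilities` is hypothetical. It enumerates every equally likely sequence and shows that 0.25 is the probability of two heads in a row, while the probability of heads given that the previous flip was heads stays at 0.5.

```python
from fractions import Fraction
from itertools import product


def flip_probabilities(n_flips=3):
    """Enumerate all 2**n_flips equally likely fair-coin sequences and return
    exact probabilities about the last two flips:
    (P(both are heads), P(last is heads | second-to-last is heads))."""
    seqs = list(product("HT", repeat=n_flips))
    prev_heads = [s for s in seqs if s[-2] == "H"]
    both_heads = [s for s in prev_heads if s[-1] == "H"]
    p_hh = Fraction(len(both_heads), len(seqs))
    p_h_given_h = Fraction(len(both_heads), len(prev_heads))
    return p_hh, p_h_given_h


p_hh, p_cond = flip_probabilities()
print("P(two heads in a row)         =", p_hh)    # 1/4
print("P(heads | previous was heads) =", p_cond)  # 1/2
```

Changing `n_flips` does not change either answer, which is what independence means here.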

MrAl

Joined Jun 17, 2014
13,704
If I'm understanding you correctly -- that the probabilities change depending on previous outcomes -- then you're not describing a random process. The reason that we do see runs of 10 heads in a row on coin flips is because each flip is independent of the previous; the probability of heads is always 0.5 (for a fair coin). A process that always generates exactly N/2 heads for N flips is the opposite of random.
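How common such runs are can be computed exactly rather than argued about. A minimal Python sketch, assuming fair, independent flips; `prob_run_of_heads` is a hypothetical helper that tracks the length of the current streak of heads in a small dynamic-programming table.

```python
from fractions import Fraction


def prob_run_of_heads(n_flips, run_len):
    """Exact probability that n_flips fair, independent flips contain at least
    one run of run_len or more consecutive heads."""
    # state[k] = probability that no qualifying run has occurred yet and the
    # sequence currently ends in exactly k consecutive heads (0 <= k < run_len)
    state = [Fraction(0)] * run_len
    state[0] = Fraction(1)
    half = Fraction(1, 2)
    for _ in range(n_flips):
        new = [Fraction(0)] * run_len
        for k, p in enumerate(state):
            new[0] += p * half              # tails resets the streak
            if k + 1 < run_len:
                new[k + 1] += p * half      # heads extends the streak
            # otherwise heads completes a run; that probability leaves the table
        state = new
    return 1 - sum(state)


print(float(prob_run_of_heads(1000, 10)))
```

For 1000 flips the chance of at least one run of ten or more heads comes out a bit under 40 percent, so such runs are not rare in long sequences.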


If your "big reveal" is that the mean is approached when you take the average mean of an ensemble of sequences, don't bother, as that's not surprising in the least. Likewise, if your big reveal is that the mean is approached as N → ∞, again, don't bother.

You billed this method as a real-world randomness test for a sequence of numbers. This of course implies finite sequences. You acknowledge that your test is susceptible to false positives (a string of "2"s will pass the test), and say that if a sequence fails the test then it "cant be" random. My challenge to you is to produce ten sequences, any length you want, generated by established random or pseudo-random methods, and see how many pass your test. Flip coins, use rand(), count Geiger clicks and transform the results to a uniform distribution, whatever. If you don't have the time, tell me the properties the sequences should have and I can quickly whip something up in MATLAB.

Hi,

I am not sure you realize what you are implying here. You are implying that there is a systematic bias in a random number generator. In reality, a run of all 1's, although it could be long, eventually ends, because the probability of any one flip is 0.5, as mentioned.
One consequence of what you suggest is that the Monte Carlo method for calculating pi, in which 2-D samples fall uniformly on a square and we count how many land inside the circle inscribed in it, would usually never come out to pi, when in fact it always does and gets better and better as the number of samples goes higher and higher. Under your implication pi could never be calculated this way, because the ratio of hits landing outside the circle to hits landing inside would not be in the right proportion.

But let's turn this into a practical real life test where we both have something to gain or lose.
I will bet 100 dollars on one outcome and you will bet 100 dollars on another outcome. Now if I am right, I win 100; if you are right, you win 100; and if we are both right (which is possible after all), then we break even.

The bets are placed as follows:
You: you bet that after 10000 flips no head comes up.
Me: I bet that after 10000 flips at least one head comes up.

Now it is certainly TRUE that you could possibly win and I could possibly lose.
But in reality, who do you think will really win?
We can probably put a number to this too, but I'll hold off on that.
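The number alluded to here is easy to write down: for fair, independent flips the chance of no heads at all in 10000 flips is 0.5^10000. A small Python sketch (the helper name is hypothetical) works in log10 form, because the probability itself underflows a double-precision float to zero.

```python
import math


def log10_prob_no_heads(n_flips):
    """log10 of the probability that n_flips fair, independent flips show no
    heads. The probability itself, 0.5**n_flips, underflows to 0.0 for large n."""
    return -n_flips * math.log10(2)


exponent = log10_prob_no_heads(10_000)
print(f"P(no heads in 10000 flips) = 10^{exponent:.1f}")  # 10^-3010.3
```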

In the meantime, here are three different tests. The last test is the best because it shows the frequencies of 1, 2, or 3 coming up, as well as the average and the error from what we expect the average to be.

This is the last test, but I am showing it first. This result is typical, but see the others too.
Note the frequencies are shown under respective columns of 1, 2, or 3.
RandomAverage_20190403-3.gif

WBahn

Joined Mar 31, 2012
32,833
Hi,

I am not sure you realize what you are implying here. You are implying that there is a systematic bias in a random number generator. In reality, a run of all 1's, although it could be long, eventually ends, because the probability of any one flip is 0.5, as mentioned.
One consequence of what you suggest is that the Monte Carlo method for calculating pi, in which 2-D samples fall uniformly on a square and we count how many land inside the circle inscribed in it, would usually never come out to pi, when in fact it always does and gets better and better as the number of samples goes higher and higher. Under your implication pi could never be calculated this way, because the ratio of hits landing outside the circle to hits landing inside would not be in the right proportion.
How can something that is a rational number EVER come out to be pi, which is provably irrational?

But let's turn this into a practical real life test where we both have something to gain or lose.
I will bet 100 dollars on one outcome and you will bet 100 dollars on another outcome. Now if I am right, I win 100; if you are right, you win 100; and if we are both right (which is possible after all), then we break even.

The bets are placed as follows:
You: you bet that after 10000 flips no head comes up.
Me: I bet that after 10000 flips at least one head comes up.

Now it is certainly TRUE that you could possibly win and I could possibly lose.
But in reality, who do you think will really win?
We can probably put a number to this too, but I'll hold off on that.
How is it possible that you can both be right?

In order for him to be right, there can be NO heads that come up, while in order for you to be right there must be AT LEAST ONE head that comes up. Aren't those mutually exclusive outcomes?

How about another bet?

Using the best PRNG you can find (or RNG if you have one), flip a coin 1000 times. If the results are 500 heads and 500 tails, I will pay you $20 but if they are not you pay me just $1. Do this for 1000 rounds.

If you don't like those odds, and since you insist that the results will get closer to 50% for higher numbers of rolls, make it 1,000,000 flips. If the results are 500,000 heads and 500,000 tails I will pay you $200, but if they are not you pay me just $1. Again, do this for 1000 rounds. Consider the fact that: (1) you could win as much as $200,000; (2) the most you risk losing is $1000; and (3) if you are right just five times out of the thousand rounds you come out ahead. Would you take this bet?
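Both bets can be priced directly. A short Python sketch, assuming ideal fair, independent flips and the payouts stated above; the helper names are hypothetical. The exact-split probability is C(N, N/2)/2^N, evaluated with log-gamma so it does not overflow for a million flips.

```python
import math


def prob_exact_half(n_flips):
    """P(exactly n_flips/2 heads) for a fair coin, C(n, n/2) / 2**n, computed
    with log-gamma so large n does not overflow. n_flips must be even."""
    if n_flips % 2:
        raise ValueError("n_flips must be even")
    half = n_flips // 2
    log_p = (math.lgamma(n_flips + 1) - 2 * math.lgamma(half + 1)
             - n_flips * math.log(2))
    return math.exp(log_p)


def expected_value(n_flips, payout, stake=1.0):
    """Expected winnings per round for the player who collects `payout` on an
    exact 50/50 split and pays `stake` otherwise."""
    p = prob_exact_half(n_flips)
    return p * payout - (1 - p) * stake


for n, payout in [(1_000, 20), (1_000_000, 200)]:
    ev = expected_value(n, payout)
    print(f"{n:>9} flips: P(exact 50/50) = {prob_exact_half(n):.6f}, "
          f"EV per round = {ev:+.3f} dollars, over 1000 rounds = {1000 * ev:+.0f}")
```

Both expected values come out negative (roughly -$0.47 and -$0.84 per round), so under these assumptions the player accepting either offer should expect to lose.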

DISCLAIMER: We are talking about a hypothetical bet here, not an actual offer. I doubt I would be willing to risk $200,000 no matter how much the odds were in my favor and no matter how much the payout might be. I'm trying to think of the numbers where that would change and, while I'm sure they exist, I'm having a hard time thinking of them. Would I risk $200,000 for the chance to win a million dollars if the odds were a million to one in my favor? My first reaction is yes, but I really don't know if I could bring myself to make that bet if it were actually offered. But I might be talked into actually taking the first bet (the one with 1000 flips and 20:1 payout).

bogosort

Joined Sep 24, 2011
696
I feel like I'm in the Twilight Zone here. You quoted my post but it doesn't appear that you actually read it.

Everyone already knows that the sequence mean will approach the distribution mean as N → ∞. That was never in question. I asked you to produce ten random sequences of any length you desire and show how many pass your test of randomness. I admit, it was a rhetorical request, as I already know the outcome: few, if any, will pass.

But, come to think of it, you seem to have unintentionally satisfied my request. Look at your list of 8 random sequences, each an order of magnitude larger than the previous. Do you agree that none of them pass your test for randomness? Do you agree that your test would have rejected all 8 for being non random? Can you take the extra leap and see that any sequence that passes your test is very likely not random?

WBahn

Joined Mar 31, 2012
32,833
I feel like I'm in the Twilight Zone here. You quoted my post but it doesn't appear that you actually read it.

Everyone already knows that the sequence mean will approach the distribution mean as N → ∞. That was never in question. I asked you to produce ten random sequences of any length you desire and show how many pass your test of randomness. I admit, it was a rhetorical request, as I already know the outcome: few, if any, will pass.

But, come to think of it, you seem to have unintentionally satisfied my request. Look at your list of 8 random sequences, each an order of magnitude larger than the previous. Do you agree that none of them pass your test for randomness? Do you agree that your test would have rejected all 8 for being non random? Can you take the extra leap and see that any sequence that passes your test is very likely not random?
I don't know that I would go so far as to say that a really long sequence that had exactly 50% heads and 50% tails is very likely not random. It is, after all, the single most likely outcome and has a higher probability than whatever other sequence we actually end up seeing. Further, we could make the same claim about whatever sequence we actually get from a true random process -- the probability that it came out to be exactly that sequence (or, more accurately, that fraction of heads to tails) is very small.

Now, if you ran that test again on a second sequence generated by that same process and it also came up exactly 50% heads and 50% tails, NOW I'm going to be a lot more willing to declare the process to be nonrandom.
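The distinction drawn here (an exact split is the single most likely count, yet unlikely in absolute terms, and far less likely to happen twice) can be put in numbers. A short Python sketch, assuming ideal fair flips and two independent runs of 1000 flips; the helper name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(n, k):
    """Exact P(k heads in n fair flips), C(n, k) / 2**n, from integer math."""
    return math.comb(n, k) / 2 ** n


n = 1000
pmf = [binomial_pmf(n, k) for k in range(n + 1)]
mode = max(range(n + 1), key=lambda k: pmf[k])
p_half = pmf[n // 2]

print("most likely number of heads:", mode)  # 500
print(f"P(exactly 500 heads)             = {p_half:.4f}")
print(f"P(exactly 500 heads in two runs) = {p_half ** 2:.6f}")
```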

bogosort

Joined Sep 24, 2011
696
Using the best PRNG you can find (or RNG if you have one), flip a coin 1000 times. If the results are 500 heads and 500 tails, I will pay you $20 but if they are not you pay me just $1. Do this for 1000 rounds.
I'm willing to give him 25:1 on the payout. :)

MrAl

Joined Jun 17, 2014
13,704
How can something that is a rational number EVER come out to be pi, which is provably irrational?
Not sure why you are asking this question. Who said that? Wasn't it implied that it would be an approximation to pi that, with proper procedure, gets closer and closer to pi, that is, correct to more and more digits? As the 2-D samples are produced, the ratio gets closer and closer to pi (actually, once multiplied by 4). If it does not, then the process can't be random, because the distribution would be systematic with some shape function.

How is it possible that you can both be right?

In order for him to be right, there can be NO heads that come up, while in order for you to be right there must be AT LEAST ONE head that comes up. Aren't those mutually exclusive outcomes?
Only if it is performed once. If it is performed twice, we can both win. I would have suggested 10000 experiments for that example.

How about another bet?

Using the best PRNG you can find (or RNG if you have one), flip a coin 1000 times. If the results are 500 heads and 500 tails, I will pay you $20 but if they are not you pay me just $1. Do this for 1000 rounds.
Why would I do that? For such a small sample set it won't be 50/50, right?

If you don't like those odds, and since you insist that the results will get closer to 50% for higher numbers of rolls, make it 1,000,000 flips. If the results are 500,000 heads and 500,000 tails I will pay you $200, but if they are not you pay me just $1. Again, do this for 1000 rounds. Consider the fact that: (1) you could win as much as $200,000; (2) the most you risk losing is $1000; and (3) if you are right just five times out of the thousand rounds you come out ahead. Would you take this bet?
Again, why such a small sample set? I never said that 100 tries would show perfect results; I only said that an infinite number of samples would produce a perfect result. My own test data shows that even 100 million samples do not show perfect results. That's still too small a sample set, but even there you can see the average starting to settle on the mean of all the samples, which was 2.

DISCLAIMER: We are talking about a hypothetical bet here, not an actual offer. I doubt I would be willing to risk $200,000 no matter how much the odds were in my favor and no matter how much the payout might be. I'm trying to think of the numbers where that would change and, while I'm sure they exist, I'm having a hard time thinking of them. Would I risk $200,000 for the chance to win a million dollars if the odds were a million to one in my favor? My first reaction is yes, but I really don't know if I could bring myself to make that bet if it were actually offered. But I might be talked into actually taking the first bet (the one with 1000 flips and 20:1 payout).


Let me also restate what this test shows.
1. It shows that as the number of samples increases the average approaches the mean of all the samples.
2. If the test does show that, it may be a random process.
3. If the test fails, it's not a random process, although it should be performed with a very large number of samples.
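The test described above (counts of 1, 2 and 3, the running average, and its error from the expected mean of 2) can be reproduced in a few lines. A minimal Python sketch, assuming Python's built-in Mersenne Twister `random.Random` as the generator and an arbitrary seed; the original generator and code are not shown in the thread, and the helper name is hypothetical. The last column, the one-sigma error sqrt(2/3)/sqrt(N), is added for comparison.

```python
import math
import random


def running_average_report(checkpoints=(10, 100, 1_000, 10_000, 100_000), seed=2019):
    """Draw integers uniformly from {1, 2, 3}; at each checkpoint N record
    (N, count of 1s, count of 2s, count of 3s, running average,
    error from the expected mean 2, one-sigma error sqrt(2/3) / sqrt(N))."""
    rng = random.Random(seed)          # assumed generator: Python's Mersenne Twister
    wanted = set(checkpoints)
    counts = {1: 0, 2: 0, 3: 0}
    total = 0
    rows = []
    for i in range(1, max(checkpoints) + 1):
        x = rng.randint(1, 3)
        counts[x] += 1
        total += x
        if i in wanted:
            avg = total / i
            sigma = math.sqrt(2 / 3) / math.sqrt(i)
            rows.append((i, counts[1], counts[2], counts[3], avg, avg - 2, sigma))
    return rows


for n, c1, c2, c3, avg, err, sigma in running_average_report():
    print(f"N={n:>7}  1s={c1:>6}  2s={c2:>6}  3s={c3:>6}  "
          f"avg={avg:.5f}  error={err:+.5f}  typical |error|~{sigma:.5f}")
```

The error column should shrink roughly in step with the last column, like 1/sqrt(N), but at any given finite N it is rarely exactly zero.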

MrAl

Joined Jun 17, 2014
13,704
I feel like I'm in the Twilight Zone here. You quoted my post but it doesn't appear that you actually read it.

Everyone already knows that the sequence mean will approach the distribution mean as N → ∞. That was never in question. I asked you to produce ten random sequences of any length you desire and show how many pass your test of randomness. I admit, it was a rhetorical request, as I already know the outcome: few, if any, will pass.
If you agree with what I am saying, then why do you want a friggin' set of sequences? I doubt I could post sequences that are 100 million samples long.

But, come to think of it, you seem to have unintentionally satisfied my request. Look at your list of 8 random sequences, each an order of magnitude larger than the previous. Do you agree that none of them pass your test for randomness? Do you agree that your test would have rejected all 8 for being non random? Can you take the extra leap and see that any sequence that passes your test is very likely not random?
What the heck are you talking about?! ALL 8 *IS* ONE TEST. I can't help thinking that once again you are seeing the trees but missing the forest. I hate that cliche, but I don't know what else to say.

Also, you are aware that this test shows when a sequence is NOT random, right? That's the only definitive result. Everything else merely suggests that it is random.

WBahn

Joined Mar 31, 2012
32,833
Again, why such a small sample set? I never said that 100 tries would show perfect results; I only said that an infinite number of samples would produce a perfect result. My own test data shows that even 100 million samples do not show perfect results. That's still too small a sample set, but even there you can see the average starting to settle on the mean of all the samples, which was 2.
Okay, so let's pick a number of samples that perhaps you would consider large enough that the odds of getting a perfect result should be pretty high.

How about 2 x 10^100 (there's something like 10^80 fundamental particles in the known universe).

If we could actually perform this test and if we had a known, ideal, truly random process to perform each flip and if I were to pay you one billion dollars if the results came out 10^100 heads and 10^100 tails while you paid me just one penny if it didn't and we agreed to play this one trillion times, would you take the bet?

If not, if this still isn't a large enough number of flips for you to be confident that the results will come out exactly 50/50 at least one time in a hundred billion, then what greater (but finite) number of flips in each round would it take?
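Log-gamma differences lose all precision at N = 2 x 10^100, so this Python sketch instead uses the leading-order Stirling approximation sqrt(2/(pi N)) for the probability of an exact split, assuming ideal fair, independent flips (the helper name is hypothetical).

```python
import math


def prob_exact_half_asymptotic(n_flips):
    """Leading-order Stirling approximation of C(n, n/2) / 2**n, namely
    sqrt(2 / (pi * n)); its relative error is about 1 / (4 * n)."""
    return math.sqrt(2 / (math.pi * n_flips))


n = 2 * 10 ** 100        # 10^100 heads plus 10^100 tails
rounds = 10 ** 12        # one trillion rounds
p = prob_exact_half_asymptotic(n)
print(f"P(exact 50/50 in one round)   ~ {p:.3e}")
print(f"expected wins in 10^12 rounds ~ {p * rounds:.3e}")
```

The probability is of order 10^-50 per round, so even a trillion rounds would essentially never be expected to produce an exact 50/50 split, and making N larger only makes this probability smaller.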

WBahn

Joined Mar 31, 2012
32,833
Also, you are aware that this test shows when a sequence is NOT random, right? That's the only definitive result. Everything else merely suggests that it is random.
So if someone were to run your test and want to use it to show that a sequence is definitively not random, how would you tell them to interpret the results so as to make that determination?

bogosort

Joined Sep 24, 2011
696
What the heck are you talking about?! ALL 8 *IS* ONE TEST.
Sweet lord. Let's make this as explicit as possible, because we have apparently been talking about two different categories of tests: one for testing the randomness of a sequence, another for testing the randomness of a generator. In the former, we are asking "Is this particular sequence of numbers randomly chosen from a uniform distribution?" That is, we are looking at a single sequence. In the latter, we are asking "Does this generator produce random sequences?"; in this case, we are looking at ensembles of sequences.

I thought I was very clear that we were talking about the former, single-sequence case (I specifically mentioned "no ensembles") because the OP wondered about the randomness of a billion-digit sequence. But apparently you've been proposing your test as a way to identify a satisfactory PRNG. Is this the case? If so, wow that's frustrating. What a waste of time.
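For the generator case, one common approach is to draw an ensemble of sequences and compare how often some statistic occurs with its theoretical frequency. A minimal Python sketch, assuming Python's `random.Random` as the generator under test, 2000 sequences of 100 flips, and the exact-50/50 frequency as the statistic (all illustrative choices; the helper name is hypothetical). A real generator test suite checks many statistics, not just this one.

```python
import random
from math import comb


def ensemble_exact_half_fraction(n_flips=100, n_sequences=2000, seed=7):
    """Generator-level check: draw many independent sequences and compare how
    often exactly half are heads with the binomial prediction C(n, n/2)/2**n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sequences):
        heads = bin(rng.getrandbits(n_flips)).count("1")  # one bit per flip
        hits += heads == n_flips // 2
    observed = hits / n_sequences
    predicted = comb(n_flips, n_flips // 2) / 2 ** n_flips
    return observed, predicted


obs, pred = ensemble_exact_half_fraction()
print(f"observed {obs:.4f} vs predicted {pred:.4f}")
```

No single sequence is judged random or not here; only the frequency across the ensemble is compared with the prediction of about 0.0796.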

djsfantasi

Joined Apr 11, 2010
9,237
SMH

First, the statement that any test consisting of 2*N repetitions of an event with two possible outcomes (a coin flip), each with a 50% probability of occurring, will result in an equal number of both outcomes is in fact false.

This can be generalized to other proposed scenarios. As for MrAl’s test, I’d have to calculate the confidence interval and use it to calculate the expected value before taking his bet.

What the statement should be is that the proportion will approach 50% as the sample size approaches 2*∞.

People are arguing both sides. One side is assuming a finite set; the other an infinite set. So, to clarify your posts, maybe we can take sides and form teams. We can call team A the “Bounded Ones” and team B the “Unbounded Team”. That way we can identify who to root for and who to boo! :confused:;):D:rolleyes:

WBahn

Joined Mar 31, 2012
32,833
SMH

First, the statement that any test consisting of 2*N repetitions of an event with two possible outcomes (a coin flip), each with a 50% probability of occurring, will result in an equal number of both outcomes is in fact false.

This can be generalized to other proposed scenarios. As for MrAl’s test, I’d have to calculate the confidence interval and use it to calculate the expected value before taking his bet.

What the statement should be is that the proportion will approach 50% as the sample size approaches 2*∞.

People are arguing both sides. One side is assuming a finite set; the other an infinite set. So, to clarify your posts, maybe we can take sides and form teams. We can call team A the “Bounded Ones” and team B the “Unbounded Team”. That way we can identify who to root for and who to boo! :confused:;):D:rolleyes:
As the number of flips goes to infinity, the probability that the outcome is within epsilon of 50% approaches unity. At the same time, the probability that the outcome IS 50% approaches zero. These two results are NOT in conflict with each other.

The same non-intuitive result is true for any continuous variable. For instance, the probability that the length of a piece of paper is exactly one meter is identically zero.
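Both statements can be checked numerically for growing n. A minimal Python sketch, assuming a fair coin and a tolerance of epsilon = 1% (both choices are illustrative, and the helper names are hypothetical).

```python
import math
from fractions import Fraction


def log_pmf(n, k):
    """Natural log of C(n, k) / 2**n (fair coin), via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2))


def within_eps_and_exact(n, eps=Fraction(1, 100)):
    """Return (P(|heads/n - 1/2| <= eps), P(heads == n/2)) for n fair flips.
    eps is a Fraction so the integer bounds are computed exactly."""
    lo = math.ceil(n * (Fraction(1, 2) - eps))
    hi = math.floor(n * (Fraction(1, 2) + eps))
    p_within = sum(math.exp(log_pmf(n, k)) for k in range(lo, hi + 1))
    p_exact = math.exp(log_pmf(n, n // 2))
    return p_within, p_exact


for n in (100, 1_000, 10_000, 100_000):
    p_within, p_exact = within_eps_and_exact(n)
    print(f"n={n:>7}: P(within 1% of 50%) = {p_within:.4f}, "
          f"P(exactly 50%) = {p_exact:.5f}")
```

The first probability climbs toward 1 while the second falls toward 0, roughly like sqrt(2/(pi n)).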

MrAl

Joined Jun 17, 2014
13,704
Okay, so let's pick a number of samples that perhaps you would consider large enough that the odds of getting a perfect result should be pretty high.

How about 2 x 10^100 (there's something like 10^80 fundamental particles in the known universe).

If we could actually perform this test and if we had a known, ideal, truly random process to perform each flip and if I were to pay you one billion dollars if the results came out 10^100 heads and 10^100 tails while you paid me just one penny if it didn't and we agreed to play this one trillion times, would you take the bet?

If not, if this still isn't a large enough number of flips for you to be confident that the results will come out exactly 50/50 at least one time in a hundred billion, then what greater (but finite) number of flips in each round would it take?
Hi,

Apparently there are a number of misunderstandings going on in this discussion.
For one, you don't seem to get that infinity is NOT a large number. The only time a large number can stand in for infinity is when the result of some calculation drops off fast, not when it drops off gradually. I think this current test shows a gradual drop-off, not an exponential one, so the only thing that probably works is infinity and nothing short of that. Not 10^10, not 10^100, not 10^1000, not a googol of zeros after a 1.
So there is no finite number you can choose that will work unless we can show that the error drops off exponentially, because if it does not, there will always be an error until the theoretical infinity is reached, and that of course is not possible except in theory.
It has also been suggested in this thread that it might take twice infinity, if that is even possible.
So the only bet is that it happens when the number of samples approaches infinity, or possibly 2 times infinity. However, since 2 times infinity is still infinity, I'm not sure we actually have to state 2*inf.
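The "gradual drop-off" can be quantified. For a fair coin the typical (root-mean-square) error of the heads fraction is 0.5/sqrt(N), a power law rather than an exponential decay. A hedged Python sketch, assuming Python's `random.Random` as the generator and an ensemble of 500 sequences per N (both arbitrary choices; the helper name is hypothetical).

```python
import math
import random


def rms_error_of_heads_fraction(n_flips, n_sequences=500, seed=11):
    """Root-mean-square deviation of the heads fraction from 0.5 across an
    ensemble of independent fair-coin sequences, each n_flips long."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(n_sequences):
        heads = bin(rng.getrandbits(n_flips)).count("1")  # one bit per flip
        total_sq += (heads / n_flips - 0.5) ** 2
    return math.sqrt(total_sq / n_sequences)


for n in (100, 10_000):
    print(f"N={n:>6}: measured RMS error {rms_error_of_heads_fraction(n):.5f}, "
          f"theory 0.5/sqrt(N) = {0.5 / math.sqrt(n):.5f}")
```

A hundredfold increase in N buys only about a tenfold reduction in the typical error.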

You seem to be suggesting that with a finite number of samples we will some day see 50/50 if we perform enough tests. OK, so then pick 2 and repeat. You'll eventually see one head and one tail. That's 50/50.

MrAl

Joined Jun 17, 2014
13,704
So if someone were to run your test and want to use it to show that a sequence is definitively not random, how would you tell them to interpret the results so as to make that determination?
Hi,

Well, if the error does not decrease overall as the number of samples increases, the sequence is not random.
Note we might see errors such as:
8,7,6,5,6,4,3,2,1
but you can see that even though we went one time from 5 up to 6 (not decreasing) it eventually decreases.

In the case of the 1,2,3 experiment, we expect to see 2 as the average, but if we see the result start to settle at, say, 2.1, we know it can't be random. In that case the error will never go below 0.1 after a certain N.
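A more quantitative version of this rule measures the error in units of its expected size, sigma/sqrt(N). This is a standard z-score of the sample mean, in the spirit of the confidence-interval remark earlier in the thread, and it is swapped in here rather than being the exact procedure described above. A minimal Python sketch, assuming Python's `random.Random` for the fair source and a hypothetical biased source with weights 0.3, 0.3, 0.4 (true mean 2.1).

```python
import math
import random

SIGMA_123 = math.sqrt(2 / 3)   # std. dev. of one uniform draw from {1, 2, 3}


def z_score_of_mean(samples, expected_mean=2.0, sigma=SIGMA_123):
    """Error of the sample mean divided by its standard error sigma / sqrt(N)."""
    n = len(samples)
    error = sum(samples) / n - expected_mean
    return error / (sigma / math.sqrt(n))


rng = random.Random(42)
n = 100_000
fair = [rng.randint(1, 3) for _ in range(n)]
# hypothetical biased source: weights 0.3, 0.3, 0.4 give a true mean of 2.1
biased = rng.choices([1, 2, 3], weights=[0.3, 0.3, 0.4], k=n)

for name, data in (("fair", fair), ("biased", biased)):
    print(f"{name:>6}: error = {sum(data) / n - 2:+.4f}, "
          f"z = {z_score_of_mean(data):+.1f}")
```

For the fair source |z| is typically below about 3, while for the biased source z grows roughly like sqrt(N) (here to several tens), which is what makes a plateau in the error such as 0.1 statistically unmistakable.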

The reason I brought up the circle and square test is that it is a physical thing we can imagine, and since area is the key, we get an intuitive idea of what is happening right away as we see the areas fill up with hits. If there were some systematic deviation, the areas would not fill up the way they would if the process were purely random. Imagine a triangular shape function. That would not distribute over the square properly, because much of the square would not be covered after a large number of 2-D samples. A non-systematic distribution of samples would cover the square uniformly. Hence one way we get pi and the other way we get something very different from pi.
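The circle-in-square argument can be tried directly. A minimal Python sketch, assuming Python's `random.Random` as the source and a triangular density (via `random.Random.triangular`) as a stand-in for the "triangular shape function" mentioned above; the function names are hypothetical. The uniform sampler should land near pi, while the triangular one, which piles points toward the centre, overestimates it. The estimate is returned as an exact fraction, so it is always rational.

```python
import math
import random
from fractions import Fraction


def estimate_pi(n_points, coordinate, seed=1):
    """Drop n_points into the square [-0.5, 0.5]^2 and return
    4 * (hits inside the inscribed circle of radius 0.5) / n_points.
    The estimate is an exact Fraction: always rational, never exactly pi.
    `coordinate(rng)` returns one coordinate in [-0.5, 0.5]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x, y = coordinate(rng), coordinate(rng)
        if x * x + y * y <= 0.25:
            hits += 1
    return Fraction(4 * hits, n_points)


def uniform_coordinate(rng):
    return rng.uniform(-0.5, 0.5)


def triangular_coordinate(rng):
    # hypothetical non-uniform source: density peaked at the centre
    return rng.triangular(-0.5, 0.5, 0.0)


n = 100_000
print("uniform    :", float(estimate_pi(n, uniform_coordinate)))
print("triangular :", float(estimate_pi(n, triangular_coordinate)))
print("math.pi    :", math.pi)
```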

MrAl

Joined Jun 17, 2014
13,704
SMH

First, the statement that any test consisting of 2*N repetitions of an event with two possible outcomes (a coin flip), each with a 50% probability of occurring, will result in an equal number of both outcomes is in fact false.

This can be generalized to other proposed scenarios. As for MrAl’s test, I’d have to calculate the confidence interval and use it to calculate the expected value before taking his bet.

What the statement should be is that the proportion will approach 50% as the sample size approaches 2*∞.

People are arguing both sides. One side is assuming a finite set; the other an infinite set. So, to clarify your posts, maybe we can take sides and form teams. We can call team A the “Bounded Ones” and team B the “Unbounded Team”. That way we can identify who to root for and who to boo! :confused:;):D:rolleyes:
Hi,

I think I agree, except do we really need to state 2*inf? That's just infinity, right?

MrAl

Joined Jun 17, 2014
13,704
As the number of flips goes to infinity, the probability that the outcome is within epsilon of 50% approaches unity. At the same time, the probability that the outcome IS 50% approaches zero. These two results are NOT in conflict with each other.

The same non-intuitive result is true for any continuous variable. For instance, the probability that the length of a piece of paper is exactly one meter is identically zero.
Hi,

How do you know that epsilon never goes to zero when N goes to infinity?
Of course, with the constraint of just two states we have to require N to be an even number. But if N is somehow odd, then as N goes to infinity it may not matter, even though we might think we have one more head than tails, because infinity + 1 is still infinity.

MrAl

Joined Jun 17, 2014
13,704
Sweet lord. Let's make this as explicit as possible, because we have apparently been talking about two different categories of tests: one for testing the randomness of a sequence, another for testing the randomness of a generator. In the former, we are asking "Is this particular sequence of numbers randomly chosen from a uniform distribution?" That is, we are looking at a single sequence. In the latter, we are asking "Does this generator produce random sequences?"; in this case, we are looking at ensembles of sequences.

I thought I was very clear that we were talking about the former, single-sequence case (I specifically mentioned "no ensembles") because the OP wondered about the randomness of a billion-digit sequence. But apparently you've been proposing your test as a way to identify a satisfactory PRNG. Is this the case? If so, wow that's frustrating. What a waste of time.
Hi,

Not sure what you are talking about.

When the number of samples increases, the error goes down. What is so hard to understand there?
For one thing, you can't even perform such a thing without taking results midstream, because that's the only way you could know whether the error decreases. To know if the error decreases, you have to calculate the error over and over, at least a few times and preferably more.