Analyzing Random Numbers

bogosort

Joined Sep 24, 2011
696
Well one interesting example is the digits of pi. They pass every test for true randomness yet in two dimensions we know that pi is the ratio of the circumference of a circle to its diameter. So in one respect the digits are random, yet when we think about it in another way it's perfect and can only be one way.
On the contrary, I'd say that there is nothing random about the digits of pi, as it has tiny Kolmogorov complexity: with a few short words and symbols we can perfectly describe any length sequence from pi, including infinite. A test that certifies a sequence from pi as random is a failed test. In fact, we can easily use induction on sequences from pi to prove that any test for randomness will always give an infinite number of false positives.

The mean of an infinite number of samples of a true random variable is 1/2 of the sum of the two extremes (given every number between is generated also).
I don't know how you calculate the mean or pick the extremes from an infinite sequence, but in the finite case this is almost never true for sequences chosen from a random distribution, whether Gaussian or uniform or whatever. In fact, I'd say that if a sequence has this property, it's a good sign that the sequence is not random.
 

Ya’akov

Joined Jan 27, 2019
9,136
Though it is not precisely addressing this topic, the discussion reminded me of a wonderful video.

In the late 50s and early 60s, the Physical Science Study Committee produced a series of really fantastic films for secondary education. They have some of the best demos I've seen, and they feature distinguished lecturers.

They are all worth watching ("Frames of Reference" is really great), but this one fits here. Give it a watch; it's really good.

 

nsaspook

Joined Aug 27, 2009
13,260
Though it is not precisely addressing this topic, the discussion reminded me of a wonderful video.

In the late 50s and early 60s, the Physical Science Study Committee produced a series of really fantastic films for secondary education. They have some of the best demos I've seen, and they feature distinguished lecturers.

They are all worth watching ("Frames of Reference" is really great), but this one fits here. Give it a watch; it's really good.

Yes, I've seen most of them, a great set.
https://archive.org/details/frames_of_reference
 

wayneh

Joined Sep 9, 2010
17,498
In the late 50s and early 60s, the Physical Science Study Committee produced a series of really fantastic films for secondary education. They have some of the best demos I've seen, and they feature distinguished lecturers.
The existence of really great lectures on various topics is what got me thinking about the future of education. I see no reason to make students suffer through sub-par lectures when great presentations of the topic are already in the can.

At the U. of Chicago when I was there, all classes and professors received a rating at the end of every quarter. There were maybe 30 questions about the time spent, the value of the information, the quality of the professor and so on. And believe me, the 5-star professors were head-and-shoulders above the 3-star ones. The 2-star ones got culled.

From that experience on, it seems silly to me to make students sit through a 3-star lecturer when instead you could easily video the 5-star lecturer and be done with it. If the topic evolves a few years later, you refresh and remake it.

Some may say it's better to have a live lecturer. I call BS. A lecture hall environment, where you interact with other students and perhaps a teacher, has value but there's no reason the lecture itself can't be canned.
 

Ya’akov

Joined Jan 27, 2019
9,136
Some of my colleagues recorded their lectures then used the lecture time as recitation instead. This was far better and resulted in much more actual instruction.
 

MrAl

Joined Jun 17, 2014
11,461
On the contrary, I'd say that there is nothing random about the digits of pi, as it has tiny Kolmogorov complexity: with a few short words and symbols we can perfectly describe any length sequence from pi, including infinite. A test that certifies a sequence from pi as random is a failed test. In fact, we can easily use induction on sequences from pi to prove that any test for randomness will always give an infinite number of false positives.


I don't know how you calculate the mean or pick the extremes from an infinite sequence, but in the finite case this is almost never true for sequences chosen from a random distribution, whether Gaussian or uniform or whatever. In fact, I'd say that if a sequence has this property, it's a good sign that the sequence is not random.
Hi,

Ok do some searches on the web.
If you don't know how to take limits then learn :)

We are talking random as in random between two limits. If there is equal probability for all samples then the average comes out to one half.
In theory flipping a coin is 50-50, so if we call heads 1 and tails 0 then the average is 1/2. I don't see how you can get around this, but feel free to try.
 

bogosort

Joined Sep 24, 2011
696
Ok do some searches on the web.
If you don't know how to take limits then learn :)
Sigh.

We are talking random as in random between two limits. If there is equal probability for all samples then the average comes out to one half.
Again, this isn't true. You were talking about certifying a sequence as random or not. Well, flip a fair coin 100 times, associating 1 to heads and 0 to tails. You now have a sequence generated by a random variable chosen from a uniform distribution. But if you calculate the mean value, there's a 92% chance that it will not be exactly 0.5. So, your test for randomness fails most of the time. The longer the sequence, the worse your test performs.

Worse still, your test for randomness will certify an infinite number of decidedly not random sequences, namely, every sequence of consecutive positive integers. I don't know about you, but {1, 2, 3, 4, 5, ..., 999, 1000} doesn't look random to me.
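If you want to check that 92% figure yourself, here is a minimal Monte Carlo sketch (Python assumed; the estimate will wobble a little from run to run):

```python
import random

# Estimate how often the mean of 100 fair coin flips is exactly 0.5,
# i.e. exactly 50 heads out of 100.
trials = 100_000
hits = sum(
    sum(random.randint(0, 1) for _ in range(100)) == 50
    for _ in range(trials)
)
print(hits / trials)  # roughly 0.08, so about 92% of sequences miss 0.5
```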
 

xox

Joined Sep 8, 2017
838
Another useful metric is to observe the degree to which a sample can be compressed. The larger the size of the data and the closer it is to raw binary (in other words, stripped of extraneous formatting and not in some textual output representation such as a list of decimal numbers), the better. I've used this method in the past and confirmed that it agrees with standard statistical tests quite well (Diehard and TestU01 specifically). Incidentally, algorithms related to arithmetic coding seem to perform the best.
 

bogosort

Joined Sep 24, 2011
696
Another useful metric is to observe the degree to which a sample can be compressed.
Indeed, and information compressibility is tied to Kolmogorov complexity. A string of a million 'A's has essentially no complexity, as it can be compressed into a much shorter string, such as 'A{1E6}'. A random string is incompressible, because the shortest description of it is the string itself!
 

wayneh

Joined Sep 9, 2010
17,498
Indeed, and information compressibility is tied to Kolmogorov complexity. A string of a million 'A's has essentially no complexity, as it can be compressed into a much shorter string, such as 'A{1E6}'. A random string is incompressible, because the shortest description of it is the string itself!
What about the coin flips, though? Every run (a streak of heads or tails) could be replaced by its length. So HHHTTHTTHHH can be represented as 32123.
 

xox

Joined Sep 8, 2017
838
Hi,

Ok do some searches on the web.
If you don't know how to take limits then learn :)

We are talking random as in random between two limits. If there is equal probability for all samples then the average comes out to one half.
In theory flipping a coin is 50-50, so if we call heads 1 and tails 0 then the average is 1/2. I don't see how you can get around this, but feel free to try.
The easiest way to understand it is to view the data in terms of combinatorics. Suppose you were to flip a coin 4 times. The 16 possible outcomes are:

HHHH (4 H's)
HHHT (3)
HHTH (3)
HHTT (2)
HTHH (3)
HTHT (2)
HTTH (2)
HTTT (1)
THHH (3)
THHT (2)
THTH (2)
THTT (1)
TTHH (2)
TTHT (1)
TTTH (1)
TTTT (0)

(I generated the above by simply converting the numbers 0 through 15 to binary and then writing an H for every 0 and a T for every 1.)

Count the number of heads across all 16 outcomes and you get:

0 : 1/16
1 : 4/16 = 1/4
2 : 6/16 = 3/8
3 : 4/16 = 1/4
4 : 1/16

So the probability that half of the tosses turn out to be heads is 3/8 = 0.375 = 37.5%.
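If you'd rather not write the table out by hand, a brute-force enumeration gives the same tally (a quick sketch in Python):

```python
from itertools import product
from collections import Counter

# Enumerate all 2^4 = 16 outcomes of four coin flips and tally the head counts.
tally = Counter(outcome.count('H') for outcome in product('HT', repeat=4))
for heads in sorted(tally):
    print(f"{heads} : {tally[heads]}/16")

print(tally[2] / 16)  # 0.375, i.e. a 37.5% chance of exactly two heads
```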
 

xox

Joined Sep 8, 2017
838
What about the coin flips, though? Every run (a streak of heads or tails) could be replaced by its length. So HHHTTHTTHHH can be represented as 32123.
The string represents 11 bits of data, while the number 32123 takes roughly 15 bits.
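A quick sketch (Python assumed) makes the comparison concrete, treating the run lengths 3-2-1-2-3 as the single number 32123, as in the example above:

```python
from itertools import groupby
from math import ceil, log2

flips = "HHHTTHTTHHH"

# Run-length encode: HHHTTHTTHHH -> [3, 2, 1, 2, 3]
runs = [len(list(group)) for _, group in groupby(flips)]

raw_bits = len(flips)                                # 1 bit per flip = 11 bits
rle_bits = ceil(log2(int("".join(map(str, runs)))))  # 32123 needs about 15 bits
print(runs, raw_bits, rle_bits)                      # [3, 2, 1, 2, 3] 11 15
```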
 

bogosort

Joined Sep 24, 2011
696
What about the coin flips, though? Every run (a streak of heads or tails) could be replaced by its length. So HHHTTHTTHHH can be represented as 32123.
Sure, there's compression opportunity in any repetition, but that's more a reflection of the 2-valued nature of coin flips -- where, with only two possible values, we'd expect significant repetition -- than of its information content. Out of curiosity, I created four files, each with a million ASCII characters. One was a random sequence of 'H' and 'T' characters; another was all 'H' characters; another was 500,000 repetitions of "HT"; and the last was a million simulated rolls of a die. I then used gzip to compress each.

The coin flip file had a 6x compression ratio, while the all-Hs file and the HT file both had a 986x compression ratio (the HT file was only 2 bytes bigger than the all-Hs file). The die roll file was reduced by a factor of only 2.69, reflecting the fact that there's less repetition in rolls of dice than in coin flips.
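For anyone who wants to repeat the experiment, here is a rough sketch in Python, using the zlib module in place of the gzip command line; the exact ratios will differ somewhat, but the ordering should hold:

```python
import random
import zlib

n = 1_000_000  # one million ASCII characters per data set

datasets = {
    "random coin flips": "".join(random.choice("HT") for _ in range(n)),
    "all heads":         "H" * n,
    "alternating HT":    "HT" * (n // 2),
    "random die rolls":  "".join(random.choice("123456") for _ in range(n)),
}

for name, text in datasets.items():
    packed = zlib.compress(text.encode("ascii"), 9)
    print(f"{name}: {len(text) / len(packed):.1f}x compression")
```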
 

MrAl

Joined Jun 17, 2014
11,461
Sigh.


Again, this isn't true. You were talking about certifying a sequence as random or not. Well, flip a fair coin 100 times, associating 1 to heads and 0 to tails. You now have a sequence generated by a random variable chosen from a uniform distribution. But if you calculate the mean value, there's a 92% chance that it will not be exactly 0.5. So, your test for randomness fails most of the time. The longer the sequence, the worse your test performs.

Worse still, your test for randomness will certify an infinite number of decidedly not random sequences, namely, every sequence of consecutive positive integers. I don't know about you, but {1, 2, 3, 4, 5, ..., 999, 1000} doesn't look random to me.
Hi again,

From your recent post, the only thing left I can surmise is that you never actually did any testing; you just talked about it.
The only thing I have left to say, then, is: do some testing using an RNG, sum the results, and compute the average.
What will you see? Can you tell me without doing it first?
Let the number of samples go higher and higher. What can you generalize about the results?
I'd really like to hear your theory on this before you do any testing, but from what you were saying so far you won't be able to come out with the right results, unless I misunderstood you somehow.
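A few lines of Python will run that experiment, for anyone who wants to try it (uniform samples on [0, 1), running mean printed as the sample count grows):

```python
import random

# Average more and more uniform [0, 1) samples and watch where the mean goes.
for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    samples = [random.random() for _ in range(n)]
    print(n, sum(samples) / n)
# The running mean closes in on 0.5 as n grows, though for any finite n
# it is almost never exactly 0.5.
```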
 

xox

Joined Sep 8, 2017
838
Hi again,

From your recent post, the only thing left I can surmise is that you never actually did any testing; you just talked about it.
The only thing I have left to say, then, is: do some testing using an RNG, sum the results, and compute the average.
What will you see? Can you tell me without doing it first?
Let the number of samples go higher and higher. What can you generalize about the results?
I'd really like to hear your theory on this before you do any testing, but from what you were saying so far you won't be able to come out with the right results, unless I misunderstood you somehow.
Not sure what I was thinking either. You're right, the limit obviously has to approach 1/2 eventually.
 

WBahn

Joined Mar 31, 2012
30,045
Not sure what I was thinking either. You're right, the limit obviously has to approach 1/2 eventually.
The details are in the fine print.

The odds that it will be within an arbitrarily small range centered on 1/2 will approach unity.

The odds that it will BE 1/2 will approach zero.
 

xox

Joined Sep 8, 2017
838
The details are in the fine print.

The odds that it will be within an arbitrarily small range centered on 1/2 will approach unity.

The odds that it will BE 1/2 will approach zero.
Well naturally, "probability" is really a misnomer. A better name for it would have been "the laws of possibility". :)
 

bogosort

Joined Sep 24, 2011
696
From your recent post, the only thing left I can surmise is that you never actually did any testing; you just talked about it.
The only thing I have left to say, then, is: do some testing using an RNG, sum the results, and compute the average.
What will you see? Can you tell me without doing it first?
Let the number of samples go higher and higher. What can you generalize about the results?
I'd really like to hear your theory on this before you do any testing, but from what you were saying so far you won't be able to come out with the right results, unless I misunderstood you somehow.
In order for N tosses of a fair coin to have a mean of exactly 0.5, you must have N/2 heads. This rules out any sequence with an odd number of tosses. In the case of an even number of tosses, the probability of getting exactly N/2 heads is

nCr(N, N/2) / 2^N

As 2^N grows faster than nCr(N, N/2), the probability of any particular sequence having exactly N/2 heads diminishes as the length of the sequence grows. Indeed, the probability approaches zero as N → ∞.
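Plugging a few values of N into that formula makes the trend plain (sketch in Python; math.comb requires 3.8+):

```python
from math import comb

# P(exactly N/2 heads in N fair flips) = nCr(N, N/2) / 2^N
for N in (2, 10, 100, 1_000, 10_000):
    print(N, comb(N, N // 2) / 2**N)
# Roughly 0.5, 0.25, 0.08, 0.025, 0.008 -- heading toward zero as N grows.
```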

Recall that you were talking about a practical test for determining whether a given (finite) sequence is random. I've given you "my theory" as to why it is a terrible test for randomness; if you disagree, please explain how my analysis fails.
 

WBahn

Joined Mar 31, 2012
30,045
Sure, there's compression opportunity in any repetition, but that's more a reflection of the 2-valued nature of coin flips -- where, with only two possible values, we'd expect significant repetition -- than of its information content. Out of curiosity, I created four files, each with a million ASCII characters. One was a random sequence of 'H' and 'T' characters; another was all 'H' characters; another was 500,000 repetitions of "HT"; and the last was a million simulated rolls of a die. I then used gzip to compress each.

The coin flip file had a 6x compression ratio, while the all-Hs file and the HT file both had a 986x compression ratio (the HT file was only 2 bytes bigger than the all-Hs file). The die roll file was reduced by a factor of only 2.69, reflecting the fact that there's less repetition in rolls of dice than in coin flips.
This also says something about the gzip compression algorithm.

If you have a file consisting solely of an 8-bit representation for 'H' and an 8-bit representation for 'T', then this can trivially be compressed by a factor of 8 (or arbitrarily close thereto) by simply encoding each of the two possibilities in a single bit (combined with a tiny amount of header for the dictionary needed to recover the original file).
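As a minimal sketch of that trivial scheme (Python assumed; pack_flips is just an illustrative helper, not any real file format):

```python
import random

def pack_flips(flips: str) -> bytes:
    # Map H -> 1 and T -> 0, read the whole string as one binary number,
    # and serialize it: 1 bit per flip instead of the 8 bits ASCII uses.
    bits = int("".join("1" if ch == "H" else "0" for ch in flips), 2)
    # A real format would also record len(flips) in a tiny header so that
    # leading 'T's (leading zero bits) survive the round trip.
    return bits.to_bytes((len(flips) + 7) // 8, "big")

flips = "".join(random.choice("HT") for _ in range(1_000_000))
print(len(flips), "->", len(pack_flips(flips)))  # 1000000 -> 125000, an 8x reduction
```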

This is in no way a criticism of gzip; all compression algorithms are designed with a certain type of data in mind and generally do far from optimal on other types of data. In fact, there exist data sets for any given compression algorithm for which the compressed data actually takes up MORE space than the uncompressed data. In general, those data sets are very much NOT random, so lack of compression alone is not a reliable indicator of randomness -- it can only hint at the possibility.

So just because one compression algorithm does very poorly on a particular data set does not really say much about whether that data set is truly random; it could simply be a data set that is too poor a match for the kind of data that the algorithm was designed for. What we CAN say is that if we DO get any significant compression (and we could probably put a threshold on it), then we have high confidence that the data is NOT truly random since all compression algorithms exploit nonrandomness in the data to achieve compression. But if we don't get any significant compression, about all we can say is that we haven't ruled out the possibility that the data is truly random and, perhaps, we have some reason to suspect that it might be.

If we try lots of different compression algorithms and none of them show any significant compression, then our suspicion that it's truly random will grow. But we can't escape the fact that there are infinitely more "types" of data than there are compression algorithms, so it is highly likely that many demonstrably non-random data sets will be poor matches for every compression algorithm yet devised -- even algorithms specifically designed to compress random-like data sets (though those would certainly be the ones whose results would most bolster our opinion).
 