Analyzing Random numbers

Papabravo

Joined Feb 24, 2006
21,225
Why do you care if the distributions of the two populations are the same or different? Having different distributions or the same distributions says nothing about randomness. If what you are driving at is:
How random is a uniform distribution of digits where the occurrence of each digit is equiprobable? If the sequence is generated by an algorithm then it is not random. If you have a natural process that satisfies the equiprobable condition then it is random.
 
Why do you care if the distributions of the two populations are the same or different? Having different distributions or the same distributions says nothing about randomness. If what you are driving at is:
How random is a uniform distribution of digits where the occurrence of each digit is equiprobable? If the sequence is generated by an algorithm then it is not random. If you have a natural process that satisfies the equiprobable condition then it is random.
What I am getting at is integral to the purpose of the thread as I read it (i.e., discuss how you can test a PRNG) and specifically your earlier comment:
The tough question is: "What characteristics do you propose for a sequence that is random?"
When you say "If you have a natural process that satisfies the equiprobable condition then it is random" you are, in my opinion, simply talking around the problem of showing that it is "equiprobable". You have to be able to describe what "equiprobable" means, which I have done in my first response.

Now, you have to test that it is "equiprobable"; otherwise, I must simply trust that you know that a natural process of some kind or another (radioactive decay is usually used) is "equiprobable" - and I don't.

That is what testing for randomness is all about as I see it. Further, what I was driving at was a discussion of the characteristics of a theoretical RNG and testing a PRNG.
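
A minimal sketch of one way that "equiprobable" claim could actually be tested is a chi-square goodness-of-fit test on digit counts. The rand() % 10 digit source and the 16.92 critical value (9 degrees of freedom at the 5% level) are just assumptions for the example, not anything from the posts above:
Code:
/* Chi-square goodness-of-fit test on digit frequencies.
   The digit source (rand() % 10) is only a stand-in for whatever
   process is being tested for "equiprobable" digits. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    enum { N = 100000 };
    long count[10] = { 0 };

    for (long i = 0; i < N; ++i)
        count[rand() % 10]++;            /* stand-in digit source */

    double expected = (double)N / 10.0;  /* equiprobable => N/10 per digit */
    double chi2 = 0.0;
    for (int d = 0; d < 10; ++d) {
        double diff = count[d] - expected;
        chi2 += diff * diff / expected;
    }

    /* 16.92 is the 5% critical value for 9 degrees of freedom */
    printf("chi-square = %.2f (9 dof, 5%% critical value ~16.92)\n", chi2);
    printf("%s\n", chi2 < 16.92 ? "no evidence against equiprobable digits"
                                : "digit frequencies look biased");
    return 0;
}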
 
I would argue that the example I gave of using a t-test can be used to help characterize a "good" PRNG. It is not, however, sufficient.

Here is some code that I wrote some 28 years ago for an 8088. It is a standard shift register approach and I certainly did not come up with that:
Code:
;---------------------------------------------------------------------
;  PRNG31.ASM
;  Demonstration of a PseudoRandom Number Generator
;  Uses a 31 stage software shift register
;---------------------------------------------------------------------
; The 31 bit shift register is represented by the data in [shift_reg].
; Seed these bits with a constant initial value to generate repeatable
; sequences.  Alternatively, you can use other sources such as the BIOS
; real time clock 'ticks' counter for the seed.  Any non-zero seed
; will work.
; (see the READ.ME file for more information).
;---------------------------------------------------------------------

  IDEAL
;(We used TASM /la /w /ml and TLINK /T /s)
  DOSSEG
  MODEL  TINY
  DATASEG
signon  db  0dh,0ah,'  Press any key to generate eight bit '
  db  'pseudorandom numbers  [CTRL-C to quit].'
  db  0dh,0ah,0ah,'$'
number  db  '000 $'  ;display string
;The 31 bit shift register.
shift_reg  dd  5a5aa55ah  ;a nice seed
  CODESEG
  ORG  0100h
start:
  mov  ax,02h  ;clear the screen
  int  10h
hello:
  mov  dx,offset signon  ;display message
  mov  ah,09h
  int  21h
;Wait for a keypress and leave if it's CTRL-C.
get_key:
  mov  ah,07h
  int  21h
  cmp  al,03h  ;ctrl-c pressed?
  jz  exit  ;then quit
  mov  di,1a4h  ;21 lines of numbers
get_bits:
;Get eight pseudorandom bits into al.
  mov  cx,08h  ;bit counter
  xor  ax,ax  ;initialize to 0
gen_num:
;Determine the result of (bit three xor bit six) of the high
; byte of the shift register and put the result in the carry.
  mov  bx,[WORD HIGH shift_reg]
  and  bh,048h  ;08h or 40h is result=1 else 0
  cmp  bh,08h
  jz  xor1
  cmp  bh,40h
  jz  xor1
  clc  ;xor result=0 so clear carry
  jmp  short shift_bits
xor1:
  stc  ;xor result=1 so set carry
;Rotate the shift register with the cf becoming the lsb.
shift_bits:
  rcl  [WORD LOW shift_reg],1
  rcl  [WORD HIGH shift_reg],1
;The new cf gets rotated into our pseudorandom number.
  rcl  al,1
  loop  gen_num  ;get all eight bits
;Convert al to a leading zero ascii string.
  mov  si,offset number+2  ;lsd of string
  mov  cx,03h  ;3 digits to convert
  mov  dl,0ah  ;divisor
bin2asc:
  xor  ah,ah  ;clear remainder
  div  dl  ;get a remainder
  add  ah,30h  ;make it ascii
  mov  [si],ah  ;store the digit
  dec  si  ;bump the pointer
  loop  bin2asc
;Send ascii string to standard output device (display).
  mov  dx,offset number
  mov  ah,09h
  int  21h
  dec  di
  jnz  get_bits  ;do another number
  jmp  short hello  ;do another screen
;Webedone.
exit:
  mov  ah,04ch
  int  21h  ;back2dos
  END start
It will perfectly satisfy the S1,S2 t-test that I mentioned earlier. It is, however, weak to the point of useless when you consider the seed: same seed, same sequence.

So, in addition to the characteristic that a t-test looks at, we can add the characteristic that it should not be predictable.
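
For anyone who doesn't want to trace the 8088 listing, here is a rough C sketch of the same generator structure: a 31-stage shift register with feedback from bits 30 and 27 (the "bit three xor bit six of the high byte" test in the assembly, the classic x^31 + x^28 + 1 tap pair). The seed is the one from the listing; the rest is illustrative rather than a line-for-line translation:
Code:
#include <stdio.h>
#include <stdint.h>

static uint32_t shift_reg = 0x5A5AA55AUL;  /* the seed from the listing; any non-zero seed works */

/* Produce one pseudorandom byte, mirroring the gen_num loop in the assembly. */
static uint8_t prng8(void)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; ++i) {
        /* feedback = bit 30 XOR bit 27 of the current register contents */
        uint32_t fb = ((shift_reg >> 30) ^ (shift_reg >> 27)) & 1u;
        out = (uint8_t)((out << 1) | (shift_reg >> 31));  /* bit shifted out of the register */
        shift_reg = (shift_reg << 1) | fb;                /* feedback becomes the new lsb    */
    }
    return out;
}

int main(void)
{
    for (int i = 0; i < 16; ++i)
        printf("%3u\n", prng8());
    return 0;
}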
 

wayneh

Joined Sep 9, 2010
17,498
I come at this from a background in statistics. A crucial and often overlooked part of model building is the examination of the residuals after the model has been fit to a data set. Ideally, the residuals should be nothing more than random error. So you test for normality of distribution, kurtosis and skew. You plot them against every conceivable input variable to verify there are no correlations left. You count zero crossings to check for runs. I’m sure there are a few more tests I can’t recall, like checking for the frequency of certain digits to identify “fake” values.

Only after you’ve passed these standard tests can you begin to think you’ve achieved randomness. Then you have to start plotting in multiple dimensions to look for patterns the simple tests may have missed.
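
As a concrete illustration of those residual checks, here is a small C sketch computing sample skew, excess kurtosis, and a zero-crossing count. The residuals here are just stand-in uniform noise, not output from a fitted model:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    enum { N = 10000 };
    static double r[N];
    for (int i = 0; i < N; ++i)                       /* stand-in residuals */
        r[i] = (rand() / (double)RAND_MAX) - 0.5;

    double mean = 0.0;
    for (int i = 0; i < N; ++i) mean += r[i];
    mean /= N;

    double m2 = 0.0, m3 = 0.0, m4 = 0.0;
    long crossings = 0;
    for (int i = 0; i < N; ++i) {
        double d = r[i] - mean;
        m2 += d * d;  m3 += d * d * d;  m4 += d * d * d * d;
        if (i > 0 && (r[i] - mean) * (r[i - 1] - mean) < 0.0)
            ++crossings;                              /* crude runs check */
    }
    m2 /= N;  m3 /= N;  m4 /= N;

    printf("skew            = %f\n", m3 / pow(m2, 1.5));
    printf("excess kurtosis = %f\n", m4 / (m2 * m2) - 3.0);
    printf("zero crossings  = %ld (expect ~%d for independent residuals)\n",
           crossings, (N - 1) / 2);
    return 0;
}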
 
I would also put forth the idea that there are no truly random processes, period. If one believes in an ordered universe, that is, one believes that there is causation in all events, it follows, in the opinion of some folks, that truly random processes simply cannot exist. There is a reason for the rate of decay of some radioactive junk, whether or not we understand the process.

I do think that we can all agree that it is certainly useful to believe that we can have some good PRNGs.
 
I come at this from a background in statistics. A crucial and often overlooked part of model building is the examination of the residuals after the model has been fit to a data set. Ideally, the residuals should be nothing more than random error. So you test for normality of distribution, kurtosis and skew. You plot them against every conceivable input variable to verify there are no correlations left. You count zero crossings to check for runs. I’m sure there are a few more tests I can’t recall, like checking for the frequency of certain digits to identify “fake” values.

Only after you’ve passed these standard tests can you begin to think you’ve achieved randomness. Then you have to start plotting in multiple dimensions to look for patterns the simple tests may have missed.
I understand exactly what you are talking about. That has much to do with the assumptions of certain testing procedures. I also have a background in inferential statistics and, more than once, have had to test the residuals in mixed-model ANOVAs to satisfy assumptions of normality (even though ANOVA is well known to be robust to violations in many cases). I have also demanded that others do the same.

I just did not want to go there straight away.

Edit to add: Plus I have to go see what happened in the tournament - some would say that my bracket represents a truly random process. :)
 

nsaspook

Joined Aug 27, 2009
13,272
I would also put forth the idea that there are no truly random processes, period. If one believes in an ordered universe, that is, one believes that there is causation in all events, it follows, in the opinion of some folks, that truly random processes simply cannot exist. There is a reason for the rate of decay of some radioactive junk, whether or not we understand the process.

I do think that we can all agree that it is certainly useful to believe that we can have some good PRNGs.
You can put forth that idea of superdeterminism but it's not well founded in current science or experimental evidence.
https://en.wikipedia.org/wiki/Superdeterminism
In the 1980s, John Bell discussed superdeterminism in a BBC interview:[2][3]

There is a way to escape the inference of superluminal speeds and spooky action at a distance. But it involves absolute determinism in the universe, the complete absence of free will. Suppose the world is super-deterministic, with not just inanimate nature running on behind-the-scenes clockwork, but with our behavior, including our belief that we are free to choose to do one experiment rather than another, absolutely predetermined, including the "decision" by the experimenter to carry out one set of measurements rather than another, the difficulty disappears. There is no need for a faster than light signal to tell particle A what measurement has been carried out on particle B, because the universe, including particle A, already "knows" what that measurement, and its outcome, will be.
...
The implications of superdeterminism, if it is true, would bring into question the value of science itself by destroying falsifiability, as Anton Zeilinger has commented:

[W]e always implicitly assume the freedom of the experimentalist... This fundamental assumption is essential to doing science. If this were not true, then, I suggest, it would make no sense at all to ask nature questions in an experiment, since then nature could determine what our questions are, and that could guide our questions such that we arrive at a false picture of nature.[6]
Yes, we have very good PRNGs that are only as good as the truly random seeds for the generator.
 

Thread Starter

Wendy

Joined Mar 24, 2008
23,421
I knew the owner of a pinball machine brokerage who told me how Bally Manufacturing Co. designed and built slot machines.

Gambling devices are a classic example of a random function generator.
And yet they bias the numbers so that statistically the house wins, but not too often.
 

402DF855

Joined Feb 9, 2013
271
And yet they bias the numbers so that statistically the house wins, but not too often.
My guess is they bias the payouts not the elements of game play. Their profits are still "assured".

I once wrote a sudoku puzzle generator and found that after about 800,000 puzzles, the rate of duplicates went up dramatically. So I augmented the standard srand/rand code and reduced/eliminated duplicates.

As an experiment, this morning I generated 10,000 random numbers and tallied the first bit as a proxy for heads and tails. The ratio of the number of 1's to the total should be 0.5. I repeated this 10,000 times and wrote each ratio to a file. Then I calculated the FFT of these 10,000 ratios (actually 8,192). I did this with the standard (Visual Studio) rand(), and then with my augmented code. The enclosed graphs are the result. I don't have the theoretical background to interpret them other than to surmise that the augmented version appears to be obviously "more random".
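
A rough sketch of the first half of that experiment, collecting the 10,000 heads ratios; the spectral/FFT step is left to an external tool, and the output file name is just made up for the example:
Code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    enum { TRIALS = 10000, FLIPS = 10000 };
    FILE *fp = fopen("ratios.txt", "w");   /* hypothetical output file */
    if (!fp)
        return 1;

    for (int t = 0; t < TRIALS; ++t) {
        long ones = 0;
        for (int i = 0; i < FLIPS; ++i)
            ones += rand() & 1;            /* low bit as a heads/tails proxy */
        fprintf(fp, "%f\n", (double)ones / FLIPS);  /* should hover near 0.5 */
    }
    fclose(fp);
    return 0;
}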
 

Attachments

Thread Starter

Wendy

Joined Mar 24, 2008
23,421
Actually /2 would be a better measure. I used a quick and dirty method to turn a number into its binary equivalent: if it is even it is a 0, if it is odd it is a 1.
Here is the BASIC algorithm:

10 INPUT N
20 N=N/2
30 IF N=INT(N) THEN PRINT "0"; ELSE PRINT "1"; : REM IF N IS EVEN PRINT 0 ELSE PRINT 1
40 N=INT(N) : IF N>0 THEN 20 : REM THROW AWAY LAST BIT AND START ON THE NEXT BIT
50 PRINT : END

I really need to learn a new programming language.

/end
 

bogosort

Joined Sep 24, 2011
696
Randomness is all around us.
Randomness is a funny concept. We tend to associate it with predictability, as in the more unpredictable some process is, the more we're likely to call it random. But statistics shows that predictability isn't the thing: casinos make a lot of money predicting how apparently random processes will play out over many trials.

Another example: if we flip 10 fair coins, the probability of flipping the sequence HHTHTHHTTH (where H stands for heads and T for tails) is exactly the same as flipping HHHHHHHHHH, though the former "looks" more random than the latter. Predictability doesn't help here. The best we can do is consider a large number of such events and see if they follow the probability distribution of some other process that we consider random. And while we expect fair coin flips to be a random variable following a uniform distribution, each coin flip can be seen as a complicated yet deterministic event -- there are coin-flipping machines that can be calibrated to always flip heads or tails.

What about the randomness of a single value, as I think you're interested in? Well, as Knuth rhetorically asked, is the number 2 random? But there is a useful way to reason about the "randomness" of a single value, through something called Kolmogorov complexity. It boils down to comparing the size of some value x (as a bit-string) with the smallest possible description of x (e.g., a computer program that can generate x). If the smallest description of x is about the same size as x itself, then we say that x has high complexity; we say that it is effectively random.

For example, I can list a million digits of PI after the 10th digit and get a number that looks quite random. But clearly it is not in any way random, as I can describe it in just 9 words! On the other hand, if I get a binary sequence by flipping a million coins, the smallest possible description of it is the number itself. As its size is equal to its complexity, we can fairly call it a random value.
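
A crude way to play with this idea on a computer is to use a general-purpose compressor as a stand-in for the "smallest description" (true Kolmogorov complexity is uncomputable). This sketch assumes zlib is available (build with -lz); a highly structured buffer shrinks to almost nothing, while bytes from rand() barely compress even though they are, strictly speaking, describable by a short program:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/* Compressed size as a crude stand-in for "smallest description". */
static unsigned long compressed_size(const unsigned char *buf, unsigned long len)
{
    uLongf out_len = compressBound(len);
    unsigned char *out = malloc(out_len);
    compress(out, &out_len, buf, len);   /* error checking omitted in this sketch */
    free(out);
    return out_len;
}

int main(void)
{
    enum { N = 1000000 };
    static unsigned char structured[N], noisy[N];

    memset(structured, 'A', N);                     /* trivially describable */
    for (long i = 0; i < N; ++i)
        noisy[i] = (unsigned char)(rand() & 0xFF);  /* pseudorandom bytes    */

    printf("structured: %d -> %lu bytes compressed\n", N, compressed_size(structured, N));
    printf("noisy:      %d -> %lu bytes compressed\n", N, compressed_size(noisy, N));
    return 0;
}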

Anyway, just some food for thought.
 

bogosort

Joined Sep 24, 2011
696
You can put forth that idea of superdeterminism but it's not well founded in current science or experimental evidence.
Note that superdeterminism is not equivalent to believing there is no randomness. Superdeterminism is the much stronger belief that a) there is no such thing as statistical independence (everything is correlated); and b) every experimental choice ever made was caused specifically to give the desired outcome (with respect to Bell's theorem).

In other words, one can believe that the universe is deterministic (no randomness) and at the same time not believe in superdeterminism (conspiratorial universe).
 

WBahn

Joined Mar 31, 2012
30,058
And yet they bias the numbers so that statistically the house wins, but not too often.
They don't bias the random number generator; they bias the game by how they map outcomes to winners.

If I have a truly random number generator that picks a card from a 52-card deck and I say that if it's black you win and if it's red I win, then the odds are even. But if I say that I win if it's red or if it's an ace, otherwise you win, then I have biased the GAME in my favor, but the random number generator is still the same.
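
A quick simulation to put a number on that biased game (the card encoding here is just an illustrative layout, not anything from the post): with a perfectly uniform draw, the "red or an ace" rule covers 28 of the 52 cards (26 red plus the 2 black aces), about 53.8%:
Code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    enum { TRIALS = 1000000 };
    long dealer_wins = 0;

    for (long i = 0; i < TRIALS; ++i) {
        /* encode the deck as 0..51: 0..25 red, 26..51 black,
           with 26 and 27 standing in for the two black aces   */
        int card = rand() % 52;
        if (card < 28)                 /* red card or black ace: dealer wins */
            ++dealer_wins;
    }

    printf("dealer win rate: %.4f (expected 28/52 = 0.5385)\n",
           (double)dealer_wins / TRIALS);
    return 0;
}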
 

WBahn

Joined Mar 31, 2012
30,058
In its simplest form a random generator can simply be a one or a zero (heads/tails). You can check for bias by flipping it 1000 times. If it is truly random you wind up with a 50%/50% distribution, and you know it is bias free at that point
You actually don't know that at all. If you flipped it 1000 times and it came out 50/50 and then you did that again and it came out 50/50 and you did it a bunch more times and it always came out 50/50, the strongest conclusion would be that the process you are using is anything but random and is somehow forcing the result to end up with equal heads/tails after 1000 flips. If my scratchings are correct, you would expect to see a 50/50 outcome after 1000 flips only 2.5% of the time.

There are lots of tests for randomness. There are the obvious ones in which I look for the gross characteristics to be not too far from the expected values (but how far is too far?). I can also perform things like autocorrelations to see if there are periodic patterns that can be detected easily. I can also break the data set into smaller data sets and then measure the distribution of various metrics across those sets. For instance, let's take your coin toss example but make it a million flips. If I go through and sample it into groups of 1000 flips each and find the ratio of heads to tails, I can generate lots of data points. I would expect a plot of that data to fairly closely approximate the expected distribution of that metric. I would also expect different sampling strategies to have no effect on the results -- for instance, making sets that only contain even-numbered samples versus odd-numbered samples.

I can also go through the data and count the number of runs of length 1, 2, 3, and so forth and compare that to the expected results. I can look at the distances between runs of different lengths and compare that. For each of these tests, I can determine the likelihood that a truly random sequence would be further away from the expected results than the observed results are. I can then combine all of these results to come up with a level of confidence that the sequence was produced by a truly random process. I can never actually prove that it was or that it wasn't, only establish the confidence in such a conclusion.
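
A minimal C sketch of the run-length counting just described, assuming a simulated fair coin from rand() (just a stand-in source) and comparing observed counts against the rough expectation of about n/2^(k+1) runs of length k for a truly random sequence:
Code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    enum { N = 1000000, MAXLEN = 20 };
    long runs[MAXLEN + 1] = { 0 };

    int prev = rand() & 1;        /* first flip */
    int len = 1;
    for (long i = 1; i < N; ++i) {
        int flip = rand() & 1;
        if (flip == prev) {
            ++len;
        } else {
            runs[len > MAXLEN ? MAXLEN : len]++;
            prev = flip;
            len = 1;
        }
    }
    runs[len > MAXLEN ? MAXLEN : len]++;   /* close the final run */

    for (int k = 1; k <= 10; ++k)
        printf("length %2d: observed %7ld, expected ~%7.0f\n",
               k, runs[k], (double)N / (1 << (k + 1)));
    return 0;
}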
 

Thread Starter

Wendy

Joined Mar 24, 2008
23,421
Random by its nature can be tricky. For a simple coin toss, 100 times is enough for a manual test; for a computer analysis there is no maximum number. The larger the number of flips, the greater the chance of 50/50 overall.
 

WBahn

Joined Mar 31, 2012
30,058
Random by its nature can be tricky. For a simple coin toss, 100 times is enough for a manual test; for a computer analysis there is no maximum number. The larger the number of flips, the greater the chance of 50/50 overall.
That's simply not true.

First, if there are an odd number of flips, then the chance of getting 50/50 is identically zero, no matter how large the number of flips. But let's restrict ourselves to a number of flips that is even.

If I make two flips, then the likelihood of seeing a 50/50 outcome is 50%. On average, half the time I will see an outcome of 50/50 and half the time I won't.

It only goes downhill from there.

If I make four flips, then out of the sixteen possible outcomes, only six of them are 50/50, so 37.5% of the time I will see an outcome of 50/50 and 62.5% of the time I won't.

If I make 1000 flips, then I would expect to see a 50/50 outcome only about 2.5% of the time, while I would expect to not see a 50/50 outcome nearly 97.5% of the time.

The probability of a 50/50 outcome goes down roughly as one over the square root of the number of flips, so if you flip it 100,000 times, you are down to about a 0.25% chance of a 50/50 result.
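
For anyone who wants to reproduce those numbers, a quick sketch: the exact probability of an even split after n fair flips is C(n, n/2) / 2^n, computed here in log space with lgamma() so the large factorials don't overflow (build with -lm):
Code:
#include <stdio.h>
#include <math.h>

/* Exact probability of exactly n/2 heads in n fair flips (n even). */
static double p_even_split(long n)
{
    /* ln C(n, n/2) = lgamma(n+1) - 2*lgamma(n/2 + 1) */
    double ln_p = lgamma(n + 1.0) - 2.0 * lgamma(n / 2.0 + 1.0) - n * log(2.0);
    return exp(ln_p);
}

int main(void)
{
    long n[] = { 2, 4, 1000, 100000 };
    for (int i = 0; i < 4; ++i)
        printf("n = %6ld : P(50/50) = %.4f%%\n", n[i], 100.0 * p_even_split(n[i]));
    return 0;
}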
 

jpanhalt

Joined Jan 18, 2008
11,087
About coin flips, Richard Feynman used to demonstrate in his lectures how the result was decidedly non-random. In theory, yes. In practice, no.

I tried, but couldn't find a YouTube video of it. He was a very good coin flipper and bongo player.
 