Probability

THE_RB

Joined Feb 11, 2008
5,438
Use the "Binomial Coefficient" to calculate the number of combinations of r objects from n items, where r=2 and n=7.
...
Thanks Djsfantasi (and NorthGuy too). :)

So my chosen method (and Aerb's) was far superior. I was concerned that I might have solved it the "hard" way.
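For reference, the binomial-coefficient calculation quoted above can be checked in a couple of lines of Python (the thread itself shows no code, so this is just an illustration): C(7, 2) counts the ways to choose 2 balls from 7, so the chance that one particular pair is drawn is 1/C(7, 2).

```python
# Check the quoted binomial-coefficient approach: C(7, 2) = 21, so the
# probability that a specific pair (e.g. red and green) is the one drawn
# from 7 balls is 1/21.
from math import comb

pairs = comb(7, 2)      # number of distinct 2-ball combinations from 7 balls
print(pairs)            # 21
print(1 / pairs)        # 0.047619... = 1/21
```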


Biggsy100 said:
...
2 - All the balls are returned to the bag and 4 are removed together. What is the probability that both red and green balls are left in the bag?
...
Are you forgetting there are FIVE white balls and SEVEN balls total? :eek:
 

NorthGuy

Joined Jun 28, 2014
611
Are you forgetting there are FIVE white balls and SEVEN balls total? :eek:
To be able to remove 4 white balls from the bag and leave a green and a red ball behind, you must have at least 6 balls, but it's also possible with any number of balls above 6 (7 in this case). Of course, if you have more than 6 balls, there will be more balls left in the bag in addition to the green and red balls (one white in this case).
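This can be verified by brute force, assuming the bag from the quoted problem holds 5 white balls plus 1 red and 1 green (7 total), as stated earlier in the thread. Removing 4 balls together and checking every possible draw:

```python
# Brute-force check: bag of 5 white + 1 red + 1 green, remove 4 together,
# count draws that leave both red and green in the bag. The favourable
# draws are exactly those taking 4 of the 5 whites: C(5,4)/C(7,4) = 5/35.
from itertools import combinations

bag = ['W', 'W', 'W', 'W', 'W', 'R', 'G']
draws = list(combinations(range(7), 4))      # all ways to remove 4 of the 7
good = sum(1 for d in draws
           if 5 not in d and 6 not in d)     # red (index 5) and green (index 6) stay
print(good, len(draws))                      # 5 35
print(good / len(draws))                     # 0.142857... = 1/7
```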
 

MrAl

Joined Jun 17, 2014
11,494
Hi,

Just to note, for the original problem I used a Monte Carlo method. I like this approach for problems like this, partly just for the fun of it and partly because it mimics exactly the process we would go through, and the results we might obtain, had we tried to solve the problem experimentally rather than with a prearranged formula.

The results, of course, get better and better the more times we run the experiment. Here are the results, but be aware these are just typical results, not hard and fast constants that will occur for every experiment. Also noteworthy, however, is that for the larger experiments (higher N) the results settle in very close to the theoretical value and vary less and less as N increases. Here are some results from N=100 to N=10 million:

-- N, Result
-- 1e2, 25.000
-- 1e3, 19.6
-- 1e4, 21.46
-- 1e5, 20.9995
-- 1e6, 21.0650
-- 10e6, 21.0133

Note the result gets close to 21 every time with N=10e6.
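MrAl's own code isn't shown, so here is a minimal Monte Carlo sketch of the same experiment, assuming the success event is drawing exactly the red and green pair from the 7-ball bag (which matches the quoted theoretical value of 1/21):

```python
# Monte Carlo sketch: each trial draws 2 balls from a 7-ball bag (5 white,
# 1 red, 1 green) without replacement; a success means the pair drawn is
# exactly red and green. The theoretical probability is 1/C(7,2) = 1/21.
import random

def estimate(n_trials, seed=0):
    rng = random.Random(seed)
    bag = ['W'] * 5 + ['R', 'G']
    successes = 0
    for _ in range(n_trials):
        drawn = rng.sample(bag, 2)          # draw 2 without replacement
        if set(drawn) == {'R', 'G'}:
            successes += 1
    return successes / n_trials

p = estimate(100_000)
print(1 / p)   # hovers near 21, like the table above
```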
 

MrAl

Joined Jun 17, 2014
11,494
That seems like a lot of error at 10e6 to me. I suspect a bit of bias in your RNG...

Hi,

Well, the value hovers around the exact theoretical value of 21, within much less than 1 percent, so I think it is reasonable, or at least practical anyway. I think that if there were any real bias it would show up as a consistently higher or lower result, which it does not do. For example, another result was 20.99, which errs slightly in the negative direction.
If you'd like to repeat the experiment we can compare notes.

BTW this "21" is obviously the reciprocal of the true result which is "1/21", but showing the reciprocal makes it a little more clear.

The_RB:
Yeah, ha ha.
 

NorthGuy

Joined Jun 28, 2014
611
That seems like a lot of error at 10e6 to me. I suspect a bit of bias in your RNG...
Standard deviation is sqrt(n*p*q) where q is 1-p. With p=1/21 and n = 1e7, it is 673.

MrAl got 21.0133, which translates into 475889 successes, compared to 476190 expected. Deviation is 301, which is quite reasonable. Anything below 1300 would not be statistically significant.
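NorthGuy's figures check out numerically; a short script reproducing them:

```python
# Reproduce NorthGuy's numbers: with p = 1/21 and n = 1e7 trials, the
# standard deviation of the success count is sqrt(n*p*q), and MrAl's
# reported 21.0133 implies n / 21.0133 successes.
from math import sqrt

n, p = 10_000_000, 1 / 21
q = 1 - p
sd = sqrt(n * p * q)
print(round(sd))                 # 673

observed = n / 21.0133           # successes implied by the 21.0133 figure
expected = n * p                 # 476190.5 expected successes
print(round(observed), round(expected), round(expected - observed))
# 475889 476190 301
```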
 

joeyd999

Joined Jun 6, 2011
5,287
Standard deviation is sqrt(n*p*q) where q is 1-p. With p=1/21 and n = 1e7, it is 673.

MrAl got 21.0133, which translates into 475889 successes, compared to 476190 expected. Deviation is 301, which is quite reasonable. Anything below 1300 would not be statistically significant.
Thanks for that. I could've done the math, but was otherwise busy (and lazy).

It's funny how intuition and objective reality collide unexpectedly sometimes.
 

MrAl

Joined Jun 17, 2014
11,494
Standard deviation is sqrt(n*p*q) where q is 1-p. With p=1/21 and n = 1e7, it is 673.

MrAl got 21.0133, which translates into 475889 successes, compared to 476190 expected. Deviation is 301, which is quite reasonable. Anything below 1300 would not be statistically significant.
Hi,

Yeah that's a good way of looking at this which puts the errors into proper perspective.

The way I was looking at it was that if we could do an infinite number of tries, we would see the error go to zero, which would mean the result would be an exact 21. Since the error for low experiment counts is somewhat high, as we increase the number of experiments done we should see the error decrease.
This is exactly what happens, and in a very organized way. For example, if for some N we get an error of:
22.5
then for ten times more tries we see roughly:
21.15
and then for ten times more tries again we see about:
21.015
and for ten times more again we see:
21.0015

So every time we multiply the number of tries by 10, we see a max error reduction of about ten times too. It makes sense that if we had huge number storage ability and a very, very large number of tries, we would eventually see 21.0000000000000000000000015 or something like that, which shows we are seeing convergence to some constant value. (Note the above examples show the result with positive error only, but it actually goes positive and negative by a small amount over several experiments with the same N.)

Just to note, the experiments were run two different ways: one using a sort of organized chaos, and the other using a method which very closely matches how the real-life experiment would be done. The second method uses an array of the numbers 1 through 7 and shuffles them, then randomly selects one element of the array. If that element matches one of the required numbers, the array is reduced to 6 elements by removing the selected one, then a second element is randomly selected and checked against the second requirement. If a double match is found the success counter is incremented; if not, it is not. The average probability is then the number of successes divided by the number of tries.
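The shuffle-and-draw procedure described above can be sketched as follows (MrAl's own code isn't shown; balls 6 and 7 stand in for the two "required" balls):

```python
# Sketch of the shuffle-and-draw method described in the text: shuffle the
# numbers 1..7, pick one at random and remove it, pick a second from the
# remaining 6, and count a success when both picks hit the required pair.
import random

def trial(rng):
    required = {6, 7}                              # the two "required" balls
    balls = list(range(1, 8))
    rng.shuffle(balls)                             # shuffle 1..7 as described
    first = balls.pop(rng.randrange(len(balls)))   # pick one of 7 and remove it
    second = balls[rng.randrange(len(balls))]      # pick one of the remaining 6
    return {first, second} == required

rng = random.Random(1)
n = 200_000
successes = sum(trial(rng) for _ in range(n))
print(n / successes)   # should land near the theoretical 21
```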
 

MrAl

Joined Jun 17, 2014
11,494
Hi,

That part of it was not meant to be exact, however that's what it looked like was happening. I'll check it again soon though.
What makes you say it will only decrease by sqrt(N)?
I'd like to look at that too.
 

NorthGuy

Joined Jun 28, 2014
611
That part of it was not meant to be exact, however that's what it looked like was happening. I'll check it again soon though.
What makes you say it will only decrease by sqrt(N)?
I'd like to look at that too.
The standard deviation of the number of successes is sqrt(n*p*q). Since both p and q are fixed, it grows proportionally to sqrt(n).

The fraction is count/n, so its error will be sqrt(n*p*q)/n = sqrt(p*q)/sqrt(n). As n grows, it decreases proportionally to 1/sqrt(n).
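A quick numerical illustration of this scaling: multiplying n by 10 shrinks the error of the estimated fraction by sqrt(10) ≈ 3.16, not by 10.

```python
# Error of the estimated fraction scales as sqrt(p*q)/sqrt(n): each
# tenfold increase in n divides it by sqrt(10), not by 10.
from math import sqrt

p = 1 / 21
q = 1 - p
for n in (1e4, 1e5, 1e6, 1e7):
    print(int(n), sqrt(p * q) / sqrt(n))
```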
 

MrAl

Joined Jun 17, 2014
11,494
Hi again,

Yes, that makes sense. It's interesting, though, that the error now appears random experimentally. It's as though there were some noise in the data and the error depended on where we start rather than how many samples we use. Perhaps the experimental trend is just too hard to see.
 

THE_RB

Joined Feb 11, 2008
5,438
Which RNG are you using MrAl?

There are not many software RNGs that will produce 10 million samples in base 7 (that's approx 40 million bits to generate?) with no bias or detectable patterns.
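One rough way to estimate the randomness budget being asked about: each uniform base-7 sample carries log2(7) ≈ 2.81 bits of entropy, and the shuffle-based procedure described earlier consumes two draws per trial, so the total lands in the tens of millions of bits. (This is just an entropy estimate, not a statement about any particular generator.)

```python
# Entropy estimate for the sample stream: log2(7) bits per uniform base-7
# sample, times 10 million trials, times draws per trial.
from math import log2

bits_per_sample = log2(7)                 # ≈ 2.807 bits
n_trials = 10_000_000
print(n_trials * bits_per_sample)         # ≈ 28.1 million bits, one draw per trial
print(2 * n_trials * bits_per_sample)     # ≈ 56.1 million bits, two draws per trial
```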
 