65% of women agree...

  • Thread starter Deleted member 115935
  • Start date
Status
Not open for further replies.

wayneh

Joined Sep 9, 2010
17,498
To answer the TS's question: toss a penny 100 times and record your heads/tails ratio.
You'd have to do about 1000 sets of 100 tosses to see a 67/33 distribution or more extreme 1 to 2 times.
That's far rarer than the TS's scenario. A test of 100 flips has an SD of 5, so 67 heads is 17 above the mean, which is 3.4 SD.
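The arithmetic above is easy to check with a short script (a sketch of mine, not from the thread): the SD of the heads count in 100 fair flips is sqrt(100 · 0.5 · 0.5) = 5, and the exact binomial tail gives the chance of 67 or more heads.

```python
from math import comb, sqrt

n, p = 100, 0.5
mean = n * p                # 50 expected heads
sd = sqrt(n * p * (1 - p))  # 5.0
z = (67 - mean) / sd        # 3.4 standard deviations above the mean

# Exact one-tailed probability of 67 or more heads in 100 fair flips.
p_tail = sum(comb(n, k) for k in range(67, n + 1)) / 2 ** n

# Counting both directions (67/33 heads OR 67/33 tails) roughly doubles
# this, which lands in the "1 to 2 times out of 1000 sets" range.
p_both_tails = 2 * p_tail
```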
 

MrSalts

Joined Apr 2, 2020
2,767
@MrSalts
And your source for how the survey was conducted is? Please fill us in on the details? Moreover, the group was 67, not 100.

Doesn't that number seem a bit odd?
Why would you assume I had a source? It's just math.

If he needs to be spoon-fed: about 19 times out of 1000, a sample of 67 people will yield 43 or more YES answers (assuming the 67 are randomly drawn from an infinite population that is 50:50 yes/no).
 
Last edited:

Thread Starter

Deleted member 115935

Joined Dec 31, 1969
0
Thanks MrSalts,

Yes I do need to be "spoon fed" as you so "clearly" put it.

I'm just after an answer.

As for others asking about the sample size, the questions asked, etc.,

I think we are losing sight of the original question.

At the beginning I said it started because of the advert, which got me thinking,
but then, to head off 1001 questions, I turned it into a thought experiment,
and as this is a thought experiment, I simplified.

So if I understand @MrSalts right,
if we take 67,000 samples we would expect to find, on average, 19 sets of 67 consecutive samples where 43 agreed and 24 disagreed.

Thank you for that,
 

MrSalts

Joined Apr 2, 2020
2,767
"if we take 67000 samples..."
True, as long as you consider them as 1000 blocks of 67 samples. Of those 1000 blocks, about 19 will have 43 or more "agree" answers.

However!
The number of tests meeting your criteria would be higher than 19 if you allowed overlap between blocks of 67 (that is, any 67 consecutive interviewees in the 67,000). In that case, you actually have 67000 - 66 = 66934 bins of 67 interviewees.
You then get many more hits: about 1290 bins with 43 or more "agree" out of the 66934 bins of 67 each. Conveniently, 1290 out of 66934 bins is 1.9%, just like 19 out of 1000 is 1.9%.

I hope that is clear.
 

Thread Starter

Deleted member 115935

Joined Dec 31, 1969
0
Thank you @MrChips

If I understand you: if we took a continuous stream of samples and a rolling window of 67 samples,
then in 67,000 samples we should see "43 out of 67 agree" about 1290 times.
 

Thread Starter

Deleted member 115935

Joined Dec 31, 1969
0
Yes, that is correct.


I think he tagged you by mistake.
wow,
I wonder just how few samples you would have to take to have a 50:50 chance of getting that ratio; it does not seem like very many.
 

Tesla23

Joined May 10, 2009
542
Not sure I agree with the analysis on runs.

The probability of getting >= 43 heads in 67 tosses of a fair coin is 0.01356 (agree with @wayneh), easily confirmed (e.g. Wolfram).

So, if you do 1000 individual surveys of length 67, expect on average 13.56 to provide 43 or greater positive responses to the 50-50 question.

The interesting question that was posed is " How many would we have to ask, to have a 50:50 chance of finding a run of 67 where 43 agree ? "

Doing discrete surveys, for a 50% chance of finding one survey with >= 43 you need to take, on average, 51 surveys ((1 - 0.01356)^51 ≈ 0.5). This requires asking 51 × 67 = 3417 folk.

If you ask N, then you have (N-66) discrete (but overlapping) surveys. If you assume that these are independent then you would be tempted to say that to get the 51 surveys you only need to ask 51+66 = 117 folk, and there would be a 50% chance of a run of 67 answers containing 43 positives. This is wrong as these surveys are far from independent.

I think this is a hard problem, related to the probability of finding runs of heads in a sequence of coin tosses. I suspect that searching for any particular pattern is similar to searching for a pattern of all heads. The number of different sequences that give 43 heads in 67 tosses is 67C43 = 9.73e17, so I looked for a sequence length where the probability of a run of 67 heads was 1/(2 × 9.73e17) (the 2 for a 50% probability), and using http://www.gregegan.net/QUARANTINE/Runs/Runs.html suggested around 700 is the right number. So the suggestion is that if you ask 700 people, there is a 50% chance that somewhere in their answers you will find a run of 67 with 43 positive answers.

This is clearly approximate, but interesting, so I ran a quick simulation, and found that in 10,000 trials, each with 700 virtual coin flips, 48.7% of trials gave at least one run of 67 containing 43 or more heads.

So the number is around 700.

My analysis is clearly not exact. In looking for >= 43 heads, I should have said that the number of combinations is 67C43 + 67C44 + ..., but the simulation suggests that the answer is around 700. At that point I ran out of time ...
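For anyone who wants to reproduce that simulation, here is a minimal Python sketch (the function names are mine, not from the post): each trial flips 700 virtual coins and checks every 67-flip window for 43 or more heads, using a rolling sum so each window update is O(1).

```python
import random

def has_qualifying_window(flips, window=67, threshold=43):
    """True if any `window` consecutive flips contain >= `threshold` heads."""
    s = sum(flips[:window])
    if s >= threshold:
        return True
    for i in range(window, len(flips)):
        # Slide the window: add the newest flip, drop the oldest.
        s += flips[i] - flips[i - window]
        if s >= threshold:
            return True
    return False

def estimate(n_people, trials=2000, seed=1):
    """Fraction of trials of `n_people` fair flips containing a qualifying window."""
    rng = random.Random(seed)
    hits = sum(
        has_qualifying_window([rng.randint(0, 1) for _ in range(n_people)])
        for _ in range(trials)
    )
    return hits / trials
```

With `n_people = 700` the estimate comes out close to 50%, consistent with the 48.7% reported above.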
 

Thread Starter

Deleted member 115935

Joined Dec 31, 1969
0
@Tesla23

Fantastic

Thank you so much,


My "simulations" using Excel were coming up with a number nearer 1000 people,
but as you say, I ran out of time, and as we know, statistics are notoriously hard to simulate unless a "significant" sample is taken, which my sample was not.
 

wayneh

Joined Sep 9, 2010
17,498
I think this is a hard problem, related to the probability of finding runs of heads in a sequence of coin tosses.
Quite right.

In the business world, the behavior of time-series data is of prime interest. (Double pun not intended!) Once a series of, say, stock prices is de-trended (to remove the growth bias) and the significant events (earnings announcements, news articles, etc.) accounted for, what remains is called a random walk. The question at hand is akin to asking the odds of finding, in a random walk, a 67 day period with 43 up and 24 down. I can easily believe you'd need to wait a couple years (~700 days) before seeing it happen.

I'm absolutely certain that the answer is well known to people in the field, but not me.
 

Tesla23

Joined May 10, 2009
542
I thought of the random walk approach, but the problem was I couldn't find any results on the incremental displacement after a given number of steps (67 here).

Another approach that came to mind is to low-pass filter a binary data source. If you take bits of binary data (0/1) and LPF them with a sliding window that sums the last 67 bits, we're interested in how often the output equals or exceeds 43. There is work in stochastic processes on the statistics of the peaks of filtered random data, but I'm not very familiar with the area. Thinking about this clearly shows that the 'surveys' you get by sliding the window by 1 are not independent of the previous ones. The LPF effect only allows the output to change slowly, whereas the outputs of truly independent surveys would vary much more dramatically from survey to survey.
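That dependence is easy to see numerically. In this sketch (mine, not from the post), the sliding 67-bit sum drops one old bit and adds one new bit per step, so consecutive outputs can differ by at most 1, whereas counts from truly independent 67-person surveys would scatter with an SD of about sqrt(67 · 0.25) ≈ 4.1.

```python
import random

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
window = 67

# Sliding-window sum: each step drops the oldest bit and adds the newest,
# so consecutive outputs differ by -1, 0, or +1 -- the low-pass effect.
sums = [sum(bits[:window])]
for i in range(window, len(bits)):
    sums.append(sums[-1] + bits[i] - bits[i - window])

max_step = max(abs(b - a) for a, b in zip(sums, sums[1:]))  # at most 1
```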
 

MrSalts

Joined Apr 2, 2020
2,767
Thanks for the Wolfram link; my model had an error.
I did the complete calculation, and some rounding was required because the numbers got quite large (obviously), but:

the number of possible 67-bit binary values is
-> approximately (to the nearest million) 147,573,952,589,676,000,000

the number of those combinations with 43 or more high bits is (to the nearest 10,000):
2,001,201,236,565,680,000

To yield, as Wolfram calculated, 1.35599962015%
Or, if you wanted exactly 43 of 67, it is
972,963,730,453,315,000 of the possible combinations
Or 0.6593059%
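These counts can be reproduced exactly with Python's arbitrary-precision integers, with no rounding needed (a sketch of mine, not the poster's actual calculation):

```python
from math import comb

total = 2 ** 67                                        # all 67-bit outcomes
at_least_43 = sum(comb(67, k) for k in range(43, 68))  # 43..67 high bits
exactly_43 = comb(67, 43)

pct_at_least = 100 * at_least_43 / total               # ~1.3560%
pct_exactly = 100 * exactly_43 / total                 # ~0.6593%
```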
 

Thread Starter

Deleted member 115935

Joined Dec 31, 1969
0
Thank you guys,
To me that is amazing.
Bottom line, as I suspected: it's quite likely that the survey company only had to sample a few hundred people to find a run of 67 in which 43 agreed...
 

MrSalts

Joined Apr 2, 2020
2,767
It was an interesting question... or maybe not an interesting question, but the work needed to get to a solution was interesting.
 

wayneh

Joined Sep 9, 2010
17,498
... it's quite likely that the survey company only had to sample a few hundred people to find a run of 67 in which 43 agreed...
It would be amazing to me if that was actually what happened. If you're going to cheat, you may as well save a ton of money and just make something up. No need to pay for expensive testing.

My guess is that prior testing showed they had an edge on shininess. Perhaps they formulated the product to achieve this. So they gave the green light to expensive testing to "prove" it. They continued adding n subjects until the results showed statistical significance. If I was in charge of the marketing, I would have demanded that level of confidence before spending advertising dollars on shininess. Advertising something you don't deliver is pissing in the wind.
 

Thread Starter

Deleted member 115935

Joined Dec 31, 1969
0
My guess is that they don't cheat, when they can quite correctly just keep sampling till they find the answer they want.
 

402DF855

Joined Feb 9, 2013
271
Yeah, that's called cheating.
Gamesmanship? Gaming the system? Cheating, lying, and stealing are in fact successful survival strategies, now, as throughout history. Calculating basic probabilities may be complicated, but realizing that you should trust absolutely no one is a simple conclusion to reach.
 