a question about analysis of variance (statistics)


Werapon Pat

Joined Jan 14, 2018
I'm confused about the parameters in the hypotheses. Sometimes they use mu (μ), which represents a mean, and sometimes they use tau (τ), which represents a treatment effect. How do I know which one I should use? What does it depend on?
I have attached a file (right down below) that gives an example and the solution.
For the first problem they use mu, but for the second one they use tau. Why is that?


I would explain it casually like this:

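Both μ (mu) and τ (tau), as I understand your use, are parameters describing the dependent variable you are interested in, that is, what is being measured, like a score, height, weight, etc. Generally, you use μ when you are talking about population means and τ when you are talking about treatment effects. They are two ways of writing the same model: in the usual one-way ANOVA model, each treatment mean is μ_i = μ + τ_i, where μ is the overall mean and τ_i is how far treatment i shifts the mean away from it. So "all treatment means are equal" (H0: μ1 = μ2 = ... = μa) says the same thing as "no treatment has an effect" (H0: τ1 = τ2 = ... = τa = 0). Which one a solution writes depends only on whether it sets up the model in terms of the means or in terms of the effects.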
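To make the μ/τ relationship concrete, here is a minimal sketch in Python. The scores are made-up numbers for three hypothetical lubricant types (n = 8 each), not the data from the attached problem. It computes each treatment mean μ_i, the grand mean μ, and the effects τ_i = μ_i − μ, and shows that with equal group sizes the effects sum to zero, so "all τ_i = 0" holds exactly when "all μ_i are equal".

```python
# Hypothetical scores for three lubricant types (made-up numbers, n = 8 each).
groups = {
    "A": [10, 12, 11, 13, 12, 11, 10, 12],
    "B": [14, 15, 13, 16, 15, 14, 15, 16],
    "C": [11, 12, 12, 13, 11, 12, 13, 12],
}

def mean(xs):
    return sum(xs) / len(xs)

# mu_i: the mean of each treatment group
group_means = {name: mean(ys) for name, ys in groups.items()}
# mu: the grand (overall) mean of all observations
grand_mean = mean([y for ys in groups.values() for y in ys])
# tau_i = mu_i - mu: each treatment's deviation from the grand mean
effects = {name: m - grand_mean for name, m in group_means.items()}

for name in groups:
    print(f"{name}: mu_i = {group_means[name]:.3f}, tau_i = {effects[name]:+.3f}")

# With equal group sizes the effects sum to zero (up to rounding), so the
# means-model and effects-model null hypotheses say the same thing.
print(f"sum of tau_i = {sum(effects.values()):.10f}")
```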

That simple example of lubricants is asking the question, "How likely is it that I got that result if lubricant type does not matter at all?"

ANOVA produces a ratio, F, of two mean squares. The numerator is the mean square for treatments: the variation your model accounts for, which in this case is the variation between the average scores of the lubricant types (your model being that lubricant type has an effect on the score). The denominator is the mean square for error: the variation within each lubricant type that your model does not account for. The two sums of squares behind them add up to the total sum of squares, so together they account for all the variation in the sample, which we use as an estimate of the variation in the population. If lubricant type has no effect, both mean squares estimate the same error variance and F should be near 1. When F = 1 (or < 1), your model is doing no better at explaining the differences in the observed measurements than not using it at all, and the treatment effect will not be statistically significant at any conventional level.

Also keep in mind that a significant overall treatment effect does not tell you which treatment is better than another, only that they are not all equivalent. You would typically (but not always) run further tests, such as pairwise or other multiple comparisons, only after seeing a significant treatment effect, to find out which treatments differ from which.
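The F ratio can be computed from scratch in a few lines. This is a sketch of the standard one-way ANOVA arithmetic in plain Python, again on made-up scores for three hypothetical lubricant types. It checks that the treatment and error sums of squares add up to the total, then divides each by its df to get the mean squares and their ratio.

```python
# Hypothetical scores for three lubricant types (made-up numbers, n = 8 each).
groups = [
    [10, 12, 11, 13, 12, 11, 10, 12],  # lubricant A
    [14, 15, 13, 16, 15, 14, 15, 16],  # lubricant B
    [11, 12, 12, 13, 11, 12, 13, 12],  # lubricant C
]

def mean(xs):
    return sum(xs) / len(xs)

all_ys = [y for ys in groups for y in ys]
grand = mean(all_ys)
a, N = len(groups), len(all_ys)

# Between-treatment variation: what the model (lubricant type) accounts for
ss_treat = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups)
# Within-treatment variation: what the model leaves unexplained
ss_error = sum((y - mean(ys)) ** 2 for ys in groups for y in ys)
# Total variation of all scores around the grand mean
ss_total = sum((y - grand) ** 2 for y in all_ys)

df_treat, df_error = a - 1, N - a
ms_treat = ss_treat / df_treat   # mean square for treatments
ms_error = ss_error / df_error   # mean square for error
F = ms_treat / ms_error

print(f"SS_treat + SS_error = {ss_treat + ss_error:.4f}, SS_total = {ss_total:.4f}")
print(f"F({df_treat}, {df_error}) = {F:.3f}")
```

The first printed line shows the partition of the total variation; the second gives F with its two df.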

When your F is large, it may be statistically significant. That means not only that your model accounts for differences in the "scores", but that those differences are larger than you would expect to see by chance alone. "Not by chance alone" means the probability of getting an F at least that large, if lubricant type did not matter, is less than some threshold that by convention is .05, .025, .01, or smaller.
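The usual F-test reads that probability (the p-value) off the F distribution with (a − 1, N − a) df, for example with SciPy's `scipy.stats.f.sf`. To keep this sketch dependency-free it swaps in a different technique, a permutation test, which asks the "does lubricant type matter at all?" question directly: if the labels were meaningless, how often would shuffling them give an F at least as large as the observed one? The data are made up, and `permutation_p_value` is a hypothetical helper written for this illustration.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F = MS_treatment / MS_error."""
    all_ys = [y for ys in groups for y in ys]
    grand = sum(all_ys) / len(all_ys)
    means = [sum(ys) / len(ys) for ys in groups]
    a, N = len(groups), len(all_ys)
    ss_treat = sum(len(ys) * (m - grand) ** 2 for ys, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    return (ss_treat / (a - 1)) / (ss_error / (N - a))

def permutation_p_value(groups, n_perm=10_000, seed=0):
    """Estimate P(F >= observed F) when treatment labels are shuffled at random."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [y for ys in groups for y in ys]
    sizes = [len(ys) for ys in groups]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Split the shuffled pool back into groups of the original sizes
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if f_statistic(shuffled) >= observed:
            count += 1
    # The +1 counts the observed labelling itself, so p is never exactly 0
    return (count + 1) / (n_perm + 1)

# Hypothetical scores for three lubricant types (made-up numbers, n = 8 each).
groups = [
    [10, 12, 11, 13, 12, 11, 10, 12],
    [14, 15, 13, 16, 15, 14, 15, 16],
    [11, 12, 12, 13, 11, 12, 13, 12],
]
alpha = 0.05  # conventional significance threshold
p = permutation_p_value(groups)
print(f"F = {f_statistic(groups):.3f}, p ~ {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The decision rule is the same as with the F table: reject "no treatment effect" when p falls below the chosen threshold.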

You can always tell the two mean squares apart by looking at their degrees of freedom (df). Their df must always add up to the total df, and that will always be the number of measurements − 1. The −1 is there because, once the overall mean of your measurements is fixed, every observed score but one is free to take any value; the last one is forced by the mean. That is the price you pay for estimating that parameter.
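The "all but one score is free" argument can be checked with a tiny sketch. `forced_last_value` is a hypothetical helper for this illustration: pick any n − 1 scores and a target mean, and the last score has no freedom left.

```python
def forced_last_value(free_values, target_mean):
    """Hypothetical helper: given n - 1 freely chosen scores and a fixed mean
    for all n scores, return the one value the last score must take."""
    n = len(free_values) + 1
    return n * target_mean - sum(free_values)

free = [7, 12, 9]                      # any three scores we like
last = forced_last_value(free, 10)     # the fourth score is then determined
scores = free + [last]
print(last, sum(scores) / len(scores))  # 12 10.0
```

Changing any of the three free scores just changes the forced fourth one, so 4 scores with a known mean carry only 3 df.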

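For treatments in your lubricant example, you will see the df as the number of treatments − 1, a – 1 = 2, for the same reason: the treatment means vary around the overall mean, so once that mean is fixed, one of them is no longer free. (I assume the treatments are the kinds of lubricant; I didn't read the English text carefully.)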

For the error in your example, you see a(n – 1) = 3 × 7 = 21, which is the same as the number of observations (24) minus the number of treatments (3). It is the df left over after the model has estimated a separate mean for each lubricant type: within each group of n = 8 scores, one df is used up by that group's mean, leaving 7 per group.

Together they sum to 2 + 21 = an – 1 = 23, so from the df alone I know that you have 24 sample measurements (just as you see in the table).
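The df bookkeeping for a balanced one-way design fits in one small function. This sketch assumes equal group sizes (a treatments, n observations each), which matches the 3 × 8 = 24 layout described above; `one_way_anova_df` is a hypothetical name.

```python
def one_way_anova_df(a, n):
    """Degrees of freedom for a balanced one-way ANOVA with a treatments
    and n observations per treatment."""
    df_treat = a - 1          # treatment means around the grand mean
    df_error = a * (n - 1)    # scores around their own group mean; equals a*n - a
    df_total = a * n - 1      # all scores around the grand mean
    assert df_treat + df_error == df_total
    return df_treat, df_error, df_total

print(one_way_anova_df(3, 8))  # (2, 21, 23)

# Going the other way: the total df alone gives the sample size.
df_total = 23
print(df_total + 1)            # 24
```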

Confused even more? :) Don't worry, least squares estimation and inferential statistics can be complicated, but they are beautiful techniques, even to the mathematically challenged, like myself.