I'd like to do it using C++. I don't have any experience with this, but this method seems to take a long time. Is it okay?

Excel will do a regression through a set of points, and it is easy to go back and delete some of the points; the regression is then adjusted with those points deleted.
For the first dataset: 1.3, 1.3, 1.6, 2, 1.8, 7.0, 0.9, 2.1, 5.1

If your deviations are 1.3, 1.4, 1.6, 2, 1.5, 7.0, 0.9, 2.1, 5.1,
then yes, the magnitude of the deviations gives the outliers away, and a simple rule that excludes any deviation greater than X will work.
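The "exclude a deviation greater than X" rule is a one-line filter. A minimal sketch, where `excludeAbove` is a hypothetical name and `limit` stands for the X above:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Keep the deviations whose magnitude is at most `limit` (the "X" in the text).
// std::abs is used so the rule also works if the residuals are signed.
std::vector<double> excludeAbove(const std::vector<double>& deviations, double limit) {
    assert(limit >= 0.0);
    std::vector<double> kept;
    for (double d : deviations) {
        if (std::abs(d) <= limit) {
            kept.push_back(d);
        }
    }
    return kept;
}
```

With the deviations 1.3, 1.4, 1.6, 2, 1.5, 7.0, 0.9, 2.1, 5.1 and a hypothetical limit of 3.0, this keeps seven values and drops 7.0 and 5.1.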
But if your deviations are 1.3, 1.4, 1.6, 2, 1.5, 0.9, i.e. close together, you need a different test.
Should you exclude 2 in the example?
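Well, take the average of the deviations.
Then set a limit on the deviation from that average and exclude anything beyond it.
So in the example the average deviation is (1.3 + 1.4 + 1.6 + 2 + 1.5 + 0.9) / 6 = 1.45. The value 2 lies 0.55 above it, so if we set the exclusion limit at 0.5, we would go back and recalculate excluding 2. (Applied in both directions, the same limit would also catch 0.9, which lies 0.55 below the average.)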
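The average-and-limit rule can be sketched as follows; `flagFarFromAverage` is a hypothetical name, and `limit` corresponds to the 0.5 of the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices of the deviations that lie more than `limit` away from
// the average deviation, checking in both directions.
std::vector<std::size_t> flagFarFromAverage(const std::vector<double>& dev, double limit) {
    assert(!dev.empty());
    const double mean = std::accumulate(dev.begin(), dev.end(), 0.0) / dev.size();
    std::vector<std::size_t> flagged;
    for (std::size_t i = 0; i < dev.size(); ++i) {
        if (std::fabs(dev[i] - mean) > limit) {
            flagged.push_back(i);
        }
    }
    return flagged;
}
```

On 1.3, 1.4, 1.6, 2, 1.5, 0.9 with a limit of 0.5, it flags index 3 (the value 2) and index 5 (the value 0.9), each 0.55 from the average of 1.45. If only unusually large deviations should count, compare `dev[i] - mean > limit` without the absolute value.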
The justification is that, if the data are unbiased, the deviations themselves should be normally distributed, so you can use the confidence limits of that distribution to set the exclusion limits.
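Under that normality assumption, one common way to turn "confidence limits" into numbers is mean ± k·s of the deviations, where s is the sample standard deviation; for a normal distribution about 95% of values fall within k = 1.96 and about 99.7% within k = 3. The helper below is a hypothetical sketch of that calculation, and the choice of k is left to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Limits { double lower; double upper; };

// Exclusion limits mean +/- k * s, with s the sample standard deviation
// (divide by n - 1). Only meaningful if the deviations are roughly normal.
Limits normalLimits(const std::vector<double>& dev, double k) {
    assert(dev.size() >= 2);
    double mean = 0.0;
    for (double d : dev) mean += d;
    mean /= dev.size();
    double ss = 0.0;
    for (double d : dev) ss += (d - mean) * (d - mean);
    const double s = std::sqrt(ss / (dev.size() - 1));
    return {mean - k * s, mean + k * s};
}
```

On the list 1.3, 1.4, 1.6, 2, 1.5, 7.0, 0.9, 2.1, 5.1 with k = 2, the upper limit comes out at roughly 6.7, so 7.0 is flagged but 5.1 is not: the outliers inflate s themselves, which is one reason to repeat the procedure after each removal.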
But at first we don't know the line, so how can I calculate distances and reject points?

The main caveat is execution speed (compute the line, check whether each point is too far from it, reject the points that are too far, then repeat the whole thing until all remaining points are close enough).
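In an iterative scheme like the one described, the first pass simply fits the line to all the points, since nothing has been rejected yet. On speed: a straight-line least-squares fit only needs five running sums (n, Σx, Σy, Σx², Σxy), so each pass is O(n), and a rejected point can be removed by subtracting its terms instead of refitting from scratch. `LineSums` below is a hypothetical helper sketching that idea, not code from the thread.

```cpp
#include <cassert>
#include <cstddef>

// Running sums for a least-squares line y = a + b*x.
// Adding or removing a point is O(1), so rejecting a point does not
// require another pass over the whole dataset.
struct LineSums {
    std::size_t n = 0;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;

    void add(double x, double y)    { ++n; sx += x; sy += y; sxx += x * x; sxy += x * y; }
    void remove(double x, double y) { assert(n > 0); --n; sx -= x; sy -= y; sxx -= x * x; sxy -= x * y; }

    // Slope and intercept from the normal equations; false if the fit is undefined.
    bool fit(double& a, double& b) const {
        const double denom = n * sxx - sx * sx;
        if (n < 2 || denom == 0.0) return false;  // need at least two distinct x values
        b = (n * sxy - sx * sy) / denom;
        a = (sy - b * sx) / n;
        return true;
    }
};
```

Subtracting terms accumulates rounding error, and raw sums lose precision when the x values are large relative to their spread; centering x or refitting from scratch every so often are common remedies.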
Assume that the dataset here consists of points with coordinates (x, y).

It is an iterative process:
1. You compute the line
2. You compute the squared deviations
3. You find the mean and the variance of the squared deviations
4. You eliminate the points that are too far out
5. Go back to step 1
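The steps above can be sketched as one loop. Assumptions not stated in the thread: step 4 drops points whose squared deviation exceeds mean + k × standard deviation (k is a hypothetical tuning parameter), the variance uses the 1/n form, the x values are not all equal, and the loop stops rather than fit to fewer than three points. `Point`, `fitLine`, and `rejectOutliers` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Ordinary least-squares line y = a + b*x (assumes the x values are not all equal).
static void fitLine(const std::vector<Point>& p, double& a, double& b) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const Point& q : p) { sx += q.x; sy += q.y; sxx += q.x * q.x; sxy += q.x * q.y; }
    const double n = static_cast<double>(p.size());
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    a = (sy - b * sx) / n;
}

// Repeat: fit, square the y-deviations, take their mean and variance,
// drop points above mean + k * stddev, until no point is dropped.
std::vector<Point> rejectOutliers(std::vector<Point> p, double k, double& a, double& b) {
    assert(p.size() >= 3);
    while (true) {
        fitLine(p, a, b);                                   // step 1
        std::vector<double> sq;
        for (const Point& q : p) {
            const double r = q.y - (a + b * q.x);           // vertical deviation only
            sq.push_back(r * r);                            // step 2
        }
        double mean = 0;
        for (double s : sq) mean += s;
        mean /= sq.size();
        double var = 0;
        for (double s : sq) var += (s - mean) * (s - mean);
        var /= sq.size();                                   // step 3 (1/n variance)
        const double cutoff = mean + k * std::sqrt(var);
        std::vector<Point> kept;
        for (std::size_t i = 0; i < p.size(); ++i)
            if (sq[i] <= cutoff) kept.push_back(p[i]);      // step 4
        if (kept.size() == p.size() || kept.size() < 3)
            return p;                                       // converged, or too few would remain
        p = kept;                                           // step 5: back to step 1
    }
}
```

For example, ten points close to y = 2x + 1 plus one stray point at (4.5, 40) with k = 2: the first pass drops the stray point, and the second pass drops nothing.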
Since y is a function of x, you are trying to find the equation of a line that represents that relationship. Once you have such a line, you compute only the squared deviations of the y coordinates. THE X COORDINATES DON'T HAVE ANY DEVIATIONS. They are the same for the data points and the line. Do you realize how silly the question was?

Assume that the dataset here consists of points with coordinates (x, y).
1. Compute the line: OK, by using the least-squares method.
2. Compute the squared deviations: how? With x, y, or both?
3. Find the mean and the variance of the squared deviations: with x, y, or both?
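Written out for a line y = a + b x with y-only deviations, the three steps come to the following, where n is the number of points still in the set. The 1/n form of the variance is one choice; 1/(n - 1) is equally common.

```latex
\begin{aligned}
b &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad a = \frac{\sum y_i - b\sum x_i}{n} \\
d_i &= \bigl(y_i - (a + b x_i)\bigr)^2 \\
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad s_d^2 = \frac{1}{n}\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2
\end{aligned}
```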
There is a considerable body of statistical theory available for handling data.

But there is no statistical justification for doing so. The data are the data.
Introducing confidence limits implies that you have a sound statistical model of the distribution of the deviations.

If points are more than 2 or 3 standard deviations away from whatever model fits the other data, it becomes tempting to omit them.
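One way to make "more than 2 or 3 standard deviations away from whatever model fits the other data" concrete is a leave-one-out check: fit the line without point i, then measure how far point i lies from that line in units of the other points' residual standard deviation. This is a simplified sketch, not a formal test: it ignores the uncertainty of the fitted line itself, so it reads somewhat higher than a proper externally studentized residual, and it divides by zero if the other points lie exactly on a line. `fitWithout` and `leaveOneOutScore` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares line y = a + b*x through all points except index `skip`.
static void fitWithout(const std::vector<double>& x, const std::vector<double>& y,
                       std::size_t skip, double& a, double& b) {
    double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (i == skip) continue;
        n += 1; sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    a = (sy - b * sx) / n;
}

// Distance of point i from the line fitted to the other points, in units of
// the other points' residual standard deviation ((n - 1) points, 2 parameters).
double leaveOneOutScore(const std::vector<double>& x, const std::vector<double>& y, std::size_t i) {
    assert(x.size() == y.size() && x.size() >= 4 && i < x.size());
    double a, b;
    fitWithout(x, y, i, a, b);
    double ss = 0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        if (j == i) continue;
        const double r = y[j] - (a + b * x[j]);
        ss += r * r;
    }
    const double s = std::sqrt(ss / (x.size() - 3));
    return std::fabs(y[i] - (a + b * x[i])) / s;
}
```

A point whose score exceeds 2 or 3 is the kind the reply describes as tempting to omit; whether to actually omit it is the statistical question argued above, not something the code decides.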