I am using the least squares method to fit a linear equation to my data, but I am running into the problems below.
Does anyone have any ideas about this?
Thank you.
I'd like to do it in C++. I don't have any experience with this, and this method seems like it would take a long time. Is that okay?

Excel will do a regression through a set of points, and it is easy to go back and delete some of the points; the regression is then adjusted with those points removed.
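In C++, the equivalent of deleting points and letting Excel redo the regression is an ordinary least-squares fit that takes a keep/skip flag per point. A minimal sketch, assuming the points arrive as two parallel `std::vector<double>`s; `Line` and `fitLine` are hypothetical names, not a library API. Each fit is a single pass over the data (two passes here, because the sums are centred on the means for better numerical accuracy), so refitting after dropping points is cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of a least-squares fit y = slope * x + intercept.
struct Line {
    double slope;
    double intercept;
};

// Fits a straight line to the points whose flag in `use` is true.
// Points with use[i] == false are ignored, which mimics deleting them
// before re-running the regression.
Line fitLine(const std::vector<double>& x, const std::vector<double>& y,
             const std::vector<bool>& use) {
    assert(x.size() == y.size() && x.size() == use.size());
    double n = 0, sumX = 0, sumY = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!use[i]) continue;
        n += 1;
        sumX += x[i];
        sumY += y[i];
    }
    assert(n >= 2);  // a line needs at least two points
    const double meanX = sumX / n;
    const double meanY = sumY / n;
    // Centred sums of squares and cross-products.
    double sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!use[i]) continue;
        const double dx = x[i] - meanX;
        sxx += dx * dx;
        sxy += dx * (y[i] - meanY);
    }
    assert(sxx > 0.0);  // the used x values must not all be equal
    const double slope = sxy / sxx;
    return {slope, meanY - slope * meanX};
}
```

For example, with x = {0, 1, 2, 3, 4} and y = {1, 3, 5, 7, 20}, fitting all five points gives a slope of 4.2, while skipping the last point gives exactly y = 2x + 1.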
This appears simple, but I have heard the method has lots of caveats. Please tell me if you figure out what these caveats are.

As you compute the squared deviations, you apply a threshold test and exclude all points whose squared deviation exceeds a certain value.
For the first dataset: 1.3, 1.3, 1.6, 2, 1.8, 7.0, 0.9, 2.1, 5.1

If your deviations are 1.3, 1.4, 1.6, 2, 1.5, 7.0, 0.9, 2.1, 5.1, then yes, the magnitude of the deviations gives the outliers away, and a simple "exclude any deviation greater than X" test will work.
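The "exclude any deviation greater than X" test is a single pass over the deviations. A sketch; `keepWithinThreshold` is a hypothetical helper, and the value X = 3 used in the example is an arbitrary choice for the sample deviations, not a value given in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of the points whose deviation is at most maxDeviation.
// Every point with a larger deviation is treated as an outlier and dropped.
std::vector<std::size_t> keepWithinThreshold(const std::vector<double>& deviations,
                                             double maxDeviation) {
    assert(maxDeviation >= 0.0);
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < deviations.size(); ++i) {
        if (deviations[i] <= maxDeviation) kept.push_back(i);
    }
    return kept;
}
```

With the deviations 1.3, 1.4, 1.6, 2, 1.5, 7.0, 0.9, 2.1, 5.1 and X = 3, every point is kept except the ones with deviations 7.0 and 5.1.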
But if your deviations are 1.3, 1.4, 1.6, 2, 1.5, 0.9, i.e. close together, you need a different test.
Should you exclude 2 in the example?
Well, take the average of the deviations.
Then set a limit on how far a deviation may lie above that average, and exclude anything beyond it.
So in the example the average deviation is about 1.45, and if we set the exclusion limit at 0.5, only the deviation of 2 exceeds 1.45 + 0.5 = 1.95, so we would go back and recalculate excluding that point.
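The average-based test can be sketched as follows, assuming (as in the example) that only deviations lying more than the limit *above* the average are excluded; a point that sits closer to the line than average is never treated as an outlier. `keepNearMean` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the points to keep: a point is dropped when its
// deviation exceeds the average deviation by more than `limit`.
std::vector<std::size_t> keepNearMean(const std::vector<double>& deviations,
                                      double limit) {
    assert(!deviations.empty() && limit >= 0.0);
    const double mean =
        std::accumulate(deviations.begin(), deviations.end(), 0.0) /
        static_cast<double>(deviations.size());
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < deviations.size(); ++i) {
        // One-sided test: only unusually large deviations are rejected.
        if (deviations[i] - mean <= limit) kept.push_back(i);
    }
    return kept;
}
```

With the six deviations 1.3, 1.4, 1.6, 2, 1.5, 0.9 and a limit of 0.5, the mean is 1.45 and only the point with deviation 2 is dropped.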
The theory behind this is that, if the data are unbiased, the deviations themselves should be normally distributed, so you can use the confidence limits of that distribution to set the exclusion limits.
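Under that normal model, one common way to turn a confidence level into an exclusion limit is sketched below. This is an assumption, not something spelled out in the thread: estimate the spread of the residuals r_i from the fitted line y = m x + c, then exclude points whose residual exceeds z standard errors, where z = 1.96 gives about 95% and z ≈ 2.576 about 99% two-sided coverage. The n − 2 reflects the two fitted parameters, m and c.

```latex
r_i = y_i - (m x_i + c), \qquad
s = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} r_i^{2}}, \qquad
\text{exclude point } i \text{ if } |r_i| > z\,s
```

Choosing the confidence level fixes z, which gives one systematic answer to how large the exclusion limit should be.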
But first we don't know the line; how can I calculate the distances and reject points?

The main caveat is execution speed (compute the line, check whether each point is too far from the line, reject the points that are too far, then repeat the whole thing until all remaining points are close enough).
http://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line
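The linked formula gives the perpendicular distance from a point to a line. Specialised to a fitted line y = m x + c, it is shown below next to the vertical distance, which is what ordinary least squares minimises; using the vertical form keeps the rejection test consistent with the fit. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Perpendicular distance from (x0, y0) to the line y = m*x + c.
// The line is rewritten as m*x - y + c = 0 and the general formula
// |a*x0 + b*y0 + c| / sqrt(a^2 + b^2) is applied with a = m, b = -1.
double perpendicularDistance(double m, double c, double x0, double y0) {
    return std::fabs(m * x0 - y0 + c) / std::sqrt(m * m + 1.0);
}

// Vertical distance: how far y0 is from the line's value at the same x.
double verticalDistance(double m, double c, double x0, double y0) {
    return std::fabs(y0 - (m * x0 + c));
}
```

For the line y = x and the point (0, 1), the perpendicular distance is 1/√2 ≈ 0.707, while the vertical distance is 1.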
It is an iterative process:
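1. You compute the line.
2. You compute the squared deviations.
3. You find the mean and the variance of the squared deviations.
4. You eliminate the points whose squared deviation is too large.
5. Go to step 1, and repeat until no more points are eliminated.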
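The loop above can be sketched in C++ as follows. This is a sketch under stated assumptions: deviations are measured vertically (in y), "eliminate some points" is read as dropping any point whose squared deviation exceeds the mean squared deviation by more than k standard deviations, and k = 2, the three-point minimum and the iteration cap are arbitrary illustrative values. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Line {
    double slope;
    double intercept;
};

// Least-squares fit of y = slope * x + intercept over points with keep[i] == true.
Line fitLine(const std::vector<double>& x, const std::vector<double>& y,
             const std::vector<bool>& keep) {
    double n = 0, sumX = 0, sumY = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (keep[i]) { n += 1; sumX += x[i]; sumY += y[i]; }
    }
    assert(n >= 2);
    const double meanX = sumX / n, meanY = sumY / n;
    double sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!keep[i]) continue;
        const double dx = x[i] - meanX;
        sxx += dx * dx;
        sxy += dx * (y[i] - meanY);
    }
    assert(sxx > 0.0);  // the kept x values must not all be equal
    const double slope = sxy / sxx;
    return {slope, meanY - slope * meanX};
}

// Iterative outlier rejection. On return, keep[i] tells whether point i survived.
Line fitRejectingOutliers(const std::vector<double>& x, const std::vector<double>& y,
                          std::vector<bool>& keep, double k = 2.0,
                          std::size_t minPoints = 3, int maxIterations = 50) {
    assert(x.size() == y.size() && x.size() >= minPoints && minPoints >= 2);
    keep.assign(x.size(), true);
    Line line = fitLine(x, y, keep);  // step 1: compute the line
    for (int iter = 0; iter < maxIterations; ++iter) {
        // Step 2: squared vertical deviations of the points still kept.
        std::vector<double> sq(x.size(), 0.0);
        double n = 0, sum = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (!keep[i]) continue;
            const double r = y[i] - (line.slope * x[i] + line.intercept);
            sq[i] = r * r;
            sum += sq[i];
            n += 1;
        }
        // Step 3: mean and (population) variance of the squared deviations.
        const double mean = sum / n;
        double var = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (keep[i]) var += (sq[i] - mean) * (sq[i] - mean);
        }
        const double limit = mean + k * std::sqrt(var / n);
        // Step 4: collect the points beyond the limit.
        std::vector<std::size_t> reject;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (keep[i] && sq[i] > limit) reject.push_back(i);
        }
        // Stop when nothing is rejected or too few points would remain.
        if (reject.empty() ||
            static_cast<std::size_t>(n) - reject.size() < minPoints) break;
        for (std::size_t i : reject) keep[i] = false;
        line = fitLine(x, y, keep);  // step 5: back to step 1
    }
    return line;
}
```

Each pass costs one fit plus a few linear scans, so the running time grows mainly with the number of passes. One caveat: when the remaining deviations are nearly identical (for example, points lying exactly on a line), the k-sigma test can keep flagging good points on rounding noise. The `minPoints` guard and the iteration cap bound that behaviour.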
Yes, I did hint at it.

Question:
Why did you choose a limit of 0.5?
Is there a general way for choosing this value?
Since y is a function of x, you are trying to find the equation of a line that represents that relationship. Once you have such a line, you compute only the squared deviations of the y coordinate. THE X COORDINATES DON'T HAVE ANY DEVIATIONS. They are the same for the data points and the line. Do you realize how silly the question was?

Assume that the dataset consists of points with coordinates (x, y).
1. Compute the line: OK, using the least squares method.
2. Compute the squared deviations: how? With x, y, or both?
3. Find the mean and the variance of the squared deviations: with x, y, or both?
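To make questions 2 and 3 concrete for (x, y) data: given a fitted slope m and intercept c, evaluate the line at each point's own x and compare only the y values. A minimal sketch; `DeviationStats` and `squaredYDeviations` are hypothetical names, and using the population variance (dividing by n) is one of two common conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared y-deviations of each point from a line, with their mean and variance.
struct DeviationStats {
    std::vector<double> squared;  // (y_i - (m*x_i + c))^2 for each point
    double mean;
    double variance;              // population variance of `squared`
};

// The line is evaluated at each point's own x, so only y can deviate.
DeviationStats squaredYDeviations(const std::vector<double>& x,
                                  const std::vector<double>& y,
                                  double m, double c) {
    assert(x.size() == y.size() && !x.empty());
    DeviationStats s{{}, 0.0, 0.0};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double r = y[i] - (m * x[i] + c);
        s.squared.push_back(r * r);
        s.mean += r * r;
    }
    const double n = static_cast<double>(x.size());
    s.mean /= n;
    for (double d : s.squared) s.variance += (d - s.mean) * (d - s.mean);
    s.variance /= n;
    return s;
}
```

For the line y = 2x + 1 and the points (0, 1), (1, 4), (2, 5), (3, 5), the squared deviations are 0, 1, 0 and 4, their mean is 1.25 and their variance is 2.6875.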
There is a considerable body of statistical theory available for handling data.

But there is no statistical justification for doing so. The data are the data.
Introducing confidence limits implies that you have a sound statistical model of the distribution of the deviations.

If points are more than 2 or 3 standard deviations away from whatever model fits the other data, it becomes tempting to omit them.
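The "2 or 3 standard deviations" rule, applied to residuals y_i − (m x_i + c), can be sketched as follows. The function name is hypothetical, and using the sample standard deviation (dividing by n − 1) is an assumption. One caveat: the outlier inflates the very standard deviation it is judged against. With n points and this sample standard deviation, no residual can lie more than (n − 1)/√n standard deviations from the mean, so with 10 or fewer points a 3-sigma test can never flag anything.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Flags residuals lying more than k sample standard deviations from the mean residual.
std::vector<bool> beyondKSigma(const std::vector<double>& residuals, double k) {
    assert(residuals.size() >= 2 && k > 0.0);
    const double n = static_cast<double>(residuals.size());
    double mean = 0;
    for (double r : residuals) mean += r;
    mean /= n;
    double ss = 0;
    for (double r : residuals) ss += (r - mean) * (r - mean);
    const double sd = std::sqrt(ss / (n - 1));  // sample standard deviation
    std::vector<bool> flagged;
    for (double r : residuals) flagged.push_back(std::fabs(r - mean) > k * sd);
    return flagged;
}
```

With ten small residuals between −0.2 and 0.2 plus one residual of 5.0, k = 2 flags only the 5.0.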