6th order polynomial expression

MrAl

Joined Jun 17, 2014
13,704
I have a set of data described by Y=A+BX+CX^2+DX^3+EX^4+FX^5+GX^6.
I do a 24-point calibration and then a polynomial curve fit. X is a voltage, A through G are constants, and Y is the resulting measurement (in engineering units). Let's say the 24 sample points are [0.08,0.16,0.24,0.32,0.40,0.48,0.56,0.66,0.76,0.86,0.96,1.6,2.6,3.6,4.8,6.0,7.2,8.4,9.6,10.8,12.0,13.2,14.4,16.6]. Now let's say the X voltage is 0.065 and I use that polynomial to get the result. Will that give a wrong result, since the first of the 24 sample points is 0.08 and 0.065 falls below the range that was sampled? I would appreciate it if anyone can help with this. Thank you.
Hello there,

Curve fitting is almost an art form but there are mathematical techniques that can get you some good results, and since you appear to have a LOT of data points you can even get some decent out-of-range results as long as they are not too far out of range.
I'll present a few different methods here and point out some problems.

The first problem I see is it looks like you have no data point for when x=0 or close to that. You can't always use that point but when you can it helps to predict values between x=0 and when x is equal to your first positive value (0.08 here). If you don't have the data for when x=0, then make one up based on the other data points near that, or else use a method that can predict that data better than a simple polynomial fit.

The second problem I see is that you seem to be using a simple polynomial method for the fit. That's probably the worst fit you can use, because you can never predict ahead of time what the function will do in between the logged data points. For example, say you have points (1,1), (2,2), and (3,3). It would make sense to assume that the polynomial you come up with will give y=2.5 for x=2.5, yet a polynomial with more terms than the data can pin down could give just about anything there, and when I say anything, I mean anything. If you get lucky it may come out to (2.5,2.6), but there's no way to know that ahead of time, and it may just as well come out to (2.5,9.2) or (2.5,100), either of which would be way, way off. That's the problem with a simple polynomial fit.
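
To make the (1,1), (2,2), (3,3) example concrete: every cubic of the form y = x + c(x-1)(x-2)(x-3) passes exactly through those three points, whatever c is. The short Python sketch below (the helper name is made up for illustration) shows that the value at x=2.5 depends entirely on c, which the three points say nothing about.

```python
def cubic_through_points(c):
    """Return p(x) = x + c*(x-1)*(x-2)*(x-3).

    The extra term is zero at x = 1, 2 and 3, so p passes through
    (1,1), (2,2) and (3,3) for ANY value of c.
    """
    def p(x):
        return x + c * (x - 1) * (x - 2) * (x - 3)
    return p


# c is a free parameter that the three data points cannot determine.
for c in (0.0, -0.5, -20.0, -260.0):
    p = cubic_through_points(c)
    print(f"c={c:8.1f}  at the data: {[p(1), p(2), p(3)]}  at x=2.5: {p(2.5)}")
```

With c = -260 the curve still hits all three data points exactly, yet it gives y = 100 at x = 2.5.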

The third problem I see is the proposed polynomial is of high order. The problem that can come up when you go over third order (y=a*x^3+b*x^2+c*x+d) is numerical truncation or rounding. That leads to very bad results also, unless you use higher than usual precision in calculating the coefficients, and you may even have to retain that precision for the actual end formula used to generate the y data values once you get the right polynomial.
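
One rough way to see the precision problem is to look at the condition number of the matrix of powers of x used in the fit. This is only a sketch, assuming NumPy is available; the function name is hypothetical, and the x values are the 24 calibration voltages from the question.

```python
import numpy as np

# The 24 calibration voltages listed in the question.
x_cal = np.array([0.08, 0.16, 0.24, 0.32, 0.40, 0.48, 0.56, 0.66,
                  0.76, 0.86, 0.96, 1.6, 2.6, 3.6, 4.8, 6.0,
                  7.2, 8.4, 9.6, 10.8, 12.0, 13.2, 14.4, 16.6])


def design_condition(x, order):
    """Condition number of the matrix with columns 1, x, x^2, ..., x^order."""
    A = np.vander(x, order + 1, increasing=True)
    return np.linalg.cond(A)


for order in (3, 6):
    print(f"order {order}: condition number ~ {design_condition(x_cal, order):.3g}")
```

The larger the condition number, the more significant digits are lost when solving for the coefficients. Forming the least squares normal equations squares this number, so with about 16 digits of double precision to work with, a sixth order fit on raw x values leaves very little margin.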

So what is the cure for all this?

The first is to try to come up with a value for when x=0 or when x is very small like 0.001 or even smaller if needed.

The second is to use a method that is known to create coefficients that make the resulting polynomial better behaved, that is, no extreme y values for in-between x values. The most widely used method is the Least Squares Method, although there are some subtle variations on it: for example, a straight Least Squares fit, or a fit that minimizes the maximum absolute error instead. These kinds of methods give a much smoother polynomial function. In many cases a third order would be enough, but I realize that sometimes we want to go for more perfection, especially with modern computers that can perform these calculations so fast. Might as well get the most out of it.
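
As a minimal sketch of a least squares fit, assuming NumPy: the x values are the calibration voltages from the question, but the y values are generated from a made-up smooth curve (true_curve is purely hypothetical), since no measured Y data was posted. Polynomial.fit rescales x internally, which helps the numerics.

```python
import numpy as np
from numpy.polynomial import Polynomial

# The 24 calibration voltages listed in the question.
x = np.array([0.08, 0.16, 0.24, 0.32, 0.40, 0.48, 0.56, 0.66,
              0.76, 0.86, 0.96, 1.6, 2.6, 3.6, 4.8, 6.0,
              7.2, 8.4, 9.6, 10.8, 12.0, 13.2, 14.4, 16.6])


def true_curve(v):
    # Hypothetical sensor response, invented for illustration only.
    return 2.0 + 1.5 * v + 0.01 * v**2


y = true_curve(x)  # stand-in for the measured calibration values

# Least squares fits of third and sixth order.
fits = {deg: Polynomial.fit(x, y, deg) for deg in (3, 6)}

x_new = 0.065  # slightly below the first calibration point (0.08)
for deg, p in fits.items():
    print(f"order {deg}: fit gives {p(x_new):.6f}, curve is {true_curve(x_new):.6f}")
```

With clean data from a smooth curve, both fits land very close to the true value at 0.065. Real, noisy measurements will not behave this nicely, so treat this as a best case and not a guarantee.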

The third is to use higher precision for the number crunching, and that will result in coefficients that are more precise. You may be able to round them later, but if not you may have to use those in the actual resulting polynomial which means you would have to retain that precision to the very end, where you could then round.

There are other very notable techniques that use matrices (see attachment). Some of these will rotate the equations such that a variation in one variable will result in an orthogonal variation in the error variable, and this means you can predict the coefficient(s) that result in the minimum error. This is the method shown in the "Optimizations-01.jpeg" image. This also allows you to recognize redundant variables. For example, you may find that as you vary the coefficient for x^4 you get the same results as for x^2, so you don't need the x^4 term, or you may find that x^4 results are very similar to x^2 results, so you don't need the x^2 term. This helps to simplify the resulting polynomial.

A method I like though is to use a Least Squares type of fit, but include derivatives. Since the derivatives correlate to the smoothness of the curve, the derivatives allow you to produce a curve that is guaranteed to be smooth between data points, and also allow some predictability for points out of range (to some degree). An example of this is shown in the image "Interp_ThreePoint-01.gif".

The straightforward Least Squares Fit equations are shown in the image "LSF-01.jpeg".
Note that for the third order fit we already have a term with x^6 in it, which means the precision becomes questionable using the normal precision, which is around 16 decimal digits. This means you may have to go higher in precision to get usable results. If we went to a 6th order LSF, we would end up with a term that included an x^12 factor, which would mean we almost definitely have to go to higher precision for all the number crunching.
If you do decide to go higher with that method, you can follow how the form of the equations changes from first order to third order and expand to 6th order or higher. You should try 3rd order first though and see if you can live with that.
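
The image itself is not reproduced here, so the following is only a sketch of the textbook normal equations for a least squares polynomial fit, which may be laid out differently from the image. The function name and the small test data are made up for illustration; NumPy is assumed. It shows where the x^(2*order) sums come from and how the same pattern extends to any order.

```python
import numpy as np


def normal_equations_fit(x, y, order):
    """Least squares polynomial fit via the normal equations S @ a = t.

    S[i, j] = sum(x**(i + j)) and t[i] = sum(y * x**i), so the highest
    power that appears is x**(2 * order): x^6 for a third order fit,
    x^12 for a sixth order fit.
    Returns the coefficients a, lowest power first.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = order + 1
    S = np.array([[np.sum(x ** (i + j)) for j in range(n)] for i in range(n)])
    t = np.array([np.sum(y * x ** i) for i in range(n)])
    return np.linalg.solve(S, t)


# Made-up data lying exactly on y = 1 + 2x - x^2 + 0.5x^3.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = 1.0 + 2.0 * xs - xs**2 + 0.5 * xs**3
print(normal_equations_fit(xs, ys, 3))
```

On a small, tame data set like this the cubic coefficients come back essentially exactly. On wide-ranging x values and higher orders, the large power sums are where the rounding trouble described above sets in.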

There is also the possibility of a pre-calculation. If your data varies in a way that is known, you can transform the values first and do the least squares fit on the transformed values. Then, to generate values with the resulting formula, you just use the inverse function along with the regular polynomial. This method works extremely well with NTC thermistor data, for example.
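
Here is a sketch of that idea for an NTC thermistor, assuming NumPy. The calibration data is synthetic, generated from the common Beta model with made-up part values (R0, T0 and BETA are assumptions, not from any real datasheet). The pre-calculation is to fit 1/T against ln(R), where the relationship is nearly a straight line, and then invert to get T back.

```python
import numpy as np

# Assumed thermistor parameters, for illustration only.
R0, T0, BETA = 10_000.0, 298.15, 3950.0  # ohms, kelvin, kelvin


def ntc_resistance(t_kelvin):
    """Synthetic 'measurement' from the Beta model (an assumption)."""
    return R0 * np.exp(BETA * (1.0 / t_kelvin - 1.0 / T0))


# Synthetic calibration points from 0 C to 100 C in 10 C steps.
t_cal = np.arange(0.0, 101.0, 10.0) + 273.15
r_cal = ntc_resistance(t_cal)

# Pre-calculation: transform both variables, then do a low order fit.
coeffs = np.polyfit(np.log(r_cal), 1.0 / t_cal, 1)


def temperature_from_resistance(r_ohms):
    """Evaluate the polynomial in ln(R), then invert 1/T to get kelvin."""
    return 1.0 / np.polyval(coeffs, np.log(r_ohms))


t_check = 40.0 + 273.15  # not one of the fit's special cases, just a spot check
print(temperature_from_resistance(ntc_resistance(t_check)) - 273.15)
```

Because the transform matches the underlying behaviour, a first order polynomial is enough here, where a direct polynomial in R would need many terms. A real thermistor deviates a little from the Beta model, so a real fit may want a higher order in ln(R).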

Another note here is the possibility of outliers. Outliers can mess up the entire solution. These are values that fall too far from the rest of the data, and so they should be excluded from the data set. The method in the image "Optimizations-01.jpeg" can be used to identify these problem values.
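
The method in the image is not reproduced here. As a simpler stand-in, and a different technique, one common way to spot outliers is to fit first and then flag points whose residual is far from the median residual, using the median absolute deviation (MAD) as the yardstick. This is a sketch assuming NumPy; the function name, the 3.5 threshold and the test data are all illustrative choices.

```python
import numpy as np


def flag_outliers(x, y, order=1, threshold=3.5):
    """Return indices of points whose fit residual is unusually large.

    Residuals are compared against the median residual, scaled by
    1.4826 * MAD (which approximates one standard deviation for
    normally distributed errors). Assumes the MAD is not zero.
    """
    coeffs = np.polyfit(x, y, order)
    residuals = y - np.polyval(coeffs, x)
    deviation = np.abs(residuals - np.median(residuals))
    scale = 1.4826 * np.median(deviation)
    return np.flatnonzero(deviation > threshold * scale)


# Made-up data on the line y = 2x + 1 with one bad reading at index 6.
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[6] += 10.0
print(flag_outliers(x, y))
```

Once a point is flagged, check it against the raw measurement before dropping it, then refit without it.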

Of course these are all empirical methods. They may, however, help you to discover the underlying physical laws at work and thus allow you to improve your function, as in the pre-calculation technique.

The best method is to know the physics behind the data, as this usually provides the best function. It may get quite complicated though, because some physical phenomena can be very complex in nature.
 


wayneh

Joined Sep 9, 2010
18,104
"All models are wrong, some are useful" - George Box

The "usefulness" or suitability of your model depends on what you want to accomplish with it. Usually you want to estimate - predict - a Y value given an X value (or set of Xs) that is not already in the collected data. IF the model is asked to interpolate, and IF the model fit to existing data is good enough to give an estimate with acceptable precision, then the model is useful. If the fit is ±5% but you need an estimate good to ±1%, your model is not yet useful.
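
A quick way to check a fit against a required tolerance is to compare the worst relative error at the calibration points with what you need. This is a sketch assuming NumPy, with a hypothetical function name and made-up data; it assumes no y value is zero.

```python
import numpy as np


def fit_meets_tolerance(x, y, order, rel_tol):
    """True if the worst relative error at the data points is within rel_tol."""
    coeffs = np.polyfit(x, y, order)
    rel_err = np.abs((np.polyval(coeffs, x) - y) / y)
    return bool(np.max(rel_err) <= rel_tol)


# Made-up data lying on a gentle curve (for illustration only).
x = np.linspace(0.0, 1.0, 11)
y = 1.0 + x + 0.5 * x**2

print(fit_meets_tolerance(x, y, 1, 0.01))  # straight line vs a 1% requirement
print(fit_meets_tolerance(x, y, 2, 0.01))  # second order vs a 1% requirement
```

Here the straight line misses the 1% requirement and the second order fit meets it. Note this only checks the model at the points it was fitted to; it says nothing about values between points or outside the calibrated range.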

If you need to extrapolate, things get complicated. Your error range of course expands quickly as you leave the modeled space. A solid understanding of the underlying process can greatly improve the range of the model. For instance if your model is the ideal gas law, PV=nRT, it can handle extrapolation far away from your data space. In effect you are incorporating an enormous amount of other people's data into your own model, seeing further because you stand on the shoulders of giants.
 

MrAl

Joined Jun 17, 2014
13,704
Hello again,

Just for the heck of it I tried to get a solution from ChatGPT.
The third order solutions came out very wrong, and at first it tried to hand me a Python program to run to get the solution for the coefficients.
I also tried a sixth order least squares fit and it may have actually gotten that one right, but I didn't check it yet because it takes a bit of time to do that.
My general opinion then is not to use ChatGPT for this, unless maybe you intend to run the Python program it gives you.
I don't like Python though; as far as I can tell it is the least human-friendly language I have ever seen.
 