polynomial trend line in Excel

Thread Starter

RimfireJim

Joined Apr 7, 2008
22
I'm hoping to use the coefficients of the polynomial regression in an algorithm to apply a correction to measured values from a machine on my production floor, i.e., calibrate it in software. We have been using a first-order correction successfully except on one machine.

I have some calibration data which I have graphed with an X-Y scatterplot in Excel. Looking at it, it is obvious that linear regression is not a very good fit, so I tried some higher-order polynomial trendlines. A 4th order polynomial looks pretty decent and has an R^2 of 0.8-something. So far, so good.

But when I use the coefficients from the 4th-order equation to try to predict a dependent value for an arbitrary independent value, I get a result that is far different from the displayed trendline curve. I tried this for various independent values (angles from 15 to 65 degrees), and the disparity got worse with increasing angle. The original data has dependent values all within a range of less than ±0.2, and the trendline stays in that range, but my calculated values based on the trendline coefficients go from 0.18 to 22 over the same 15-65 degree range! Fitting a 4th order trendline to my calculated values gives exactly the same equation as the trendline for my original data, even though the curves bear no resemblance to each other. I tried the same thing using a 2nd order trendline, with similar results.
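For anyone who wants to try the same workflow outside Excel, here is a minimal sketch in Python/NumPy. The data are made up purely for illustration (angles of 15-65 degrees with "errors" inside ±0.2, roughly the ranges described above), not my actual calibration data; numpy.polyfit keeps the coefficients at full double precision and numpy.polyval evaluates the fitted polynomial at an arbitrary angle.

```python
# Sketch of the fit-then-predict workflow described above (Python + NumPy).
# The data here are illustrative only, NOT the actual calibration data.
import numpy as np

angles = np.linspace(15, 65, 12)                 # independent variable (degrees)
rng = np.random.default_rng(0)
errors = 0.15 * np.sin(angles / 20.0) + 0.02 * rng.standard_normal(angles.size)

# Fit a 4th-order polynomial; coefficients come back at full double precision.
coeffs = np.polyfit(angles, errors, deg=4)       # highest power first

# Evaluate the SAME full-precision coefficients at an arbitrary angle --
# done this way, the prediction tracks the fitted curve.
predicted = np.polyval(coeffs, 40.0)
print(coeffs)
print(predicted)
```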

This doesn't match my experience with linear regression, in Excel or otherwise, at all. I've triple-checked my formulas and even run the calcs on my HP15C. It's like the displayed equation for the trendline is totally bogus. Anybody have any idea what's going on here?

-Jim
 

someonesdad

Joined Jul 7, 2009
1,583
I don't have Excel, so I can't comment on its algorithms. I'll assume you did your due diligence and typed in some simple data to verify that the regression algorithm gives correct results. You don't give us your data or your plots, so it's like asking your mechanic to diagnose your car problems from a short email without telling him what model of car it is. :)

It sounds like you're using (multiple) linear regression to fit a polynomial. Then, when you use the resulting equation, you get predictions far away from where you expect them to be. You didn't say whether the abscissas you're predicting at fall within the range of the input abscissas or whether you're extrapolating. You need to realize that polynomial fits to data virtually always fail outside of some range unless the underlying process is inherently polynomial in behavior.
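To make the extrapolation warning concrete, here is a small sketch (again with made-up data, not the poster's): a 4th-order fit to a smooth, non-polynomial process can look fine inside the fitted 15-65 degree range and diverge badly just outside it.

```python
# Illustration of the extrapolation warning above (made-up data): a 4th-order
# fit that behaves inside the fitted range can go wild outside it.
import numpy as np

x = np.linspace(15, 65, 12)
y = np.tanh((x - 40) / 10)            # a smooth, non-polynomial process
coeffs = np.polyfit(x, y, deg=4)

inside = np.polyval(coeffs, 40.0)     # interpolation: close to tanh(0) = 0
outside = np.polyval(coeffs, 100.0)   # extrapolation: far from the true value
print(inside, np.tanh(0.0))
print(outside, np.tanh(6.0))          # tanh(6) ~ 1, but the polynomial is not
```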

Regression fitting is mechanically simple with automated tools, but generating useful fits can be an art form that depends on knowledge and experience. Are you modeling a process that is inherently stochastic? Or can additional work pinpoint the causes of variation? What's the underlying model? Did you examine the distribution of the residuals? ANOVA? Plot them against the independent and dependent variables? Heteroscedastic? Non-normal? These all come into play in making good regression models.

I'm certainly not trying to be pedantic, but having used regression over the last 40 years has made me a bit wary. The best advice I can give is to get your hands on a copy of Neter and Wasserman's "Applied Linear Statistical Models", a superb and accessible book. It should be in the hands of any scientist or engineer who wants to make predictions from experimental data and use those predictions.
 

Thread Starter

RimfireJim

Joined Apr 7, 2008
22
Thanks for the response.
I'll admit that I didn't make up some data of a known shape to test the algorithm. My original data is only about 12 points, so it is fairly easy to see what's going on shape-wise. In 15 out of 16 calibration cases, the data appears fairly linear, and a first-order (linear) regression has yielded the expected results for slope and offset correction. The equations generated by the Excel trendline in those cases match what I get with my HP15C and Minitab. It's the 16th case, that is clearly curved, that is giving me fits. I just realized I could copy and paste the graphs into Paint to make jpegs out of them - see attachments. The "lie trendline curve" graph is generated from values I calculated at 5 degree increments using the equation from the 4th order trendline in "raw data curves".

I'm not extrapolating beyond the original range. I was just trying to figure out if there was some way I could use the coefficients from the trendline equation to apply a correction to measured data to remove the error I observe from my calibration data, as I do currently with the first-order regression.
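For illustration, here is a sketch of what such a software correction could look like, assuming (as with the first-order slope-and-offset case) that the polynomial is fit to the observed error as a function of angle, so the correction subtracts the predicted error from each measurement. The coefficient values and names below are placeholders, not taken from the actual calibration.

```python
# Sketch of applying fitted coefficients as a software correction, assuming
# the polynomial models the machine's error versus angle. Placeholder values.
import numpy as np

# In practice these would come from the calibration fit (highest power first),
# kept at full precision rather than retyped from a displayed equation.
coeffs = np.array([1.2e-7, -2.3e-5, 1.4e-3, -3.1e-2, 0.21])  # placeholders

def corrected(angle_deg, measured_value):
    """Subtract the polynomial's predicted error at this angle."""
    predicted_error = np.polyval(coeffs, angle_deg)
    return measured_value - predicted_error

print(corrected(40.0, 10.05))
```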

I'm afraid you lost me on most of your third paragraph. The errors come from a combination of sources. There is a low level of random error, which can be observed by repeatability tests, but the portion I'm trying to correct for is not random and most likely comes from one sensor on the machine. Are you talking about the residuals from the polynomial regression? I think that is my point: on the original graph, the trendline passes nearly through all the points (the residuals would be close to zero, no?), but plotting the curve given by the coefficients from that same trendline gives a curve that looks totally different, and the residuals would be huge.

Thanks for the tip on Neter and Wasserman. I'll try to locate a copy.
 

Attachments

Thread Starter

RimfireJim

Joined Apr 7, 2008
22
I think the source of the problem has been found: lack of significant figures for the higher-order coefficients. The effect of raising my independent variable to the 3rd and 4th power was significant! (duh . . .:rolleyes:) When I re-formatted the trendline equation to display 8 decimal places instead of the default of 4, then used those values to re-calculate the curve, I got a pretty good match.
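A quick sketch of why the displayed coefficients were so misleading (using made-up data in the same ranges, not the actual calibration data): rounding a 4th-order coefficient to 4 decimal places can shift that single term by as much as 0.00005 * 65^4, roughly 900, at 65 degrees, while the data themselves stay within ±0.2. Keeping many more decimals, or using the full-precision coefficients directly, tracks the fitted curve far more closely.

```python
# Demonstration of the significant-figures problem (illustrative data only):
# truncating 4th-order coefficients to 4 decimal places corrupts predictions
# at the high end of the 15-65 degree range, while the full-precision fit
# stays within the +/-0.2 band of the data.
import numpy as np

x = np.linspace(15, 65, 12)
y = 0.15 * np.sin(x / 20.0)                # illustrative "error" data

full = np.polyfit(x, y, deg=4)             # full double precision
rounded4 = np.round(full, 4)               # like retyping 4 displayed decimals
rounded8 = np.round(full, 8)               # like re-formatting to 8 decimals

for xi in (15.0, 40.0, 65.0):
    print(xi, np.polyval(full, xi), np.polyval(rounded8, xi), np.polyval(rounded4, xi))
```

The 4-decimal version typically drifts well outside the ±0.2 band toward 65 degrees, while the 8-decimal version stays much closer to the full-precision curve, which matches the behavior described in this thread.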
 

Attachments
