For the math experts: Please see the attached .xls spreadsheet. I've got 2 series of data. The first is the original data, the second is a minor transform of the first. Either dataset is valid for my computations, but it seems easier to curve fit the second set with a polynomial than the first, as is clear from the graphs. While the second fit is good, it's not as good as I want it. The numbers represent a natural phenomenon (which I am unable to disclose), and my gut tells me there should be a computable curve that fits exactly. Ideally, the entire curve should be computable from a single equation. I've tried splitting the curve into sections, and I've had great success in modeling the first 2/3rds of the curve. But the last 1/3rd always seems to give me problems no matter what I do. FYI, I normally use the LINEST function, but in the data provided I've shown the trendlines computed by the Excel graph. I'm wondering if one of our math experts can look at the curves and tell me if a different class of equations might do a better job fitting the data for one or both series. Thanks for the help.
Without actually playing with it, I would recommend breaking it into the sum of four terms. The first two terms provide the linear character of the middle part of the data, while two exponentials provide the tails.
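One way to read the four-term suggestion, as a rough sketch only: a constant plus a slope for the straight middle section, plus one decaying and one growing exponential for the two tails. The post does not give the exact form, so the function name, the parameter names a to f, and the sign conventions below are all assumptions made for illustration.

```python
import math

def four_term_model(x, a, b, c, d, e, f):
    """Hypothetical four-term model: a + b*x + c*exp(-d*x) + e*exp(f*x)."""
    linear = a + b * x                # constant and slope: the straight middle
    left_tail = c * math.exp(-d * x)  # with d > 0 this fades as x grows
    right_tail = e * math.exp(f * x)  # with f > 0 this takes over at large x
    return linear + left_tail + right_tail
```

With c and e set to zero the model collapses to a straight line, which makes it easy to check that the exponentials only contribute where they are meant to.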
My first questions would be: Were the measurements really made to the accuracy implied by the number of decimal places? If they were, how many repeats of the measurements were taken to confirm this? Are there any theoretical reasons to assume an asymptote at about x = 0.13 in the first curve? Are there any continuity conditions on the curve slopes at the ends of either curve?
All good questions: All values are accurate to ~4 decimal places. This is both by design, and confirmed by cross-checking the data against both theoretical values and multiple instances of the physical apparatus. I have taken the data *dozens* of times. There is definitely a vertical asymptote near x = 0.13. This is due to the nature of the phenomenon. Unfortunately, my apparatus is not good enough to measure down to the actual asymptote within the margin of error. I wish I could! AFAIK, and also theoretically, there are no expected discontinuities in the curves. BTW, modeling the precise *shape* of the curve is more important than a one-to-one correspondence of the values. I can adjust for scaling and offset errors, but not for mismatches in the shape. If you look closely at the second curve, the polynomial fit is close, but there are wiggles that extend above and below the actual data.
Where are the "theoretical" values coming from? Why doesn't the theory give you the functional form of the curve leaving you to simply fit the parameters, perhaps according to a least-squares fit?
Apologies for not being clear: The theory (at least as deep as I am capable of taking it) predicts the rough shape of the curve, not the numerical values. The values measured lie within the locus of points that fit the theory. The accuracy of the data has been confirmed via repeated measurements with multiple instances of apparatus and NIST traceable instrumentation. The curves shown are a least-squares polynomial fit. They work well, especially the second series. But as I mentioned earlier, there are "wobbles" in the curve that tell me it's not as good a fit as possible, and I am curious if there are other classes of equations that would better fit the data provided than a standard polynomial.
Okay, but what is the rough shape of the curve that the theory provides? Why can't you develop a mathematical model from that and use least-squares to fit parameters to it? Speaking of which, "what is this transform" that you are referring to?
Yes, there are many more functions available that might be fitted. You could simply look at a rational polynomial approximation (Padé), which might well give a better fit with less computational effort. An orthogonal series, e.g. Chebyshev (Tchebychef), might be better still. The above methods only match the values of the function at known points. Very often the derivatives also need to be matched to obtain a better fit. You have plenty of data for this, so you could explore (cubic) splines. You say there is a real-world cut-off or zero in the data at about x=0.12, as previously noted. Unfortunately, the more tightly you fit a simple polynomial near a zero, and the higher its order, the more wiggly it becomes. The other methods suggested above do not have this problem.
Data based on a physical process have a mathematical model. Define that mathematical model first before attempting to do any curve fitting. Fitting to a polynomial of an order higher than the physical process will result in squiggles.
Not always, by a long way. And even if there is one, the computational effort of the real function may well make its use unattractive compared to an approximating one. That is what the calculus of variations and finite elements are all about.
If you reverse the X and Y axes, a sigmoid curve: http://en.wikipedia.org/wiki/Sigmoid_function seems to fit the data rather well:
Beautiful! And I'm slightly embarrassed that I haven't heard of such before. But at least I have a direction now. I will need to research the following, but any knowledgeable assistance would be appreciated: From what I see on the Wikipedia page, there is a whole class of sigmoidal functions. How do I go about choosing the appropriate one? How do I compute the coefficients for the best R^2 (least squares)? Can the function be transposed?
The theory predicts the leading edge curve and asymptote, the "linear" middle and the trailing edge curve and *supposed* asymptote. If I already knew the whole equation that models the curve (from a physical phenomenon standpoint), I wouldn't need to ask the question! I'd just apply it. I am actually trying to derive the model that mathematically explains the phenomenon.
Since you asked: Assume Y is the vertical axis of the first series, and Y' is the vertical axis of the 2nd series, then:
Y' = log2(2^Y + 300)
I derived the transformation empirically, based on the data, that allowed the best overall polynomial fit. I assume it works because it "linearizes" the leading curve and eliminates the asymptote. It is also easy to reverse transform. Aside from that explanation, I have no other reason for using the transform.
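For anyone who wants to reproduce the transform, here is a small sketch of it and its reverse. The formula and the constant 300 are from the post above; the function names are made up for illustration. Solving Y' = log2(2^Y + 300) for Y gives Y = log2(2^Y' - 300), which only exists when Y' is above log2(300), about 8.23.

```python
import math

OFFSET = 300.0  # the constant from the post

def forward(y):
    """Series 1 value Y -> Series 2 value Y' = log2(2^Y + 300)."""
    return math.log2(2.0 ** y + OFFSET)

def inverse(y_prime):
    """Series 2 value Y' -> Series 1 value Y = log2(2^Y' - 300).

    Only defined when 2^Y' > 300, i.e. Y' > log2(300).
    """
    return math.log2(2.0 ** y_prime - OFFSET)
```

From the formula alone, Y' is nearly equal to Y when 2^Y is much larger than 300, and Y' levels off at log2(300) when Y becomes very negative, so a dive toward minus infinity in Y turns into a flat floor in Y'.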
No. There is a huge difference between a mathematical model of the process and a mathematical model of the data.
One of the Sigmoidal Functions is also called a four parameter fit. http://www.miraibio.com/blog/2010/08/the-4-parameter-logistic-4pl-nonlinear-regression-model/
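As a rough sketch of what a four parameter logistic looks like in code, here is one common parameterization together with an R^2 score for a candidate set of parameters. The parameter names a, b, c, d and both function names are assumptions for illustration; check the linked article for the exact convention it uses. This only scores a guess, it does not do the fitting; a nonlinear least-squares routine would adjust the parameters to push R^2 toward 1.

```python
def four_pl(x, a, b, c, d):
    """One common 4PL form: d + (a - d) / (1 + (x / c)**b), for x >= 0.

    a: response at x = 0
    d: response as x becomes very large
    c: x value where the response is halfway between a and d
    b: steepness of the transition
    """
    return d + (a - d) / (1.0 + (x / c) ** b)

def r_squared(xs, ys, params):
    """Coefficient of determination of the 4PL with the given parameters."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - four_pl(x, *params)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```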
In a private message, The Electrician recommended the software TableCurve 2D available at: http://www.sigmaplot.com/products/tablecurve2d/tablecurve2d.php I downloaded and tried the "trial" version (actually, full blown version limited for 30 days). In about 2 seconds, the software fitted 3,547 different equations for the Series 2 curve, and ranked them vs. R^2. 51 of the equations had R^2 of 0.999999 or better, and it visually graphs the response of the equation vs. the data. Interestingly, though not necessarily surprising, the Series 1 fitted curves did not match the data as well as Series 2. Very nice. While I typically don't purchase Windows software, I may make an exception in this case. Thank you, The Electrician.
A trick you should always keep in mind is to reverse X and Y and try fitting again--TableCurve will do the reversal with a single click. That's what I did with your series 1 data. Sometimes you can solve the equation for the reversed X and Y axes in closed form. That actually works for the sigmoid TableCurve found.
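To illustrate the closed-form idea, here is a sketch using a plain logistic sigmoid. The specific sigmoid TableCurve found is not given in the thread, so this form, the function names, and the parameter names a, b, c, d are assumptions. If the reversed data fit X = logistic(Y), then solving for Y gives the curve in its original orientation.

```python
import math

def logistic(x, a, b, c, d):
    """Logistic sigmoid: a + b / (1 + exp(-(x - c) / d))."""
    return a + b / (1.0 + math.exp(-(x - c) / d))

def logistic_inverse(y, a, b, c, d):
    """Closed-form inverse: x = c - d * ln(b / (y - a) - 1).

    Only valid for y strictly between the two plateaus a and a + b.
    """
    return c - d * math.log(b / (y - a) - 1.0)
```

The inverse blows up as y approaches either plateau, so a sigmoid fitted on reversed axes naturally gives asymptotes when it is solved back for the original axes.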
For those who care, TableCurve 2D gave me this equation for Series 2 as a good fit (R^2 = .99999816):
Y = (A + Cx + Ex^2 + Gx^3) / (1 + Bx + Dx^2 + Fx^3)
where:
A = 7.4250189
B = 0.45593574
C = 11.159249
D = -2.4873822
E = -26.356847
F = 1.0763004
G = 8.5671721
While this particular equation (and its constants) does not give me insight into the workings of the phenomenon, it does give a great fit without the wiggles. And, it was just a five minute job to plug the formula into my PIC application (in .asm!). The end results are phenomenal. Thanks for your help, The Electrician!
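For anyone wanting to try the equation outside a PIC, here is a sketch in Python. The constants are copied from the post; the function name and the use of Horner's scheme (nested multiplication, handy on a small micro because it needs only three multiplies per cubic) are my own choices, not necessarily how the .asm version was written.

```python
# Constants from the post (TableCurve 2D fit for Series 2)
A, B, C, D = 7.4250189, 0.45593574, 11.159249, -2.4873822
E, F, G = -26.356847, 1.0763004, 8.5671721

def series2_fit(x):
    """Y = (A + C*x + E*x^2 + G*x^3) / (1 + B*x + D*x^2 + F*x^3)."""
    numerator = A + x * (C + x * (E + x * G))      # Horner form of the cubic
    denominator = 1.0 + x * (B + x * (D + x * F))  # Horner form of the cubic
    return numerator / denominator
```

The x range of the data is not stated in the thread, and a rational function can misbehave where its denominator gets close to zero, so this should only be trusted inside the range the fit was made on.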