DADiSP Worksheet Functions > Function Categories > Curve Fitting > LINFIT
Performs general least squares fitting (multiple linear regression) of arbitrary basis functions.
LINFIT(A, y, "method")
A - An input array, the basis function design matrix.

y - An input series, the Y output data.

"method" - Optional. A string, the solution method, one of "qr" (default), "norm" or "svd".
Returns: A series, the basis function coefficients.
x = {-1, 1, 1.5, 3};
y = {6, 2, 2.25, 6};
c1 = polyfit(y, x, 2);
c2 = linfit(ravel(ones(length(x),1), x, x^2), y);
c1 == c2 == {3, -2, 1};
The first method uses POLYFIT to fit a quadratic polynomial to the data points. As shown, LINFIT produces the same result. The fitted equation becomes:

y = 3 - 2*x + x^2
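To check the result, the design matrix can be multiplied by the coefficients with the *^ matrix multiply operator. A minimal sketch, reusing x and c2 from above:

A = ravel(ones(length(x),1), x, x^2);
yfit = A *^ c2;
yfit == {6, 2, 2.25, 6}

Because the quadratic passes exactly through all four data points, the reconstructed values match y exactly.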
W1: grand(100,.01)
W2: 5*W1 + 3*sin(W1) - 2*exp(W1*W1)
W3: ravel(W1, sin(W1), exp(W1*W1))
W4: linfit(W3, W2)
W4 == {5, 3, -2}, the coefficients of W2.
In this example, we wish to find the coefficients of the relation:

y = c1*x + c2*sin(x) + c3*exp(x^2)

Notice that although each term is not necessarily linear, the equation as a whole is a linear combination of the terms. LINFIT solves the set of simultaneous equations to yield c1 = 5, c2 = 3 and c3 = -2.
x1 = gnorm(100, 1);
x2 = gsin(100, 1, 1/10);
x3 = 1..100;
x4 = exp((1..100)/100);
y = 3 * x1 + 10 * x2 - x3 + 30 * x4;
d1 = ravel(x1, x2, x3);
d2 = ravel(x1, x2, x3, x4);
W1: linfit(d1, y);label("Coef 1")
W2: linfit(d2, y);label("Coef 2")
W3: d1 *^ W1;label("Fit 1")
W4: d2 *^ W2;label("Fit 2")
W5: {norm(y - W3)^2};label("Chi^2 1")
W6: {norm(y - W4)^2};label("Chi^2 2")
In this example, the data is modeled as a function of three unrelated input series, and also as a function of the same three input series plus a fourth series. We wish to find the coefficients of the relations:

y = c1*x1 + c2*x2 + c3*x3
y = c1*x1 + c2*x2 + c3*x3 + c4*x4
W1 and W2 compute the least squares coefficients while W3 and W4 compute the actual fits. The fitting error for each result is displayed in W5 and W6. Because the second fit includes all the components of the input data, its fit error is negligible. This form of least squares fitting is often referred to as multiple linear regression.
Consider the problem of fitting a set of data points to a model that is a linear combination of functions, called basis functions:

y(x) = c1*X1(x) + c2*X2(x) + ... + cM*XM(x)
The basis functions Xk(x) themselves are not required to be linear; only the dependence on the coefficients ck is linear.
By employing the method of least squares, we wish to find the coefficients ck that minimize the sum of the squares of the differences between the data points and the fitted results:

chi^2 = sum[i=1..N] (yi - sum[k=1..M] ck*Xk(xi))^2
Minimization is achieved by taking the partial derivative of chi^2 with respect to each coefficient and setting the result to 0. For real basis functions this yields:

0 = sum[i=1..N] (yi - sum[j=1..M] cj*Xj(xi)) * Xk(xi),   k = 1..M
Thus, for each basis function Xk, we have:

sum[i=1..N] Xk(xi)*yi = sum[j=1..M] (sum[i=1..N] Xk(xi)*Xj(xi)) * cj
where N is the number of data points and M is the number of basis functions. We define the NxM design matrix A by:

A[i,j] = Xj(xi)

where the design matrix has the form:

    [ X1(x1)  X2(x1)  ...  XM(x1) ]
A = [ X1(x2)  X2(x2)  ...  XM(x2) ]
    [   ...     ...   ...    ...  ]
    [ X1(xN)  X2(xN)  ...  XM(xN) ]
Each column of the design matrix is a basis function evaluated at the known x values.
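For example, the design matrix of the first example above is built with RAVEL, each column being one basis function (1, x, x^2) evaluated at the known x values:

x = {-1, 1, 1.5, 3};
A = ravel(ones(length(x),1), x, x^2);

Here column 1 is the constant basis function, column 2 is x and column 3 is x^2, so A is a 4x3 design matrix with N = 4 and M = 3.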
We also define the output vector y of length N as:

y = [y1, y2, ..., yN]T
With these matrix definitions, we have:

sum[i=1..N] A[i,k]*yi = sum[j=1..M] (sum[i=1..N] A[i,k]*A[i,j]) * cj

and in matrix form, the least squares problem becomes:

(AT*A)*c = AT*y
where c is the vector of basis function coefficients. This formulation of the least squares problem is known as the normal equations.
LINFIT(A, y, "norm") solves the normal equations directly using CROSSPROD. However, the normal equations are often ill-conditioned, and solving for the coefficients directly can be susceptible to roundoff error.
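The direct solution can be sketched in a single line, assuming CROSSPROD(A) forms AT*A and using the TRANSPOSE and INV matrix functions (for illustration only; LINFIT performs this computation internally):

c = inv(crossprod(A)) *^ (transpose(A) *^ y);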
To investigate an alternative solution to the least squares problem, notice that the original basis function relation:

yi = sum[k=1..M] ck*Xk(xi),   i = 1..N

can be written in matrix form as:

A*c = y

With our matrix definitions, we restate the least squares criteria as finding the coefficient vector c that minimizes:

chi^2 = norm(A*c - y)^2

The quantity is minimized to 0 when:

A*c = y
i.e. the basis function relation. This system of equations is overdetermined since N, the number of data points, must be greater than M, the number of basis functions.
Using QR decomposition, matrix A is expressed as the product of an NxN orthogonal matrix Q and an NxM matrix R, where R is upper triangular such that the last N-M rows of R are zero:

A = Q*R,   QT*Q = I

where I is the identity matrix. Substituting into the matrix form of the basis function relation and multiplying by QT:

Q*R*c = y
R*c = QT*y
Recall that R is an upper triangular matrix of the form:

R = [ R1 ]
    [ Θ  ]

where R1 is an MxM upper triangular matrix and Θ is a (N-M)xM matrix of zeros. By making the following partition:

QT*y = [ q1 ]
       [ q2 ]

such that q1 is Mx1 and q2 is (N-M)x1, the least squares problem can be solved by:

R1*c = q1
This upper triangular system can be solved for c using simple back substitution.
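Back substitution can be sketched in SPL, assuming r1 and q1 hold the MxM triangular system from above and that C-style loops and bracket indexing are available (for illustration only; LINFIT performs this step internally):

c = 0 * ones(M, 1);          // coefficient vector, initially zero
for (k = M; k >= 1; k--)
{
    s = q1[k];
    // subtract contributions of the already-solved coefficients
    for (j = k+1; j <= M; j++)
    {
        s = s - r1[k, j] * c[j];
    }
    c[k] = s / r1[k, k];     // divide by the diagonal element
}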
LINFIT(A, y, "qr") or simply LINFIT(A, y) uses QR decomposition to solve the least squares problem; QR is the default method.
Singular value decomposition, SVD, decomposes the design matrix A into three matrix components, U, W and V, such that:

A = U*W*VT

where U is an orthonormal matrix that contains the left singular vectors, W is a diagonal matrix that contains the weighting factors, or singular values, and V is an orthonormal matrix that contains the right singular vectors.

With this decomposition, the least squares problem can be solved with:

c = V * diag(1/wj) * (UT*y)

where wj (the singular values) are the diagonal values of matrix W and diag(1/wj) denotes the diagonal matrix of their reciprocals.
LINFIT(A, y, "svd") uses SVD, singular value decomposition, to solve the least squares problem.
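The three method strings can be compared on the quadratic data of the first example; for a well-conditioned design matrix each method returns essentially the same coefficients. A minimal sketch using only the documented forms of LINFIT:

x = {-1, 1, 1.5, 3};
y = {6, 2, 2.25, 6};
A = ravel(ones(length(x),1), x, x^2);
cqr = linfit(A, y, "qr");
cnorm = linfit(A, y, "norm");
csvd = linfit(A, y, "svd");
cqr == cnorm == csvd == {3, -2, 1}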
See [1] for further details.
See DADiSP/MatrixXL to significantly optimize LINFIT.
[1] Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T., Numerical Recipes in C, 2nd Edition, Cambridge University Press, 1988, 1992, pp. 671-675.