Linear and Nonlinear Regression

Regression analysis is a statistical methodology concerned with relating a variable of interest, which is called the dependent variable and denoted by the symbol \(y\), to a set of independent variables, which are denoted by the symbols \(x_1\), \(x_2\), …, \(x_p\). The dependent and independent variables are also called response and explanatory variables, respectively. The objective is to build a regression model that will enable us to adequately describe, predict, and control the dependent variable on the basis of the independent variables.

Simple Linear Regression Model

The simple linear regression model has a single explanatory variable \(x\) and assumes that the response variable \(y\) is related to it by a straight line. The model is

\[y=\beta_{0}+\beta_{1}{x}+\varepsilon \label{1} \]

where the intercept \(β_0\) and the slope \(β_1\) are unknown constants and \(\varepsilon\) is a random error component. The errors are assumed to have mean zero and unknown variance \(σ^2\). We usually also assume that the errors are uncorrelated, meaning that the covariance between any two different errors is zero.

It is convenient to view the explanatory variable \(x\) as controlled by the data analyst and measured with negligible error, while the response variable \(y\) is a random variable. That is, there is a probability distribution for \(y\) at each possible value for \(x\). The mean of this distribution is

\[E(y\mid x)=\beta_{0}+\beta_{1}{x}\label{2} \]

and the variance is

\[\mathrm{Var}(y\mid x)=\mathrm{Var}(\beta_{0}+\beta_{1}{x}+\varepsilon)=\sigma^2\label{3} \]

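Thus, the mean of \(y\) is a linear function of \(x\), while the variance of \(y\) does not depend on the value of \(x\): for fixed \(x\), the term \(\beta_0+\beta_1x\) is a constant, so the variance of \(y\) equals the variance of \(\varepsilon\). Furthermore, because the errors are uncorrelated, the responses are also uncorrelated.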

The parameters \(β_0\) and \(β_1\) are usually called regression coefficients. These coefficients have a simple and often useful interpretation. The slope \(β_1\) is the change in the mean of the distribution of \(y\) produced by a unit change in \(x\). If the range of the data on \(x\) includes \(x=0\), then the intercept \(β_0\) is the mean of the distribution of the response variable \(y\) when \(x=0\). If the range of \(x\) does not include zero, then \(β_0\) has no practical interpretation.
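
The interpretation of the slope follows directly from Equation \ref{2}: comparing the mean response at \(x+1\) with the mean response at \(x\), the intercept cancels and only the slope remains.

\[E(y\mid x+1)-E(y\mid x)=[\beta_0+\beta_1(x+1)]-(\beta_0+\beta_1x)=\beta_1 \nonumber \]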

Least-squares estimation of the parameters

The method of least squares is used to estimate \(β_0\) and \(β_1\). That is, \(β_0\) and \(β_1\) will be estimated so that the sum of the squares of the differences between the observations \(y_i\) and the straight line is a minimum. Equation \ref{1} can be written as

\[y_{i}=\beta_{0}+\beta_{1}x_{i}+\varepsilon_{i}, \;\;\; i=1, 2,..., n\label{4} \]

Equation \ref{1} may be viewed as a population regression model, while Equation \ref{4} is a sample regression model, written in terms of the \(n\) pairs of data \((y_i, x_i)\), \(i=1, 2, \ldots, n\). Thus, the least-squares criterion is

\[ S(\beta_0,\beta_1)=\sum_{i=1}^n(y_i-\beta_0-\beta_{1}x_{i})^2\label{5} \]
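
As a minimal sketch, the criterion in Equation \ref{5} can be evaluated directly for any candidate line. The function name `sum_of_squares` and the data values are hypothetical, chosen only for illustration; least squares picks the pair \((\beta_0, \beta_1)\) that makes this quantity as small as possible.

```python
def sum_of_squares(beta0, beta1, x, y):
    """Least-squares criterion S(beta0, beta1) of Equation 5."""
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# A line close to the data gives a small S; a poorer line gives a larger S.
print(sum_of_squares(0.0, 2.0, x, y))
print(sum_of_squares(1.0, 1.5, x, y))
```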

The least-squares estimators of \(β_0\) and \(β_1\), say \(\hat{\beta}_0\) and \(\hat{\beta}_1\), must satisfy

\[ \dfrac{\partial{S}}{\partial{\beta_0}}=-2\sum_{i=1}^n(y_i-\hat{\beta}_0-\hat{\beta}_1x_{i})=0\label{6} \]

\[ \dfrac{\partial{S}}{\partial{\beta_1}}=-2\sum_{i=1}^n(y_i-\hat{\beta}_0-\hat{\beta}_1x_{i})x_i=0\label{7} \]

Simplifying these two equations yields

\[ n\hat{\beta}_0+\hat{\beta}_1\sum_{i=1}^nx_i=\sum_{i=1}^ny_i\label{8} \]

\[ \hat{\beta}_0\sum_{i=1}^nx_i+\hat{\beta}_1\sum_{i=1}^nx_i^2=\sum_{i=1}^ny_ix_i\label{9} \]

Equations \ref{8} and \ref{9} are called the least-squares normal equations. Provided the \(x_i\) are not all equal, their solution is

\[\hat{\beta}_0=\dfrac{1}{n}\sum_{i=1}^ny_i-\dfrac{\hat{\beta}_1}{n}\sum_{i=1}^nx_i\label{10} \]

\[ \hat{\beta}_1=\dfrac{\sum_{i=1}^ny_ix_i-\dfrac{1}{n}(\sum_{i=1}^ny_i)(\sum_{i=1}^nx_i)}{ \sum_{i=1}^nx_i^2-\dfrac{1}{n}(\sum_{i=1}^nx_i)^2} \label{11} \]
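
A minimal sketch of Equations \ref{10} and \ref{11}, computing the estimates from raw sums exactly as written. The helper name `fit_simple_linear` and the data are hypothetical; the guard against equal \(x\) values reflects the condition under which the denominator of Equation \ref{11} vanishes.

```python
def fit_simple_linear(x, y):
    """Least-squares intercept and slope from Equations 10 and 11 (raw sums)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    denominator = sum_xx - sum_x ** 2 / n
    # Exact-zero check; a near-zero denominator with float data also signals trouble.
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is not identifiable")
    beta1_hat = (sum_xy - sum_y * sum_x / n) / denominator  # Equation 11
    beta0_hat = sum_y / n - beta1_hat * sum_x / n            # Equation 10
    return beta0_hat, beta1_hat


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)
```

The raw-sum form of Equation \ref{11} can lose precision through cancellation when the \(x_i\) are large relative to their spread; the centered form in Equation \ref{15} avoids this.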

In Equations \ref{10} and \ref{11}, \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the least-squares estimators of the intercept and slope, respectively. Thus the fitted simple linear regression model will be

\[ \hat{y}=\hat{\beta}_0+\hat{\beta}_1x\label{12} \]

Equation \ref{12} gives a point estimate of the mean of \(y\) for a particular \(x\).

Defining the averages of the \(y_i\) and \(x_i\) as

\[\bar{y}=\dfrac{1}{n} \sum_{i=1}^ny_i \nonumber \]

and

\[\bar{x}=\dfrac{1}{n} \sum_{i=1}^nx_i \nonumber \]

the denominator of Equation \ref{11} can be written as

\[S_{xx}= \sum_{i=1}^nx_i^2-\dfrac{1}{n}(\sum_{i=1}^nx_i)^2=\sum_{i=1}^n(x_i-\bar{x})^2 \label{13} \]

and the numerator of Equation \ref{11} can be written as

\[S_{xy}= \sum_{i=1}^ny_ix_i-\dfrac{1}{n}(\sum_{i=1}^ny_i)(\sum_{i=1}^nx_i)=\sum_{i=1}^ny_i(x_i-\bar{x})\label{14} \]

Therefore, Equation \ref{11} can be written in a convenient way as

\[\hat{\beta}_1=\dfrac{S_{xy}}{S_{xx}}\label{15} \]
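
A short sketch of Equation \ref{15} using the centered forms of \(S_{xx}\) and \(S_{xy}\), with the intercept from Equation \ref{10} rewritten as \(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\). It also checks numerically that the raw-sum and centered expressions in Equations \ref{13} and \ref{14} agree. The helper name and data are hypothetical.

```python
def centered_fit(x, y):
    """Slope via Equation 15 (S_xy / S_xx) and intercept via Equation 10."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)              # Equation 13, centered form
    s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))  # Equation 14, centered form
    beta1_hat = s_xy / s_xx                                 # Equation 15
    beta0_hat = y_bar - beta1_hat * x_bar                   # Equation 10 in terms of averages
    return beta0_hat, beta1_hat, s_xx, s_xy


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

beta0_hat, beta1_hat, s_xx, s_xy = centered_fit(x, y)

# Raw-sum expressions from Equations 13 and 14; they should match up to rounding.
s_xx_raw = sum(xi * xi for xi in x) - sum(x) ** 2 / n
s_xy_raw = sum(xi * yi for xi, yi in zip(x, y)) - sum(y) * sum(x) / n
print(beta0_hat, beta1_hat)
print(abs(s_xx - s_xx_raw), abs(s_xy - s_xy_raw))
```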

The difference between the observed value \(y_i\) and the corresponding fitted value \(\hat{y}_i\) is a residual. Mathematically, the \(i\)th residual is

\[e_i=y_i-\hat{y}_i=y_i-(\hat{\beta}_0+\hat{\beta}_1x_i), \;\;\; i=1,2,..., n\label{16} \]
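
A small sketch of Equation \ref{16} that also checks two consequences of the normal equations: at the least-squares estimates, Equation \ref{6} says the residuals sum to zero and Equation \ref{7} says the sum of \(x_ie_i\) is zero. The helper name and data are hypothetical, and this is an illustrative check rather than a model-adequacy diagnostic.

```python
def residuals(x, y, beta0_hat, beta1_hat):
    """Residuals e_i = y_i - (beta0_hat + beta1_hat * x_i), Equation 16."""
    return [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares estimates via Equations 10 and 15.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta1_hat = (sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
beta0_hat = y_bar - beta1_hat * x_bar

e = residuals(x, y, beta0_hat, beta1_hat)
print([round(ei, 6) for ei in e])
# Normal equations 6 and 7 imply both sums are zero (up to rounding).
print(sum(e), sum(xi * ei for xi, ei in zip(x, e)))
```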

Residuals play an important role in investigating model adequacy and in detecting departures from the underlying assumptions.

Nonlinear Regression

Nonlinear regression is a powerful tool for analyzing scientific data, especially when the alternative would be to transform the data so that they can be fitted by linear regression. The objective of nonlinear regression is to fit a model to the data being analyzed: a computer program finds the best-fit values of the model's parameters, which can then be interpreted scientifically. However, choosing a model is a scientific decision and should not be based solely on the shape of the graph. An equation chosen only because it fits the data best is unlikely to correspond to a scientifically meaningful model.

Before microcomputers were popular, nonlinear regression was not readily available to most scientists. Instead, they transformed their data to make the graph linear and then analyzed the transformed data with linear regression. Such transformations distort the experimental error. Linear regression assumes that the scatter of points around the line follows a Gaussian distribution and that the standard deviation is the same at every value of \(x\); a nonlinear transformation of \(y\) generally changes the distribution and spread of the errors, so data that satisfied these assumptions before the transformation typically violate them afterward. Some transformations may also alter the relationship between the explanatory and response variables. Although it is usually not appropriate to analyze transformed data, it is often helpful to display data after a linearizing transformation, since the human eye and brain evolved to detect edges, but not to detect rectangular hyperbolas or exponential decay curves.
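
A first-order (delta-method) approximation illustrates how a log transformation distorts the error. Assuming, for illustration, that \(y\) has constant variance \(\sigma^2\) about a positive mean \(E(y)\),

\[\mathrm{Var}(\log y)\approx\left(\dfrac{d\log y}{dy}\Big|_{y=E(y)}\right)^2\sigma^2=\dfrac{\sigma^2}{[E(y)]^2} \nonumber \]

so after the transformation the scatter is no longer constant: it is largest where the mean response is smallest, for example in the tail of an exponential decay.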

Nonlinear Least-Squares

Given the validity, or approximate validity, of the assumption of independent and identically distributed normal errors, one can make certain general statements about the least-squares estimators in both linear and nonlinear regression models. For a linear regression model, the least-squares estimates of the parameters are unbiased, are normally distributed, and have the minimum possible variance among a class of estimators known as regular estimators. Nonlinear regression models differ from linear regression models in that the least-squares estimators of their parameters are not, in general, unbiased, normally distributed, minimum-variance estimators. They achieve these properties only asymptotically, that is, as the sample size approaches infinity.
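
Because nonlinear least-squares estimates generally have no closed form, they are found iteratively. The sketch below uses the Gauss-Newton method, one standard choice (the text does not prescribe a particular algorithm), applied to the one-parameter model \(y=\log(x-\alpha)\) of Equation \ref{17}. The function names, starting value, step-halving rule, and noise-free data are assumptions made for illustration.

```python
import math


def sse(alpha, x, y):
    """Residual sum of squares for the model y = log(x - alpha)."""
    return sum((yi - math.log(xi - alpha)) ** 2 for xi, yi in zip(x, y))


def fit_log_shift(x, y, alpha0, tol=1e-12, max_iter=100):
    """Gauss-Newton estimate of alpha in y = log(x - alpha) (hypothetical helper).

    The starting value alpha0 must be smaller than every x so that log is defined.
    """
    x_min = min(x)
    if alpha0 >= x_min:
        raise ValueError("alpha0 must be smaller than every x value")
    alpha = alpha0
    for _ in range(max_iter):
        # Residuals and Jacobian entries d/d(alpha) log(x_i - alpha) = -1 / (x_i - alpha).
        r = [yi - math.log(xi - alpha) for xi, yi in zip(x, y)]
        jac = [-1.0 / (xi - alpha) for xi in x]
        # One-parameter Gauss-Newton step: delta = (J^T r) / (J^T J).
        step = sum(j * ri for j, ri in zip(jac, r)) / sum(j * j for j in jac)
        # Halve the step until it stays in the domain and does not increase the SSE.
        current = sse(alpha, x, y)
        for _ in range(60):
            candidate = alpha + step
            if candidate < x_min and sse(candidate, x, y) <= current:
                break
            step /= 2.0
        else:
            return alpha  # no acceptable step found; keep the last iterate
        alpha = candidate
        if abs(step) < tol:
            break
    return alpha


# Hypothetical noise-free data generated with alpha = 0.5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.log(xi - 0.5) for xi in x]
alpha_hat = fit_log_shift(x, y, alpha0=0.0)
print(alpha_hat)
```

With noisy data the estimate would not reproduce the generating value exactly, and its sampling properties are only asymptotically those described above.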

One-parameter Curves

\[y=\log(x-\alpha)\label{17} \]

The least-squares estimator of \(α\) in this model has good statistical properties, so the model behaves in a reasonably close-to-linear manner in estimation. An even better-behaved model is obtained by replacing \(α\) by an expected-value parameter, to yield

\[y=\log[x-x_1+\exp(y_1)]\label{18} \]

where \(y_1\) is the expected value of \(y\) at \(x=x_1\), and \(x_1\) should be chosen somewhere within the observed range of the \(x\) values in the data set.
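
The substitution behind Equation \ref{18} can be written out: evaluating Equation \ref{17} at \(x=x_1\) defines \(y_1\), which is then solved for \(\alpha\) and substituted back.

\[y_1=\log(x_1-\alpha)\;\Longrightarrow\;\alpha=x_1-\exp(y_1)\;\Longrightarrow\;y=\log[x-x_1+\exp(y_1)] \nonumber \]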

\[y=\dfrac{1}{1+\alpha x}\label{19} \]

When \(α <0\), there is a vertical asymptote occurring at \(x=-1/α\).

\[y=\exp(x-\alpha)\label{20} \]

This model is, in fact, a disguised intrinsically linear model, since it may be reparameterized to yield a linear model. That is, replacing \(α\) by an expected-value parameter \(y_1\), corresponding to \(x=x_1\), yields

\[y=y_1\exp(x-x_1)\label{21} \]

which is clearly linear in the parameter \(y_1\).
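
Because Equation \ref{21} is linear in \(y_1\) with no intercept, its least-squares estimate has a closed form: with \(z_i=\exp(x_i-x_1)\), minimizing \(\sum_i(y_i-y_1z_i)^2\) gives \(\hat{y}_1=\sum_i z_iy_i/\sum_i z_i^2\), so no iteration is needed. The sketch assumes additive errors on \(y\); the helper name `fit_y1`, the choice \(x_1=1\), and the noise-free data are hypothetical. Since \(y_1=\exp(x_1-\alpha)\), the original parameter can be recovered as \(\hat{\alpha}=x_1-\log\hat{y}_1\) whenever \(\hat{y}_1>0\).

```python
import math


def fit_y1(x, y, x1):
    """Closed-form least-squares estimate of y1 in y = y1 * exp(x - x1) (Equation 21)."""
    z = [math.exp(xi - x1) for xi in x]
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)


# Hypothetical noise-free data generated from Equation 20 with alpha = 0.3.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(xi - 0.3) for xi in x]

x1 = 1.0                           # chosen inside the observed range of x
y1_hat = fit_y1(x, y, x1)
alpha_hat = x1 - math.log(y1_hat)  # back-transform; valid only if y1_hat > 0
print(y1_hat, alpha_hat)
```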

Contributors and Attributions

Hirofumi Kobayashi (UC Davis)
