Figure 1: Iterative method to fit Linear Regression
Linear Regression is a supervised machine learning algorithm. Since it is a regression algorithm, it makes real-valued (continuous-valued) predictions.
In contrast, a classification algorithm makes discrete-valued predictions.
Linear Regression models a linear relationship (with constant slope) between the feature values and the target variable.
Types of Linear Regression
- When there is only one feature, it is called Simple Linear Regression. The equation can be written as: $$ \hat{y} = \theta_1 * x_1 + \theta_2 $$
- Where $\hat{y}$ is the predicted value, $x_1$ is the feature, and $\theta_1$, $\theta_2$ are the model parameters.
- When there is more than one feature, it is called Multiple Linear Regression. The equation can be written as: $$ \hat{y} = \theta_1 * x_1 + \theta_2 * x_2 + … + \theta_n * x_n $$
- In vector notation we can write it as: $ \hat{y} = \theta^T \cdot X $ (a short NumPy sketch of this prediction appears after this list)
- $ \theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} $ and $ X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} $
- $\hat{y}$ is the predicted value, $\theta_1$, $\theta_2$,…,$\theta_n$ are the model parameters, and $x_1$, $x_2$,…, $x_n$ are the feature values.
- If more than one target is being predicted, it is called Multivariate Linear Regression. Each target has its own set of parameters, so for the $j$th of $k$ targets the equation can be written as: $$ \hat{y}_j = \theta_{j,1} * x_1 + \theta_{j,2} * x_2 + … + \theta_{j,n} * x_n $$
- In vector notation we can write it as: $ Y = \Theta^T \cdot X $
- $ \Theta = \begin{bmatrix} \theta_{1,1} & … & \theta_{k,1} \\ \vdots & & \vdots \\ \theta_{1,n} & … & \theta_{k,n} \end{bmatrix} $ , $ Y = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_k \end{bmatrix} $ and $ X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} $
- $\hat{y}_1$, $\hat{y}_2$,…, $\hat{y}_k$ are the predicted values, $\Theta$ is the model parameter matrix (one column of parameters per target), and $x_1$, $x_2$,…, $x_n$ are the feature values.
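To make the vector notation above concrete, here is a minimal NumPy sketch of the prediction $ \hat{y} = \theta^T \cdot X $ for Multiple Linear Regression; the parameter and feature values below are made-up numbers used only for illustration.

```python
import numpy as np

# Minimal sketch of the prediction y_hat = theta^T . X for Multiple Linear
# Regression. The parameter and feature values are made up for illustration.
theta = np.array([0.5, -1.2, 3.0])   # model parameters theta_1..theta_n
x = np.array([2.0, 1.0, 4.0])        # feature values x_1..x_n

y_hat = theta @ x                    # the dot product theta^T . X
print(y_hat)                         # a single real-valued prediction
```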
Approaches to fit Linear Regression
Fitting a Linear Regression model means finding the model parameters (the $\theta$ vector) that best fit the training data. There are different approaches to do this. The most common ones are:
- Normal Equation: Using the normal equation, we can find all the values of the model parameters directly, in one step. The equation is defined as: $$ \theta = (X^T \cdot X)^{-1} \cdot X^T \cdot y $$ $ \theta $ is the model parameter vector, $X$ is the matrix of feature values, and $y$ is the vector of target values (a small NumPy sketch of this computation appears after this list).
- Iterative method: this approach reduces a cost function iteratively. The cost function is a measure of how far the predictions are from the actual target values. We use Mean Squared Error (MSE) as the cost function for Linear Regression. It is defined as: $$ MSE = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2 $$ $ \hat{y}_i $ is the $i$th predicted value, computed as $ \hat{y}_i = \theta^T \cdot x_i $, $ y_i $ is the $i$th target value, and $m$ is the total number of data points.
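As a rough illustration of the Normal Equation approach, the sketch below solves for $\theta$ in closed form with NumPy; the randomly generated data and the added bias column are assumptions made for the example, not part of the equation itself.

```python
import numpy as np

# Minimal sketch of the Normal Equation: theta = (X^T . X)^-1 . X^T . y
# The data below is randomly generated just to illustrate the computation.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # 100 instances, 2 features
X_b = np.c_[np.ones((100, 1)), X]              # add a bias (intercept) column
y = 4 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta)                                   # approximately [4, 3, -2]
```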
At each iteration, the $\theta$ values are adjusted so that the MSE cost function is reduced towards its minimum value. To minimize the cost function, we can use Gradient Descent or a similar optimization algorithm.
Figure 1 shows how, after each iteration, the Mean Squared Error reduces and the predicted values move closer to the target values.
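To show the iterative approach concretely, here is a minimal sketch of batch Gradient Descent minimizing the MSE cost with NumPy; the generated data, learning rate, and number of iterations are assumptions chosen for illustration. The printed MSE values shrink as the iterations proceed, mirroring the behaviour described for Figure 1.

```python
import numpy as np

# Minimal sketch of the iterative approach: batch Gradient Descent on the
# MSE cost function. The generated data, learning rate, and iteration
# count are illustrative assumptions, not prescribed values.
rng = np.random.default_rng(0)
m = 100
X = rng.normal(size=(m, 1))                    # one feature
X_b = np.c_[np.ones((m, 1)), X]                # add a bias (intercept) column
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.1, size=m)

theta = np.zeros(2)                            # start from arbitrary parameters
learning_rate = 0.1

for step in range(1000):
    y_hat = X_b @ theta                        # predictions for all instances
    error = y_hat - y
    mse = np.mean(error ** 2)                  # the cost being minimized
    gradient = (2 / m) * (X_b.T @ error)       # gradient of MSE w.r.t. theta
    theta -= learning_rate * gradient          # move theta against the gradient
    if step % 200 == 0:
        print(f"step {step}: MSE = {mse:.4f}") # MSE shrinks over the iterations

print(theta)                                   # approximately [4, 3]
```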