Linear Regression in machine learning

In this Vikrama Tech article you will learn what linear regression is and how it helps in predicting values based on certain data.

Linear regression is usually the first and most basic algorithm used in machine learning. It is not a new algorithm in the industry; it has been used for decades in fields such as weather forecasting, sales prediction, production forecasting and many others. It is actually a mathematical and statistical tool for extracting useful information from a piece of data.

Two terms need to be addressed before going further into linear regression. These are:

Independent values: These are the value(s) that do not depend on other values or data. In most cases these are the input values.
Dependent values: These are the value(s) that depend on other values or data. In most cases these are the predicted values, or the output.

To understand them well, see an example:

Speed (km/hr), independent value	Time (hr:min), dependent value
10	2:00
30	1:45
50	1:00
80	0:30

Two points can be concluded from this table:

First, there is a pattern here: whenever the speed increases, the time decreases.
Second, after properly applying the linear regression algorithm to this data, we will be able to predict a value for a given input, e.g. the time taken when the speed is 150 km/hr.

Now coming to applied linear regression. What everybody wants to learn is how to apply linear regression to a piece of data.

The purpose of linear regression is to find the best line, the one that fits the points gathered from the dataset as closely as possible. This line helps you find the best estimate of y for a given value of x. In the following graph you can see the plotted dataset. You can get free datasets from many public websites.

The general hypothesis of linear regression is the same straight-line formula we studied in school. It is stated as: y = ax + b

In the above hypothesis, x is the input value from the dataset, a is the slope of the line with respect to x, and b is the intercept. The important thing is that the best-fit line is obtained only when a and b have their optimum values, that is, the values that make the prediction error as small as possible. Most of the time spent optimizing a model goes into finding these values. With the optimum a and b, the model predicts as accurately as a straight line can.
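As a small illustration, the hypothesis can be written as a one-line function. This is only a sketch: the function name predict and the values chosen for a and b are illustrative guesses, not optimum values.

```python
def predict(x, a, b):
    """Return the hypothesis value a*x + b for a single input x."""
    return a * x + b


# Speeds (km/hr) taken from the table earlier in the article.
speeds = [10, 30, 50, 80]

# Arbitrary guesses for slope and intercept, used only for illustration.
a, b = -0.02, 2.2

# One predicted time (in hours) per speed.
predictions = [predict(x, a, b) for x in speeds]
```

Changing a tilts the line and changing b shifts it up or down; the rest of the article is about choosing these two numbers well.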

Now we draw a line approximately through the plotted points. It may look like the best fit, but that might not be the case: a line that looks right can still predict wrong outputs. So for now we just draw a simple line through the points.

You can see in the graph above that the line is some distance away from the points, so this line is not the best fit. How do we make it the best fit? Here comes a measure named mean squared error (MSE). It is defined as: 1/n . Σ(predicted_i - y_i)² over all n points, where predicted_i = ax_i + b. It is also known as the cost function, because it gives you the cost (the average squared error) of the line over the entire dataset.
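The cost function can be sketched in a few lines of Python. The function name mean_squared_error and the two candidate lines are assumptions made for illustration; the data is the speed/time table from earlier, with 1:45 written as 1.75 hours and 0:30 as 0.5 hours.

```python
def mean_squared_error(xs, ys, a, b):
    """Average squared difference between a*x + b and the observed y."""
    n = len(xs)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


speeds = [10, 30, 50, 80]        # km/hr
times = [2.0, 1.75, 1.0, 0.5]    # hours

# A flat line (slope 0) versus a line that slopes downward like the data.
rough = mean_squared_error(speeds, times, 0.0, 1.0)
better = mean_squared_error(speeds, times, -0.02, 2.2)
```

The downward-sloping line gives the lower cost of the two, which matches the pattern in the table: the smaller the cost, the closer the line is to the points.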

Now it is time to optimize our regression model, that is, to find the a and b in the above hypothesis that minimize the cost function. Here comes another algorithm which is really essential in machine learning, known as gradient descent. Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. To find a local minimum of a function using gradient descent, we take steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.
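To see the idea in isolation before applying it to regression, here is a minimal one-variable sketch. The function f(x) = (x - 3)², the starting point, the learning rate and the step count are all illustrative assumptions.

```python
def gradient_descent_1d(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move opposite to the slope
    return x


# f(x) = (x - 3)^2 has derivative f'(x) = 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent_1d(lambda x: 2 * (x - 3), start=0.0)
```

Each step moves x a little in the downhill direction, so the result ends up very close to 3, the point where the derivative is zero.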

To know in which direction to move each parameter, we differentiate the cost function with respect to it. From here on the slope a is written as θ₀ and the intercept b as θ₁. The derivative with respect to θ₀ is:

-2/n . Σ x_i(y_i - (θ₀x_i + θ₁))

Similarly, the derivative with respect to θ₁ is:

-2/n . Σ(y_i - (θ₀x_i + θ₁))
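A minimal sketch of the two derivatives in Python, assuming the cost is the mean squared error with θ₀ as the slope and θ₁ as the intercept. The function name gradients is an illustrative choice.

```python
def gradients(xs, ys, theta0, theta1):
    """Return (dCost/dtheta0, dCost/dtheta1) for the mean squared error."""
    n = len(xs)
    # Residual for each point: observed y minus the line's prediction.
    residuals = [y - (theta0 * x + theta1) for x, y in zip(xs, ys)]
    d_theta0 = (-2.0 / n) * sum(x * r for x, r in zip(xs, residuals))
    d_theta1 = (-2.0 / n) * sum(residuals)
    return d_theta0, d_theta1
```

When the line passes exactly through every point, all residuals are zero and so are both derivatives, which is what being at the minimum of the cost means.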

Once you substitute every (x_i, y_i) pair of the training set into these expressions, you get the value of both derivatives for the current θ₀ and θ₁.

Lastly, you have a learning rate L, which controls how large a step you take towards the optimum point on each iteration. It can be as small as 0.0001 or as high as 1. The update is: new θ₀ = θ₀ - L . derivative(θ₀), and similarly new θ₁ = θ₁ - L . derivative(θ₁). These updates are repeated until the cost stops decreasing.
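Putting the pieces together, the whole procedure might look like the sketch below. The function name fit_line, the learning rate of 0.0003 and the 100000 iterations are assumptions chosen for this particular table, not recommended defaults; the data is the speed/time table from earlier with times converted to hours.

```python
def fit_line(xs, ys, learning_rate=0.0003, steps=100_000):
    """Fit y = theta0 * x + theta1 by gradient descent on the mean squared error."""
    n = len(xs)
    theta0, theta1 = 0.0, 0.0  # start from a flat line through the origin
    for _ in range(steps):
        residuals = [y - (theta0 * x + theta1) for x, y in zip(xs, ys)]
        d_theta0 = (-2.0 / n) * sum(x * r for x, r in zip(xs, residuals))
        d_theta1 = (-2.0 / n) * sum(residuals)
        # Update both parameters together, using the same residuals.
        theta0 -= learning_rate * d_theta0
        theta1 -= learning_rate * d_theta1
    return theta0, theta1


speeds = [10, 30, 50, 80]        # km/hr
times = [2.0, 1.75, 1.0, 0.5]    # hours (1:45 -> 1.75, 0:30 -> 0.5)

slope, intercept = fit_line(speeds, times)
time_at_60 = slope * 60 + intercept  # predicted time in hours at 60 km/hr
```

The learning rate has to be this small because the raw speeds are large numbers; with a much larger rate the updates overshoot and the values diverge. Predictions are most trustworthy inside the observed speed range; far outside it a straight line can return values that make no physical sense.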

All done. I will soon write an article on an applied linear regression program in Python and Octave. Stay tuned!!