# Linear Regression with Gradient Descent: Maths, Implementation and Example Using Scikit-Learn

We all know the famous Linear Regression algorithm. It is one of the oldest and most widely used algorithms in statistics and many other fields. If you are not familiar with Linear Regression, check out this article first, as it will help you understand the concepts of Linear Regression with Gradient Descent much better.

A gradient tells us how much, and in which direction, the error changes when we change a weight. In our case, as the gradient shrinks, the updates get smaller and our path becomes smoother. Gradient descent might seem like a terrifying concept, but all it does is update the intercept and the weights of the features in every single iteration.

Remember the equation for Linear Regression:

    # Linear Regression
    Y = b0 + b1*x

where b0 is the intercept (bias) and b1 is the weight (slope) of the independent feature x.

Now there are two additional equations you will have to know for Gradient Descent: (i) **weight** optimization and (ii) **delta** (the error of the model).

    # weight optimization for b0
    w(t+1) = w(t) - alpha*delta

    # weight optimization for b1
    w(t+1) = w(t) - alpha*delta*x

where

w(t+1) is the new updated weight

w(t) is the old weight

alpha is the learning rate (for example 0.1, 0.001 or 0.0001)

delta is the error

    # error
    error = p(i) - y(i)

where p(i) = b0 + b1*x

and y(i) is the dependent value of that iteration
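One way to see where these two update rules come from is to assume a squared-error loss for a single data point. The article does not name the loss, so treat this as an assumption. The derivation below uses the same symbols as above, with delta standing for the error p(i) - y(i).

```latex
\begin{align*}
E_i &= \tfrac{1}{2}\,(p_i - y_i)^2, \qquad p_i = b_0 + b_1 x_i \\
\frac{\partial E_i}{\partial b_0} &= (p_i - y_i) = \delta \\
\frac{\partial E_i}{\partial b_1} &= (p_i - y_i)\,x_i = \delta\,x_i \\
b_0 &\leftarrow b_0 - \alpha\,\delta, \qquad b_1 \leftarrow b_1 - \alpha\,\delta\,x_i
\end{align*}
```

Under that assumption, the extra factor of x in the b1 update is simply the derivative of the prediction with respect to b1.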

The weight optimization for b0 and b1 is almost the same, except that for b1 we also multiply by “x”, just like we do in linear regression. Recall that we update the values of the intercept and the weight in each iteration. In the first iteration we start with **b0=0, b1=0**, and the **learning rate** will be **0.01**.

X(independent)=[1,2,4,3,5] and Y(dependent)=[1,3,3,2,5]
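As a minimal sketch, a single update can be wrapped in a small helper. The name `sgd_step` and its signature are illustrative choices rather than anything defined in this article; `b0` is treated as the intercept and `b1` as the weight of `x`.

```python
def sgd_step(b0, b1, x, y, alpha=0.01):
    """Apply one gradient descent update for a single (x, y) pair."""
    prediction = b0 + b1 * x          # p(i) = b0 + b1*x
    error = prediction - y            # error = p(i) - y(i)
    new_b0 = b0 - alpha * error       # w(t+1) = w(t) - alpha*delta
    new_b1 = b1 - alpha * error * x   # w(t+1) = w(t) - alpha*delta*x
    return new_b0, new_b1


# Data from the article
X = [1, 2, 4, 3, 5]
Y = [1, 3, 3, 2, 5]

# Start from b0 = 0, b1 = 0 and take one step on the first row
b0, b1 = sgd_step(0.0, 0.0, X[0], Y[0], alpha=0.01)
```

Calling the helper once per row, and feeding each result into the next call, should give the same sequence of weights as the hand calculation that follows.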

    # Iteration 1
    b0(t)=0, b1(t)=0, x=1 and y(i)=1

    # calculate p(i)
    p(i) = b0 + b1*x --> 0 + 0*1 --> 0

    # calculate error
    error = p(i) - y(i) --> 0 - 1 = -1

    # calculate b0(t+1), the new weight, knowing that b0(t)=0
    b0(t+1) = b0(t) - alpha*error --> 0 - 0.01*(-1) --> 0.01

    # calculate b1(t+1), the new weight, knowing that b1(t)=0
    b1(t+1) = b1(t) - alpha*error*x --> 0 - 0.01*(-1)*1 --> 0.01

The newly updated weights are b0(t+1)=0.01 and b1(t+1)=0.01. We will use these weights for the next iteration, where x=2 and y=3.

    # Iteration 2

    # calculate p(i)
    p(i) = b0 + b1*x --> 0.01 + 0.01*2 --> 0.03

    # calculate error
    error = p(i) - y(i) --> 0.03 - 3 --> -2.97

    # calculate b0(t+2) knowing b0(t+1)
    b0(t+2) = b0(t+1) - alpha*error --> 0.01 - 0.01*(-2.97) --> 0.0397

    # calculate b1(t+2) knowing b1(t+1)
    b1(t+2) = b1(t+1) - alpha*error*x --> 0.01 - 0.01*(-2.97)*2 --> 0.0694

    # Iteration 3
    x=4, y=3, b0(t+2)=0.0397, b1(t+2)=0.0694
    p(i) = 0.3173
    error = -2.6827
    b0(t+3) = 0.066527
    b1(t+3) = 0.176708

We need to keep calculating the new intercept and weight until we get the least residual when we plug b0 and b1 into Y=b0+b1*x. I hope you understood how the weights are updated, as it is crucial to the working of gradient descent. Here is a Python program to perform the same calculations.

    # data from the worked example
    x = [1, 2, 4, 3, 5]
    y = [1, 3, 3, 2, 5]

    alpha = 0.01     # learning rate
    b_0, b_1 = 0, 0  # initial intercept and weight

    for z, (i, j) in enumerate(zip(x, y), start=1):
        Y = b_0 + b_1 * i              # prediction p(i)
        error = Y - j                  # error = p(i) - y(i)
        b_0 = b_0 - alpha * error      # update the intercept
        b_1 = b_1 - alpha * error * i  # update the weight of x
        print(f"Iteration {z} |||| B0: {round(b_0, 4)} ||| B1: {round(b_1, 4)}")

    ## OUTPUT
    Iteration 1 |||| B0: 0.01 ||| B1: 0.01
    Iteration 2 |||| B0: 0.0397 ||| B1: 0.0694
    Iteration 3 |||| B0: 0.0665 ||| B1: 0.1767
    Iteration 4 |||| B0: 0.0806 ||| B1: 0.2188
    Iteration 5 |||| B0: 0.1188 ||| B1: 0.4101
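The program above makes a single pass over the five rows. To keep going "until we get the least residual", the same update can be repeated for several passes. The sketch below is one way to do that. The function names `fit_sgd`, `least_squares` and `mse`, and the choice of 1000 passes, are assumptions made for illustration. The closed-form least-squares fit is included only as a reference to compare against.

```python
def fit_sgd(xs, ys, alpha=0.01, epochs=1000):
    """Repeat the per-row update over the data for a fixed number of passes."""
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (b0 + b1 * x) - y   # error = p(i) - y(i)
            b0 -= alpha * error         # update the intercept
            b1 -= alpha * error * x     # update the weight of x
    return b0, b1


def least_squares(xs, ys):
    """Closed-form simple linear regression, used only as a reference."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(xs, ys, b0, b1):
    """Mean squared residual of the line Y = b0 + b1*x on the data."""
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


X = [1, 2, 4, 3, 5]
Y = [1, 3, 3, 2, 5]

one_pass = fit_sgd(X, Y, epochs=1)
many_passes = fit_sgd(X, Y, epochs=1000)
reference = least_squares(X, Y)

print("1 pass      :", one_pass, "MSE:", mse(X, Y, *one_pass))
print("1000 passes :", many_passes, "MSE:", mse(X, Y, *many_passes))
print("closed form :", reference, "MSE:", mse(X, Y, *reference))
```

With a constant learning rate the weights do not land exactly on the least-squares line, but they should settle close to it, and the residual after many passes should be much smaller than after one.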

Out of curiosity, let's check whether updating the weights indeed gives us a more accurate prediction. We will plug in the second-to-last row of the data, where x=3 and y=2.

    # iteration 0
    b0=0 and b1=0
    Y = 0 + 0*3 --> 0

    # iteration 1
    b0=0.01 and b1=0.01
    Y = 0.01 + 0.01*3 --> 0.04

    # iteration 2
    b0=0.0397
    b1=0.0694
    Y = 0.0397 + 0.0694*3 --> 0.2479

    # iteration 3
    b0=0.066527
    b1=0.176708
    Y = 0.066527 + 0.176708*3 --> 0.596651

    # iteration 4
    b0=0.0806
    b1=0.2188
    Y = 0.0806 + 0.2188*3 --> 0.737

    # iteration 5
    b0=0.1188
    b1=0.4101
    Y = 0.1188 + 0.4101*3 --> 1.3491
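The check above can also be done in code. This is a small sketch that records the prediction for x=3 before any update and after each of the five updates. The variable names `x_check`, `y_check` and `predictions` are illustrative.

```python
X = [1, 2, 4, 3, 5]
Y = [1, 3, 3, 2, 5]
alpha = 0.01
x_check, y_check = 3, 2   # the row used for the check

b0, b1 = 0.0, 0.0
predictions = [b0 + b1 * x_check]   # iteration 0, before any update
for x, y in zip(X, Y):
    error = (b0 + b1 * x) - y       # error = p(i) - y(i)
    b0 -= alpha * error             # update the intercept
    b1 -= alpha * error * x         # update the weight of x
    predictions.append(b0 + b1 * x_check)

for step, p in enumerate(predictions):
    print(f"iteration {step}: prediction={p:.4f}  abs error={abs(p - y_check):.4f}")
```

Because this version keeps full precision instead of rounding b0 and b1 to four decimals, the last digit of the later predictions can differ slightly from the hand calculation.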

As you can see, with each iteration the error in the prediction decreases and we get closer to the actual value (y=2). If we continue this process for, say, 20 iterations, we should get a more precise prediction. We can apply the gradient descent algorithm using the scikit-learn library, which provides the SGDClassifier and SGDRegressor estimators. Since this is a Linear Regression tutorial, I will show you how to use SGDRegressor to make predictions. The dataset we will be using is the “**Boston House Price Dataset**”.

    # importing libraries
    # note: load_boston was removed in scikit-learn 1.2, so this needs an older version
    from sklearn import linear_model
    from sklearn.datasets import load_boston
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score
    from sklearn.metrics import r2_score, mean_squared_error
    from sklearn.pipeline import Pipeline
    import numpy as np
    import pandas as pd

    # loading the data
    X = load_boston()
    data = pd.DataFrame(X.data, columns=X.feature_names)
    Y = X.target

    # creating a model: scale the features, reduce to 8 components, then fit SGD
    pipe = []
    pipe.append(('SC', StandardScaler()))
    pipe.append(('PCA', PCA(n_components=8)))
    pipe.append(('SGD', linear_model.SGDRegressor(alpha=0.1, learning_rate='adaptive', max_iter=300, penalty='elasticnet')))
    model = Pipeline(pipe)

    # cross validation score
    cv_results = cross_val_score(model, data, Y, cv=5)
    msg = "%s: %f (%f)" % ('SGDRegressor', cv_results.mean(), cv_results.std())
    print(msg)

    # output:
    # SGDRegressor: 0.492394 (0.260191)

    # parameters to tune for SGDRegressor
    params = {
        'alpha': [0.1, 0.01, 0.001, 0.0001, 0.00001],
        'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive'],
        'max_iter': [100, 300, 600, 1000, 1200, 1500, 2000],
        'penalty': ['l2', 'l1', 'elasticnet']
    }

According to the cross-validation, the **mean** score (R², the default scoring for a regressor) is **0.492394**, with a **standard deviation** of **0.260191** across the 5 folds. This is a satisfactory score, but remember we didn't perform any EDA on the data; the only preprocessing was **PCA** and feature scaling using **StandardScaler**. There you have it guys, I have provided you with the basic information to get you started with SGDRegressor.
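The parameters listed for tuning can be searched with scikit-learn's `GridSearchCV`. The sketch below is one possible way to do it, not the method used for the score above. It uses a synthetic dataset from `make_regression` as a stand-in (an assumption, so it runs without the Boston data), and a reduced grid to keep it quick. Inside a pipeline, parameter names need the step name as a prefix, here `SGD__`.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data with 13 features, not the Boston dataset
X_demo, y_demo = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)

# Same three steps as the pipeline in the article
model = Pipeline([
    ('SC', StandardScaler()),
    ('PCA', PCA(n_components=8)),
    ('SGD', SGDRegressor(random_state=0)),
])

# Reduced version of the grid; the 'SGD__' prefix routes each
# parameter to the pipeline step named 'SGD'
param_grid = {
    'SGD__alpha': [0.1, 0.01, 0.001],
    'SGD__learning_rate': ['invscaling', 'adaptive'],
    'SGD__max_iter': [300, 1000],
    'SGD__penalty': ['l2', 'elasticnet'],
}

# 5-fold cross-validation for every combination in the grid
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_demo, y_demo)

print(search.best_params_)   # best combination found on the stand-in data
print(search.best_score_)    # its mean cross-validated R² score
```

The best parameters found this way depend on the data, so results on the synthetic stand-in say nothing about which settings suit the house price data.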

Please do give me feedback on how you liked this article. Also if you found any mistakes or think I can make some improvements to the article your opinion is welcome. Thank you for reading.

Links:

MachineLearningMastery: https://machinelearningmastery.com/implement-linear-regression-stochastic-gradient-descent-scratch-python/

Scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html

Github: https://github.com/nitin689