Linear regression is among the simplest supervised learning techniques you will come across in machine learning. Multiple linear regression follows much the same concept as simple linear regression, with one major difference: it uses multiple input features rather than a single one.
Today, we will dive into it and see how we can implement multiple linear regression from scratch. Here’s how the blog will be organized:
- Introduction to Multiple Linear Regression
- Mathematical Concepts Beneath Multiple Linear Regression
- Implementation from Scratch
- Scikit-Learn Implementation
- Wrap-Up
Note: The notebook and dataset for this tutorial can be found on my GitHub here.
Introduction to Multiple Linear Regression
As already mentioned, the underlying concept is quite similar to simple linear regression. The only difference is that it can handle multiple input features instead of just a single one. That is why multiple linear regression is often used in practice, since real-world datasets usually contain more than one feature.
Here are some of the requirements/assumptions of the algorithm:
- Linear Assumption – the relationship between the input and output variables is linear
- No Noise – the variables are assumed to be free of noise, so the model is highly sensitive to outliers
- No Collinearity – overfitting can occur if the input variables are highly correlated with each other
- Normal Distribution – predictions tend to be more accurate if the variables are normally distributed
- Rescaled Inputs – rescaling the inputs (for example, by standardizing them) usually makes the predictions more accurate
When you train a multiple linear regression model, you are basically trying to find the best possible coefficients for a linear equation (a straight line with a single feature, a plane or hyperplane with several) that fits the given dataset. In this tutorial, these coefficients are calculated iteratively using gradient descent.
Gradient descent calculates the derivative of the cost with respect to each coefficient and iteratively updates the coefficients according to the learning parameters you set. The learning rate determines how big a step you take towards the optimal value: a lower learning rate makes the process slower, while a higher one may overshoot the best values.
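To make the effect of the learning rate concrete, here is a minimal sketch on a toy one-parameter cost, f(w) = (w - 3)^2. The function name `gradient_descent` and the specific learning rates are chosen purely for illustration and are not part of the tutorial's notebook.

```python
def gradient_descent(learning_rate, n_iterations=50, start=0.0):
    """Minimize the toy cost f(w) = (w - 3)^2, whose minimum is at w = 3."""
    w = start
    for _ in range(n_iterations):
        gradient = 2 * (w - 3)            # derivative of (w - 3)^2 with respect to w
        w = w - learning_rate * gradient  # step against the gradient
    return w

slow = gradient_descent(learning_rate=0.001)   # tiny steps: still far from 3 after 50 iterations
good = gradient_descent(learning_rate=0.1)     # moderate steps: ends up very close to 3
too_big = gradient_descent(learning_rate=1.1)  # overshoots on every step and moves away from 3

print(slow, good, too_big)
```

The same trade-off applies to the multi-feature case: the update rule is identical, only applied to every weight and the bias at once.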
Mathematical Concepts Beneath Multiple Linear Regression
The mathematical equation we’re dealing with here is quite simple, merely the equation of a line. However, the process might not be as simple.
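For reference, the line equation can be written as follows. The symbols (x for an input feature, n for the number of features, and a hat over y for the prediction) are notation assumed here, not taken from the original figures.

```latex
\hat{y} = w x + b
\qquad \text{(single feature)}

\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = \mathbf{w}^\top \mathbf{x} + b
\qquad \text{($n$ features)}
```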
Now is when it gets a little complicated. There is no single coefficient for the slope; instead, there is an entire vector of coefficients, one per feature, which we denote as w (for weights). A single intercept value also exists, which we denote as b (for bias).
Next comes the cost function. It is a major player in the game: we use it to measure the error, which we aim to minimize as much as possible. While you can use other cost functions as well, mean squared error (MSE) is the most common, and it is the one we’ll use here.
The formula for the cost function is:
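Assuming the standard definition over N samples, with y_i the actual value and ŷ_i the predicted value for sample i (symbols chosen here for illustration), it can be written as:

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```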
Simply put, it shows us the average squared difference between the actual y values and the predicted y values. Further expanding the predicted y, we get:
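Substituting the line equation for the prediction, and writing x_i for the feature vector of sample i (again, assumed notation), the expanded form would be:

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \right)^2
```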
Now it is time to get to the gradient descent I mentioned before. It uses the partial derivative of the cost with respect to each parameter to find the optimal weights and bias. Here are the derivatives of the mean squared error with respect to each parameter:
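As a sketch of the derivation, applying the chain rule to the mean squared error over N samples (with x_i the feature vector and y_i the actual value of sample i, notation assumed here) gives:

```latex
\frac{\partial\, \mathrm{MSE}}{\partial \mathbf{w}}
  = -\frac{2}{N} \sum_{i=1}^{N} \mathbf{x}_i \left( y_i - \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \right)

\frac{\partial\, \mathrm{MSE}}{\partial b}
  = -\frac{2}{N} \sum_{i=1}^{N} \left( y_i - \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \right)
```

The minus sign comes from differentiating the inner term, and the factor of 2 from the square.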
Now we need to update the parameters based on the derivatives we calculated. You just have to subtract the product of the learning rate and the derivative from the old value of each parameter. These two update formulas, one for the weights and one for the bias, are all it takes.
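Written out, with α as the learning rate and the arrow meaning "is replaced by" (notation assumed here), the update step would look like this:

```latex
\mathbf{w} \leftarrow \mathbf{w} - \alpha \, \frac{\partial\, \mathrm{MSE}}{\partial \mathbf{w}}

b \leftarrow b - \alpha \, \frac{\partial\, \mathrm{MSE}}{\partial b}
```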
In the equation above, the alpha parameter represents the learning rate. That’s all; now it’s time to implement all this math in Python and see how it turns out.
Implementation from Scratch
Before we go any further, let’s import the necessary libraries to handle the data and visualizations that we’ll need later.
That’s all set. Now we need to import the dataset. I’ve chosen the Google Stock Pricing dataset for this tutorial to make it a practical use case.
As you might be able to tell, the data is not quite ready for training, so I will make a few changes, including some minor preprocessing. If you want to know more about preprocessing and its importance, make sure you check out my blog on that here.
I’ll set the date as the index and make a new column that shows the percentage change in the stock price, computed from the opening and closing prices. We also need to define the feature and target arrays and take care of the missing values in the dataset. Lastly, I will preprocess the values using scikit-learn’s built-in preprocessing module.
Let’s name the feature array X and the target array y.
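The preparation steps described above might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the tiny in-memory DataFrame stands in for the real dataset, the column names (`Date`, `Open`, `Close`, `Volume`, `Pct_Change`), the choice of `Close` as the target, dropping rows to handle missing values, and `StandardScaler` as the scikit-learn preprocessing step are all hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the stock data; the column names are assumptions.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
    "Open": [100.0, 102.0, None, 105.0],   # one missing value on purpose
    "Close": [101.0, 101.5, 104.0, 106.0],
    "Volume": [1500, 1800, 1700, 1600],
})

# Use the date as the index.
df = df.set_index("Date")

# Percentage change between the opening and closing price of each day.
df["Pct_Change"] = (df["Close"] - df["Open"]) / df["Open"] * 100

# One simple way to handle missing values: drop the affected rows.
df = df.dropna()

# Feature array X and target array y (the target choice is an assumption).
X = df[["Open", "Volume", "Pct_Change"]].to_numpy()
y = df["Close"].to_numpy()

# Standardize each feature to zero mean and unit variance.
X = StandardScaler().fit_transform(X)
```

Standardizing the features matters here because gradient descent takes the same step size in every direction, so features on very different scales can slow down or destabilize training.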
The last step here is to split the data into training and testing arrays, so we can later test the performance of our model. Let’s get done with that as well.
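A minimal sketch of the split using scikit-learn's `train_test_split`. The 80/20 ratio, the fixed `random_state`, and the random placeholder arrays are assumptions for illustration, not values taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the prepared features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```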
That’s it! We’re done preparing the data and ready to move on to coding our algorithm. If you understood the math I explained above, coding the algorithm should be pretty straightforward. Let’s make a class called LinearRegression using all the formulas we discussed.
We will use the following methods to implement the class:
__init__() – the constructor for our class, which stores the learning rate, number of iterations, weights, and bias.
_mean_squared_error(y, y_hat) – the method that computes the cost function.
fit(X, y) – the method that runs gradient descent to optimize the weights and bias.
predict(X) – the method that applies the line equation to predict the output for a given input.
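One way these four methods could fit together is sketched below, using NumPy for the array math. The default hyperparameters, the zero initialization, the `loss` list used to record the cost per iteration, and the synthetic demo data are assumptions made for this sketch; the notebook's actual code may differ.

```python
import numpy as np

class LinearRegression:
    """Multiple linear regression trained with batch gradient descent."""

    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.loss = []  # cost recorded at every iteration

    @staticmethod
    def _mean_squared_error(y, y_hat):
        # Average squared difference between actual and predicted values.
        return np.mean((y - y_hat) ** 2)

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.loss = []

        for _ in range(self.n_iterations):
            y_hat = X @ self.weights + self.bias
            self.loss.append(self._mean_squared_error(y, y_hat))

            # Partial derivatives of the MSE with respect to weights and bias.
            error = y - y_hat
            d_weights = -(2 / n_samples) * (X.T @ error)
            d_bias = -(2 / n_samples) * np.sum(error)

            # Step against the gradient, scaled by the learning rate.
            self.weights -= self.learning_rate * d_weights
            self.bias -= self.learning_rate * d_bias
        return self

    def predict(self, X):
        # Apply the line equation to new inputs.
        return X @ self.weights + self.bias


# Demo on synthetic, noise-free data with known coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y_demo = X_demo @ true_w + 4.0

model = LinearRegression(learning_rate=0.1, n_iterations=500).fit(X_demo, y_demo)
print(model.weights, model.bias)
```

Recording the loss inside `fit` is a design choice that makes it easy to plot the training curve afterwards.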
Now, let’s create an instance of the class and see how our algorithm performs on the data we prepared earlier.
Let’s see what the predictions look like.
The list goes on… These are actually the optimized weights calculated by our algorithm.
This is what the bias looks like:
Let’s plot the loss and see how well we have trained the parameters and optimized them.
This is quite an ideal scenario: the loss starts at a high value but drops to a very low one as the number of iterations increases. However, we still don’t know whether we’re achieving the lowest loss possible, since different learning rates lead to different losses.
So, to find the most efficient learning rate, we’ll plot the losses of different models that have varying learning rates. Let’s see.
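The comparison can also be done numerically. The sketch below trains the same gradient descent loop with several learning rates and records the final training loss for each. The helper `final_loss`, the candidate learning rates, the iteration count, and the synthetic data are all assumptions for illustration, so the best value found here says nothing about the stock dataset.

```python
import numpy as np

def final_loss(X, y, learning_rate, n_iterations=200):
    """Run batch gradient descent and return the training MSE at the end."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iterations):
        error = y - (X @ w + b)
        w -= learning_rate * (-(2 / n_samples) * (X.T @ error))
        b -= learning_rate * (-(2 / n_samples) * np.sum(error))
    return np.mean((y - (X @ w + b)) ** 2)

# Synthetic, already-standardized features with a little noise on the target.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 3.0]) + 1.0 + rng.normal(scale=0.1, size=300)

# Final loss for each candidate learning rate.
losses = {lr: final_loss(X_demo, y_demo, lr) for lr in (0.001, 0.01, 0.1, 0.5)}
for lr, loss in losses.items():
    print(f"learning rate {lr}: final MSE {loss:.6f}")
```

With a fixed iteration budget, the smallest learning rates have not finished converging, which is why their final loss is higher.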
The graph makes it evident that a learning rate of 0.5 works best in this case. So, let’s retrain the model using a learning rate of 0.5.
The MSE value for the testing dataset is:
This marks the end of our model building, training, and evaluation, all from scratch without relying on a ready-made regression model from a library. Now, it’s time to compare our implementation with scikit-learn’s and see what the difference is.
Scikit-Learn Implementation
To find out how good our model really is, we’ll train a similar model on the same dataset, but this time using scikit-learn. The results will tell us how well we did. Let’s get to it.
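A minimal sketch of the scikit-learn version, using `LinearRegression` from `sklearn.linear_model` and `mean_squared_error` from `sklearn.metrics`. The synthetic data, the split parameters, and the alias `SklearnLinearRegression` (used only to avoid clashing with the class name from the from-scratch section) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression as SklearnLinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the prepared stock features and target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = X_demo @ np.array([1.0, 2.0, -1.5]) + 0.5 + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the model and evaluate it on the held-out data.
sk_model = SklearnLinearRegression()
sk_model.fit(X_train, y_train)
sk_mse = mean_squared_error(y_test, sk_model.predict(X_test))

print(sk_model.coef_, sk_model.intercept_, sk_mse)
```

Note that scikit-learn's `LinearRegression` solves the least-squares problem directly rather than by gradient descent, so there is no learning rate to tune; on well-scaled data the two approaches should land on very similar coefficients.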
Wrap-Up
We’ve seen how easy it is to build a multiple linear regression model from scratch in Python. If you understand the underlying math, I’d say it won’t take you more than a couple of hours at most. However, multiple linear regression is a very simple algorithm, and this doesn’t mean you should code every algorithm on your own.
Well, even if something is easy to build yourself, I wouldn’t suggest building it from scratch every time. Rather, you should use the available libraries and save time, since the performance is quite comparable. You witnessed how it took a few lines of code to build the model using scikit-learn, right?
However, you should know the basics of every algorithm you’re using as they often come in handy when you’re trying to optimize models, which is a vital quality of a good data scientist.
The next articles on this series of algorithms from scratch in Python can be found here.