![](https://static.wixstatic.com/media/9a3369_0ffb3d22a65745729e86ba05ffc79665~mv2.jpg/v1/fill/w_980,h_465,al_c,q_85,usm_0.66_1.00_0.01,enc_auto/9a3369_0ffb3d22a65745729e86ba05ffc79665~mv2.jpg)
Here are the links to my previous articles in this ML series:
Part I
Part II
Part III
Part IV
Part V
We know how to implement a Simple Linear Regression algorithm in Python & Octave using vectorized as well as normal multiplication methods.
The advantage of the vectorized approach is that very little of your code needs to change when you extend it to multiple variables.
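As a quick illustration (a minimal sketch with made-up numbers, separate from the housing example that follows), the vectorized hypothesis is a single matrix multiplication, so it works unchanged no matter how many feature columns X has:
% Vectorized hypothesis: one line, independent of the number of features
% X is m x (n+1) with a leading column of ones, theta is a 1 x (n+1) row vector
X = [1 2104 3; 1 1600 3; 1 2400 3];   % toy rows: bias, size, bedrooms
theta = [0 0.1 50];                   % toy parameter values
h = X * theta';                       % m x 1 vector of predictions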
Let us see how to implement a Multivariate Linear Regression algorithm in Octave. The example below uses two input features; however, the code will work for any number of input features, except for the 3D scatter graph, which is specific to two features.
Linear Regression Example (Multiple Input Variables)
Here is the link to our dataset containing housing price data
Here is a snapshot of our data
2104,3,399900
1600,3,329900
2400,3,369000
1416,2,232000
3000,4,539900
1985,4,299900
1534,3,314900
1427,3,198999
We have a comma-separated dataset with sizes in Sq. Ft. in the first column and the number of bedrooms in the second. The third column contains the price of the house in $.
Here are the steps our algorithm will be using
Algorithm Steps
Load data
Visualize data
Feature Normalization - New
Write Cost Function
Run Gradient Descent
Plot Graphs
Run Predictions
1. Load Data
function [] = multi()
% Load Data which has size, no of bedrooms, prices
data = load('../data/housingprice.txt');
% Transfer Data in x & y variables
totalColumns = size(data,2);
X = data(:,1: totalColumns-1);
y = data(:, totalColumns);
m = size(X,1) % Total Rows
end
The code to load the data remains the same, with a slight modification to make it more generic. It first calculates the total number of columns, then loads all columns except the last one into the matrix X and the last column into the vector y.
2. Visualize the Data
Let us try to visualize the data. Since we have two feature variables and one output variable, we can plot our data on a 3D scatter graph. If you have more input features, you may need to find other ways to visualize your data; the code below works only for two input features with one output variable.
function plotDataGraph(input,output)
scatter3(input(:,1),input(:,2),output, 20,output,'s');
xlabel('Size (Sq. Ft.)')
ylabel('Bedrooms')
zlabel('Price')
end
function [] = multi()
% Load Data which has size, no of bedrooms, prices
data = load('../data/housingprice.txt');
% Transfer Data in x & y variables
totalColumns = size(data,2);
X = data(:,1: totalColumns-1);
y = data(:, totalColumns);
m = size(X,1) % Total Rows
plotDataGraph(X,y);
end
After running the above code, the following graph will be displayed, with X1 (size) on one axis, X2 (bedrooms) on another, and the third axis representing the housing price.
![](https://static.wixstatic.com/media/9a3369_f7043204ec204bffa459400671695104~mv2.png/v1/fill/w_573,h_418,al_c,q_85,enc_auto/9a3369_f7043204ec204bffa459400671695104~mv2.png)
3. Feature Normalization
Our input features are on different scales; for example, the area ranges from 600 to 3000 while the number of bedrooms ranges from 1 to 10.
As discussed in my last article, we need to make sure that our data is on the same scale in order to run gradient descent efficiently. Therefore we need to implement feature normalization to bring area and bedrooms down to a similar scale.
We will be using the mean normalization method, where each feature has its mean subtracted and is then divided by its standard deviation. Here is the Octave code to do the same:
function [x_normalized, mu, sigma] = NormaliseFeature(X)
mu = mean(X);
sigma= std(X);
x_normalized = X;
x_normalized = (x_normalized-mu)./sigma;
end
We have used the built-in mean and std functions to perform feature normalization. The above function takes any number of features and returns a normalized version of all of them. It also returns mu and sigma so that we can reuse them later; we will come back to this in the prediction part of this article.
Below is a snapshot of what the normalized features look like:
1.3001e-01 -2.2368e-01
-5.0419e-01 -2.2368e-01
5.0248e-01 -2.2368e-01
-7.3572e-01 -1.5378e+00
1.2575e+00 1.0904e+00
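As a quick check (a small optional snippet, assuming X has already been replaced by the output of NormaliseFeature, as we will do in the main function shortly), the normalized columns should now have roughly zero mean and unit standard deviation:
% Sanity check on the normalized features
mean(X)   % expect values very close to [0 0]
std(X)    % expect values very close to [1 1]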
Now both of our input features are on a similar scale. We can go ahead with implementing the cost function.
4. Cost Function
The vectorized version of the cost function we implemented in our last example remains unchanged. Here is the code
function J = calculateCost(input, output, theta)
m = length(output);
J = 0;
hypothesis_vector= zeros(m,1);
hypothesis_vector= input*theta';
J = sum((hypothesis_vector-output).^2)/(2*m);
end
The cost is the sum of the squared differences between our hypothesis and the actual output values, divided by 2*m.
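As a quick sanity check (a hypothetical toy example, not the housing data), with theta set to all zeros the hypothesis is all zeros, so the cost reduces to the sum of the squared outputs divided by 2*m:
% Toy example: expect (4 + 16 + 36) / (2*3) = 9.3333
X_toy = [1 1; 1 2; 1 3];     % bias column plus one feature
y_toy = [2; 4; 6];
theta_toy = zeros(1, 2);
J_toy = calculateCost(X_toy, y_toy, theta_toy)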
Let us call this cost function from our main function
function [] = multi()
% Load Data which has size, no of bedrooms, prices
data = load('../data/housingprice.txt');
% Transfer Data in x & y variables
totalColumns = size(data,2);
X = data(:,1: totalColumns-1);
y = data(:, totalColumns);
m = size(X,1) % Total Rows
plotDataGraph(X,y);
% Normalise Features
[X,mu, sigma] = NormaliseFeature(X);
% Add Column 1 For Vectorized Multiplication
X = [ones(m, 1) X];
% Theta Row Vector = 1 x Number Of Features
totalFeatures = size(X,2);
theta = zeros(1,totalFeatures);
% Calculate Cost
J = calculateCost(X,y,theta)
end
You should expect an output of 6.5592e+10.
5. Run Gradient Descent (GD)
To arrive at the optimum values of theta, we need to run gradient descent with some initial theta values, a learning rate alpha, and a number of iterations. Below is the code for our gradient descent, which is again the same as the vectorized version from simple linear regression:
function [theta,J_history] = runGradientDescent(input, output, theta, alpha, iterations)
m = length(output);
J_history = zeros(iterations, 1);
% Run Iterations and Update Thetas Simultaneously
for iter = 1:iterations
hypothesis = input * theta' - output;
newX = hypothesis' * input;
theta = theta - ((alpha/m) * newX);
J_history(iter) = calculateCost(input, output, theta);
end
end
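For reference, the simultaneous update performed inside the loop above is the standard vectorized gradient descent rule. With theta written as a column vector it is
theta := theta - (alpha/m) * X' * (X*theta - y)
which is exactly what the hypothesis, newX, and theta lines compute, just transposed because the code keeps theta as a row vector. Every element of theta is updated at once, moving in the direction that reduces the cost J.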
Let us call this gradient descent function from our main function
function [] = multi()
% Load Data which has size, no of bedrooms, prices
data = load('../data/housingprice.txt');
% Transfer Data in x & y variables
totalColumns = size(data,2);
X = data(:,1: totalColumns-1);
y = data(:, totalColumns);
m = size(X,1); % Total Rows
plotDataGraph(X,y);
% Normalise Features
[X,mu, sigma] = NormaliseFeature(X);
% Add Column 1 For Vectorized Multiplication
X = [ones(m, 1) X];
% Theta Row Vector = 1 x Number Of Features
totalFeatures = size(X,2);
theta = zeros(1,totalFeatures);
% Calculate Cost
J = calculateCost(X,y,theta)
% Run Gradient Descent
alpha = 0.01;
iterations = 400;
[theta,J_history] = runGradientDescent(X,y,theta,alpha,iterations);
end
The gradient descent function will return the optimum theta values along with the cost history for the four hundred iterations.
We can use this cost history variable to plot a cost graph against the number of iterations to see if our cost is converging to a minimum.
6. Plot Cost Graph
Let us now plot a cost graph with the number of iterations on the x-axis and the cost on the y-axis:
function plotCostGraph(iterations, cost_history)
figure;
plot(1:iterations, cost_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('J(θ)');
end
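To actually produce the graph, call this function from the main function right after gradient descent returns (a small addition on my part, reusing the iterations and J_history variables from the previous step):
% Plot the cost history returned by gradient descent
plotCostGraph(iterations, J_history);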
The code above will produce a graph like the one below:
![](https://static.wixstatic.com/media/9a3369_c75cb5d9d24b40e1854da14aaf963809~mv2.png/v1/fill/w_640,h_480,al_c,q_85,enc_auto/9a3369_c75cb5d9d24b40e1854da14aaf963809~mv2.png)
As we can see, the cost goes down as the number of iterations increases and converges to a minimum value, changing very little towards the end. This means that the number of iterations is sufficient and we have obtained optimum theta values.
7. Run Predictions
Now that we have the optimal theta values, our model is ready to make predictions on new input examples. Let us write a simple function to perform predictions on a new set of data:
function [prediction] = predictProfit(input, theta)
inputCount = size(input,1);
featureCount = size(input,2);
prediction = [ones(inputCount,1),input] * theta';
end
The prediction code is almost the same as our hypothesis code, with a slight change to add an additional column of ones to the input data for Theta0. Now it's time to run predictions with our model.
If you remember, our feature normalization function returns mu and sigma, which correspond to the means and standard deviations of the training data set. Before running predictions on a new set of data, we need to normalize that data as well, using the same mu and sigma values so that it is brought down to the same scale as the training data.
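The snippet below uses a small helper called NormalizeInput, which is not defined elsewhere in this article; here is a minimal sketch of what it could look like, simply reusing the mu and sigma values returned by NormaliseFeature:
function [input_normalized] = NormalizeInput(input, mu, sigma)
% Scale new examples with the mean and standard deviation of the training data
input_normalized = (input - mu) ./ sigma;
end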
% Predict Now
prediction_inputs = [1268 3; 1000 2; 3000 5; 2000 8];  % Area & Bedrooms
prediction_inputs_normalized = NormalizeInput(prediction_inputs, mu, sigma);
prediction_housing_price = predictProfit(prediction_inputs_normalized, theta)
You should expect the following housing prices
2.4120e+05
2.0262e+05
4.6899e+05
3.5753e+05
Here is the entire source for the above implementation
In my next article, we will implement the same using Python.
Happy Reading...