
ML Simplified - Part 6 (Multivariate Octave Example)

Sachin Tah


Here are the links to my previous articles in the ML series:


Part I

Part II

Part III

Part IV

Part V



We already know how to implement a simple linear regression algorithm in Python & Octave using both vectorized and non-vectorized multiplication methods.


The advantage of using a vectorized approach is that you don't need to change much of your code when you extend it to multiple variables.
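For example, the vectorized hypothesis stays a one-liner whether X has one feature column or ten; only the sizes of X and theta change. A minimal Octave sketch (the variable names are just for illustration):


% Hypothesis for all training examples at once, for any number of features.
% X is m x (n+1) with a leading column of ones, theta is a 1 x (n+1) row vector.
hypothesis = X * theta';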


Let us see how to implement a multivariate linear regression algorithm in Octave. The example below uses two input features; however, the code will work for any number of input features, except for the 3D scatter graph, which is limited to two.


Multivariate Linear Regression Example (Multiple Input Variables)

Here is the link to our dataset containing housing price data



Here is a snapshot of our data



2104,3,399900
1600,3,329900
2400,3,369000
1416,2,232000
3000,4,539900
1985,4,299900
1534,3,314900
1427,3,198999

We have a comma-separated dataset with sizes in Sq. Ft. in the first column and the number of bedrooms in the second. The third column contains the price of the house in $.


Here are the steps our algorithm will be using


Algorithm Steps

  1. Load data

  2. Visualize data

  3. Feature Normalization - New

  4. Write Cost Function

  5. Run Gradient Descent

  6. Plot Graphs

  7. Run Predictions

1. Load Data



function [] = multi()
   % Load Data which has size, no of bedrooms, prices
   data = load('../data/housingprice.txt');
   % Transfer Data in x & y variables
   totalColumns = size(data,2);
   X = data(:,1: totalColumns-1);
   y = data(:, totalColumns);    
   m = size(X,1) % Total Rows 	
end
   

The code to load the data remains mostly the same, with a slight modification to make it more generic: it first calculates the total number of columns, loads every column except the last one into the matrix X, and loads the last column into the vector y.
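Equivalently, Octave's end keyword can be used for the same generic split; a small sketch:


% Same generic split using the end keyword
X = data(:, 1:end-1);   % all columns except the last
y = data(:, end);       % last column (price)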


2. Visualize the data


Let us try to visualize the data. Since we now have two feature variables and one output variable, we can plot our data on a 3D scatter graph. If you have more input features, you may need to find other ways to visualize your data; the code below works only for two input features and one output variable.



function plotDataGraph(input,output)
  scatter3(input(:,1),input(:,2),output, 20,output,'s');
  xlabel('Size (Sq. Ft.)')
  ylabel('Bedrooms')
  zlabel('Price')
end
 
function [] = multi()
   % Load Data which has size, no of bedrooms, prices
   data = load('../data/housingprice.txt');
   % Transfer Data in x & y variables
   totalColumns = size(data,2);
   X = data(:,1: totalColumns-1);
   y = data(:, totalColumns);    
   m = size(X,1) % Total Rows 
 
   plotDataGraph(X,y); 
 end
   

After running the above code, the following graph will be displayed, with X1 (size) on one axis, X2 (bedrooms) on another, and the housing price on the third.



3. Feature Normalization


The input features available to us are on different scales; for example, the area ranges from 600 to 3000, while the number of bedrooms ranges from 1 to 10.


As discussed in my last article, we need to make sure our features are on the same scale for gradient descent to run efficiently. Therefore, we implement feature normalization to bring area and bedrooms onto a similar scale.


We will be using the mean normalization method to normalize our data; here is the Octave code to do the same:



function [x_normalized, mu, sigma] = NormaliseFeature(X)
  mu = mean(X);
  sigma= std(X);
  x_normalized = X;
  x_normalized = (x_normalized-mu)./sigma;
end

We have used the built-in mean and std functions to perform feature normalization. The above function takes any number of features and returns a normalized version of all of them. We also return mu and sigma so we can reuse them later; we will discuss this in the later part of this article.

Below is a snapshot of what the normalized features look like



   1.3001e-01  -2.2368e-01
  -5.0419e-01  -2.2368e-01
   5.0248e-01  -2.2368e-01
  -7.3572e-01  -1.5378e+00
   1.2575e+00   1.0904e+00
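As a quick sanity check, each normalized column should now have a mean close to zero and a standard deviation close to one; a small sketch, assuming X has already been replaced by its normalized version:


% After [X, mu, sigma] = NormaliseFeature(X);
% each column's mean should be ~0 and its standard deviation ~1
mean(X)
std(X)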
  

Now both of our input features are on a similar scale. We can go ahead with implementing the cost function.


4. Cost Function

The vectorized version of the cost function we implemented in our last example remains unchanged. Here is the code



function J = calculateCost(input, output, theta)
    m = length(output);
    J = 0;
    hypothesis_vector= zeros(m,1);
    hypothesis_vector= input*theta';
    J = sum((hypothesis_vector-output).^2)/(2*m);  
end
  

The cost is the sum of the squared differences between our hypothesis and the actual output values, divided by 2*m.
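For reference, the same cost can also be written in a fully vectorized form; a small equivalent sketch, in the notation of the main function:


% Equivalent fully vectorized form of the same cost
errors = X*theta' - y;
J = (errors' * errors) / (2*m);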


Let us call this cost function from our main function



function [] = multi()
    % Load Data which has size, no of bedrooms, prices
    data = load('../data/housingprice.txt');
    % Transfer Data into X & y variables
    totalColumns = size(data,2);
    X = data(:, 1:totalColumns-1);
    y = data(:, totalColumns);
    m = size(X,1); % Total Rows

    plotDataGraph(X,y);
    % Normalise Features
    [X, mu, sigma] = NormaliseFeature(X);

    % Add Column Of 1s For Vectorized Multiplication
    X = [ones(m, 1) X];

    % Theta Vector = 1 x Number Of Features
    totalFeatures = size(X,2);
    theta = zeros(1, totalFeatures);

    % Calculate Cost (no semicolon, so the value is displayed)
    J = calculateCost(X, y, theta)
end
 

You should expect an output of 6.5592e+10.
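Since theta starts as all zeros, the hypothesis predicts zero for every example, so this initial cost is simply the sum of the squared prices divided by 2*m; a quick sanity check (the variable name is just for illustration):


% With theta = zeros, the hypothesis is 0 for every house
J_initial_check = sum(y.^2) / (2*m)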


5. Run Gradient Descent (GD)


To come up with the optimum values of theta, we need to run gradient descent with some initial values of theta, a learning rate alpha, and a number of iterations. Below is the code for our gradient descent, which is again the same as the vectorized version used for simple linear regression:



function [theta, J_history] = runGradientDescent(input, output, ...
                                                 theta, alpha, iterations)

  m = length(output);
  J_history = zeros(iterations, 1);

  % Run Iterations and Update Thetas Simultaneously
  for iter = 1:iterations
    hypothesis = input * theta' - output;
    newX = hypothesis' * input;
    theta = theta - ((alpha/m) * newX);
    J_history(iter) = calculateCost(input, output, theta);
  end
end
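For reference, each pass through the loop performs the simultaneous vectorized update, written here in the same notation as the code:

theta := theta - (alpha/m) * (X*theta' - y)' * X

where X is the input matrix (with the leading column of ones) and y is the price vector.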

Let us call this gradient descent function from our main function



    
function [] = multi()
    % Load Data which has size, no of bedrooms, prices
    data = load('../data/housingprice.txt');
    % Transfer Data into X & y variables
    totalColumns = size(data,2);
    X = data(:, 1:totalColumns-1);
    y = data(:, totalColumns);
    m = size(X,1); % Total Rows

    plotDataGraph(X,y);
    % Normalise Features
    [X, mu, sigma] = NormaliseFeature(X);

    % Add Column Of 1s For Vectorized Multiplication
    X = [ones(m, 1) X];

    % Theta Vector = 1 x Number Of Features
    totalFeatures = size(X,2);
    theta = zeros(1, totalFeatures);

    % Calculate Cost
    J = calculateCost(X, y, theta)

    % Run Gradient Descent
    alpha = 0.01;
    iterations = 400;

    [theta, J_history] = runGradientDescent(X, y, theta, alpha, iterations);
end
    

The gradient descent function will return the optimum theta values along with the cost history over the four hundred iterations.


We can use this cost history variable to plot a cost graph against the number of iterations to see if our cost is converging to a minimum.


6. Plot Cost Graph


Let us now plot a cost graph with the number of iterations on the x-axis and the cost on the y-axis:



function plotCostGraph(iterations, cost_history)
    figure;
    plot(1:iterations, cost_history, '-b', 'LineWidth', 2);
    xlabel('Number of iterations');
    ylabel('J(θ)');
end 
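This function can be called from the main function right after gradient descent returns; a minimal usage sketch:


% Plot cost vs. iterations using the history returned by gradient descent
plotCostGraph(iterations, J_history);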
    


The code above will produce a cost graph like the one below.



As we can see, the cost goes down with the number of iterations and converges towards a minimum, with only a negligible change in the later iterations. This means the number of iterations is sufficient and we have obtained optimum theta values.


7. Run Predictions


Now that we have the optimal theta values, our model is ready to make predictions on new input examples. Let us write a simple function to perform predictions on a new set of data:



function [prediction] = predictProfit(input, theta)
    inputCount = size(input,1);
    featureCount = size(input,2);
    prediction = [ones(inputCount,1),input] * theta';
end

The prediction code is almost the same as our hypothesis code, with one small change: it adds an extra column of ones to the input data to account for theta0. Now it is time to run predictions with our model.


If you remember, our feature normalization function returns mu and sigma, which are the mean and standard deviation of the training data respectively. Before running predictions on a new set of data, we need to normalize that data as well, using the same mu and sigma values so that it is brought onto the same scale as the training data.



% Predict Now
prediction_inputs = [1268 3; 1000 2; 3000 5; 2000 8];   % Area (Sq. Ft.) & Bedrooms
prediction_inputs_normalized = NormalizeInput(prediction_inputs, mu, sigma);
prediction_housing_price = predictProfit(prediction_inputs_normalized, theta)
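Note that the snippet above calls a NormalizeInput helper that is not listed in this article. A minimal sketch of what it might look like, assuming it simply applies the stored mu and sigma from the training data:


function [x_normalized] = NormalizeInput(X, mu, sigma)
  % Scale new inputs with the training set's mean and standard deviation
  x_normalized = (X - mu) ./ sigma;
end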

You should expect the following housing prices



2.4120e+05
2.0262e+05
4.6899e+05
3.5753e+05

Here is the entire source for the above implementation



In my next article, we will implement the same using Python.


Happy Reading...






