
ML Simplified - Part 6 (Multivariate Octave Example)

Sachin Tah


Here are the links to my previous articles in the ML series:


Part I

Part II

Part III

Part IV

Part V



We already know how to implement a simple linear regression algorithm in Python & Octave using both vectorized and non-vectorized multiplication methods.


The advantage of using a vectorized approach is that you don't need to change much of your code when you extend it to multiple variables.
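For example, the vectorized hypothesis stays a one-liner whether X has one feature column or ten; only the sizes of X and theta change. A minimal Octave sketch (the variable names are just for illustration):


% Hypothesis for all training examples at once, for any number of features.
% X is m x (n+1) with a leading column of ones, theta is a 1 x (n+1) row vector.
hypothesis = X * theta';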


Let us see how to implement a multivariate linear regression algorithm in Octave. The example below uses two input features; however, the code will work for any number of input features, except for the 3D scatter graph, which is limited to two.


Multivariate Linear Regression Example (Multiple Input Variables)

Here is the link to our dataset containing housing price data



Here is a snapshot of our data



2104,3,399900
1600,3,329900
2400,3,369000
1416,2,232000
3000,4,539900
1985,4,299900
1534,3,314900
1427,3,198999

We have a comma-separated dataset with sizes in Sq. Ft. in the first column and the number of bedrooms in the second. The third column contains the price of the house in $.


Here are the steps our algorithm will be using


Algorithm Steps

  1. Load data

  2. Visualize data

  3. Feature Normalization - New

  4. Write Cost Function

  5. Run Gradient Descent

  6. Plot Graphs

  7. Run Predictions

1. Load Data



function [] = multi()
   % Load Data which has size, no of bedrooms, prices
   data = load('../data/housingprice.txt');
   % Transfer Data in x & y variables
   totalColumns = size(data,2);
   X = data(:,1: totalColumns-1);
   y = data(:, totalColumns);    
   m = size(X,1) % Total Rows 	
end
   

The code to load the data remains mostly the same, with a slight modification to make it more generic: it first calculates the total number of columns, loads every column except the last one into the matrix X, and loads the last column into the vector y.
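Equivalently, Octave's end keyword can be used for the same generic split; a small sketch:


% Same generic split using the end keyword
X = data(:, 1:end-1);   % all columns except the last
y = data(:, end);       % last column (price)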


2. Visualize the data


Let us try to visualize the data. Since we now have two feature variables and one output variable, we can plot our data on a 3D scatter graph. If you have more input features, you may need to find other ways to visualize your data; the code below works only for two input features and one output variable.



function plotDataGraph(input,output)
  scatter3(input(:,1),input(:,2),output, 20,output,'s');
  xlabel('Size (Sq. Ft.)')
  ylabel('Bedrooms')
  zlabel('Price')
end
 
function [] = multi()
   % Load Data which has size, no of bedrooms, prices
   data = load('../data/housingprice.txt');
   % Transfer Data in x & y variables
   totalColumns = size(data,2);
   X = data(:,1: totalColumns-1);
   y = data(:, totalColumns);    
   m = size(X,1) % Total Rows 
 
   plotDataGraph(X,y); 
 end
   

After running the above code, the following graph will be displayed, with X1 (size) on one axis, X2 (bedrooms) on another, and the housing price on the third.



3. Feature Normalization


The input features available to us are on different scales; for example, the area ranges from 600 to 3000, while the number of bedrooms ranges from 1 to 10.


As discussed in my last article, we need to make sure our features are on the same scale for gradient descent to run efficiently. Therefore, we implement feature normalization to bring area and bedrooms onto a similar scale.


We will be using the mean normalization method to normalize our data; here is the Octave code to do the same:



function [x_normalized, mu, sigma] = NormaliseFeature(X)
  mu = mean(X);
  sigma= std(X);
  x_normalized = X;
  x_normalized = (x_normalized-mu)./sigma;
end

We have used the built-in mean and std functions to perform feature normalization. The above function takes any number of features and returns a normalized version of all of them. We also return mu and sigma so we can reuse them later; we will discuss this in the later part of this article.

Below is a snapshot of what the normalized features look like



   1.3001e-01  -2.2368e-01
  -5.0419e-01  -2.2368e-01
   5.0248e-01  -2.2368e-01
  -7.3572e-01  -1.5378e+00
   1.2575e+00   1.0904e+00
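As a quick sanity check, each normalized column should now have a mean close to zero and a standard deviation close to one; a small sketch, assuming X has already been replaced by its normalized version:


% After [X, mu, sigma] = NormaliseFeature(X);
% each column's mean should be ~0 and its standard deviation ~1
mean(X)
std(X)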
  

Now both of our input features are on a similar scale. We can go ahead with implementing the cost function.


4. Cost Function

The vectorized version of the cost function we implemented in our last example remains unchanged. Here is the code



function J = calculateCost(input, output, theta)
    m = length(output);
    J = 0;
    hypothesis_vector= zeros(m,1);
    hypothesis_vector= input*theta';
    J = sum((hypothesis_vector-output).^2)/(2*m);  
end
  

The cost is the sum of the squared differences between our hypothesis and the actual output values, divided by 2*m.
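For reference, the same cost can also be written in a fully vectorized form; a small equivalent sketch, in the notation of the main function:


% Equivalent fully vectorized form of the same cost
errors = X*theta' - y;
J = (errors' * errors) / (2*m);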


Let us call this cost function from our main function



function [] = multi()
    % Load Data which has size, no of bedrooms, prices
    data = load('../data/housingprice.txt');
    % Transfer Data into X & y variables
    totalColumns = size(data,2);
    X = data(:, 1:totalColumns-1);
    y = data(:, totalColumns);
    m = size(X,1); % Total Rows

    plotDataGraph(X,y);
    % Normalise Features
    [X, mu, sigma] = NormaliseFeature(X);

    % Add Column Of 1s For Vectorized Multiplication
    X = [ones(m, 1) X];

    % Theta Vector = 1 x Number Of Features
    totalFeatures = size(X,2);
    theta = zeros(1, totalFeatures);

    % Calculate Cost (no semicolon, so the value is displayed)
    J = calculateCost(X, y, theta)
end
 

You should expect an output of 6.5592e+10.
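Since theta starts as all zeros, the hypothesis predicts zero for every example, so this initial cost is simply the sum of the squared prices divided by 2*m; a quick sanity check (the variable name is just for illustration):


% With theta = zeros, the hypothesis is 0 for every house
J_initial_check = sum(y.^2) / (2*m)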


5. Run Gradient Descent (GD)


To come up with the optimum values of theta, we need to run gradient descent with some initial values of theta, a learning rate alpha, and a number of iterations. Below is the code for our gradient descent, which is again the same as the vectorized version used for simple linear regression:



function [theta, J_history] = runGradientDescent(input, output, ...
                                                 theta, alpha, iterations)

  m = length(output);
  J_history = zeros(iterations, 1);

  % Run Iterations and Update Thetas Simultaneously
  for iter = 1:iterations
    hypothesis = input * theta' - output;
    newX = hypothesis' * input;
    theta = theta - ((alpha/m) * newX);
    J_history(iter) = calculateCost(input, output, theta);
  end
end
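For reference, each pass through the loop performs the simultaneous vectorized update, written here in the same notation as the code:

theta := theta - (alpha/m) * (X*theta' - y)' * X

where X is the input matrix (with the leading column of ones) and y is the price vector.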

Let us call this gradient descent function from our main function



    
function [] = multi()
    % Load Data which has size, no of bedrooms, prices
    data = load('../data/housingprice.txt');
    % Transfer Data into X & y variables
    totalColumns = size(data,2);
    X = data(:, 1:totalColumns-1);
    y = data(:, totalColumns);
    m = size(X,1); % Total Rows

    plotDataGraph(X,y);
    % Normalise Features
    [X, mu, sigma] = NormaliseFeature(X);

    % Add Column Of 1s For Vectorized Multiplication
    X = [ones(m, 1) X];

    % Theta Vector = 1 x Number Of Features
    totalFeatures = size(X,2);
    theta = zeros(1, totalFeatures);

    % Calculate Cost
    J = calculateCost(X, y, theta)

    % Run Gradient Descent
    alpha = 0.01;
    iterations = 400;

    [theta, J_history] = runGradientDescent(X, y, theta, alpha, iterations);
end
    

The gradient descent function will return the optimum theta values along with the cost history over the four hundred iterations.


We can use this cost history variable to plot a cost graph against the number of iterations to see if our cost is converging to a minimum.


6. Plot Cost Graph


Let us now plot a cost graph with the number of iterations on the x-axis and the cost on the y-axis:



function plotCostGraph(iterations, cost_history)
    figure;
    plot(1:iterations, cost_history, '-b', 'LineWidth', 2);
    xlabel('Number of iterations');
    ylabel('J(θ)');
end 
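This function can be called from the main function right after gradient descent returns; a minimal usage sketch:


% Plot cost vs. iterations using the history returned by gradient descent
plotCostGraph(iterations, J_history);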
    


The code above will produce a cost graph like the one below.



As we can see, the cost goes down with the number of iterations and converges towards a minimum, with only a negligible change in the later iterations. This means the number of iterations is sufficient and we have obtained optimum theta values.


7. Run Predictions


Now that we have the optimal theta values, our model is ready to make predictions on new input examples. Let us write a simple function to perform predictions on a new set of data:



function [prediction] = predictProfit(input, theta)
    inputCount = size(input,1);
    featureCount = size(input,2);
    prediction = [ones(inputCount,1),input] * theta';
end

The prediction code is almost the same as our hypothesis code, with one small change: it adds an extra column of ones to the input data to account for theta0. Now it is time to run predictions with our model.


If you remember, our feature normalization function returns mu and sigma, which are the mean and standard deviation of the training data respectively. Before running predictions on a new set of data, we need to normalize that data as well, using the same mu and sigma values so that it is brought onto the same scale as the training data.



% Predict Now
prediction_inputs = [1268 3; 1000 2; 3000 5; 2000 8];   % Area (Sq. Ft.) & Bedrooms
prediction_inputs_normalized = NormalizeInput(prediction_inputs, mu, sigma);
prediction_housing_price = predictProfit(prediction_inputs_normalized, theta)
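Note that the snippet above calls a NormalizeInput helper that is not listed in this article. A minimal sketch of what it might look like, assuming it simply applies the stored mu and sigma from the training data:


function [x_normalized] = NormalizeInput(X, mu, sigma)
  % Scale new inputs with the training set's mean and standard deviation
  x_normalized = (X - mu) ./ sigma;
end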

You should expect the following housing prices



2.4120e+05
2.0262e+05
4.6899e+05
3.5753e+05

Here is the entire source for the above implementation



In my next article, we will implement the same using Python.


Happy Reading...






