In my previous article, I discussed how to minimize a cost function using Gradient Descent. Please follow the links below to refer to the previous articles in this series:
Part I
Part II
In this article, we will implement a simple linear regression example, first in Octave and then in Python. The reason for starting with Octave is that writing the ML algorithm ourselves, from scratch, helps us understand what the code is actually doing.
Introduction to Octave
Please follow this link to learn more about the Octave programming language.
Simple Linear Regression Example (One Variable)
I will be using the same example provided by Andrew Ng in his course, so I would like to thank him for that.
Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have data on profits and populations from those cities. Here is our data file.
Here is a snapshot of our data
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
So we have a comma-separated dataset with Population (in 10,000s) in the first column and Profit (in $10,000s) in the second column. Our first step is to load this data into our program and visualize its contents. Here are the steps we will perform to create our first ML algorithm using linear regression with one variable.
Algorithm Steps
Load data in variables
Visualize the data
Write a Cost function
Run Gradient descent for some iterations to come up with values of theta0 and theta1
Plot your hypothesis function to see how well it fits the data
Run predictions and see results
Octave Implementation
1. Load data in variables
Octave can load file data directly into a matrix with just one simple command. Once the data is loaded, we will print it in the console window by just writing the name of the variable.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
data
end
The data will be loaded into an m x 2 matrix, one row per training example. Now we need to separate the data into x and y variables, with x being the input, y the output, and m the total number of examples in the dataset.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
end
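If you paste the lines above into the Octave console, a quick size check confirms everything loaded correctly. The expected output below assumes the full course dataset, which has 97 examples (the snapshot above shows only its first few rows):
size(data) % rows x columns of the loaded matrix; expect m x 2
m          % total training examples; 97 for the full course dataset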
2. Visualize the data
Let us visualize this data by plotting a graph. With Octave, this can be done with just one line of code.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
end
After running this code, the following graph will be displayed
Now we need to find a hypothesis function: a straight line that best fits our data.
3. Write a Cost function
Our next task is to write a cost function, which we will use while performing gradient descent, storing its results for later use. Before we can calculate the cost, we first need to find h(θ).
h(θ) = theta0 + theta1 * x1;
We need to calculate the hypothesis for every value of x and sum up the squared errors to arrive at a cost value for a given theta0 and theta1:
J(theta0, theta1) = (1/(2*m)) * sum((h(θ) - y)^2);
This can be done using a normal (loop-based) method or a vectorized method, where we operate directly on the matrix. To understand it better, let us first implement the cost function using the normal method. Also, remember that in Octave, matrix indices start from 1, not 0.
function J = calculateCost(input, output, theta)
m = length(output);
J = 0;
for i = 1:m
% Hypothesis for the i-th example
hypothesis = theta(1) + theta(2)*input(i);
% Squared error for the i-th example
cost = (hypothesis - output(i))^2;
J = J + cost;
end
J = J/(2*m);
end
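As a quick sanity check (assuming x and y are loaded as above), calling this function with theta set to zeros should reproduce the cost value used later in this article, about 32.073 on the course dataset:
theta = zeros(1, 2);            % theta = [0 0]
J = calculateCost(x, y, theta); % approximately 32.073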
Our main function will look like this:
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost Normally
J1 = calculateCost(x,y,theta); % Cost Value 32.073
end
In order to calculate the cost using the vectorized method, we need to change our function a little bit
h(θ) = theta0*x0 + theta1 * x1;
Considering x0 = 1, the above function will give the same value as the original one. x0 is a new feature we add to our dataset, with a constant value of 1. Let us modify our dataset:
X = [ones(m,1),x];
This will add one more column to our input dataset x, now our input dataset will look like
X = [ 1   6.1101
      1   5.5277
      1   8.5186
      1   7.0032
      1   5.8598
      1   8.3829
      1   7.4764 ]
Now calculating the hypothesis is simple: just multiply X by the transpose of theta.
h(θ) = X * θ';
You can try working through this equation by hand to understand how it calculates the values of the entire hypothesis matrix in one go. Our cost function will look like:
function J = calculateCostVectorize(input, output, theta)
m = length(output);
% Hypothesis for all examples in one matrix operation
hypothesis_vector = input*theta';
% Matrix operation to calculate total cost
J = sum((hypothesis_vector - output).^2)/(2*m);
end
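To see how input*theta' produces every hypothesis value in one go, here is a small sketch you can run directly in the Octave console. The theta values here are made up purely for illustration:
X_demo = [1 6.1101; 1 5.5277]; % two examples, each with the extra x0 = 1 feature
theta_demo = [2 0.5];          % arbitrary theta, for illustration only
h = X_demo * theta_demo'       % h = [2 + 0.5*6.1101; 2 + 0.5*5.5277]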
Let us modify the main function to call this cost function
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost Normally
J1 = calculateCost(x,y,theta); % Cost Value 32.073
% Calculate Cost Using Vectorized Method
X = [ones(m,1),x]; % New Input with extra feature
J2 = calculateCostVectorize(X,y,theta);
end
I am showing both functions so that you can understand the underlying steps properly.
4. Run Gradient Descent (GD)
To come up with the values of theta0 and theta1 that give the minimum cost, we need to run the gradient descent algorithm for some number of iterations. On each iteration, gradient descent updates both theta values simultaneously:
theta0 := theta0 - alpha * (1/m) * sum(h(θ) - y);
theta1 := theta1 - alpha * (1/m) * sum((h(θ) - y) .* x1);
We need to start with some initial values of theta and alpha and then run gradient descent. Let us consider our initial values as:
theta = zeros(1, 2); % theta = [0 0]
iterations = 1500;
alpha = 0.01;
Our GD function will look like
function [theta, costs] = runGradientDescent(input, output, theta, alpha, iterations)
m = length(output);
costs = zeros(iterations, 1);
% Run Gradient Descent for the given number of iterations
for iter = 1:iterations
% Error vector: h(θ) - y for all examples, using the current theta
errors = input*theta' - output;
% Update Theta0
theta(1) = theta(1) - alpha * (1/m) * sum(errors);
% Update Theta1
theta(2) = theta(2) - alpha * (1/m) * sum(errors .* input(:,2));
% Store All Cost in Array For Later Use
costs(iter) = calculateCostVectorize(input,output,theta);
end
end
Running the above GD function for 1500 iterations, theta comes out to be:
theta =
-3.6303 1.1664
5. Plot graph of Hypothesis function
Finally, we have our theta values. Let us calculate the hypothesis with these values and plot it on our existing graph to see whether it fits our data well.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost
X = [ones(m,1),x]; % New Input with extra feature
J = calculateCostVectorize(X,y,theta);
% Initialize GD Constants
iterations = 1500;
alpha = 0.01;
theta = runGradientDescent(X, y, theta, alpha, iterations);
% Plot Hypothesis Graph with New theta values
hold on % This will not clear the original plot
hypothesis= X * theta';
plot(x, hypothesis, 'r-');
end
The following graph will be displayed
It looks like our function was able to converge within 1500 iterations. In practice, you may need to run GD multiple times to see how it converges.
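Rather than eyeballing the fitted line, you can also check convergence using the costs array that runGradientDescent already returns. Here is a minimal sketch; the figure command just opens a new window so the scatter plot is not overwritten:
[theta, costs] = runGradientDescent(X, y, theta, alpha, iterations);
% Plot the cost of every iteration; the curve should flatten as GD converges
figure;
plot(1:iterations, costs, 'b-');
xlabel('Iteration');
ylabel('Cost J');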
6. Run predictions & see results
Let us add one simple function to predict profit values, with the population as input. The prediction is nothing but our hypothesis function evaluated with the calculated values of theta.
function [prediction] = predictProfit(input, theta)
% Prepend x0 = 1 and apply the hypothesis
prediction = [1,input] * theta';
% Convert from $10,000s to dollars
prediction = prediction * 10000;
% Draw this prediction as a point on the graph (back in $10,000s units)
plot(input, prediction/10000, 'rs-', 'MarkerSize', 5);
end
Our main function will look like this:
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost
X = [ones(m,1),x]; % New Input with extra feature
J = calculateCostVectorize(X,y,theta);
% Initialize GD Constants
iterations = 1500;
alpha = 0.01;
theta = runGradientDescent(X, y, theta, alpha, iterations);
% Plot Hypothesis Graph with New theta values
hold on % This will not clear the original plot
hypothesis= X * theta';
plot(x, hypothesis, 'r-');
% Predict Values Now Using Model Theta Values
population_one = 3.5;
profit_one = predictProfit(population_one,theta);
population_two = 12;
profit_two = predictProfit(population_two,theta);
end
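With the theta values we obtained above (theta = [-3.6303 1.1664]), you can sanity-check these predictions by hand:
% profit_one: population of 35,000 (input 3.5)
% (-3.6303 + 1.1664 * 3.5) * 10000 is approximately $4,521
% profit_two: population of 120,000 (input 12)
% (-3.6303 + 1.1664 * 12) * 10000 is approximately $103,665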
Here is the entire source code of the above program
In my next article, I will implement the same solution using Python. Subsequent articles will cover how to deal with multiple input variables.
Happy Reading...