In my previous article, I discussed how to minimize a cost function using Gradient Descent. Please follow the links below to refer to the previous articles in this series:
Part I
Part II
In this article, we will implement a simple linear regression example, first in Octave and then in Python. The reason for starting with Octave is that writing the ML algorithm ourselves, from scratch, helps us understand what the code is actually doing.
Introduction to Octave
Please follow this link to learn more about the Octave programming language.
Simple Linear Regression Example (One Variable)
I will be using the same example provided by Andrew Ng in his course, so I would like to thank him for that.
Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have data on profits and populations from those cities. Here is our data file.
Here is a snapshot of our data
6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
So we have a comma-separated dataset with Population (in 10,000s) in the first column and Profit (in $10,000s) in the second column. Our first step is to load this data into our program and visualize its contents. Here are the steps we will perform to create our first ML algorithm using linear regression with one variable.
Algorithm Steps
Load data in variables
Visualize the data
Write a Cost function
Run Gradient descent for some iterations to come up with values of theta0 and theta1
Plot your hypothesis function to see how well it fits the data
Run predictions and see results
Octave Implementation
1. Load data in variables
Octave can load file data directly into a matrix with just one simple command. Once the data is loaded, we will print it in the console window by just writing the name of the variable.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
data
end
The data will be loaded into an m x 2 matrix, one row per training example. Now we need to separate the data into x and y variables, with x being the input, y the output, and m the total number of examples in the dataset.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
end
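If you paste the lines above into the Octave console, a quick size check confirms everything loaded correctly. The expected output below assumes the full course dataset, which has 97 examples (the snapshot above shows only its first few rows):
size(data) % rows x columns of the loaded matrix; expect m x 2
m          % total training examples; 97 for the full course dataset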
2. Visualize the data
Let us visualize this data by plotting a graph. With Octave, this can be done with just one line of code.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
end
After running this code, the following graph will be displayed
Now we need to find a hypothesis function: a straight line that best fits our data.
3. Write a Cost function
Our next task is to write a cost function, which we will use while performing gradient descent, storing its results for later use. Before we can calculate the cost, we first need to find h(θ).
h(θ) = theta0 + theta1 * x1;
We need to calculate the hypothesis for every value of x and sum up the squared errors to arrive at a cost value for a given theta0 and theta1:
J(theta0, theta1) = (1/(2*m)) * sum((h(θ) - y)^2);
This can be done using a normal (loop-based) method or a vectorized method, where we operate directly on the matrix. To understand it better, let us first implement the cost function using the normal method. Also, remember that in Octave, matrix indices start from 1, not 0.
function J = calculateCost(input, output, theta)
m = length(output);
J = 0;
for i = 1:m
% Hypothesis for the i-th example
hypothesis = theta(1) + theta(2)*input(i);
% Squared error for the i-th example
cost = (hypothesis - output(i))^2;
J = J + cost;
end
J = J/(2*m);
end
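As a quick sanity check (assuming x and y are loaded as above), calling this function with theta set to zeros should reproduce the cost value used later in this article, about 32.073 on the course dataset:
theta = zeros(1, 2);            % theta = [0 0]
J = calculateCost(x, y, theta); % approximately 32.073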
Our main function will look like this:
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost Normally
J1 = calculateCost(x,y,theta); % Cost Value 32.073
end
In order to calculate the cost using the vectorized method, we need to change our function a little bit
h(θ) = theta0*x0 + theta1 * x1;
Considering x0 = 1, the above function will give the same value as the original one. x0 is a new feature we add to our dataset, with a constant value of 1. Let us modify our dataset:
X = [ones(m,1),x];
This will add one more column to our input dataset x, now our input dataset will look like
X = [ 1   6.1101
      1   5.5277
      1   8.5186
      1   7.0032
      1   5.8598
      1   8.3829
      1   7.4764 ]
Now calculating the hypothesis is simple: just multiply X by the transpose of theta.
h(θ) = X * θ';
You can try working through this equation by hand to understand how it calculates the values of the entire hypothesis matrix in one go. Our cost function will look like:
function J = calculateCostVectorize(input, output, theta)
m = length(output);
% Hypothesis for all examples in one matrix operation
hypothesis_vector = input*theta';
% Matrix operation to calculate total cost
J = sum((hypothesis_vector - output).^2)/(2*m);
end
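To see how input*theta' produces every hypothesis value in one go, here is a small sketch you can run directly in the Octave console. The theta values here are made up purely for illustration:
X_demo = [1 6.1101; 1 5.5277]; % two examples, each with the extra x0 = 1 feature
theta_demo = [2 0.5];          % arbitrary theta, for illustration only
h = X_demo * theta_demo'       % h = [2 + 0.5*6.1101; 2 + 0.5*5.5277]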
Let us modify the main function to call this cost function
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost Normally
J1 = calculateCost(x,y,theta); % Cost Value 32.073
% Calculate Cost Using Vectorized Method
X = [ones(m,1),x]; % New Input with extra feature
J2 = calculateCostVectorize(X,y,theta);
end
I am showing both functions so that you can understand the underlying steps properly.
4. Run Gradient Descent (GD)
To come up with the values of theta0 and theta1 that give the minimum cost, we need to run the gradient descent algorithm for some number of iterations. On each iteration, gradient descent updates both theta values simultaneously:
theta0 := theta0 - alpha * (1/m) * sum(h(θ) - y);
theta1 := theta1 - alpha * (1/m) * sum((h(θ) - y) .* x1);
We need to start with some initial values of theta and alpha and then run gradient descent. Let us consider our initial values as:
theta = zeros(1, 2); % theta = [0 0]
iterations = 1500;
alpha = 0.01;
Our GD function will look like
function [theta, costs] = runGradientDescent(input, output, theta, alpha, iterations)
m = length(output);
costs = zeros(iterations, 1);
% Run Gradient Descent for the given number of iterations
for iter = 1:iterations
% Error vector: h(θ) - y for all examples, using the current theta
errors = input*theta' - output;
% Update Theta0
theta(1) = theta(1) - alpha * (1/m) * sum(errors);
% Update Theta1
theta(2) = theta(2) - alpha * (1/m) * sum(errors .* input(:,2));
% Store All Cost in Array For Later Use
costs(iter) = calculateCostVectorize(input,output,theta);
end
end
Running the above GD function for 1500 iterations, theta comes out to be:
theta =
-3.6303 1.1664
5. Plot graph of Hypothesis function
Finally, we have our theta values. Let us calculate the hypothesis with these values and plot it on our existing graph to see whether it fits our data well.
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost
X = [ones(m,1),x]; % New Input with extra feature
J = calculateCostVectorize(X,y,theta);
% Initialize GD Constants
iterations = 1500;
alpha = 0.01;
theta = runGradientDescent(X, y, theta, alpha, iterations);
% Plot Hypothesis Graph with New theta values
hold on % This will not clear the original plot
hypothesis= X * theta';
plot(x, hypothesis, 'r-');
end
The following graph will be displayed
It looks like our function was able to converge within 1500 iterations. In practice, you may need to run GD multiple times to see how it converges.
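Rather than eyeballing the fitted line, you can also check convergence using the costs array that runGradientDescent already returns. Here is a minimal sketch; the figure command just opens a new window so the scatter plot is not overwritten:
[theta, costs] = runGradientDescent(X, y, theta, alpha, iterations);
% Plot the cost of every iteration; the curve should flatten as GD converges
figure;
plot(1:iterations, costs, 'b-');
xlabel('Iteration');
ylabel('Cost J');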
6. Run predictions & see results
Let us add one simple function to predict profit values, with the population as input. The prediction is nothing but our hypothesis function evaluated with the calculated values of theta.
function [prediction] = predictProfit(input, theta)
% Prepend x0 = 1 and apply the hypothesis
prediction = [1,input] * theta';
% Convert from $10,000s to dollars
prediction = prediction * 10000;
% Draw this prediction as a point on the graph (back in $10,000s units)
plot(input, prediction/10000, 'rs-', 'MarkerSize', 5);
end
Our main function will look like this:
function [] = LinearRegression()
% Load Data
data = load('franchisedata.txt');
% Transfer Data in x & y variables
x = data(:,1);
y = data(:,2);
m = size(x,1);
theta = zeros(1, 2);
% Plot Your Data
plot(x, y, 'bs', 'MarkerSize', 5);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
% Calculate Cost
X = [ones(m,1),x]; % New Input with extra feature
J = calculateCostVectorize(X,y,theta);
% Initialize GD Constants
iterations = 1500;
alpha = 0.01;
theta = runGradientDescent(X, y, theta, alpha, iterations);
% Plot Hypothesis Graph with New theta values
hold on % This will not clear the original plot
hypothesis= X * theta';
plot(x, hypothesis, 'r-');
% Predict Values Now Using Model Theta Values
population_one = 3.5;
profit_one = predictProfit(population_one,theta);
population_two = 12;
profit_two = predictProfit(population_two,theta);
end
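With the theta values we obtained above (theta = [-3.6303 1.1664]), you can sanity-check these predictions by hand:
% profit_one: population of 35,000 (input 3.5)
% (-3.6303 + 1.1664 * 3.5) * 10000 is approximately $4,521
% profit_two: population of 120,000 (input 12)
% (-3.6303 + 1.1664 * 12) * 10000 is approximately $103,665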
Here is the entire source code of the above program
In my next article, I will implement the same solution using Python. Subsequent articles will cover how to deal with multiple input variables.
Happy Reading...