Neural Networks 101 - Epoch 3
Hello folks, in this article we will implement a complete Logistic Regression model from scratch. Let’s get started.
What we learnt until now
In the previous articles we covered partial derivatives and worked through the gradient descent algorithm.
Here’s what we have until now:
What we are going to do
We will build a binary classifier from scratch in Python that decides whether a given handwritten digit is a 7 or not. For that, we will use the MNIST dataset. As we already know, the MNIST dataset consists of greyscale images of handwritten digits, each 28x28 pixels. An image can be represented as a 28x28 matrix, or flattened into a vector of 784 pixel values.
We will use the sample MNIST data that ships with Google Colaboratory. It comes as two CSV files: one with 20,000 training instances and one with 10,000 testing instances. Let's begin coding.
The Python Implementation
Loading the dataset
Since we are implementing a binary classifier that identifies a 7, the dataset is edited so that the target variable is 1 for a 7 and 0 for every other digit. We use the NumPy where function to do so.
import numpy as np
import pandas as pd

df = pd.read_csv("/content/sample_data/mnist_train_small.csv", header=None, dtype=float)
dftest = pd.read_csv("/content/sample_data/mnist_test.csv", header=None, dtype=float)
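The label conversion described above could look roughly like the sketch below. It is only an illustration under assumptions: the first column of each row is taken to hold the digit label and the remaining 784 columns the pixel intensities in the range 0-255, and the helper name `prepare` is hypothetical. The demo runs on a tiny made-up array instead of the real files.

```python
import numpy as np


def prepare(frame, digit=7):
    """Split (label + 784 pixels) rows into X of shape (784, m) and Y of shape (1, m).

    Assumption: column 0 is the digit label, columns 1..784 are pixels in 0-255.
    Accepts a pandas DataFrame or a 2-D NumPy array.
    """
    values = np.asarray(frame, dtype=float)
    labels = values[:, 0]
    pixels = values[:, 1:]
    # np.where maps the digit labels onto a binary target: 1 for `digit`, 0 otherwise
    Y = np.where(labels == digit, 1.0, 0.0).reshape(1, -1)
    # Scale pixels to [0, 1] and store one example per column
    X = (pixels / 255.0).T
    return X, Y


# Tiny stand-in for the CSV: 3 rows, each a label followed by 784 pixel values
rng = np.random.default_rng(0)
demo = np.column_stack([[7, 3, 7], rng.integers(0, 256, size=(3, 784))])
X_demo, Y_demo = prepare(demo)
print(X_demo.shape, Y_demo)
```

With the real data, the arrays used later in this article would then come from something like `train_set_x, train_set_y = prepare(df)` and `test_set_x, test_set_y = prepare(dftest)`.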
Implementing Sigmoid function
def sigmoid(z):
    """
    Compute the sigmoid of z.
    z -- A scalar or numpy array of any size.
    Returns s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
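As a quick sanity check, the sketch below evaluates the sigmoid on a few hand-picked values. It shows two properties that are handy to remember: the output is 0.5 at z = 0, and sigmoid(-z) equals 1 - sigmoid(z). The input values are arbitrary.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function 1 / (1 + e^(-z))."""
    return 1 / (1 + np.exp(-z))


z = np.array([-4.0, 0.0, 4.0])
s = sigmoid(z)
print(s)             # values squeezed into the open interval (0, 1)
print(s[0] + s[2])   # sigmoid(-4) + sigmoid(4), which sums to 1 up to rounding
```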
Implementing helper functions
def initialize_with_zeros(dim):
    """
    Create a vector of zeros of shape (dim, 1) for w and initialize b to 0.
    dim -- size of the w vector we want (the number of input features)
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias) of type float
    """
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b
Implementing Forward and Back propagation
def propagate(w, b, X, Y):
    """
    Compute the cost function and its gradients for one forward and backward pass.
    w -- weights, a numpy array of size (num_px * num_px, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px, number of examples)
    Y -- true "label" vector (containing 0 if not a 7, 1 if a 7) of size (1, number of examples)
    Returns:
    grads -- dictionary containing dw and db
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    """
    m = X.shape[1]  # number of examples
    # Forward propagation: activations, then cost (1e-5 keeps log() away from 0)
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * (np.sum(np.dot(Y, np.log(A + 1e-5).T) + np.dot(1 - Y, np.log(1 - A + 1e-5).T)))
    # Backward propagation: gradients of the cost
    dw = 1 / m * (np.dot(X, (A - Y).T))
    db = 1 / m * (np.sum(A - Y))
    cost = np.squeeze(np.array(cost))
    grads = {"dw": dw,
             "db": db}
    return grads, cost
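One way to gain confidence in the gradient formulas is to compare them with a finite-difference estimate. The sketch below is a hedged illustration, not part of the original implementation: `cost_and_grads` and `numerical_dw` are hypothetical helper names, the data is randomly generated, and the 1e-5 smoothing term is left out so that the analytic gradient matches the cost exactly.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost_and_grads(w, b, X, Y):
    """Cross-entropy cost and its gradients (no 1e-5 smoothing in this sketch)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db


def numerical_dw(w, b, X, Y, h=1e-6):
    """Central-difference estimate of d(cost)/dw, one weight at a time."""
    approx = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = h
        cost_plus, _, _ = cost_and_grads(w + step, b, X, Y)
        cost_minus, _, _ = cost_and_grads(w - step, b, X, Y)
        approx[j, 0] = (cost_plus - cost_minus) / (2 * h)
    return approx


rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))                          # 4 features, 6 made-up examples
Y = rng.integers(0, 2, size=(1, 6)).astype(float)    # random 0/1 labels
w = rng.normal(size=(4, 1)) * 0.1
b = 0.2

_, dw, _ = cost_and_grads(w, b, X, Y)
max_diff = np.max(np.abs(dw - numerical_dw(w, b, X, Y)))
print(max_diff)  # should be tiny if the analytic gradient is right
```

If the two gradients disagree by more than a very small amount, the formula for dw (or the cost) most likely has a bug.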
Implementing the optimizing function
import copy

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False):
    """
    Optimize w and b by running a gradient descent algorithm.
    w -- weights, a numpy array of size (num_px * num_px, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px, number of examples)
    Y -- true "label" vector (containing 0 if not a 7, 1 if a 7), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of the costs recorded during the optimization, used to plot the learning curve
    """
    # Work on copies so the caller's arrays are not modified
    w = copy.deepcopy(w)
    b = copy.deepcopy(b)
    costs = []
    for i in range(num_iterations):
        # Cost and gradient calculation
        grads, cost = propagate(w, b, X, Y)
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # Update rule
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
            # Print the cost every 100 training iterations
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
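To see the update rule at work, here is a small self-contained sketch that runs the same kind of loop on a made-up one-feature dataset. The helper name `cost_and_grads`, the six data points, the learning rate of 0.5 and the 200 iterations are all assumptions chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost_and_grads(w, b, X, Y):
    """Cost and gradients for logistic regression, using the same formulas as above."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A + 1e-5) + (1 - Y) * np.log(1 - A + 1e-5)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db


# Made-up data: one feature, label is 1 when the feature is positive
X = np.array([[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
Y = np.array([[0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])

w, b = np.zeros((1, 1)), 0.0
history = []
for i in range(200):
    cost, dw, db = cost_and_grads(w, b, X, Y)
    history.append(cost)
    # Gradient descent step: move against the gradient
    w = w - 0.5 * dw
    b = b - 0.5 * db

print(history[0], history[-1])
```

The first recorded cost is about 0.69 (the cost of predicting 0.5 for every example), and the last one should be clearly lower, which is the behaviour to expect from the learning curve stored in costs.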
Implementing prediction function
def predict(w, b, X):
    """
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b).
    w -- weights, a numpy array of size (num_px * num_px, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px, number of examples)
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    """
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    # Probability that each example is a 7
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
    return Y_prediction
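The thresholding loop can also be written as a single NumPy comparison. The sketch below is an alternative under the same 0.5 threshold, not the version used in this article; `predict_vectorized` is a hypothetical name and the weights and inputs are made up.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def predict_vectorized(w, b, X):
    """Return a (1, m) array of 0/1 predictions without an explicit Python loop."""
    A = sigmoid(np.dot(w.T, X) + b)
    # The comparison yields booleans; astype(float) turns them into 1.0 / 0.0
    return (A > 0.5).astype(float)


w = np.array([[1.0], [-1.0]])
b = 0.0
X = np.array([[2.0, 0.0, -1.0],
              [0.0, 3.0, -1.0]])
preds = predict_vectorized(w, b, X)
print(preds)  # [[1. 0. 0.]]
```

The third example lands exactly on a probability of 0.5, so with a strict "greater than" comparison it is classified as 0, the same as in the loop version.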
Creating the final Model
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Build the logistic regression model by calling the functions implemented previously.
    X_train -- training set represented by a numpy array of shape (num_px * num_px, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to True to print the cost every 100 iterations
    Returns:
    d -- dictionary containing information about the model.
    """
    # Initialize parameters with zeros, one weight per input feature
    dim = X_train.shape[0]
    w, b = initialize_with_zeros(dim)
    # Gradient descent
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    # Retrieve parameters w and b from dictionary "params"
    w = params["w"]
    b = params["b"]
    # Predict test/train set examples
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # Print train/test accuracy
    if print_cost:
        print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
        print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
Model Testing and Evaluation
Now that we've seen how to implement the model, let us test our model on the dataset we have created:
logistic_regression_model = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.001, print_cost=True)
The model has a training accuracy of 98.665% and a test accuracy of 98.24%, which is not bad for a single neuron. But hold on! Part of this comes from the fact that there are far more non-7 digits than 7s, so the dataset is skewed. Roughly one digit in ten is a 7, which means a model that always answers "not a 7" would already score close to 90%.
And that is it. We have successfully implemented the logistic regression model in Python. In the coming articles we will see how to implement a complete neural network.
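Because accuracy can look flattering on a skewed dataset, it helps to also look at precision and recall for the positive class. The sketch below is an illustration only: `binary_metrics` is a hypothetical helper, and the labels are made up to mimic a "one positive in ten" split with a model that always predicts 0.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels of any matching shape."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    accuracy = float(np.mean(y_pred == y_true))
    # Guard against division by zero when there are no positive predictions or labels
    precision = float(tp / (tp + fp)) if (tp + fp) > 0 else 0.0
    recall = float(tp / (tp + fn)) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall


# Made-up labels: one positive in ten, and a "model" that always answers 0
y_true = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
y_pred = np.zeros((1, 10))
accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

The always-0 model reaches 90% accuracy here while its recall is zero, since it never finds the single positive example. On the real model, the same helper could be called as `binary_metrics(test_set_y, logistic_regression_model["Y_prediction_test"])`.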
About Preetham
Hi, I'm Preetham, a student pursuing a Bachelor's degree in Artificial Intelligence and Machine Learning. As an aspiring data scientist, I'm passionate about exploring the cutting-edge of machine learning and AI research. I believe that the most effective way to learn is by teaching others, which is why I decided to start this blog series.