Neural Networks 101- Epoch 3

Preetham


Hello folks, in this article we will implement a complete Logistic Regression model from scratch. Let’s get started.

What we have learned so far

In the previous articles, we derived the partial derivatives of the loss and walked through the gradient descent algorithm.

Here's what we have so far:

$$z = w^T X + b$$

$$\hat{y} = \sigma(z)$$

$$L(y, \hat{y}) = -\left[\, y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \,\right]$$

$$dw_n = \frac{\partial L}{\partial w_n}, \qquad db = \frac{\partial L}{\partial b}$$

$$w_n = w_n - \alpha \, dw_n, \qquad b = b - \alpha \, db$$

$$\frac{\partial L}{\partial w_n} = dz \cdot x_n, \qquad \frac{\partial L}{\partial b} = dz$$

$$dz = \frac{\partial L}{\partial z} = (\hat{y} - y)$$

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L\left(y^{(i)}, \hat{y}^{(i)}\right)$$
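To make these formulas concrete, here is a minimal NumPy sketch of a single vectorized gradient-descent step on a made-up toy batch (two features, three examples), not the MNIST data we use below:

import numpy as np

# Toy batch: columns are examples, rows are features (made-up values)
X = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, 1.5]])   # shape (n_features, m)
Y = np.array([[1, 0, 1]])         # shape (1, m)
w = np.zeros((2, 1))              # weights
b = 0.0                           # bias
alpha = 0.1                       # learning rate
m = X.shape[1]

# Forward pass: z = w^T X + b, y_hat = sigma(z)
A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))

# Gradients averaged over the batch: dw = X (A - Y)^T / m, db = mean(A - Y)
dw = np.dot(X, (A - Y).T) / m
db = np.sum(A - Y) / m

# Update rule: step against the gradient
w = w - alpha * dw
b = b - alpha * db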

What we are going to do

We will be building a binary classifier from scratch in Python which classifies whether a given digit is a 7 or not. For that, we will use the MNIST dataset. As we already know, the MNIST dataset consists of greyscale images of handwritten digits, each 28x28 pixels. Every image can be represented as a 28x28 matrix, which the CSV stores flattened into 784 pixel values per example.

We will use Google Colaboratory’s sample MNIST dataset: two CSV files containing 20,000 training examples and 10,000 test examples, where each row holds the digit label followed by its 784 pixel values. Let’s begin coding.

The Python Implementation

Loading the dataset

Since we are building a binary classifier that identifies the digit 7, we first remap the labels so that the target variable is 1 when the digit is a 7 and 0 otherwise. We use NumPy's where function to do so.
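As a quick illustration, np.where(condition, a, b) returns a wherever the condition holds and b elsewhere, which is exactly how the labels are binarized below:

import numpy as np

labels = np.array([3, 7, 1, 7, 0])
print(np.where(labels == 7, 1, 0))   # [0 1 0 1 0]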

import numpy as np
import pandas as pd
import copy   # used later in optimize()

# Load the Colab sample MNIST CSVs (no header row, pixel values as floats)
df = pd.read_csv("/content/sample_data/mnist_train_small.csv", header=None, dtype=float)
dftest = pd.read_csv("/content/sample_data/mnist_test.csv", header=None, dtype=float)

# train_set_y: first column is the digit label; remap to 1 (seven) / 0 (not seven)
train_set_y = df.iloc[:, :1].values.T
train_set_y = np.where(train_set_y == 7, 1, 0)
print(train_set_y)

# train_set_x: remaining 784 columns are pixel values, one example per column after transposing
train_set_x = df.iloc[:, 1:].values.T
print(train_set_x)

# test_set_y
test_set_y = dftest.iloc[:, :1].values.T
test_set_y = np.where(test_set_y == 7, 1, 0)
print(test_set_y)

# test_set_x
test_set_x = dftest.iloc[:, 1:].values.T
print(test_set_x)

# Sanity-check the shapes: (784, 20000), (1, 20000), (784, 10000), (1, 10000)
print(train_set_x.shape)
print(train_set_y.shape)
print(test_set_x.shape)
print(test_set_y.shape)
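As a quick check that each 784-value column really is a 28x28 image, here is an optional sketch that reshapes one training example and displays it (it uses matplotlib, which is available in Colab):

import matplotlib.pyplot as plt

# Take the first training example (one column) and reshape it back into its 28x28 grid
example = train_set_x[:, 0].reshape(28, 28)
plt.imshow(example, cmap="gray")
plt.title("is it a 7? label = {}".format(train_set_y[0, 0]))
plt.show()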

Implementing Sigmoid function

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    s = 1/(1 + np.exp(-z))

    return s
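A quick sanity check on a few values (the outputs below are approximate):

print(sigmoid(np.array([-10.0, 0.0, 10.0])))
# roughly [4.54e-05, 0.5, 0.99995]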

Implementing helper functions

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias) of type float
    """

    w = np.zeros((dim, 1), dtype=float)
    b = 0.0

    return w, b
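For example, for our 784-pixel inputs:

w_init, b_init = initialize_with_zeros(784)
print(w_init.shape, b_init)   # (784, 1) 0.0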

Implementing Forward and Back propagation

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px, number of examples)
    Y -- true "label" vector (containing 0 if the digit is not a 7, 1 if it is) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    """
    
    m = X.shape[1]
    
    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T,X)+b)
    cost = -1/m*(np.sum(np.dot(Y,np.log(A+1e-5).T)+np.dot(1-Y,np.log(1-A+1e-5).T)))  # the small 1e-5 offset keeps log() away from log(0)

    # BACKWARD PROPAGATION
    dw = 1/m*(X.dot((A-Y).T))
    db = 1/m*(np.sum(A-Y))
    
    cost = np.squeeze(np.array(cost))

    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
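A small sanity check on made-up inputs (not the MNIST arrays) to confirm that the gradients have the same shapes as the parameters and that the cost is a scalar:

# Toy check: 2 features, 3 examples (made-up values)
w_t = np.array([[1.0], [2.0]])
b_t = 2.0
X_t = np.array([[1.0, 2.0, -1.0],
                [3.0, 4.0, -3.2]])
Y_t = np.array([[1, 0, 1]])

grads_t, cost_t = propagate(w_t, b_t, X_t, Y_t)
print(grads_t["dw"].shape)   # (2, 1) -- same shape as w_t
print(grads_t["db"])         # a single number
print(cost_t)                # a single cost value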

Implementing the optimization function

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px, number of examples)
    Y -- true "label" vector (containing 0 if the digit is not a 7, 1 if it is), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    """
    
    w = copy.deepcopy(w)
    b = copy.deepcopy(b)
    
    costs = []
    
    for i in range(num_iterations):
        # Cost and gradient calculation
        grads, cost = propagate(w,b,X,Y)

        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule
        w = w-learning_rate*dw
        b = b-learning_rate*db

        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
            # Print the cost every 100 training iterations
            if print_cost:
                print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs
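As an optional check that the loop behaves as expected, here is a toy run on made-up data; the recorded costs should shrink as the iterations go by:

# Toy run: 2 features, 4 made-up examples
w0, b0 = initialize_with_zeros(2)
X_toy = np.array([[0.5, -1.0, 2.0, 0.0],
                  [1.5,  0.5, -0.5, 2.0]])
Y_toy = np.array([[1, 0, 1, 0]])

params_toy, grads_toy, costs_toy = optimize(w0, b0, X_toy, Y_toy, num_iterations=500, learning_rate=0.1)
print(costs_toy)   # one cost recorded every 100 iterations, decreasing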

Implementing prediction function

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T,X)+b)
    
    for i in range(A.shape[1]):
        
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        
        if A[0, i] > 0.5 :
            Y_prediction[0,i] = 1
        else:
            Y_prediction[0,i] = 0
    
    return Y_prediction
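A quick illustration of the 0.5 thresholding on made-up parameters and data:

w_t = np.array([[0.1], [0.2]])
b_t = -0.3
X_t = np.array([[1.0, -1.5, 3.0],
                [1.2,  2.0, 0.1]])
print(predict(w_t, b_t, X_t))   # [[1. 0. 1.]]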

Creating the final Model

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to True to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """ 
    # Initialize parameters with zeros
    m = X_train.shape[0]
    w, b = initialize_with_zeros(m)

    # Gradient descent
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve the trained parameters w and b from the dictionary "params"
    w = params["w"]
    b = params["b"]

    # Predict on the test and train set examples
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test Errors
    if print_cost:
        print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
        print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

Model Testing and Evaluation

Now that all the pieces are in place, let's train and evaluate the model on the dataset we prepared:

logistic_regression_model = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.001, print_cost=True)

The model reaches a training accuracy of 98.665% and a test accuracy of 98.24%, which is not bad for a single neuron. But hold on! Roughly nine out of ten digits are not 7s, so the dataset is heavily skewed toward the "not 7" class, and plain accuracy overstates how good the classifier really is.
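A minimal, optional sketch that computes precision and recall for the "7" class from the predictions returned above gives a more honest picture (it assumes the logistic_regression_model dictionary from the call above):

y_pred = logistic_regression_model["Y_prediction_test"]
y_true = test_set_y

tp = np.sum((y_pred == 1) & (y_true == 1))   # 7s correctly flagged as 7
fp = np.sum((y_pred == 1) & (y_true == 0))   # non-7s wrongly flagged as 7
fn = np.sum((y_pred == 0) & (y_true == 1))   # 7s the model missed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print("precision: {:.3f}, recall: {:.3f}".format(precision, recall))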

Conclusion

And that is it. We have successfully implemented a logistic regression model in Python from scratch. In the coming articles, we will see how to implement a complete neural network.


About Preetham

Hi, I'm Preetham, a student pursuing a Bachelor's degree in Artificial Intelligence and Machine Learning. As an aspiring data scientist, I'm passionate about exploring the cutting-edge of machine learning and AI research. I believe that the most effective way to learn is by teaching others, which is why I decided to start this blog series.
