Hello World! (Of Tensorflow, GPU Computing, and Deep Learning)

Posted on Thu 16 February 2017 in Projects

Multilayer Perceptron

So I recently upgraded my desktop, and now that I have a GTX 1070 card, I really wanted to get my hands on deep learning using my GPU. The library I have decided to go with is TensorFlow, a graph computation/deep learning library developed by Google. TensorFlow also has support for running on the GPU, so we can train larger networks faster. This notebook will run through building a multilayer perceptron model on the classic MNIST dataset.

The MNIST dataset includes samples of handwritten digits from 0 to 9. The objective is to train our network so that it can determine which digit a given image shows.

In [1]:
import tensorflow as tf
In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
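Since the data is loaded with one_hot=True, each label comes back as a ten-element vector with a 1 in the position of its digit and 0 everywhere else. A quick sketch of what that encoding looks like (illustrative only, not one of the notebook cells):

import numpy as np

# One-hot encoding: the digit 3 becomes a length-10 vector with a 1 at index 3.
digit = 3
print(np.eye(10)[digit])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]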

Some examples of what each image looks like in the MNIST dataset:

In [4]:
plt.imshow(mnist.train.images[3].reshape((28,28)))
Out[4]:
In [5]:
plt.imshow(mnist.train.images[0].reshape((28,28)))
Out[5]:

A multilayer perceptron model consists of an input layer, a hidden layer (or multiple hidden layers), and an output layer.

(Figure: a multilayer perceptron with an input layer, hidden layers, and an output layer.)

Our model will contain 3 hidden layers with the rectified linear unit (ReLU) function as our activation function. The ReLU function is defined as: $$ f(x) = \begin{cases} x & x > 0 \\ 0 & \text{otherwise} \end{cases} $$
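In other words, ReLU passes positive values through unchanged and zeroes out negative ones, i.e. f(x) = max(0, x). For illustration, here is a minimal NumPy version (the model below uses TensorFlow's built-in tf.nn.relu):

import numpy as np

# ReLU keeps positive inputs as-is and clamps negative inputs to zero.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # negatives become 0, positives are unchanged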

In [6]:
n = 28 * 28          # each image is 28 x 28 pixels, flattened into a 784-long vector
num_classes = 10     # digits 0 - 9
x = tf.placeholder("float", [None, n])            # input images
y = tf.placeholder("float", [None, num_classes])  # one-hot labels

Define model:

In [7]:
def mlp(x, weights, biases):
    # Hidden Layer 1
    layer1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer1 = tf.nn.relu(layer1)

    # Hidden Layer 2
    layer2 = tf.add(tf.matmul(layer1, weights['h2']), biases['b2'])
    layer2 = tf.nn.relu(layer2)

    # Hidden Layer 3
    layer3 = tf.add(tf.matmul(layer2, weights['h3']), biases['b3'])
    layer3 = tf.nn.relu(layer3)

    # Output Layer (raw logits; softmax is applied in the cost function)
    output_layer = tf.matmul(layer3, weights['out']) + biases['out']

    return output_layer

Define our weights and biases:

In [8]:
# Define weights and biases
n_hidden1 = 256
n_hidden2 = 256
n_hidden3 = 256

weights = {
    'h1': tf.Variable(tf.random_normal([n, n_hidden1])), 
    'h2': tf.Variable(tf.random_normal([n_hidden1, n_hidden2])),
    'h3': tf.Variable(tf.random_normal([n_hidden2, n_hidden3])),
    'out': tf.Variable(tf.random_normal([n_hidden3, num_classes]))
}

biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden1])), 
    'b2': tf.Variable(tf.random_normal([n_hidden2])), 
    'b3': tf.Variable(tf.random_normal([n_hidden3])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}
In [19]:
batch_size = 256
training_epochs = 25
predictions = mlp(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = predictions, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate = 0.01).minimize(cost)
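The cost above is the softmax cross-entropy between the network's raw outputs (the logits) and the one-hot labels: the logits are turned into a probability distribution with softmax, and the loss is the negative log probability assigned to the correct digit. A small NumPy sketch of the same computation for a single, made-up 3-class example:

import numpy as np

# Hypothetical logits for one example and a one-hot label for class 0.
logits = np.array([2.0, 1.0, 0.1])
labels = np.array([1.0, 0.0, 0.0])

softmax = np.exp(logits) / np.sum(np.exp(logits))  # logits -> probabilities
loss = -np.sum(labels * np.log(softmax))           # cross-entropy with the label
print(loss)  # roughly 0.417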

We will use the Adam optimizer, a variant of stochastic gradient descent, to optimize our weights and biases with a batch size of 256. Each training step samples 256 images from the training set, and we train for 25 epochs, where each step makes one forward pass through the network and one backward pass using backpropagation to update the weights. The data has already been split into a training and a testing set as a built-in feature of TensorFlow, so we just need to train our model and determine its accuracy against the test set.

In [20]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        for i in range(total_batch):
            
            batch_x, batch_y = mnist.train.next_batch(batch_size)
                        
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            
            avg_cost += c / total_batch

        print ("Epoch:", '%0d' % (epoch+1), "cost=", "{:.3f}".format(avg_cost))

    
    # Compare the predicted digit (argmax of the logits) against the true label,
    # then average the results over the test set to get the accuracy.
    pred = tf.equal(tf.argmax(predictions, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(pred, "float"))
    print ("Accuracy:", acc.eval({x: mnist.test.images, y: mnist.test.labels}) * 100)
Epoch: 1 cost= 678.088
Epoch: 2 cost= 100.495
Epoch: 3 cost= 50.950
Epoch: 4 cost= 33.545
Epoch: 5 cost= 22.439
Epoch: 6 cost= 14.376
Epoch: 7 cost= 12.968
Epoch: 8 cost= 12.089
Epoch: 9 cost= 11.579
Epoch: 10 cost= 10.635
Epoch: 11 cost= 11.402
Epoch: 12 cost= 12.854
Epoch: 13 cost= 12.214
Epoch: 14 cost= 10.492
Epoch: 15 cost= 9.614
Epoch: 16 cost= 9.490
Epoch: 17 cost= 9.889
Epoch: 18 cost= 10.120
Epoch: 19 cost= 9.650
Epoch: 20 cost= 9.716
Epoch: 21 cost= 7.642
Epoch: 22 cost= 8.516
Epoch: 23 cost= 8.134
Epoch: 24 cost= 7.146
Epoch: 25 cost= 6.589
Accuracy: 96.749997139

With 25 epochs and a multilayer perceptron with 3 hidden layers optimized with the Adam optimizer, we achieve 96.7% accuracy. This accuracy is actually not that great compared to what other models can do. One way to improve on this model is to introduce convolution and pooling layers, more commonly known as a convolutional neural network, which is what I will be working on next to get a better accuracy.
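As a rough preview of that next step, the building blocks of a convolutional layer followed by max pooling in TensorFlow 1.x look something like the sketch below. The filter size and number of filters here are illustrative choices, not the final model:

import tensorflow as tf

# Reshape the flattened 784-pixel vectors back into 28x28 grayscale images.
x_image = tf.reshape(x, [-1, 28, 28, 1])

# One convolutional layer: 32 filters of size 5x5 (illustrative sizes).
conv_w = tf.Variable(tf.random_normal([5, 5, 1, 32]))
conv_b = tf.Variable(tf.random_normal([32]))
conv = tf.nn.relu(tf.nn.conv2d(x_image, conv_w, strides=[1, 1, 1, 1], padding='SAME') + conv_b)

# 2x2 max pooling halves the spatial resolution from 28x28 to 14x14.
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')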