# Machine Learning - SS18

Ludwig-Maximilians-Universität München
* Lecturer: Prof. Dr. Volker Tresp
* Assistant: Christian Frey, Julian Busch
* Tutor: Changkun Ou <hi@changkun.us>

# 5th Tutorial - 05/17/18

In this tutorial, we will create an image classifier for a dataset of 'Simpsons' characters. This time we will implement the classifier in TensorFlow.

## The Simpsons Character Classification in TensorFlow

In the following, we use "The Simpsons Character Data" provided by the user 'alexattia' on Kaggle (source of the data: https://www.kaggle.com/alexattia/the-simpsons-characters-dataset/data).

We provide a slightly preprocessed version of the data, which will be used in the following.

<b>Download Link</b> of the preprocessed data: 
+ Dataset: http://www.dbs.ifi.lmu.de/~frey/MLSS18/the_simpsons_char_dataset/dataset.h5
+ Labels: http://www.dbs.ifi.lmu.de/~frey/MLSS18/the_simpsons_char_dataset/labels.h5

Store the files in the same folder as this notebook (otherwise, adjust the paths in the following cells).

#### Character Data
First, we assign each character in the dataset a unique ID, which serves as the class label for that character.

In [1]:
map_characters= {
    0: 'abraham_grampa_simpson',
    1: 'apu_nahasapeemapetilon',
    2: 'bart_simpson',
    3: 'charles_montgomery_burns',
    4: 'chief_wiggum',
    5: 'comic_book_guy',
    6: 'edna_krabappel',
    7: 'homer_simpson',
    8: 'kent_brockman',
    9: 'krusty_the_clown',
    10:'lisa_simpson',
    11:'marge_simpson',
    12:'milhouse_van_houten',
    13:'moe_szyslak',
    14:'ned_flanders',
    15:'nelson_muntz',
    16:'principal_skinner',
    17:'sideshow_bob'
}

#### Load dependencies

In [2]:
import numpy as np
np.random.seed(42)
import tensorflow as tf


#### Load data
As in the preceding notebook, we first load the data.
To load it, the library h5py has to be installed. If you haven't installed it yet, you can use pip:
+ pip install h5py

In [3]:
pic_size = 64

In [4]:
import h5py

def load_data():
    # in case the files are not stored in the same folder as this notebook, please adjust the paths
    # open the files read-only ('r'); 'r+' would needlessly require write access
    with h5py.File('dataset.h5', 'r') as h5f:
        X_train = h5f['X_train'][:]
        X_test = h5f['X_test'][:]

    with h5py.File('labels.h5', 'r') as h5f:
        y_train = h5f['y_train'][:]
        y_test = h5f['y_test'][:]

    # scale pixel values from [0, 255] to [0, 1]
    X_train = X_train.astype('float32') / 255.
    X_test = X_test.astype('float32') / 255.

    return X_train, X_test, y_train, y_test

In [5]:
X_train, X_test, y_train, y_test = load_data()

### Set parameters for each layer

First, we define the hyperparameters of our network. Note that the architecture is the same as in the Keras solution. Keras already provides built-in initializers for the weight matrices; in TensorFlow, we have to specify the initialization explicitly. A common choice is the $xavier\_initializer()$, which we use below to initialize each weight matrix of our neural network. It automatically determines the scale of the initial values based on the number of input and output neurons of each layer.

In [6]:
### Set neural network hyperparameters
epochs = 5   # number of passes over the training data (the run logged below uses 5)
batch_size = 128
wt_init = tf.contrib.layers.xavier_initializer() # weight initializer

# input layer (64x64 pixels per color channel)
n_input = pic_size * pic_size  # = 4096

# first convolutional layer
n_conv_1 = 32
k_conv_1 = 3

# second convolutional layer
n_conv_2 = 64
k_conv_2 = 3

# max pooling layer:
pool_size = 2
mp_layer_dropout = .25

# dense layer:
n_dense = 128
dense_layer_dropout = .5

# output layer:
n_classes = len(map_characters)
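For reference, the following NumPy sketch shows what `tf.contrib.layers.xavier_initializer()` computes in its default uniform mode: it samples weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The helper names `glorot_uniform_limit` and `glorot_uniform` are illustrative and not part of TensorFlow; the fan computation for convolution kernels follows the usual convention (receptive-field size times the number of channels).

```python
import numpy as np

def glorot_uniform_limit(shape):
    """Bound `limit` of the uniform distribution U(-limit, limit) used by Xavier/Glorot init.

    For a dense weight matrix [fan_in, fan_out] the fans are its two dimensions;
    for a conv kernel [k_h, k_w, in_ch, out_ch] both fans are multiplied by the
    receptive-field size k_h * k_w.
    """
    receptive_field = int(np.prod(shape[:-2]))  # 1 for a 2-D weight matrix
    fan_in = receptive_field * shape[-2]
    fan_out = receptive_field * shape[-1]
    return np.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform(shape, rng):
    """Sample a float32 weight tensor of the given shape from U(-limit, limit)."""
    limit = glorot_uniform_limit(shape)
    return rng.uniform(-limit, limit, size=shape).astype(np.float32)

rng = np.random.default_rng(42)
W_c1_np = glorot_uniform((3, 3, 3, 32), rng)   # same shape as W_c1 in this notebook
W_out_np = glorot_uniform((128, 18), rng)      # same shape as W_out in this notebook
print(glorot_uniform_limit((3, 3, 3, 32)))     # sqrt(6 / (27 + 288))
print(glorot_uniform_limit((128, 18)))         # sqrt(6 / (128 + 18))
```

Because the limit shrinks as the fans grow, larger layers start with smaller weights, which keeps the variance of the activations roughly comparable across layers.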

#### Define placeholder Tensors for inputs and labels

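Next, we **define 2 placeholders**: one for our input data $x$, a batch of 64x64 RGB images, and one for the labels $y$ of the output layer, i.e., the one-hot encoded character classes. The first dimension (None) is the batch size, which is left open so that batches of any size can be fed.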

In [7]:
# definition of x
x = tf.placeholder(tf.float32, [None, pic_size, pic_size, 3])
# definition of y
y = tf.placeholder(tf.float32, [None, n_classes])
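As a small sketch of what has to be fed into these placeholders, the NumPy arrays below have the shapes expected by $x$ and $y$. It assumes, as implied by the shape of $y$, that the labels are one-hot rows; `one_hot` is a hypothetical helper for the case where labels are stored as integer class IDs instead.

```python
import numpy as np

pic_size, n_classes = 64, 18  # values used in this notebook

def one_hot(class_ids, n_classes):
    """Hypothetical helper: integer ids of shape (N,) -> one-hot rows of shape (N, n_classes)."""
    return np.eye(n_classes, dtype=np.float32)[class_ids]

# dummy batch of 4 black images with the ids of bart, homer, lisa and marge
batch_x = np.zeros((4, pic_size, pic_size, 3), dtype=np.float32)
batch_ids = np.array([2, 7, 10, 11])
batch_y = one_hot(batch_ids, n_classes)

print(batch_x.shape)  # matches x: [None, pic_size, pic_size, 3] with None = 4
print(batch_y.shape)  # matches y: [None, n_classes] with None = 4
```

Taking the arg-max of each one-hot row recovers the class ID, which is exactly what the accuracy metric further below relies on.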

#### Define types of layers
In the next cell we define 3 types of layers, namely a $dense$ layer, a $conv2d$ layer and a $maxpooling2d$ layer.

+ $Dense$: this layer multiplies the incoming data matrix by its weight matrix and then adds the bias to the result (Hint: use tf.matmul($\cdot, \cdot$) and tf.add($\cdot, \cdot$) to perform the operations). As activation function we use a ReLU and return the result. In total, we compute $\mathrm{ReLU}(x W + b)$, where each row of $x$ is one example.


+ $Conv2d$: this is a convolutional layer. Note that we also need to define a $stride\_length$ for the convolution (Hint: see tf.nn.conv2d()). After performing the convolution, we add the bias to the result. Again, we use the ReLU function as the activation function of our 2-D convolutional layer. With padding='SAME' and a stride of 1, the output keeps the spatial size of the input.


+ $maxpooling2d$: this is a max-pooling layer. For max pooling we also have to define a kernel size, i.e., the 'window' of the pooling function (Hint: see tf.nn.max$\_$pool() for further information). Here we use $p\_size$ for both spatial dimensions of the image and also as the stride, so the windows do not overlap. Notice the $data\_format$: with the default NHWC layout, ksize and strides are given as [1, p_size, p_size, 1], i.e., there is no pooling over the batch and channel dimensions.

The function signatures are given and shall be used as given. We will define the shapes of the weight matrices and biases shortly.

In [8]:
# dense layer with ReLU activation: ReLU(x W + b)
def dense(x, W, b):
    return tf.nn.relu(tf.add(tf.matmul(x, W), b))

# convolutional layer with ReLU activation:
def conv2d(x, W, b, stride_length=1):
    return tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, W, strides=[1, stride_length, stride_length, 1], padding='SAME'), b))

# max-pooling layer:
def maxpooling2d(x, p_size):
    return tf.nn.max_pool(x, ksize=[1, p_size, p_size, 1], strides=[1, p_size, p_size, 1], padding='SAME')
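To make the semantics of `dense()` and `maxpooling2d()` concrete, here is a NumPy sketch of the same computations on tiny inputs. The names `dense_np` and `maxpool2d_np` are illustrative; the pooling sketch assumes the image height and width are divisible by the pool size, which holds for 64x64 images and p_size = 2.

```python
import numpy as np

def dense_np(x, W, b):
    """ReLU(x W + b) for a batch x of shape (N, d_in), W of shape (d_in, d_out)."""
    return np.maximum(0.0, x @ W + b)

def maxpool2d_np(x, p):
    """Non-overlapping p x p max pooling (stride p) on an NHWC batch.

    Assumes H and W are divisible by p; then 'SAME' padding adds no padding
    and the result equals that of 'VALID' padding.
    """
    n, h, w, c = x.shape
    # split H into (h // p, p) and W into (w // p, p), then take the max per window
    windows = x.reshape(n, h // p, p, w // p, p, c)
    return windows.max(axis=(2, 4))

img = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)  # one 4x4 single-channel image
pooled = maxpool2d_np(img, 2)                               # shape (1, 2, 2, 1)
h = dense_np(np.array([[1.0, -1.0]]), np.eye(2), np.array([0.0, 0.5]))  # ReLU([1.0, -0.5])
```

Each 2x2 window of the 4x4 image is reduced to its maximum, so the spatial size halves while the channel dimension is untouched.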


#### Design neural network architecture

Now we design our neural network. We use a network consisting of 2 convolutional layers, 1 max-pooling layer and 2 dense layers (the hidden dense layer and the output layer). For the hidden layers we use the functions defined above. We store the weight matrices and biases in dictionaries. For now we concentrate on the architecture; afterwards, we define the dictionaries and set up the correct sizes of the matrices used in our neural network.

Dictionaries (for now, assume the dictionaries are already defined; their definition follows after the architecture):
+ the weight matrices are stored in a dictionary called 'weights' (see parameter list). Its keys are: '$W\_c1$' for the first convolutional layer, '$W\_c2$' for the second convolutional layer, '$W\_d1$' for the first dense layer and '$W\_out$' for the output layer (of course, you can also use other keys if you like).

+ Similarly, we define a dictionary containing the biases (see parameter list), called 'biases'. Its keys are: '$b\_c1$' for the bias of the first convolutional layer, '$b\_c2$' for the bias of the second convolutional layer, '$b\_d1$' for the bias of the first dense layer and '$b\_out$' for the bias of the output layer.

Hence, we can simply pass the dictionary entries as arguments when calling the layer functions defined in the previous cells.

Architecture:
* 1: Convolutional layer with 32 filters (neurons) and a kernel size of $3 \times 3$. The activation function is a rectified linear unit (ReLU). The input shape is 64 x 64 x 3, as the training images are 64x64 pixels with 3 color channels (RGB).

* 2: Next, we define the second convolutional layer with 64 filters (neurons) and also a kernel size of 3 x 3. The activation function is again a rectified linear unit (ReLU).

* 3: The next layer is a max-pooling layer with a window of 2 x 2, followed by a dropout with a rate of .25.

* 4: Before the dense layer, we first flatten the pooled feature maps so that each example is represented as a 1-D vector.

* 5: We use a dense layer with 128 neurons and ReLU again as the activation function

* 6: To reduce overfitting, we again apply dropout, this time with a rate of .5

* 7: The last layer is the output layer: a dense layer with one neuron per class and no activation function, i.e., $x W + b$. These raw scores (logits) are passed to the softmax cross-entropy loss defined below.
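Steps 3 and 6 use dropout. In TF 1.x, the second argument of `tf.nn.dropout(x, keep_prob)` is the probability of *keeping* a unit, which is why `net()` below passes `1 - mp_dropout` and `1 - dense_dropout`. The NumPy sketch below shows the "inverted dropout" scheme that this call implements: dropped units become 0 and kept units are scaled by 1 / keep_prob. The function name `inverted_dropout` is illustrative.

```python
import numpy as np

def inverted_dropout(a, rate, rng):
    """Zero each entry with probability `rate` and scale the survivors by 1 / keep_prob.

    keep_prob = 1 - rate corresponds to the second argument that net() passes to
    the TF 1.x call tf.nn.dropout(x, keep_prob); the rescaling keeps the expected
    value of every activation unchanged. Assumes 0 <= rate < 1.
    """
    keep_prob = 1.0 - rate
    mask = rng.random(a.shape) < keep_prob   # True = keep this unit
    return np.where(mask, a / keep_prob, 0.0)

rng = np.random.default_rng(42)
acts = np.ones(100_000)
dropped = inverted_dropout(acts, 0.5, rng)   # rate .5 as in step 6
frac_zero = (dropped == 0).mean()            # close to 0.5
mean_act = dropped.mean()                    # close to 1.0
```

With a rate of 0 the function is the identity, which is what one would normally want when evaluating a trained network.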

In [9]:
def net(x, weights, biases, n_in, mp_psize, mp_dropout, dense_dropout):
    # first convolutional layer
    with tf.name_scope("conv1"):
        conv_1 = conv2d(x, weights['W_c1'], biases['b_c1'])
    
    # second convolutional layer
    with tf.name_scope("conv2"):
        conv_2 = conv2d(conv_1, weights['W_c2'], biases['b_c2'])

    # maxpool layer
    with tf.name_scope("maxpool1"):
        pool_1 = maxpooling2d(conv_2, mp_psize)

    # dropout layer
    with tf.name_scope("dropout1"):
        dropout_1 = tf.nn.dropout(pool_1, 1 - mp_dropout)

    # dense layer (first we have to flatten out the output of the previous layer)
    with tf.name_scope("dense1"):
        flat_1 = tf.reshape(dropout_1, [-1, weights['W_d1'].get_shape().as_list()[0]])
        dense_1 = dense(flat_1, weights['W_d1'], biases['b_d1'])
    
    # dropout layer
    with tf.name_scope("dropout2"):
        dropout_2 = tf.nn.dropout(dense_1, 1 - dense_dropout)
    
    # output layer
    with tf.name_scope("output"):
        return tf.add(tf.matmul(dropout_2, weights['W_out']), biases['b_out'])
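The following plain-Python sketch traces the per-example tensor shape through `net()` using the hyperparameters of this notebook. It assumes 'SAME' padding everywhere, as in `conv2d()` and `maxpooling2d()`, where the spatial output size is ceil(size / stride); the helper `same_out` is illustrative.

```python
def same_out(size, stride=1):
    """Spatial output size of a 'SAME'-padded conv/pooling op: ceil(size / stride)."""
    return -(-size // stride)

pic_size, n_classes = 64, 18                       # values used in this notebook
n_conv_1, n_conv_2, pool_size, n_dense = 32, 64, 2, 128

side = pic_size
shapes = [("input", (side, side, 3))]
side = same_out(side)                              # conv1: stride 1 keeps 64
shapes.append(("conv1", (side, side, n_conv_1)))
side = same_out(side)                              # conv2: stride 1 keeps 64
shapes.append(("conv2", (side, side, n_conv_2)))
side = same_out(side, pool_size)                   # maxpool: stride 2 halves to 32
shapes.append(("maxpool1", (side, side, n_conv_2)))  # dropout1 keeps this shape
shapes.append(("flatten", (side * side * n_conv_2,)))
shapes.append(("dense1", (n_dense,)))              # dropout2 keeps this shape
shapes.append(("output", (n_classes,)))

for name, shape in shapes:
    print("{:9s} {}".format(name, shape))
```

The flattened size is exactly the first dimension that the weight matrix `W_d1` needs, which is computed in the next cell.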

#### Define dictionaries for storing weights and biases for each layer

So far, we have used the dictionaries as a black box. Now we define them explicitly.

+ The biases are initialized as zero vectors whose size equals the number of neurons of the respective layer. Again, the dictionary containing the bias vectors has the keys '$b\_c1$', '$b\_c2$', '$b\_d1$', '$b\_out$'.


+ Take good care of the shape of the weight tensors. For detailed information about the conv2d function, please refer to: https://www.tensorflow.org/api_docs/python/tf/nn/conv2d . Let's take a closer look at the first convolutional layer. The filter parameter of the convolutional layer is expected to be a 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]. The filter height and width are given by the hyperparameter $k\_conv\_1$. The number of input channels is 3, according to the RGB color encoding. The number of output channels is the number of neurons of the first convolutional layer, i.e., $n\_conv\_1$. Accordingly, the second convolutional layer has $n\_conv\_1$ input channels and $n\_conv\_2$ output channels.


+ In order to compute the number of inputs to the dense layer, we have to compute the output size of the max-pooling layer. Because both convolutional layers use 'SAME' padding with stride 1, the feature maps still have the image size of 64x64 when they enter the max-pooling layer, which uses a pool size ($p\_size$) of 2. Hence, in each spatial dimension we get 64/2 = 32, i.e., pooled maps of size 32x32. This is multiplied by the number of channels, i.e., the number of neurons of the second convolutional layer: 32 x 32 x $n\_conv\_2$ = 32 x 32 x 64 = 65536 inputs to the dense layer (d1).

In [10]:
# definition of dict for biases
bias_dict = {
    'b_c1': tf.Variable(tf.zeros([n_conv_1])),
    'b_c2': tf.Variable(tf.zeros([n_conv_2])),
    'b_d1': tf.Variable(tf.zeros([n_dense])),
    'b_out': tf.Variable(tf.zeros([n_classes]))
}

# calculate number of inputs to dense layer:
full_square_length = np.sqrt(n_input)
pooled_square_length = int(full_square_length / pool_size)
dense_inputs = pooled_square_length ** 2 * n_conv_2

# definition of dict for weights
weight_dict = {
    'W_c1': tf.get_variable('W_c1', [k_conv_1, k_conv_1, 3, n_conv_1], initializer=wt_init),
    'W_c2': tf.get_variable('W_c2', [k_conv_2, k_conv_2, n_conv_1, n_conv_2], initializer=wt_init),
    'W_d1': tf.get_variable('W_d1', [dense_inputs, n_dense], initializer=wt_init), 
    'W_out': tf.get_variable('W_out', [n_dense, n_classes],  initializer=wt_init)
}
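Given the shapes in `weight_dict` and `bias_dict`, a quick plain-Python count shows how the trainable parameters are distributed over the layers. The hyperparameter values are copied from this notebook; `layer_params` and `counts` are illustrative names.

```python
from math import prod

pic_size, n_classes = 64, 18                       # values used in this notebook
k_conv_1, k_conv_2 = 3, 3
n_conv_1, n_conv_2, pool_size, n_dense = 32, 64, 2, 128
dense_inputs = (pic_size // pool_size) ** 2 * n_conv_2  # 32 * 32 * 64 = 65536

# (weight shape, bias length) per layer, mirroring weight_dict and bias_dict
layer_params = {
    "c1": ((k_conv_1, k_conv_1, 3, n_conv_1), n_conv_1),
    "c2": ((k_conv_2, k_conv_2, n_conv_1, n_conv_2), n_conv_2),
    "d1": ((dense_inputs, n_dense), n_dense),
    "out": ((n_dense, n_classes), n_classes),
}

counts = {name: prod(w_shape) + n_bias for name, (w_shape, n_bias) in layer_params.items()}
total = sum(counts.values())
for name, count in counts.items():
    print("{:4s} {:>10,d}  ({:.1%})".format(name, count, count / total))
print("total {:>9,d}".format(total))
```

The first dense layer alone holds over 99% of the roughly 8.4 million parameters, because it connects all 65536 flattened inputs to each of its 128 neurons.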

#### Build model
Now we are ready to build the model by calling the net() function with the tensors and hyperparameters defined above.

In [11]:
# definition for our predictions
predictions = net(x, weight_dict, bias_dict, n_input,
                     pool_size, mp_layer_dropout, dense_layer_dropout)

#### Define model's loss and its optimizer
Next, we define our cost function, for which we again use the softmax cross entropy (Hint: tf.nn.softmax_cross_entropy_with_logits_v2($\cdot, \cdot$); the older tf.nn.softmax_cross_entropy_with_logits is deprecated).
As optimizer we use the Adam method to minimize our cost function (Hint: tf.train.AdamOptimizer()).

In [12]:
# definition of the cost function
cost = tf.reduce_mean (tf.nn.softmax_cross_entropy_with_logits_v2(logits=predictions, labels=y))

# definition of the optimizer
optimizer = tf.train.AdamOptimizer().minimize(cost)
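The loss above takes raw logits and applies the softmax internally. As a sketch of what it computes per example, here is a NumPy version using the numerically stable log-sum-exp formulation; `softmax_xent_np` is an illustrative name, not a TensorFlow function.

```python
import numpy as np

def softmax_xent_np(logits, labels):
    """Per-example cross-entropy between label distributions and softmax(logits).

    Uses the log-sum-exp trick: log softmax(z) = (z - max z) - log(sum(exp(z - max z))),
    which avoids overflow for large logits.
    """
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

demo_logits = np.array([[2.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0],
                        [1000.0, 0.0, 0.0]])
demo_labels = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [1.0, 0.0, 0.0]])
per_example = softmax_xent_np(demo_logits, demo_labels)
np_cost = per_example.mean()   # the analogue of tf.reduce_mean(...) in the cell above
```

A prediction that is uniform over $K$ classes gives a loss of $\ln K$; with the 18 classes here that is $\ln 18 \approx 2.89$, a useful reference point for the cost values printed during the first epochs.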

#### Define evaluation metrics
Next, we define an evaluation metric: the percentage of correct predictions made by our neural network. For each example, we compare the predicted class (the index of the largest logit, tf.argmax(predictions, 1)) with the true class (tf.argmax(y, 1)) using $tf.equal(\cdot, \cdot)$. Averaging these matches with $tf.reduce\_mean(\cdot)$ yields the fraction of correct predictions, which we multiply by 100 to obtain a percentage. Note that the reduce function expects numeric values, so we first convert the booleans (hint: tf.cast(.)).

In [13]:
# definition of accuracy (in percentage)
correct_pred = tf.equal(tf.argmax(predictions, 1), tf.argmax(y,1))
acc_pct = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) * 100
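The same metric can be sketched in NumPy to check it on a handful of hand-made predictions; `accuracy_pct_np` and the demo arrays are illustrative.

```python
import numpy as np

def accuracy_pct_np(logits, one_hot_labels):
    """Percentage of rows whose arg-max prediction equals the arg-max of the one-hot label."""
    correct = np.argmax(logits, axis=1) == np.argmax(one_hot_labels, axis=1)
    return correct.astype(np.float32).mean() * 100

demo_logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
demo_labels = np.array([[0, 1], [0, 1], [0, 1], [1, 0]])
demo_acc = accuracy_pct_np(demo_logits, demo_labels)  # 3 of 4 rows are correct
```

Only the position of the maximum matters, so the metric is unaffected by whether it is computed on logits or on softmax probabilities.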

#### Create operation for variable initialization
As we have already seen in the introduction to TensorFlow, we also need to define an operation that initializes all global variables.

In [14]:
# definition of initializer
init_op = tf.global_variables_initializer()

#### Configure history log
To log the performance of our neural network, we can use the tf.summary operations. Here we only take a short look at how to use them with TensorBoard.

In [15]:
tf.summary.scalar("cost", cost)
tf.summary.scalar("accuracy_percentage", acc_pct)

<tf.Tensor 'accuracy_percentage:0' shape=() dtype=string>

#### Train the network in a session
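As we train on mini-batches, we also have to define a $next\_batch()$ function explicitly. Since we want to concentrate on TensorFlow and its functionality, we provide one possible solution. Note that it draws an independent random sample for every batch, so within one epoch some examples may be seen several times and others not at all.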

In [16]:
def next_batch(num, data, labels):
    # draw `num` random indices (no repeats within this batch)
    idx = np.random.permutation(len(data))[:num]
    # NumPy fancy indexing returns the selected examples and labels as new arrays
    return data[idx], labels[idx]
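If every training example should be used exactly once per epoch, a common alternative is to shuffle once per epoch and then slice consecutive batches. The generator below is a hypothetical sketch of that approach, not part of the original solution.

```python
import numpy as np

def iterate_minibatches(data, labels, batch_size, rng):
    """Yield (batch_x, batch_y) pairs covering every example exactly once per call.

    Hypothetical alternative to next_batch(): shuffle once per epoch, then slice
    consecutive batches; the last batch may be smaller than batch_size.
    """
    idx = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        batch_idx = idx[start:start + batch_size]
        yield data[batch_idx], labels[batch_idx]

rng = np.random.default_rng(42)
demo_data = np.arange(10)
demo_labels = demo_data * 2
batches = list(iterate_minibatches(demo_data, demo_labels, 4, rng))
```

In the training loop, `for batch_x, batch_y in iterate_minibatches(X_train, y_train, batch_size, rng):` could replace the inner loop over `range(n_batches)`; the per-epoch averages would then have to be divided by the actual number of batches.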

#### TRAIN!

Finally, we are all set up to train our neural network in a TensorFlow session. In more detail, the steps are:
+ first, we create a session
+ next, we run our global initializer for all the variables in our network
+ next, we loop over the number of epochs
+ in each epoch, we iterate over the mini-batches: we get the next batch via next_batch() and feed it into the network. For each batch, we run the optimizer and compute the cost and the accuracy in percentage.
+ for a more detailed output, we aggregate the cost and the accuracy over the batches of each epoch and print them on the console (cf. verbose in Keras)
+ finally, we evaluate the model on the test set in terms of cost and accuracy. Note that net() applies dropout with fixed rates, so dropout is also active during this evaluation; the reported test numbers are therefore likely somewhat pessimistic.

In [17]:
with tf.Session() as session:
    # run initializer
    session.run(init_op)

    # FileWriter for the TensorBoard logs (no trailing space in the directory name)
    train_writer = tf.summary.FileWriter('./logs/1/train', session.graph)

    # merge_all() collects the tf.summary ops defined above into a single op.
    # Create it once, outside the loops: calling it inside the loop would add
    # a new op to the graph on every iteration.
    merge = tf.summary.merge_all()

    # cnt is a global step counter; it is used as the step (log-id) of each summary
    cnt = 0
    n_batches = int(len(X_train) / batch_size)
    for epoch in range(epochs):
        avg_cost = avg_acc_pct = 0.0
        for i in range(n_batches):
            batch_x, batch_y = next_batch(batch_size, X_train, y_train)
            cnt += 1

            # one training step; also fetch the summaries, cost and accuracy of this batch
            summary, _, batch_cost, batch_acc = session.run(
                [merge, optimizer, cost, acc_pct],
                feed_dict={x: batch_x, y: batch_y})

            # write the results of this batch to the log files under step cnt
            train_writer.add_summary(summary, cnt)

            # aggregate cost and accuracy over the batches of this epoch
            avg_cost += batch_cost / n_batches
            avg_acc_pct += batch_acc / n_batches

        # verbose
        print("Epoch {:03}: cost = {:.3f} , acc = {:.2f} %".format(
            epoch + 1, avg_cost, avg_acc_pct))

    train_writer.close()
    print("Training Complete.")

    # Test model within session
    print("Test Model...")
    test_cost = cost.eval({x: X_test, y: y_test})
    test_accuracy_pct = acc_pct.eval({x: X_test, y: y_test})

    print("Test Cost: {:.3f}".format(test_cost))
    print("Test Accuracy: {:.2f} %".format(test_accuracy_pct))

Epoch 001: cost = 2.739 , acc = 15.96 %
Epoch 002: cost = 2.121 , acc = 34.64 %
Epoch 003: cost = 1.712 , acc = 46.38 %
Epoch 004: cost = 1.422 , acc = 54.50 %
Epoch 005: cost = 1.256 , acc = 59.22 %
Training Complete.
Test Model...
Test Cost: 1.557
Test Accuracy: 51.64 %


For a short introduction to TensorBoard and its interface, please refer to the tutorials. The logs written above can be viewed by running `tensorboard --logdir=./logs`.
# End of this Tutorial