Deep Learning on Bottles, Because I Had to Use TensorFlow on Something
Skills
Data Science, Python, TensorFlow, Deep Learning
Objective
This was a class project that used a convolutional neural network (CNN) to classify the fullness of bottles from images. The full code is in the GitHub repository.
Architecture
The network is a simplified version of VGG16 with fewer filters and layers. Each input image is 128x128x3. Several important components were used in the design of this network and are described below.
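A minimal sketch of what such a simplified VGG-style network might look like in TensorFlow 2.x/Keras is shown below. The filter counts, block depth, dropout placement, and dense head are illustrative assumptions, not the exact architecture in the repository.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_model(num_classes, weight_decay=1e-4):
    """Simplified VGG-style CNN for 128x128x3 images (illustrative layer sizes)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),
        # Block 1: small 3x3 convolutions, then downsample
        layers.Conv2D(32, 3, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l2(weight_decay)),
        layers.BatchNormalization(),
        layers.Conv2D(32, 3, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l2(weight_decay)),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Dropout(0.2),
        # Block 2: same pattern with more filters
        layers.Conv2D(64, 3, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l2(weight_decay)),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Dropout(0.2),
        # Classifier head
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```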
Convolutional Layer
The convolutional layer slides a filter window over each image to learn features from the spatial relationships between nearby pixels. The filter size was chosen as a trade-off between computational run time and accuracy: large enough to capture local structure without discarding too much detail.
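As a rough illustration of that trade-off, the snippet below compares parameter counts for two candidate filter sizes on a 128x128x3 input; the filter sizes and the 32 output channels are placeholders, not the project's settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Compare the cost of two candidate filter sizes (illustrative values only).
for k in (3, 5):
    inputs = tf.keras.Input(shape=(128, 128, 3))
    outputs = layers.Conv2D(32, k, padding="same")(inputs)
    model = tf.keras.Model(inputs, outputs)
    print(f"{k}x{k} filters: {model.count_params()} parameters, "
          f"output shape {outputs.shape[1:]}")
```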
Dropout
A dropout rate of 20% was applied after the convolutional layers to regularize the network. This follows the rate suggested by Geoffrey Hinton and colleagues [https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf] and is low enough not to discard too much information coming out of the convolutional layers.
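A tiny standalone demonstration of what a 20% dropout layer does during training versus inference (a toy tensor, not project code):

```python
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.2)
x = tf.ones((1, 10))
# Training: roughly 20% of activations are zeroed and the survivors are
# scaled by 1 / 0.8 so the expected activation stays the same.
print(drop(x, training=True).numpy())
# Inference: dropout is a no-op.
print(drop(x, training=False).numpy())
```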
Batch Normalization
Batch normalization is used to reduce "internal covariate shift", the compounding change in the distribution of a layer's inputs as the layers before it update. It also allows a higher learning rate and provides a small regularization effect.
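A toy example of the normalization itself, assuming TensorFlow/Keras: during training each channel of the batch is rescaled to roughly zero mean and unit variance before the learned scale and shift are applied.

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((32, 16, 16, 8), mean=5.0, stddev=3.0)
y = bn(x, training=True)  # uses the batch statistics during training
print(float(tf.math.reduce_mean(y)), float(tf.math.reduce_std(y)))  # roughly 0 and 1
```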
L2 Regularization
This penalizes weights that grow too large by adding a term proportional to the sum of squared weights to the loss. The coefficient lambda was kept small so the penalty does not overwhelm the data loss.
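In Keras this amounts to attaching a kernel regularizer to each layer, which adds lambda times the sum of squared weights to the training loss. The lambda value below is a placeholder, not the one used in the project.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

layer = layers.Dense(16, kernel_regularizer=regularizers.l2(1e-4))  # placeholder lambda
_ = layer(tf.zeros((1, 8)))   # build the layer so the weights exist
print(layer.losses)           # [lambda * sum(w ** 2)], added to the training loss
```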
ReLU Activation
ReLU was used because it is a nonlinear function that was found to greatly accelerate (e.g. by a factor of 6 in Krizhevsky et al.) the convergence of stochastic gradient descent compared to the sigmoid/tanh functions. This is likely due to its linear, non-saturating form and its inexpensive operations in both the forward and backward passes.
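The difference in saturation behavior is easy to see with a one-point gradient check (a toy illustration, not project code):

```python
import tensorflow as tf

x = tf.constant([6.0])
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    r = tf.nn.relu(x)      # max(0, x)
    s = tf.sigmoid(x)
print(tape.gradient(r, x).numpy())  # [1.]      -- ReLU does not saturate for positive inputs
print(tape.gradient(s, x).numpy())  # ~[0.0025] -- sigmoid gradient nearly vanishes here
```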
Pooling Layer
Max pooling was chosen as the pooling layer to reduce the number of computations by downsampling the feature maps while retaining the strongest activations. It was chosen mostly on the basis of prior literature, although it has been shown to be easily replaceable with a strided convolutional layer [https://arxiv.org/pdf/1412.6806.pdf], as sketched below.
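The snippet below shows both a 2x2 max pool and the strided-convolution replacement described in the cited paper; the layer sizes are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 128, 128, 32))
print(layers.MaxPooling2D(pool_size=2)(x).shape)                  # (1, 64, 64, 32)
# All-convolutional alternative: a strided convolution learns its own downsampling.
print(layers.Conv2D(32, 3, strides=2, padding="same")(x).shape)   # (1, 64, 64, 32)
```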
Image Generator
The image generator was used to produce mini-batches on the fly and save memory during training. The generator class I wrote also applies data augmentation to enlarge the training set, such as clipping, rotating, flipping, and adding noise; a minimal sketch of the same idea appears below.
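The actual class lives in the repository; this is only a minimal NumPy sketch of the same idea (flip, rotate, add noise, yield fixed-size batches), with all parameter values assumed.

```python
import numpy as np


def batch_generator(images, labels, batch_size=20, noise_std=0.01):
    """Yield augmented mini-batches forever (a sketch, not the project's class).

    images: (N, 128, 128, 3) floats in [0, 1]; labels: (N,) integer classes.
    """
    n = len(images)
    while True:
        idx = np.random.choice(n, batch_size, replace=False)
        batch = images[idx].copy()
        for i in range(batch_size):
            if np.random.rand() < 0.5:                  # random horizontal flip
                batch[i] = batch[i, :, ::-1, :]
            batch[i] = np.rot90(batch[i], np.random.randint(4))   # random 90-degree rotation
            batch[i] += np.random.normal(0.0, noise_std, batch[i].shape)  # additive noise
        yield np.clip(batch, 0.0, 1.0), labels[idx]
```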
Results
Using mini-batches of 20 (rather than 32 or another power of 2, due to memory constraints) with the Adam optimizer, the network reached an accuracy of 85.0%.
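A hedged sketch of the training setup, reusing build_model and batch_generator from the sketches above; the number of classes, epochs, and dummy data are placeholders rather than the project's actual configuration.

```python
import numpy as np
import tensorflow as tf

# Dummy data so the sketch runs end to end; replace with the real bottle images/labels.
train_images = np.random.rand(200, 128, 128, 3).astype("float32")
train_labels = np.random.randint(0, 5, size=200)

model = build_model(num_classes=5)          # from the architecture sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(batch_generator(train_images, train_labels, batch_size=20),
          steps_per_epoch=len(train_images) // 20,
          epochs=5)
```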
References
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR 15. https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014). Striving for Simplicity: The All Convolutional Net. https://arxiv.org/pdf/1412.6806.pdf
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet Classification with Deep Convolutional Neural Networks. NIPS.