White Paper

How To Build A Deep Learning Classification System For Less Than $1000 USD


Deep learning is set to alter the machine vision landscape in a big way. It is enabling new applications and disrupting established markets. As a product manager with FLIR, I have the privilege of visiting companies across a diverse range of industries; every company I visited this year is working on deep learning. It’s never been easier to get started, but where do you begin? This article will provide an easy-to-follow guide to building a deep learning inference system for less than $1000.

What is deep learning inference?

Inference is the use of a deep-learning-trained neural network to make predictions on new data. Inference is far better at answering complex and subjective questions than traditional rules-based image analysis. By optimizing networks to run on low-power hardware, inference can be run “on the edge”, near the data source. This eliminates the system’s dependence on a central server for image analysis, leading to lower latency, higher reliability, and improved security. 

Selecting the hardware

The goal of this guide is to build a reliable, high-quality system to deploy in the field. While it is beyond the scope of this guide, combining traditional computer vision techniques with deep learning inference can deliver high accuracy and computational efficiency by leveraging the strengths of each approach. The Aaeon UP Squared-Celeron-4GB-32GB single-board computer (SBC) has the memory and CPU power required for this approach. Its x64 Intel CPU runs the same software as traditional desktop PCs, simplifying development compared to ARM-based SBCs.

The code that enables deep learning inference uses branching logic; dedicated hardware can greatly accelerate the execution of this code. The Intel® Movidius™ Myriad™ 2 Vision Processing Unit (VPU) is a very powerful and efficient inference accelerator. Its small size and low power consumption enable it to be integrated into the Intel® Neural Compute Stick or the Aaeon AI Core micro PCIE add-on board for Aaeon UP2 SBCs.

Accurate, high-confidence inference is dependent on high-quality input data. The FLIR BFS-U3-16S2C-C camera features a Sony Pregius IMX273 sensor for clear images even in challenging lighting conditions. Blackfly S cameras have a rich set of onboard features for precision triggering and image pre-processing.

Setting up the software

There are many free tools available for building, training and deploying deep learning inference models. This project uses the Ubuntu 16.04 operating system (http://releases.ubuntu.com/16.04/), which is supported by the widest range of tools and backed by a large and active user base providing a wealth of support resources. This example uses an array of free and open-source software. Installation instructions for each software package are available on its respective website. This guide assumes you are familiar with the basics of the Linux console.

Fig. 1. Deep learning inference workflow and the associated tools for each step.

TensorFlow is a popular open-source software library widely used for deep learning applications. It provides a simple-to-use Python API, enabling users to easily build and train deep neural networks. Installation instructions are available from TensorFlow (https://www.tensorflow.org/install/install_linux). As the UP Squared board does not have an Nvidia GPU, the variant without Nvidia GPU support should be installed.

Bazel is a free tool used to build the required TensorFlow tools for graph conversion. Installation instructions are available from the developer (https://docs.bazel.build/versions/master/install-ubuntu.html).

Converting a neural network to Movidius™ format and uploading it to the Myriad 2 VPU requires the Neural Compute Stick Software Development Kit (NCSDK). Installation instructions for Linux are available from Intel® (https://github.com/movidius/ncsdk).

FLIR’s Spinnaker SDK (https://www.ptgrey.com/support/downloads) is a GenICam API library used to control FLIR machine vision cameras.

This example uses Google’s excellent TensorFlow for Poets tutorial (https://codelabs.developers.google.com/codelabs/tensorflow-for-poets) as a starting point. Clone the git repository for this tutorial to download the scripts used in this example.

git clone https://github.com/googlecodelabs/tensorflow-for-poets-2
cd tensorflow-for-poets-2

Build, train and deploy a flower classifier 

In this example, you will train a neural network to classify several different types of common flowers. To do this, we use MobileNet. MobileNet is a type of deep neural network which achieves accurate results and high power efficiency on mobile devices. It is ideal for deployment to the Myriad 2 VPU.

Download a high-quality dataset

The most important requirement for training a neural network is having a good set of labelled training data. Both quality and quantity are important. In addition to image quality, the labels on the data must also be accurate and free of noise. Noisy label data occurs when some labels are less relevant to the images than others.

Fig. 2. Well labelled (A) and poorly labelled (B) training data for “flowers”. While both sets of images contain flowers, the images in set B are less relevant to the “flower” label than set A.

Depending on the quality and quantity of the training data, pre-processing may be required. The dimensions of the dataset images can have a big impact on the time required to train a network and on the speed at which it operates once deployed. All images should have the same aspect ratio and dimensions.
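As a rough illustration of bringing images to a common aspect ratio and size, the following sketch center-crops an image to a square and resizes it with nearest-neighbour sampling. The function name center_crop_and_resize and the 224 px target are illustrative assumptions rather than part of the tutorial's scripts; a production pipeline would more likely use an imaging library with proper interpolation.

```python
import numpy as np


def center_crop_and_resize(image, size=224):
    """Center-crop to a square, then resize to size x size (nearest neighbour)."""
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    square = image[top:top + side, left:left + side]
    # Map every output pixel back to its nearest source pixel.
    index = np.arange(size) * side // size
    return square[index][:, index]


# Example: a synthetic 480 x 640 RGB frame becomes 224 x 224.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
resized = center_crop_and_resize(frame)
```

Cropping before resizing keeps the aspect ratio of the subject intact, at the cost of discarding the edges of wide or tall images.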

Optional pre-processing can enhance the speed of training and the accuracy of the final model. Normalization of data ensures each image has a similar distribution of pixel brightness values.

Fig. 3. Image data normalized using the y = (x - x.mean()) / x.std() method can yield significant improvements in training speed and accuracy.
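The normalization formula in the caption can be sketched with NumPy as follows. The helper name standardize and the guard for constant images are assumptions added for illustration; the random test image simply stands in for a real photo.

```python
import numpy as np


def standardize(image):
    """Return (x - x.mean()) / x.std() as float32, per image."""
    x = image.astype(np.float32)
    std = x.std()
    if std == 0:
        # A constant image has no contrast; only remove the mean.
        return x - x.mean()
    return (x - x.mean()) / std


# Stand-in for a real 224 x 224 RGB image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3)).astype(np.uint8)
normalized = standardize(image)
```

After this step each image has approximately zero mean and unit standard deviation, regardless of how bright or dark the original exposure was.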

Data augmentation quickly expands the size of a dataset by applying affine transformations such as scaling, rotating, and shearing to images. Augmentation can improve the performance of networks trained on a limited dataset by exposing them to more variations during training.

Fig. 4. Affine transformations (B) of a source image (A) can quickly generate additional variants of images to expand training datasets.
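A minimal NumPy sketch of affine augmentation is shown below. It is not the method used by the tutorial's scripts; the helper names affine_matrix and apply_affine, the parameter values, and the nearest-neighbour sampling are all illustrative assumptions. Dedicated imaging libraries offer faster, better-interpolated equivalents.

```python
import numpy as np


def affine_matrix(scale=1.0, angle_deg=0.0, shear=0.0):
    """2 x 2 matrix combining uniform scaling, rotation and shear."""
    theta = np.deg2rad(angle_deg)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    shear_matrix = np.array([[1.0, shear], [0.0, 1.0]])
    return scale * rotation @ shear_matrix


def apply_affine(image, matrix):
    """Warp an image about its centre using nearest-neighbour sampling."""
    height, width = image.shape[:2]
    centre = np.array([(height - 1) / 2.0, (width - 1) / 2.0])[:, None]
    rows, cols = np.indices((height, width))
    out_coords = np.stack([rows.ravel(), cols.ravel()]) - centre
    # Inverse mapping: find the source pixel for every output pixel.
    src = np.rint(np.linalg.inv(matrix) @ out_coords + centre).astype(int)
    valid = ((src[0] >= 0) & (src[0] < height) &
             (src[1] >= 0) & (src[1] < width))
    output = np.zeros(image.shape, dtype=image.dtype)
    flat_out = output.reshape(height * width, -1)   # view into output
    flat_in = image.reshape(height * width, -1)
    flat_out[valid] = flat_in[src[0][valid] * width + src[1][valid]]
    return output


# Generate a few variants of one source image (values are arbitrary).
source = np.arange(32 * 32 * 3, dtype=np.uint8).reshape(32, 32, 3)
variants = [
    apply_affine(source, affine_matrix(scale=1.2)),
    apply_affine(source, affine_matrix(angle_deg=15.0)),
    apply_affine(source, affine_matrix(shear=0.2)),
]
```

Pixels that map outside the source image are left black here; cropping or mirroring the border are common alternatives.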

The dataset used here is a widely used example. The quality and quantity of the data are such that it can be used without further processing. Download the dataset from TensorFlow.org.

curl http://download.tensorflow.org/example_images/flower_photos.tgz \
    | tar xz -C tf_files
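Before training, it can be worth confirming that the archive unpacked as expected. The sketch below counts images per class, assuming the dataset is laid out as one sub-directory per label under tf_files/flower_photos; the helper name count_images_per_class and the list of file suffixes are illustrative assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_dir):
    """Return {class_name: image_count} for each label sub-directory."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for item in class_dir.iterdir()
                if item.suffix.lower() in IMAGE_SUFFIXES)
    return counts


if __name__ == "__main__":
    dataset = Path("tf_files/flower_photos")
    if dataset.is_dir():
        for name, count in count_images_per_class(dataset).items():
            print(f"{name}: {count}")
```

A heavily unbalanced count between classes is an early warning that the retrained model may favour the over-represented labels.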

Train your model using transfer learning

To train a neural network on the dataset, use transfer learning, which takes a pre-trained image classifier and retrains it to recognize new objects using a base task as a starting point. This is much faster than training a network from scratch. Many of the hidden layers in a deep neural network are used for feature extraction and don’t need to be retrained from the base task to adapt the network to the new task.
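The idea can be sketched in plain NumPy: a frozen feature extractor is applied once, and only a new final classification layer is trained on its outputs. This is a toy illustration under stated assumptions, not the tutorial's retrain script; the random projection stands in for the pre-trained MobileNet layers, the data is synthetic, and all names (FROZEN_WEIGHTS, extract_features, train_final_layer) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen, pre-trained layers: a fixed random projection
# followed by ReLU. These weights are never updated during training.
FROZEN_WEIGHTS = rng.normal(size=(64, 32)) / 8.0


def extract_features(inputs):
    return np.maximum(inputs @ FROZEN_WEIGHTS, 0.0)


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_final_layer(features, labels, num_classes, steps=500, lr=0.1):
    """Fit only the new classification layer with gradient descent."""
    weights = np.zeros((features.shape[1], num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(steps):
        probs = softmax(features @ weights + bias)
        grad = (probs - one_hot) / len(labels)   # cross-entropy gradient
        weights -= lr * features.T @ grad
        bias -= lr * grad.sum(axis=0)
    return weights, bias


# Synthetic two-class data: class 1 inputs are shifted relative to class 0.
inputs = rng.normal(size=(200, 64))
labels = np.repeat([0, 1], 100)
inputs[labels == 1] += 1.0

# Features are computed once and reused at every training step.
features = extract_features(inputs)
weights, bias = train_final_layer(features, labels, num_classes=2)
predictions = softmax(features @ weights + bias).argmax(axis=1)
accuracy = (predictions == labels).mean()
```

Because the frozen layers never change, their outputs can be cached, which is the role the bottleneck directory plays in the retraining command later in this guide.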

Accuracy is balanced against speed by adjusting the input image size and the size parameters of the model. Larger networks with higher resolution input images are more accurate, but they require more computing power and run slower. For this example, use input images of 224 px x 224 px and a model that is 0.5 times the size of the largest possible MobileNet model.
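To make the trade-off concrete, the sketch below builds the architecture name for a given model size and input size, and estimates compute relative to the largest configuration. The naming pattern and the lists of accepted sizes reflect the tutorial's retrain script as recalled here and should be checked against the script itself. The cost estimate assumes multiply-adds scale roughly with the square of the width multiplier and with the pixel count, which is an approximation, not a benchmark.

```python
# Width multipliers and input sizes assumed to be accepted by the retrain script.
WIDTH_STRINGS = {1.0: "1.0", 0.75: "0.75", 0.5: "0.50", 0.25: "0.25"}
INPUT_SIZES = (128, 160, 192, 224)


def mobilenet_architecture(width_multiplier, image_size):
    """Build an architecture name such as 'mobilenet_0.50_224'."""
    if width_multiplier not in WIDTH_STRINGS:
        raise ValueError("unsupported width multiplier")
    if image_size not in INPUT_SIZES:
        raise ValueError("unsupported input size")
    return "mobilenet_{}_{}".format(WIDTH_STRINGS[width_multiplier], image_size)


def relative_cost(width_multiplier, image_size, base_size=224):
    """Approximate compute relative to the 1.0-width, 224 px model."""
    return width_multiplier ** 2 * (image_size / base_size) ** 2


architecture = mobilenet_architecture(0.5, 224)
cost = relative_cost(0.5, 224)
```

Under these assumptions, halving the width multiplier cuts the estimated compute to about a quarter, which is why the 0.5 model is a reasonable starting point for low-power hardware.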


To monitor the progress of training, launch the TensorBoard tool.

tensorboard --logdir tf_files/training_summaries &

Now you are ready to start retraining your MobileNet.

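# Input size and MobileNet variant chosen above (0.5 width, 224 px input).
IMAGE_SIZE=224
architecture="mobilenet_0.50_${IMAGE_SIZE}"

# Retrain the final layer on the flower photos downloaded earlier.
python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=500 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/"${architecture}" \
  --output_graph=tf_files/retrained_graph.pb \
  --output_labels=tf_files/retrained_labels.txt \
  --architecture="${architecture}" \
  --image_dir=tf_files/flower_photos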

This process will take time; you can keep an eye on it using the TensorBoard tool. Once the model has been retrained it is saved as tf_files/retrained_graph.pb. You can test it on a sample image from the training dataset.

python -m scripts.label_image \
    --graph=tf_files/retrained_graph.pb \
    --image=tf_files/flower_photos/daisy/21652746_cc379e0eea_m.jpg

To run the model on new images that are not part of the original data set, point the --image flag to the location of the image you want to use. If the new image is a different size than the images in the dataset, you will need to add the --input_size=${IMAGE_SIZE} flag to the previous shell command or resize it manually.

Optimize and deploy your model to a Movidius™ Myriad 2 VPU

Before it can be deployed to the Intel® Movidius™ Myriad™ 2, the retrained model must be optimized for Intel’s NCSDK. This is done using TensorFlow and Bazel.

# Build the tool once from the TensorFlow source root:
#   bazel build tensorflow/tools/graph_transforms:transform_graph
/home/username/Public/Projects/tensorflow/tensorflow/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/home/username/Public/Projects/tf_poets/tf_files/retrained_graph.pb \
--out_graph=/home/username/Public/Projects/tf_poets/tf_files/optimized_graph.pb \
--inputs='input' \
--outputs='final_result' \
--transforms='
strip_unused_nodes(type=float, shape="1,224,224,3")
remove_nodes(op=Identity, op=CheckNumerics, op=PlaceholderWithDefault)'

Bazel makes it easy to benchmark the performance of your model. This step is not required, but it can be useful to help you optimize more complex models.

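bazel run tensorflow/tools/benchmark:benchmark_model -- \
--graph=/home/username/Public/Projects/tf_poets/tf_files/optimized_graph.pb --input_layer='input' --input_layer_shape='1,224,224,3' --input_layer_type='float' --output_layer='final_result'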

The resulting graph file can now be converted into a Movidius™-supported format and uploaded to the Myriad 2 VPU.

mvNCCompile -s 12 tf_files/optimized_graph.pb -in=input -on=final_result

Running Inference on camera input

Now that the graph file has been uploaded to the VPU, you are ready to run inference on the images captured by the FLIR Blackfly S camera. This script (https://flir.box.com/s/v3idtzmruojmtcbzy7xamja7pgbey03z) calls the Spinnaker API to acquire images, which are resized to match the network input and normalized. The images are then passed to the Movidius VPU, where they are classified by the neural network you just trained.

python3 lic2.py --graph graph_filename  --labels labels_camera_3classes.txt
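The linked script is not reproduced here. As a rough sketch of the steps it describes, the code below resizes and normalizes a frame, sends it to the VPU, and reports the top labels. It assumes the NCSDK 1.x Python API (mvncapi with EnumerateDevices, Device, AllocateGraph, LoadTensor and GetResult); later SDK versions use different names, so verify against the installed version. The mean and standard deviation of 128 and the helper names preprocess, top_predictions and classify_on_ncs are illustrative assumptions, and camera acquisition through Spinnaker is left out: the frame is taken to be a NumPy array.

```python
import numpy as np


def preprocess(frame, size=224, mean=128.0, std=128.0):
    """Resize (nearest neighbour) and normalize a frame for the network."""
    rows = np.arange(size) * frame.shape[0] // size
    cols = np.arange(size) * frame.shape[1] // size
    resized = frame[rows][:, cols].astype(np.float32)
    # The NCSDK 1.x API is assumed to expect half-precision input.
    return ((resized - mean) / std).astype(np.float16)


def top_predictions(scores, labels, k=3):
    """Return the k highest-scoring (label, score) pairs."""
    order = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in order]


def classify_on_ncs(frame, graph_path, labels):
    """Run one frame through a compiled graph on the first VPU found."""
    # Imported lazily so the helpers above work without the NCSDK installed.
    from mvnc import mvncapi as mvnc

    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Movidius device found")
    device = mvnc.Device(devices[0])
    device.OpenDevice()
    try:
        with open(graph_path, "rb") as graph_file:
            graph = device.AllocateGraph(graph_file.read())
        try:
            graph.LoadTensor(preprocess(frame), "frame")
            scores, _ = graph.GetResult()
        finally:
            graph.DeallocateGraph()
    finally:
        device.CloseDevice()
    return top_predictions(scores, labels)
```

In a live system the device and graph would be opened once and reused for every frame, since allocating the graph is far slower than a single inference.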

Conclusion & Next Steps

There has never been a better time to get started with deep learning. The Intel® Movidius™ Myriad™ 2 VPU provides an easy path to deploying high-accuracy deep learning inference to the edge. With inexpensive hardware and free tools from Google and Intel®, it is possible to build an inference-based inspection system for under $1000.

In the near future, deploying deep learning will be even easier and will require a smaller footprint. At this year’s VISION Stuttgart show, FLIR will be demonstrating the industry’s first inference-enabled camera. The upcoming FLIR Firefly® will have the Myriad™ 2 processor pre-integrated into the camera. You will be able to deploy your trained neural network for on-camera inference.