Build your own Image Classifier in less time than it takes to bake a pizza

Using the TensorFlow Inception model as a base to retrain a custom set of image classifications


In the past couple of years, large companies including Google, Facebook, Microsoft, and Amazon have been releasing libraries, frameworks, and services that enable other businesses to build machine learning (ML) models. What’s great about these frameworks is that it’s now cheaper and faster to run a machine learning experiment for your business.

Building useful machine learning models often takes a lot of data — thousands of examples — as well as a lot of time to prep the data in a format that is appropriate for the system. The content needs to be carefully curated and high quality. This isn’t always easy to come by.

With the availability of ML libraries and frameworks, you can take a model that has already been trained on a large dataset and retrain it with your smaller dataset to prototype your own ML use case.

It’s akin to buying an uncooked pizza with pre-made dough, sauce, and cheese, then bringing it home, adding your own toppings, and cooking it in your oven.

With Google’s release of TensorFlow, the world was given access to a powerful library for numerical computation using data flow graphs, built for machine learning and deep neural network research. It can solve problems like handwriting recognition, image recognition, language translation, and speech recognition. For this example, we focus on image recognition and use TensorFlow to explore how these “recognizers” work in practice.

The TensorFlow site provides a great suite of tutorials. I used the image retraining article How to Retrain Inception’s Final Layer for New Categories to start building a custom image classifier based on Inception. Inception is the codename for a deep convolutional neural network (CNN) architecture that achieved state-of-the-art results for classification and detection on the ImageNet dataset.

The advantage of doing image retraining, instead of training a classifier from scratch, is that we can take advantage of Transfer Learning.

Transfer learning is less a technique and more a useful property of deep neural networks that we can leverage to train models with less data. For example, a deep neural network trained to translate between English and French can be retrained to translate to another language with less data, because the higher-level layers have already captured mathematical relations or abstractions about words.

Rarely does anyone have access to a dataset large and varied enough to train a CNN from scratch. Instead, we only have to retrain the final layer of the neural network, the layer that actually distinguishes our categories of images, while leveraging all the intermediate layers that were trained on ImageNet to recognize higher-level abstract image patterns.
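To make that idea concrete, here is a minimal sketch using the tf.keras API (this is illustrative only; the walkthrough below uses the retrain.py script instead):

import tensorflow as tf

# Load Inception v3 trained on ImageNet, minus its final classification layer.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze all the pretrained intermediate layers

# Attach a new final layer that distinguishes only our categories.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # 2 = our category count
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])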

Source: A Beginner’s Guide To Understanding Convolutional Neural Networks

Setting up TensorFlow: Add your ingredients

  • First you need to have Python 3.5.x and Pip installed.
  • Install TensorFlow, following this guide.
  • Clone the TensorFlow repository:
    git clone https://github.com/tensorflow/tensorflow.git
  • Navigate to the directory that contains the source for running a retraining process against the Inception model:
    cd ./tensorflow/tensorflow/examples/image_retraining/
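Before preparing any data, it’s worth a quick sanity check that TensorFlow imports correctly (a minimal check of my own, not part of the tutorial):

import tensorflow as tf

# If the install succeeded, this prints the installed version. This
# walkthrough was written against the TensorFlow 1.x line.
print(tf.__version__)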

Prepare your training data: Assemble the pizza

In order to teach your model to recognize new categories of images, we need to collect a set of images per category. We will use most of these to retrain Inception to recognize the new categories, and reserve a smaller portion for testing our model’s accuracy. A quick and easy way to acquire images of a category is to use Google Image search and a Chrome extension called Gallerify.

Apple or Ball?

Inspired by a conversation with Jamie, our communications director and mother of a toddler, about how ML is in some ways like teaching a young child, I wanted to build a classifier that could distinguish between red balls and red apples. Her toddler had learned what a ball was but was pointing at an apple saying “ball”. He needed more training examples to correctly classify these two objects.

So after issuing Google Image searches for “red balls” and “red apples”, I used the Gallerify extension to download the images and placed them into two folders, one named for each category. You can do the same with whatever images you are using.

Next, create a training_data/ subfolder in the image_retraining/ directory, and place the two per-category image folders inside it. The Inception retraining process uses each directory name as the label for the category of images it contains. Then move some of the images from each category into a test_data/ folder that won’t be used during the retraining process (a helper script for this split is sketched below).

Folder organization for training data
Examples of training data for “red apples”
Examples of training data for “red balls”
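If you’d rather script that split than move files by hand, a minimal helper might look like this (the folder names match the layout above; the 20% hold-out fraction is just an illustrative choice):

import os
import random
import shutil

def hold_out_test_images(training_dir="training_data",
                         test_dir="test_data", fraction=0.2):
    # Move a random fraction of each category's images into test_data/
    # so they are never seen during retraining.
    for category in os.listdir(training_dir):
        src = os.path.join(training_dir, category)
        if not os.path.isdir(src):
            continue
        dst = os.path.join(test_dir, category)
        os.makedirs(dst, exist_ok=True)
        images = os.listdir(src)
        random.shuffle(images)
        for name in images[:int(len(images) * fraction)]:
            shutil.move(os.path.join(src, name), os.path.join(dst, name))

hold_out_test_images()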

Important to note: We’re using this image data as examples from which Inception’s CNN will build a mathematical model that differentiates features in the images. The only explicit distinction we’re making is the label associated with each image (taken from the name of the folder containing it) and our choice of a multi-class CNN classifier. We’re not coding any specific rules or logic that distinguishes the two classes.

Because of this, it’s really important that the data we provide as training examples is clean, representative, and provides good coverage of all the types of input we expect to use in order to get accurate results from our model. If the training data doesn’t accurately represent the types of data we expect the model to predict against in the future, we could be introducing error or inaccuracy into the model inadvertently. For example, I removed an image of a red ball cartoon game character because it wasn’t a good representation of red balls.

Retrain the model: Bake the pizza

With our training data set up, we’re now ready to retrain the Inception model. We’ll do this by executing the Python module retrain.py in the image_retraining/ directory:

python retrain.py --how_many_training_steps 500 --output_graph=./retrained_graph.pb --output_labels=./retrained_labels.txt --image_dir=./training_data

This will run the retraining algorithm with mostly default options, but you can control its behavior by specifying additional command line arguments. You can get a full listing by using the -h flag:

python retrain.py -h

The program will output logs to the console through the training process:

INFO:tensorflow:Looking for images in 'Red Apple'
WARNING:tensorflow:WARNING: Folder has less than 20 images, which may cause issues.
INFO:tensorflow:Looking for images in 'Red Ball'
WARNING:tensorflow:WARNING: Folder has less than 20 images, which may cause issues.
>> Downloading inception-2015-12-05.tgz 100.0%
INFO:tensorflow:Successfully downloaded inception-2015-12-05.tgz 88931400 bytes.
Extracting file from ./inception-march-3-2018/inception-2015-12-05.tgz
Model path: ./inception-march-3-2018/classify_image_graph_def.pb

First, it looks for images in your training data folders; then it downloads a version of a previously trained Inception model, since we’re only retraining the last layer. You will see the script starting to create bottleneck files:

INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_142.jpg_inception_v3.txt
INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_143.jpg_inception_v3.txt
INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_144.jpg_inception_v3.txt
INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_145.jpg_inception_v3.txt
INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_146.jpg_inception_v3.txt
INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_147.jpg_inception_v3.txt
INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_149.jpg_inception_v3.txt

“Bottleneck” is an informal term for the layer just before the final output layer that actually does the classification. By default the bottleneck files are stored in the /tmp/bottleneck directory (the run above wrote them to a local ./bottlenecks directory, presumably via the script’s --bottleneck_dir flag), and if you rerun the script they’ll be reused so you don’t have to wait for this part again.
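To make “bottleneck” concrete: it’s the 2,048-value feature vector Inception computes for an image just before classification. Here is a sketch of extracting one yourself, using tf.keras rather than retrain.py’s internals (the image path is one of the training files from above):

import numpy as np
import tensorflow as tf

# Inception v3 without its final layer; the output is the 2048-dimensional
# "bottleneck" feature vector for an image.
model = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")

img = tf.keras.preprocessing.image.load_img(
    "training_data/Red Apple/redapple_142.jpg", target_size=(299, 299))
x = tf.keras.applications.inception_v3.preprocess_input(
    np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), axis=0))

bottleneck = model.predict(x)
print(bottleneck.shape)  # (1, 2048)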

If you encounter a bad apple 🍎, with an error stack trace like the one below:

INFO:tensorflow:Creating bottleneck at ./bottlenecks/Red Apple/redapple_136.jpg_inception_v3.txt
Not a JPEG file: starts with 0x89 0x50
Traceback (most recent call last):
File "retrain.py", line 1486, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/Users/barnhart/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "retrain.py", line 1294, in main
export_model(model_info, class_count, FLAGS.saved_model_dir)
File "/Users/barnhart/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1214, in __exit__
exec_type, exec_value, exec_tb)
File "/Users/barnhart/anaconda/envs/py35/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/Users/barnhart/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3625, in get_controller
yield default
File "retrain.py", line 1187, in main
bottleneck_tensor, FLAGS.architecture)
File "retrain.py", line 500, in cache_bottlenecks
resized_input_tensor, bottleneck_tensor, architecture)
File "retrain.py", line 442, in get_or_create_bottleneck
bottleneck_tensor)
File "retrain.py", line 397, in create_bottleneck_file
str(e)))
RuntimeError: Error during processing file ./training_data/Red Apple/redapple_136.jpg (Invalid JPEG data, size 4068[[Node: DecodeJpeg_1 = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_DecodeJPGInput_0)]]

… it means the image is not actually a valid JPEG file; the 0x89 0x50 signature in the message is in fact the start of a PNG header (Gallerify possibly saved the image in a different format than its extension suggests). Try deleting that image from your training data and picking a replacement. You can rerun the same retrain script, and it will pick up where it left off.
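Rather than discovering these one crash at a time, you can scan the training folders up front for files that aren’t really JPEGs. Here’s a minimal check of my own, based on the JPEG magic bytes:

import os

def find_non_jpegs(root="training_data"):
    # Real JPEG files begin with the two bytes 0xFF 0xD8; anything else
    # (a PNG saved with a .jpg extension, for example) will trip retrain.py.
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if f.read(2) != b"\xff\xd8":
                    print("Not a JPEG:", path)

find_non_jpegs()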

Once it’s able to create bottleneck files for all your training data images, it can begin to train the last layer of Inception:

INFO:tensorflow:2018-03-05 12:09:05.076961: Step 0: Train accuracy = 61.0%
INFO:tensorflow:2018-03-05 12:09:05.077692: Step 0: Cross entropy = 0.599025
INFO:tensorflow:2018-03-05 12:09:05.213300: Step 0: Validation accuracy = 51.0% (N=100)
INFO:tensorflow:2018-03-05 12:09:06.192184: Step 10: Train accuracy = 100.0%
INFO:tensorflow:2018-03-05 12:09:06.192382: Step 10: Cross entropy = 0.253409
INFO:tensorflow:2018-03-05 12:09:06.296140: Step 10: Validation accuracy = 100.0% (N=100)
INFO:tensorflow:2018-03-05 12:09:07.352922: Step 20: Train accuracy = 100.0%
INFO:tensorflow:2018-03-05 12:09:07.353142: Step 20: Cross entropy = 0.156156
INFO:tensorflow:2018-03-05 12:09:07.464078: Step 20: Validation accuracy = 100.0% (N=100)
INFO:tensorflow:2018-03-05 12:09:08.432503: Step 30: Train accuracy = 100.0%
INFO:tensorflow:2018-03-05 12:09:08.432636: Step 30: Cross entropy = 0.103196
INFO:tensorflow:2018-03-05 12:09:08.526025: Step 30: Validation accuracy = 100.0% (N=100)
INFO:tensorflow:2018-03-05 12:09:09.487891: Step 40: Train accuracy = 100.0%
INFO:tensorflow:2018-03-05 12:09:09.488104: Step 40: Cross entropy = 0.085478

Every 10 training steps, the script reports the following:

  • Training accuracy: How well the classifier works against the training set
  • Cross entropy: A loss function that measures how well the learning is progressing (see the sketch after this list)
  • Validation accuracy: How well the classifier works against a reserved set of images for testing
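For intuition, the cross entropy for a single example is just the negative log of the probability the model assigned to the correct label:

import numpy as np

def cross_entropy(predicted_probs, true_index):
    # Negative log-probability of the true label: near 0 when the model
    # is confidently right, large when it is confidently wrong.
    return -np.log(predicted_probs[true_index])

print(cross_entropy(np.array([0.99, 0.01]), 0))  # confident and correct: ~0.01
print(cross_entropy(np.array([0.51, 0.49]), 0))  # barely sure: ~0.67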

After 500 steps, you’ll see log messages indicating that your model is trained and saved:

INFO:tensorflow:2018-03-05 12:09:55.249556: Step 499: Train accuracy = 100.0%
INFO:tensorflow:2018-03-05 12:09:55.249698: Step 499: Cross entropy = 0.007347
INFO:tensorflow:2018-03-05 12:09:55.341729: Step 499: Validation accuracy = 100.0% (N=100)
Model path: ./inception-march-3-2018/classify_image_graph_def.pb
INFO:tensorflow:Restoring parameters from /tmp/_retrain_checkpoint
INFO:tensorflow:Final test accuracy = 100.0% (N=5)
Model path: ./inception-march-3-2018/classify_image_graph_def.pb
INFO:tensorflow:Restoring parameters from /tmp/_retrain_checkpoint
INFO:tensorflow:Froze 2 variables.
Converted 2 variables to const ops.
Model path: ./inception-march-3-2018/classify_image_graph_def.pb
INFO:tensorflow:Restoring parameters from /tmp/_retrain_checkpoint
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'/tmp/saved_models/1/saved_model.pb'

Testing the model: Taste the pizza

Ok, so we have a model saved to disk. Now what?

We can test our model with images we excluded from the training process, or find completely new ones, and see how well our model generalizes — this is a model’s ability to correctly categorize previously unseen images. We’ll also see how the algorithm returns a statistical confidence associated with each category for a particular image it’s trying to classify.

But first, we need some code that reads a new image from disk (its path specified as a command line argument), loads our trained model from disk, and applies the model to the image to get predictions. A small label_image.py Python script accomplishes this:
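The original post embedded that script as a gist. A minimal sketch along the same lines, written against the TensorFlow 1.x API of the time and assuming retrain.py’s default tensor names (final_result, DecodeJpeg/contents) plus the output file names from the retrain command above:

import sys
import tensorflow as tf

# Path to the image to classify, passed as the first command line argument.
image_path = sys.argv[1]
image_data = tf.gfile.GFile(image_path, "rb").read()

# Category labels written out by retrain.py, one per line.
labels = [line.rstrip() for line in tf.gfile.GFile("retrained_labels.txt")]

# Load the retrained graph from disk.
with tf.gfile.GFile("retrained_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session() as sess:
    # "final_result" is retrain.py's default name for the retrained layer.
    softmax_tensor = sess.graph.get_tensor_by_name("final_result:0")
    predictions = sess.run(softmax_tensor,
                           {"DecodeJpeg/contents:0": image_data})
    # Print each label with its score, highest first.
    for i in predictions[0].argsort()[::-1]:
        print("%s (score = %.5f)" % (labels[i], predictions[0][i]))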

Our first test subject (redapple_003.jpg)

Now we can run the script and test our first image by executing the command:

python label_image.py ~/Dev/experiment/tensorflow/ic_test_data/red_apple/redapple_003.jpg

which results in:

red apple (score = 0.99650)
red ball (score = 0.00350)

Our model is 99.6% confident that this is a red apple, and only 0.35% sure that it could be a red ball. It’s pretty obviously an apple, so that’s good. But you can see that a probability distribution is returned, so the end system may have to apply a threshold (like > 75% confidence) if it needs to return a single answer. For example, let’s try a harder image:

A trickier test subject (redapple_ball.jpg)

For this image that looks like an artificial apple toy, our classifier results are:

red apple (score = 0.60262)
red ball (score = 0.39738)

This image is definitely more confusing as it contains elements of both apples and balls, and it may be incorrect to classify it one way or the other.

When building a software application that makes use of these types of models, it’s important to handle these cases according to the functional requirements of the use case. We might program a threshold so that, if our confidence in a particular category isn’t high enough, we tell the user we can’t classify the image with enough accuracy. Or perhaps we return the three most likely labels. One thing we’ll definitely want to do is flag these types of inputs for our model builders to review, since they’re edge cases we may want to include in the training data for the next version of the model.
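As a sketch of that application-side handling (the 75% threshold is the illustrative figure from earlier, and the review flagging would be wired into your own pipeline):

CONFIDENCE_THRESHOLD = 0.75  # illustrative cutoff from the discussion above

def interpret(scores):
    # scores: dict mapping label -> probability, e.g. the output above.
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= CONFIDENCE_THRESHOLD:
        return "%s (%.1f%% confident)" % (label, confidence * 100)
    # Low confidence: return the top candidates and flag for human review.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    return "unsure; top candidates: " + ", ".join(
        "%s (%.1f%%)" % (l, s * 100) for l, s in ranked)

print(interpret({"red apple": 0.60262, "red ball": 0.39738}))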

A note on unintended consequences

You may have noticed that my training data only contains instances of red apples and red balls. This choice was intentional, and I included the color in the label so that people are aware of the algorithmic bias in this model.

If, however, I had chosen just the labels apple and ball while my training data still contained only red instances, then I would have unintentionally codified the color of the object into the label through the images chosen for the training data. For example, what happens when I submit instances of these objects in different colors?

red ball (score = 0.54413), red apple (score = 0.45587)

In the case of this yellow apple, our classifier returns almost a 50/50 chance that it’s either a ball or an apple. For an ML practitioner, this outcome is expected when you look at the training set: there are no instances of yellow apples, and most instances are real photographs of apples, not clipart-style solid-color apples.

red ball (score = 0.56656), red apple (score = 0.43344)

Here we see the classifier is still a bit unsure whether this should be a red ball or an apple; it leans slightly toward red ball, but without great confidence. If we were trying to classify balls of any type and color, then we’d certainly want to add this image, and other color variations, to our training set.

red ball (score = 0.95317), red apple (score = 0.04683)

Interestingly, the “Red Planet” is 95% likely to be a red ball, which in an abstract sense is true. But this just reiterates that it’s still important to use the model only for the cases it was trained on.

Transfer learning is a good place to start

As we’ve seen here, taking an existing deep learning model and retraining its final classification layer is a good way to start your first machine learning experiments. If you have a problem that involves image recognition, speech recognition, or language translation, and a smaller dataset you’d like to try it with in practice, then TensorFlow is an effective way to rapidly prototype models and use cases.

Want to find out how you can start using machine learning in your business? Our robots can help… and by robots, we mean the talented data science experts at Robots and Pencils. 🤖 👋

Randall Barnhart: teacher of machines, lover of humans, data science practice lead at Robots and Pencils.
