Brad Porter

How I created the SeeFood (Not Hotdog) app in just a few hours

Spoiler alert! Back in 2017, the TV show Silicon Valley had a gag where one of the characters was tasked with creating an image recognition app. Dubbed the “Shazam” for food, its purpose was to give the user the ability to take a picture of any type of food and immediately be told what it was. The punchline was that the creator (Jìan-Yáng) couldn’t be bothered to train the app on every type of food in existence, so he instead decided to make an app that could only detect whether a piece of food was a hotdog or not a hotdog. You can see the clip here (beware some foul language).

I was inspired to recreate this app since it would be a quick and easy model to train, and it might end up being fun to play around with. In this post I plan to go over and explain the Python code. Be sure to check the links at the end for more in-depth info, as well as the running application that you can test out!

To preface, the reason this could be accomplished so quickly, and in so few lines of code, is that fast.ai offers a library that abstracts away a lot of the complex inner workings of PyTorch while presenting interfaces that simplify much of the work behind the scenes, such as fine-tuning models. Generally it works very well. It's not as good as hand-tuning via lower-level functions, but if you are just beginning deep learning, it's a great starting point.

So the first step is to install the necessary dependencies, which are pretty much just fast.ai and DuckDuckGo’s search library; we will use the latter to automatically download the images we wish to tune our model with.

pip install -U fastai duckduckgo_search

Now we can create our Python script, which I’m calling create_model.py.

First we perform our imports and create a couple of helper functions:

from duckduckgo_search import ddg_images
from fastai.vision.all import download_images, resize_images, verify_images, get_image_files, ImageBlock, \
   CategoryBlock, RandomSplitter, parent_label, ResizeMethod, Resize, vision_learner, resnet18, error_rate, \
   L, Path, DataBlock


def search_images(search_term, max_images=30):
    print(f"Searching for '{search_term}'")
    # Return a list of image URLs from a DuckDuckGo image search
    return L(ddg_images(search_term, max_results=max_images)).itemgot('image')


def search_and_populate(search_term, category, file_path, max_images=30):
    # Download the search results into a folder named after the category label,
    # then shrink them so training doesn't have to deal with huge files
    dest = (file_path/category)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{search_term} photo', max_images=max_images))
    resize_images(file_path/category, max_size=400, dest=file_path/category)

We have one function named search_images(), which performs a DuckDuckGo search for a given term and returns up to max_images image URLs. The other function, search_and_populate(), calls the first one, then downloads the results and dumps them all into a given category folder. This is necessary because we will be separating our results by folder in order to establish the labels needed for model tuning. Essentially we will end up with a set of hotdog images and a set of not_hotdog images.
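If you want to double-check the search helper before downloading anything, a quick snity check like the following (just an illustrative snippet, not part of the original script) prints the first URL it finds:

# Sanity check: fetch a single search result and print its URL
urls = search_images("hotdog", max_images=1)
print(urls[0])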

So we go ahead and pull a bunch of hotdog images, and then even more not-hotdog images across various other kinds of food, since one would assume that people will try cross-referencing other food items when testing the model:

path = Path('seefood')
search_and_populate("hotdog", "hotdog", path, max_images=90)
for o in ['burger', 'sandwich', 'fruit', 'chips', 'salad']:
    search_and_populate(o, "not_hotdog", path, max_images=30)

Be aware that downloading these images without verifying them manually could allow inaccurate images to muddy your results. One relevant example in this case is this dog in a hotdog costume.

(Image: a dog in a hotdog costume that showed up in the search results.)

While it's an understandably funny error, I’ve gone ahead and excluded it from the rest in order to avoid the risk of dogs being recognized as hotdogs during later testing. (I will also leave a link at the end of this blog to a Kaggle notebook in which I go through this same process and also show how you can visually weed out low-confidence images without having to check them all manually.)

After that, we need to clear out any downloaded files that fail to verify as images, since broken or invalid files would cause problems when we start tuning:

failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
print(f"{len(failed)} failed images")

Then we simply create a DataBlock, an object used to build both Datasets and DataLoaders. A Dataset stores our image samples with their labels applied, while a DataLoader wraps a Dataset to make it iterable and easier to navigate.

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2),
    get_y=parent_label,
    item_tfms=[Resize(256, ResizeMethod.Squish)]
).dataloaders(path, bs=32)

(Note that we ran into errors running the dataloaders command on Intel-based Macs and weren’t able to complete the process via fast.ai.)
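If the DataLoaders build successfully, it can be worth a quick check that the folder-based labels came through as expected before training. This is optional (and easiest to view in a notebook), and not part of the original script:

# Confirm the two category labels were picked up from the folder names
print(dls.vocab)

# Display a small sample of labeled images from a batch
dls.show_batch(max_n=6)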

Thanks to the magic of fast.ai we can now fine tune our model:

learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
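As mentioned earlier, the Kaggle notebook linked at the end shows how I visually weed out low-confidence images. One way to get a similar quick look here (a rough sketch using fastai's interpretation helpers, not the exact notebook code) is to plot the validation images the model got most wrong:

from fastai.interpret import ClassificationInterpretation

# Show the highest-loss validation images; misleading downloads
# (like the dog in a hotdog costume) tend to surface here
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(10, 10))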

And finally, we can export it as a pickle file to be used later in our application:

learn.export("hotdogModel.pkl")
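Before wiring the model into an app, you can confirm the exported file loads and predicts correctly. A minimal smoke test might look like this, assuming you have some local test image such as test_hotdog.jpg (a hypothetical file name, not part of the project):

from fastai.vision.all import load_learner, PILImage

# Reload the exported learner and run a single prediction
learn_check = load_learner("hotdogModel.pkl")
prediction, index, probabilities = learn_check.predict(PILImage.create("test_hotdog.jpg"))
print(prediction, probabilities[index])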

Okay! So now we can load the model into a simple Gradio application to quickly test things out.

See app.py below:

from fastai.vision.all import *
import gradio as gr

learn = load_learner("hotdogModel.pkl")


def classify_image(image: Image):
    # Predict and return the matching result image ("is_hotdog.png" or "not_hotdog.png")
    prediction, index, probability = learn.predict(image)
    return "is_hotdog.png" if prediction == "hotdog" else "not_hotdog.png"


examples = ["examples/hotdog.jpg", "examples/burger.jpg", "examples/sandwich.jpg", "examples/dog.jpg",
           "examples/hotdog_dog.jpg", "examples/fancy_hotdog.jpg"]

iface = gr.Interface(fn=classify_image, inputs="image", outputs="image", examples=examples, allow_flagging="never")
iface.launch(inline=False)

It loads the model and creates a simple interface that displays some example images while also allowing you to upload your own.

The results are pretty impressive and it’s exciting to see how accurate image recognition can be nowadays.

See some examples of the app running below:

(Screenshots of the app classifying example images in the Gradio interface.)

While this is obviously a silly use case without much real-world value, it was a fun project to spend an afternoon on, and it points to some real opportunities. It’s really amazing how easy it can be to implement something more useful, such as distinguishing humans from cars for surveillance, detecting animal breeds, or creating the actual Shazam for food. I hope you found some inspiration, and I look forward to any feedback you may send my way.

Although I haven’t had much luck doing so, if you want to play with the app and try to fool it, check this link: https://huggingface.co/spaces/ZettaFi/SeeFood

If you would like to learn more from the code I’ve shown above and see how you can manually prune high-error-rate images, you can find a Python notebook explaining all of this here: https://www.kaggle.com/bradporter/seefood-hotdog-detection-training

As well as the source code for both create_model.py and app.py here: https://github.com/Zettafi/see-food-example

If you are interested in learning more about deep learning, I recommend fast.ai’s course: https://course.fast.ai/