Just wanted to say thanks for the warm welcome from HN when the app was released last month — I hope this blogpost answers the questions that were raised back then.
I’d be happy to answer anything else you’d like to know!
I'm making an app that takes pictures and tries to tell you whether the food in the picture contains allergens. I didn't know whether I should feel humbled or just laugh (I decided it was hilarious in the end), but it made me aim higher at a hackathon last weekend. I also use your app in my elevator pitch to help people understand mine.
Can you speak to the origins of it being 'not hotdog'?
Did you, or the writing staff or other consultants, start with hotdogs or dick pics?
It made me laugh because it reminded me of this popular lecture(o) that was, and is, passed around tech circles
Any relation? Or fun coincidence?
(o) https://youtu.be/uJnd7GBgxuA?t=5555s .. the lecture was given by Andrej Karpathy on 2016-09-24, and the timestamp points to the part where he discusses an interface built to have humans compete with conv nets at identifying hotdogs
While we’re here and chatting about this, I should say most of the credit for this app should really go towards the following people:
Mike Judge, Alec Berg, Clay Tarver, and all the awesome writers that actually came up with the concept: Meghan Pleticha (who wrote the episode), Adam Countee, Carrie Kemper, Dan O’Keefe (of Festivus fame), Chris Provenzano (who wrote the amazing “Hooli-con” episode this season), Graham Wagner, Shawn Boxee, Rachele Lynn & Andrew Law…
Todd Silverstein, Jonathan Dotan, Amy Solomon, Jim Klever-Weis and our awesome Transmedia Producer Lisa Schomas for shepherding it through and making it real!
Our kick-ass production designers Dorothy Street & Rich Toyon.
Meaghan, Dana, David, Jay, Jonathan and the entire crew at HBO that worked hard to get the app published (yay! we did it!)
I'm glad I'm not the only one with questions about the external GPU. I had considered trying that, but came to the conclusion that the data transfer between CPU and GPU would be too slow for ML tasks.
So, what is your opinion on this? If you had to do it again, would you use the eGPU, or just use AWS or another GPU cloud service?
My takeaway is that local development has a huge developer experience advantage when you are going through your initial network design / data wrangling phase. You can iterate quickly on labeling images, develop using all your favorite tools/IDEs, and dealing with the lack of official eGPU support is bearable. Efficiency-wise it's not bad: as far as I could tell, the bottleneck ended up being on the GPU, even on a 2016 MacBook Pro with Thunderbolt 3 and tons of data augmentation done on CPU. It's also a very lengthy phase, so it helps that it's a lot cheaper than cloud.
When you get into the final, long training runs, I would say the developer experience advantages start to come down, and not having to deal with the freezes/crashes or other eGPU disadvantages (like keeping your laptop powered on in one place for an 80-hour run) makes moving to the cloud (or a dedicated machine) become very appealing indeed. You will also sometimes be able to parallelize your training in such a way that the cloud will be more time-efficient (if still not quite money-efficient). For Cloud, I had my best experience using Paperspace [0]. I’m very interested to give Google Cloud’s Machine Learning API a try.
If you’re pressed for money, you can’t do better than buying a top of the line GPU once every year or every other year, and putting it in an eGPU enclosure.
If you want the absolute best experience, I’d build a local desktop machine with 2–4 GPUs (so you can do multiple training runs in parallel while you design, or do a faster, parallelized run when you are finalizing).
Cloud doesn't quite make sense to me until the costs come down, unless you are 1) pressed for time and 2) will not be doing more than one machine-learning training run in your lifetime. Building your own local cluster becomes cost-efficient after 2 or 3 AI projects per year, I'd say.
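For intuition on where that cost crossover sits, here is a back-of-envelope sketch; the hardware and cloud prices below are illustrative placeholders, not quotes from any provider:

```python
# Back-of-envelope break-even between renting a cloud GPU and owning
# one. All prices are illustrative placeholders, not real quotes.
GPU_PRICE = 700.0        # assumed: top-of-the-line consumer GPU, USD
ENCLOSURE_PRICE = 300.0  # assumed: eGPU enclosure, one-time cost
CLOUD_RATE = 0.65        # assumed: cloud GPU instance, USD per hour

def breakeven_hours(local_capex: float, cloud_rate_per_hour: float) -> float:
    """Training hours after which owning beats renting."""
    return local_capex / cloud_rate_per_hour

hours = breakeven_hours(GPU_PRICE + ENCLOSURE_PRICE, CLOUD_RATE)
print(f"break-even after ~{hours:.0f} GPU-hours")
# A single 80-hour run is a small fraction of that; a few full
# projects per year (many runs each) cross the line quickly.
```

Under these made-up numbers the hardware pays for itself after roughly 1,500 GPU-hours, which is why a couple of serious projects a year tips the balance toward local.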
It's interesting how amenable image classification neural networks are to the "take working model, peel off last layer or two, retrain for a new application" approach. I've seen this suggested as working pretty well in a few instances.
I guess the interpretation is that the first few normalize->convolution->pool->dropout layers are basically achieving something broadly analogous to the initial feature extraction steps that used to be the mainstay in this area (PCA/ICA, HOG, SIFT/SURF, etc.), and are reasonably problem-independent.
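That intuition can be sketched in a few lines of numpy: freeze a stand-in "backbone" and train only a new final layer on a toy task. The backbone here is a random projection, purely for illustration, not a real pretrained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a frozen feature extractor.
# (A random projection here, purely for illustration; in a real
# network these would be the early conv layers' learned weights.)
W_frozen = rng.normal(size=(64, 32)) / np.sqrt(64)

def features(x):
    # The "peeled" network body: applied, but never updated.
    return np.maximum(x @ W_frozen, 0.0)

# Toy binary task standing in for the new application.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b):
    p = sigmoid(features(X) @ w + b)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Only the new "last layer" gets trained (logistic head, plain GD).
w_head, b_head = np.zeros(32), 0.0
initial = loss(w_head, b_head)
for _ in range(300):
    F = features(X)
    p = sigmoid(F @ w_head + b_head)
    w_head -= 0.1 * F.T @ (p - y) / len(y)
    b_head -= 0.1 * np.mean(p - y)
final = loss(w_head, b_head)
print(f"head-only training loss: {initial:.3f} -> {final:.3f}")
```

Even with the backbone completely fixed, retraining the head alone drives the loss down, which is the whole premise of the peel-and-retrain approach.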
For sure, although I should say, for this specific instance I ended up training a network from scratch. I did get inspiration from the MobileNets architecture, but I did not keep any of the weights from their ImageNet training. That was shockingly affordable to do even on my very limited setup, and the results were better than what I could do with a retraining (mostly has to do with how finicky small networks can be when it comes to retraining).
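For context on why MobileNets-style architectures are so cheap, the core idea is replacing standard convolutions with depthwise-separable ones; a quick parameter count (generic layer sizes, not the app's actual architecture) shows the savings:

```python
# Parameter count: a standard k x k convolution vs. the depthwise-
# separable convolution MobileNets-style networks are built from.
# Layer sizes below are generic examples, not the app's real layers.
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in * c_out

def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1x1 conv that mixes channels
    return depthwise + pointwise

std = standard_conv_params(3, 128, 128)
sep = separable_conv_params(3, 128, 128)
print(std, sep, round(std / sep, 1))  # roughly 8x fewer parameters
```

That ~8x reduction per layer is a big part of why training such a small network from scratch was feasible on a limited setup.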
Yes, that’s what you see in the picture, although as completely personal advice, I would stop short of recommending it. For one there are arguably better cases out there now, and you can sometimes build your own eGPU rig for less. Finally, the Mac software integration (with any eGPU) is very hacky at the moment despite the community’s best efforts, and I had to deal with a lot of kernel panics and graphics crashes, so overall I’m not sure I would recommend others attempt the same setup.
Nice write-up that should become the go-to tutorial for TF and local training. It helped me a lot with the mobile part; it was a bit strange to think about transferring the training when I first read it, but it became clear on a second reading.
Pretty fascinating and encouraging to see how much was accomplished with a laptop and consumer GPU. Gave me some great ideas. Also happy to see Chicago dogs properly identified.
One of my primary motivations for writing this blog post was to show exactly how one can use TensorFlow to ship a production mobile application. There's certainly a lot of material out there, but a lot of it is either light on details or only fit for prototypes/demos. There was quite a bit of work involved in making TensorFlow work well on a variety of devices, and I'm proud we managed to get it down to just 50MB or so of RAM usage (network included), and a very low crash rate. Hopefully things like CoreML on iOS and TensorFlow Lite on Android will make things even easier for developers in the future!
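As a rough guide to where that memory goes, weight storage alone can be estimated from the parameter count; the 4M figure below is an illustrative guess, not the app's actual network:

```python
# Rough weight-memory estimate for a shipped network. The parameter
# count is an illustrative guess, not the app's actual model.
def weights_mb(n_params: int, bytes_per_weight: int = 4) -> float:
    return n_params * bytes_per_weight / (1024 ** 2)

print(f"float32: {weights_mb(4_000_000):.1f} MB")
print(f"8-bit:   {weights_mb(4_000_000, 1):.1f} MB")
```

Runtime RAM then adds activations, buffers, and the inference runtime itself on top of the raw weights, which is why a total in the tens of MB is already a tight result.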
This is amazing - impressed by your persistence to source the training data yourself, that must have been tedious!
Did you try quantizing the parameters to shrink the model size some more? If so, how did it affect the results? It also runs slightly faster on mobile from my experience.
Great question — I did not, because I had unfortunately spent all of my data on that last training run, and I did not have an untainted dataset left to measure the impact of quantization on. (Just poor planning on my part, really.)
It’s also my understanding at the moment that quantization does not help with inference speed or memory usage, which were my chief concerns. I was comfortable with the binary size (<20MB) that was being shipped and did not feel the need to save a few more MBs there. I was more worried about accuracy, and did not want to ship a quantized version of my network without being able to assess the impact.
Finally, it now seems that quantization may be best applied at training time rather than at shipping time, according to a recent paper by the University of Iowa & Snapchat [0], so I would probably want to bake that earlier into my design phase next time around.
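For readers curious what quantization does mechanically, here is a minimal numpy sketch of post-training 8-bit linear quantization — a generic illustration, not the TensorFlow tooling the thread is discussing:

```python
import numpy as np

def quantize_8bit(w):
    """Linear (affine) 8-bit quantization of a float tensor."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 255.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(1024, 256)).astype(np.float32)

q, scale, lo = quantize_8bit(w)
w_hat = dequantize(q, scale, lo)

print("bytes:", w.nbytes, "->", q.nbytes)          # 4x smaller on disk
print("max abs error:", np.abs(w - w_hat).max())   # bounded by scale/2
```

The storage win is guaranteed (1 byte per weight instead of 4), but the per-weight error is exactly the accuracy risk you'd want a held-out dataset to measure before shipping.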
At MongoDB World this past week they did a demo of Stitch where they actually built something similar with no back-end code required, using the Clarifai API and an Angular front end. It took like less than 80 minutes and could run on prod if I wanted.
Thanks for the kind words! To prevent impostor syndrome, I should clarify that I worked on the app for many, many months — basically since August of last year — as a nights/weekends thing. It’s true that the final version was built almost from scratch in a few weeks, but it wouldn’t have been possible without the time investment in the preceding months. Although for the most part I just wasted a lot of time because I had no idea what I was doing lol (still don’t)
Yes, I was very excited we were able to release it for Android… And even though we used React Native, there were so many native (and C++) bits, it ended up being quite complex!
As for the gear, I think it’s really damaging that so many people think Deep Learning is only for people with large datasets, cloud farms (and PhDs) — as the app proves, you can do a lot with just data you curate by hand, a laptop (and a lowly Master’s degree :p)
Love this architecture. I think I'm going to adopt some of it for HungryBot, my nonprofit's diet-tracking research arm. I think on-phone prediction solves a lot of my affordability issues.
timanglade | 8 years ago
Original thread: https://news.ycombinator.com/item?id=14347211
Demo of the app (in the show): https://www.youtube.com/watch?v=ACmydtFDTGs
App for iOS: https://itunes.apple.com/app/not-hotdog/id1212457521
App for Android (just released yesterday): https://play.google.com/store/apps/details?id=com.seefoodtec...
x2398dh1 | 8 years ago
https://twitter.com/iotmpls/status/879381125541613568/photo/...
Probably only 20% of the world's hot dogs are just a basic hot dog with mustard on it. Once you move past one or two condiments, the domain of hot-dog identification, fixings and all, gets confusing from a computer-vision standpoint.
Pinterest's similar images function is able to identify hotdogs with single condiments fairly well:
https://www.pinterest.com/pin/268175352794006376/visual-sear...
They appear to be using deep CNNs.
https://labs.pinterest.com/assets/paper/visual_search_at_pin...
Having embedded TensorFlow for on-device identification is all well and good for immediacy and cost, but if I can't reliably tell whether something is a hot dog vs. a long skinny thing with a mustard squiggle, what good does that do me? What would be the next step up in your mind?
I ask this as someone who is sincerely interested in building low-cost, fun projects.
OJFord | 8 years ago
My condiments to the author, I see what you did there ;)
[0]: https://www.paperspace.com/ml
[0]: https://arxiv.org/abs/1706.03912
kenwalger | 8 years ago
Have a look at their sample PlateSpace app: https://github.com/mongodb/platespace
Very cool new service and some excellent tutorials as well, for example for the PlateSpace web app: https://docs.mongodb.com/stitch/getting-started/platespace-w...
I'd definitely recommend having a look.
SmellTheGlove | 8 years ago
Any chance the full source will ever be opened up? Would be an excellent companion to the article.
timanglade | 8 years ago
In the meantime, if there are any details you'd like to see, don't hesitate to chime in and I'll try to respond with details!
subcosmos | 8 years ago
https://www.infino.me/hungrybot
Great work!
john_borkowski | 8 years ago
How did you source and categorize the initial 150K images of hotdogs & not-hotdogs?