How can AI ID a cat?

187 points| sonabinu | 7 months ago |quantamagazine.org | reply

69 comments

[+] bdcravens|7 months ago|reply
I have six animals, and Apple Photos does a great job of recognizing them by name after I labeled them the first time (the office dog as well). Two of them however are gray tabbies (brothers) and it can't distinguish them, so I had to name them with an ampersand ("Harley & Ralph Lauren")

Impressed that it can do as well as it does, I just find that amusing.

[+] javchz|7 months ago|reply
The same with Google Photos: it groups similar cats as just one. Fun fact: it does the same for human twins.
[+] mshockwave|7 months ago|reply
Came to say Apple also did a great job of tagging my bois, who are both grey-ish cats, even in pictures where they faced backward. No idea how they did that.
[+] megaloblasto|7 months ago|reply
This is a nice article but it fails to mention something important. Beyond the computer magic that makes neural networks so powerful, there is a massive human effort, often from people in Sub-Saharan Africa, who spend all day labeling images, text, audio, etc. for the major AI companies [1]. These workers are often exploited and treated as expendable.

It's not all just math. Real people are what make this work.

[1] https://www.theverge.com/features/23764584/ai-artificial-int...

[+] runjake|7 months ago|reply
> These workers are often exploited and treated as expendable.

So, common ground with a lot of Hacker News audience?

Don't take me too seriously here, and not to excuse anything but what would these people be doing if they weren't data labeling? How would they be treated differently?

Presumably, they'd be working for some other multinational, because overall their quality of living is better than working at whatever other local industry exists?

The data labeling job itself strikes me as something dystopian. As if we're the work mules for our AI overlords.

[+] Noumenon72|7 months ago|reply
If someone has a simple task to do and has scoured the entire globe to find people who can do this task without being pulled away from more important work, they should be praised. Paying Americans prevailing wage for this would be simpler but it would hurt both America and Sub-Saharan Africa.
[+] astrobe_|7 months ago|reply
> These days, computers can easily recognize photos of cats, but that’s not because a clever programmer discovered a way to isolate the essence of “catness.”

It could have been. It did happen in some cases as computer vision didn't wait for neural networks (e.g. OCR). But to hijack a famous quote, "Neural networks are like violence - if it doesn't solve your problems, you are not using enough of it."

> A neuron with two inputs has three parameters. Two of them, called weights, determine how much each input affects the output. The third parameter, called the bias, determines the neuron’s overall preference for putting out 0 or 1.

So a neuron does very basic polynomial interpolation and by hooking them together you get polynomial regression. I don't know if it is amusing or amazing that people use polynomial regression to write programs now.
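
To make the quoted description concrete, here is a minimal sketch of a two-input neuron with its three parameters. The step activation and the specific weight and bias values are assumptions chosen for illustration; they are not taken from the article.

```python
def neuron(x1, x2, w1, w2, bias):
    """Two-input neuron: two weights plus one bias, three parameters in total."""
    # Weighted sum of the inputs, shifted by the bias (linear in x1 and x2).
    z = w1 * x1 + w2 * x2 + bias
    # Step activation (an assumption): output 1 if the sum is positive, else 0.
    return 1 if z > 0 else 0


# Illustrative parameters: the neuron fires only when x1 + x2 exceeds 1.5.
W1, W2, BIAS = 1.0, 1.0, -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [neuron(a, b, W1, W2, BIAS) for a, b in inputs]
print(outputs)
```

With these values the neuron behaves like a logical AND: a more negative bias makes it prefer 0, a less negative one makes it prefer 1.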

[+] bc569a80a344f9c|7 months ago|reply
> So a neuron does very basic polynomial interpolation and by hooking them together you get polynomial regression

The article glosses over activation functions, which - if non-polynomial - give the entire neural network non-linearity. A major inflection point was proving that neural network architectures with very few hidden layers (as few as one) can approximate any continuous function on a bounded domain, given enough neurons.

https://en.m.wikipedia.org/wiki/Universal_approximation_theo...
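
A small sketch of why the activation function matters, using one-dimensional "layers". ReLU is an assumption here (the comment does not name an activation), and the weights are arbitrary illustrative values. Without an activation, stacked layers collapse into a single linear map; with one, two hidden units can already represent a non-linear function such as |x|.

```python
def linear(w, b, x):
    """One-dimensional linear layer: w * x + b."""
    return w * x + b


def relu(z):
    """Rectified linear unit, a common non-polynomial activation."""
    return max(0.0, z)


def stacked_linear(x):
    # Two linear layers with no activation in between.
    return linear(3.0, 1.0, linear(2.0, -1.0, x))


def collapsed(x):
    # The same function as a single layer: 3 * (2x - 1) + 1 = 6x - 2.
    return linear(6.0, -2.0, x)


def abs_net(x):
    # One hidden layer with two ReLU units: relu(x) + relu(-x) equals |x|.
    return relu(linear(1.0, 0.0, x)) + relu(linear(-1.0, 0.0, x))


for x in [-2, -1, 0, 1, 3]:
    print(x, stacked_linear(x), collapsed(x), abs_net(x))
```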

[+] bc569a80a344f9c|7 months ago|reply
An interesting follow-up is using various xAI (explainable AI) techniques to then investigate what features in an image the classifier uses to make its decisions. Saliency maps work great for images. When I was playing around with it, the binary classifier I trained from scratch to distinguish cats from dogs ended up basically only looking at eyes. Enough images in the dataset featured cats with visible, open eyes, and the vertical slit is an excellent predictor. It was an interesting lesson that also emphasized how much the training data matters.
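
A rough sketch of the idea behind such maps, using occlusion sensitivity (blank one pixel at a time and measure the score drop), which is a simpler relative of gradient-based saliency and not necessarily the method used above. The `toy_score` function is a hypothetical stand-in for a trained classifier, hard-wired to look only at two "eye" pixels.

```python
def toy_score(image):
    """Hypothetical stand-in for a trained classifier's 'cat' score."""
    # This toy model only looks at the two 'eye' pixels in the top row.
    return 0.5 * image[0][0] + 0.5 * image[0][2]


def occlusion_saliency(score_fn, image, fill=0.0):
    """Score drop when each pixel is blanked; a larger drop means a more important pixel."""
    base = score_fn(image)
    saliency = []
    for r, row in enumerate(image):
        out_row = []
        for c in range(len(row)):
            occluded = [list(values) for values in image]  # copy, input stays untouched
            occluded[r][c] = fill
            out_row.append(base - score_fn(occluded))
        saliency.append(out_row)
    return saliency


image = [
    [1.0, 0.2, 1.0],
    [0.3, 0.8, 0.3],
    [0.1, 0.4, 0.1],
]
saliency = occlusion_saliency(toy_score, image)
for row in saliency:
    print(row)
```

Only the two corner pixels of the top row get a non-zero value, which mirrors the "it only looks at the eyes" finding in miniature.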
[+] cco|7 months ago|reply
ExAI feels like a better shortening, both for clarity and given that xAI is a company already.
[+] krackers|7 months ago|reply
This article seemed really basic, no insight other than "it learns the high dimensional manifold on which cat images lie, thus separating cats from non-cats" (not that simple explanations are bad, but Quanta articles seem to be getting more watered down over time).

The real question is whether we can get some insight as to how exactly it's able to do this. For convolutional neural networks it turns out that you can isolate and study the behavior of individual circuits and try to understand what "traditional image processing" function they perform, and that gives some decent intuition: https://distill.pub/2020/circuits/ - CNNs become less mysterious when you break them down as being decomposed into "edge detectors, curve detectors, shape classifiers, etc."

For LLMs it's a bit harder, but Anthropic did some research in this vein.

[+] cmpalmer52|7 months ago|reply
Just an anecdote, but back in college, I had an algorithms professor who gave us a classifier problem like the square and triangle boundary problem. His English was poor and nobody understood the problem as he stated it. I got an okay score on it, but never understood it very well.

Anyway, it’s 40 years later and I just read this article and said, “Oh! Now I get it.” A little too late, for Dr. Hippe’s class.

[+] kridsdale3|6 months ago|reply
I sometimes wonder how much better my grades in college could have been, or what advanced math I could have picked up which I abandoned, if my professors had had basic English skills. I'm sure they were great scientists, but assigning them to teach was not helping anyone.
[+] Findecanor|7 months ago|reply
Identification has two components: recognition and authentication.

I'm not an expert on neural networks, but from all I've heard, current systems can only be trained to be really good at doing the former.

I once used to have a tabby cat. When it ran away, I put up posters with a picture and description. I got several calls about cats in the neighbourhood that had the same tabby colour scheme (recognition). And from a distance they indeed looked the same. But close up, they each had a different eye colour, colour of the nose, or length of its white "socks" on its paws. (authentication)

To do the second step, the system would need to be trained not just on raw pixel data but also on which features to look for to distinguish one cat from another. I think that current systems could be brute-forced to do this, somewhat, by training also on negative examples ... but I feel like that is suboptimal.

[+] BobbyTables2|7 months ago|reply
Wasn’t it “The Hitchhiker’s Guide to the Galaxy” that humorously described an AI-controlled train system failing because it was looking at the clock instead of the trains?

Seems extremely prescient…

[+] spacecadet|7 months ago|reply
Many years ago one of our cats got out, she was gone for 3 weeks, we tracked her down using 6 game cameras. Long story short, I have 200,000 images of "wild life"... Last year I used a VLM to catalog all of the images by generating detailed descriptions. I was able to find images of our cat in 3 searches, the same images we used to identify her originally, which took hours each day combing through thousands of images.
[+] isopede|7 months ago|reply
Neat. Anyone know what is used to make the animations? I like the graphic design!
[+] cwmoore|7 months ago|reply
Small but effective visual cues, smooth and carefully chromatic.

I am struck by the conceptual framework of classification tasks so snappily rendering clear categories from such fuzziness.

[+] busymom0|7 months ago|reply
Probably one of the first articles on this topic which I have read to the finish line and understood everything fully. Thanks.
[+] globalnode|7 months ago|reply
Same here, I've never done any study of these things other than learning a bit about gradient descent out of interest. But the idea that these networks work as classifiers by figuring out boundary regions was more interesting than I previously believed.
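
A minimal sketch of how gradient descent can find such a boundary, assuming a single sigmoid neuron trained with the logistic loss. The six 2D points, the learning rate, and the epoch count are made-up illustrative values, not data from the article.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(points, labels, lr=0.5, epochs=2000):
    """Fit w1, w2, b by full-batch gradient descent on the logistic loss."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            # Derivative of the loss with respect to the pre-activation sum.
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        # Step against the average gradient.
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(params, point):
    w1, w2, b = params
    # The decision boundary is the line w1*x1 + w2*x2 + b = 0.
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else 0


# Two linearly separable clusters (illustrative values).
points = [(0, 0), (0, 1), (1, 0), (2, 2), (2, 3), (3, 2)]
labels = [0, 0, 0, 1, 1, 1]

params = train(points, labels)
predictions = [predict(params, p) for p in points]
print(params, predictions)
```

The learned parameters define a straight line between the two clusters; deeper networks combine many such boundaries into more complicated regions.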
[+] reilly3000|7 months ago|reply
Long have I wanted a cat door that would only open for my cats, not the mean neighborhood one that eats their food. I can’t be the only one. I’ve been meaning to try to build one with a camera, rPi and Google Coral, but never got around to it. There’s the matter of the locking mechanism and more.
[+] DannyBee|7 months ago|reply
I have built two of these for dogs. It's really not hard, whether you go completely from scratch or use something premade.

If you want something mostly premade, go get an autoslide. If you want to do it completely from scratch:

1. RFID/Bluetooth proximity is much easier to work with than camera + rpi + AI. For the use case you are talking about, AI is not just overkill; it will make your goal actively harder to achieve.

2. Locking is pretty easy depending on motor mechanism - either a cheap relay'd magnetic lock, or simply a motor that can't be backdriven easily.

Motor wise, you can either use the rack and pinion style that autoslide does, or a simple linear motor if you don't want to deal with gear tracks.

Overall, I went the autoslide route and had it all set up and working in an hour or two.
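
For the from-scratch route, the control logic on the tag side can be sketched roughly as follows. This is a hedged illustration only: the `DoorController` class, the tag IDs, and the five-second hold time are all hypothetical, and the actual reader and lock hardware I/O is left out.

```python
class DoorController:
    """Unlocks for allow-listed tags and relocks after a hold time."""

    def __init__(self, allowed_tags, hold_seconds=5.0):
        self.allowed_tags = set(allowed_tags)
        self.hold_seconds = hold_seconds
        self.unlock_until = 0.0  # timestamp until which the door stays unlocked

    def on_tag_read(self, tag_id, now):
        """Handle one reader event; returns True if the tag is allowed."""
        if tag_id in self.allowed_tags:
            self.unlock_until = now + self.hold_seconds
            return True
        return False  # unknown tags never extend the unlock window

    def is_unlocked(self, now):
        return now < self.unlock_until


# Hypothetical tag IDs; timestamps are passed in so the logic is easy to test.
door = DoorController(allowed_tags={"TAG-PET-1", "TAG-PET-2"})
door.on_tag_read("TAG-PET-1", now=100.0)  # known tag: opens the unlock window
door.on_tag_read("TAG-STRAY", now=101.0)  # unknown tag: ignored
print(door.is_unlocked(102.0))
```

Passing the current time in explicitly, instead of reading a clock inside the class, keeps the relock behaviour deterministic when testing.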

[+] darkwater|7 months ago|reply
That's the definition of (entertaining) overengineering: since every house cat should have an RFID chip already, there are doors that use that already. 4 AA batteries, "low-tech" enough, it just works.
[+] dehrmann|7 months ago|reply
Take a look at SureFlap and OnlyCat. They use RFID chips in the cats.
[+] Findecanor|7 months ago|reply
Long ago I read about an automatic cat door that operated simply on the colour of the cat. It worked because the cat was the only red cat in the neighbourhood.
[+] StrandedKitty|7 months ago|reply
For some reason I thought this article would explain how to ID a specific cat, that is basically facial recognition for cats.

Is this even something that's possible with current tech? Like, surely cats have some facial features that can be used to uniquely identify them? It would be cool to have a global database of all cats that users would be able to match their photos against. Imagine taking a picture of a cat you see on the street, and it immediately tells you the owner's details and whether it's missing.

[+] tanelpoder|7 months ago|reply
I wrote the CatBench vector search playground toy app exactly for this reason! [1] ("cat-similarity search for recommendation engines and cat-fraud detection"). I built it both for learning & fun, but also it's useful for demoing vector search functionality, plugged in to regular RDBMS application schemas in business context. I used cats & dogs as it's something everyone understands, instead of diving deep into some narrow industry vertical specific use case.

[1]: https://tanelpoder.com/posts/catbench-vector-search-query-th...

[+] dhosek|7 months ago|reply
I imagine when they run out of other sensors to add to our phones, they’ll add chip readers so you can just scan for the implanted microchip on a cat you encounter. (said semi-sarcastically since the tech requires close proximity between animal and reader which most cats you encounter on the street will not countenance)
[+] joshvm|7 months ago|reply
Yes, I've worked in this space for dogs (for re-identifying animals that have been vaccinated for rabies). It's a very difficult problem, but mostly because getting/scraping good training data is difficult. You really want lots of paired images of the same animal and that's hard compared to searching for "cat". Plus the usual challenges: animals don't like to stay still so getting good pictures is hard and users must have good guidance for lighting/pose to get the best results. Human facial recognition benefits from strong commercial interest and the most robust methods rely on extras like 3D scanning.

Tricks include facial alignment + cropping and very strong constraints on orientation to make sure you have a good frontal image (apps will give users photo alignment markers). Otherwise it's a standard visual search. Run a face extraction model to get the crop, warp to standard key points, compute the crop embedding, store in a database and do a nearest neighbour lookup.
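
The last step of that pipeline, the nearest neighbour lookup, can be sketched as below. The three-dimensional vectors and the `cat_a`/`cat_b`/`cat_c` identities are hypothetical placeholders; a real system would get much longer embeddings from a face model after detection and alignment, and cosine similarity is just one common choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbour(query, database):
    """Return (identity, similarity) of the stored embedding closest to the query."""
    best_id = max(database, key=lambda key: cosine_similarity(query, database[key]))
    return best_id, cosine_similarity(query, database[best_id])


# Hypothetical stored embeddings, one per known animal.
database = {
    "cat_a": [0.9, 0.1, 0.0],
    "cat_b": [0.1, 0.9, 0.1],
    "cat_c": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a new photo.
query = [0.8, 0.2, 0.1]
match, score = nearest_neighbour(query, database)
print(match, round(score, 3))
```

In practice a similarity threshold would also be needed, so that a photo of an unknown animal is reported as "no match" instead of being forced onto the closest stored identity.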

There are a few startups doing this. Also look at PetFace which was a benchmark released a year or so ago. Not a huge amount of work in this area compared to humans, but it's of interest to people like cattle farmers as well.

https://github.com/mapooon/PetFace

[+] trjordan|7 months ago|reply
One of the funny things about LLMs and modern AI is that "the ability to recognize a cat" isn't a trained behavior anymore, as described here. It's an emergent property of training it to predict a lot of things, and cats happen to be present enough in the data that they're one of the things you can ask a larger model about and have it work.

My favorite work on digging into the models to explain this is Golden Gate Claude [0]. Basically, the folks at Anthropic went digging into the many-level, many-parameter model and found the neurons associated with the Golden Gate Bridge. Dialing it up to 11 made Claude bring up the bridge in response to literally everything.

I'm super curious to see how much of this "intuitive" model of neural networks can be backed out effectively, and what that does to how we use it.

[0] https://www.anthropic.com/news/golden-gate-claude

[+] npteljes|7 months ago|reply
Fun fact: we keep rabbits, and the different random AIs that I have tried over the years classify them so often as cats, that a proper "rabbit" classification is rare to come by! The full versions of ChatGPT do it well now, even with trickier photos (when the rabbit keeps their ears flat for example).
[+] Veliladon|7 months ago|reply
I have a Finnish Lapphund dog and from the right angle AI thinks it's a cat.
[+] wkat4242|7 months ago|reply
Well that's pretty easy. AI is trained on internet content and it's not like there's a lack of cat pictures there lol