Two_hands | 3 months ago

That's a cool idea, thanks for sharing! It's great to see other uses for a model I built for a completely different task.

Is there any research/papers on this type of autism diagnosis tools for babies?

To your last point, yes, I agree. Even the task I set up the model for is relatively easy compared to proper gaze tracking; I just rely on large datasets.

I suppose you could do it in the way you say and then from that gather data to eventually build out another model.

I'll for sure look into this, appreciate the idea sharing!

_menelaus|3 months ago

Idk of any research, sorry, just going from memory from a few years ago. Feel free to lmk if you ever have any questions about mobile gaze tracking, I spent several years on it. Can you DM on here? Idk.

FYI: I got funding and gathered really big mobile phone gaze datasets and trained CNN models on them that got pretty accurate. Avg err below 1cm.

The whole thing worked like this:

Mech Turk workers got to play a mobile memory game. A number flashed on the screen at a random point for a second while the phone took a photo. Then they had to enter it. If they entered it correctly, I assumed they were looking at that point on the screen when the photo was taken and added it to the gaze dataset. Collecting clean data like this was very important for model accuracy. Data is long gone, unfortunately. Oh, and the screen for the memory game was pure white, which was essential for a reason described below.
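The collection loop above is simple enough to sketch. This is a minimal illustration, not the original code; `flash_and_capture` and `read_user_entry` are hypothetical helpers standing in for the real game UI:

```python
import random

def collect_gaze_sample(flash_and_capture, read_user_entry):
    """One round of the memory-game collection loop (toy sketch).

    flash_and_capture(point) flashes a number at `point` on a white
    screen, takes a photo, and returns (number_shown, photo).
    read_user_entry() returns the number the worker typed afterwards.

    Returns (photo, point) as a labeled gaze sample, or None when the
    entry was wrong -- the worker probably wasn't looking at the flash,
    so the sample is discarded to keep the dataset clean.
    """
    point = (random.random(), random.random())  # normalized screen coords
    number_shown, photo = flash_and_capture(point)
    entered = read_user_entry()
    if entered == number_shown:
        return (photo, point)  # gaze label = flash location
    return None
```

The key design point is the filter at the end: a wrong entry means the label can't be trusted, so nothing is added to the dataset.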

The CNN was a cascade of several models:

First, off the shelf stuff located the face in the image.

A crop of the face was fed to an iris location model we trained. This estimated eye location and size.

Crops of the eyes were fed into 2 more cycles of iris detection, taking a smaller crop and making a more accurate estimation until the irises were located and sized to within about 1 pixel. Imagine the enhance... enhance... trope.

Then crops of the super well centered and size-normalized irises, as well as a crop of the face, were fed together into a CNN, along with metadata about the phone type.

This CNN estimated gaze location, trained on the labels from the memory-game data.
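The coarse-to-fine idea behind the cascade can be shown with a toy simulation. This is just a sketch of the structure, not the actual models: the fake "locator" has an error proportional to its crop size, so re-running it on ever-smaller crops around the current estimate shrinks the absolute error each pass.

```python
import numpy as np

def cascade_locate(true_pos, initial_guess, passes=3, rel_err=0.05,
                   initial_crop=128.0, rng=None):
    """Toy coarse-to-fine cascade (stand-in for the real CNN stages).

    Each pass "crops" around the current estimate and re-runs a locator
    whose error is a fixed fraction of the crop size; halving the crop
    halves the absolute error -- the enhance... enhance... trope.
    All numbers here are made up for illustration.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    est = np.array(initial_guess, dtype=float)
    crop = initial_crop
    for _ in range(passes):
        # a real locator would run a CNN on the crop; we fake its output
        noise = rng.normal(0.0, rel_err * crop, size=2)
        est = np.array(true_pos, dtype=float) + noise
        crop /= 2.0  # tighter crop each round
    return est
```

With `rel_err=0.05` and three passes starting from a 128 px crop, the last pass works on a 32 px crop, so the residual error is on the order of a pixel or two instead of the ~6 px a single pass would give.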

This worked really well, usually, in lighting the model liked. It failed unpredictably in other lighting. We tried all kinds of pre-processing to address lighting, but it was always the Achilles' heel.

To my shock, I eventually realized (too late) that the CNN was learning to find the phone's reflection in the irises, and estimating the phone position relative to the gaze direction using that. So localizing the irises extremely well was crucial. Letting it know what kind of phone it was looking at, and ergo how large the reflection should appear at a certain distance, was too.

Making a model that segments out the phone or tablet's reflection in the iris is just a very efficient shortcut to do what any actually good model will learn to do anyway, and it will remove all of the lighting variation. It's the way to make gaze tracking on mobile actually work reliably without infrared. Never had time to backtrack and do this because we ran out of funding.
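As a crude illustration of the reflection idea: on a dark iris, a bright white screen shows up as the brightest blob, so even plain thresholding can find it in a good-quality crop. A real segmentation model would be far more robust; this is only a minimal sketch assuming a normalized 2D grayscale crop:

```python
import numpy as np

def find_screen_reflection(iris_crop, thresh=0.8):
    """Toy version of 'segment the screen's reflection in the iris'.

    Assumes iris_crop is a 2D float array in [0, 1], tightly centered
    on the iris. Returns the centroid (row, col) of pixels brighter
    than `thresh`, or None if no bright region is found.
    """
    mask = iris_crop >= thresh
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()
```

In practice you would want the corners of the reflected quadrilateral, not just the centroid, since the corners carry the geometry that relates screen position to gaze direction.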

The MOST IMPORTANT thing here is to control what is on the phone screen. If the screen is half black, or dim, or has random darkly colored blobs, it will screw with where the model thinks the phone screen reflection begins and ends. HOWEVER, if your use case allows you to control what is on the screen so it always has, for instance, a bright white border, your problem is 10x easier. The baby autism screener would let you control that, for instance.

But anyway, like I said, to make something that just determines if the baby is looking on one side of the screen or the other, you could do the following:

1. Take maybe 1000 photos of a sampling of people watching a white tablet screen moving around in front of their face.

2. Annotate the photos by labeling visible corners of the reflection of the tablet screen in their irises.

3. Make a simple CNN to place these.

If you can also make a model that locates the irises extremely well, like to within 1px, then making the gaze estimate becomes sort of trivial with that plus the tablet-reflection-in-iris finder. And I promise the iris location model is easy. We trained on about 3000-4000 images of very well labeled irises (with a circle drawn over them as the label) with a simple CNN and got really great sub-pixel accuracy in 2018. That plus some smoothing across camera frames was more than enough.
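For the coarse "which side of the screen" decision the screener needs, the combination described above reduces to a simple geometric comparison. This sketch is my own simplification, not something from the original system, and the sign convention is an assumption (in practice it depends on camera orientation and whether the image is mirrored):

```python
def gaze_side(iris_center_x, reflection_center_x):
    """Toy left/right gaze decision.

    Given a well-localized iris center and the center of the screen's
    reflection in that iris (both in image x-coordinates), the
    reflection's horizontal offset from the iris center indicates which
    side of the screen the eye is pointed at. The mapping of offset
    sign to "left"/"right" here is an arbitrary convention for
    illustration; calibrate it for a real camera setup.
    """
    return "left" if reflection_center_x > iris_center_x else "right"
```

A real version would use the reflection corners, both eyes, and smoothing across frames, but the core signal is this offset.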

Anyway, hope some of this helps. I know you aren't doing fine-grained gaze tracking like this but maybe something in there is useful.

Two_hands|3 months ago

Wow, this is great! You can't DM, but my email is in my blog post, in the footnotes.

Do you remember the cost of Mech Turk? It was something I wanted to use for EyesOff but never could get around the cost aspect.

I need some time to process everything you said, but the EyesOff model has pretty low accuracy at the moment. I'm sure some of these tidbits of info could help to improve the model, although my data is pretty messy in comparison. I had thought of doing more gaze tracking work for my model, but at long ranges it just breaks down completely (in my experience; happy to stand corrected if you've worked on that too).

Regarding the baby screener, I see how this approach could be very useful. If I get the time, I'll look into it a bit more and see what I can come up with. I'll let you know once I get round to it.