They likely went with this form factor because it allows for the camera to look forward and use that context in its responses. A watch wouldn't easily be able to do this.
You could make a smartwatch with a camera facing up from the screen. The user could bring their palm to their chest (so the watch face points outward) to activate it. Then the camera can see forward and the microphone is close to the user's mouth.
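The "palm to chest" trigger could plausibly be detected from the watch's accelerometer alone: in that pose the screen is roughly vertical, so gravity is nearly perpendicular to the screen normal. A minimal sketch, where the axis convention, function name, and threshold are all assumptions rather than any real watch API:

```python
def camera_gesture(accel, tol=0.35):
    """Guess whether the watch is in the 'palm to chest, face out' pose.

    accel: (x, y, z) accelerometer reading in g units, in the watch's
    body frame, where +z is assumed to point out of the screen.
    When the screen is vertical (camera looking forward), gravity is
    roughly perpendicular to z, so the normalized z component is small.
    """
    x, y, z = accel
    mag = (x * x + y * y + z * z) ** 0.5
    if mag == 0:
        return False  # no gravity signal; can't classify the pose
    return abs(z / mag) < tol


# Watch held against the chest, screen vertical: gravity along the arm axis.
print(camera_gesture((0.0, -1.0, 0.0)))  # True
# Watch flat on a table, screen up: gravity along the screen normal.
print(camera_gesture((0.0, 0.0, 1.0)))  # False
```

A real implementation would also debounce over a short window and probably combine this with a wrist-raise event, but the orientation check is the core idea.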
This has the added benefit of giving bystanders a recognizable sign that someone is using the camera... despite proclamations that "the public" is ready to accept being on camera all the time, I'm not convinced that's true when it's someone wearing an overt device that's pointed at you, possibly recording, and you're never quite sure.
If so, they made a big bet. Vision LLMs only appeared this year. Before that, parsing images into a coherent response was pretty resource intensive and not reliable at all. Designing the entire device around image capture for context seems like a very risky approach, so I doubt that was their main reason.
kevinsundar|2 years ago
And then you could do video calls on the same device too.
abeyer|2 years ago
I for one have no particular desire to be part of your "context" (nor the company's training data set) without knowing it.
newaccount74|2 years ago
Anyway, it's cold season now in Europe, and it's interesting how much less useful my Apple Watch is when it's almost always covered by a sleeve.
Something that goes on top of your clothes makes sense.
tomohelix|2 years ago
mlhpdx|2 years ago