This was already posted here: https://news.ycombinator.com/item?id=43221377 but I’m really surprised at the lack of attention this model is getting. The responsiveness and apparent personality are pretty mind blowing. It’s similar to what OpenAI had initially demoed for advanced voice mode, at least for the voice conversation portion.

The demo interactions are recorded, which is mentioned in their disclaimer under the demo UI. What isn't mentioned though is that they include past conversations in the context for the model on future interactions. It was pretty surprising to be greeted with something like "welcome back" and the model being able to reference what was said in previous interactions. The full disclaimer on the page for the demo is:

" 1. Microphone permission is required. 2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days. 3. By using this demo, you are agreeing to our "

edit: Actually, this has been posted quite a few times already and had good visibility a couple of days ago: https://news.ycombinator.com/item?id=43200400 Others: https://hn.algolia.com/?q=sesame.com

hn_user82179|1 year ago

It was genuinely startling how human it felt. Apparently they are planning on open-sourcing some of their work as well as selling glasses (presumably with the voice assistant). I’m very excited to have a voice assistant like this and am almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.

jofzar|1 year ago

I still feel like they don't have the right amount of human to them, maybe it's because I'm Australian and it sounds like I'm hearing an American robot?

Edit: well I asked the "male" model to speak more like an Australian and yep, getting way more uncanny. If it had an Australian accent I think it would mess with me more

igleria|1 year ago

Maybe the ability to personalize the voice so it is more... robotic or based on a fictional thing like Knight Rider would help to change the attachment to something more... healthy?

huijzer|1 year ago

> This was already posted here: https://news.ycombinator.com/item?id=43221377 but I’m really surprised at the lack of attention this model is getting.

I'm surprised by the lack of attention that Gemini 2.0 with native audio output got. They have a demo at https://youtu.be/qE673AY-WEI, which I think is really good too. The main problem with Google's model is that this audio output is not supported by the API, but you can try it at https://aistudio.google.com.

In general, text to speech is pretty good nowadays I think. For example, this is a little math video that I made a few days ago: https://www.youtube.com/watch?v=G1mvLrCfjFM with the (old) Google text to speech API. Honestly, I think the narration is better than I personally could have done. It's calm, well pronounced, and sounds relatively enthusiastic.

moralestapia|1 year ago

>They have a demo at https://youtu.be/qE673AY-WEI

That's not a demo, that's a video. Anyone can make something like that in an afternoon with a couple friends and a microphone.

Also, Google is known for putting out fake "demos", remember the Google Duplex scam?

smusamashah|1 year ago

How do I get to this in aistudio.google.com?

anon373839|1 year ago

It really is an astonishing technological feat! Also note that the largest model they trained is only 8.3B parameters (8B backbone + 0.3B decoder). It's exciting to think that they're going to be releasing this model under an Apache 2.0 license.

Mistletoe|1 year ago

Just realizing how uncanny valley it is to talk to AI and it never remembers anything you said in the past. Imagine if a human did that. It’s like you are talking to Tom Hanks’ Mr. Short Term Memory from SNL over and over.

https://youtube.com/watch?v=C6ufImch00g

micw|1 year ago

It does remember, but you have to ask for it. Try saying "make a bookmark at this point" and later ask for that bookmark. You can even give the bookmark a name or ask it to name it for you.

ekianjo|1 year ago

That can easily be fixed if you attach it to a RAG system

znpy|1 year ago

> 2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days.

Sounds (pun intended) reasonable.