top | item 40345866

gdb | 1 year ago

(I work at OpenAI.)

It's really how it works.

baq|1 year ago

> (I work at OpenAI.)

Winner of the 'understatement of the week' award (and it's only Monday).

Also top contender in the 'technically correct' category.

behnamoh|1 year ago

> Winner of the 'understatement of the week' award (and it's only Monday).

Yes! As soon as I saw gdb I was like "that can't be Greg", but sure enough, that's him.

swyx|1 year ago

and was briefly untrue for like 2 days

Uptrenda|1 year ago

[deleted]

999900000999|1 year ago

How far are we from something like a helmet with ChatGPT and a video camera installed? I imagine this would be awesome for low-vision people. Imagine having a guide tell you how to walk to the grocery store and help you grocery shop without an assistant. Of course there are tons of liability issues here, but this is very impressive.

JieJie|1 year ago

We're planning on getting a phone-carrying lanyard and she will just carry her phone around her neck with Be My Eyes^0 looking out the rear camera, pointed outward. She's DeafBlind, so it'll be bluetoothed to her hearing aids, and she can interact with the world through the conversational AI.

I helped her access the video from the presentation, and it brought her to tears. Now, she can play guitar, and the AI and her can write songs and sing them together.

This is a big day in the lives of a lot of people who aren't normally part of the conversation. As of today, they are.

0: https://www.bemyeyes.com/

silverquiet|1 year ago

It sounds like the system that Marshall Brain envisioned in his novella, Manna.

rfoo|1 year ago

Can't wait for the moment when I can put a single line "Help me put this in the cart" on my product and it magically sells better.

krainboltgreene|1 year ago

> Imagine having a guide tell you how to walk to the grocery store

I don't need to imagine that, I've had it for about 8 years. It's OK.

> help you grocery shop without an assistant

Isn't this something you learn as a child? Is that a thing we need automated?

macintux|1 year ago

Just the ability to distinguish bills would be hugely helpful, although I suppose that's much less of a problem these days with credit cards and digital payment options.

sim7c00|1 year ago

City guide tours, not a bad take tbh :D rather than walking behind the guy with a megaphone and a flag.

jamestimmins|1 year ago

With this capability, how close are y'all to it being able to listen to my pronunciation of a new language (e.g. Italian) and give specific feedback about how to pronounce it like a local?

Seems like these would be similar.

elil17|1 year ago

It completely botched teaching someone to say “hello” in Chinese - it used the wrong tones and then incorrectly told them their pronunciation was good.

dgroshev|1 year ago

I don't think that'd work without a dedicated startup behind it.

The first (and imo the main) hurdle is not reproduction, but just learning to hear the correct sounds. If you don't speak Hindi and are a native English speaker, this [1] is a good example. You can only work on nailing those consonants when they become as distinct to your ear as cUp and cAp are in English.

We can get by by falling back to context (it's unlikely someone would ask for a "shit of paper"!), but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears.

That's because we think we hear things as they are, but it's an illusion. Cup/cap distinction is as subtle to an Eastern European as Hindi consonants or Mandarin tones are to English speakers, because the set of meaningful sounds distinctions differs between languages. Relearning the phonetic system requires dedicated work (minimal pairs is one option) and learning enough phonetics to have the vocabulary to discuss sounds as they are. It's not enough to just give feedback.

[1]: https://www.youtube.com/watch?v=-I7iUUp-cX8

patcon|1 year ago

After watching the demo, my question isn't about how close it is to helping me learn a language, but about how close it is to being me in another language.

Even styles of thought might be different in other languages, so I don't say that lightly... (stay strong, Sapir-Whorf, stay strong ;)

hack_ml|1 year ago

I was conversing with it in Hinglish (a combination of Hindi and English) which folks in urban India use, and it was pretty on point apart from some use of esoteric Hindi words, but I think with the right prompting we can fix that.

estebank|1 year ago

In the "Point and learn Spanish" video, when shown an Apple and a Banana, the AI said they were a Manzana (Apple) and a Pantalón (Pants).

taytus|1 year ago

The italian output in the demo was really bad.

cchance|1 year ago

This is damn near one of the most impressive things. I can only imagine what you'd be capable of with live translation and voice synthesis (ElevenLabs style) integrated with something like Teams: select each person's language and do realtime translation into each person's native language, with their own voice and intonation. That would be NUTS.

purplerabbit|1 year ago

There’s so much pent up collaborative human energy trapped behind language barriers.

Beautiful articulation.

This is an enormous win for humanity.

terhechte|1 year ago

Random OpenAI question: while the GPT models have become ever cheaper, the price for the TTS models has stayed in the $15/1M-character range. I was hoping this would also become cheaper at some point. There are so many apps (e.g. language learning) that quickly become too expensive at these prices. With the GPT-4o voice (which sounds much better than the current TTS or TTS HD endpoint) I thought maybe the prices for TTS would go down. Sadly that hasn't happened. Is that something on the OpenAI agenda?
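To make the cost concern concrete, here's a back-of-envelope sketch. Only the $15 per 1M characters figure comes from the comment above; the per-session usage numbers are made-up assumptions.

```python
PRICE_PER_MILLION_CHARS = 15.00  # TTS price cited above, in USD

def monthly_tts_cost(chars_per_session, sessions_per_day, days=30):
    """Rough monthly TTS bill per user at the $15/1M-character rate."""
    chars = chars_per_session * sessions_per_day * days
    return chars / 1_000_000 * PRICE_PER_MILLION_CHARS

# e.g. a language-learning app speaking ~5,000 chars per session, 3 sessions/day
print(round(monthly_tts_cost(5_000, 3), 2))  # → 6.75 (USD per user per month)
```

At those (hypothetical) usage levels, that's several dollars per user per month before any GPT inference costs, which is roughly why per-character TTS pricing dominates the economics of voice-heavy apps.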

j-krieger|1 year ago

I've always wondered what GPT models lack that makes them "query->response" only. I've tried to get chatbots to drop the initially needed query, to no avail. What would it take to get a GPT model to freely generate tokens in a thought-like pattern? I think when I'm alone, without a query from another human. Why can't they?

dragonwriter|1 year ago

> What would It take to get a GPT model to freely generate tokens in a thought like pattern?

That’s fundamentally not how GPT models work, but you can easily build a framework around them that calls them in a loop. You’d need a special system prompt to get anything “thought like” that way (if you want it to be anything other than stream-of-simulated-consciousness with no relevance to anything), and a non-empty “user” prompt each round, which could be as simple as the time, a status update on something in the world, etc.
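A minimal sketch of such a loop, with a stand-in `generate` function where a real chat-completion call would go; the prompt wording, memory size, and "status update" stimulus are arbitrary assumptions.

```python
from collections import deque

def generate(messages):
    # Stand-in for a real model call (e.g. a chat-completions request).
    # It just echoes the latest stimulus so the loop is runnable offline.
    last = messages[-1]["content"]
    return f"(musing on: {last})"

def thought_loop(steps=3):
    system = ("You are thinking out loud. Continue your train of thought, "
              "reacting to each status update.")
    memory = deque(maxlen=10)  # rolling short-term memory of prior thoughts
    thoughts = []
    for i in range(steps):
        # Each round needs a non-empty "user" prompt: here, a status update.
        stimulus = f"status update #{i}"
        messages = (
            [{"role": "system", "content": system}]
            + [{"role": "assistant", "content": m} for m in memory]
            + [{"role": "user", "content": stimulus}]
        )
        thought = generate(messages)
        memory.append(thought)
        thoughts.append(thought)
    return thoughts

print(thought_loop())
```

The point is the shape, not the stub: the model never runs without input, so the framework has to manufacture a fresh "user" turn each round and feed prior output back in as memory.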

xwolfi|1 year ago

Monkeys who've trained since birth to use sign language, and can answer remarkable questions, have the same issue. The researchers noticed they never once asked a question like "why is the sky blue?" or "why do you dress up?". Zero initiating conversation, but they do reply when you ask what they want.

I suppose it would cost even more electricity to have ChatGPT musing alone though, burning through its nvidia cards...

kolinko|1 year ago

Just provide an empty query and that's it - it will generate tokens no problem.

You can use any open source model without any prompt whatsoever.

nurple|1 year ago

I think this will be key in a logical proof that statistical generation can never lead to sentience; Penrose will be shown to be correct, at least regarding the computability of consciousness.

You could say, in a sense, that without a human mind to collapse the wave function, the superposition of data in a neural net's weights can never have any meaning.

Even when we build connections between these statistical systems to interact with each other in a way similar to contemplation, they still require a human-created nucleation point on which to root the generation of their ultimate chain of outputs.

I feel like the fact that these models contain so much data has gripped our hardwired obsession for novelty and clouds our perception of their actual capacity to do de novo creation, which I think will be shown to be nil.

An understanding of how LLMs function should probably make this intuitively clear. Even with infinite context and infinite ability to weigh conceptual relations, they would still sit lifeless for all time without some, any, initial input against which they can run their statistics.

hpeter|1 year ago

It happens sometimes. Just the other day a local TinyLlama instance started asking me questions. The chat memory was full of mostly nonsense, and it asked me a completely random and simple question out of the blue, wondering whether chatbots had evolved a lot since it was created.

I think you can get models to "think" if you give them a goal in the system prompt, a memory of previous thoughts, and keep invoking them with cron

djur|1 year ago

You might not have a prompt from another human, but you're always receiving new input.

pelorat|1 year ago

> Why can't they?

They are designed for query and response. They don't do anything unless you give them input. Also, there's not much research on the best architecture for running continuous thought loops in the background and how to mix them into the conversational "context". Current LLMs only emulate single-thought synthesis based on long-term memory recall (and some go off to query the Internet).

> I think when I'm alone without query from another human.

You are actually constantly queried, but it's stimulation from your senses. There are also neurons in your brain which fire regularly, like a clock that ticks every second.

Do you want to make a system that thinks without input? Then you need to add hidden stimuli via a non-deterministic random number generator, preferably a quantum-based RNG (or it won't be possible to claim the resulting system has free will). Even a single photon hitting your retina can affect your thoughts, and there are no doubt other quantum effects that trip neurons in your brain above the firing threshold.

I think you need at least three or four levels of loops interacting, with varying strength between them. The first level would be the interface to the world, the input and output level (video, audio, text). Data from here is high priority and is capable of interrupting lower levels.

The second level would be short-term memory and context switching. Conversations need to be classified and stored in a database, and you need an API to retrieve old contexts (conversations). You also possibly need context compression (summarization of conversations in case you're about to hit a context window limit).

The third level would be the actual "thinking": a loop that constantly talks to itself to accomplish a goal using the data from all the other levels, but mostly driven by the short-term memory. Possibly you could go super-human here and spawn multiple worker processes in parallel. You need to classify the memories by asking: do I need more information? Where do I find this information? Do I need an algorithm to accomplish a task? What are the completion criteria? Everything here is powered by an algorithm. You would take your data and produce a list of steps that you have to follow to resolve to a conclusion.

Everything you do as a human to resolve a thought can be expressed as a list or tree of steps.

If you've had a conversation with someone and you keep thinking about it afterwards, what has happened is basically that you have spawned a "worker process" that tries to come to a conclusion that satisfies some criteria. Perhaps there was ambiguity in the conversation that you are trying to resolve, or the conversation gave you some chemical stimulation.

The last level would be subconscious noise driven by the RNG, which would filter up with low priority. In the absence of external stimuli with higher priority, or currently running thought processes, this would drive the spontaneous self-thinking portion (and dreams).

Implement this and you will have something more akin to true AGI (whatever that is) on a very basic level.
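A toy sketch of the priority layering described above, with a seeded software RNG standing in for the proposed quantum RNG; the level names, priority values, and class shape are all made up for illustration.

```python
import heapq
import random

# Priority levels from the sketch above (lower number = higher priority).
IO, MEMORY, THINKING, NOISE = 0, 1, 2, 3

class Agent:
    def __init__(self, seed=0):
        self.queue = []    # heap of (priority, arrival_order, payload)
        self.order = 0
        self.memory = []   # stand-in for the second level's conversation store
        self.rng = random.Random(seed)  # a real build would want hardware entropy

    def push(self, priority, payload):
        # arrival_order breaks ties so equal-priority stimuli stay FIFO
        heapq.heappush(self.queue, (priority, self.order, payload))
        self.order += 1

    def tick(self):
        # With no external stimuli pending, inject low-priority
        # "subconscious" noise so the loop never sits idle.
        if not self.queue:
            self.push(NOISE, f"noise-{self.rng.randint(0, 99)}")
        priority, _, payload = heapq.heappop(self.queue)
        thought = f"level {priority}: {payload}"
        self.memory.append(thought)  # everything processed lands in memory
        return thought

agent = Agent()
agent.push(IO, "hello from the outside world")
print(agent.tick())  # I/O preempts everything else
print(agent.tick())  # empty queue, so RNG noise drives a spontaneous thought
```

The heap makes the interruption rule from the comment fall out for free: level-0 sensory input always wins, and the RNG-driven noise only surfaces when nothing higher-priority is pending.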

throwthrowuknow|1 year ago

Train it on stream of consciousness but good luck getting enough training data.

ALittleLight|1 year ago

In my ChatGPT app or on the website I can select GPT-4o as a model, but my model doesn't seem to work like the demo. The voice mode is the same as before and the images come from DALLE and ChatGPT doesn't seem to understand or modify them any better than previously.

sumedh|1 year ago

The GPT-4o text version is available, not the multimodal one.

jacobsimon|1 year ago

I couldn’t quite tell from the announcement, but is there still a separate TTS step, where GPT is generating tones/pitches that are to be used, or is it completely end to end where GPT is generating the output sounds directly?

derac|1 year ago

It's one model with text/audio/image input and output.

mttpgn|1 year ago

Licensing the emotion-intoned TTS as a standalone API is something I would look forward to seeing. Not sure how feasible that would be if, as a sibling comment suggested, it bypasses the text-rendering step altogether.

rane|1 year ago

Will the new voice mode allow mixing languages in sentences?

As a language learner, this would be tremendously useful.

bjtitus|1 year ago

Is it possible to use this as a TTS model? I noticed on the announcement post that this is a single model as opposed to a text model being piped to a separate TTS model.

andybak|1 year ago

May I just say this launch was a bit of a mess?

The web page implies you can try it immediately. Initially it wasn't available.

A few hours later it was in both the web UI and the mobile app - I got a popup telling me that GPT-4o was available. However, nothing seems to be any different. I'm not given any option to use video as an input, and the app doesn't seem to pick up any new info from my voice.

I'm left a bit confused as to what I can do that I couldn't do before. I certainly can't seem to recreate much of the stuff from the announcement demos.

sumedh|1 year ago

The website clearly says that the text version is available now but the multimodal version will be released over the coming weeks.

dpflan|1 year ago

Whose idea were the singing AIs? What specifically did you want to highlight with that part of the demo?

I imagine that there is a lot of usage at the HQ, human + AI karaoke?

skottenborg|1 year ago

"(I work at OpenAI.)"

Ah yes, also known as being co-founder :)

leozq|1 year ago

Yes, also known as a programmer who loves coding a lot :)

rrr_oh_man|1 year ago

https://community.openai.com/t/when-i-log-in-to-chatgpt-i-am...

Sorry to hijack, but how the hell can I solve this? I have the EXACT SAME error on two iOS devices (native app only — web is fine), but not on Android, Mac, or Windows.

dmarinoc|1 year ago

Are you blocking some of your traffic? I had the same issue until I (temporarily) disabled NextDNS just for signing in.

Sadly, the error returned is not related to the cause.

hpeter|1 year ago

I can't wait to try it out, it sounds too good to be real.

Will it be fully available in the EU, with GDPR compliance?

xanderlewis|1 year ago

I like the humility in your first statement.

theboat|1 year ago

I love how this comment proves the need for audio2audio. I initially read it as sarcastic, but now I can't tell if it's actually sincere.

Induane|1 year ago

I like their username.

moab|1 year ago

Pretty sure the snark is unnecessary.