top | item 40305627

wangii | 1 year ago

I feel it's a pretty dangerous optimization to apply before we REALLY understand what's going on inside the LLM. E.g., people who believe in the geometric interpretation will have something to say, and it would probably hurt if you are using "filler" tokens.

Besides, the assumption (not a universal fact) that we "form complete sentences in mind before articulating word by word" seems to oversimplify what happens in our minds: do we really have a complete plan before we start talking/typing? As a Buddhist, I lean towards that being an illusion. Furthermore, what about simultaneous thoughts? Are we linear thinkers at the sentence level?

anyway, pretty neat math!

renonce|1 year ago

The optimization does not affect the LLM's output; it's guaranteed to produce results equivalent to decoding directly. Let's not treat the LLM as some magic that resembles our mind; it's just another program that produces sentences that happen to make sense.
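The equivalence guarantee is easy to see with a toy sketch (this is not the paper's implementation; `next_token` below is a hypothetical deterministic stand-in for a greedy model). Jacobi decoding guesses all future tokens at once and recomputes every position in parallel until nothing changes; that fixed point is, by construction, exactly the sequence that sequential greedy decoding produces.

```python
def next_token(prefix):
    # Hypothetical deterministic "model": maps a prefix to one next token.
    # Any deterministic function works for illustrating the argument.
    return (sum(prefix) * 31 + 7) % 97

def greedy_decode(prompt, n):
    """Standard sequential decoding: one token at a time."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n):
    """Jacobi-style decoding: refine all n positions in parallel."""
    guess = [0] * n  # arbitrary initial guess for every future token
    while True:
        # One parallel step: recompute each position from the current guess.
        new = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        if new == guess:  # fixed point reached
            return new
        guess = new

# Position i becomes correct once positions 0..i-1 are correct, so the
# iteration converges in at most n steps to the greedy sequence.
assert jacobi_decode([5, 7], 8) == greedy_decode([5, 7], 8)
```

In the real algorithm many positions stabilize in far fewer than n model calls, which is where the speedup comes from; the output is unchanged either way.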

naasking|1 year ago

> Let's not treat the LLM as some magic that resembles our mind; it's just another program that produces sentences that happen to make sense.

"That happen to make sense" is hiding a lot of magic. It would be statistically impossible to make as much sense as LLMs do in response to prompts if it did not actually make semantic distinctions. If it makes semantic distinctions, then it does resemble the human mind in at least one way.

wangii|1 year ago

According to the original Jacobi decoding paper, it was set in machine translation tasks with an encoder + decoder, where the parallel algorithm was applied only to the decoder part.

sigmoid10|1 year ago

Let's not treat our mind as something magical. It's just another program that learned to speak by consuming lots of training input. The implementation might look slightly different from the outside, but from a mathematical perspective, artificial neural networks are proven to be at least as capable as the human mind.

Etheryte|1 year ago

That assumption might be useful in this context, but I think it's pretty clearly not true. Ask anyone to tell you about a complex past event with a lot of parallel branches and you'll quickly see them add bits, pieces, and tangents mid-sentence to cover the full range of events. I don't think I've seen the sentence-granularity hypothesis in any serious scientific context before.

hatthew|1 year ago

Can't speak for everyone but I definitely don't mentally form complete sentences before talking. Sometimes I grammatically talk myself into a corner in the middle of a sentence and need to use some awkward words/phrases to finish my thought, or simply pause and restart the phrase from the beginning.

nomel|1 year ago

I feel surprisingly disconnected from my speaking self, acting as more of an observer, who is sometimes surprised at what I come up with. It just flows. I feel I have very little need for input.

But, I also feel fairly disconnected from my thinking self. I point my attention at something and solutions usually just pop out, maybe with some guidance/context forming required, in the form of internal dialog, which is usually of a rubber ducky style format [1], or mental testing of that mostly spontaneous solution.

I feel the "real" me is the one sensing/observing, which includes the observing of those spontaneous solutions, and what I say.

[1] Works with any problem space, not just coding "debugging": https://rubberduckdebugging.com/

int_19h|1 year ago

We don't appear to form words sequentially from underlying parts, even though in many languages they break down into smaller units that carry semantic meaning themselves. There doesn't seem to be any clear reason for this to break down suddenly at the sentence level.

causal|1 year ago

What is the geometric interpretation?