top | item 45140866

teucris | 5 months ago

I think there’s a bit of parroting going around, but LLMs are predictive, and there’s a lot you can intuit about how they behave from that fact alone. Sure, calling it “token” prediction is oversimplifying things, but stating that, by their nature, LLMs are guessing at the next most likely thing in the scenario (next data structure needing to be coded up, next step in a process, next concept to cover in a paragraph, etc.) is a very useful mental model.
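The “predict the next most likely thing” mental model can be sketched with a toy bigram predictor standing in for a real LLM. This is purely illustrative; the corpus and function names are made up, and a real model conditions on far more context than one preceding token:

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM is trained on vastly more text.
corpus = "the cat sat on the mat the cat ate".split()

# Count which token follows which (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    # Return the most frequently observed continuation.
    return following[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (seen twice, vs. "mat" once)
```

The point of the mental model is exactly this shape: given what came before, the system proposes the continuation its training data makes most likely.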


bt1a|5 months ago

I would challenge the utility of this mental model: they're not simply tracing a "most likely" path unless your sampling method is trivially greedy. I don't know of a better way to model it, though, and I promise I'm not trying to be anal here.
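The greedy-vs.-sampled distinction can be shown with a minimal sketch. The logits below are invented for illustration; only the sampling mechanics matter. Greedy decoding always picks the argmax, while temperature sampling sometimes picks less likely tokens:

```python
import math
import random

# Hypothetical logits for three candidate next tokens (illustrative only).
logits = {"cat": 2.0, "dog": 1.5, "fish": 0.5}

def softmax(logits, temperature=1.0):
    # Convert logits to probabilities; higher temperature flattens them.
    exps = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def greedy(logits):
    # Greedy decoding: always the single most likely token.
    return max(logits, key=logits.get)

def sample(logits, temperature=1.0, rng=random):
    # Stochastic sampling: lower-probability tokens are still chosen sometimes,
    # so the output is not a single "most likely" path.
    probs = softmax(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()))[0]

print(greedy(logits))                    # always "cat"
print(sample(logits, temperature=1.0))   # usually "cat", but not always
```

With any nontrivial temperature (or top-k/top-p truncation), the decoded sequence is a sample from the model's distribution rather than its literal argmax path, which is the commenter's objection.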

teucris|5 months ago

“All models are wrong, but some are useful.”

Agreed - I picked certain words to be intentionally ambiguous, e.g. “most likely”, since it provides an effective intuitive grasp of what’s going on, even if it’s more complicated than that.

Uehreka|5 months ago

Honestly, I think the best way to reason about LLM behavior is to abandon any sort of white-box mental model (where you start from things you “know” about their internal mechanisms). Treat them as a black box, observe their behavior in many situations and over a long period of time, draw conclusions from the patterns you observe and test if your conclusions have predictive weight.

Of course, if someone is predisposed to incuriosity about LLMs and refuses to use them, they won’t be able to participate in that approach. However I don’t think there’s an alternative.

libraryofbabel|5 months ago

This is precisely what I recommend to people starting out with LLMs: do not start with the architecture; start with their behavior. Use them for a while as a black box, then circle back and learn about transformers and cross-entropy loss functions and whatever. Bottom-up approaches to learning work well in other areas of computing, but not this - there is nothing in the architecture to suggest the emergent behavior that we see.

anthem2025|5 months ago

So just ignore everything you actually know until you can fool yourself into thinking fancy auto complete is totally real intelligence?

Why not apply that to computers in general and then we can all worship the magic boxes.