kmijyiyxfbklao|3 months ago
I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
yannyu|3 months ago
LLMs are probabilistic or non-deterministic computer programs; plenty of people say this. That is not much different from saying "LLMs are probabilistic next-token prediction based on current context".
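To make "probabilistic next-token prediction" concrete, here is a minimal sketch of the sampling step only. The vocabulary and logits are made up for illustration, and `softmax` and `sample_next_token` are hypothetical helper names; a real model would produce the logits from the current context.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Convert raw scores into a probability distribution.
    # Lower temperature sharpens the distribution toward the top score.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    # Draw one token at random, weighted by its probability.
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Toy values (assumed): scores a model might assign after "the cat sat on the".
vocab = ["cat", "dog", "mat"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
```

Calling `sample_next_token(vocab, logits)` repeatedly can return different tokens for the same input, which is where the "non-deterministic" part comes from.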
> I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.
But we already know the inner workings. It's transformers, embeddings, and math at a scale that we couldn't do before 2015. We already had multi-layer perceptrons with backpropagation, recurrent neural networks, and Markov chains before this, but the hardware to do this kind of contextual next-token prediction simply didn't exist at the time.
I understand that it feels like there's a lot going on with these chatbots, but half of the illusion of chatbots isn't even the LLM, it's the context management, which is exceptionally mundane compared to the LLM itself. These things are combined with a carefully crafted UX to deliberately convey the impression that you're talking to a human. But in the end, it is just a program and it's just doing context management and token prediction that happens to align (most of the time) with human expectations because it was designed to do so.
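As a rough illustration of how mundane the context management part can be, here is a sketch that keeps the most recent chat turns that fit in a token budget and flattens them into one prompt. Everything here is an assumption for illustration: `count_tokens` is a crude whitespace stand-in for a real tokenizer, `build_context` is a hypothetical name, and real chat products use their own formats and truncation rules.

```python
def count_tokens(text):
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())

def build_context(system, history, budget):
    # Walk the history from newest to oldest, keeping turns until the
    # budget is exhausted, then restore chronological order.
    kept = []
    used = count_tokens(system)
    for role, text in reversed(history):
        cost = count_tokens(text)
        if used + cost > budget:
            break
        kept.append((role, text))
        used += cost
    kept.reverse()
    lines = [f"system: {system}"] + [f"{role}: {text}" for role, text in kept]
    return "\n".join(lines)

# Toy conversation (assumed); the oldest turn does not fit in the budget.
history = [
    ("user", "hello there"),
    ("assistant", "hi how can I help"),
    ("user", "what is a transformer"),
]
prompt = build_context("be brief", history, budget=11)
```

The resulting `prompt` string is all the model would see on the next turn under this sketch; the "memory" of the conversation is just this text being rebuilt and resent each time.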
The two of you seem to be implying there's something spooky or mysterious happening with LLMs that goes beyond our comprehension of them, but I'm not seeing the components of your argument for this.
ACCount37|3 months ago
Overconfident and wrong.
No one understands how an LLM works. Some people just delude themselves into thinking that they do.
Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how a Ryzen 9800X3D works". Maybe more so.
It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.