flessner | 8 months ago

> Already we live with incredible digital intelligence, and after some initial shock, most of us are pretty used to it. Very quickly we go from being amazed that AI can generate a beautifully-written paragraph to wondering when it can generate a beautifully-written novel;

It was probably around 7 years ago when I first got interested in machine learning. Back then I followed a crude YouTube tutorial which consisted of downloading a Reddit comment dump and training an ML model on it to predict the next character for a given input. It was magical.

I always see LLMs as an evolution of that. Instead of the next character, it's now the next token. Instead of GBs of Reddit comments, it's now TBs of "everything". Instead of millions of parameters, it's now billions of parameters.
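For illustration, here is a minimal sketch of the "predict the next character" idea. It is not the tutorial's model (that was presumably a neural network); it is a count-based stand-in that learns which character most often follows a short context. The corpus string, the function names and the `order` parameter are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

def train_char_model(text, order=1):
    """Count which character follows each context of `order` characters."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequent next character, or None for an unseen context."""
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus standing in for the Reddit comment dump (hypothetical data).
corpus = "the cat sat on the mat. the cat ate."
model = train_char_model(corpus, order=2)
print(predict_next(model, "th"))  # prints: e
```

Swapping characters for tokens and the count table for a trained network gives the rough shape of the progression described above, though a real LLM is of course far more than a lookup table.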

Over the years, the magic was never lost on me. However, I can never see LLMs as more than a "token prediction machine". Maybe throwing more compute and data at it will at some point make it so great that it's worthy to be called "AGI" anyway? I don't know.

Well anyway, thanks for the nostalgia trip on my birthday! I don't entirely share the same optimism - but I guess optimism is a necessary trait for a CEO, isn't it?

helloplanets|8 months ago

What's your take on Anthropic's 'Tracing the thoughts of a large language model'? [0]

> To write the second line, the model had to satisfy two constraints at the same time: the need to rhyme (with "grab it"), and the need to make sense (why did he grab the carrot?). Our guess was that Claude was writing word-by-word without much forethought until the end of the line, where it would make sure to pick a word that rhymes. We therefore expected to see a circuit with parallel paths, one for ensuring the final word made sense, and one for ensuring it rhymes.

> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.

This is an older model (Claude 3.5 Haiku) with no test time compute.

[0]: https://www.anthropic.com/news/tracing-thoughts-language-mod...

Sammi|8 months ago

What is called "planning" or "thinking" here doesn't seem conceptually much different to me than going from a naive, breadth-first-search-based Dijkstra shortest path search to adding a heuristic that makes it search in a particular direction first and calling it A*. In both cases you're adding another layer to an existing algorithm in order to make it more effective. Doesn't make either AGI.
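The Dijkstra-to-A* step mentioned above can be sketched roughly as follows. This is only an illustration under assumptions not in the thread: an empty 10x10 grid, unit step costs, and a Manhattan-distance heuristic; `shortest_path` and `manhattan` are hypothetical names. The same loop acts as Dijkstra with a zero heuristic and as A* once a heuristic is passed in.

```python
import heapq

def shortest_path(walls, size, start, goal, heuristic=lambda node, goal: 0):
    """Best-first search on a size x size grid with unit step costs.

    With the default zero heuristic this behaves like Dijkstra; with an
    admissible heuristic such as Manhattan distance it is A*.
    Returns (path_cost, expanded_node_count), or (None, count) if unreachable.
    """
    h0 = heuristic(start, goal)
    # Queue entries: (cost + heuristic, heuristic, cost so far, node).
    # Ties on the first field are broken in favour of nodes closer to the goal.
    frontier = [(h0, h0, 0, start)]
    best_cost = {start: 0}
    expanded = 0
    while frontier:
        _, _, cost, node = heapq.heappop(frontier)
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry; a cheaper route to this node was found
        expanded += 1
        if node == goal:
            return cost, expanded
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not in_bounds or nxt in walls:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                h = heuristic(nxt, goal)
                heapq.heappush(frontier, (new_cost + h, h, new_cost, nxt))
    return None, expanded

def manhattan(node, goal):
    """Admissible heuristic for 4-connected grids with unit step costs."""
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

dijkstra_cost, dijkstra_expanded = shortest_path(set(), 10, (0, 0), (9, 9))
astar_cost, astar_expanded = shortest_path(set(), 10, (0, 0), (9, 9), manhattan)
print(dijkstra_cost, astar_cost)          # both find the same shortest cost
print(dijkstra_expanded, astar_expanded)  # A* expands fewer nodes here
```

The only difference between the two calls is the heuristic argument, which is the "extra layer" the comment describes.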

I'm really no expert in neural nets or LLMs, so my thinking here is not an expert opinion, but as a CS major reading that blog from Anthropic, I just cannot see how they provided any evidence for "thinking". To me it's pretty aggressive marketing to call this "thinking".

yencabulator|8 months ago

Generalize the concept from next token prediction to coming tokens prediction and the rest still applies. LLMs are still incredibly poor at symbolic thought and following multi-step algorithms, and I as a non-ML person don't really see what in the LLM mechanism would provide such power. Or maybe we're still just another 1000x scale off and symbolic thought will emerge at some point.

Personally, I expect LLMs to be a mere part of whatever is invented later.

iNic|8 months ago

The mere token prediction comment is wrong, but I don't think any of the other comments really explained why. Next token prediction is not what the AI does, but its goal. It's like saying soccer is a boring sport having only ever seen the final scores. The important thing about LLMs is that they can internally represent many different complex ideas efficiently and coherently! This makes them an incredible starting point for further training. Nowadays no LLM you interact with will be a pure next token predictor anymore, they will have all gone through various stages of RL, so that they actually do what we want them to do. I think I really feel the magic looking at the "circuit" work by Anthropic. It really shows that these models have some internal processing / thinking that is complex and clever.

quonn|8 months ago

> that they can internally represent many different complex ideas efficiently and coherently

The Transformer circuits[0] suggest that this representation is not coherent at all.

[0] https://transformer-circuits.pub

trashtester|8 months ago

The "next token prediction" is a distraction. That's not where the interesting part of an AI model happens.

If you think of the tokenization near the end as a serializer, something like turning an object model into JSON, you get a better understanding. The interesting part of an OOP program is not in the JSON, but in what happens in memory before the JSON is created.
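As a rough sketch of the analogy only (it says nothing about how a neural net works internally): the hypothetical `Order` class and `process` function below do the real work on objects in memory, and `json.dumps` is just the last step that writes the result out.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Order:
    # Hypothetical object model, invented for this illustration.
    item: str
    quantity: int
    unit_price: float

    def total(self):
        return self.quantity * self.unit_price

def process(orders, minimum=10):
    """The 'interesting' part: logic operating on objects in memory."""
    large = [o for o in orders if o.total() >= minimum]
    return {"count": len(large), "orders": [asdict(o) for o in large]}

orders = [Order("pen", 3, 1.5), Order("book", 1, 12.0)]
result = process(orders)

# The serializer only flattens the result into text; no decisions are made here.
serialized = json.dumps(result)
print(serialized)
```

Reading only `serialized` tells you what came out, not how `process` arrived at it, which is the point of the comparison.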

Likewise, the interesting parts of a neural net model, whether it's LLMs, AlphaProteo or some diffusion-based video model, happen in the steps that operate in their latent space, which is in many ways similar to our subconscious thinking.

In those layers, the AI models detect deeper and deeper patterns of reality. Much deeper than the surface pattern of the text, images, video etc used to train them. Also, many of these patterns generalize when different modalities are combined.

From this latent space, you can "serialize" outputs in several different ways. Text is one, image/video another. For now, the latent spaces are not general enough to do all equally well, instead models are created that specialize on one modality.

I think the step to AGI does not require throwing a lot more compute into the models, but rather to have them straddle multiple modalities better, in particular, these:

- Physical world modelling at the level of Veo3 (possibly with some lessons from self-driving or robotics models for elements like object permanence and perception)
- Symbolic processing of the best LLMs.
- Ability to be goal oriented and iterate towards a goal, similar to the Alpha* family of systems
- Optionally: Optimized for the use of a few specific tools, including a humanoid robot.

Once all of these are integrated into the same latent space, I think we basically have what it takes to replace most human thought.

sgt101|8 months ago

>which is in many ways similar to our subconscious thinking

this is just made up.

- we don't have any useful insight on human subconscious thinking.
- we don't have any useful insight on the structures that support human subconscious thinking.
- the mechanisms that support human cognition that we do know about are radically different from the mechanisms that current models use. For example we know that biological neurons & synapses are structurally diverse, we know that suppression and control signals are used to change the behaviour of the networks, we know that chemical control layers (hormones) transform the state of the system.

We also know that biological neural systems continuously learn and adapt, for example in the face of injury. Large models just don't do these things.

Also this thing about deeper and deeper realities? C'mon, it's surface level association all the way down!

phorkyas82|8 months ago

As far as I understood, any AI model is just a linear combination of its training data. Even if that were such a large corpus as the entire web... it's still just like a sophisticated compression of other people's expressions.

It has not had its own experiences, not interacted with the outside world. Dunno, I don't want to rule out that something operating solely on language artifacts can develop intelligence or consciousness, whatever that is... but so far there are also enough humans we could care about and invest in.

andsoitis|8 months ago

> the AI models detect deeper and deeper patterns of reality. Much deeper than the surface pattern of the text

What are you talking about?

klipt|8 months ago

If you wish to make an apple pie from scratch

You must first invent the universe

If you wish to predict the next token really well

You must first model the universe

Aeolun|8 months ago

> wondering when it can generate a beautifully-written novel

Not quite yet, but I’m working on it. It’s ~~hard~~ impossible to get original ideas out of an LLM, so it’ll probably always be a human assisted effort.

agumonkey|8 months ago

The TBs of everything with transformers make a difference. Maybe I'm just too uneducated, but the amount of semantic context that can be taken into account when generating the next token is really disruptive.

marsten|8 months ago

> Over the years, the magic was never lost on me. However, I can never see LLMs as more than a "token prediction machine".

The "mere token prediction machine" criticism, like Pearl's "deep learning amounts to just curve fitting", is true but it also misses the point. AI in the end turns a mirror on humanity and will force us to accept that intelligence and consciousness can emerge from some pretty simple building blocks. That in some deep sense, all we are is curve fitting.

It reminds me of the lines from T.S. Eliot, "...And the end of all our exploring, Will be to arrive where we started, And know the place for the first time."