
Why aren't LLMs trained on action / cause+effect data vs. just analytical stuff?

3 points | purplerabbit | 1 year ago

Stupid question, but if we want models that are capable of doing things (agents) vs. just spitting out interesting content, why isn't anyone training them on data that represents actions?

Models are incredible at generating analytical / blog-ish / Stack Overflow-ish content, but suck at tasks complex enough to require iteration.

For instance: If we want models that can handle complex projects, why don't we record actions taken in the execution of complex projects, and train models on that? Or if we want models that can use a browser competently, why don't we train models on screenshots + action descriptions? (Or is this what was done with o1, which is why it seems to have unprecedented capabilities?)

Is the problem just getting high-quality data? I know we've got internet dumps full of blog-ish content, but no big, easy-to-gather dumps of high-quality information about actions or chains of actions and their effects over time.

(I'm sure there are tons of framing problems in this question -- sorry)

4 comments

dtagames|1 year ago

What you're describing isn't how GPT training works. These models are mostly trained on next-token prediction, without any understanding of what the tokens actually mean. That works well for text and images, but it can't lead to a reproducible set of steps.

I wrote an article[0] about it recently that you might enjoy.

[0] Something From Nothing | A Painless Approach to Understanding AI

https://medium.com/gitconnected/something-from-nothing-d755f...

purplerabbit|1 year ago

The tokens could describe a sequence of actions and their consequences vs. blog / forum type content
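To make that concrete, here is a rough sketch of how an action trajectory might be flattened into text and turned into ordinary next-token training pairs. Everything in it is an assumption for illustration: the `Step` type, the `<goal>` / `<action>` / `<observation>` markers, the whitespace "tokenizer", and the example trajectory are all made up, and this is not how any particular lab does it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One hypothetical step of an agent trajectory."""
    action: str       # what the agent did
    observation: str  # what happened as a result


def serialize(goal: str, steps: list[Step]) -> str:
    """Flatten a goal plus its action/observation steps into plain text.

    The marker strings are invented for this sketch.
    """
    lines = [f"<goal> {goal}"]
    for step in steps:
        lines.append(f"<action> {step.action}")
        lines.append(f"<observation> {step.observation}")
    return "\n".join(lines)


def next_token_pairs(text: str) -> list[tuple[list[str], str]]:
    """Build (context, next token) pairs using a naive whitespace split.

    Real models use subword tokenizers; splitting on whitespace just
    keeps the example small.
    """
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# A made-up browser trajectory.
trajectory = [
    Step("click('Login')", "login form shown"),
    Step("type('user', 'alice')", "username field filled"),
]

text = serialize("log in", trajectory)
pairs = next_token_pairs(text)

# Each pair asks the model to predict the next token from everything before it,
# so actions and their consequences become prediction targets like any other text.
for context, target in pairs[:4]:
    print(context, "->", target)
```

The point is only that the training objective doesn't have to change: if the data is a log of actions and what they caused, next-token prediction means predicting the next action or its effect.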

wmf|1 year ago

This is starting to happen; they're calling them Large Action Models.