AgentMatrixAI | 7 months ago
Can't help but feel many are optimizing happy paths in their demos and hiding the true reality. That doesn't mean there isn't a place for agents, but rather that how we view them and their potential impact needs to be separated from those who benefit from the hype.
just my two cents
lairv|7 months ago
- AlphaGo/AlphaZero (MCTS)
- OpenAI Five (PPO)
- GPT 1/2/3 (Transformers)
- Dall-e 1/2, Stable Diffusion (CLIP, Diffusion)
- ChatGPT (RLHF)
- SORA (Diffusion Transformers)
"Agents" is a marketing term and isn't backed by anything. There is little data available, so it's hard to have generally capable agents in the sense that LLMs are generally capable
chaos_emergent|7 months ago
The technology for reasoning models is the ability to do RL on verifiable tasks, with some (as-yet unpublished, but well-known) search over reasoning chains: a (presumably neural) reasoning-fragment proposal machine, and a (presumably neural) scoring machine for those reasoning fragments.
The technology for agents is effectively the same, with some currently-in-R&D way to scale the training architecture for longer-horizon tasks. ChatGPT agent or o3/o4-mini are likely the first published models that take advantage of this research.
It's fairly obvious that this is the direction all the AI labs are going in, if you go to SF house parties or listen to AI insiders like Dwarkesh Patel.
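The proposer/scorer architecture described above can be sketched as a toy beam search. Everything here is a trivial stand-in (the function names and scoring rule are invented for illustration; the real systems would use neural models for both roles):

```python
import heapq

def propose_fragments(chain):
    # Stand-in proposer: a real system would sample candidate next
    # reasoning steps from a neural model, conditioned on the chain so far.
    return [chain + [s] for s in ("step_a", "step_b", "step_c")]

def score_chain(chain):
    # Stand-in scorer: a real system would use a learned value/reward
    # model. Here "step_a" is simply defined to be the best fragment.
    return sum(1.0 if s == "step_a" else 0.5 for s in chain)

def beam_search(depth=3, beam_width=2):
    beams = [[]]  # start from an empty reasoning chain
    for _ in range(depth):
        # Expand every surviving chain with all proposed fragments,
        # then keep only the top-scoring chains (the "beam").
        candidates = [c for chain in beams for c in propose_fragments(chain)]
        beams = heapq.nlargest(beam_width, candidates, key=score_chain)
    return beams[0]

best = beam_search()
print(best)  # the highest-scoring reasoning chain
```

The same skeleton covers best-of-n sampling (depth 1, large beam) and greedy chain-of-thought (beam width 1).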
ashwindharne|7 months ago
Obviously, this is working better in some problem spaces than others; it seems to depend mainly on how in-distribution the data domain is relative to the LLM's training set. Choices about context selection and the API surface exposed in function calls also seem to have a large effect on how well these models can do useful work.
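To make the "API surface" point concrete: the surface is the set of tool definitions the model sees. A minimal sketch (the tool name and fields are hypothetical, loosely following the JSON-Schema style that common function-calling APIs use):

```python
# Hypothetical tool definition: a narrow, well-described API surface.
# Field names follow the JSON-Schema convention used by common
# function-calling APIs; nothing here is a real product's schema.
search_tickets = {
    "name": "search_tickets",
    "description": "Search support tickets by keyword and status.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to match."},
            "status": {"type": "string", "enum": ["open", "closed"]},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["query"],
    },
}

# Context selection: expose only the tools relevant to the current task,
# rather than the whole API, to keep the model closer to in-distribution.
tools_for_task = [search_tickets]
print([t["name"] for t in tools_for_task])
```

A handful of narrow, well-documented tools like this tends to beat dumping an entire API into the context.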
paradite|7 months ago
MDPs, Q-learning, TD, RL, PPO are basically all about agents.
What we have today is still very much the same field as it was.
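Those pieces are standard textbook RL. A minimal tabular Q-learning sketch on a toy chain MDP (the environment and hyperparameters are illustrative, not anything from the thread):

```python
import random

# Toy 5-state chain MDP: states 0..4, reward only for reaching state 4.
N_STATES = 5
ACTIONS = (-1, 1)              # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1   # next state, reward, done

def greedy(s):
    # Greedy action with random tie-breaking.
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

random.seed(0)
for _ in range(300):                         # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration.
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        s2, r, done = step(s, a)
        # TD update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

policy = [greedy(s) for s in range(N_STATES - 1)]
print(policy)   # the learned policy steps right, toward the reward
```

The agent/environment loop, the TD target, and the epsilon-greedy policy are the same primitives the field has used for decades; what changed is the scale of the function approximator.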
posix86|7 months ago
Just because it hasn't reached 100% yet doesn't mean that LLMs as a whole are doomed. On the contrary, the fact that they are steadily approaching 100% shows there IS a future for LLMs, and that they still have the potential to change things fundamentally, even more than they already have.
camdenreslink|7 months ago
So it is really great for tasks where doing the work is a lot harder than verifying it, and mostly useless for tasks where doing the work and verifying it are similarly difficult.
wslh|7 months ago
Even with the best intentions, this feels similar to a developer handing code directly to the customer without any review, QA, etc. We all know that what a developer considers "done" often differs significantly from what the customer expects.
risyachka|7 months ago
Yep. This is literally what every AI company does nowadays.
Forgeties79|7 months ago
To your point - the most impressive AI tool I have used to date (not an LLM, but bear with me), and I loathe giving Adobe any credit, is Adobe's Audio Enhance tool. It has brought back audio that I previously would have thrown out or, if the client was lucky, would have charged thousands of dollars and spent weeks repairing to get it half as good as what that thing spits out in minutes. Not only is it good at salvaging terrible audio, it can make mediocre Zoom audio sound almost like it was recorded in a proper studio. It is truly magic to me.
Warning: don't feed it music lol it tries to make the sounds into words. That being said, you can get some wild effects when you do it!
j_timberlake|7 months ago
But since you can't really do that with wedding planning or whatnot, the 100% ceiling means the AI can only compete on speed and cost. And the cost will be... whatever Nvidia feels like charging per chip.
ankit219|7 months ago
I agree with you on the hype part. Unfortunately, that is the reality of current Silicon Valley. Hype gets you noticed, and gets you users. Hype propels companies forward, so it is here to stay.