It’s impressive how every iteration gets further from pretending actual AGI is anywhere close, when we are basically writing library functions in the worst DSL known to man: markdown-with-English.
Call me naive, but my read is the opposite. It's impressive to me that we have systems which can interpret plain English instructions with a progressively higher degree of reliability. Also, that such a simple mechanism for extending memory (if you believe it's an apt analogy) is possible. That seems closer to AGI to me, though maybe it is a stopgap to better generality/"intelligence" in the model.
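The "extending memory" mechanism is simple enough to sketch. A minimal, hypothetical version (the directory layout, function name, and character budget are all assumptions here, not any vendor's actual implementation): plain-English instruction files on disk get concatenated into the model's context on demand.

```python
from pathlib import Path

def load_skills(skill_dir: str, limit_chars: int = 8000) -> str:
    """Concatenate markdown 'skill' files into one context block,
    stopping once a crude character budget is exhausted."""
    parts, total = [], 0
    for path in sorted(Path(skill_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        if total + len(text) > limit_chars:
            break  # don't blow the model's context window
        parts.append(f"## Skill: {path.stem}\n{text}")
        total += len(text)
    return "\n\n".join(parts)
```

The whole trick is that the "memory" is just text prepended to the prompt; nothing in the model itself changes.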
I'm not sure English is a bad way to outline what the system should do. It has tradeoffs. I'm not sure library functions are a 1:1 analogy either. Or if they are, you might grant me that it's possible to write a few English sentences that would expand into a massive amount of code.
It's very difficult to measure progress on these models in a way that anyone can trust, more so when you involve "agent" code around the model.
> I'm not sure English is a bad way to outline what the system should do.
It isn't; English outlines are how stakeholders convey needs to those charged with satisfying them (a.k.a. "requirements"). Where expectations become unrealistic is in believing language models can somehow "understand" those outlines as a human expert would in order to produce an equivalent work product.
Language models can produce nondeterministic results based on the statistical model derived from their training data set(s), with varying degrees of relevance as determined by persons interpreting the generated content.
They do not understand "what the system should do."
I 100% agree. I don't know what the GP is on. Being able to write instructions in a .md file is "further away from AGI"? Like... what? It's just a little quality of life feature. How and why is it related to AGI?
Top HN comments sometimes read like a random generator:

    return random_criticism_of_ai_companies() + " " + unrelated_trivia_fact()

Why are people treating everything OpenAI does as evidence against AGI? It's like saying that if you don't mortgage your house to go all-in on AAPL, you "don't really believe Apple has a future." Even if OpenAI believes there is an X% chance AGI will be achieved, that doesn't mean they should stop literally everything else they're doing.
I’ve posted this before, but here goes: we achieved AGI in either 2017 or 2022 (take your pick): with the transformer architecture, or with scaled-up NLP in ChatGPT.
What is AGI? Artificial. General. Intelligence. Applying domain independent intelligence to solve problems expressed in fully general natural language.
It’s more than a pedantic point though. What people expect from AGI is the transformative capabilities that emerge from removing the human from the ideation-creation loop. How do you do that? By systematizing the knowledge work process and providing deterministic structure to agentic processes.
Which is exactly what these developments are doing.
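"Deterministic structure around agentic processes" can be made concrete with a sketch. In this hypothetical pipeline (the step prompts and `call_model` stand-in are assumptions, not any product's actual design), the control flow is fixed code and only the individual steps are open-ended model calls:

```python
def run_pipeline(task: str, call_model) -> str:
    """Fixed control flow around model calls: the *structure* is
    deterministic code; only the step contents are open-ended.
    `call_model` is a stand-in for any prompt -> completion function."""
    plan = call_model(f"Break this task into numbered steps:\n{task}")
    draft = call_model(f"Execute these steps:\n{plan}")
    critique = call_model(f"List defects in this result:\n{draft}")
    return call_model(f"Revise to fix these defects:\n{critique}\n---\n{draft}")
```

Removing the human from the ideation-creation loop then amounts to trusting this fixed scaffold instead of a person to sequence the work.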
Literally yesterday we had a post about GPT-5.2, which jumped 30% on ARC-AGI 2, 100% on AIME without tools, and a bunch of other impressive stats. A layman's (mine) reading of those numbers feels like the models continue to improve as fast as they always have. Then today we have people saying every iteration is further from AGI. What really perplexes me is how split-brain HN is on this topic.
Goodhart's law: When a measure becomes a target, it ceases to be a good measure.
AI companies have a high incentive to make scores go up. They may employ humans to write similar-to-benchmark training data to hack a benchmark (while not directly training on the test set).
Throwing your hard problems from work at an LLM is a better metric than benchmarks.
One classic problem in all ML is ensuring the benchmark is representative and that the algorithm isn’t overfitting the benchmark.
This remains an open problem for LLMs - we don’t have true AGI benchmarks and the LLMs are frequently learning the benchmark problems without actually necessarily getting that much better in real world. Gemini 3 has been hailed precisely because it’s delivered huge gains across the board that aren’t overfitting to benchmarks.
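One common heuristic for arguing a model was "not directly trained on test" is an n-gram overlap check between training data and the benchmark. A toy version (the 8-gram size and whitespace tokenization are assumptions; real contamination audits are more involved):

```python
def ngram_overlap(train_text: str, bench_text: str, n: int = 8) -> float:
    """Fraction of the benchmark's word n-grams that also occur in the
    training text. High overlap suggests contamination; low overlap
    does not prove its absence (paraphrased leaks evade this check)."""
    def grams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    bench = grams(bench_text)
    if not bench:
        return 0.0
    return len(bench & grams(train_text)) / len(bench)
```

The weakness is exactly the one raised above: similar-to-benchmark data written by humans shares no literal n-grams with the test set, so it sails through checks like this while still teaching the benchmark's patterns.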
HN is not an entity with a single perspective, and there are plenty of people on here who have a financial stake in you believing their perspective on the matter.
I think really more than anything it’s become clear that AGI is an illusion. There’s nothing there. It’s the mirage in the desert: you keep walking towards it, but it’s always out of reach and unclear if it even exists.
So companies are really trying to deliver value. This is the right pivot. If you gave me an AGI with a 100 IQ, that seems pretty much worthless in today’s world. But domain expertise - that I’ll take.
It's clear from the development trajectory that AGI is not what current AI development is leading to and I think that is a natural consequence of AGI not fitting the constraints imposed by business necessity. AGI would need to have levels of agency and self-motivation that are inconsistent with basic AI safety principles.
Instead, we're getting a clear division of labor where the most sensitive agentic behavior is reserved for humans and the AIs become a form of cognitive augmentation of the human agency. This was always the most likely outcome and the best we can hope for as it precludes dangerous types of AI from emerging.
This might actually be better in a certain way: if you change a real customer-facing API, then customers will complain when you break their code. An LLM will likely adapt. So the interface is more flexible.
But perhaps an LLM could write an adapter that gets cached until something changes?
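That cached-adapter idea can be sketched directly. In this hypothetical version, `generate_fn` stands in for an LLM call that writes the adapter code; it is only invoked again when a hash of the upstream API schema changes (all names and the hashing scheme are assumptions for illustration):

```python
import hashlib
import json

class AdapterCache:
    """Cache generated adapter code, keyed by a hash of the upstream
    API schema, so the expensive generation step only reruns when
    the schema actually changes."""

    def __init__(self, generate_fn):
        self.generate_fn = generate_fn  # e.g. an LLM call that writes adapter code
        self._cache = {}

    def adapter_for(self, schema: dict) -> str:
        # sort_keys makes the hash stable under dict key reordering
        key = hashlib.sha256(
            json.dumps(schema, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.generate_fn(schema)
        return self._cache[key]
```

You get the flexible natural-language interface at schema-change time and a deterministic, cheap code path the rest of the time.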
And yet the tools wielding these are quite adept at writing and modifying them themselves. It’s LLMs building skills for LLMs. The public ones will naturally be vacuumed up by scrapers and put in the training set, making all future LLMs know more.
Take off is here, human in the loop assisted for now… hopefully for much longer.
Yes. Prompt engineering is like a shittier version of writing a VBA app inside Excel or Access.
Bloat has a new name and it's AI integration. You thought Chrome using a GB per tab was bad; wait until you need a whole datacenter to use your coding environment.
> Prompt engineering is like a shittier version of writing a VBA app inside Excel or Access.
Sure, if you could use VBA to read a patient's current complaint, vitals, and medical history, look up all the relevant research on Google Scholar, and then output a recommended course of treatment.
Is the technology continuing to become more applicable?
Is the way it's becoming more applicable leading to frameworks of usage that could enable the next leap? :)