It’s impressive how every iteration gets further from pretending actual AGI is anywhere close, when we are basically writing library functions in the worst DSL known to man: markdown-with-English.
Call me naive, but my read is the opposite. It's impressive to me that we have systems which can interpret plain English instructions with a progressively higher degree of reliability. Also, that such a simple mechanism for extending memory (if you believe it's an apt analogy) is possible. That seems closer to AGI to me, though maybe it is a stopgap to better generality/"intelligence" in the model.
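The "extending memory" mechanism is simple enough to sketch. A minimal, hypothetical version (the directory layout, function name, and character budget are all assumptions here, not any vendor's actual implementation): plain-English instruction files on disk get concatenated into the model's context on demand.

```python
from pathlib import Path

def load_skills(skill_dir: str, limit_chars: int = 8000) -> str:
    """Concatenate markdown 'skill' files into one context block,
    stopping once a crude character budget is exhausted."""
    parts, total = [], 0
    for path in sorted(Path(skill_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        if total + len(text) > limit_chars:
            break  # don't blow the model's context window
        parts.append(f"## Skill: {path.stem}\n{text}")
        total += len(text)
    return "\n\n".join(parts)
```

The whole trick is that the "memory" is just text prepended to the prompt; nothing in the model itself changes.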
I'm not sure English is a bad way to outline what the system should do. It has tradeoffs. I'm not sure library functions are a 1:1 analogy either. Or if they are, you might grant me that it's possible to write a few English sentences that would expand into a massive amount of code.
It's very difficult to measure progress on these models in a way that anyone can trust, more so when you involve "agent" code around the model.
> I'm not sure English is a bad way to outline what the system should do.
It isn't; English outlines are how stakeholders convey needs to those charged with satisfying them (a.k.a. "requirements"). Where expectations become unrealistic is in believing language models can somehow "understand" those outlines as a human expert would in order to produce an equivalent work product.
Language models can produce nondeterministic results based on the statistical model derived from their training data set(s), with varying degrees of relevance as determined by persons interpreting the generated content.
They do not understand "what the system should do."
I 100% agree. I don't know what the GP is on. Being able to write instructions in a .md file is "further away from AGI"? Like... what? It's just a little quality of life feature. How and why is it related to AGI?
Top HN comments sometimes read like a random generator:

    return random_criticism_of_ai_companies() + " " + unrelated_trivia_fact()

Why are people treating everything OpenAI does as evidence against AGI? It's like saying that if you don't mortgage your house to go all-in on AAPL, you "don't really believe Apple has a future." Even if OpenAI believes there is an X% chance AGI will be achieved, that doesn't mean they should stop literally everything else they're doing.
I’ve posted this before, but here goes: we achieved AGI in either 2017 or 2022 (take your pick): with the transformer architecture, or with scaled-up NLP in ChatGPT.
What is AGI? Artificial. General. Intelligence. Applying domain independent intelligence to solve problems expressed in fully general natural language.
It’s more than a pedantic point though. What people expect from AGI is the transformative capabilities that emerge from removing the human from the ideation-creation loop. How do you do that? By systematizing the knowledge work process and providing deterministic structure to agentic processes.
Which is exactly what these developments are doing.
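"Deterministic structure around agentic processes" can be made concrete with a sketch. In this hypothetical pipeline (the step prompts and `call_model` stand-in are assumptions, not any product's actual design), the control flow is fixed code and only the individual steps are open-ended model calls:

```python
def run_pipeline(task: str, call_model) -> str:
    """Fixed control flow around model calls: the *structure* is
    deterministic code; only the step contents are open-ended.
    `call_model` is a stand-in for any prompt -> completion function."""
    plan = call_model(f"Break this task into numbered steps:\n{task}")
    draft = call_model(f"Execute these steps:\n{plan}")
    critique = call_model(f"List defects in this result:\n{draft}")
    return call_model(f"Revise to fix these defects:\n{critique}\n---\n{draft}")
```

Removing the human from the ideation-creation loop then amounts to trusting this fixed scaffold instead of a person to sequence the work.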
Literally yesterday we had a post about GPT-5.2, which jumped 30% on ARC-AGI 2, 100% on AIME without tools, and a bunch of other impressive stats. A layman's (mine) reading of those numbers feels like the models continue to improve as fast as they always have. Then today we have people saying every iteration is further from AGI. What really perplexes me is how split-brain HN is on this topic.
Goodhart's law: When a measure becomes a target, it ceases to be a good measure.
AI companies have a high incentive to make scores go up. They may employ humans to write similar-to-benchmark training data to hack a benchmark (while not directly training on the test set).
Throwing your hard problems from work at an LLM is a better metric than benchmarks.
One classic problem in all ML is ensuring the benchmark is representative and that the algorithm isn’t overfitting the benchmark.
This remains an open problem for LLMs - we don’t have true AGI benchmarks and the LLMs are frequently learning the benchmark problems without actually necessarily getting that much better in real world. Gemini 3 has been hailed precisely because it’s delivered huge gains across the board that aren’t overfitting to benchmarks.
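One common heuristic for arguing a model was "not directly trained on test" is an n-gram overlap check between training data and the benchmark. A toy version (the 8-gram size and whitespace tokenization are assumptions; real contamination audits are more involved):

```python
def ngram_overlap(train_text: str, bench_text: str, n: int = 8) -> float:
    """Fraction of the benchmark's word n-grams that also occur in the
    training text. High overlap suggests contamination; low overlap
    does not prove its absence (paraphrased leaks evade this check)."""
    def grams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    bench = grams(bench_text)
    if not bench:
        return 0.0
    return len(bench & grams(train_text)) / len(bench)
```

The weakness is exactly the one raised above: similar-to-benchmark data written by humans shares no literal n-grams with the test set, so it sails through checks like this while still teaching the benchmark's patterns.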
HN is not an entity with a single perspective, and there are plenty of people on here who have a financial stake in you believing their perspective on the matter.
I think really more than anything it’s become clear that AGI is an illusion. There’s nothing there. It’s the mirage in the desert: you keep walking towards it, but it’s always out of reach and unclear if it even exists.
So companies are really trying to deliver value. This is the right pivot. If you gave me an AGI with a 100 IQ, that seems pretty much worthless in today’s world. But domain expertise - that I’ll take.
It's clear from the development trajectory that AGI is not what current AI development is leading to and I think that is a natural consequence of AGI not fitting the constraints imposed by business necessity. AGI would need to have levels of agency and self-motivation that are inconsistent with basic AI safety principles.
Instead, we're getting a clear division of labor where the most sensitive agentic behavior is reserved for humans and the AIs become a form of cognitive augmentation of the human agency. This was always the most likely outcome and the best we can hope for as it precludes dangerous types of AI from emerging.
This might actually be better in a certain way: if you change a real customer-facing API, then customers will complain when you break their code. An LLM will likely adapt. So the interface is more flexible.
But perhaps an LLM could write an adapter that gets cached until something changes?
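That cached-adapter idea can be sketched directly. In this hypothetical version, `generate_fn` stands in for an LLM call that writes the adapter code; it is only invoked again when a hash of the upstream API schema changes (all names and the hashing scheme are assumptions for illustration):

```python
import hashlib
import json

class AdapterCache:
    """Cache generated adapter code, keyed by a hash of the upstream
    API schema, so the expensive generation step only reruns when
    the schema actually changes."""

    def __init__(self, generate_fn):
        self.generate_fn = generate_fn  # e.g. an LLM call that writes adapter code
        self._cache = {}

    def adapter_for(self, schema: dict) -> str:
        # sort_keys makes the hash stable under dict key reordering
        key = hashlib.sha256(
            json.dumps(schema, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.generate_fn(schema)
        return self._cache[key]
```

You get the flexible natural-language interface at schema-change time and a deterministic, cheap code path the rest of the time.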
And yet the tools wielding these are quite adept at writing and modifying them themselves. It’s LLMs building skills for LLMs. The public ones will naturally be vacuumed up by scrapers and put in the training set, making all future LLMs know more.
Take off is here, human in the loop assisted for now… hopefully for much longer.
Yes. Prompt engineering is like a shittier version of writing a VBA app inside Excel or Access.
Bloat has a new name and it's AI integration. You thought Chrome using a GB per tab was bad; wait until you need a whole datacenter to use your coding environment.
> Prompt engineering is like a shittier version of writing a VBA app inside Excel or Access.
Sure, if you could use VBA to read a patient's current complaint, vitals, and medical history, look up all the relevant research on Google Scholar, and then output a recommended course of treatment.
Is the technology continuing to become more applicable?
Is the way it's becoming more applicable leading to frameworks of usage that could enable the next leap? :)