
diminish | 17 days ago

one feels the llm "wow" moment whenever an llm surpasses what one does in one's own area. newer versions of llms are probably trained on feedback from developers' coding-agent sessions, which is probably why pro developers started to feel the "wow" recently.

the real challenge will be at the frontier of human knowledge, and whether llms will be able to move things forward there or not.

ps1: i'm using 5.3/o4.6/k2.5/m2.5/glm5 and others daily for development, so my work has intensified about 1.5x. i tackle increasingly harder problems, but llms still fail badly on brand-new challenges, just as i do. so i'm more alert than ever.

ps2: syntactic autocomplete used to write 80% of my code; now llms have replaced autocomplete, but at a semantic level. i think, and the llm implements most of my actions, like a cerebellum handling muscle coordination, while sometimes also teaching me new info from the net.


mattlangston | 17 days ago

The frontier-of-knowledge point is the right question. My own research is a case in point - I apply experimental physics methods to LLMs, measuring their equations of motion in search of a unified framework for how and why they work. Some of the answers I'm looking for may not exist in any training data.

That's where the 4.5->4.6 jump hit me hardest - not routine tasks but problems where I need the model to reason about stuff it hasn't seen. It still fails, but it went from confidently wrong to productively wrong, if that makes sense. I can actually steer it now.

The cerebellum analogy resonates. I'd go further - it's becoming something I think out loud with, which is changing how I approach problems, not just how fast I solve them.

mycall | 17 days ago

That change in how it's wrong comes from the frontier labs trying to remove their benchmaxxing bias, so the models now have a concept of 'I don't know' and are better at rethinking directions and goals. There was lots of research on this topic last year, and it takes 6 to 12 months before research like that is implemented for general consumption.

2026 will see further improvements for you.