dboreham | 4 days ago
Anyway, I began this project while on vacation (again), then completed it while attending a conference, so the work wasn't 100% duty cycle. That said, it took about a month from the beginning to the current state. You can see in the linked article almost all the LLM sessions that built the project.
LLMs do seem to be a bit narcissistic, as you've alluded to -- confidently declaring they have implemented "PRI PAR", for example, but conveniently not mentioning that they only parsed the keywords and didn't in fact implement the priority semantics. This reminds me of less experienced developers I've managed in the past. Loath to deliver bad news.
This project was all done with Claude. When I began I was given the Opus 4.5 model, but fairly early in the timeline Anthropic enabled the new Opus 4.6 model. This was before its official release, so I'm not sure whether they have a rollout policy that targeted me or my project. Anyway, most of the work was Opus 4.6.
Overall I learned a tremendous amount about what today's frontier models can do: I could probably give 4-5 talks on various things I noticed, or talk for a few hours over beers. My general takeaway was that the experience was uncannily similar to developing software as a human, or running a team of somewhat less experienced humans. A fun time to be alive for sure.
Rochus | 4 days ago
In contrast to my experiences with e.g. Gemini 3 Pro, where the LLM regularly claimed to have reached full feature scope in each iteration only for the result to turn out to be full of stubs, Devin at least doesn't pull my leg and delivers what was agreed. Unfortunately, debugging and fixing takes much more time than generating the initial version (by about a factor of five). But so far I've never run an LLM project over as long a time as you did; it must have cost a fortune.
dboreham | 4 days ago
I find that it's uncannily like running a team of eager but not-too-experienced engineers: those humans would also show up claiming to have "finished". I'd say, "Well, does it run such-and-such test OK?" They'd go away and come back a few days later... The LLM acts much the same. You have to keep it on a short leash, but when it gets cracking on a problem it's amazing to watch. E.g. I saw it write countless test programs on the fly to diagnose a parser hang bug. It would try this and that, binary-chopping on the problematic source file. If I were doing that myself I'd need a few strong coffees before diving in.