Rochus|4 days ago
Interesting. How long did it take until the result effectively worked? Which agent did you use? I recently tried to have devin.ai generate an Oberon-to-C99 transpiler from an existing parser and validated AST I had already implemented myself in C++, but after two days of round-tripping, with the LLM getting increasingly entangled in special cases and producing strange code with more and more redundancy, I stopped and wrote it myself. That was a costly exercise which didn't succeed. The language was not the problem; Devin showed it understood Oberon. But it got completely confused by the different variants of arrays (fixed-size by value; fixed- or variable-size dynamic; fixed- or variable-size VAR or value parameters) and spread redundant code all over, losing track. I also tried to make it generate a RISC-V code generator, which wasn't successful either (bug fixing didn't seem to converge and even seemed to go in circles).
dboreham|4 days ago
Anyway, I began this project while on vacation (again), then completed it while attending a conference, so the work wasn't a 100% duty cycle. That said, it took about a month from the beginning to the current state. You can see almost all the LLM sessions that built the project in the linked article.
LLMs do seem to be a bit narcissistic, as you've alluded to -- confidently declaring that they have implemented "PRI PAR", for example, while conveniently not mentioning that they only parsed the keywords and didn't in fact implement the priority semantics. This reminds me of less experienced developers I've managed in the past, loath to deliver bad news.
This project was all done with Claude. When I began, I was given the Opus 4.5 model, but fairly early in the timeline Anthropic enabled the new Opus 4.6 model. This was before its official release, so I'm not sure whether they have a rollout policy that targeted me or my project. Anyway, most of the work was done with Opus 4.6.
Overall I learned a tremendous amount about what today's frontier models can do: I could probably give 4-5 talks on various things I noticed, or talk for a few hours over beers. The general takeaway was that the experience was uncannily similar to developing software as a human, or to running a team of somewhat less experienced humans. A fun time to be alive, for sure.
Rochus|4 days ago
In contrast to my experiences with e.g. Gemini 3 Pro, where the LLM regularly claimed to have reached full feature scope in each iteration while the result turned out to be full of stubs, Devin at least doesn't pull my leg and delivers what was agreed. Unfortunately, debugging and fixing take much more time than generating the initial version (by about a factor of five). But so far I have never run an LLM project over as long a time as you did; that must have cost a fortune.