Rochus | 4 days ago
In contrast to my experience with e.g. Gemini 3 Pro, which regularly claimed to have reached the full feature scope in each iteration while the result turned out to be full of stubs, Devin at least doesn't pull my leg and delivers what was agreed. Unfortunately, debugging and fixing takes much more time than generating the initial version (about a factor of five). So far I have never tried to run an LLM project over as long a period as you did; it must have cost a fortune.
dboreham | 4 days ago
I find that it's uncannily like running a team of eager but not too experienced engineers: those humans would also show up claiming to have "finished". I'd say, "Well, does it run such-and-such test OK?" They'd go away, come back a few days later... The LLM acts much the same. You have to keep it on a short leash, but when it gets cracking on a problem it's amazing to watch. E.g. I saw it write countless test programs on the fly to diagnose a parser hang bug. It would try this and that, binary chopping on the problematic source file. If I were doing that myself, I'd need a few strong coffees before diving in.
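For anyone unfamiliar with binary chopping, here is a rough Python sketch of the idea, not what the LLM actually wrote. The function name `smallest_failing_prefix` and the `fails` callback are made up for illustration, and it assumes the failure is monotonic: once a prefix of the file triggers the bug, every longer prefix does too, and the empty input does not.

```python
from typing import Callable, Sequence


def smallest_failing_prefix(
    lines: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> int:
    """Return the smallest n such that fails(lines[:n]) is True.

    Hypothetical helper. Assumes the empty input passes and that
    failure is monotonic in the prefix length.
    """
    if not fails(lines):
        raise ValueError("the full input does not reproduce the failure")

    # Invariant: lines[:lo] passes, lines[:hi] fails.
    lo, hi = 0, len(lines)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lines[:mid]):
            hi = mid  # the culprit is in the first half
        else:
            lo = mid  # the culprit is in the second half
    return hi


# Toy stand-in for "the parser hangs": the input contains a marker line.
source = ["a", "b", "BAD", "c"]
n = smallest_failing_prefix(source, lambda chunk: "BAD" in chunk)
print(n, source[n - 1])
```

For a real hang, `fails` would probably run the parser in a subprocess with a timeout (e.g. `subprocess.run(..., timeout=...)` and catching `subprocess.TimeoutExpired`) rather than check for a marker. Each round halves the search range, so even a file with thousands of lines needs only a dozen or so runs.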