That's a pretty trivial example for one of these IDEs to knock out. Assembly is certainly in their training sets, and obviously docker is too. I've watched cursor absolutely run amok when I let it play around in some of my codebase.
I'm bullish it'll get there sooner rather than later, but we're not there yet.
I'm very pro-LLM and pro-AI, but I completely agree with the comment about how many pieces praising LLMs do so with trivial examples. "Trivial" might not be the right word, but I can't think of a better one that doesn't carry a negative connotation, and this shouldn't be negative. Your examples are good and useful, and they capture a bunch of tasks a software engineer would actually do.
I'd say your mandelbrot debug and the LLVM patch are both "trivial" in the same sense: they're discrete, well-defined tasks with clear success criteria that could be assigned to any mid- or senior-level software engineer in a relevant domain, and they could chip through them in a few weeks.
Don't get me wrong, that's an insane power and capability of LLMs, I agree. But ultimately it's just doing a day job that millions of people can do sleep-deprived and hungover.
Non-trivial examples are things that would take a team with different specialist skillsets months to create. One obvious reason there are few non-trivial AI examples is that non-trivial examples take a non-trivial amount of time to generate and verify.
A non-trivial example isn't one where you can look at the output and say "yup, the AI's done well here". It requires that someone spend time going through what's been produced, assessing it, essentially redesigning it as a human to work out all the complexity of a modern non-trivial system and confirm the AI actually did all that stuff correctly.
An in-depth audit of a complex software system can take months or even years, and it's a thorough, tedious task for a human. The Venn diagram of people thinking "I want to spend more time doing thorough, tedious code tasks" and people thinking "I want to mess around with AI coding" is two separate circles.
The examples coming from computer science might be the issue. There are a lot of open source repos out there with tricky bugs, and todo lists of features that are too complex or time-consuming for casual contributors to tackle. Adding significant value to an open source project would be a pretty nice demo that won't get called "pretty trivial".
The complexity of the problem masks the common problem of providing sensible context to your AI of choice so it can do something constructive in your personal codebase. Or giving it tools to check the truth of one of its assertions, something a developer does countless times.
No, the hardest problem is teaching CS undergrads. I just started this year (no background in academia, just 75% of a PhD and well-rounded life experience) and I've basically torn up the entire curriculum they handed me and started vibe-teaching.
Because they are trivial in the sense that you could go on GitHub and copy one of those examples yourself, if you don't pretend an LLM isn't a mashup of the internet.
What people agree on being non-trivial is working on a real project. There are a lot of open source projects that could benefit from a useful code contribution. But so far they've mostly had slop thrown at them.
I have one: the features I've tried this on in my own codebase. Both Claude and Gemini have failed pretty badly.
So it's pretty stupid to just assume that critics haven't tried.
Example feature: send analytics events on app start when the start is triggered by a notification. Both Gemini and Claude completely failed to understand the component tree, rewrote hundreds of lines of code in broken ways, and even when prompted with the difficulty (this happens outside the component tree), failed to come up with a good solution. They also made cosmetic code changes to other parts of the files they touched, even when deliberately prompted not to.
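For context, the kind of solution a human would reach for here lives outside the component tree entirely: a module-level listener wired up before any UI mounts. This is only a minimal sketch under that assumption; the names (NotificationSource, AnalyticsClient, registerLaunchTracking) are all hypothetical, not from any real codebase or library.

```typescript
// Hypothetical sketch: track a notification-triggered app start outside
// the component tree. All class/function names here are invented.

type LaunchEvent = { source: "notification" | "normal"; notificationId?: string };

// Stand-in for whatever native layer reports how the app was launched.
class NotificationSource {
  private listeners: Array<(e: LaunchEvent) => void> = [];
  subscribe(fn: (e: LaunchEvent) => void): void {
    this.listeners.push(fn);
  }
  emit(e: LaunchEvent): void {
    for (const fn of this.listeners) fn(e);
  }
}

// Stand-in analytics client that just records events.
class AnalyticsClient {
  events: Array<{ name: string; props: Record<string, unknown> }> = [];
  track(name: string, props: Record<string, unknown> = {}): void {
    this.events.push({ name, props });
  }
}

// Wire-up happens at module scope, before any UI mounts -- this is the
// part that never touches the component tree.
function registerLaunchTracking(src: NotificationSource, analytics: AnalyticsClient): void {
  src.subscribe((e) => {
    if (e.source === "notification") {
      analytics.track("app_open_from_notification", { id: e.notificationId });
    }
  });
}

// Usage: the native layer (not a component) emits the launch event.
const source = new NotificationSource();
const analytics = new AnalyticsClient();
registerLaunchTracking(source, analytics);
source.emit({ source: "notification", notificationId: "abc123" });
```

The design point is that no component needs to know about this at all, which is exactly what the models kept missing by trying to thread it through the tree.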
I suspect personal tools are as close as we're going to get to this mythical demo that satisfies all critics, i.e. "here is a list of problems I've solved with just AI."
Strikes a balance between simplicity and real world usefulness
afro88|8 months ago
Can't be too far off!
cranium|8 months ago
The implicit decisions it had to make were also inconsequential, e.g. selection of ASCII chars, color or not, bounds of the domain, etc.
However, it shows that agents are powerful translators / extractors of general knowledge!
pydry|8 months ago
What do you think is so difficult about doing the same thing with coding problems?