A million times, this. Sometimes they luck into the intent, but much more frequently they end up in a ball of mud that just happens to pass the tests.
"8 unit tests? Great, I'll code up 8 branches so all your tests pass!" Of course that neglects the fact that there's now actually 2^8 paths through your code.
What makes you think the next generation models won't be explicitly trained to prevent this, or any other pitfall or best practice as the low hanging fruit fall one by one?
if you can steer an LLM to write an application based on what you want, you can steer an LLM to write the tests you want. Some people will be better at getting the LLM to write tests, but it's only going to get easier and easier
This is one of the reasons why I just wrote a testing book (beta reviews giving feedback now). Testing is one of those boring subjects that many programmers ignore. But it just got very relevant. Especially TDD.
No, OP is merely an AI deepthroater that will blindly swallow whatever drivel is put out by AI companies and then "benchmark" it by having it generate a pelican (oh and he got early access to the model), then call whatever he puts out "AI optimism"
The reality of things is, AI still can't handle long running tasks without blowing $500k worth of tokens for an end result that doesn't work, and further work is another $100k worth to get nothing novel.
Where are you pulling these numbers from? I'm genuinely interested. Is it the kind of budget you need to spend in order to have Claude build a Word clone?
utopiah|1 month ago
daxfohl|1 month ago
"8 unit tests? Great, I'll code up 8 branches so all your tests pass!" Of course that neglects the fact that there's now actually 2^8 paths through your code.
Art9681|1 month ago
andrewchambers|1 month ago
Perhaps more advanced llms + specifications + better tests.
krashidov|1 month ago
__mharrison__|1 month ago
well_ackshually|1 month ago
The reality of things is, AI still can't handle long running tasks without blowing $500k worth of tokens for an end result that doesn't work, and further work is another $100k worth to get nothing novel.
Xmd5a|1 month ago