I’ve gotten zero production-usable Python out of any LLM. A small script to do something trivial, sure. Anything I’m going to have to maintain or debug in the future, not even close. I think there is a _lot_ of terrible Python code out there training LLMs, so being a more popular language doesn't help. This era is making it plain how low standards really are.
overfeed|10 months ago
Fascinating. I wonder how you use it, because once I decompose code into modules and function signatures, Claude[0] is pretty good at implementing Python functions. I'd say it one-shots them 60% of the time, 30% of the time I have to tweak the prompt or adjust the proposed diffs, and the remaining 10% is unusable code that I end up writing by hand. Other things Claude is even better at: writing tests, simple refactors within a module, authoring first-draft docstrings, and adding context-appropriate type hints.
0. Local LLMs like Gemma3 and Qwen-coder seem to be in the same ballpark in terms of capabilities; it's just that they are much slower on my hardware. The exception is the 30B Qwen3 MoE that was released a day ago: that one is freakin' fast.
cmorgan31|10 months ago
thelittleone|10 months ago
startupsfail|10 months ago
motbus3|10 months ago