The more code is out there, the worse the average quality of the training dataset. There will be legacy approaches and APIs, poor design choices, popular use cases irrelevant to your context, etc., all of which increase the chances of the output not matching your expectations. In the Java world this is exactly how it works: I need 3-5 iterations with Claude to get things done the way I expect, sometimes jumping straight to manual refactoring and then returning the result to Claude for review and learning. My CLAUDE.md files (multiple of them) are growing big with all the patterns and anti-patterns identified this way. To overcome this problem the model needs specialized training, which I don't think the industry knows how to approach (it has to beat the effort put into the education system for humans).
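To make the pattern/anti-pattern idea concrete, a CLAUDE.md entry accumulated this way might look something like the following (a hypothetical sketch; the specific rules are illustrative, not from my actual files):

```markdown
## Java conventions

- Use `java.time` (`Instant`, `LocalDate`); never `java.util.Date` or `Calendar`.
- Return `Optional<T>` from repository lookups; do not return `null`.
- Anti-pattern: field injection with `@Autowired` — use constructor injection.
```

Each rule exists because the model repeatedly produced the legacy form until it was written down.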
re-thc|14 hours ago
We already have coding-tuned models, e.g. Codex. We should just have language- / technology-specific models with a focus on recent / modern usage.
The problem with something like Java is that it's too old -- too many variants. Make a cutoff, say at least above Java 8, or 17.
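The variant problem is easy to see in practice. A minimal sketch of the same operation in the pre-Java-8 style a model trained on old code may still emit, next to the modern idiom (the class name `IdiomShift` is just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class IdiomShift {
    // Pre-Java-8 style: explicit loop and manual accumulation.
    static List<String> upperOld(List<String> names) {
        List<String> out = new ArrayList<String>();
        for (String name : names) {
            out.add(name.toUpperCase());
        }
        return out;
    }

    // Java 8+ streams; Stream.toList() requires Java 16+.
    static List<String> upperNew(List<String> names) {
        return names.stream().map(String::toUpperCase).toList();
    }

    public static void main(String[] args) {
        List<String> names = List.of("ada", "alan");
        System.out.println(upperOld(names)); // [ADA, ALAN]
        System.out.println(upperNew(names)); // [ADA, ALAN]
    }
}
```

Both versions are "correct Java", which is exactly why a cutoff by version would change what a tuned model prefers to generate.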
ivan_gammel|4 hours ago
The “just” part is a big assumption. It is far from easy, given that modern best practices are always underspecified. An effective coding model must weight reasoning signals much more strongly than memorized coding patterns, and that, I suspect, requires a very different architecture.