mschild | 18 days ago

I find that with more complex projects (full-stack application with some 50 controllers, services, and about 90 distinct full-feature pages) it often starts writing code that simply breaks functionality.

For example, we had to update some fairly complex code that calculates a financial penalty amount. The amount is defined by law, and the law was recently overhauled, so we had to change our implementation.

Every model we tried (and we have corporate access and legal allowance to use pretty much all of them) failed to update it correctly. Models would start changing parts of the calculation that didn't need to be updated. After saying that the specific parts shouldn't be touched and to retry, most of them would go right back to changing it again. The legal definition of the calculation logic is, surprisingly, pretty clear and we do have rigorous tests in place to ensure the calculations are correct.

Beyond that, it was frustrating trying to get the models to stick to our coding standards. Our application has developers from other teams doing work as well. We enforce a minimum standard to ensure code quality doesn't suffer and other people can take over without much issue. This standard is documented in the code itself but also explicitly written out in the repository in simple language. Even when we explicitly prompted the models to stick to the standard and copy-pasted it into the actual chat, they would ignore 50% of it.

The most apt comparison I can make is that of a consultant who always agrees with you to your face but, when doing the actual work, ignores half of your instructions, and you end up running after them to minimize the mess and the cleanup you have to do. It outputs more code, but that code doesn't meet the standards we have. I'd genuinely be happy to offload tasks to AI so I can focus on the more interesting parts of my work, but from my experience and that of my colleagues, it's just not working out for us (yet).

judahmeek|18 days ago

I noticed that you said "models" & not "agents". Agents can receive feedback from automated QA systems, such as linters, unit, & integration tests, which can dramatically improve their work.

There's still the risk that the agent will try to modify the QA systems themselves, but that's why there will always be a human in the loop.
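One way to picture a guard against that risk is a check that flags any agent change touching the QA setup, so a human reviews it before it lands. This is only a minimal sketch: the function name `protected_changes` and the path patterns are hypothetical and would need to match the actual repository layout.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files an agent should not change without human review.
PROTECTED_PATTERNS = [
    "tests/*",
    "*.test.ts",
    ".eslintrc*",
    ".github/workflows/*",
]


def protected_changes(changed_paths, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that match any protected pattern.

    Note: fnmatch-style '*' also matches '/', so 'tests/*' covers nested files.
    """
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Example: paths as they might come from a diff of the agent's branch.
    flagged = protected_changes(
        ["src/penalty.py", "tests/test_penalty.py", ".eslintrc.json"]
    )
    for path in flagged:
        print(f"needs human review: {path}")
```

The list of changed paths would come from whatever diff tooling the pipeline already has; the check itself only decides which of them need a human to look.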

mschild|18 days ago

Should've clarified in that case. I used models as a general stand-in for AI.

To provide a bit more context:

- We use VS Code (plus derivatives like Cursor) hooked up to general models, with general context access to the entire repository.

- We have an MCP server that has access to the company-internal framework and tools (especially the documentation), so it should know how they are used.

So far, we've found 2 use cases that make AI work for us:

1. Code review. This took quite a bit of refinement of the instructions, but we've got it to a point where it provides decent comments on the things we want it to comment on. It still fails on the more complex application logic, but it will consistently point out minor things. It's used now as a pre-PR review, so engineers can run it and fix things before publishing a PR. Less noise for the rest of the developers.

2. CRUD-cruft-like tests for a controller. We still create the controller endpoint ourselves, but if we provide the controller, the DTOs, and an example of how another controller has its tests done, it will produce decent code. Even then, we often still have to fix a couple of things and debug to see where it went wrong, such as when it "fixes" a broken test by removing the actual strictlyEquals() call.
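The "fix the test by removing the assertion" failure can be caught mechanically by comparing how many assertion calls a test file has before and after the AI's change. The following is a rough sketch, not a robust parser: `strictlyEquals` comes from the comment above, while the other matched names (`assert...`, `expect`) and both function names are assumptions for illustration.

```python
import re

# Matches calls such as strictlyEquals(...), assertEquals(...), expect(...).
# Only strictlyEquals is taken from the thread; the rest are assumed names.
ASSERTION_CALL = re.compile(r"\b(?:strictlyEquals|assert\w*|expect)\s*\(")


def count_assertions(source: str) -> int:
    """Count assertion-like calls in a piece of test source code."""
    return len(ASSERTION_CALL.findall(source))


def assertions_removed(before: str, after: str) -> int:
    """Return how many assertion calls disappeared between two versions."""
    return max(0, count_assertions(before) - count_assertions(after))


if __name__ == "__main__":
    before = "const r = calc(1);\nstrictlyEquals(r, 42);\n"
    after = "const r = calc(1);\n"
    if assertions_removed(before, after) > 0:
        print("warning: the change removed assertions from this test")
```

A regex count like this misses assertions that are weakened rather than deleted, so it is a cheap first filter in front of human review, not a replacement for it.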

Just keeping up with the newest AI changes is hard. We all have personal curiosity, but at the end of the day we need to deliver our product and only have so much time to experiment with AI stuff. Never mind all the other developments in our regulation-heavy environment and tech stack that we need to keep on top of.