top | item 45351102

(no title)

This sounds very similar to my workflow. Do you have pre-commits or CI beyond testing? I’ve started thinking about my codebase as an RL environment with the pre-commits as hyperparameters. It’s fascinating seeing what coding patterns emerge as a result.

discuss

joshvm|5 months ago

I think pre-commit is essential. I enforce conventional commits (+ a hook which limits commit length to 50 chars) and for Python, ruff with many options enabled. Perhaps the most important one is to enforce complexity limits. That will catch a lot of basic mistakes. Any sanity checks that you can make deterministic are a good idea. You could even add unit tests to pre-commit, but I think it's fine to have the model run pytest separately.

The models tend to be very good about syntax, but this sort of linting will often catch dead code like unused variables or arguments.

You do need to rule-prompt that the agent may need to run pre-commit multiple times to verify the changes worked, or to re-add to the commit. Also, frustratingly, you also need to be explicit that pre-commit might fail and it should fix the errors (otherwise sometimes it'll run and say "I ran pre-commit!") For commits there are some other guardrails, like blanket denying git add <wildcard>.

Claude will sometimes complain via its internal monologue when it fails a ton of linter checks and is forced to write complete docstrings for everything. Sometimes you need to nudge it to not give up, and then it will act excited when the number of errors goes down.

iagooar|5 months ago

Very solid advice. I need to experiment more with the pre-commit stuff, I am a bit tired of reminding the model to actually run tests / checks. They seem to be as lazy about testing as your average junior dev ;)

iagooar|5 months ago

Yes, I do have automated linting (a bit of a PITA at this scale). On the CI side I am using Github Actions - it does the job, but haven't put much work into it yet.

Generally I have observed that using a statically typed language like Typescript helps catching issues early on. Had much worse results with Ruby.