item 45577849


alwahi | 4 months ago

If Claude generates the tests, runs those tests, and applies the fixes without any oversight, it's very much a "who watches the watchmen" situation.



vidarh|4 months ago

That is true, so don't give it entirely free rein there. I let Claude generate as many additional tests as it likes, but I either write high-level tests myself or first review a set generated by Claude before I let it fill in the blanks. It's instructed very firmly to treat a specific set of test cases as critical, and it then gets increasingly "boxed in" by more validated test cases as we go along.

E.g., for my compiler, I had it build the scaffolding needed to run rubyspecs. Once the test suite ran, I had it systematically attack the crashes and failures, mostly by itself.
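The "critical tests" constraint in this workflow can also be enforced mechanically rather than only by instruction. Here is a minimal Python sketch, assuming the human-reviewed tests live in their own files. It records a SHA-256 digest of each file at review time, then flags any pinned file the agent later modified or deleted. The function names (`pin_tests`, `changed_tests`) are hypothetical and not part of any tool mentioned in the thread.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def pin_tests(paths: list[Path]) -> dict[str, str]:
    """Record digests of the human-reviewed (critical) test files.

    Hypothetical helper: call this right after you have reviewed the tests.
    """
    return {str(p): digest(p) for p in paths}


def changed_tests(pinned: dict[str, str]) -> list[str]:
    """Return pinned test files that were modified or deleted since review.

    An empty list means the agent left every critical test untouched.
    """
    changed = []
    for name, expected in pinned.items():
        p = Path(name)
        if not p.exists() or digest(p) != expected:
            changed.append(name)
    return changed
```

Each time you validate another batch of generated tests, you can add those files to the pinned set. This is one way to get the gradual "boxing in" effect, since the agent can then only make the suite pass by changing the code under test, not the tests.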

ErikBjare|4 months ago

If you generate the tests, run them, and apply fixes without any oversight, it's exactly the same situation. In practice, we have PR reviews.

skydhash|4 months ago

Is it? Stuff like ripgrep, msmtp,… are very much one-man projects. And most packages in a distro are maintained by a single person. Expertise is a thing, and getting reliable results is what differentiates experts from amateurs.

fragmede|4 months ago

Gemini?

gmb_uk|4 months ago

Good lord, that would be like the blind leading the daft.