jsw97 | 5 months ago
Separately, I have been meaning to implement a cheating detector. I have run into Claude modifying problem statements, adding axioms, etc.
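For illustration, a minimal sketch of what such a detector might check, assuming the proofs are Lean-style text where `axiom` and `sorry` are the escape hatches to watch for. The function names and the keyword list are hypothetical, not anything from an existing tool, and the keyword scan is crude: it would also flag those words inside comments.

```python
import hashlib
import re

# Assumption: Lean-style keywords that let a proof "pass" without proving anything.
FORBIDDEN = re.compile(r"\b(axiom|sorry)\b")


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest used to detect edits to the problem statement."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_cheating(original_statement: str, submitted_statement: str, proof: str) -> list[str]:
    """Return a list of human-readable findings; an empty list means nothing was flagged."""
    findings = []
    # Check 1: the statement being proved must be byte-for-byte what was asked.
    if fingerprint(original_statement) != fingerprint(submitted_statement):
        findings.append("problem statement was modified")
    # Check 2: flag every forbidden keyword that appears in the submitted proof.
    for match in FORBIDDEN.finditer(proof):
        findings.append(f"forbidden keyword: {match.group(1)}")
    return findings
```

Comparing digests rather than raw strings makes it easy to store the expected fingerprint somewhere the model cannot write to.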
ants_everywhere|5 months ago
> have run into Claude modifying problem statements, adding axioms, etc.
Same here. I've thought about creating a utility that tells Claude it has to keep going until a test exits with zero status. But I'm concerned Claude would just fake everything to make the test pass.
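A rough sketch of that kind of utility, with one guard against the faking concern: it refuses to count a pass if any protected file (the tests or the problem statement) changed during the round. Everything here is an assumption for illustration, including the `agent_step` callback standing in for "ask the model to try again". It does not stop other forms of faking, such as special-casing the test inputs.

```python
import hashlib
import subprocess
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's bytes, used to notice tampering."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_until_pass(test_cmd, protected_files, agent_step, max_rounds=5):
    """Call agent_step, then run test_cmd, until it exits with status 0.

    Returns the round number that passed, or None if max_rounds is exhausted.
    Raises RuntimeError if any protected file was modified.
    """
    # Record digests before the agent touches anything.
    baseline = {p: file_digest(p) for p in protected_files}
    for round_no in range(1, max_rounds + 1):
        agent_step(round_no)  # hypothetical hook: let the model edit the code
        tampered = [p for p in protected_files if file_digest(p) != baseline[p]]
        if tampered:
            raise RuntimeError(f"protected files changed: {tampered}")
        # Exit status 0 is the conventional signal that the tests passed.
        if subprocess.run(test_cmd).returncode == 0:
            return round_no
    return None
```

Keeping the baseline digests in the harness process, rather than in a file inside the workspace, means the model cannot update them to match its own edits.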
manmal|5 months ago
jsw97|5 months ago