exactly. hindsight bias makes it really hard to separate genuine inference from subtle prompt leakage. even framing the question can accidentally steer it toward the right answer. would be interesting to try with completely synthetic problems first just to test the method.