(no title)
buzzovich | 4 months ago
See, it's much simpler.
Concrete test setup:
- Flawed codebase given to agents for review
- Agent A: Standard behavioural instructions
- Agent B: Same + COGNITION::ETHOS (4 lines added)
Agent B found 20% more flaws than Agent A.
Only variable: those 4 lines. Objective measurement: count of flaws detected.
N=40 runs, statistically significant improvement.
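For anyone who wants to check the significance claim themselves, here is a rough sketch of one way to compare the two sets of per-run flaw counts. It uses a one-sided permutation test, which is my choice for illustration and not necessarily the test used in the repo. The function name `permutation_test` and its parameters are hypothetical, and it assumes you have one flaw count per run for each agent.

```python
import random


def permutation_test(counts_a, counts_b, n_resamples=10_000, seed=0):
    """One-sided permutation test on per-run flaw counts.

    Hypothetical helper, not taken from the repo. Asks how often a random
    relabelling of the runs gives Agent B a mean advantage at least as
    large as the one actually observed.
    """
    rng = random.Random(seed)
    n_a, n_b = len(counts_a), len(counts_b)
    # Observed difference in mean flaws found (B minus A).
    observed = sum(counts_b) / n_b - sum(counts_a) / n_a

    pooled = list(counts_a) + list(counts_b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / n_b - sum(pooled[:n_a]) / n_a
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value
```

Call it as `permutation_test(agent_a_counts, agent_b_counts)` with the real per-run counts. A small p-value would back up the claim that the gap is not just run-to-run noise.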
The evidence is all in the repo.