buzzovich | 4 months ago

Ah, you're misunderstanding. I'm not measuring "cognition."

See, it's much simpler.

Concrete test setup:

  - Flawed codebase given to agents for review

  - Agent A: Standard behavioural instructions

  - Agent B: Same + COGNITION::ETHOS (4 lines added)

Agent B found 20% more flaws than Agent A. Only variable: those 4 lines.

Objective measurement: count of flaws detected.

N=40 runs, statistically significant improvement.
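
For anyone who wants to check a claim like this themselves, here is a minimal sketch of one way to compare per-run flaw counts: a one-sided permutation test on the difference in means. This is not necessarily the test used in the repo. The function name `permutation_test`, its parameters, and the sample counts are all hypothetical and made up for illustration. It also assumes each run yields a single integer flaw count per agent.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(b) > mean(a).

    a, b: per-run flaw counts for Agent A and Agent B (hypothetical inputs).
    Returns (observed difference in means, approximate p-value).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign runs to the two groups, keeping group sizes fixed.
        rng.shuffle(pooled)
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Placeholder counts, NOT the numbers from the repo.
    agent_a = [10, 12, 9, 11, 10, 13, 9, 11]
    agent_b = [12, 14, 11, 13, 12, 15, 12, 13]
    diff, p = permutation_test(agent_a, agent_b)
    print(f"mean difference: {diff:.2f}, p-value: {p:.4f}")
```

A permutation test makes no assumption about how flaw counts are distributed, which suits small integer counts. To reproduce the result you would feed in the actual per-run counts for both agents.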

The evidence is all in the repo.
