top | item 42909139


That would help with decidable problems, but it still would not generalise to problems with non-trivial rewards, or to ones with no reward at all.


astrange|1 year ago

Reasoning seems to generalize, insofar as o1 and DeepSeek-R1 are better at answering questions than their base models.