

kvn8888 | 1 year ago

That would be a ton of problems for a small team of PhD/grad-level experts to solve (for GPQA Diamond, etc.) in a short time. Remember, the problems on Epoch AI's FrontierMath require hours to days of reasoning by humans.

The author also suggested this is a new architecture that uses existing methods, like the Monte Carlo tree search that DeepMind is investigating (they use this method for AlphaZero).

I don't see the point of colluding in this sort of fraud, since methods like tree search and pruning already exist, and other labs could genuinely produce these results.


agnosticmantis|1 year ago

I had ARC-AGI in mind when I suggested human workers. I agree the other benchmark results make the use of human workers unlikely.