Possibly, but as of now it's a completely unsolved problem and to my knowledge nobody has shown even a tiny model being able to perform it.
Based on the top page today, I could even argue that we can't simulate the abilities of a fruit fly.
Even the absolute frontier models can perform only a fraction of a fraction of what a typical work day looks like for a human. Based on GAIA benchmarks, I calculated that a frontier model today has about a 10^-29 chance of completing a single day of connected tasks.
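To show how an estimate that small can come out of chained tasks, here is a minimal sketch. It assumes every task succeeds independently with the same probability, so the chance of completing the whole chain is the per-task success rate raised to the number of tasks. The 50% success rate and the 96 tasks are illustrative placeholders, not the inputs behind the figure above and not actual GAIA scores, and the function names are made up for this example.

```python
import math


def chain_success_probability(per_task_success: float, num_tasks: int) -> float:
    """Chance that every task in a chain succeeds.

    Assumes independent tasks with an identical success rate, which is a
    simplification: real tasks are correlated and vary in difficulty.
    """
    if not 0.0 <= per_task_success <= 1.0:
        raise ValueError("per_task_success must be between 0 and 1")
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    return per_task_success ** num_tasks


def tasks_needed_for_probability(per_task_success: float, target: float) -> float:
    """Chain length at which the overall success chance falls to `target`."""
    if not 0.0 < per_task_success < 1.0:
        raise ValueError("per_task_success must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return math.log(target) / math.log(per_task_success)


# Hypothetical inputs: 50% success per task, 96 connected tasks in a day.
overall = chain_success_probability(0.5, 96)
print(f"{overall:.2e}")  # on the order of 1e-29

# How long a chain has to be before a 50% per-task rate reaches 1e-29.
print(round(tasks_needed_for_probability(0.5, 1e-29), 1))
```

The point of the sketch is only that modest per-task failure rates compound very quickly over a long chain. A different per-task rate or chain length changes the exponent a lot.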
arminiusreturns|1 year ago
charlescurt123|1 year ago