Aloe's neurosymbolic system just beat OpenAI's deep research score on the GAIA benchmark by 20 points. While Gary is full of bluster, he does know a few things about the limitations of LLMs. :) (aloe.inc)
Yeah, there was an old paper that blew math/physics benchmarks out of the water by letting the LLM write code and having a physics engine execute it. I don't have a link to it off the top of my head, but that seems to be the right direction.
nojvek|6 months ago
LLM + general tool use seems to be quite effective.