One must now ask whether research results are analyzing pure LLMs (eg. gpt-series) or LLM synthesis engines (eg. o-series, r-series). In this case, the headline is summarizing a paper originally published in 2023 and does not necessarily have bearing on new synthesis engines. In fact, evidence strongly suggests the opposite given o3's significant performance on ARC-AGI-1 which requires on-the-fly composition capability.
orbital-decay|1 year ago
[1] https://arxiv.org/abs/2412.02975