jaen | 4 days ago
LLMs generating code to solve ARC-AGI is literally what they do these days, so as far as I can see, this entire exercise is basically equivalent to just running "Deep Think"-style test-time compute models and committing their output to GitHub?
What exactly was the novel, un-LLMable human input here?
kofdai | 3 days ago
1. The Inference Engine is 100% Deterministic: The "solver" is a standalone Python program (26K lines + NumPy). At runtime, it has zero neural dependencies. It doesn't call an LLM, it doesn't load weights, and it doesn't "hallucinate." It performs a combinatorial search over a formal domain-specific language (DSL). You could run this on a legacy machine with no internet connection. This is fundamentally different from o1/o3 or Grok-Thinking, where the model is the solver at test-time.
2. The "Novel Human Input" is the DSL Design: Using an LLM to help write Python boilerplate is trivial. Using an LLM to design a 7-phase symbolic pipeline that solves ARC is currently impossible. My core contributions that an LLM could not "reason" out are:
The Cross DSL: The insight that ~57% of ARC transforms can be modeled by local 5-cell von Neumann neighborhoods.
Iterative Residual Learning: A gradient-free strategy where the system synthesizes a transform, calculates the residual error on the grid, and iteratively synthesizes "correction" programs.
Pruning & Verification: Implementing a formal verification loop where every candidate solution is checked against the 3-5 training examples before being proposed.
3. Scaling through Logic, not Compute: While the industry spends millions on "Test-time Compute" (GPU-heavy CoT), Verantyx achieves 18.1% (and now 20% in v6) using Symbolic Synthesis on a single CPU. The 208 commits in the repo represent 208 iterations of staring at grid failures and manually expanding the primitive vocabulary to cover topological edge cases that LLMs consistently miss.
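A minimal sketch of what a 5-cell "Cross" rule could look like, assuming the transform preserves grid shape and each output cell depends only on the input cell and its four orthogonal neighbors. The names `neighborhood`, `learn_cross_rule`, `apply_cross_rule`, and the `PAD` sentinel are hypothetical illustrations, not code from the Verantyx repo.

```python
import numpy as np

PAD = -1  # hypothetical sentinel for out-of-bounds neighbors (assumption of this sketch)

def neighborhood(grid, r, c):
    """5-cell von Neumann neighborhood: (center, up, down, left, right)."""
    h, w = grid.shape
    def at(i, j):
        return int(grid[i, j]) if 0 <= i < h and 0 <= j < w else PAD
    return (at(r, c), at(r - 1, c), at(r + 1, c), at(r, c - 1), at(r, c + 1))

def learn_cross_rule(pairs):
    """Hypothetical learner: map each training neighborhood to its output color.

    Returns None if a pair changes shape or if one neighborhood maps to two
    different colors, i.e. the task is not expressible as a 5-cell local rule.
    """
    table = {}
    for inp, out in pairs:
        if inp.shape != out.shape:
            return None  # this sketch only handles same-shape transforms
        for r in range(inp.shape[0]):
            for c in range(inp.shape[1]):
                key = neighborhood(inp, r, c)
                color = int(out[r, c])
                if table.setdefault(key, color) != color:
                    return None  # conflicting evidence for the same neighborhood
    return table

def apply_cross_rule(table, grid):
    """Rewrite every cell; neighborhoods never seen in training keep their center color."""
    out = np.empty_like(grid)
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            key = neighborhood(grid, r, c)
            out[r, c] = table.get(key, key[0])
    return out
```

A task that needs more context than the four orthogonal neighbors shows up as a conflict (the same neighborhood mapped to two colors), so `learn_cross_rule` returns `None` and a search could move on to other primitives.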
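The combinatorial search from point 1 and the verification loop from point 2 can be illustrated together with a toy enumerator, under the assumption that programs are short sequences of whole-grid primitives. The five-entry `PRIMITIVES` vocabulary and the `run`/`search` helpers are hypothetical; the DSL described above is far larger.

```python
from itertools import product

import numpy as np

# Hypothetical toy vocabulary; not Verantyx's actual primitive set.
PRIMITIVES = {
    "identity": lambda g: g,
    "flip_lr": np.fliplr,
    "flip_ud": np.flipud,
    "rot90": np.rot90,
    "transpose": np.transpose,
}

def run(program, grid):
    """Apply a sequence of primitive names, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def search(pairs, max_depth=2):
    """Enumerate programs by increasing length and return the first verified one."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            # Verification: keep a candidate only if it reproduces every
            # training output exactly; a single mismatch prunes it.
            if all(np.array_equal(run(program, i), o) for i, o in pairs):
                return program
    return None
```

Enumerating by increasing depth means the shortest program that passes every training example is returned first, which is one simple way to prefer compact explanations of the 3-5 examples.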
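Iterative residual learning can be sketched as a greedy loop: apply a base transform, count the cells that still disagree with the training outputs, and append whichever correction most reduces that count. This swaps in a fixed library of hypothetical `recolor` corrections and a simple greedy criterion; it is an assumption about the general idea, not the repo's actual procedure.

```python
import numpy as np

def residual(pred, target):
    """Count cells where the prediction disagrees with the target grid."""
    if pred.shape != target.shape:
        return target.size  # assumption: a shape mismatch counts as fully wrong
    return int(np.count_nonzero(pred != target))

def recolor(a, b):
    """Hypothetical correction primitive: paint every cell of color a as b."""
    def fix(grid):
        out = grid.copy()
        out[grid == a] = b
        return out
    return fix

# Hypothetical correction library: all recolorings among ARC's 10 colors.
CORRECTIONS = [recolor(a, b) for a in range(10) for b in range(10) if a != b]

def total_error(base, fixes, pairs):
    """Summed residual after running base and then each correction in order."""
    err = 0
    for inp, out in pairs:
        pred = base(inp)
        for fix in fixes:
            pred = fix(pred)
        err += residual(pred, out)
    return err

def synthesize(base, pairs, max_rounds=5):
    """Greedily append corrections until the residual is zero or stops shrinking."""
    fixes = []
    err = total_error(base, fixes, pairs)
    for _ in range(max_rounds):
        if err == 0:
            break
        best = min(CORRECTIONS, key=lambda f: total_error(base, fixes + [f], pairs))
        best_err = total_error(base, fixes + [best], pairs)
        if best_err >= err:
            break  # no correction helps, so stop instead of looping
        fixes.append(best)
        err = best_err
    return fixes, err
```

The loop stops when the residual reaches zero, when no correction lowers it, or after `max_rounds`, so it always terminates without any gradients.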
If using Copilot to speed up the implementation of a deterministic search algorithm invalidates the algorithm, then we’d have to invalidate most modern OS kernels or compilers written today. The "intelligence" isn't in the typing; it's in the program synthesis architecture that does what pure LLM inference cannot.
I'd encourage you to check the source—it's just pure, brute-force symbolic logic: https://github.com/Ag3497120/verantyx-v6
jaen | 3 days ago
> I'd encourage you to check the source
I couldn't have written my comment without reading the source, obviously!
> o1/o3 or Grok-Thinking, where the model is the solver at test-time.
What? I said that the SotA is the model generating code at test-time, not solving it directly via CoT/ToT etc.