

kofdai | 2 days ago

Title: [Show HN] Verantyx Update: 22.7% on ARC-AGI-2 using Human-Logic + OpenClaw Loop

Body: Following up on my previous post (where I was at 18.1%), I’ve just reached 22.7% (227/1000) on the ARC-AGI-2 public evaluation set.

I want to address the skepticism regarding my development speed. As an undergraduate student in Japan, I have limited manual coding time. To overcome this, I’ve established a "Human-Architect / AI-Builder" research loop.

How the 24/7 loop works:

Human (Me): I analyze failed tasks to identify underlying geometric patterns and design new DSL primitives (e.g., the new gravity_solver and cross3d_geometry in v62).

AI Agent (OpenClaw/Claude Code): Based on my architectural design, the agent scaffolds the implementation, performs rigorous regression testing across all 1,000 tasks, and refines the code for performance.
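The post doesn't show the regression harness itself. Below is a minimal sketch of that kind of check. It assumes the standard ARC JSON task layout (`train` and `test` lists of `input`/`output` grids) and a hypothetical `solver(train_pairs, test_input)` signature, neither of which is taken from the Verantyx code. It also scores exact matches only and ignores the multiple attempts that official ARC scoring allows.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]
Pair = Dict[str, Grid]
# Hypothetical solver signature: (training pairs, test input) -> predicted grid or None.
Solver = Callable[[List[Pair], Grid], Optional[Grid]]


def solves_task(solver: Solver, task: Dict[str, List[Pair]]) -> bool:
    """A task counts as solved only if every test output is matched exactly."""
    return all(
        solver(task["train"], case["input"]) == case["output"] for case in task["test"]
    )


def run_suite(solver: Solver, tasks: Dict[str, dict]) -> Dict[str, bool]:
    """Map each task id to whether the solver gets it exactly right."""
    return {task_id: solves_task(solver, task) for task_id, task in tasks.items()}


def find_regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Task ids that passed before a change but fail after it."""
    return sorted(t for t, ok in before.items() if ok and not after.get(t, False))


def accuracy(results: Dict[str, bool]) -> float:
    """Fraction of tasks solved (e.g. 227 of 1000 -> 0.227)."""
    return sum(results.values()) / len(results) if results else 0.0


if __name__ == "__main__":
    # Two toy tasks: one needs identity, the other needs a transpose.
    tasks = {
        "identity": {"train": [{"input": [[1]], "output": [[1]]}],
                     "test": [{"input": [[2, 3]], "output": [[2, 3]]}]},
        "transpose": {"train": [{"input": [[1, 2]], "output": [[1], [2]]}],
                      "test": [{"input": [[3, 4]], "output": [[3], [4]]}]},
    }
    before = run_suite(lambda train, g: g, tasks)
    after = run_suite(lambda train, g: [list(r) for r in zip(*g)], tasks)
    print(accuracy(before), find_regressions(before, after))  # prints 0.5 ['identity']
```

A change is only worth committing when `find_regressions` comes back empty and `accuracy` does not drop. Running the whole suite on every commit is what lets an agent iterate quickly without silently losing tasks that were already solved.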

This division of labor lets me sustain a commit cadence I could not reach coding alone, while the inference engine itself stays 100% symbolic and deterministic. At test time there are zero LLM calls; it's pure structural reasoning.
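Here is a toy illustration of what "zero LLM calls at test time" can mean in practice. It runs a fully deterministic search over a fixed, ordered table of grid primitives, and the first primitive that reproduces every training pair gets applied to the test input. The primitive names and the first-match rule are hypothetical and not taken from Verantyx. A real DSL would also compose primitives rather than try them one at a time.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]

# Hypothetical DSL: an ordered table of pure grid -> grid primitives.
# Dicts keep insertion order (Python 3.7+), so the search order is fixed.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def find_program(train: List[Dict[str, Grid]]) -> Optional[str]:
    """Return the first primitive (in table order) consistent with every training pair."""
    for name, fn in PRIMITIVES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in train):
            return name
    return None


def predict(train: List[Dict[str, Grid]], test_input: Grid) -> Optional[Grid]:
    """Apply the selected primitive to the test input, or give up with None."""
    name = find_program(train)
    return PRIMITIVES[name](test_input) if name is not None else None
```

For example, a single training pair `[[1, 2], [3, 4]] -> [[2, 1], [4, 3]]` selects `flip_horizontal`, so `predict` maps `[[5, 6]]` to `[[6, 5]]`. Because the table order is fixed and every primitive is a pure function, the same task always yields the same answer.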

V62 Key Updates:

Gravity Solver: four distinct strategies for object sliding and gravity-based transformations.

Cross3D Geometry Engine: Improved handling of 3D-projected cross structures.

Score: 22.7%, rising monotonically from 20.1% and then 22.4% earlier this week.
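The post doesn't say what the four gravity strategies are. For illustration only, here is a cell-level gravity primitive parameterized by direction. Treating the four strategies as the four directions, and treating color 0 as background, are both assumptions. This version also moves individual cells rather than whole objects, which a real object-sliding solver would likely need to handle.

```python
from typing import List

Grid = List[List[int]]
BACKGROUND = 0  # assumption: color 0 is background, as in many ARC tasks


def _compact(line: List[int], bg: int) -> List[int]:
    """Push non-background cells to the start of a line, keeping their order."""
    cells = [c for c in line if c != bg]
    return cells + [bg] * (len(line) - len(cells))


def gravity(grid: Grid, direction: str, bg: int = BACKGROUND) -> Grid:
    """Slide every non-background cell as far as it can go in `direction`.

    Cells move independently (cell-level gravity), and the input grid is not mutated.
    """
    if direction == "left":
        return [_compact(row, bg) for row in grid]
    if direction == "right":
        return [_compact(row[::-1], bg)[::-1] for row in grid]
    # For up/down, operate on columns via transpose, then transpose back.
    cols = [list(col) for col in zip(*grid)]
    if direction == "up":
        moved = [_compact(col, bg) for col in cols]
    elif direction == "down":
        moved = [_compact(col[::-1], bg)[::-1] for col in cols]
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return [list(row) for row in zip(*moved)]
```

For instance, `gravity([[1, 0], [0, 2], [0, 0]], "down")` returns `[[0, 0], [0, 0], [1, 2]]`. Reducing all four directions to one "compact toward the start" helper keeps each direction trivially consistent with the others.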

I believe this hybrid development model, where human intuition drives the logic and AI agents drive the implementation, is the fastest path to 80%+ on Humanity's Last Exam.

I'm eager to hear your thoughts on this "System 2" approach and the role of AI agents in building symbolic AI.

GitHub: https://github.com/Ag3497120/verantyx-v6

Project Site: https://verantyx.ai

