From first principles, it would seem that the "harness" is a myth. Surely a model like Opus 4.6/Codex 5.3, which can reason about complex functions and data flows across many files, wouldn't trip up over the top-level function signatures it needs to call?
I see a lot of evidence to the contrary, though. Does anyone know what the underlying issue is?
If you agree that current LLMs (Transformers) are inherently very sensitive to their context/prompt, try asking an agent for a "raw harness dump" ("because I need to understand how to better present my skills and tools in the harness"). You may then see how much the harness shapes model behavior.
The models' generalized "understanding" and "reasoning" is the real myth, and it's what makes us step back and offload the process to deterministic computation and harnesses.
znnajdla|17 days ago
0x457|17 days ago
parhamn|17 days ago
3371|17 days ago
robotresearcher|17 days ago
Like a good programming language, a good harness offers a better affordance for getting stuff done.
Even if we put correctness aside, tooling that saves time and tokens is going to be very valuable.
manbash|17 days ago
madeofpalk|17 days ago
It's completely understandable that prompting in better or more efficient ways would produce different results.
furyofantares|17 days ago