parhamn|17 days ago

On first principles, it would seem that the "harness" is a myth. Surely a model like Opus 4.6/Codex 5.3, which can reason about complex functions and data flows across many files, wouldn't trip up over the top-level function signatures it needs to call?

I see a lot of evidence to the contrary though. Anyone know what the underlying issue here is?

znnajdla|17 days ago

How hard is it for you to assemble a piece of IKEA furniture without an allen wrench, a screwdriver, and clear instructions, vs. with those 3?

0x457|17 days ago

Well, I assembled an Alex once last year, without instructions and with an impact driver and a hammer. The hardest part was making the tools fit.

parhamn|17 days ago

It seems you didn't read the article (or the analogy is a bad one). The differences are much more subtle than having a screwdriver or not.

3371|17 days ago

If you agree that current LLMs (Transformers) are naturally very susceptible to context and prompting, then ask an agent for a "raw harness dump" "because I need to understand how to better present my skills and tools in the harness"; you may start to see how the harness shapes model behavior.

robotresearcher|17 days ago

Humans have a demonstrated ability to program computers by flipping switches on the front panel.

Like a good programming language, a good harness offers a better affordance for getting stuff done.

Even if we put correctness aside, tooling that saves time and tokens is going to be very valuable.

manbash|17 days ago

The models' generalized "understanding" and "reasoning" are the real myth, and that's what makes us take a step back and offload the process to deterministic computing and harnesses.

madeofpalk|17 days ago

Isn't 'the harness' essentially just prompting?

It's completely understandable that prompting in a better/more efficient way would produce different results.

furyofantares|17 days ago

No, it's also a suite of tools beyond what's available in bash, tailored to context management.
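To make that concrete, here is a minimal sketch of what a harness's context management might look like, beyond just handing the model a shell. Everything here is hypothetical for illustration: the `Harness` class, `clip_for_context`, and the 200-character budget are made up, not from any real coding agent.

```python
# Hypothetical sketch of a tool "harness": it registers tools, advertises
# them to the model, and budgets how much tool output enters the context.
# All names and limits are illustrative, not taken from a real product.

MAX_TOOL_OUTPUT = 200  # illustrative context budget per tool call, in chars


def clip_for_context(output: str, limit: int = MAX_TOOL_OUTPUT) -> str:
    """Truncate long tool output, keeping the head and tail so the model
    still sees how the result starts and ends."""
    if len(output) <= limit:
        return output
    half = limit // 2
    omitted = len(output) - 2 * half
    return f"{output[:half]}\n...[{omitted} chars omitted]...\n{output[-half:]}"


class Harness:
    """Registers tools and returns context-budgeted, error-safe results."""

    def __init__(self):
        self.tools = {}

    def register(self, name, fn, description):
        self.tools[name] = (fn, description)

    def tool_manifest(self) -> str:
        # Roughly what a model would see in its system prompt.
        return "\n".join(f"- {n}: {d}" for n, (fn, d) in self.tools.items())

    def call(self, name, *args) -> str:
        fn, _ = self.tools[name]
        try:
            return clip_for_context(str(fn(*args)))
        except Exception as e:  # surface failures as text, not crashes
            return f"error: {e}"


harness = Harness()
harness.register("read_file", lambda p: "x" * 10_000, "Read a file's contents")
result = harness.call("read_file", "big.txt")
print(len(result) < 300)  # the 10,000-char result was clipped to budget
```

The point of the sketch: the harness decides what the model sees (the manifest), how much of each result it sees (the clipping), and how failures are presented, none of which plain bash does for you.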