top | item 46905783

My first reaction: wow, incredible.

My second reaction: still incredible, but it's worth noting that a C compiler is one of the most rigorously specified pieces of software out there. The spec is precise, the expected behavior is well-defined, and the test cases are unambiguous.

I'm curious how well this translates to the kind of work most of us do day to day, where requirements are fuzzy, many edge cases are discovered along the way, and what we want to build is a moving target.

ndesaulniers|24 days ago

> C compiler is one of the most rigorously specified pieces of software out there

/me Laughs in "unspecified behavior."

ori_b|24 days ago

There's undefined behavior, which is quite well specified. What do you mean by unspecified behavior? Do you have an example?

irishcoffee|24 days ago

Undefined is absolutely clear in the spec.

Unspecified is whatever you want it to mean. I am also laughing, having never heard "unspecified" before.

astrange|23 days ago

The C spec is certainly not formal or precise.

https://www.ralfj.de/blog/2020/12/14/provenance.html

Another example is that it's unclear from the standard if you can write malloc() in C.

butterNaN|23 days ago

Sure, but the point OP is making is that it is still more spec'd than most real-world problems.

cryptonector|24 days ago

> My second reaction:

This is the key: the more you constrain the LLM, the better it will perform. At least that's my experience with Claude. When working with existing code, the better the code to begin with, the better Claude performs, while if the code has issues then Claude can end up spinning its wheels.

softwaredoug|24 days ago

Yes, I think any codegen with a lot of tests and verification is more about “fitting” to the tests, like fitting an ML model. It’s model training, not coding.

But in a lot of programming we discover correctness as we go, which is one reason humans don’t completely exit the loop. We need to see and build tests as we go, giving them particular care and attention to ensure they test what matters.

uywykjdskn|24 days ago

The agent can obviously do that.