top | item 45425682

(no title)

Most programmers don't understand the low level assembly or machine code. High level language becomes the layer where human comprehension and collaboration happens.

LLM is pushing that layer towards natural language and spec-driven development. The only *big* difference is that high level programming languages are still deterministic but natural language is not.

I'm guessing we've reached an irreducible point where the amount of information needed specify the behavior of a program is nearly optimally represented in programming languages after decades of evolution. More abstraction into the natural language realm would make it lossy. And less abstraction down to the low level code would make it verbose.

discuss

adamddev1|5 months ago

The difference is not just a jump to a higher abstraction with natural language. It's something fundamentally differet.

The previous tools (assemblers, compilers, frameworks) were built on hard-coded logic that can be checked and even mathematically verified. So you could trust what you're standing on. But with LLMs we jump off the safely-built tower into a world of uncertainty, guesses, and hallucinations.

mym1990|5 months ago

If LLMs still produce code that is eventually compiled down to a very low level...that would mean it can be checked and verified, the process just has additional steps.

JavaScript has a ton of behavior that is very uncertain at times and I'm sure many JS developers would agree that trusting what you're standing on is at times difficult. There is also a large percentage of developers that don't mathematically verify their code, so the verification is kind of moot in those cases, hence bugs.

The current world of LLM code generation lacks the verification you are looking for, however I am guessing that these tools will soon emerge in the market. For now, building as incrementally as possible and having good tests seems to be a decent path forward.

austin-cheney|5 months ago

> Most programmers don't understand the low level assembly or machine code.

Most programmers that write JavaScript for a living don't really understand how to scale applications in JavaScript, which includes data structures in JavaScript. There is a very real dependence on layers of abstractions to enable features that can scale. They don't understand the primary API to the browser, the DOM, at all and many don't understand the Node API outside the browser.

For an outside observer it really begs the Office Space question: What would you say you do here? Its weird trying to explain it to people completely outside software. For the rest of us in software we are so used to this we take the insanity for granted as an inescapable reality.

Ironically, at least in the terms of your comment, is that when you confront JavaScript developers about this lack of fundamental knowledge comparisons to assembly frequently come up. As though writing JavaScript directly is somehow equivalent to writing machine code, but for many people in that line of work they are equivalent distant realities.

The introduction of LLMs makes complete sense. When nobody knows how any of this code works then there isn't a harm to letting a machine write it for you, because there isn't a difference in the underlying awareness.

rmunn|5 months ago

> Most programmers that write JavaScript for a living don't really understand how to scale applications in JavaScript, which includes data structures in JavaScript. There is a very real dependence on layers of abstractions to enable features that can scale.

Although I'm sure you are correct, I would also want to mention that most programmers that write JavaScript for a living aren't working for Meta or Alphabet or other companies that need to scale to billions, or even millions, of users. Most people writing JavaScript code are, realistically, going to have fewer than ten thousand users for their apps. Either because those apps are for internal use at their company (such as my current project, where at most the app is going to be used by 200-250 people, so although I do understand data structures I'm allowing myself to do O(N^2) business logic if it simplifies the code, because at most I need to handle 5-6 requests per minute), or else because their apps are never going to take off and get the millions of hits that they're hoping for.

If you don't need to scale, optimizing for programmer convenience is actually a good bet early on, as it tends to reduce the number of bugs. Scaling can be done later. Now, I don't mean that you should never even consider scaling: design your architecture so that it doesn't completely prevent you from scaling later on, for example. But thinking about scale should be done second. Fix bugs first, scale once you know you need to. Because a lot of the time, You Ain't Gonna Need It.

foo42|5 months ago

A side effect of the non-deterministic behaviour is that, unlike previous increases in abstraction, the high level prompts are not checked in to the code base and available to recreate their low level output on demand. Instead we commit the lower level output (ie code) and future revisions must operate on this output without the ability to modify the original high level instructions.

the_duke|5 months ago

I feel like natural language specs can play a role, but there should be an intermediate description layer with strict semantics.

Case in point: I'm seeing much more success in LLM driven coding with Rust, because the strong type system prevents many invalid states that can occur in more loosely or untyped languages.

It takes longer, and often the LLM has to iterate through `cargo check` cycles to get to a state that compiles, but once it does the changes are very often correct.

The Rust community has the saying "if it compiles, it probably works". You can still have plenty of logic bugs of course , but the domain of possible mistakes is smaller.

What would be ideal is a very strict (logical) definition of application semantics that LLMs have to implement, and that ideally can be checked against the implementation. As in: have a very strict programming language with dependent types , littered with pre/post conditions, etc.

LLMs can still help to transform natural language descriptions into a formal specification, but that specification should be what drives the implementation.

redsymbol|5 months ago

There is another big difference: natural languages have ambiguity baked in. If a programming language has any ambiguity in how it can be parsed, that is rightly considered a major bug. But it's almost a feature of natural languages, allowing poetry, innuendo, and other nuanced forms of communication.

int_19h|5 months ago

There are constructed languages that preserve the expressivity of natural human languages but without the implicit ambiguity, though; most notably, Loglan and its successor Lojban. If you read Golden Age sci-fi, Loglan sometimes shows up there specifically in this role - e.g. "Moon is a Harsh Mistress":

> By then Mike had voder-vocoder circuits supplementing his read-outs, print-outs, and decision-action boxes, and could understand not only classic programming but also Loglan and English, and could accept other languages and was doing technical translating—and reading endlessly. But in giving him instructions was safer to use Loglan. If you spoke English, results might be whimsical; multi-valued nature of English gave option circuits too much leeway.

For those unfamiliar with it, it's not that Lojban is perfectly unambiguous. It's that its design strives to ensure that ambiguity is always deliberate by making it explicit.

The obvious problem with all this is that Lojban is a very niche language with a fairly small corpus, so training AI on it is a challenge (although it's interesting to note that existing SOTA models can read and write it even so, better than many obscure human languages). However, Lojban has the nice property of being fully machine parseable - it has a PEG grammar. And, once you parse it, you can use dictionaries to construct a semantic tree of any Lojban snippet.

When it comes to LLMs, this property can be used in two ways. First, you can use structured output driven by the grammar to constrain the model to output only syntactically valid Lojban at any point. Second, you can parse the fully constructed text once it has been generated, add semantic annotations, and feed the tree back into the model to have it double-check that what it ended up writing means exactly what it wanted to mean.

With SOTA models, in fact, you don't even need the structured output - you can just give them parser as a tool and have them iterate. I did that with Claude and had it produce Lojban translations that, while not perfect, were very good. So I think that it might be possible, in principle, to generate Lojban training data out of other languages, and I can't help but wonder what would happen if you trained a model primarily on that; I suspect it would reduce hallucinations and generally improve metrics, but this is just a gut feel. Unfortunately this is a hypothesis that requires a lot of $$$ to properly test...

low_tech_punk|5 months ago

I had a similar thought, feature not bug.

The nature of programming might have to shift to embrace the material property of LLM. It could become a more interpretative, social, and discovery-based activity. Maybe that's what "vibe coding" would eventually become.

archy_|5 months ago

C has a lot of ambiguity in how it is parsed ("undefined behavior") but people usually view that as a benefit because it allows compilers more freedom to dictate an implementation.

lxgr|5 months ago

> The only big difference is that high level programming languages are still deterministic but natural language is not.

Arguably, determinism isn't everything in programming: It's very possible to have perfectly deterministic, yet highly surprising (in terms of actual vs. implied semantics to a human reader) code.

In other words, the axis "high/low level of abstraction" is orthogonal to the "deterministic/probabilistic" one.

raincole|5 months ago

Yes, but determinism is still very important in this case. It means you only need to memorize the surprising behavior once (like literally every single senior programmer has memorized their programming language's quirks even they don't want to).

Without determinism, learning becomes less rewarding.

tossandthrow|5 months ago

A program with ambiguities will not work, a spec with ambiguities is, on the other hand, incredibly common.

Specs are not more abstract but more ambiguous, which is not the same thing.

drdrek|5 months ago

Somehow many very smart AI entrepreneurs do not understand the concept of limits to lossless data compression. If an idea cannot be reduced further without losing information, no amount of AI is going to be able to compress it.

This is why you see so many failed startup around slack/email/jira efficiency. Half the time you do not know if you missed critical information so you need to go to the source, negating gains you had with information that was successfully summarized.

dorkrawk|5 months ago

Downloading music off the internet is just the next logical step after taping songs off the radio. Cassette tapes didn't really affect the music industry, so I wouldn't worry about this whole Napster thing.