top | item 46908906

atomicnumber3 | 24 days ago

"When was the last time you reviewed the machine code produced by a compiler?"

Compilers will produce working output given working input literally 100% of the time in my career. I've never personally found a compiler bug.

Meanwhile AI can't be trusted to give me a recipe for potato soup. That is to say, I would under no circumstances blindly follow the output of an LLM I asked to make soup. While I have, every day of my life, gladly sent all of the compiler output to the CPU without ever checking it.

The compiler metaphor is simply incorrect and people trying to say LLMs compile English into code insult compiler devs and English speakers alike.

LiamPowell|24 days ago

> Compilers will produce working output given working input literally 100% of the time in my career.

In my experience this isn't true. People just assume their code is wrong and mess with it until they inadvertently do something that works around the bug. I've personally reported 17 bugs in GCC over the last 2 years and there are currently 1241 open wrong-code bugs.

Here's an example of a simple to understand bug (not mine) in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180

grey-area|24 days ago

These are still deterministic bugs, which is the point the OP was making. They can be found and solved once. Most of those bugs are simply not that important, so they never get attention.

LLMs, on the other hand, are non-deterministic, unpredictable, and fuzzy by design. That makes them poorly suited to producing output that is provably correct. Sure, you can generate output and then laboriously check it; some people find that useful, others have yet to.

It's a little like using Bitcoin to replace currencies - sure, you can do that, but it includes design flaws which make it fundamentally unsuited to doing so. Ten years ago we had rabid defenders of these currencies telling us they would soon take over the global monetary system and replace it; nowadays, not so much.

throw10920|24 days ago

> I've personally reported 17 bugs in GCC over the last 2 years

You are an extreme outlier. I know about two dozen people who work with C(++) and not a single one of them has ever told me that they've found a compiler bug when we've talked about coding and debugging - it's been exclusively them describing PEBCAK.

rhubarbtree|24 days ago

This argument is disingenuous and distracts rather than addresses the point.

Yes, it is possible for a compiler to have a bug. No, that is in no way analogous to AI producing buggy code.

I’ve experienced maybe two compiler bugs in my twenty-year career. I have already experienced countless AI mistakes - hundreds? Thousands?

These are not the same and it has the whiff of sales patter trying to address objections. Please stop.

dbtablesorrows|24 days ago

The fact that the bug tracker exists proves GP's point.

eklavya|24 days ago

Right, now what would you say is the probability of getting a bug in compiler output vs ai output?

It's a great tool, once it matures.

rootnod3|24 days ago

Absolutely this. I am tired of that trope.

Or the argument that "well, at some point we can come up with a prompt language that does exactly what you want and you just give it a detailed spec." A detailed spec is called code. It's the most roundabout way to make a programming language, and even then it's still non-deterministic at best.

wtetzner|24 days ago

And at the point that your detailed specification language is deterministic, why do you need AI in the middle?
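
The point above can be made concrete with a toy sketch. A hypothetical mini "spec language" (invented here purely for illustration) that is precise enough to be unambiguous can be executed directly by a short deterministic interpreter, with no model in the middle:

```python
# A "detailed spec" precise enough to be unambiguous is itself a tiny
# formal language -- a deterministic interpreter runs it directly.
# The spec syntax here ('repeat N: greet NAME') is made up for this example.
def run_spec(spec: str) -> list[str]:
    """Interpret lines like 'repeat 3: greet NAME' deterministically."""
    out = []
    for line in spec.strip().splitlines():
        head, _, name = line.partition(": greet ")
        count = int(head.removeprefix("repeat "))
        out.extend(f"hello, {name}" for _ in range(count))
    return out

print(run_spec("repeat 2: greet ada"))  # ['hello, ada', 'hello, ada']
```

Same spec in, same output out, every run - which is the property the "just write a detailed enough prompt" argument quietly gives up.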

andai|24 days ago

This is obviously beside the point, but I did blindly follow a wiener schnitzel recipe ChatGPT made me and cooked for a whole crew. It turned out great. I think I got lucky though; the next day I absolutely massacred the pancakes.

D-Machine|24 days ago

I genuinely admire your courage and willingness (or perhaps just chaos energy) to attempt both wiener schnitzel and pancakes for a crew, based on AI recipes, despite clearly limited knowledge of either.

bonesss|24 days ago

Recent experiments with LLM recipes (ChatGPT): it missed the salt in a rice recipe, then flubbed whether that type of rice should be washed according to the recipe it was supposedly summarizing (and lied about it, too)…

Probabilistic generation will be weighted towards the means in the training data. Do I want my code looking like most code most of the time, in a world full of Node.js and PHP? Am I better served by rapid delivery from a non-learning algorithm that requires eternal vigilance and critical re-evaluation, or by slower delivery with a single review, filtered through a meatspace actor who will build out trustable modules in a linear fashion with known failure modes already addressed by process (e.g. TDD, specs, integration & acceptance tests)?

I’m using LLMs a lot, but can’t shake the feeling that the TCO and total time shakes out worse than it feels as you go.

bostik|24 days ago

Everything more complex than a hello-world has bugs. Compiler bugs are uncommon, but not that uncommon. (I must have debugged a few ICEs in my career, but luckily have had more skilled people to rely on when code generation itself was wrong.)

Compilers aren't even that bad. The stack goes much deeper and during your career you may be (un)lucky enough to find yourself far below compilers: https://bostik.iki.fi/aivoituksia/random/developer-debugging...

NB. I've been to vfs/fs depths. A coworker relied on an oscilloscope quite frequently.

nneonneo|24 days ago

I had a fun bug while building a smartwatch app that was caused by the sample rate of the accelerometer increasing when the device heated up. I had code that was performing machine learning on the accelerometer data, which would mysteriously get less accurate during prolonged operation. It turned out that we gathered most of our training data during shorter runs when the device was cool, and when the device heated up during extended use, it changed the frequencies of the recorded signals enough to throw off our model.
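
The effect described above can be reproduced with a toy signal: if data are captured at a faster rate than the processing assumes, every frequency in the spectrum is scaled down by the ratio of the rates. A small NumPy sketch (the rates and frequencies here are invented for illustration, not the actual smartwatch figures):

```python
import numpy as np

def dominant_freq(samples, assumed_rate):
    """Estimate the dominant frequency of a signal, given an assumed sample rate."""
    spectrum = np.abs(np.fft.rfft(samples))
    spectrum[0] = 0  # ignore the DC component
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / assumed_rate)
    return freqs[np.argmax(spectrum)]

nominal_rate = 50.0   # Hz the pipeline assumes (training conditions)
true_freq = 2.0       # Hz of the physical motion

# Cool device: the sensor really samples at the nominal rate.
t_cool = np.arange(500) / nominal_rate          # 10 s of samples
cool = np.sin(2 * np.pi * true_freq * t_cool)

# Hot device: the sensor actually samples faster, but we still assume 50 Hz.
hot_rate = 55.0
t_hot = np.arange(550) / hot_rate               # 10 s of samples
hot = np.sin(2 * np.pi * true_freq * t_hot)

print(dominant_freq(cool, nominal_rate))  # -> 2.0 Hz, as in training
print(dominant_freq(hot, nominal_rate))   # -> ~1.82 Hz (2.0 * 50/55), shifted
```

A model trained only on the "cool" distribution sees the hot device's features drift, exactly as described.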

I've also used a logic analyzer to debug communications protocols quite a few times in my career, and I've grown to rather like that sort of work, tedious as it may be.

Just this week I built a VFS using FUSE and managed to kernel panic my Mac a half-dozen times. Very fun debugging times.

pcl|24 days ago

”I've never personally found a compiler bug.”

I remember the time I spent hours debugging a feature that worked on Solaris and Windows but failed to produce the right results on SGI. Turns out the SGI C++ compiler silently ignored the `throw` keyword! Just didn’t emit an opcode at all! Or maybe it wrote a NOP.

All I’m saying is, compilers aren’t perfect.

I agree about determinism though. And I mitigate that concern by prompting AI assistants to write code that solves a problem, instead of just asking for a new and potentially different answer every time I execute the app.

Ygg2|24 days ago

Compilers don't change their output assembly based on what markdown you provide them via .claude.

Or what tone of voice in prompt you gave them. Or if Saturn is in Aries or Sagittarius.

idopmstuff|24 days ago

> Meanwhile AI can't be trusted to give me a recipe for potato soup.

This just isn't true any more. Outside of work, my most common use case for LLMs is probably cooking. I used to frequently second guess them, but no longer - in my experience SOTA models are totally reliable for producing good recipes.

I recognize that at a higher level we're still talking about probabilistic recipe generation vs. deterministic compiler output, but at this point it's nonetheless just inaccurate to act as though LLMs can't be trusted with simple (e.g. potato soup recipe) tasks.

bayindirh|24 days ago

Compilers and processors are deterministic by design. LLMs are non-deterministic by design.

It's not apples vs. oranges. They are literally opposites of each other.
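
The distinction can be made concrete with a toy sketch (purely illustrative: the "compiler" and "LLM" below are stand-ins, not real implementations):

```python
import random

def compile_source(source: str) -> str:
    """Toy 'compiler': a pure function -- same input, same output, always."""
    return source.upper()  # stand-in for deterministic code generation

def llm_complete(prompt: str, temperature: float = 1.0) -> str:
    """Toy 'LLM': samples the next token, so repeated calls can differ."""
    candidates = ["leeks", "onions", "celery"]  # imagined next tokens
    if temperature == 0.0:
        return prompt + " " + candidates[0]  # greedy decoding is repeatable
    return prompt + " " + random.choice(candidates)

# The compiler's output never varies across runs:
assert {compile_source("int main;") for _ in range(100)} == {"INT MAIN;"}

# The sampler's output can vary run to run (with temperature > 0):
outputs = {llm_complete("potato soup needs") for _ in range(100)}
print(len(outputs))  # count of distinct completions (typically 3 here)
```

Real LLM serving adds further variance (batching, floating-point order) even at temperature 0, but sampling alone is enough to make the point.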

Scene_Cast2|24 days ago

Just to nitpick - compilers (and, to some extent, processors) weren't deterministic a few decades ago. Getting them to be deterministic has been a monumental effort - see build reproducibility.

anematode|24 days ago

I'm trying to track down a GCC miscompilation right now ;)

keyle|24 days ago

I feel for you :D

wtetzner|24 days ago

> The compiler metaphor is simply incorrect

If an LLM was analogous to a compiler, then we would be committing prompts to source control, not the output of the LLM (the "machine code").

jen729w|24 days ago

> Meanwhile AI can't be trusted to give me a recipe for potato soup.

Because there isn’t a canonical recipe for potato soup.

lebuin|24 days ago

There's also no canonical way to write software, so in that sense generating code is more similar to coming up with a potato soup recipe than compiling code.

Jensson|24 days ago

That is not the issue; any potato soup recipe would be fine. The issue is that it might fetch values from different recipes and give you an abomination.

keyle|24 days ago

You're correct, and I believe this is only a matter of time. Over time it has been getting better and will keep doing so.

blks|24 days ago

It won’t be deterministic.

wtetzner|24 days ago

The input to LLMs is natural language. Natural language is ambiguous. No amount of LLM improvements will change that.

bigstrat2003|24 days ago

Maybe. But it's been 3 years and it still isn't good enough to actually trust. That doesn't raise confidence that it will ever get there.

senko|24 days ago

> Compilers will produce working output given working input literally 100% of my time in my career. I've never personally found a compiler bug.

The first compilers were created in the fifties. I doubt those were bug-free.

Give LLMs fifty or so years, then let's see how (un)reliable they are.

wtetzner|24 days ago

What I don't understand about these arguments is that the input to the LLMs is natural language, which is inherently ambiguous. At which point, what does it even mean for an LLM to be reliable?

And if you start feeding an unambiguous, formal language to an LLM, couldn't you just write a compiler for that language instead of having the LLM interpret it?