
Claude’s C Compiler vs. GCC

356 points | unchar1 | 21 days ago | harshanu.space

360 comments



marcus_holmes|21 days ago

I think this is a great example of both points of view in the ongoing debate.

Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!

Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.

Pro: Sure, but we can get the agent to fix that.

Anti: Can you, though? We've seen that the more complex the code base, the worse the agents do. Fixing complex issues in a compiler seems like something the agents will struggle with. Also, if they could fix it, why haven't they?

Pro: Sure, maybe now, but the next generation will fix it.

Anti: Maybe. While the last few generations have been getting better and better, we're still not seeing them deal with this kind of complexity better.

Pro: Yeah, but look at it! This is amazing! A whole compiler in just a few hours! How many millions of hours were spent getting GCC to this state? It's not fair to compare them like this!

Anti: Anthropic said they made a working compiler that could compile the Linux kernel. GCC is what we normally compile the Linux kernel with. The comparison was invited. It turned out (for whatever reason) that CCC failed to compile the Linux kernel when GCC could. Once again, the hype of AI doesn't match the reality.

Pro: but it's only been a few years since we started using LLMs, and a year or so since agents. This is only the beginning!

Anti: this is all true, and yes, this is interesting. But there are so many other questions around this tech. Let's not rush into it and mess everything up.

Alupis|21 days ago

I'm reminded, once again, of the recent "vibe coded" OCaml fiasco[1].

The PR author had zero understanding why their entirely LLM-generated contribution was viewed so suspiciously.

The article validates a significant point: it is one thing to have passing tests and be able to produce output that resembles correctness; it's something entirely different for that output to be good and maintainable.

[1] https://github.com/ocaml/ocaml/pull/14369

bgirard|21 days ago

This to me sounds a lot like the SpaceX conversation:

- Ohh look it can [write small function / do a small rocket hop] but it can't [ write a compiler / get to orbit]!

- Ohh look it can [write a toy compiler / get to orbit] but it can't [compile linux / be reusable]

- Ohh look it can [compile linux / get reusable orbital rocket] but it can't [build a compiler that rivals GCC / turn the rockets around fast enough]

- <Denial despite the insane rate of progress>

There's no reason to keep building this compiler just to prove this point. But I bet it would catch up to GCC real fast, with a fraction of the resources, if it was guided by a few compiler engineers in the loop.

We're going to see a lot of disruption come from AI assisted development.

gignico|21 days ago

Exactly. This flawed argument that everything will be fixed by future models drives me crazy every time.

ares623|21 days ago

Pretty much. It's missing a tiny detail though. One side is demanding we keep giving them hundreds of billions while at the same time promising the other side's unemployment.

nikitau|21 days ago

And not to mention that a C compiler is something we have literally 50 years worth of code for. I still seriously doubt the ability of LLMs to tackle truly new problems.

rk06|21 days ago

As an Anti, my argument is "if AI will be good in the future, then come back in the future"

frizlab|21 days ago

I think you also forgot: Anti: But the whole thing could only have been generated because GCC and other compilers already exist (and, depending on how strong the anti feeling is: and have been stolen…)!

zozbot234|20 days ago

You didn't even mention that this vibe-coded toy compiler cost $20k in token spend. That's an insane amount of money for what this is.

Rapzid|21 days ago

Two completely valid perspectives.

Unless you need a correctly compiled Linux kernel. In that case, one of them gets exhausting real quick.

weli|20 days ago

> this is all true, and yes, this is interesting. But there are so many other questions around this tech. Let's not rush into it and mess everything up.

That's a really nice fictitious conversation, but in my experience "anti-AI" people would be prone to say "This is stupid. LLMs will never be able to write complex code and attempting to do so is futile". If your mind is open to exploring how LLMs will actually write complex software, then by definition you are not "anti".

aurareturn|21 days ago

I don't think this is how pro and anti conversation goes.

I think the pro would tell you that if GCC developers could leverage Opus 4.6, they'd be more productive.

The anti would tell you that it doesn't help with productivity, it makes us less versed in the code base.

I think the CCC project was just a demonstration of what Opus can do autonomously right now. 99.9% of software projects out there aren't building something as complex as a C compiler.

anonnon|21 days ago

The "Anti" stance is only tenable now if you believe LLMs are going to hit a major roadblock in the next few months around which Big AI won't be able to navigate. Something akin to the various "ghosts in the machine" that started bedeviling EEs after 2000 when transistors got sufficiently small, including gate leakage and sub-threshold current, such that Dennard Scaling came to an abrupt end and clock speeds stalled.

I personally hope that that happens, but I doubt it will. Note also that processors still continued to improve even without Dennard Scaling due to denser, better optimized onboard caches, better branch prediction, and more parallelism (including at the instruction level), and the broader trend towards SoCs and away from PCB-based systems, among other things. So at least by analogy, it's not impossible that even with that conjectured roadblock, Big AI could still find room for improvement, just at a much slower rate.

But current LLMs are thoroughly compelling, and even just continued incremental improvements will prove massively disruptive to society.

nvrmnd|21 days ago

> It's not fair to compare them like this!

As someone who leans pro in this debate, I don't think I would make that statement. I would say the results are exactly as we expect.

Also, a highly verifiable task like this is well suited to LLMs, and I expect within the next ~2 years AI tools will produce a better compiler than gcc.

abrbhat|21 days ago

It seems that the cause of the difference in opinion is that the anti camp is looking at the current state while the pro camp is looking at the slope and projecting it into the future.

palmotea|20 days ago

> Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!

> Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.

Also, from the Anti-LLM perspective: did the coding agent actually build a working compiler, or just plagiarize prior art? C compilers are certainly part of the LLM's training set.

That's relevant because the implication seems to be: "Look, the agent can successfully develop really advanced software!" when the reality may be that it can plagiarize existing advanced software, and will fall on its face if asked to do anything not already done before.

A lot of propaganda and hype follows this pattern of presenting things in a way that creates misleading implications in the mind of the listener that the facts don't actually support.

63stack|20 days ago

This is spot on; you can find traces of this conversation in the original thread posted on HN as well, where people are proclaiming "yeah it doesn't work, but still impressive!"

Reminds me so much of the people posting their problems with the Tesla Cybertruck and ending the post with "still love the truck though"

red75prime|21 days ago

> Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!

> Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.

Pro-LLM: Read the freaking article, it's not that long. The compiler made a mistake in an area where only two compilers are up to the task: compiling the Linux kernel.

viraptor|20 days ago

That's such a strawman conversation. Starting from:

> it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.

It works. It's not perfect, but Anthropic claims to have successfully compiled and booted 3 different configurations with it. The blog post failed to reproduce one specific version on one specific architecture. I wish Anthropic gave us more information about which kernel commits succeeded, but still. Compare this to the years it took for clang to compile the kernel, yet people were not calling that compiler useless.

If anyone thinks other compilers "just work", I invite them to start fixing packages that fail to build in nixos after every major compiler change, to get a dose of real world experience.

Wowfunhappy|20 days ago

> Pro: but it's only been a few years since we started using LLMs, and a year or so since agents. This is only the beginning!

The billion dollar question is, can we get from 80% to 100%? Is this going to be a situation where that final gap is just insurmountable, or will the capabilities simply keep increasing?

quaintdev|21 days ago

I read a Youtube comment recently on pro AI video, it was

"The source code of gcc is available online"

sylware|20 days ago

You are missing the most important one:

"Pro": give me tons of money to keep this endeavour going.

More seriously: if some LLM or other can _assist_ with a C++ to plain and simple C port...

bicepjai|20 days ago

This encompasses all the back-and-forth arguments I can think of. I would assume that proponents will eventually also mention AGI or ASI. :)

yoyohello13|21 days ago

I think LLMs, the technology, are very cool and I'm frankly amazed at what they can do. What I'm 'anti' about is pushing the entire economy all in on LLM tech. The accelerationist take of 'just keep going as fast as possible and it will work out, trust me bro' is the most unhinged, dangerous shit I've ever heard, and unfortunately it seems to be the default worldview of those in charge of the money. I'm not sure where all the AI tools will end up, but I am willing to bet big that the average person is not going to be better off 10 years from now. The direction the world is going scares the shit out of me, and the use of AI by bad actors is not helping assuage that fear.

Honestly? I think if we as a society could trust our leaders (government and industry) to not be total dirtbags the resistance to AI would be much lower.

Like imagine if the message was "hey, this will lead to unemployment, but we are going to make sure people can still feed their families during the transition, and maybe look into ways to subsidize retraining programs for people whose jobs have been impacted." Seems like a much more palatable narrative than "fuck you, pleb! go retrain as a plumber or die in a ditch. I'll be on my private island counting the money I made from destroying your livelihood."

agumonkey|20 days ago

This is a pattern I see a lot, in programming language communities too, where it's a source of joy and dreams first and facts later.

kaycey2022|20 days ago

Maybe Anthropic can sponsor a research team to polish this using just an agent. A lot of things can be learned from that exercise.

giancarlostoro|20 days ago

We could be colonizing Mars with Claude Code and there will always be some skeptic somewhere.

NitpickLawyer|21 days ago

Two thoughts here:

First, remember when we had LLMs run optimisation passes last year? AlphaEvolve doing square packing and optimising ML kernels? The "anti" crowd was like "well, of course it can automatically optimise some code, that's easy", and things like "wake me up when it does hard tasks". Now, suddenly, when they do hard tasks, we're back at "haha, but it's unoptimised and slow, laaame".

Second, if you took 100 juniors, 100 mid-level devs and 100 senior devs and locked them in a room for 2 weeks, how many working solutions would you get that could boot Linux on 2 different arches and almost boot on the third? And could you have the same devs now do it in Zig?

The thing that keeps coming up is that the "anti" crowd is fighting their own demons, and have kinda lost the plot along the way. Every "debate" is about promises, CEOs, billions, and so on. Meanwhile, at every step of the way these things become better and better. And incredibly useful in the right hands. I find it's best to just ignore the identity folks, and keep on being amazed at the progress. The haters will just find the next goalpost and the next fight with invisible entities. To paraphrase: those who can, do; those who can't, find things to nitpick.

resfirestar|21 days ago

What does this imagined conversation have to do with the linked article? The “pro” and “anti” characters both sound like the kind of insufferable idiots I’d expect to encounter on social media. The OP is a very nice blog post about performance testing and finding out what compilers do, and it doesn’t attempt any unwarranted speculation about what agents “struggle with” or will do “next generation”. How is it an example of that sort of shitposting?

abbyprog|21 days ago

I'm firmly in the anti/unimpressed camp so far - but of course open to see where it goes.

I mean, this compiler is the equivalent of handing someone a calculator when it was first invented and seeing that it took 2 hours to multiply two numbers together. I would go "cool that you have a machine that can do math, but I can multiply faster by hand, so it's a useless device to me".

nineteen999|20 days ago

I mean - who would honestly expect an LLM to be able to compete with a compiler with 40 years of development behind it? Even more if you count the collective man years expended in that time. The Claude agents took two weeks to produce a substandard compiler, under the fairly tight direction of a human who understood the problem space.

At the same time - you could direct Claude to review the register spilling code and the linker code of both LLVM/gcc for potential improvements to CCC and you will see improvements. You can ask it not to copy GPL code verbatim but to paraphrase and tell it it can rip code from LLVM as long as the licenses are preserved. It will do it.

You might only see marginal improvements without spending another $100K on API calls. This is about one of the hardest projects you could ask it to bite off and chew on. And would you trust the compiler output yet over GCC or LLVM?

Of course not.

But I wager, that if you _started_ with the LLVM/gcc codebases and asked it to look for improvements - it might be surprising to see what it finds.

Both sides have good arguments. But this could be a totally different ball game in 2, 5 and 10 years. I do feel like those who are most terrified by it are those whose identity is very much tied to being a programmer, and seeing the potential for their role to be replaced and I can understand that.

Me personally - I'm relieved I finally have someone else to blame and shout at rather than myself for the bugs in the software I produce. I'm relieved that I can focus now on the more creative direction and design of my personal projects (and even some work projects on the non-critical paths) and not get bogged down in my own perfectionism with respect to every little component until reaching exhaustion and giving up.

And I'm fascinated by the creativity of some of the projects I see that are taking the same mindset and approach.

I was depressed by it at first. But as I've experimented more and more, I've come to enjoy seeing things that I couldn't ever have achieved even with 100 man years of my own come to fruition.

raincole|21 days ago

Are you trying to demonstrate a textbook example of straw man argument?

soulofmischief|21 days ago

In my experience, it is often the other way around. Enthusiasts are tasked with trying to open minds that seem very closed on the subject. Most serious users of these tools recognize the shortcomings and also can make well-educated guesses on the short term future. It's the anti crowd who get hellbent on this ridiculously unfounded "robots are just parrots and can't ever replace real programmers" shtick.

DSMan195276|21 days ago

Something that bothers me here is that Anthropic claimed in their blog post that the Linux kernel could boot on x86 - is this not actually true then? They just made that part up?

It seemed pretty unambiguous to me from the blog post that they were saying the kernel could boot on all three arches, but clearly that's not true unless they did some serious hand-waving with kernel config options. Looking closer in the repo, they only show a claimed Linux boot for RISC-V, so...

[0]: https://www.anthropic.com/engineering/building-c-compiler - "build a bootable Linux 6.9 on x86, ARM, and RISC-V."

[1]: https://github.com/anthropics/claudes-c-compiler/blob/main/B... - only shows a test of RISC-V

bjackman|21 days ago

My guess is that CCC works if you disable static keys/DKMS/etc.

In the specific case of __jump_table I would even guess there was some work in getting the Clang build working.

rich_sasha|21 days ago

It's really cool to see how slow unoptimised C is. You get so used to seeing C easily beat any other language in performance that you assume it's really just intrinsic to the language. The benchmark shows a SQLite3 unoptimised build 12x slower for CCC, 20x for optimised build. That's enormous!

I'm not dissing CCC here, rather I'm impressed with how much speed is squeezed out by GCC out of what is assumed to be already an intrinsically fast language.

kingstnap|21 days ago

The speed of C is still largely intrinsic to the language.

The primitives are directly related to the actual silicon. A function call is actually going to turn into a call instruction (or get inlined). The order of bytes in your struct is how they exist in memory, etc. A pointer being dereferenced is a load/store.

The converse holds as well. Interpreted languages are slow because this association with the hardware isn't the case.

When you have a poopy compiler that does lots of register shuffling, you lose this association.

Specifically, the constant spilling in those specific functions that caused the 1000x slowdown makes the C code look a lot more like Python code (where every variable is several dereferences away).

MobiusHorizons|21 days ago

I mean, you can always make things slower. There are lots of non-optimizing or lightly optimizing compilers that are _MUCH_ faster than this. TCC is probably the most famous example, but hardly the only alternative C compiler with performance somewhere between -O1 and -O2 in GCC. By comparison, as I understand it, CCC has performance worse than -O0, which is honestly a bit surprising to me, since -O0 should not be a hard target to achieve. As I understand it, at -O0 C is basically just macro-expanded into assembly with a bit of order of operations thrown in. I don't believe it even does register allocation.

josephcsible|21 days ago

> the build failed at the linker stage

> The compiler did its job fine

> Where CCC Succeeds Correctness: Compiled every C file in the kernel (0 errors)

I don't think that follows. It's entirely possible that the compiler produced garbage assembly for a bunch of the kernel code that would make it totally not work even if it did link. (The SQLite code passing its self tests doesn't convince me otherwise, because the Linux kernel uses way more advanced/low-level/uncommon features than SQLite does.)

IshKebab|20 days ago

Yeah, I saw a post on LinkedIn (can't find it again, sorry) where they found that CCC compiles C by mostly just ignoring errors. `const` is a no-op. It doesn't care if you redefine variables with different types, use a string where an `int` is expected, etc.

Whenever I've done optimisation (e.g. genetic algorithms / simulated annealing) before, you always have to be super careful about your objective function, because the optimisation will always come up with some sneaky lazy way to satisfy it that you didn't think of. I guess this is similar: their objective was to compile valid C code and pass some tests. They totally forgot about not compiling invalid code.

measurablefunc|21 days ago

I agree. Lack of errors is not an indicator of correct compilation. Piping something to /dev/null won't provide any errors either & so there is nothing we can conclude from it. The fact that it compiles SQLite correctly does provide some evidence that their compiler at least implements enough of the C semantics involved in SQLite.

yosefk|21 days ago

"Ironically, among the four stages, the compiler (translation to assembly) is the most approachable one for an AI to build. It is mostly about pattern matching and rule application: take C constructs and map them to assembly patterns.

The assembler is harder than it looks. It needs to know the exact binary encoding of every instruction for the target architecture. x86-64 alone has thousands of instruction variants with complex encoding rules (REX prefixes, ModR/M bytes, SIB bytes, displacement sizes). Getting even one bit wrong means the CPU will do something completely unexpected.

The linker is arguably the hardest. It has to handle relocations, symbol resolution across multiple object files, different section types, position-independent code, thread-local storage, dynamic linking and format-specific details of ELF binaries. The Linux kernel linker script alone is hundreds of lines of layout directives that the linker must get exactly right."

I worked on compilers, assemblers and linkers and this is almost exactly backwards

stlee42|21 days ago

Exactly this. The linker is threading given blocks together with fixups for position-independent code; this could be called rule application. The assembler is pattern matching.

This explanation confused me too:

  Each individual iteration: around 4x slower (register spilling)
  Cache pressure: around 2-3x additional penalty (instructions do not fit in L1/L2 cache)
  Combined over a billion iterations: 158,000x total slowdown

If each iteration is X percent slower, then a billion iterations will also be X percent slower. I wonder what is actually going on.

vidarh|21 days ago

Claude one-shot a basic x86 assembler + linker for me. It's missing lots of instructions, yes, but that is a matter of filling in tables of data mechanically.

Supporting linker scripts is marginally harder, but having manually written compilers before, my experience is the exact opposite of yours.

lelanthran|21 days ago

I am inclined to agree with you... but, did CC produce a working linker as well as a working compiler?

I thought it was just the compiler that Anthropic produced.

tyre|21 days ago

As a neutral observation: it’s remarkable how quickly we as humans adjust expectations.

Imagine five years ago saying that you could have a general-purpose AI write a C compiler that can handle the Linux kernel, by itself, from scratch, for $20k, by writing a simple English prompt.

That would have been completely unbelievable! Absurd! No one would take it seriously.

And now look at where we are.

mawadev|21 days ago

Now consider how much of the original C compilers' source code it was trained on, and how it still managed to output a worse result.

jbjbjbjb|21 days ago

> a simple English prompt

And that’s where my suspicion stems from.

An expert-level programmer producing an equivalent original piece of work wouldn't be able to do this without all the context. By that I mean all the shared insights, discussion and design that happened when making the compiler.

So to do this without any of that context is likely just very elaborate copy pasta.

skydhash|20 days ago

> Imagine five years ago saying that you could have a general purpose AI write a c compiler that can handle the Linux kernel, by itself, from scratch for $20k by writing a simple English prompt.

You’re very conveniently ignoring the billions in training and that it has practically the whole internet as input.

orangecoffee|21 days ago

Indeed, it's the Overton window that has moved. Which is why I secretly think the pro-AI side is more right than the anti-AI side. Makes me sad.

suddenlybananas|20 days ago

Wasn't there a fair amount of human intervention in the AI agents? My understanding is, the author didn't just write "make me a c compiler in rust" but had to intervene at several points, even if he didn't touch the code directly.

AstroBen|21 days ago

You're right. It's been pretty incredible. It's also frustrating as hell though when people extrapolate from this progress

Just because we're here doesn't mean we're getting to AGI or software developers begging for jobs at Starbucks

IshKebab|20 days ago

I totally agree, but I think a lot of the push-back is that this is presented as better than it actually is.

o175|21 days ago

The 158,000x slowdown on SQLite is the number that matters here, not whether it can parse C correctly. Parsing is the solved problem — every CS undergrad writes a recursive descent parser. The interesting (and hard) parts of a compiler are register allocation, instruction selection, and optimization passes, and those are exactly where this falls apart.

That said, I think the framing of "CCC vs GCC" is wrong. GCC has had thousands of engineer-years poured into it. The actually impressive thing is that an LLM produced a compiler at all that handles enough of C to compile non-trivial programs. Even a terrible one. Five years ago that would've been unthinkable.

The goalpost everyone should be watching isn't "can it match GCC" — it's whether the next iteration closes that 158,000x gap to, say, 100x. If it does, that tells you something real about the trajectory.

teraflop|21 days ago

The part of the article about the 158,000x slowdown doesn't really make sense to me.

It says that a nested query does a large number of iterations through the SQLite bytecode evaluator. And it claims that each iteration is 4x slower, with an additional 2-3x penalty from "cache pressure". (There seems to be no explanation of where those numbers came from. Given that the blog post is largely AI-generated, I don't know whether I can trust them not to be hallucinated.)

But making each iteration 12x slower should only make the whole program 12x slower, not 158,000x slower.

Such a huge slowdown strongly suggests that CCC's generated code is doing something asymptotically slower than GCC's generated code, which in turn suggests a miscompilation.

I notice that the test script doesn't seem to perform any kind of correctness testing on the compiled code, other than not crashing. I would find this much more interesting if it tried to run SQLite's extensive test suite.

wrxd|21 days ago

This thing has likely all of GCC, clang and any other open source C compiler in its training set.

It could have spat out GCC source code verbatim and matched its performance.

yc-kraln|20 days ago

It's really difficult for me to understand the level of cynicism in the HN comments on this topic, at all. The amount of goalpost-moving and redefining is absolutely absurd. I really get the impression that the majority of the HN comments are just people whining about sour grapes, with very little value added to the discussion.

I'd like to see someone disagree with the following:

Building a C compiler, targeting three architectures, is hard. Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard. Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.

To the specific issues with the concrete project as presented: This was the equivalent of a "weekend project", and it's amazing

So what if some gcc is needed for the 16-bit stuff? So what if a human was required to steer claude a bit? So what if the optimizing pass practically doesn't exist?

Most companies are not software companies; software is a line item, an expense, an unavoidable cost. The amount of code (not software engineering, or architecture, but programming) developed tends towards glue of existing libraries to accomplish business goals, which, in comparison with a correct modern C compiler, is far less performance-critical, complex, broad, etc. No one is seriously saying that you have to use an LLM to build your high-performance math library, or that you have to use an LLM to build anything, much in the same way that no one is seriously saying that you have to rewrite the world in Rust, or TypeScript, or React, or whatever is bothering you at the moment.

I'm reminded of a classic Slashdot comment about attempting to solve a non-technical problem with technology, which is doomed to fail. It really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, not the technology.

DauntingPear7|18 days ago

Someone will have to know how to steer the LLM to fix/update/maintain the bespoke software they decided to use, so there's still a large cost there.

gjulianm|20 days ago

> This was the equivalent of a "weekend project", and it's amazing

I mean, $20k in tokens, plus the supervision by the author to keep things running, plus the number of people that got involved according to the article... doesn't look like "a weekend project".

> Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard.

Is it correctly compiling it? Several people have pointed out that the compiler will not emit errors for clearly invalid code. What code is it actually generating?

> Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.

It's even harder to have a C compiler that can correctly compile SQLite and pass the test suite but then the SQLite binary itself fails to execute certain queries (see https://github.com/anthropics/claudes-c-compiler/issues/74).

> which, in comparison with a correct modern C compiler, is far less performance critical, complex, broad, etc.

That code might be less complex for us, but more complex for an LLM if it has to deal with lots of domain-specific context and without a test suite that has been developed for 40 years.

Also, if the end result of the LLM has the same problem that Anthropic concedes here, which is that the project is so fragile that bug fixes or improvements are really hard/almost impossible, that still matters.

> it really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, but not the technology

It's a discussion about what the LLMs can actually do and how people represent those achievements. We're pointing out that LLMs, without human supervision, generate bad code: code that's hard to change, with modifications made specifically to address failing tests without challenging the underlying assumptions, code that's inconsistent and hard to understand even for the LLMs.

But some people are taking whatever the LLM outputs at face value, and then claiming some capabilities of the models that are not really there. They're still not viable for using without human supervision, and because the AI labs are focusing on synthetic benchmarks, they're creating models that are better at pushing through crappy code to achieve a goal.

jamesnorden|20 days ago

"sour grapes" means nothing in this context

jrflowers|21 days ago

It seems like if Anthropic released a super cool and useful _free_ utility (like a compiler, for example) that was better than existing counterparts or solved a problem that hadn’t been solved before[0] and just casually said “Here is this awesome thing that you should use every day. By the way our language model made this.” it would be incredible advertising for them.

But they instead made a blog post about how it would cost you twenty thousand dollars to recreate a piece of software that they do not, with a straight face, actually recommend that you use in any capacity beyond as a toy.

[0] I am categorically not talking about anything AI related or anything that is directly a part of their sales funnel. I am talking about a piece of software that just efficiently does something useful. GCC is an example, Everything by voidtools is an example, Wireshark is an example, etc. Claude is not an example.

vidarh|20 days ago

They made a blog post about it because it's an amazing test of the models' ability to deliver a working C compiler, even one with lots of bugs and serious caveats, for $20k of tokens, without a human babysitting it.

I'd challenge anyone who is negative about this to try to achieve what they did by hand, under the same constraints (e.g. generating full SSA form instead of directly emitting code; capable of compiling Linux), and log their time doing it.

Having written several compilers, I'll say with some confidence that not many developers would succeed. Far fewer would succeed fast enough to compete with the $20k cost. Even fewer would do that and deliver decent-quality code.

Now notice the part where they've done this experiment before. This is the first time it succeeded. Give it another model iteration or two, and expect quality to increase, and price to drop.

This is the new floor.

tjr|21 days ago

I wonder how much more it would take Anthropic to make CCC on par with, or even better than, GCC.

fc417fc802|20 days ago

> Combined over a billion iterations: 158,000x total slowdown

I don't think that's a valid explanation. If something takes 8x as long, then doing it a billion times still takes 8x as long. Instead of 1 vs 8 it's just 1 billion vs 8 billion.

I'd be curious to know what's actually going on here to cause a multiple-order-of-magnitude degradation compared to the simpler test cases (i.e. ~10x becomes ~150,000x). Rather than I-cache misses, I wonder if register spilling in the nested loop managed to completely overwhelm L3, causing it to stall on every iteration waiting for RAM. But even that theory could only account for approximately one order of magnitude, leaving an additional three (!!!) orders of magnitude unaccounted for.
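The iteration-count point can be made concrete with a few lines of arithmetic (the 8x and one-billion figures are illustrative, taken from the comment above; this is a sketch, not a claim about CCC's actual numbers):

```python
# A fixed per-iteration slowdown is invariant under iteration count:
# scaling both sides by n leaves the ratio unchanged.
per_iter_fast = 1.0            # arbitrary time units
per_iter_slow = 8.0            # hypothetical 8x slower per iteration
n = 1_000_000_000              # one billion iterations

ratio_once = per_iter_slow / per_iter_fast
ratio_total = (per_iter_slow * n) / (per_iter_fast * n)

print(ratio_once, ratio_total)  # both 8.0 -- the n cancels out
# So a constant per-iteration penalty cannot "combine" into ~158,000x;
# the observed blow-up has to come from something super-linear
# (cache/TLB pressure, spills hitting RAM, or algorithmic differences).
```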

I think there's a lot more to the story here.

piinbinary|20 days ago

That stuck out to me as well.

I wonder if there could be a bug where extra code runs but the result is discarded (and the code that runs happens to have no side effects).

The post also says

> That is roughly 1 billion iterations

but that doesn't sound right because GCC's version runs in only 0.047s, and no CPU can do a billion iterations that quickly.
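That skepticism checks out with back-of-the-envelope arithmetic (the 0.047s is from the post; the ~5 GHz clock is an assumed ballpark for a modern core):

```python
iterations = 1_000_000_000
runtime_s = 0.047
required_rate = iterations / runtime_s     # iterations per second needed
clock_hz = 5_000_000_000                   # assumed ~5 GHz core
iters_per_cycle = required_rate / clock_hz

print(f"{required_rate:.2e} iterations/s, {iters_per_cycle:.1f} per cycle")
# ~2.1e10 iterations/s, i.e. several loop iterations retired per clock
# cycle -- only plausible if the loop was vectorized or optimized away.
```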

tngranados|20 days ago

Building a C compiler is definitely hard for humans, but I don’t think it’s particularly strong evidence of "intelligence" from an LLM. It’s a very well understood, heavily documented problem with lots of existing implementations and explanations in the training data.

These kinds of tasks are relatively easy for LLMs: they're operating in a solved design space and recombining known patterns. It looks impressive to us because writing a compiler from scratch is difficult and time-consuming for a human, not because of the problem itself.

That doesn't mean LLMs aren't useful; even if progress plateaued tomorrow, they'd still be very valuable tools. But building yet another C compiler or browser isn't that compelling as a benchmark. The industry keeps making claims about reasoning and general intelligence, but I'd expect to see systems producing genuinely new approaches or clearly better solutions, not just derivations of existing OSS.

Instead of copying a big project, I'd be more impressed if they could innovate in a small one.

stevefan1999|21 days ago

I think one of the issues is that the register allocation algorithm -- alongside the SSA generation -- is not enough.

Generally, after the SSA pass you convert everything into a register transfer language (RTL) and then run the register allocation pass. GCC's case is even more extreme: it has GIMPLE in the middle, which does more aggressive optimization, similar to rustc's MIR. CCC doesn't have all that. For register allocation you can do a simple linear scan, just as a typical JIT compiler would (and, from my understanding, something CCC could do at little cost). But most of the "hard part" of a compiler today is actually optimization; the frontend is mostly a solved problem if you accept some hacks (though I'm still looking for an elegant academic solution to the typedef problem).

adgjlsfhk1|21 days ago

Note that the LLVM approach to IR is probably a bit saner than GCC's. GCC has ~3 completely different IRs at different stages of the pipeline, while LLVM mostly has one canonical IR form for passing data between the optimization passes (individual passes sometimes build their own temporary IR locally to make a specific analysis easier).

hackyhacky|21 days ago

What is the typedef problem?

thefourthchime|21 days ago

"The miracle is not that the bear can dance well, it's that the bear can dance at all."

- Old Russian proverb.

butipaidfor|21 days ago

But the poster, the ticket seller, and the ringmaster all said "Anna Pavlova reincarnated, a Bear that can dance as well as famous Ice Skaters!"

antirez|21 days ago

A few things to note:

1. In the real world, for a similar task, there is little reason not to give the model access to all the papers about optimizations, ISA PDFs, and MIT-licensed compilers of every kind. It would perform much better, and this is proof that the "uncompressing GCC" criticism is just a claim (but see especially point 2).

2. Of all the tasks, the assembler is the part where memorization would help the most. Yet the LLM can't perform without the ISA documentation, which it saw repeated countless times during pre-training. Guess what?

3. Rust is a bad language for this test as a first target. If you wanted an LLM-coded C compiler in Rust, and you had LLM experience, you would go C compiler first, then Rust port. Rust is hard when there are mutable data structures with tons of references around, and a C compiler is exactly that. Composing complexity from different layers is an LLM anti-pattern that anyone who has worked a lot with automatic programming knows very well.

4. In the real world, you don't do a task like this without steering, and steering would do wonders. That's not to say the experiment was ill-conceived; the experimenter was trying to make a different point than the one the Internet took away (as usual).

vidarh|20 days ago

> the experimenter was trying to make a different point than the one the Internet took away (as usual)

All of your points are important, but I think this is the most important one.

Having written compilers, $20k in tokens for the foundation of a new compiler with this feature set is a bargain. Now, the $20k excludes the time to set up the harness, so the total cost would be significantly higher, but still.

The big point here is that the researchers demonstrated that a complex task like this could be achieved shockingly cheaply, even when the agents were intentionally forced to work under unrealistically harsh conditions, with instructions to include features (e.g. SSA form) that significantly complicated the task but pushed the result closer to the foundation of a "proper" compiler rather than a toy, even if the outcome isn't a finished, production-ready, multi-arch C compiler.

mdavid626|21 days ago

Can someone explain to me what's the big deal about this? The AI model was trained on lots of code and spat out something similar to GCC. Why is this revolutionary?

measurablefunc|21 days ago

It's a marketing gimmick. Cursor did the same recently when they claimed to have created a working browser, but it was basically just a bunch of open source software glued together into something barely functional for a PR stunt.

Ar-Curunir|21 days ago

If someone told you 5 years ago that a computer generated a working C compiler, would you think it was a big deal or not?

zidoo|21 days ago

CCC was and is a marketing stunt for a new model launch. Impressive, but it still suffers from the same 80:20 rule. That 20% is the optimizations, and we all know where the devil is in "let me write my own language".

emporas|21 days ago

Vibe coding is entertainment. Nothing wrong with entertainment, but when totally clueless people connect vibe-coded programs to their bank accounts, or control their devices with them, someone will be entertained for sure.

Large language models and small language models are very strong for solving problems, when the problem is narrow enough.

They are above the human average at solving almost any narrow problem, independent of time; and when time is a factor, say less than a minute, they are better than experts.

An OS kernel is exactly the kind of problem everyone prefers to be solved as correctly as possible, even if arriving at the solution takes longer.

The author mentions the stability and correctness of CCC, but these are properties of Rust, not of vibe coding. Still an impressive feat by Claude Code, though.

Ironically, if they had first populated the repo with objects, functions, and methods with just todo! bodies, made sure the architecture compiles and is sane, and only then let the agent fill in the bodies with implementations, most features would work correctly.

I am writing a program to do exactly that for Rust, but even then, how would the user/programmer know beforehand how many architectural details to specify using todo! to be sure that the problem the agent tries to solve is narrow enough? That's impossible to know! If the problem is not narrow enough, the implementation is going to be a mess.
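For what it's worth, the skeleton-first idea sketches easily. In Rust the stub bodies would be `todo!()`; a rough Python analog (all names here invented for illustration) is stubs that raise `NotImplementedError`, so the pipeline's shape is fixed and checkable before any agent fills in a body:

```python
class Lexer:
    def tokens(self, source: str) -> list:
        raise NotImplementedError("todo: tokenize")      # agent fills this in

class Parser:
    def parse(self, tokens: list):
        raise NotImplementedError("todo: build an AST")

class Codegen:
    def emit(self, ast) -> str:
        raise NotImplementedError("todo: emit assembly")

def compile_c(source: str) -> str:
    # The architecture -- what calls what, with which types -- is pinned
    # down up front; each stub is a narrow, independently fillable problem.
    return Codegen().emit(Parser().parse(Lexer().tokens(source)))
```

Until a stage is implemented, calling `compile_c` fails loudly at that stage instead of silently producing garbage, which is exactly the "narrow problem" property the comment is after.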

cleandreams|21 days ago

The prospect of going the last mile to fix the remaining problems reminds me of the old joke:

"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."

travisgriggs|21 days ago

I’ve always heard/repeated it as: “The first 90% is easy, it’s the second 90% that gets you. No one’s willing to talk about the third 90%.”

IhateAI|21 days ago

Yeah, this is why I don't get the argument that LLMs are good for bootstrapping, especially for anything serious.

Sure, these things can technically front-load a lot of work at the beginning of a project, but I would argue the design choices made at the beginning set the tone for the entire project, and it's best those be made with intention, not by stochastic text extruders.

Let's be real: these things are shortcut machines that appeal to people's laziness, and as with most shortcuts in life, they come with consequences.

Have fun with your "think for me" SaaS; I'm not going to let my brain atrophy to the point where my competency is 1:1 correlated with the quantity and quality of tokens I have access to.

torginus|20 days ago

My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.

However, looking at the assembly, it's clear to me the optimization passes do not work, and I suspect it contains large amounts of "dead code" where the AI decided to bypass non-functioning modules.

If a human expert were to write a compiler, not necessarily designed to match GCC but to provide a really good balance of features to complexity, they'd be able to make something much simpler. There are some projects like this (QBE, MIR), which come with nice technical descriptions.

Likewise, there was a post about a browser made by a single dude plus AI, which was about 20k lines and worked about as well as Cursor claimed theirs did. It had maybe 10% of the features, but everything there worked reasonably well.

So while I don't want to make predictions, it seems for now that the human-in-the-loop method of coding works much better (and cheaper!) than getting AI to generate a million lines of code on its own.

vidarh|20 days ago

> My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.

Per the article from the person who directed this, the user directed the AI to use SSA form.

> However, looking at the assembly, it's clear to me the optimization passes do not work, and I suspect it contains large amounts of "dead code" where the AI decided to bypass non-functioning modules.

That is quite possibly true, but it presumably at least in part reflects the fact that the compiler has been measured on completeness, not performance, so that is where the effort went. That doesn't mean it'd necessarily be successful at adding optimisation passes; we don't really know. I've done some experiments with this (a Ruby ahead-of-time compiler), and while Claude can do reasonably well with assembler now, it's by no means where it's strongest (it is, however, far better at operating gdb than I am...), but it can certainly do some of it.

> So while I don't want to make predictions, it seems for now that the human-in-the-loop method of coding works much better (and cheaper!) than getting AI to generate a million lines of code on its own.

Yes, it absolutely is, but the point in both cases was to test the limits of what AI can do on their own, and you won't learn anything about that if you let a human intervene.

$20k in tokens to get a surprisingly working compiler from agents working on their own is at a point where it is hard to assess how much money and time you'd save once you factor in the cleanup job you'd probably want to do before "taking delivery". But had you offered me $20k to write a working C compiler with multiple backends, capable of compiling Linux, I'd have laughed at the funny joke.

But more importantly, even if you were prepared to pay me enough, delivering it as fast by hand would be a different matter. Now, if you factor in the time used to set up the harness, the calculation might be different.

But now that we know models can do this, efforts to make the harnesses easier to set up (for my personal projects, I'm experimenting with agents that automatically figure out suitable harnesses), and to add cleanup passes that review, simplify, and document, could well make projects like this far more viable very quickly (at the cost of more tokens, certainly, but even if you double the budget, this would be a bargain for many tasks).

I don't think we're anywhere near taking humans out of the loop for many things, but I do see us gradually moving up the abstraction levels, and caring less about the code at least at early stages and more about the harnesses, including acceptance tests and other quality gates.

rayiner|20 days ago

I don't understand how this isn't a bigger deal. Why are people quibbling about how it isn't a particularly good C compiler? It seems earth-shattering that an AI can write a C compiler in the first place.

Am I just old? "How did they fit those people into the television?!"

enum|20 days ago

Nice article. I believe the Claude C Compiler is an extraordinary research result.

The article is clear about its limitations. The code README opens by saying “don’t use this” which no research paper I know is honest enough to say.

As for hype, it’s less hyped than most university press releases. Of course since it’s Anthropic, it gets more attention than university press.

I think the people most excited are getting ahead of themselves. People who aren’t impressed should remember that there is no C compiler written in Rust for it to have memorized. But, this is going to open up a bunch of new and weird research directions like this blog post is beginning to do.

geraneum|20 days ago

This compiler experiment mirrors the recent work of Terence Tao and Google. The "recipe" is an LLM paired with an external evaluator (GCC) in a feedback loop.

By evaluating the objective (successful compilation) in a loop, the LLM effectively narrows the problem space. This is why the code compiles even when the broader logic remains unfinished/incorrect.

It’s a good example of how LLMs navigate complex, non-linear spaces by extracting optimal patterns from their training data. It’s amazing.

p.s. if you translate all this to marketing jargon, it’ll become “our LLM wrote a compiler by itself with a clean room setup”.
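Schematically, the loop described above looks like this (both the "LLM" and the evaluator are stubbed out; the stub patches only the first failing check each round, which is exactly the narrowing behaviour the comment describes):

```python
def evaluate(candidate):
    """Stand-in for the external evaluator (e.g. run GCC plus a test
    suite). Returns the indices of failing checks."""
    return [i for i, ok in enumerate(candidate) if not ok]

def propose_fix(candidate, failures):
    """Stand-in for the LLM: patch just the first failing check,
    without revisiting anything that already passes."""
    patched = list(candidate)
    patched[failures[0]] = True
    return patched

def feedback_loop(candidate):
    steps = 0
    while failures := evaluate(candidate):
        candidate = propose_fix(candidate, failures)
        steps += 1
    return candidate, steps

print(feedback_loop([False, True, False, False]))
# → ([True, True, True, True], 3)
```

The objective ("no failing checks") gets satisfied check by check, which is how code can end up compiling and passing a suite while the broader logic stays unexamined.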

Edit: typo

enum|20 days ago

This is a conjecture: modern chips are optimized to make the output code style of GCC/Clang go fast. So, the compilers optimize for the chip, and the chip optimizes for the popular compilers.

worldsavior|21 days ago

Seeing that Claude can code a compiler doesn't help anyone if the result isn't efficient, because getting it to be efficient is the hardest part, and it will be interesting to see how long it takes to get there. No one is going to use a compiler that makes binaries run 700x slower.

I'm surprised that this wasn't possible before with just a bigger context size.

kachapopopow|21 days ago

They should have gone one step further and also optimized for query performance (without editing the source code).

I have, *cough*, AI-generated an x86-to-x86 compiler (it takes x86 in, replaces arbitrary instructions with function calls, and spits x86 out). At first it was horrible, but after letting it work for two more days it was actually down to only a 50-60% slowdown even with every memory-read instruction replaced.

Now that's when people should get scared. But it's also reasonable to assume that CCC will look closer to GCC at that point, maybe influenced by other compilers as well. Tell it to write an ARM compiler and it will never succeed (probably; maybe it could use an intermediary, shove it into LLVM, and it'd work, but at that point it is no longer a "C" compiler).

chadcmulligan|21 days ago

> Someone got it working on Compiler Explorer and remarked that the assembly output “reminds me of the quality of an undergraduate’s compiler assignment”. Which, to be fair, is both harsh and not entirely wrong when you look at the register spilling patterns.

This is what I've noticed about most LLM-generated code: it's about the quality of an undergrad's, and I think there's a good reason for this. Most of the code it's been trained on is of undergrad quality: Stack Overflow questions, a lot of undergrad open source projects. There are some professional-quality open source projects (e.g. SQLite), but they're outweighed by the mass of other code. Also, things like SQLite don't compare to proprietary systems like Oracle or SQL Server.

bsaul|21 days ago

One missing analysis, which IMHO is the most important one right now, is: what is the quality of the generated code?

Having an LLM generate a first complete iteration of a C compiler in Rust is super useful if the code is of good enough quality that it can be maintained and improved by humans (or other AIs). It is (almost) completely useless otherwise.

And that is the case for most of today's code generated by AIs. Most of it will still have to be maintained by humans, or at least a human will ultimately be responsible for it.

What I would like to see is whether that C compiler is a horrible mess of tangled spaghetti code with horrible naming, or something with a clear structure, good naming, and sensible comments.

1718627440|20 days ago

> with a clear structure, good naming, and sensible comments.

Additionally, there is the problem that LLM comments often describe what the code is supposed to do, not what it actually does. People write comments to point out what was weird during implementation and what they found out while testing it; LLM comments seem instead to reflect the information present before the implementation was written, i.e. they are used as an internal checklist of what to generate.

In my opinion, misleading comments are worse than no comments at all.

hulitu|18 days ago

> what is the quality of the generated code ?

It seems to run.

Testing will be implemented in another release.

Looking at Readme.md it downloads a particular kernel version with a particular busybox version and runs them in qemu.

A parody.

tcper|20 days ago

I'm curious: maybe the AI learned too much code from human-written compilers. What if we invented a fresh new language and let the AI write the compiler? If that compiler worked well, I think that would be true intelligence.

hulitu|18 days ago

> maybe the AI learned too much code from human-written compilers

This was the aim. The reality is far away from it.

adornKey|20 days ago

I think AI will definitely help to get new compilers going. Maybe not the full product yet, but it helps a lot to create all the working parts you need. Taking lengthy specs and translating them into code is something AI does quite well; I asked it to give me a disassembler, and it did well. So if you want to make a new compiler, you now don't have to read all the specs and details beforehand. Just let the AI mess with, e.g., PE headers, and only take care of that area later if something doesn't work.

jruz|20 days ago

Great article, but you have to keep in mind that it was pure marketing. The really interesting question is to give the same benchmark to Claude Code and ask it to optimize in a loop, and see how long it takes to come up with something decent.

That's the whole promise of reaching AGI: that it will be able to improve itself.

I think Anthropic ruined this by releasing it too early. It would have been way more fun to see a live website where you can watch it iterating and the progress it is making.

thewhitetulip|21 days ago

Correct me if I am wrong, but Claude has probably been trained on GCC, so why oh why doesn't it one-shot a faster and better compiler?

shevy-java|21 days ago

> CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI.

It would be interesting to compare the source code used by CCC to other projects. I have a slight suspicion that CCC stole a lot of code from other projects.

saati|21 days ago

It's less impressive when you realize CCC happily compiles invalid C without emitting any errors.

Chamix|21 days ago

You know, it sure does add some additional perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that a CCC-compiled SQLite could potentially run up to 158,000 times slower than a GCC-compiled one...

Nevertheless, the victories continue to be closer to home.

Fokamul|20 days ago

The time will come (and it's not far off) when LLM agents will be able to reverse-engineer a program and re-implement it just by being pointed at the program's directory.

We'll see how fun that will be for these big corporations.

For example: "Hey, Claude, re-implement Adobe Photoshop in Rust."

matt3210|21 days ago

GCC and Clang are part of the training set; the fact that it did as badly as it did is what's shocking.

marmakoide|20 days ago

There are lots of C compilers (LCC, TCC, SDCC, an army of hobby-project C compilers) available as open source.

I am curious what the results would be for something like generating a lexer + parser + abstract-machine code generator for a made-up language.

chvid|21 days ago

What does the smallest (simplest in terms of complexity / lines of code) C-compiler that can compile and run SQLite look like?

Perhaps that would be a more telling benchmark to evaluate the Claude compiler against.

phplovesong|21 days ago

This is a good example of ALL AI slop. You get something barely working, and are then faced with the next problems:

- Deal with legacy code from day one.

- Have a mess of a codebase that is most likely 10-20x the LOC of comparable human-written code.

- Have your program be really slow and filled with bugs and edge cases.

This is the battlefield for programmers: either you just build the damn thing yourself, or you fix bugs for the next decade.

benob|21 days ago

Give me self hosting: LLM generates compiler which compiles LLM training and inference suite, which then generates compiler which...

iv11|21 days ago

I wonder how well an LLM would do for a new CPU architecture for which no C compiler exists yet, just assembler.

lelanthran|20 days ago

> I wonder how well an LLM would do for a new CPU architecture for which no C compiler exists yet, just assembler.

Quite well, possibly.

Look, I wasn't even aware of this until it popped up a few days ago on HN, and I am not privy to the details of Anthropic's engineers in general, or of the specific engineer who curated this marathon multi-agent dev cycle, but I can tell you how anyone familiar with compilers or programming-language development would proceed:

1. Vibe an IL (intermediate language) specification into existence (even if it is only held in RAM as structures/objects)

2. Vibe some utility functions for the IL (dump, search, etc)

3. Vibe a set of backends, that take IL as input and emit ISA (Instruction Set Architecture), with a set of tests for each target ISA

4. Vibe a front-end that takes C language input and outputs the IL, with a set of tests for each language construct.

(Everything from #2 onwards can be done in parallel)
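The steps above sketch out to something like this (a toy in-memory IL with a dump utility and one backend; the IL shape and the two-operand "ISA" are invented for illustration, not CCC's actual design):

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """Step 1: one IL instruction, dest = op(a, b), held in memory."""
    op: str
    dest: str
    a: str
    b: str

def dump(il):
    """Step 2: a utility over the IL, for debugging and tests."""
    return "\n".join(f"{i.dest} = {i.op} {i.a}, {i.b}" for i in il)

def backend_toy(il):
    """Step 3: one backend lowering IL to a hypothetical 2-operand ISA."""
    mnemonics = {"add": "ADD", "mul": "MUL"}
    out = []
    for i in il:
        out.append(f"MOV {i.dest}, {i.a}")                # dest <- a
        out.append(f"{mnemonics[i.op]} {i.dest}, {i.b}")  # dest <- dest op b
    return "\n".join(out)

# Step 4's frontend would parse C and emit this same IL; built by hand here.
il = [Instr("add", "t0", "x", "y"), Instr("mul", "t1", "t0", "z")]
print(dump(il))
print(backend_toy(il))
```

With this split, a new ISA means only a new `backend_*` plus its tests, and a new frontend means only a new source-to-IL pass, which is the expandability argument made below.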

I have no reason to believe that the engineer who vibe-coded CCC is anything other than competent and skillful, so let's assume he did at least the above (TBH, he probably did more)[1].

This means that CCC has, in its code, everything needed to vibe a never-before-seen ISA, given the ISA spec. It also means it has everything needed to support a new front-end language as long as it is similar enough to C (i.e. language constructs can map to the IL constructs).

So, this should be pretty easy to expand on, because I find it unlikely that the engineer who supervised/curated the process would be anything less than an expert.

The only flaw in my argument is that I am assuming the effort from Claude Code was so large because it took the src -> IL -> ISA route. If my assumption is wrong, it might be well-nigh impossible to add support for a new ISA.

------------------------------

[1] When I agreed with a previous poster on a previous thread that I could recreate the functionality of CCC for $20k, these are the steps I would have followed, except I would not have LLM-generated anything.

skybrian|21 days ago

It might be interesting to feed this report in and see what the coding agent swarm can improve on.

matt3210|21 days ago

Does it work better for the intended purpose than their browser experiments? No… no it doesn’t

bambax|21 days ago

I had no idea that SQLite performance was in fact compiler-dependent. The more you know!

xigoi|20 days ago

The performance of any software is compiler-dependent.

ares623|21 days ago

Did Anthropic release the scaffolding, harnesses, prompts, etc. they used to build their compiler? That would be an even cooler flex to be able to go and say "Here, if you still doubt, run this and build your own! And show us what else you can build using these techniques."

bw86|21 days ago

That would still require someone else to burn $20,000 to try it themselves.

benob|21 days ago

Why don't LLMs directly generate machine code?

jpalomaki|20 days ago

Now that we have seen this can be done, the next question is how much effort it takes to improve it 1%. And then the next 1%. Can we make consistent improvements without spending more and more compute on each step?

beklein|21 days ago

Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?

I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology, but discussions should not be held only at the extremes. Instead, I am looking for a realistic estimate from the HN community about where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.

lelanthran|21 days ago

> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?

I offered to do it, but without a deadline (I work f/time for money), only a cost estimation based on how many hours I think it should take me: https://news.ycombinator.com/item?id=46909310

The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).

I offered to do it for $40k, but there were no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he had offered.

pertymcpert|20 days ago

The level of discourse I've seen on HN about this topic is really disappointing: people not reading the actual article in detail, jumping to conclusions ("it basically copied GCC", etc.), taking things out of context, or worse, completely misrepresenting what the author of the article was trying to communicate.

We act so superior to LLMs but I'm very unimpressed with humanity at this stage.

saintfts|19 days ago

It is a very controversial topic, IMO. I get that the Claude devs want to show that their LLM is capable of a task as tedious as building a compiler... but pro-LLM people don't really get the idea of an LLM.

Disclaimer: I have near-zero competence in compilers and compiler-building, but I just want to summarize, in my opinion, what's going on.

It's the same as if I were given millions of repos of already-built compilers and had the ability only to weld these parts together. Yeah, it will TECHNICALLY work, but what's the point of building on top of the garbage afterwards?

You'll definitely want to refactor it, and it will not be a pleasant experience to begin with. You have to have a certain amount of dedication and knowledge to contribute to this compiler, which you don't have if you're a plain vibe-coder. The most difficult parts of C compilers (and basically any compiler whatsoever) are optimizations and portability. Will you get those things in a fully Claude-generated repo? Who knows! Maybe you'll cause irreversible damage to the end user's system; no one knows! There are so many snippets of code in the world, and you can't just filter out the malicious and stupid ones.

The thing is, LLMs are stupid. I partially agree with Richard Stallman's take on the current state of AI: these are not intelligence, but more like bullshit generators if improperly used. Come to think of it, humans are partially LLMs themselves, but we have much more than that. An LLM can only be used as a tool to help developers. My bet: LLMs will never be able to supply 100% prod-ready code by themselves. They are just not capable of that; it's in their nature to mimic, not to think.

LLMs for education and fast information fetching are a blessing; they're the best thing that's happened since the invention of search engines. But never in my life will I blindly copy-paste some shell script or code when I don't know whether it's harmful, or when the snippet lacks a hyperlink to its original source.

Vibe-coders, imo, are the guys who have been copy-pasting stuff from the internet since... well, the 2000s. They've just evolved into guys who blindly copy-paste the averaged output of their requests to a more convenient search engine. Not that it's a bad evolutionary step; it's pretty much the same thing, just maybe less harmful to the copy-pasters themselves.

THE BAD THING about CCC's creation is that some non-technical people will take this repo and say "LOOK, A COMPILER BUILT BY AN AI. AI!!! IT'S LIKE... A REALLY TEDIOUS TASK TO BUILD A COMPILER, YKNOW. AND IT WAS BUILT (welded together from other people's repos) BY AI WITH NO HUMAN INTERVENTION. AND IT WORKS!!!!" No, it kind of doesn't. It even lacks "--help", lol. With every update, every pull request, there is no guarantee it won't become such an unstable codebase that any future extension will either fail or misbehave. AI is only an option when directed by someone who knows their stuff. They'll look at the code and say "well, that part is crappy, we need to refactor it", or "hey, that snippet is pretty good, I didn't know you could do it that simply".

LLMs are just a big dictionary you can use either to expand your knowledge about things you're interested in or to blindly look up stuff you urgently need just once. If you want to ask somebody Polish whether you can borrow their phone, you can certainly grab a Polish dictionary, turn to the phrasebook section, and read aloud: "Czy mogę skorzystać z twojego telefonu?". Will it help you learn the language? Technically yes; realistically, absolutely not. These snippets are only useful if you know how to use them right, how to form something meaningful out of them.

Pro-LLM people are dumb. But so are the Anti-LLM people. And what I mean by that is not "WE NEED AI EVERYWHERE!", but that we should acknowledge AI as a tool, not a worker.

As a post-scriptum I want to add one thing: the Pro-LLM mindset is a lot worse than the Anti-LLM one. AI guys, don't you see that the Bubble has already grown and keeps getting bigger as we go? AI integration, as of today, is a really dumb and frightening process. When you debate Pro-LLM folks, please don't act all high and mighty; you're not really in a position to forbid anyone from using anything, especially CEOs. ESPECIALLY CEOs. With that attitude you're only helping build a walled echo chamber for vibe coders. Monkey (CEO) sees AI is capable of building something; monkey fires an entire department to save money on the development team. Is the end result worse? Yes. But does it bother Mister Monkey? No; for him it's just another win for the company's profit. He won't hear your point of view unless you prove him wrong, and you can't do that while acting like he doesn't know shit about business. It's literally the same thing that has happened to tons of job positions throughout human history, with one small change: now it's tech, and every businessman thinks they know tech because they use technical devices (their smartphone or PC, say). BUSINESS DEMANDS PROFIT GROWTH; it always has. You're going to have to stand up for integrating AI wisely instead of pushing it everywhere, and it really matters that you know how to do that.

If you can boost yourself with a bit of AI, why not? The performance boost will bend the learning curve in your favor; you can only win from that. And when the bubble pops, demand for real workers who know their stuff and know how to boost themselves with the right tools will skyrocket. That's my bet.

afro88|21 days ago

But gcc is part of its training data, so of course it spit out an autocomplete of a working compiler

/s

This is actually a nice case study in how agentic LLMs do, in some sense, think. It's by no means the same code or compiler. It had to figure out lots and lots of problems along the way to get its tests passing.

the_fall|21 days ago

> But gcc is part of its training data, so of course it spit out an autocomplete of a working compiler /s

Why the sarcasm tag? It was almost certainly trained on several compiler codebases, plus probably dozens of small "toy" C compilers created as hobby or school projects.

It's an interesting benchmark not because the LLM did something novel, but because it evidently stayed focused and maintained consistency long enough for a project of this complexity.

deyiao|21 days ago

Since Claude Code can browse the web, is it fair to think of it as “rewriting and simplifying a compiler originally written in C++ into Rust”?

MrPowerGamerBR|21 days ago

In the original post Anthropic did point out that Claude Code did not have access to the internet