
ndesaulniers | 24 days ago

I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel. https://clangbuiltlinux.github.io/

This LLM did it in (checks notes):

> Over nearly 2,000 Claude Code sessions and $20,000 in API costs

It may build, but does it boot (was also a significant and distinct next milestone)? (Also, will it blend?). Looks like yes!

> The 100,000-line compiler can build a bootable Linux 6.9 on x86, ARM, and RISC-V.

The next milestone is:

Is the generated code correct? The jury is still out on that one for production compilers. And then you have performance of generated code.

> The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

Still a really cool project!

shakna|24 days ago

> Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase

Does it really boot...?

ndesaulniers|24 days ago

> Does it really boot...?

They don't need 16b x86 support for the RISCV or ARM ports, so yes, but depends on what 'it' we're talking about here.

Also, FWIW, GCC doesn't directly assemble to machine code either; it shells out to GAS (GNU Assembler). This blog post calls it "GCC assembler and linker" but to be more precise the author should edit this to "GNU binutils assembler and linker." Even then GNU binutils contains two linkers (BFD and GOLD), or did they excise GOLD already (IIRC, there was some discussion a few years ago about it)?

TheCondor|24 days ago

The assembler seems like nearly the easiest part. Slurp arch manuals and knock it out, it’s fixed and complete.

brundolf|24 days ago

One thing people have pointed out is that well-specified (even if huge and tedious) projects are an ideal fit for AI, because the loop can be fully closed and it can test and verify the artifact by itself with certainty. Someone was saying they had it generate a rudimentary JS engine because the available test suite is so comprehensive

Not to invalidate this! But it's toward the "well-suited for AI" end of the spectrum

HarHarVeryFunny|23 days ago

Yes - the gcc "torture test suite" that is mentioned must have been one of the enablers for this.

It's notable that the article says Claude was unable to build a working assembler (& linker), which is nominally a much simpler task than building a compiler. I wonder if this was at least in part due to not having a test suite, although it seems one could be auto generated during bootstrapping with gas (GNU assembler) by creating gas-generated (asm, ELF) pairs as the necessary test suite.

It does beg the question of how they got the compiler to point of correctness of generating a valid C -> asm mapping, before tackling the issue of gcc compatibility, since the generated code apparently has no relation to what gcc generates. I wonder which compilers' source code Claude has been trained on, and how closely this compiler's code generation and attempted optimizations compares to those?

qarl|24 days ago

> Still a really cool project!

Yeah. This test sorta definitely proves that AI is legit. Despite the millions of people still insisting it's a hoax.

The fact that the optimizations aren't as good as the 40 year gcc project? Eh - I think people who focus on that are probably still in some serious denial.

PostOnce|24 days ago

It's amazing that it "works", but viability is another issue.

It cost $20,000 and it worked, but it's also totally possible to spend $20,000 and have Claude shit out a pile of nonsense. You won't know until you've finished spending the money whether it will fail or not. Anthropic doesn't sell a contract that says "We'll only bill you if it works" like you can get from a bunch of humans.

Do catastrophic bugs exist in that code? Who knows, it's 100,000 lines, it'll take a while to review.

On top of that, Anthropic is losing money on it.

All of those things combined, viability remains a serious question.

thesz|24 days ago

> This test sorta definitely proves that AI is legit.

This is an "in distribution" test. There are a lot of C compilers out there, including ones with git history, implemented from scratch. "In distribution" tests do not test generalization.

The "out of distribution" test would be like "implement (self-bootstrapping, Linux kernel compatible) C compiler in J." J is different enough from C and I know of no such compiler.

LinXitoW|24 days ago

How does 20K to replicate code available in the thousands online (toy C compilers) prove anything? It requires a bunch of caveats about things that don't work, it requires a bunch of other tools to do stuff, and an experienced developer had to guide it pretty heavily to even get that lackluster result.

soperj|24 days ago

Only if we take them at their word. I remember thinking things were in a completely different state when Amazon had their shop and go stores, but then finding out it was 1000s of people in Pakistan just watching you via camera.

cardanome|23 days ago

I will write you a C compiler by hand for $19k and it will be better than what Claude made.

Writing a toy C compiler isn't that hard. Any decent programmer can write one in a few weeks or months. The optimizations are actually the interesting part, and Claude fails hard at that.

kvemkon|24 days ago

> optimizations aren't as good as the 40 year gcc project

with all optimizations disabled:

> Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

dwaite|22 days ago

It is legit - with some pretty severe caveats. I am hard-pressed to come up with an example that has more formal specification, published source implementations, and public unit test coverage than a C compiler.

It is not feasible that someone will use AI to tackle genuinely new software and provide a tenth of the level of guide-rails Anthropic had for this project. They were able to keep the million monkeys on their million typewriters on an extremely short leash, and able to have it do the vast majority of iteration without human intervention.

byzantinegene|24 days ago

it costs $20,000 to reinvent the wheel, that it probably trained on. If that's your definition of legit, sure

wqaatwt|22 days ago

The full source of several compilers being in its training set is somewhat helpful though. It's not exactly a novel problem, and those optimizations and edge cases with which it is seemingly struggling are the overwhelming majority of the work anyway.

Do we know it just didn’t shuffle gcc’s source code around a bit?

miohtama|24 days ago

GCC had 40 years headstart

qarl|23 days ago

[deleted]

ip26|24 days ago

I’m excited and waiting for the team that shows with $20k in credits they can substantially speed up the generated code by improving clang!

byzantinegene|24 days ago

i'm sorry but that will take another $20 billion in AI capex to train our latest SOTA model so that it will cost $20k to improve the code.

9rx|24 days ago

> I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel.

How much of that time was spent writing the tests that they found to use in this experiment? You (or someone like you) were a major contributor to this. All Opus had to do here was keep brute forcing a solution until the tests passed.

It is amazing that it is possible at all, but it remains an impossibility without a heavy human hand. One could easily still spend a good part of their career reproducing this if they first had to rewrite all of the tests from scratch.

beambot|24 days ago

This is getting close to a Ken Thompson "Trusting Trust" era -- AI could soon embed itself into the compilers themselves.

bopbopbop7|24 days ago

A pay to use non-deterministic compiler. Sounds amazing, you should start.

ndesaulniers|24 days ago

We're already starting to see people experimenting with applying AI towards register allocation and inlining heuristics. I think that many fields within a compiler are still ripe for experimentation.

https://llvm.org/docs/MLGO.html

int_19h|24 days ago

What I want to know is when we get AI decompilers

Intuitively it feels like it should be a straightforward training setup - there's lots of code out there, so compile it with various compilers, flags etc and then use those pairs of source+binary to train the model.

jojobas|24 days ago

Sorry, clang 26.0 requires an Nvidia B200 to run.

greenavocado|24 days ago

Then i'll be left wondering why my program requires 512TB of RAM to open

andai|24 days ago

The asymmetry will be between the frontier AI's ability to create exploits vs find them.

dnautics|24 days ago

would be hard to miss gigantic kv cache matrix multiplications

iberator|24 days ago

Claude did not write it. You wrote it, with previous experience, via 20,000 commands telling it exactly what to do.

Truly usable AI would create it from a simple prompt: 'make a C99 compiler faster than GCC'.

AI usage should be banned in general. It takes jobs faster than creating new ones ..

arcanemachiner|24 days ago

That's actually pretty funny. They're patting it on the back for using, in all likelihood, some significant portions of code that they actually wrote, which was stolen from them without attribution so that it could be used as part of a very expensive parlour trick.

embedding-shape|24 days ago

> AI usage should be banned in general. It takes jobs faster than creating new ones ..

I don't have a strong opinion about that in either direction, but I'm curious: do you feel the same about everything, or is it just about this specific technology? For example, should the nail gun have been forbidden if it were invented today, as one person with a nail gun could probably replace 3-4 people with normal "manual" hammers?

Do you feel the same about programmers who are automating others out of work without the use of AI, too?

wiseowise|24 days ago

> It takes jobs faster than creating new ones ..

You think compiler engineer from Google gives a single shit about this?

They’ll automate millions out of career existence for their amusement while cashing out stock money and retiring early comfortably.

benterix|24 days ago

> It takes jobs faster than creating new ones ..

I have no problems with tech making some jobs obsolete, that's normal. The problem is, the job being done with the current generation of LLMs are, at least for now, mostly of inferior quality.

The tools themselves are quite useful as helpers in several domains if used wisely though.

7thpower|24 days ago

Businesses do not exist to create jobs; jobs are a byproduct.

unglaublich|24 days ago

Jobs are a means, not a goal.

MaskRay|24 days ago

I want to verify the claim that it builds the Linux kernel. It quickly runs into errors, but yeah, still pretty cool!

make O=/tmp/linux/x86 ARCH=x86_64 CC=/tmp/p/claudes-c-compiler/target/release/ccc -j30 defconfig all

```
/home/ray/Dev/linux/arch/x86/include/asm/preempt.h:44:184: error: expected ';' after expression before 'pto_tmp__'
do { u32 pto_val__ = ((u32)(((unsigned long) ~0x80000000) & 0xffffffff)); if (0) { __typeof_unqual__((__preempt_count)) pto_tmp__; pto_tmp__ = (~0x80000000); (void)pto_tmp__; } asm ("and" "l " "%[val], " "%" "[var]" : [var] "+m" (((__preempt_count))) : [val] "ri" (pto_val__)); } while (0);
    ^~~~~~~~~
fix-it hint: insert ';'
/home/ray/Dev/linux/arch/x86/include/asm/preempt.h:49:183: error: expected ';' after expression before 'pto_tmp__'
do { u32 pto_val__ = ((u32)(((unsigned long) 0x80000000) & 0xffffffff)); if (0) { __typeof_unqual__((__preempt_count)) pto_tmp__; pto_tmp__ = (0x80000000); (void)pto_tmp__; } asm ("or" "l " "%[val], " "%" "[var]" : [var] "+m" (((__preempt_count))) : [val] "ri" (pto_val__)); } while (0);
    ^~~~~~~~~
fix-it hint: insert ';'
/home/ray/Dev/linux/arch/x86/include/asm/preempt.h:61:212: error: expected ';' after expression before 'pao_tmp__'
```

silver_sun|24 days ago

They said it builds Linux 6.9, maybe you are trying to compile a newer version there?

the_jends|24 days ago

Being just a grunt engineer in a product firm I can't imagine being able to spend multiple years on one project. If it's something you're passionate about, that sounds like a dream!

ndesaulniers|23 days ago

This work originally wasn't my 100% project, it was my 20% project (or as I prefer to call it, 120% project).

I had to move teams twice before a third team was able to say: this work is valuable to us, please come work for us and focus just on that.

I had to organize multiple internal teams, then build an external community of contributors to collaborate on this shared common goal.

Having carte blanche to contribute to open source projects made this feasible at all; I can see that being a non-starter at many employers, sadly. Having low friction to change teams also helped a lot.

HarHarVeryFunny|23 days ago

> I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel

Did this come down to making Clang 100% gcc compatible (extensions, UDB, bugs and all), or were there any issues that might be considered as specific to the linux kernel?

Did you end up building a gcc compatibility test suite as a part of this? Did the gcc project themselves have a regression/test suite that you were able to use as a starting point?

ndesaulniers|23 days ago

> extensions

Some were necessary (asm goto), some were not (nested functions, flexible array members not at the end of structs).

> UDB, bugs and all

Luckily, the kernel didn't intentionally rely on GCC specifics this way. Where it did unintentionally, we fixed the kernel sources properly with detailed commit messages explaining why.

> or were there any issues that might be considered as specific to the linux kernel?

Yes, https://github.com/ClangBuiltLinux/linux/issues is our issue tracker. We use tags extensively to mark if we triage the issue to be kernel-side vs toolchain-side.

> Did you end up building a gcc compatibility test suite as a part of this?

No, but some tricky cases LLVM got wrong were distilled from kernel sources using either:

- creduce
- cvise (my favorite)
- bugpoint
- llvm-reduce

and then added to LLVM's existing test suite. Many such tests were also simply manually written.

> Did the gcc project themselves have a regression/test suite that you were able to use as a starting point?

GCC and binutils have their own test suites. Folks in the LLVM community have worked on being able to test clang against GCC's test suite. I personally have never run GCC's test suite or looked at its sources.

TZubiri|24 days ago

>Is the generated code correct? The jury is still out on that one for production compilers. And then you have performance of generated code.

It's worth noting that this was developed by compiling Linux and running tests, so at least that is part of the training set and not the testing set.

But Linux's tests are presumably very robust, so I'd guess it will work correctly. That said, if any bugs pop up, they will show weak points in the Linux tests.

VladVladikoff|24 days ago

> $20,000 of tokens
> less efficient than existing compilers

What is the ecological cost of producing this piece of software that nobody will ever use?

ryanjshaw|24 days ago

If you evaluate the cost/benefit in isolation? It’s net negative.

If you see this as part of a bigger picture to improve human industrial efficiency and bring us one step closer to the singularity? Most likely net positive.

thefounder|24 days ago

With that way of thinking you would just move in a cave.

grey-area|24 days ago

Isn't the AI basing what it does heavily on the publicly available source code for compilers in C though? Without that work it would not be able to generate this would it? Or in your opinion is it sufficiently different from the work people like you did to be classed as unique creation?

I'm curious on your take on the references the GAI might have used to create such a project and whether this matters.

zaphirplane|24 days ago

Out of interest, what were the challenges? Was some of it the use of gcc extensions, which needed equivalents and porting over to them?

ndesaulniers|24 days ago

`asm goto` was the big one. The x86_64 maintainers broke the clang builds very intentionally just after we had gotten x86_64 building (with necessary patches upstreamed) by requiring compiler support for that GNU C extension. This was right around the time of meltdown+spectre, and the x86_64 maintainers didn't want to support fallbacks for older versions of GCC (and ToT Clang at the time) that lacked `asm goto` support for the initial fixes shipped under duress (embargo). `asm goto` requires plumbing throughout the compiler, and I've learned more about register allocation than I particularly care...

Fixing some UB in the kernel sources, lots of plumbing to the build system (particularly making it more hermetic).

Getting the rest of the LLVM binutils substitutes to work in place of GNU binutils was also challenging. Rewriting a fair amount of 32b ARM assembler to be "unified syntax" in the kernel. Linker bugs are hard to debug. Kernel boot failures are hard to debug (thank god for QEMU+gdb protocol). Lots of people worked on many different parts here, not just me.

Evangelism and convincing upstream kernel developers why clang support was worth anyone's while.

https://github.com/ClangBuiltLinux/linux/issues for a good historical perspective. https://github.com/ClangBuiltLinux/linux/wiki/Talks,-Present... for talks on the subject. Keynoting LLVM conf was a personal highlight (https://www.youtube.com/watch?v=6l4DtR5exwo).

m463|23 days ago

> getting Clang to build the linux kernel.

wonder if clang source is part of its model :)

ur-whale|24 days ago

> This LLM did it

You do realize the LLM had access (via his training set) and "reused" (not as is, of course) your own work, right?

phillmv|24 days ago

i mean… your work also went into the training set, so it's not entirely surprising that it spat a version back out!

underdeserver|24 days ago

Anthropic's version is in Rust though, so at least a little different.

GaggiX|24 days ago

Clang is not written in Rust tho

jbjbjbjb|24 days ago

It’s cool, but there’s a good chance it’s just copying someone else’s homework, albeit in an elaborate, roundabout way.

nomel|24 days ago

I would claim that LLMs desperately need proprietary code in their training, before we see any big gains in quality.

There's some incredible source available code out there. Statistically, I think there's a LOT more not so great source available code out there, because the majority of output of seasoned/high skill developers is proprietary.

To me, a surprising portion of Claude 4.5 output definitely looks like student homework answers, because I think that's closer to the mean of the code population.

wvenable|24 days ago

This is cool and actually demonstrates real utility. Using AI to take something that already exists and create it for a different library / framework / platform is cool. I'm sure there's a lot of training data in there for just this case.

But I wonder how it would fare given a language specification for a non-existent non-trivial language and build a compiler for that instead?

nlawalker|24 days ago

I see that as the point that all this is proving - most people, most of the time, are essentially reinventing the wheel at some scope and scale or another, so we’d all benefit from being able to find and copy each others’ homework more efficiently.

computerex|24 days ago

And the goal post shifts.

kreelman|24 days ago

..A small thing, but it won't compile the RISCV version of hello.c if the source isn't installed on the machine it's running on.

It is standing on the shoulders of giants (all of the compilers of the past, built into its training data... and the recent learnings about getting these agents to break up tasks) to get itself going. Still fairly impressive.

On a side-quest, I wonder where Anthropic is getting their power from. The whole energy debacle in the US at the moment probably means it made some CO2 in the process. Would be hard to avoid?

tdemin|24 days ago

[deleted]

eek2121|24 days ago

Also: a large number of folks seem to think Claude Code is losing a ton of money. I have no idea where the final numbers land; however, if the $20,000 figure is accurate, then based on some of the estimates I've seen, they could've hired 8 senior-level developers at a quarter million a year for the same amount of money spent internally.

Granted, marketing sucks up far too much money for any startup, and again, we don't know the actual numbers in play, however, this is something to keep in mind. (The very same marketing that likely also wrote the blog post, FWIW).

willsmith72|24 days ago

this doesn't add up. the 20k is in API costs. people talk about CC losing money because it's way more efficient than the API. I.e. the same work with efficient use of CC might have cost ~$5k.

but regardless, hiring is difficult and high-end talent is limited. If the costs were anywhere close to equivalent, the agents are a no-brainer

GorbachevyChase|24 days ago

Even if the dollar cost for product created was the same, the flexibility of being able to spin a team up and down with an API call is a major advantage. That AI can write working code at all is still amazing to me.

bloaf|24 days ago

This thing was done in 2 weeks. In the orgs I've worked in, you'd be lucky to get HR approval to create a job posting within 2 weeks.