I think the most interesting thing about this is how it demonstrates that a very particular kind of project is now massively more feasible: library porting projects that can be executed against implementation-independent tests.
The Servo html5ever Rust codebase uses them. Emil's JustHTML Python library used them too. Now my JavaScript version gets to tap into the same collection.
This meant that I could set a coding agent loose to crunch away on porting that Python code to JavaScript and have it keep going until that enormous existing test suite passed.
Sadly conformance test suites like html5lib-tests aren't that common... but they do exist elsewhere. I think it would be interesting to collect as many of those as possible.
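For reference, the tree-construction suites in html5lib-tests are plain-text ".dat" files built from `#data`, `#errors` and `#document` sections. A minimal reader for that format might look like the sketch below (simplified: real files also carry optional sections such as `#document-fragment` and `#script-on`, ignored here):

```python
# Simplified reader for html5lib-tests tree-construction ".dat" files.
# Each test starts at "#data"; "#errors" and "#document" switch sections.
def parse_dat(text: str) -> list[dict]:
    tests, current, section = [], None, None
    for line in text.split("\n"):
        if line == "#data":
            if current is not None:
                tests.append(current)
            current = {"data": [], "errors": [], "document": []}
            section = "data"
        elif line in ("#errors", "#document"):
            section = line[1:]
        elif current is not None:
            current[section].append(line)
    if current is not None:
        tests.append(current)
    return tests

sample = """#data
<p>One<p>Two
#errors
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"
|     <p>
|       "Two"
"""

tests = parse_dat(sample)
print(tests[0]["data"])  # ['<p>One<p>Two']
```

Because the format is language-independent, a reader this small is all any new implementation needs in order to plug into the whole collection.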
The html5lib conformance tests when combined with the WHATWG specs are even more powerful! I managed to build a typed version of this in OCaml in a few hours ( https://anil.recoil.org/notes/aoah-2025-15 ) yesterday, but I also left an agent building a pure OCaml HTML5 _validator_ last night.
This run has (just in the last hour) combined the html5lib expect tests with https://github.com/validator/validator/tree/main/tests (which are a complex mix of Java RELAX NG stylesheets and code) in order to build a low-dependency pure OCaml HTML5 validator with types and modules.
This feels like formal verification in reverse: we're starting from a scattered set of facts (the expect tests) and iterating towards more structured specifications, using functional languages like OCaml/Haskell as convenient executable pitstops while driving towards proof reconstruction in something like Lean.
Was struggling yesterday with porting something (Python -> Rust). The LLM couldn't figure out what was wrong with the Rust one no matter how I came at it (even gave it Wireshark traces). And it being vibe-coded, I had no idea either. Eventually I copied the Python source into the Rust project and asked it to compare... immediate success.
Turns out they're quite good at that sort of pattern matching cross languages. Makes sense from a latent space perspective I guess
I’ve idly wondered about this sort of thing quite a bit. The next step would seem to be taking a project’s implementation-dependent tests, converting them to an independent format, verifying them against the original project, and then conducting the port.
This is amazing. Porting a library from one language to another is easy for LLMs; they are tireless and know coding syntax very well. What I like in machine learning benchmarks is that agents develop and test many solutions, and this search process is very human-like. Yesterday I was looking into MLE-Bench for benchmarking coding agents on machine learning tasks from Kaggle: https://github.com/openai/mle-bench There are many projects providing agents whose performance is simply incredible; they can solve several Kaggle competitions in under 24 hours and place at medal level. I think this is already above human level. I was reading the ML-Master article, which describes AI4AI, where AI is used to create AI systems: https://arxiv.org/abs/2506.16499
This is one of the reasons I'm keeping tests to myself for a current project. Usually I release libraries as open source, but I've been rethinking that, as well.
I wonder if this makes AI models particularly well-suited to ML tasks, or at least ML implementation tasks, where you are given a target architecture and dataset and have to implement and train the given architecture on the given dataset. There are strong signals to the model, such as loss, which are essentially a slightly less restricted version of "tests".
If you're porting a library, you can use the original implementation as an 'oracle' for your tests. Which means you only need a way to write/generate inputs, then verify the output matches the original implementation.
It doesn't work for everything of course, but it's a nice way to get bug-for-bug compatible rewrites.
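That oracle pattern can be sketched in a few lines of Python. Everything here is illustrative: in a real port, `original` would invoke the reference implementation and `ported` the new one, rather than the toy stand-ins used below.

```python
import random
import string

def original(html: str) -> str:
    # Stand-in for the reference implementation (hypothetical).
    return html.lower()

def ported(html: str) -> str:
    # Stand-in for the port under test (here deliberately identical).
    return html.lower()

def random_input(rng: random.Random, max_len: int = 40) -> str:
    # Generate short strings biased toward markup-ish characters.
    chars = string.ascii_letters + "<>/= \"'"
    return "".join(rng.choice(chars) for _ in range(rng.randrange(max_len)))

def oracle_check(n_cases: int = 1000, seed: int = 0) -> list[str]:
    """Return every generated input where the port disagrees with the oracle."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        case = random_input(rng)
        if original(case) != ported(case):
            failures.append(case)
    return failures

print(len(oracle_check()))  # 0 — the two identical stand-ins always agree
```

The interesting work is in `random_input`: the closer the generator gets to the original's real input distribution (including pathological edge cases), the stronger the oracle becomes.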
I see it as a learning or training tool for AI. The same way we use mock exams/tests to verify our skill and knowledge absorption and prepare for the real thing or career. This could be one of many obstacles in an obstacle course which a coding AI would have to navigate in order to "graduate".
Few know that Firefox's HTML5 parser was originally written in Java, and only afterward semi-mechanically translated (pre-LLMs) to the dialect of C++ used in the Gecko codebase.
This blog post isn't really about HTML parsers, however. The JustHTML port described in this blog post was a worthwhile exercise as a demonstration on its own.
Even so, I suspect that for this particular application, it would have been more productive/valuable to port the Java codebase to TypeScript rather than using the already vibe coded JustHTML as a starting point. Most of the value of what is demonstrated by JustHTML's existence in either form comes from Stenström's initial work.
There are certainly dozens of better ways to do what I did here.
I picked JustHTML as a base because I really liked the API Emil had designed, and I also thought it would be darkly amusing to take his painstakingly (1,000+ commits, 2 months+ of work) constructed library and see if I could port it directly to JavaScript in an evening, taking advantage of everything he had already figured out.
IANAL. In my opinion, porting code to a different language is still a derivative work of the code you are porting it from, whether done by hand or with an LLM. And in my opinion, the license of the original code still applies. Which means that not only should one link to the repo for the code that was ported, but also make sure to adhere to the terms of the license.
The MIT family of licenses state that the copyright notice and terms shall be included in all copies of the software.
Porting code to a different language is in my opinion not much different from forking a project and making changes to it, small or big.
I therefore think the right thing to do is to keep the original copyright notice and license file, and add your additional copyright line to it.
So for example if the original project had an MIT license file that said
Copyright 2019 Suchandsuch
Permission is hereby granted and so on
You should keep all of that and add your copyright year and author name on the next line after the original line or lines of the authors of the repo you took the code from.
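Concretely, the merged notice would look something like this (names and years invented for illustration, with the standard MIT text otherwise unchanged):

```
Copyright 2019 Suchandsuch
Copyright 2026 Yourname

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), ...
[remainder of the standard MIT text unchanged]
```

The point is that nothing is removed: the original author's line stays first, and yours is appended.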
Surely for debugging and auditing it's always better to write libs in JavaScript? Also, given that much of TypeScript's utility is in improving the developer experience, is it still as relevant for machine-generated code?
> Code is so cheap it’s practically free. Code that works continues to carry a cost, but that cost has plummeted now that coding agents can check their work as they go.
I personally think that even before LLMs, the cost of code wasn't necessarily the cost of typing out the characters in the right order, but having a human actually understand it to the extent that changes can be made. This continues to be true for the most part. You can vibe code your way into a lot of working code, but you'll inevitably hit a hairy bug or a real world context dependency that the LLM just cannot solve, and that is when you need a human to actually understand everything inside out and step in to fix the problem.
I wonder if we will trend towards a world where maintainability is just a waste of time and money, when you can just knock together a new flimsy thing quicker and cheaper than maintaining one thing over multiple iterations.
This is a namespacing test. The reason the tag is <svg title> is that the parser is handling the title tag as the SVG version of it. SVG has other handling rules, so unless the parser knows that, it won't work right. It would be interesting to run the tests against Chrome as well!
You are also looking at the test format of the tag, when serialized to HTML the svg prefixes will disappear.
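As an invented illustration of that notation (following the html5lib-tests tree-format conventions): parsing an input like `<svg><title>x</title></svg>` is expected to produce a tree where foreign elements are written with their namespace prefix:

```
#document
| <html>
|   <head>
|   <body>
|     <svg svg>
|       <svg title>
|         "x"
```

When that tree is serialized back to HTML, the `svg` prefixes disappear and you get plain `<svg><title>x</title></svg>` again — the prefix only exists in the test format.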
> Does this library represent a legal violation of copyright of either the Rust library or the Python one? Even if this is legal, is it ethical to build a library in this way?
Currently, I am experimenting with two projects in Claude Code: a Rust/Python port of a Python repo which necessitates a full rewrite to get the desired performance/feature improvements, and a Rust/Python port of a JavaScript repo mostly because I refuse to install Node (the speed improvement is nice though).
In both of those cases, the source repos are permissively licensed (MIT), which I interpret as the developers' intent as to how their code should be used. It is in the spirit of open source to produce better code by iterating on existing code, as that's how the software ecosystem grows. That would be the case whether a human wrote the porting code or not. If Claude 4.5 Opus can produce better/faster code which has the same functionality and passes all the tests, that's a win for the ecosystem.
As courtesy and transparency, I will still link and reference the original project in addition to disclosing the Agent use, although those things aren't likely required and others may not do the same. That said, I'm definitely not using an agent to port any GPL-licensed code.
> As courtesy and transparency, I will still link and reference the original project in addition to disclosing the Agent use, although those things aren't likely required and others may not do the same. That said, I'm definitely not using an agent to port any GPL-licensed code.
IANAL but regardless of the license, you have to respect their copyright and it’s hard to argue that an LLM ported library is anything but a derivative work. You would still have to include the original copyright notices and retain the license (again IANAL).
That's about where I'm settled on this right now. I feel like authors who select the GPL have made a robust statement about their intent. It may be legal for me to copyright-launder their library (maybe using the trick where one LLM turns their code into a spec and another turns that spec into fresh code) but I wouldn't do that because it would subvert the spirit of the license.
The reason is that the post you link to is overly simplistic. The only reason why Simon's experiment works is because there is a pre-existing language agnostic testing framework of 9000 tests that the agent can hold itself accountable to. Additionally, there is a pre-existing API design that it can reuse/reappropriate.
These two preconditions don't generally apply to software projects. Most of the time there are vague, underspecified, frequently changing requirements, no test suite, and no API design.
If all projects came with 9000 pre-existing tests and fleshed-out API, then sure, the article you linked to could be correct. But that's not really the case.
Wild to ask, "Is it legal, ethical, responsible or even harmful to build in this way and publish it?" AFTER building and publishing it. Author made up his mind already, or doesn't actually care. Ethics and responsibility should guide one's actions, not just be engagement fodder after the fact.
If I thought this was clear-cut 100% unethical and irresponsible I wouldn't have done it. I think there's ample room for conversation about this. I'd like to help instigate that conversation.
I'm ready to take a risk to my own reputation in order to demonstrate that this kind of thing is possible. I think it's useful to help people understand that this kind of thing isn't just feasible now, it's somewhat terrifyingly easy.
> It took two initial prompts and a few tiny follow-ups. GPT-5.2 running in Codex CLI ran uninterrupted for several hours, burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens and ended up producing 9,000 lines of fully tested JavaScript across 43 commits.
Using a random LLM cost calculator, this amounts to $28.31... pretty reasonable for functional output.
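The estimate depends entirely on which per-token rates the calculator assumes. As a sketch, here is the arithmetic with assumed rates of $1.25/M input, $0.125/M cached input, and $10/M output (assumptions for illustration, not official pricing — which is why this lands in the same ballpark as, but not exactly at, the quoted $28.31):

```python
# Back-of-envelope cost from the token counts quoted in the post.
# RATES are assumed $-per-1M-token prices, not official figures.
RATES = {"input": 1.25, "cached": 0.125, "output": 10.00}
USAGE = {"input": 1_464_295, "cached": 97_122_176, "output": 625_563}

cost = sum(USAGE[k] / 1e6 * RATES[k] for k in USAGE)
print(f"${cost:.2f}")  # $20.23 under these assumed rates
```

Note that the cached-input line dominates: at almost 100M cached tokens, even a heavily discounted cache rate accounts for more than half of the total.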
I am now confident that within 5-10 years (most? all?) junior & mid-level and many senior dev positions are going to disappear in enormous numbers.
People say this kind of thing a lot, but in reality the concept of "software engineer" will change and there will still be experience levels with different expectations
I think a big factor (of many, probably) is that there is a ~150x difference between the bytes of source and the number of tests for them. I wonder what other projects are easy wins, which are hard ones, and which can be accomplished quickly with a certain approach.
It'd be really interesting if Simon gave a crack at the above and wrote about his findings in doing so. Or at least, I'd find it interesting :).
The oracle approach mentioned downthread is what makes this practical even without conformance test suites. Run the original, capture input/output pairs, use those as your tests. Property-based testing tools like Hypothesis can generate thousands of edge cases automatically.
For solo devs this changes the calculus entirely. Supporting multiple languages used to mean maintaining multiple codebases - now you can treat the original as canonical and regenerate ports as needed. The test suite becomes the actual artifact you maintain.
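A minimal capture-and-replay harness along those lines might look like the following sketch (function and file names invented; `normalize` stands in for whatever function the canonical implementation exposes):

```python
import json
import os
import tempfile

def normalize(s: str) -> str:
    # Stand-in for the canonical implementation (hypothetical example).
    return " ".join(s.split())

CASES = ["a  b", " x ", "", "one\ttwo\nthree"]

def capture(path: str) -> None:
    # Record oracle input/output pairs from the original implementation.
    pairs = [{"input": c, "output": normalize(c)} for c in CASES]
    with open(path, "w") as f:
        json.dump(pairs, f, indent=2)

def replay(path: str, impl) -> list[str]:
    # Return every recorded input on which `impl` disagrees with the oracle.
    with open(path) as f:
        pairs = json.load(f)
    return [p["input"] for p in pairs if impl(p["input"]) != p["output"]]

path = os.path.join(tempfile.mkdtemp(), "oracle.json")
capture(path)
print(replay(path, normalize))  # [] — the "port" matches the recorded oracle
```

The JSON file is the language-independent artifact: any port, in any language, only needs to read it and compare.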
The biggest challenge an agent will face with tasks like these is the diminishing quality in relation to the size of the input; specifically, I find input above, say, 10k tokens dramatically reduces the quality of generated output.
This specific case worked well, I suspect, since LLMs have a LOT of previous knowledge with HTML, and saw multiple impl and parsing of HTML in the training.
Thus I suspect that real-world attempts at similar projects, in any domain that is not well represented in training, will fail miserably.
While this example is explicitly asking for a port (thus a copy), I also find in general that LLM's default behavior is to spit out new code from their vast pre-trained encyclopedia, vs adding an import to some library that already serves that purpose.
I'm curious if this will implicitly drive a shift in the usage of packages / libraries broadly, and if others think this is a good or bad thing. Maybe it cuts down the surface of upstream supply-chain attacks?
The problem with translating between languages is that code that "looks the same and runs" is not equivalently idiomatic or "acceptable". It seems to turn into long files of if-statements, flags, checks and so on. This might be considered idiomatic enough in Python, but it is not something you'd want to work with in functional or typed code.
That doesn't sound right to me. If it's a derivative work I can still assert copyright over the modifications I have made, but not over the original material.
Couple quick points from the read - cool, btw! It's not trivial that Simon poked the LLM to get something up and running and working ASAP - that's always been a good engineering behavior in my opinion - building on a working core - but I have found it's extra helpful/needed when it comes to LLM coding - this brings the compiler and tests "in the loop" for the LLM, and helps keep it on the rails - otherwise you may find you get 1,000s of lines of code that don't work or are just sort of a goose chase, or all gilding of lilies.
As is mentioned in the comments, I think the real story here is two fold - one, we're getting longer uninterrupted productive work out of frontier models - yay - and a formal test suite has just gotten vastly more useful in the last few months. I'd love to see more of these made.
It is enormously useful for the author to know that the code works, but my intuition is if you asked an agent to port files slowly, forming its own plan, making commits every feature, it would still get reasonably close, if not there.
Basically, I am guessing that this impressive output could have been achieved based on how good models are these days with large amounts of input tokens, without running the code against tests.
I think the reason this was an evening project for Simon is that he had both the code and the tests in conjunction. Removing one of them would at least 10x the effort, is my guess.
"If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed"
I'm a bit sad about this; I'd rather have "had fun" doing the coding, and get AI to create the test cases, than vice versa.
> How much better would this library be if an expert team hand crafted it over the course of several months?
It's an interesting assumption that an expert team would build a better library. I'd change this question to: would an expert team build this library better?
> How much better would this library be if an expert team hand crafted it over the course of several months?
i think the fun conclusion would be: ideally no better, and no worse. that is the state you arrive at IFF you have complete tests and specs (including probably for performance). now a human team handcrafting would undoubtedly make important choices not clarified in specs, thereby extending the spec. i would argue that human chain of thought from deep involvement in building and using the thing is basically 100% of the value of human handcrafting, because otherwise yeah go nuts giving it to an agent.
I think specs + tests are the new source of truth; code is disposable and rebuildable. A well-tested project is reliable both for humans and AI, and a badly tested one is bad for both. When we don't test well I call it "vibe testing", or "LGTM testing".
What would be incredibly amusing would be re-implementing the Java API in some other language using only the API documentation. The Supreme Court has ruled that this is fair use, so what could possibly go wrong?
What was your prompt to get it to run the test suite and heal tests at every step? I didn’t see that mentioned in your write up. Also, any specific reason you went with Codex over Claude Code?
For me (original author of JustHTML), it was enough to put the instructions on how to run tests in the AGENTS.md. It knows enough about coding to run tests by itself.
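For reference, it can be as simple as a short section in AGENTS.md (contents invented here for illustration — the real file's wording will differ):

```
## Tests

Run the full conformance suite with:

    python -m pytest tests/

Do not consider a change done until the whole suite passes.
```

The agent reads this at the start of a session and then knows both how to check its work and what "done" means.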
Talking about "thieves" is very much going back to the idea that software is the same thing as physical things. When talking about software we have a very simple concept to guide us: the license.
The license of html5ever is MIT, meaning the original authors are OK that people do whatever they want with it. I've retained that license and given them acknowledgement (not required by the license) in the README. Simon has done the same, kept the license and given acknowledgement (not required) to me.
ChatGPT Plus with Codex CLI provides "45-225 local messages per 5 hour period".
The https://chatgpt.com/codex/settings/usage page is pretty useless right now: it shows that I used "100%" on December 14th, the day I ran this experiment, which presumably matches that Codex stopped working at 6:30pm but then started again when the 5-hour window reset at 7:14pm.
Running this command:
npx @ccusage/codex@latest
Reports these numbers for December 14th along with a pricing estimate:
simonw|2 months ago
The big unlock here is https://github.com/html5lib/html5lib-tests - a collection of 9,000+ HTML5 parser tests that are their own independent file format, e.g. this one: https://github.com/html5lib/html5lib-tests/blob/master/tree-...
avsm|2 months ago
Havoc|2 months ago
gwking|2 months ago
pplonski86|2 months ago
heavyset_go|2 months ago
aadishv|2 months ago
tracnar|2 months ago
bzmrgonz|2 months ago
cies|2 months ago
Also: it may be interesting to port it to other languages too and see how they do.
JS and Py are both runtime-typed and very well "spoken" by LLMs. Other languages may require a lot more "work" (data types, etc.) to get the port done.
exclipy|2 months ago
cxr|2 months ago
simonw|2 months ago
Here's the relevant folder:
https://github.com/mozilla-firefox/firefox/tree/main/parser/...
And active commits to that javasrc folder - the last was in November: https://github.com/mozilla-firefox/firefox/commits/main/pars...
simonw|2 months ago
QuantumNomad_|2 months ago
fergie|2 months ago
aster0id|2 months ago
monkpit|2 months ago
f311a|2 months ago
One of the tests:
It fails for selectolax: But you get this in Chrome and selectolax:
EmilStenstrom|2 months ago
minimaxir|2 months ago
throwup238|2 months ago
simonw|2 months ago
seinecle|2 months ago
https://martinalderson.com/posts/has-the-cost-of-software-ju...
This last post was largely dismissed in the comments here on HN. Simon's experiment brings new ground for the argument.
akie|2 months ago
mirthturtle|2 months ago
simonw|2 months ago
ethanpil|2 months ago
Source: https://www.llm-prices.com/#it=1464295&cit=97123000&ot=62556...
elcritch|2 months ago
However this changes the economics for languages with smaller ecosystems!
almostgotcaught|2 months ago
yes because this is what we do all day every day (port existing libraries from one language to another)....
like do y'all hear yourselves or what?
afro88|2 months ago
cjlm|2 months ago
[0] https://ammil.industries/the-port-i-couldnt-ship/
zamadatix|2 months ago
jackfranklyn|2 months ago
solvedd|2 months ago
fithisux|2 months ago
There are many OSes out there suffering from the same problem: a lack of drivers.
AI can change that.
leroman|2 months ago
adastra22|2 months ago
No, seriously. If you break your task into bite sized chunks, do you really need more than that at a time? I rarely do.
mNovak|2 months ago
MangoToupe|2 months ago
The package import thing seems like a red herring
yobbo|2 months ago
tantalor|2 months ago
No, because it's a derivative work of the base library.
simonw|2 months ago
vessenes|2 months ago
orange_puff|2 months ago
EmilStenstrom|2 months ago
xarope|2 months ago
EmilStenstrom|2 months ago
Mystery-Machine|2 months ago
swyx|2 months ago
visarga|2 months ago
WhyOhWhyQ|2 months ago
^Claude still thinks it's 2024. This happens to me consistently.
sgc|2 months ago
febed|2 months ago
simonw|2 months ago
I used Codex for a few reasons:
1. Claude was down on Sunday when I kicked off this project
2. Claude Code is my daily driver and I didn't want to burn through my token allowance on an experiment
3. I wanted to see how well the new GPT-5.2 could handle a long running project
EmilStenstrom|2 months ago
p0w3n3d|2 months ago
dimava|2 months ago
> I was running this against my $20/month ChatGPT Plus account
rcaught|2 months ago
bgwalter|2 months ago
EmilStenstrom|2 months ago
We're all good to go.
RobertoG|2 months ago
bambax|2 months ago
ulrischa|2 months ago
deanc|2 months ago
simonw|2 months ago
https://developers.openai.com/codex/pricing#what-are-the-usa...
You can spend a lot of tokens on that $20/month plan! It's possible OpenAI are being generous right now because they see Claude Code as critical competition.
EmilStenstrom|2 months ago
pietz|2 months ago
teppic|2 months ago
StarterPro|2 months ago
simonw|2 months ago