(no title)
minusf | 2 months ago
isn't this more like a port of `html5ever` from rust to python using LLM, as opposed to creating something "new" based on the test suite alone?
if yes, wouldn't be the distinction rather important?
minusf | 2 months ago
isn't this more like a port of `html5ever` from rust to python using LLM, as opposed to creating something "new" based on the test suite alone?
if yes, wouldn't be the distinction rather important?
EmilStenstrom|2 months ago
The first iteration of the project created a library from scratch, from the tests all the way to 100% test coverage. So even without the second iteration, it's still possible to create something new.
In an attempt to speed it up, I (with coding agent) rewrote it again based on html5ever's code structure. It's far from a clean port, because it's heavily optimized Rust code, that isn't possible to port to Python (Rust marcos). And it still depended on a lot of iteration and rerunning tests to get it anywhere.
I'm not pushing any agenda here, you're free to take what you want from it!
simonw|2 months ago
It looks to me like this is the last commit before the rewrite: https://github.com/EmilStenstrom/justhtml/tree/989b70818874d...
The commit after that is https://github.com/EmilStenstrom/justhtml/commit/7bab3d2 "radical: replace legacy TurboHTML tree/handler stack with new tokenizer + treebuilder scaffold"
It also adds this document called html5ever_port_plan.md: https://github.com/EmilStenstrom/justhtml/blob/7bab3d22c0da0...
Here's the Codex CLI transcript I used to figure this out: https://gistpreview.github.io/?53202706d137c82dce87d729263df...
minusf|2 months ago
You also mention that the current "optimised" version is "good enough" for every-day use (I use `bs4` for working with html), was the first iteration also usable in that way? Did you look at `html5ever` because the LLM hit a wall trying to speed it up?