top | item 46343194

(no title)

lifis | 2 months ago

Quickly looking at the source code, mostly treeBuilder and tokenizer, I do see several possible improvements: - Use Typescript instead of JavaScript - Use perfect hashes instead of ["a', "b", "c"].includes() idioms, string equalities, Seys, etc. - Use a single perfect hash to match all tags/attribute names and then use enums in the rest of the codebase - Use a single if (token.kind === Tag.START instead of repeating that for 10 consecutive conditionals - Don't return the "reprocess" constant, but use an enum or perhaps nothing if "reprocess" is the only option - Try tail recursion instead of a switch over the state in the tokenizer - Use switches (best after a perfect hash lookup) instead of multiple ifs on characters in the tokenizer - "treeBuilder.openElements = treeBuilder.open_elements;" can't possibly be good code

Perhaps the agent can find these themselves if told to make the code perfect and not just pass tests

discuss

simonw|2 months ago

Thanks for the feedback - I pasted it into a Claude Code session on my phone, here's the resulting PR: https://github.com/simonw/justjshtml/pull/7

I didn't include the TypeScript bit though - it didn't use TypeScript because I don't like adding a build step to my JavaScript projects if I can possible avoid it. The agent would happily have used TypeScript if I had let it.

I don't like that openElements = open_elements pattern either - it did that because I asked it for a port of a Python library and it decided to support the naming conventions for both Python and JavaScript at once. I told it to remove all of those.

I had it run a micro benchmark too against the before and after - here's the code it used for that: https://github.com/simonw/justjshtml/blob/a9dbe2d7c79522a76f...

  BEFORE benchmark:
  Input: 87,707 bytes
  Average: 7.846 ms
  Ops/sec: 127.5

After applying your suggestions:

  AFTER: 
  Average: 7.769ms
  Ops/sec: 128.7  (1% improvement)

It pushed back against the tail recursion suggestion:

> The current implementation uses a switch statement in step(). JavaScript doesn’t have proper tail call optimization (only Safari implements it), so true tail recursion would cause stack overflow on large documents.