Show HN: I ported Tree-sitter to Go

222 points| odvcencio | 4 days ago |github.com

This started as a hard requirement for my TUI-based editor application, but it ended up going in a few different directions.

A suite of tools that help with semantic code entities: https://github.com/odvcencio/gts-suite

A next-gen version control system called Got: https://github.com/odvcencio/got

I think this has some pretty big potential! I think there are many classes of application (particularly legacy architecture) that can benefit from this kind of analysis tooling. My next post will be about composing all these together, an exciting project I call GotHub. Thanks!

106 comments

sluongng|4 days ago

Oh this is really neat for the Bazel community, as depending on tree-sitter to build a gazelle language extension, with Gazelle written in Go, requires you to use CGO.

Now perhaps we can get rid of the CGO dependency and make it pure Go instead. I have pinged some folks to take a look at it.

dilyevsky|4 days ago

would also be nice to have this support gopackagesdriver backend

odvcencio|4 days ago

thanks so much for the note! i really appreciate it. i built this precisely for folks like yourself with this specific pain, thanks again!

up2isomorphism|4 days ago

"rewrite" a nice code base without mentioning it is vibe coded is not great.

Essentially you use AI to somehow re-implement the original code base in a different language, make it somehow work, and claim it is xx times faster. It is misleading.

odvcencio|4 days ago

i really appreciated this comment the most because of how much work "somehow" is doing here

silverwind|3 days ago

I agree it's misleading. Ideally the author would disclose how much of the tree-sitter test suite passes.

shayief|4 days ago

This is great, I was looking for something like this, thanks for making this!

I imagine this can be very useful for Go-based forges that need syntax highlighting (e.g. Gitea, Forgejo).

I have a strict no-cgo requirement, so I might use it in my project, which is a Git+JJ forge: https://gitncoffee.com.

odvcencio|4 days ago

thank you for the kind words! Very cool project! Very happy you can find some utility in it

silverwind|3 days ago

Gitea is definitely watching this one. Initial tests show a 20x increase in syntax highlighting speed compared to the previous regexp-based approach.

3rly|4 days ago

Wouldn't `got` be confused with OpenBSD's Got? https://gameoftrees.org/index.html

Thaxll|4 days ago

Why would people be confused with something that the vast majority have never heard of? Naming shouldn't have to care about non-mainstream projects.

odvcencio|4 days ago

oh wow! i really thought i was being too clever but i shouldve assumed nothing new under the sun. well im taking name suggestions now!

nnx|4 days ago

This looks very interesting, but I wonder how's the rewrite approach gonna impact the long-term maintenance and porting changes _back_ from Tree Sitter.

As you mention WASM-readiness, did you consider using the official Tree Sitter WASM builds nicely packaged with wazero (pure Go WASM runtime)?

It may help with staying in sync with upstream for the long term and, while probably a bit slower, has nice security and GC advantages too.

trickypr|4 days ago

Do you have an equivalent of TreeCursors or tree-sitter-generate?

There are at least some use cases where neither queries nor walks are suitable. And I have run into cases where being able to regenerate and compile grammars on the fly is immeasurably helpful.

At least for my use cases, this would be unusable.

Also, what the hell is this:

> partial [..] missing external scanner

Why do you have a parsing mode that guarantees incorrect outputs on some grammars (html comes to mind) and then use it as your “90x faster” benchmark figure?

odvcencio|4 days ago

the 90x figure is on Go source for apples to apples against CGO bound tree-sitter.

your use case is not one i designed for although yeah maybe the readme has some sections too close. the only external scanner missing atm is norg. now that i know your use case i can probably think of a way to close it

CodeCompost|3 days ago

I've seen a lot of "I've ported X to Go/Rust" posts lately. Is it the expectation that we're all supposed to abandon the original projects in favor of the ported versions which use newer and shinier programming languages? Is development going to continue on those new "Go/Rust" ports or are they just one-off karma farming projects?

acedTrex|4 days ago

Claude attempted a treesitter to go port

Better title

AlexeyBelov|2 days ago

Thank you, I flagged the submission. I doubt this project will have activity in 6 months, but I'd love to be proven wrong.

thebackup|4 days ago

This was my first thought as well, just from reading the title.

odvcencio|4 days ago

well how did it do?

red_hare|4 days ago

How is OP using Claude relevant?

gritzko|4 days ago

I work on a revision control system project, except merge is CRDT. On Feb 22 there was a server break-in (I did not keep unencrypted sources on the client, server login was YubiKey only, but that is not a 100% guarantee). I reported the break-in to my Telegram channel that day.

My design docs https://replicated.wiki/blog/partII.html

I used tree-sitter for coarse AST. Some key parts were missing from the server as well, because I expected problems (had lots of adventures in East Asia, evil maids, various other incidents on a regular basis).

When I saw "tree-sitter in go" title, I was very glad at first. Solves some problems for me. Then I saw the full picture.

gritzko|4 days ago

That is very very interesting. I work on a similar project https://replicated.wiki/blog/partII.html

I use CRDT merge though, cause 3-way metadata-less merges only provide very incremental improvements over e.g. git+mergiraf.

How do you see got's main improvement over git?

odvcencio|4 days ago

primarily, got is structural VCS intended for concurrent edits of the same file.

it does this via gotreesitter and gts-suite abstractions that enable it to:

- have entity-aware diffs: not line by line but function by function
- structural blame: attribution resolution for the lifetime of the entity
- semver from structure: it can recommend bumps because it knows what is breaking change vs minor vs patch
- entity history: because entities are tracked independently, file renames or moves dont affect the entity's history
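
The "semver from structure" idea above could be sketched roughly as follows. This is a minimal illustration, not got's actual implementation: the `API` type, the `recommendBump` function, and the rule set (removed or changed signature is major, added entity is minor, anything else is patch) are all assumptions made for the example.

```go
package main

import "fmt"

// API maps an exported entity name to its signature.
// This representation is hypothetical, chosen only for illustration.
type API map[string]string

// recommendBump compares two API snapshots and suggests a semver bump:
// a removed or changed signature is breaking, a new entity is additive,
// and identical surfaces only need a patch bump.
func recommendBump(prev, next API) string {
	for name, sig := range prev {
		nextSig, ok := next[name]
		if !ok || nextSig != sig {
			return "major"
		}
	}
	for name := range next {
		if _, ok := prev[name]; !ok {
			return "minor"
		}
	}
	return "patch"
}

func main() {
	prev := API{"Parse": "func(src []byte) (*Tree, error)"}
	next := API{
		"Parse": "func(src []byte) (*Tree, error)",
		"Walk":  "func(t *Tree, visit func(*Node))",
	}
	// Walk was added and nothing was removed or changed.
	fmt.Println(recommendBump(prev, next))
}
```

A real tool would derive the signatures from parsed syntax trees rather than hand-written strings.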

when gotreesitter cant parse a language, the 3way text merge happens as a fallback. what the structural merge enables is no conflicts unless same entity has conflicting changes
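
To make the "no conflicts unless same entity has conflicting changes" rule concrete, here is a small sketch of a three-way merge keyed by entity instead of by line. The `Entities` type and `mergeEntities` function are hypothetical names invented for this example, and entity bodies are compared as plain strings; how got actually identifies and compares entities is not described in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// Entities maps an entity name (for example a function) to its source text.
// Hypothetical representation for illustration only.
type Entities map[string]string

// mergeEntities runs a three-way merge per entity. An entity is reported as a
// conflict only when both sides changed it relative to base and disagree.
func mergeEntities(base, ours, theirs Entities) (Entities, []string) {
	merged := Entities{}
	var conflicts []string

	// Collect every entity name seen in any of the three versions.
	names := map[string]struct{}{}
	for _, m := range []Entities{base, ours, theirs} {
		for n := range m {
			names[n] = struct{}{}
		}
	}

	for n := range names {
		b, inB := base[n]
		o, inO := ours[n]
		t, inT := theirs[n]
		oursChanged := inO != inB || o != b
		theirsChanged := inT != inB || t != b

		switch {
		case !oursChanged: // only theirs changed (or nobody did): take theirs
			if inT {
				merged[n] = t
			}
		case !theirsChanged: // only ours changed: take ours
			if inO {
				merged[n] = o
			}
		case inO == inT && o == t: // both sides made the same change
			if inO {
				merged[n] = o
			}
		default: // both changed the same entity differently
			conflicts = append(conflicts, n)
		}
	}
	sort.Strings(conflicts)
	return merged, conflicts
}

func main() {
	base := Entities{"Parse": "v1", "Lex": "v1"}
	ours := Entities{"Parse": "v2", "Lex": "v1"}   // we edited Parse
	theirs := Entities{"Parse": "v1", "Lex": "v2"} // they edited Lex
	merged, conflicts := mergeEntities(base, ours, theirs)
	fmt.Println(merged, conflicts)
}
```

In this sketch, two people editing different functions in the same file merge cleanly, while a line-based merge could still report a conflict if the edits were adjacent.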

monster_truck|3 days ago

Was excited to try this in my project but it doesn't seem like it's truly a complete port.

conartist6|4 days ago

It looks like porting the custom C lexers is a big part of the trouble you had to go through to do this.

odvcencio|4 days ago

yes basically about 70% of the engineering effort was spent porting the external scanners and ensuring parity with original (C) tree-sitter

jbreckmckye|4 days ago

Interesting. I have a similar usecase but intended to use CGo tree-sitter with Zig

Are these pretty up-to-date grammars? I'm awfully tempted to switch to your project

How large are your binaries getting? I was concerned about the size of some of the grammars

odvcencio|4 days ago

206 binary blobs = 15MB, so not crazy but i built for this use case where you can declare the registry of languages you want to load and not have to own all the grammar binaries by default
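
The "declare the registry of languages you want to load" pattern might look something like the sketch below. Everything here is an assumption for illustration: `Registry`, `Loader`, `Register`, and `Grammar` are made-up names, not gotreesitter's API, and the sketch only shows deferred loading with caching, not how the real project keeps unused grammar blobs out of the binary.

```go
package main

import (
	"fmt"
	"sync"
)

// Loader returns the serialized grammar for one language.
// Hypothetical type, not part of any real library.
type Loader func() ([]byte, error)

// Registry holds loaders for the languages a program opted into and caches
// each grammar the first time it is requested.
type Registry struct {
	mu      sync.Mutex
	loaders map[string]Loader
	loaded  map[string][]byte
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{
		loaders: map[string]Loader{},
		loaded:  map[string][]byte{},
	}
}

// Register declares a language without loading its grammar yet.
func (r *Registry) Register(name string, l Loader) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.loaders[name] = l
}

// Grammar loads the named grammar on first use and returns the cached copy
// on later calls. Unregistered languages produce an error.
func (r *Registry) Grammar(name string) ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if g, ok := r.loaded[name]; ok {
		return g, nil
	}
	l, ok := r.loaders[name]
	if !ok {
		return nil, fmt.Errorf("language %q is not registered", name)
	}
	g, err := l()
	if err != nil {
		return nil, fmt.Errorf("loading %q: %w", name, err)
	}
	r.loaded[name] = g
	return g, nil
}

func main() {
	r := NewRegistry()
	// Placeholder bytes stand in for a real grammar blob.
	r.Register("go", func() ([]byte, error) { return []byte("go-grammar"), nil })

	g, err := r.Grammar("go")
	fmt.Println(len(g), err)

	_, err = r.Grammar("norg")
	fmt.Println(err)
}
```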

kopirgan|4 days ago

Can someone please explain what's the connection between this and LSP? For example in Helix can one use this instead of various language servers?

mojifwisi|4 days ago

Tree-sitter is merely a tool for generating an AST for a given language. LSPs on the other hand have way more capabilities (formatting, diagnostics, project-wide go to definition, inlay hints, documentation on hover, etc.) as you can see in the LSP specification.[0] They can't really replace one another.

[0]: https://microsoft.github.io/language-server-protocol/specifi...

irishcoffee|4 days ago

Is it a go-ism that source for implementation and test code lives in the root of the repo or is this an LLM thing?

odvcencio|4 days ago

yeah the tests live with the implementation code always (Go thing) and the repo root thing is like a preference, main is an acceptable package to put stuff in (Go thing), i see this a lot with smaller projects or library type projects

skybrian|4 days ago

How about making 'got' compatible with git repos like jujutsu? It would be a lot easier to try out.

odvcencio|4 days ago

it is interoperable with git. we like git when it's good but attempted to ease the pains in UX somewhat. you can take advantage of got locally but still push it to git remote forges just the same. when you pull stuff in this way, got will load the entity history into the git repo ensuring that you can still do got stuff locally (inspect entity histories, etc)

brodouevencode|3 days ago

Neat, but it really bothers me when projects don't use standard layouts.