item 37494595

Let's write a treesitter major mode for Emacs

209 points | nanna | 2 years ago | masteringemacs.org

84 comments

G3rn0ti | 2 years ago
BTW:

While Emacs 29.1 comes with "treesitter" built in, you still need to manually build and install each treesitter language plugin, i.e. the grammar that implements the actual language-specific parser. Doing this yourself can be fiddly and frustrating.

I had quick success using this convenience script: https://github.com/casouri/tree-sitter-module/. It provides fully automated builds for the most popular languages (including typescript, c and c++).

This is how it works for "typescript":

1. Clone the repository: https://github.com/casouri/tree-sitter-module/

2. Install "build-essential" (the package that provides a C/C++ compiler on Debian/Ubuntu-based Linux).

3. Run "./build typescript" from within the repo.

4. Copy the resulting shared library from "dist/libtree-sitter-typescript.so" into your "~/.emacs.d/tree-sitter/".

5. Open any typescript file and run "M-x typescript-ts-mode". It should not give you an error; instead you should get nice syntax highlighting.
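If step 5 complains instead, you can ask Emacs directly whether it can find the grammar. A minimal sketch, assuming Emacs 29 built with tree-sitter support. Emacs looks in ~/.emacs.d/tree-sitter/ by default; the treesit-extra-load-path line is only needed if you'd rather load the library straight from the repo's dist/ directory, and ~/src/tree-sitter-module is a hypothetical clone location:

  ;; Optional: also search the build output directory.
  ;; ~/src/tree-sitter-module is a hypothetical clone location.
  (add-to-list 'treesit-extra-load-path
               (expand-file-name "~/src/tree-sitter-module/dist/"))

  ;; Non-nil if libtree-sitter-typescript.so can be found and loaded.
  (treesit-language-available-p 'typescript)

  ;; With a non-nil second argument, a failure comes back with details
  ;; on why the grammar could not be loaded.
  (treesit-language-available-p 'typescript t)

Evaluating these with M-: is usually quicker than guessing from the major mode's error message.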

You might find that a treesitter plugin for your language exists, and is even supported by "tree-sitter-module", but there is no major mode for it yet. That happened to me with Perl 5.

nanna | 2 years ago
Technically, in Emacs 29.1 tree-sitter is still only an optional build feature, which your package maintainer may or may not have enabled when building the Emacs package you installed. It isn't actually a default. If you build from source, you need to pass the --with-tree-sitter flag to ./configure. See:

https://www.masteringemacs.org/article/how-to-get-started-tr...

What I read from this is that the Emacs maintainers don't consider tree-sitter quite ready yet, perhaps because there are still only a limited number of actual treesitter modes, or maybe because they don't consider the treesitter support itself mature enough yet?
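To check whether the Emacs binary you're running was actually built with tree-sitter support, you can evaluate the following with M-:. A small sketch; the fboundp guard only keeps it from erroring on Emacs versions older than 29, which don't define the function at all:

  ;; Non-nil only if this Emacs was built with tree-sitter support
  ;; (e.g. configured with --with-tree-sitter).
  (and (fboundp 'treesit-available-p)
       (treesit-available-p))

If this returns nil, no amount of grammar installation will help; you need a different Emacs build.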

treeblah | 2 years ago
I found this snippet, which works great, in one of Mickey's earlier tree-sitter posts. It does require looking through each grammar's repository to make sure your source paths are correct:

  (setq treesit-language-source-alist
      '((typescript "https://github.com/tree-sitter/tree-sitter-typescript" "master" "typescript/src")
        (tsx "https://github.com/tree-sitter/tree-sitter-typescript" "master" "tsx/src")))

  (mapc #'treesit-install-language-grammar (mapcar #'car treesit-language-source-alist))
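One tweak if you keep this in your init file: as written, it clones and recompiles every listed grammar each time it runs (treesit-install-language-grammar needs git and a C compiler for that). A sketch that installs only the grammars Emacs can't already load, reusing the same treesit-language-source-alist:

  ;; Install only the grammars that aren't already loadable, so
  ;; restarting Emacs doesn't rebuild everything.
  (dolist (source treesit-language-source-alist)
    (let ((lang (car source)))
      (unless (treesit-language-available-p lang)
        (treesit-install-language-grammar lang))))

To force a rebuild (say, after bumping the revision), call treesit-install-language-grammar for that language directly.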
yougane | 2 years ago
Or you can just run "M-x treesit-install-language-grammar" and follow the prompts.
shanusmagnus | 2 years ago
Is there anything that returns a parse tree of an org document? A while ago I wrote some super hacky elisp to navigate around the structure of a giant org mode doc, but it was rickety and terrible and constantly breaking.

Part of this is surely that I don't know wtf I'm doing, but it seemed like there was not an underlying data structure held in memory that you could conveniently query / manipulate, but rather, most of the existing org functionality built some kind of structure each time you did an operation.

Would appreciate any pointers, code examples, tutorials that show how to effectively navigate / manipulate an org structure and have it reflected in the buffer, if there is such a thing.

imaltont | 2 years ago
While it doesn't properly understand the structure, you can move around pretty well with Imenu or (configured) org-goto. I assume it's also possible to make it take nesting into consideration, like it does for some programming languages. My org files are only a couple of thousand lines, though, so I don't know how this performs on larger ones.
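For the parse-tree part of the question: Org ships its own parser, org-element, which returns the document as a nested list you can walk with org-element-map. A minimal sketch; my/org-outline is a hypothetical helper name, and it assumes it is called from an Org buffer:

  (require 'org-element)

  ;; Hypothetical helper: return (LEVEL . TITLE) for every headline in
  ;; the current Org buffer. Parsing at 'headline granularity skips
  ;; the contents of each section, which keeps it cheap on big files.
  (defun my/org-outline ()
    (org-element-map (org-element-parse-buffer 'headline) 'headline
      (lambda (hl)
        (cons (org-element-property :level hl)
              (org-element-property :raw-value hl)))))

The returned tree is a snapshot built on demand, which matches what you observed: to change the document you edit the buffer (usually via Org's own commands) and re-parse, rather than mutating the tree.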
mark_l_watson | 2 years ago
This is from the author of the excellent book Mastering Emacs.

I am very far from being knowledgeable about programming on the Emacs platform, but I am trying to learn. I grabbed the name M-x-AI.com a while back with the goal of integrating other people’s Emacs packages with some of my own hacks into a better AI dev work environment, and writing a short book on it. I have been using Emacs since, I think, 1982. There are so many good new packages for integrating Copilot, GPT-4, etc., as well as too many major Emacs platform improvements to list.

confounded | 2 years ago
Out of interest, do you use Emacs as an alternative to Jupyter for interactive work (examining plots etc.)?

If so, which modes and packages do you use?

raincole | 2 years ago
I'm not trying to bash Emacs or treesitter or anyone. But I find it mildly amusing that after so many decades, parsing and syntax highlighting still aren't perfectly solved problems, considering programming languages are the most used tools for developers.
vidarh | 2 years ago
Parsing of a correct program is a pretty "solved" problem.

But re-parsing fragments fast enough, and recovering from errors, is a much more complex problem that often doesn't have a single correct answer. It's also a much newer problem, inasmuch as syntax highlighting is a much newer feature, largely preceded by "offline" pretty-printers with very different constraints.

The extent to which modern compilers try to parse past errors still varies greatly, and a whole lot of them don't even try.

But having just any recovering parser doesn't mean the problem is solved either. E.g. you've typed "foo". Now you type "(". It'd be very annoying if your editor now re-colored everything as an error, so you typically want some error recovery. But how soon? Do you assume the tokens immediately afterwards are part of what was a valid expression before you typed "(", or are they a valid part of an argument list? And where do they end? Do you just delay re-parsing until the user has typed more? Or has left the line? Sometimes that can help; sometimes it will just make things worse.

For example, parsing methods that work fine if you assume you can "reset" the parse at many different points (which tends to confine the area considered an error, and so reduces the size of a typical re-parse) will fail badly if you want stricter re-parsing, which may frequently trigger re-parsing most of a file.

A lot of this is subjective, and picking the "right" way of handling it largely comes down to unpacking humans' unstated preferences, and trying to reconcile competing and possibly contradictory preferences.
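For reference, tree-sitter's own choice for the "you just typed (" case is to keep parsing and mark what it can't make sense of, either by wrapping it in ERROR nodes or by inserting zero-width MISSING nodes (e.g. for an absent closing paren), rather than failing the whole file. A sketch for inspecting that from Emacs 29; my/treesit-error-ranges is a hypothetical name, and it assumes the current buffer already has a tree-sitter parser (e.g. via python-ts-mode):

  ;; Hypothetical helper: return (START . END) buffer positions for
  ;; every ERROR node in the current buffer's syntax tree. MISSING
  ;; nodes are not matched by this query.
  (defun my/treesit-error-ranges ()
    (mapcar (lambda (capture)
              (let ((node (cdr capture)))
                (cons (treesit-node-start node) (treesit-node-end node))))
            (treesit-query-capture (treesit-buffer-root-node)
                                   "(ERROR) @err")))

  ;; Coarser check: non-nil if the tree contains any syntax error.
  (treesit-node-check (treesit-buffer-root-node) 'has-error)

How large those ranges get while you're mid-edit depends on the grammar, which is exactly the kind of policy question described above.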

petepete | 2 years ago
Until TreeSitter came along, the effort to add support for new languages to your editor was gargantuan.

Now it's much, much easier, provided there's a TreeSitter parser for your language.

I don't know of anything else that bridges the gap like this.
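To give a sense of how little glue that can mean in Emacs 29, here is a minimal sketch of a tree-sitter major mode that only highlights comments. The grammar name mylang, the mode name, and the grammar's comment node type are all hypothetical; a real mode would add more font-lock rules plus indentation and navigation settings:

  ;; Hypothetical major mode for a grammar registered as `mylang'.
  (define-derived-mode mylang-ts-mode prog-mode "MyLang"
    "Minimal sketch of a tree-sitter major mode."
    ;; Only set up tree-sitter if the grammar can be loaded.
    (when (treesit-ready-p 'mylang)
      (treesit-parser-create 'mylang)
      ;; A single rule: highlight the grammar's (assumed) `comment' nodes.
      (setq-local treesit-font-lock-settings
                  (treesit-font-lock-rules
                   :language 'mylang
                   :feature 'comment
                   '((comment) @font-lock-comment-face)))
      (setq-local treesit-font-lock-feature-list '((comment)))
      ;; Activate the tree-sitter settings configured above.
      (treesit-major-mode-setup)))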

kristopolous | 2 years ago
The language extension strategies for CSS prettifiers are usually pretty reasonable.

In editors it always seems extremely esoteric by comparison; I've tried doing it in a few.

I'm sure brilliant people find it easy, but I'm merely average on a good day.

I haven't tried extending any of these modern Electron-based editors. Can anyone speak to that?

PurpleRamen | 2 years ago
There will never be "perfectly" solved problems at that level of complexity. There are always changing requirements and room for improvement: make it faster, add new features, use new hardware capabilities, follow the flavors of this decade. It's an eternal game of catching up.
mbork_pl | 2 years ago
Well, there are more problems like that. You'd think diffing is a solved problem, and yet we still struggle with syntax-aware diffs (I use difftastic, which is great, but doesn't always work well, and is under constant development).
hardwaregeek | 2 years ago
It's not really a solved problem in general. Most editors appear to use TextMate grammars which nobody likes. Otherwise you have to implement it using whatever custom setup your specific editor uses. It just happens that most languages have some poor soul who set this up already. Emacs is actually on the better side because tree-sitter is a much better setup for writing grammars.
nerdponx | 2 years ago
This is exactly that. Tree Sitter has become one of the foundational advances that is allowing us to make progress on solving that problem.
thih9 | 2 years ago
What would you consider a perfectly solved problem in this case? I.e. how is the current development experience bad, and how could it be better?
Difwif | 2 years ago
Is anyone using treesitter with lsp-mode?

I see some people say it's possible and use both together, but I thought language servers mostly offer the same set of features, and probably better? My current mental model for using them together: for most languages I only read quickly, I set up treesitter for speed. For languages I read extensively or write, I set up a language server.

BaculumMeumEst | 2 years ago
they are mostly used for different things.

lsp (and lsp-mode) are mostly concerned with IDE functionality: go-to-definition, show references, displaying project errors in real time without explicitly building, etc.

tree-sitter builds a syntax tree of your source code; its applications are things like syntax highlighting and structural navigation of your code.

there is some overlap in functionality, lsp has somewhat supported mechanisms for syntax highlighting iirc, but they are fairly orthogonal overall

so yes, it makes sense to use them together
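A sketch of that split, using Python purely as an example and assuming Emacs 29 with the Python grammar installed and lsp-mode available: the tree-sitter mode handles highlighting, lsp-mode the IDE features.

  ;; Open Python files in the tree-sitter mode instead of python-mode.
  (add-to-list 'major-mode-remap-alist '(python-mode . python-ts-mode))

  ;; Start lsp-mode for IDE features (go-to-definition, diagnostics, ...).
  (add-hook 'python-ts-mode-hook #'lsp-deferred)

  ;; Keep highlighting on tree-sitter rather than LSP semantic tokens.
  (setq lsp-semantic-tokens-enable nil)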

nequo | 2 years ago
I use both. In my experience, syntax highlighting with language servers is slower than with tree-sitter.

It stands to reason: a language server often does way more than just incremental parsing of the source code into a concrete syntax tree. By limiting itself to syntax, tree-sitter can be much faster.

wiz21c | 2 years ago
I've been using it a bit, but it's still not on par with, well, VS Code. It tends to be a bit slow on big files (say 10,000+ lines): when you type an f-string in Python such as 'print(f"p={', it can get noticeably slow once the opening brace is typed.

But well, I still love emacs :-)

jdblair | 2 years ago
It's so hard to give up your custom environment and keyboard muscle memory! (25 year emacs user here)