Technically, in Emacs 29.1 tree-sitter is still only a build-time option, which a given package maintainer may or may not have enabled in your package. It isn't actually a default. If you build Emacs from source you need to pass the --with-tree-sitter flag to ./configure. See: https://www.masteringemacs.org/article/how-to-get-started-tr...
I found this snippet in one of Mickey's earlier tree-sitter posts that works great. It does require searching through the tree-sitter repo to make sure your paths are correct:
Is there anything that returns a parse tree of an org document? A while ago I wrote some super hacky elisp to navigate around the structure of a giant org mode doc, but it was rickety and terrible and constantly breaking.
In org-alert we use `org-map-entries` and a simple `org-alert--parse-entry` function for stripping out the details we're looking for. Depending on what you want, it's not exactly a data structure, but maybe it will help you get started!
While it doesn't properly understand the structure, you can move around pretty well with Imenu or a (configured) org-goto. I assume it's also possible to build something on top so that it takes nesting into consideration, like it does for some programming languages. My org files are only a couple of thousand lines long though, so I don't know how these perform on anything larger.
This is from the author of the excellent book Mastering Emacs.
I'm not trying to bash Emacs or treesitter or anyone. But I find it mildly amusing that after so many decades, parsing and syntax highlighting aren't a perfectly solved problem, considering programming languages are the most used tools for developers.
Parsing of a correct program is a pretty "solved" problem.
There will never be "perfectly" solved problems at that level of complexity. There are always changing requirements and room for improvement: make it faster, add new features, use new hardware capabilities, follow the fashions of the decade. It is an eternal game of catch-up.
Well, there are more problems like that. You'd think diffing is a solved problem, and yet we still struggle with syntax-aware diffs (I use difftastic, which is great, but doesn't always work well, and is under constant development).
It's not really a solved problem in general. Most editors appear to use TextMate grammars which nobody likes. Otherwise you have to implement it using whatever custom setup your specific editor uses. It just happens that most languages have some poor soul who set this up already. Emacs is actually on the better side because tree-sitter is a much better setup for writing grammars.
I use both. In my experience, syntax highlighting with language servers is slower than with tree-sitter.
I've been using it a bit, but it's still not on par with, well, VS Code. It tends to be a bit slow on big files (say 10,000+ lines): when you type an f-string in Python such as 'print(f"p={', it can get noticeably slow once the opening brace is typed.
G3rn0ti | 2 years ago
While Emacs 29.1 comes with tree-sitter support built in, you still need to manually build and install the tree-sitter grammar for each language, i.e. the library implementing the actual language-specific parser. Doing this yourself can be fiddly and frustrating.
I had quick success using this convenience script: https://github.com/casouri/tree-sitter-module/. It provides fully automated builds for the most popular languages (including TypeScript, C and C++).
This is how it works for "typescript":
1. Clone the repository: https://github.com/casouri/tree-sitter-module/
2. Install a C/C++ compiler (on Debian or Ubuntu, the "build-essential" package provides one).
3. Run "./build typescript" from within the repo.
4. Copy the resulting shared library from "dist/libtree-sitter-typescript.so" into your "~/.emacs.d/tree-sitter/".
5. Open any TypeScript file and try "M-x typescript-ts-mode". It should not give you an error, but nice syntax highlighting instead.
You might find that a tree-sitter grammar exists for your language and is even supported by "tree-sitter-module", but that there is no major mode for it yet. That happened to me with Perl 5.
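The five steps above can be sketched as a small shell script. Treat it as a rough sketch rather than a tested installer: the function names grammar_lib_name and install_grammar are invented here, it assumes git and a C/C++ compiler are already installed, and it assumes the build writes a ".so" on Linux and a ".dylib" on macOS.

```shell
#!/bin/sh
# Sketch of steps 1-5 above. Function names are made up for this example.

# Print the library file name the build is expected to produce.
# Assumption: ".so" on Linux, ".dylib" on macOS (Windows is not handled).
grammar_lib_name() {
    lang="$1"
    os="${2:-$(uname -s)}"
    case "$os" in
        Darwin) ext="dylib" ;;
        *)      ext="so" ;;
    esac
    printf 'libtree-sitter-%s.%s\n' "$lang" "$ext"
}

# Clone the repo if it is not there yet, build one grammar,
# and copy the result into the directory named in step 4.
install_grammar() {
    lang="$1"
    dest="${2:-$HOME/.emacs.d/tree-sitter}"
    lib="$(grammar_lib_name "$lang")" || return 1
    [ -d tree-sitter-module ] ||
        git clone https://github.com/casouri/tree-sitter-module.git || return 1
    (cd tree-sitter-module && ./build "$lang") || return 1
    mkdir -p "$dest" || return 1
    cp "tree-sitter-module/dist/$lib" "$dest/"
}
```

After sourcing the file, "install_grammar typescript" runs the whole sequence. A different target directory can be passed as the second argument.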
nanna | 2 years ago
https://www.masteringemacs.org/article/how-to-get-started-tr...
What I read from this is that tree-sitter isn't considered quite ready by the Emacs maintainers, perhaps because of the restricted number of actual treesitter modes, or maybe because the treesitter support itself is not quite considered there yet?
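For what it's worth, here is a rough shell sketch for checking whether a given Emacs binary was built with tree-sitter support. It assumes "emacs" is on your PATH unless you pass another binary, and emacs_has_treesit is just a name made up for this example. treesit-available-p is the function that reports this in Emacs 29. The fboundp guard is there so that older versions print nil instead of signalling an error.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the Emacs binary reports tree-sitter support.
# Returns 1 if it reports none, 2 if Emacs could not be run at all.
# The function name is hypothetical, chosen for this example.
emacs_has_treesit() {
    emacs_bin="${1:-emacs}"
    # Ask Emacs in batch mode; it prints "t" or "nil" on stdout.
    result="$("$emacs_bin" -Q --batch --eval \
        "(princ (and (fboundp 'treesit-available-p) (treesit-available-p)))" \
        2>/dev/null)" || return 2
    [ "$result" = "t" ]
}
```

Usage would be something like: emacs_has_treesit && echo "tree-sitter support is compiled in".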
treeblah | 2 years ago
yougane | 2 years ago
shanusmagnus | 2 years ago
Part of this is surely that I don't know wtf I'm doing, but it seemed like there was not an underlying data structure held in memory that you could conveniently query / manipulate, but rather, most of the existing org functionality built some kind of structure each time you did an operation.
Would appreciate any pointers, code examples, tutorials that show how to effectively navigate / manipulate an org structure and have it reflected in the buffer, if there is such a thing.
elviejo | 2 years ago
morelisp | 2 years ago
But even with this I found it pretty awful.
Brentward | 2 years ago
https://github.com/spegoraro/org-alert/blob/master/org-alert...
imaltont | 2 years ago
mark_l_watson | 2 years ago
I am very far from being knowledgeable about programming on the Emacs platform, but I am trying to learn. I grabbed the name M-x-AI.com a while back with the goal of integrating other people’s Emacs packages with some of my own hacks into a better AI dev work environment and writing a short book on it. I have been using Emacs since, I think, 1982. There are so many good new packages for integrating Copilot, GPT-4, etc., as well as major Emacs platform improvements, that there are too many to list.
confounded | 2 years ago
If so, which modes and packages do you use?
raincole | 2 years ago
vidarh | 2 years ago
But fast enough re-parsing of fragments and recovery from errors is a much more complex problem that often doesn't have a single correct answer. It's also a much newer problem, inasmuch as syntax highlighting is a much newer feature, being preceded largely by "offline" pretty-printers with very different constraints.
The extent to which modern compilers try to parse past errors still varies greatly, with a whole lot not even trying to.
But just having a recovering parser does not mean the problem is solved either. E.g. you've typed "foo". Now you type "(". It'd be very annoying if your editor now re-colored everything as an error, so you typically want some error recovery. But how soon? Do you assume the tokens immediately afterwards are part of what was a valid expression until you typed "foo", or are they a valid part of an argument list? And where do they end? Do you just delay re-parsing until the user has typed more? Or left the line? Sometimes that can help, sometimes it will just make things worse.
Parsing methods that work fine if you assume you can "reset" the parse at many different points, which tends to constrain the area considered an error and so reduces the size of a typical re-parse, will fail badly if you want stricter re-parsing that may frequently trigger re-parsing most of a file, for example.
A lot of this is subjective, and picking the "right" way of handling it largely comes down to unpacking humans' unstated preferences, and trying to reconcile competing and possibly contradictory preferences.
mickeyp | 2 years ago
https://www.masteringemacs.org/article/tree-sitter-complicat...
And also why using LSP to furnish your editor with highlight markers is an inelegant solution for many languages.
troupo | 2 years ago
A good overview here: https://matklad.github.io/2022/04/25/why-lsp.html
petepete | 2 years ago
Now it's much, much easier, provided there's a tree-sitter parser for your language.
I don't know of anything else that bridges the gap like this.
olau | 2 years ago
https://steve-yegge.blogspot.com/2008/03/js2-mode-new-javasc...
I guess comp. sci. people studying languages have been more interested in syntactically valid programs than the opposite.
kristopolous | 2 years ago
In editors this always seems extremely esoteric comparatively: I've tried doing it in a few.
I'm sure brilliant people find it easy, but I'm merely average on a good day.
I haven't tried extending any of these modern electron based editors, can anyone speak to that?
PurpleRamen | 2 years ago
mbork_pl | 2 years ago
hardwaregeek | 2 years ago
nerdponx | 2 years ago
thih9 | 2 years ago
Difwif | 2 years ago
I see some people say it's possible to use both together, but I thought that for the most part language servers offer the same set of features, and probably better ones? My current mental model for using them together is that for the majority of languages, which I only read quickly, I set up tree-sitter for speed. For languages I read extensively or write, I set up a language server.
BaculumMeumEst | 2 years ago
lsp (and lsp-mode) are mostly concerned with IDE functionality: go-to definition, show references, displaying project errors in real time without explicitly building, etc.
tree-sitter builds a syntax tree of your source code; its applications are things like syntax highlighting and structural navigation of your code.
there is some overlap in functionality, lsp has somewhat supported mechanisms for syntax highlighting iirc, but they are fairly orthogonal overall
so yes, it makes sense to use them together
nequo | 2 years ago
It stands to reason: a language server often does way more than just incremental parsing of the source code into a concrete syntax tree. By limiting itself to syntax, tree-sitter can be much faster.
wiz21c | 2 years ago
But well, I still love emacs :-)
jdblair | 2 years ago