top | item 46721487

(no title)

Fiveplus | 1 month ago

>It is possible to use the language server for syntax highlighting. I am not aware of any particularly strong reasons why one would want to (or not want to) do this.

Hmm, the strong reason could be latency and layout stability. Tree-sitter parses on the main thread (or a close worker) typically in sub-ms timeframes, ensuring that syntax coloring is synchronous with keystrokes. LSP semantic tokens are asynchronous by design. If you rely solely on LSP for highlighting, you introduce a flash of unstyled content or color-shifting artifacts every time you type, because the round-trip to the server (even a local one) and the subsequent re-tokenization takes longer than the frame budget.

The ideal hygiene could be something like -> tree-sitter provides the high-speed lexical coloring (keywords, punctuation, basic structure) instantly and LSP paints the semantic modifiers (interfaces vs classes, mutable vs const) asynchronously like 200ms later. Relying on LSP for the base layer makes the editor feel sluggish.

discuss

order

mickeyp|1 month ago

That's generally how it works in most editors that support both.

Tree-sitter has okay error correction, and that along with speed (as you mentioned) and its flexible query language makes it a winner for people to quickly iterate on a working parser but also obviously integration into an actual editor.

Oh, and some LSPs use tree-sitter to parse.

Metasyntactic|1 month ago

> Hmm, the strong reason could be latency and layout stability. Tree-sitter parses on the main thread (or a close worker) typically in sub-ms timeframes

One of the designers/architects of 'Roslyn' here, the semantic analysis engine that powers the C#/VB compilers, VS IDE experiences, and our LSP server.

Note: For roslyn, we aim for microsecond (not millisecond) parsing. Even for very large files, even if the initial parse is milliseconds, we have an incremental parser design (https://github.com/dotnet/roslyn/blob/main/docs/compilers/De...) that makes 99.99+% of edits happen in microseconds, while reusing 99.99+ of syntax nodes, while also producing an independent, immutable tree (thus ensuring no threading concerns sharing these trees out to concurrent consumers).

> you introduce a flash of unstyled content or color-shifting artifacts every time you type, because the round-trip to the server (even a local one) and the subsequent re-tokenization takes longer than the frame budget.

This would indicate a serious problem somewhere.

It's also no different than any sort of modern UI stack. A modern UI stack would never want external code coming in that could ever block it. So all, potentially unbounded, processing work will be happening off the UI thread, ensuring that that thread is always responsive.

Note that "because the round-trip to the server (even a local one)" is no different from round-tripping to a processing thread. Indeed, in Visual Studio that is how it works as we have no need to run our server in a separate process space. Instead, the LSP server itself for roslyn simply runs in-process in VS as a normal library. No different than any other component that might have previously been doing this work.

> Relying on LSP for the base layer makes the editor feel sluggish.

It really should not. Note: this does take some amount of smart work. For example, in roslyn's classification systems we have a cascading set of classifying threads. One that classifies lexically, one for syntax, one for semantics, and finally, one for embedded languages (imagine embedded regex/json, or even C# nested in c#). And, of course, these embedded languages have cascading classification as well :D

Note that this concept is used in other places in LSP as well. For example, our diagnostics server computes compiler-syntax, vs compiler-semantics, versus 3rd-party analyzers, separately.

The approach of all of this has several benefits. First, we can scale up with the capabilities of the machine. So if there are free cores, we can put them to work computing less relevant data concurrently. Second, as results are computed on some operation, it can be displayed to the user without having to wait for the rest to finish. Being fine-grained means the UI can appear crisp and responsive, while potentially slower operations take longer but eventually appear.

For example, compiler syntax diagnostics generally take microseconds. While 3rd-party analyzer diagnostics might take seconds. No point in stalling the former while waiting for the latter to run. LSP makes multi-plexing this stuff easy

teo_zero|1 month ago

> For roslyn, we aim for microsecond (not millisecond) parsing. Even for very large files, even if the initial parse is milliseconds, we have an incremental parser design [] that makes 99.99+% of edits happen in microseconds

I'm curious how you can make such statements involving absolute time values, without specifying what the minimum hardware requirements are.

I often write code on a 10-year-old Celeron, and I've opted for tree-sitter on the assumption that a language server would show unbearable latency, but I might have been wrong all this time. Do you claim your engine would give me sub-ms feedback on such hardware?