Metasyntactic | 1 month ago
One of the designers/architects of 'Roslyn' here, the semantic analysis engine that powers the C#/VB compilers, VS IDE experiences, and our LSP server.
Note: for roslyn, we aim for microsecond (not millisecond) parsing. Even for very large files, even if the initial parse takes milliseconds, we have an incremental parser design (https://github.com/dotnet/roslyn/blob/main/docs/compilers/De...) that makes 99.99+% of edits happen in microseconds, while reusing 99.99+% of syntax nodes, and while producing an independent, immutable tree (thus ensuring no threading concerns when sharing these trees out to concurrent consumers).
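The reuse idea can be sketched in a few lines. This is a toy illustration, not Roslyn's actual implementation (which uses green/red trees and far more careful bookkeeping): each node records its text width, so the reparser can share any old subtree that lies entirely outside the edited span.

```python
# Toy sketch of incremental-reparse reuse (not Roslyn's real code): each
# immutable node records its text width, so the reparser can share any
# old subtree that lies entirely outside the edited span.
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    kind: str
    width: int            # characters this node spans
    children: tuple = ()  # immutable => safe to share across threads

def incremental_reparse(root, edit_start, edit_end, reparse_leaf):
    """Return (new_tree, reused_count), sharing untouched old subtrees."""
    reused = 0
    def walk(node, start):
        nonlocal reused
        end = start + node.width
        if end <= edit_start or start >= edit_end:
            reused += 1
            return node                    # reuse the old immutable subtree
        if not node.children:
            return reparse_leaf(node)      # a real parser re-lexes/parses here
        kids, offset = [], start
        for child in node.children:
            kids.append(walk(child, offset))
            offset += child.width
        return Node(node.kind, node.width, tuple(kids))
    return walk(root, 0), reused

# Three 10-character statements; edit one character inside the second.
old = Node("block", 30, (Node("stmt", 10), Node("stmt", 10), Node("stmt", 10)))
new, reused = incremental_reparse(old, 12, 13, lambda n: Node(n.kind, n.width))
```

Only the spine from the root down to the edited statement is rebuilt; every sibling along the way is shared by reference, which is why an edit costs a handful of allocations rather than a full reparse.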
> you introduce a flash of unstyled content or color-shifting artifacts every time you type, because the round-trip to the server (even a local one) and the subsequent re-tokenization takes longer than the frame budget.
This would indicate a serious problem somewhere.
It's also no different from any modern UI stack. A modern UI stack never wants external code coming in that could block it. So all potentially unbounded processing work happens off the UI thread, ensuring that the UI thread is always responsive.
Note that "because the round-trip to the server (even a local one)" is no different from round-tripping to a processing thread. Indeed, in Visual Studio that is how it works as we have no need to run our server in a separate process space. Instead, the LSP server itself for roslyn simply runs in-process in VS as a normal library. No different than any other component that might have previously been doing this work.
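A minimal sketch of that round-trip-to-a-thread point (the structure is illustrative, not VS internals): the UI thread enqueues a request and keeps rendering frames; a worker thread does the potentially unbounded processing and posts the result back when it is ready.

```python
# Sketch (illustrative, not VS internals) of an in-process "server": the
# round-trip is just a hop to another thread via queues. The UI thread
# never blocks on the work itself.
import queue
import threading

requests = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        req = requests.get()
        if req is None:
            break
        # Potentially unbounded work happens here, never on the UI thread.
        results.put(("classified", req.upper()))

t = threading.Thread(target=worker, daemon=True)
t.start()

requests.put("some edited text")  # UI thread: fire and forget
# ... the UI thread keeps pumping frames here ...
tag, payload = results.get()      # later: consume the finished result
requests.put(None)
t.join()
```

Whether the worker lives in the same process (as in VS) or behind an LSP pipe, the UI-thread contract is identical: post the request, stay responsive, render the answer when it arrives.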
> Relying on LSP for the base layer makes the editor feel sluggish.
It really should not. Note: this does take some amount of smart work. For example, in roslyn's classification system we have a cascading set of classifying threads: one that classifies lexically, one for syntax, one for semantics, and finally one for embedded languages (imagine embedded regex/JSON, or even C# nested in C#). And, of course, these embedded languages have cascading classification as well :D
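A hypothetical sketch of the cascading idea (the function names are mine, not Roslyn's API): cheap layers publish immediately, expensive layers refine later, and the UI repaints as each layer arrives. In the real system each layer runs on its own thread; here they run sequentially just to show the layered publishing.

```python
# Hypothetical sketch of cascading classification: each layer publishes
# its results as soon as it is done, so cheap lexical results paint
# immediately while slower semantic results refine the display later.

def lexical_pass(text):
    # Microseconds: keywords vs. identifiers from the token stream alone.
    return {w: ("keyword" if w in {"if", "return"} else "identifier")
            for w in text.split()}

def semantic_pass(text, symbols):
    # Slower: needs symbol information to tell types from plain identifiers.
    return {w: symbols.get(w, "identifier") for w in text.split()
            if w not in {"if", "return"}}

def classify(text, symbols, publish):
    # Publish each layer as soon as it is ready; the UI repaints per layer.
    publish("lexical", lexical_pass(text))
    publish("semantic", semantic_pass(text, symbols))

layers = {}
classify("if Foo return", {"Foo": "type"},
         lambda name, result: layers.setdefault(name, result))
```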
Note that this concept is used in other places in LSP as well. For example, our diagnostics server computes compiler syntax diagnostics, compiler semantic diagnostics, and 3rd-party analyzer diagnostics separately.
This approach has several benefits. First, we can scale up with the capabilities of the machine: if there are free cores, we can put them to work computing less relevant data concurrently. Second, as each operation's results are computed, they can be displayed to the user without having to wait for the rest to finish. Being fine-grained means the UI can appear crisp and responsive, while slower operations take longer but eventually appear.
For example, compiler syntax diagnostics generally take microseconds, while 3rd-party analyzer diagnostics might take seconds. There's no point in stalling the former while waiting for the latter to run. LSP makes multiplexing this stuff easy.
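The multiplexing point can be sketched with independent tasks whose results stream back as they complete (the source names, diagnostics, and timings are illustrative):

```python
# Sketch of multiplexed diagnostics: each source is its own task, and
# results surface in completion order rather than waiting for the
# slowest source to finish. Sources and timings are illustrative.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def syntax_diagnostics():
    return ("syntax", ["missing ';' on line 3"])       # fast: microseconds

def analyzer_diagnostics():
    time.sleep(0.05)                                   # slow 3rd-party work
    return ("analyzer", ["prefer 'var' on line 7"])

order = []
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(syntax_diagnostics),
               pool.submit(analyzer_diagnostics)]
    for fut in as_completed(futures):                  # stream, don't batch
        source, diags = fut.result()
        order.append(source)                           # UI shows these now
```

The fast syntax squiggles appear immediately; the analyzer results fill in when they're ready, without ever having gated the fast path.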
teo_zero | 1 month ago
I'm curious how you can make such statements involving absolute time values, without specifying what the minimum hardware requirements are.
I often write code on a 10-year-old Celeron, and I've opted for tree-sitter on the assumption that a language server would show unbearable latency, but I might have been wrong all this time. Do you claim your engine would give me sub-ms feedback on such hardware?
Metasyntactic | 1 month ago
That's a very fair point. In this case, I'm using the minimum requirements for Visual Studio.
> Do you claim your engine would give me sub-ms feedback on such hardware?
I would expect yes, for nearly all edits. See the links I've provided in this discussion to our incremental parsing architecture.
Briefly, you can expect an edit to cause only a small handful of allocations. The parser will reuse almost the entirety of the old tree, trivially skipping over vast swaths of it (before and after the edit).
Say you have 100 types, each with 100 members, each with 100 statements. An edit to a statement will trivially blow through 99 of the types, reusing them. Then, in the type surrounding the edited statement, it will reuse 99 members. Then, in the edited member, it will reuse 99 statements and reparse just the one affected statement.
So basically it's just the computer walking 297 nodes (absolutely cheap on any machine), and reparsing a statement (also cheap).
So this should still be microseconds.
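Working through the arithmetic above:

```python
# The 100x100x100 example: an edit to one statement only walks the reused
# siblings along a single root-to-leaf path, plus the one reparsed statement.
types, members, statements = 100, 100, 100
reused_siblings = (types - 1) + (members - 1) + (statements - 1)
nodes_walked = reused_siblings + 1  # + the single reparsed statement
```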
--
Now, that relates to parsing. But you did ask: "would [it] give me sub-ms feedback on such hardware?"
So it depends on what you mean by feedback. I don't make any claims here about the layers higher up and how they operate. But I can make real, measured claims about incremental parsing performance.