
C# Language Designer here, and one of the designers/architects of 'Roslyn', the semantic analysis engine that powers the C#/VB compilers, VS IDE experiences, and our LSP server.

The original post conflates some concepts worth separating. LSP and language servers operate at an IDE/Editor feature level, whereas tree-sitter is a particular technological choice for parsing text and producing a syntax tree. They serve different purposes but can work together.

What does a language server actually do? LSP defines features like:

  1. Finding references (`textDocument/references`)

  2. Go-to-definition (`textDocument/definition`)

  3. Syntax highlighting (`textDocument/semanticTokens/...`)

  4. Code completion, diagnostics, refactorings
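For readers who haven't looked at the wire format: each of these features is a JSON-RPC 2.0 request framed with a `Content-Length` header, and nothing in the message says how the server computes the answer. Below is a minimal Python sketch of framing a `textDocument/definition` request and decoding it again. The file URI and position are made-up values. A real client also has to handle several messages on one stream and the optional `Content-Type` header.

```python
import json

def frame(message: dict) -> bytes:
    """Encode a JSON-RPC message with the LSP base-protocol header."""
    body = json.dumps(message).encode("utf-8")
    # Content-Length counts bytes of the UTF-8 body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body

def unframe(data: bytes) -> dict:
    """Decode one framed message (assumes `data` holds exactly one message)."""
    header, _, body = data.partition(b"\r\n\r\n")
    length = None
    for line in header.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(body[:length].decode("utf-8"))

# Hypothetical go-to-definition request; URI and position are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///tmp/Program.cs"},
        "position": {"line": 10, "character": 4},  # both zero-based in LSP
    },
}

wire = frame(request)
decoded = unframe(wire)
```

Whether the server answers this with tree-sitter, a full compiler, or anything else is invisible to the client.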

A language server for language X could use tree-sitter internally to implement these features. But it can use whatever technologies it wants. LSP is protocol-level; tree-sitter is an implementation detail.

The article talks about tree-sitter avoiding the problem of "maintaining two parsers" (one for the compiler, one for the editor). This misunderstands how production compiler/IDE systems actually work. In Roslyn, we don't have two parsers. We have one parser that powers both the compiler and the IDE. Same code, same behavior, same error recovery. This works better, not worse. You want your IDE to understand code exactly the way the compiler does, not approximately.

The article highlights tree-sitter being "error-tolerant" and "incremental" as key advantages. These are real concerns. If you're starting from scratch with no existing language infrastructure, tree-sitter's error tolerance is valuable. But this isn't unique to tree-sitter. Production compiler parsers are already extremely error-tolerant because they have to be. People are typing invalid code 99% of the time in an editor.
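Here is a toy sketch of what "error-tolerant" usually means in a hand-written parser. When an expected token is missing, the parser synthesizes a placeholder node and records a diagnostic instead of failing, so there is always a complete tree. The grammar and the `Parser` class are hypothetical and far simpler than Roslyn's or tree-sitter's recovery. The point is only that one tree plus diagnostics can feed both the compiler's error list and the editor's features.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "assign", "binary", "name", "number", "token", "missing"
    children: list = field(default_factory=list)
    text: str = ""

class Parser:
    """Toy recursive-descent parser that never throws on bad input.

    Illustrative grammar: stmt := term '=' expr ';'
                          expr := term ('+' term)*
                          term := NAME | NUMBER
    """
    def __init__(self, text: str):
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)
        self.pos = 0
        self.diagnostics: list[str] = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> Node:
        if self.peek() == tok:
            self.pos += 1
            return Node("token", text=tok)
        # Recovery: synthesize a zero-width "missing" node instead of failing.
        self.diagnostics.append(f"expected '{tok}' at token {self.pos}")
        return Node("missing", text=tok)

    def term(self) -> Node:
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return Node("number", text=tok)
        if tok is not None and re.fullmatch(r"[A-Za-z_]\w*", tok):
            self.pos += 1
            return Node("name", text=tok)
        self.diagnostics.append(f"expected operand at token {self.pos}")
        return Node("missing", text="<operand>")

    def expr(self) -> Node:
        left = self.term()
        while self.peek() == "+":
            self.pos += 1
            left = Node("binary", [left, self.term()], "+")
        return left

    def stmt(self) -> Node:
        target = self.term()
        eq = self.expect("=")
        value = self.expr()
        semi = self.expect(";")
        return Node("assign", [target, eq, value, semi])

# "x = a + ;" is what you have mid-keystroke; we still get a full tree.
p = Parser("x = a + ;")
tree = p.stmt()
```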

Roslyn was designed from day one for IDE scenarios. We do incremental parsing (https://github.com/dotnet/roslyn/blob/main/docs/compilers/De...), but more importantly, we do incremental semantic analysis. When you change a file, we recompute semantic information for just the parts that changed, not the entire project. Tree-sitter gives you incremental parsing. That's good. But if you want rich IDE features, you need incremental semantics too.
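The core idea behind incremental parsing is reusing the pieces of the previous tree that an edit didn't touch. Below is a deliberately simplified Python sketch. It assumes one declaration per line and reuses parsed declarations whose text is unchanged. Real incremental parsers (the linked Roslyn doc, tree-sitter) instead map the edit's changed text range onto the old tree. Incremental semantics also needs dependency tracking on top of this, which the sketch does not attempt.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decl:
    text: str  # a real node would hold structure; text is enough for this sketch

class IncrementalDocument:
    """Hypothetical sketch: reuse parsed declarations whose text did not change."""

    def __init__(self):
        self.cache: dict[str, Decl] = {}
        self.parse_count = 0  # how many declarations were actually (re)parsed

    def parse_decl(self, line: str) -> Decl:
        self.parse_count += 1
        return Decl(line)

    def parse(self, text: str) -> list[Decl]:
        decls, new_cache = [], {}
        for line in text.splitlines():
            decl = self.cache.get(line)      # reuse the node from the previous version
            if decl is None:
                decl = self.parse_decl(line)
            new_cache[line] = decl
            decls.append(decl)
        self.cache = new_cache
        return decls

doc = IncrementalDocument()
v1 = doc.parse("int a = 1;\nint b = 2;\nint c = 3;")   # parses 3 declarations
v2 = doc.parse("int a = 1;\nint b = 42;\nint c = 3;")  # re-parses only the edited one
```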

The article suggests language servers are inherently "heavy" while tree-sitter is "lightweight." This isn't quite right. An LSP server is as heavy or light as you make it. If all you need is parsing and there's no existing language library, fine, use tree-sitter and build a minimal LSP server on top. But if you want to do more, LSP is designed for that. The protocol supports everything from basic syntax highlighting to complex refactorings.

Now, as to syntax highlighting. Despite the name, it isn't just syntactic in modern IDEs. In C#, we call this "classification," and it's powered by the full semantic model. A reference to a symbol is classified by what that symbol is: local, parameter, field, property, class, struct, type parameter, method, etc. Symbol attributes affect presentation: static members are italicized, unused variables are faded, overwritten values are underlined. We also classify by modifiers and special roles: `async` methods, `const` fields, extension methods.

This requires deep semantic understanding. Binding symbols, resolving types, understanding scope and lifetime. Tree-sitter gives you a parse tree. That's it. It's excellent at what it does, but it's fundamentally a syntactic tool.
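To illustrate why classification depends on binding: in the sketch below, the classifier's input is a resolved `Symbol`, not a token, because the identifier `count` alone can't tell you whether it's a local, a field, or a static member. `SymbolKind`, `Symbol`, `classify`, and the classification strings are hypothetical, loosely modeled on the categories listed above rather than on Roslyn's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum

class SymbolKind(Enum):
    LOCAL = "local"
    PARAMETER = "parameter"
    FIELD = "field"
    PROPERTY = "property"
    CLASS = "class"
    STRUCT = "struct"
    TYPE_PARAMETER = "type parameter"
    METHOD = "method"

@dataclass
class Symbol:
    """What binding produces for an identifier; a parse tree alone can't supply this."""
    name: str
    kind: SymbolKind
    modifiers: set = field(default_factory=set)  # e.g. {"static", "const", "async"}
    is_extension: bool = False
    is_unused: bool = False

def classify(symbol: Symbol) -> list[str]:
    """Return a base classification plus additive style tags for one reference."""
    base = f"{symbol.kind.value} name"
    if symbol.kind is SymbolKind.FIELD and "const" in symbol.modifiers:
        base = "constant name"
    if symbol.kind is SymbolKind.METHOD and symbol.is_extension:
        base = "extension method name"
    tags = [base]
    if "static" in symbol.modifiers:
        tags.append("static symbol")  # an editor might render this in italics
    if symbol.is_unused:
        tags.append("unnecessary")    # an editor might render this faded
    return tags

counter = Symbol("count", SymbolKind.FIELD, {"static"})
```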

Example: in C#, `var x = GetValue();` is syntactically ambiguous. Is `var` a keyword or a type name? Only semantic analysis can tell you definitively. Tree-sitter would have to guess or mark it generically.
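The `var` rule can be written down directly. The decision depends only on which types are in scope, not on the tokens, which are identical either way (the realistic trigger is someone declaring `class var { }` somewhere in scope). `classify_var` and the set standing in for real name lookup are hypothetical simplifications.

```python
def classify_var(types_in_scope: set[str]) -> str:
    """Decide how `var` in `var x = GetValue();` should be classified.

    `var` is a contextual keyword in C#: if a type named `var` is in scope,
    the identifier binds to that type; otherwise it means "infer the type".
    The set of names stands in for real name lookup.
    """
    if "var" in types_in_scope:
        return "type name"  # binds to the user-declared type `var`
    return "keyword"        # implicitly typed local

# Same source text, different programs around it.
usual = classify_var({"int", "string"})
unusual = classify_var({"int", "string", "var"})
```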

Tree-sitter is definitely a great technology, though. Want to add basic syntax highlighting for a new language to your editor? Tree-sitter makes this trivial. Need structural editing or code folding? Perfect use case. However, for rich IDE experiences, the kind where clicking on a variable highlights all its uses, or where hovering shows documentation, or where renaming a method updates all call sites across your codebase, you need semantic analysis. That's a fundamentally different problem from parsing.
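Here is a toy illustration of the rename case. A syntactic match for `x` finds uses in two unrelated blocks, while resolving each use through lexical scope finds only the uses bound to the declaration you clicked on. The `Decl`/`Use`/`Block` model is hypothetical. A real binder also has to handle members, overloads, partial types, imports, and so on.

```python
from dataclasses import dataclass

@dataclass
class Decl:
    name: str
    line: int

@dataclass
class Use:
    name: str
    line: int

@dataclass
class Block:
    items: list

def textual_matches(block: Block, name: str) -> list[int]:
    """What a purely syntactic search finds: every use spelled `name`."""
    lines = []
    for item in block.items:
        if isinstance(item, Use) and item.name == name:
            lines.append(item.line)
        elif isinstance(item, Block):
            lines.extend(textual_matches(item, name))
    return lines

def find_references(block: Block, target: Decl) -> list[int]:
    """Return lines of uses that bind to `target` under lexical scoping."""
    found: list[int] = []

    def walk(b: Block, scope: dict):
        scope = dict(scope)  # child scope starts with the parent's bindings
        for item in b.items:
            if isinstance(item, Decl):
                scope[item.name] = item
            elif isinstance(item, Use) and scope.get(item.name) is target:
                found.append(item.line)
            elif isinstance(item, Block):
                walk(item, scope)

    walk(block, {})
    return found

# Two sibling methods, each with its own local `x`.
x_in_a = Decl("x", 2)
x_in_b = Decl("x", 6)
program = Block([
    Block([x_in_a, Use("x", 3)]),  # method A
    Block([x_in_b, Use("x", 7)]),  # method B
])
```

Renaming `x` in method A should touch line 3 only. The textual search also returns line 7.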

Tree-sitter definitely lowers the barrier to supporting new languages in editors. But it's not a replacement for language servers or semantic analysis engines. They're complementary technologies. For languages with mature compilers and semantic engines (C#, TypeScript, Rust, etc.), using the real compiler infrastructure for IDE features makes sense. For cases with simpler tooling needs, tree-sitter is an excellent foundation to build on.


sakesun|1 month ago

Notice that some design docs have been added since the beginning of this year.

conartist6|1 month ago

Why did TypeScript abandon Roslyn's red-green tree design?

Metasyntactic|1 month ago

I wrote several of TypeScript's initial compilers. We didn't use red/green for a few reasons:

• The JS engines of the time did not handle that design efficiently. We were primarily testing V8 and Chakra (IE/Edge's prior engine).

• Red/green takes advantage of many things .NET provides to be extremely efficient, for example, structs. These are absent in JS, making things much more costly. See the document on red-green trees I wrote here for more detail: https://github.com/dotnet/roslyn/blob/main/docs/compilers/De...

• The problem domains are a bit different. In Roslyn, the design is a highly concurrent, multi-threaded feature set that wants to share immutable data. TS/JS, being single-threaded, doesn't have the same concerns, so there is less need to efficiently create an immutable data structure. Making it mutable meant working well with the engines of the time without sacrificing too much.

• The TS parser is incremental and operates very similarly to what I describe for Roslyn in https://github.com/dotnet/roslyn/blob/main/docs/compilers/De.... However, because it operates on the equivalent of a red tree, it does need to do extra work to update positions and parent pointers.
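For context on the design being discussed, here is a minimal Python sketch of the red/green split as described in the linked Roslyn doc. Green nodes are immutable and store only kinds and widths (no parents, no absolute positions), so an edit rebuilds only the spine and reuses everything else. Red nodes are thin facades created on demand that supply the parent pointer and absolute position. This is a simplification: Roslyn's real implementation adds caching of red children, trivia, and the .NET-specific optimizations mentioned above. The names `GreenNode`, `RedNode`, `leaf`, and `node` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GreenNode:
    """Immutable, position-independent node; safe to share across versions and threads."""
    kind: str
    width: int          # total text length covered by this node
    children: tuple = ()

def leaf(kind: str, text: str) -> GreenNode:
    return GreenNode(kind, len(text))

def node(kind: str, *children: GreenNode) -> GreenNode:
    return GreenNode(kind, sum(c.width for c in children), tuple(children))

class RedNode:
    """Facade adding parent and absolute position, computed from green widths."""
    def __init__(self, green: GreenNode, parent: Optional["RedNode"] = None, position: int = 0):
        self.green = green
        self.parent = parent
        self.position = position

    @property
    def span(self) -> tuple[int, int]:
        return (self.position, self.position + self.green.width)

    def children(self) -> list["RedNode"]:
        # Recreated on each call here; Roslyn caches them lazily instead.
        result, offset = [], self.position
        for child in self.green.children:
            result.append(RedNode(child, self, offset))
            offset += child.width
        return result

# "a+b;" edited to "a+bb;": only the spine is rebuilt, the other leaves are shared.
a, plus, semi = leaf("name", "a"), leaf("plus", "+"), leaf("semi", ";")
v1 = node("stmt", node("binary", a, plus, leaf("name", "b")), semi)
v2 = node("stmt", node("binary", a, plus, leaf("name", "bb")), semi)

root = RedNode(v2)
semi_red = root.children()[1]
```

A tree that stores positions and parents directly in its nodes, as described for TS, would instead have to adjust those fields after each edit.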

TL;DR: different engine performance and different consumption patterns pushed us to a different model.