
beyang | 3 years ago

Sourcegraph CTO here. It's interesting to read about GitHub's approach and how it contrasts with the approach we've taken at Sourcegraph. One of the key tradeoffs the article highlights is GitHub's decision to take the "shallow-but-wide" approach to code navigation, which has enabled them to provide some level of code navigation for most open-source repositories on GitHub, but at the expense of precision/accuracy (i.e., the system can't necessarily differentiate between different symbols with the same name).
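
To make the tradeoff concrete, here is a minimal sketch of name-only lookup versus scope-aware lookup. Everything in it (the `Definition` record, the `DEFINITIONS` table, both function names) is hypothetical and purely illustrative; it is not how either GitHub or Sourcegraph actually implements navigation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    name: str
    scope: str  # hypothetical container (module, class, package)
    line: int


# Toy symbol table: two unrelated functions happen to share the name "parse".
DEFINITIONS = [
    Definition("parse", "json_utils", 10),
    Definition("parse", "url_utils", 42),
    Definition("render", "templates", 7),
]


def name_based_lookup(name):
    """Shallow-but-wide: match on the bare name only, so same-named
    symbols cannot be told apart."""
    return [d for d in DEFINITIONS if d.name == name]


def precise_lookup(name, scope):
    """Precise: match on the name plus the scope it resolves to, which
    assumes some index has already resolved that scope."""
    return [d for d in DEFINITIONS if d.name == name and d.scope == scope]
```

In this toy table, `name_based_lookup("parse")` returns both `parse` definitions and leaves the reader to pick, while `precise_lookup("parse", "url_utils")` returns only the one in `url_utils`. The cost of the second form is that something has to compute the scope first, which is where the indexing work comes in.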

Sourcegraph decided early on to take the opposite approach, favoring precision and accuracy over supporting every public codebase. Part of the reason is that we aren't a code host with millions of open-source repositories, so we didn't feel the need to support all of them at once. Another big reason is that we heard from our users and customers that code navigation accuracy was critical for exploring their private code and staying in flow (inaccurate results break your train of thought because you have to actively think about how to navigate to the referenced symbol). We actually built out language-agnostic, search-based code navigation, but user feedback has increasingly driven us to adopt a more precise model, based at first on our own protocol (https://srclib.org) and also on LSIF, the protocol open-sourced by Microsoft that now enables code navigation for many popular editor extensions.

This is not to say that GitHub's approach is wrong. Rather, it's interesting how different goals and constraints have led to quite different systems despite tackling the same general problem: GitHub aims to provide some level of navigation for every repository on GitHub, while Sourcegraph aims to provide best-in-class navigation for private codebases and dependencies.

(Btw, hats off to the GitHub team for open-sourcing tree-sitter, a great library which we've incorporated into parts of our stack. We actually hosted the creator of tree-sitter, Max Brunsfeld, on our podcast a while back, and it was a really fun and insightful conversation if people are interested in hearing some of the backstory of tree-sitter: https://about.sourcegraph.com/podcast/max-brunsfeld.)


ushakov|3 years ago

you should also highlight that your solution has a significant downside: you have to generate the definitions yourself in order for them to be displayed in Sourcegraph

this means you’d have to set up CI/CD to rebuild the index each time you make changes to the code + host it on your own

what’s great in GitHub’s approach is they take that obligation away from customers

beliu|3 years ago

Actually that’s not quite right. We offer auto-indexing now for some languages with more on the way. If neither auto-indexing nor manual LSIF generation is an option, we fall back to our search-based code navigation, which, similar to GitHub’s implementation, trades off accuracy for zero configuration (GitHub’s approach is parser-based while we use a combo of parsing, search, and heuristics that eliminates the need for an index entirely in some cases). This is all documented here for those curious to learn more: https://docs.sourcegraph.com/code_intelligence.
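
Roughly, the fallback order described above could be sketched like this. The function and parameter names are made up for illustration and are not part of any Sourcegraph API; the only thing taken from the comment is the ordering (precise index if one exists, otherwise search-based).

```python
def choose_navigation_mode(auto_index_available: bool, lsif_uploaded: bool) -> str:
    """Pick a navigation mode for a repository.

    Hypothetical helper: a precise index can come either from
    auto-indexing or from a manually generated LSIF upload. If neither
    exists, fall back to search-based navigation, which needs no setup
    but is less accurate.
    """
    if auto_index_available or lsif_uploaded:
        return "precise"
    return "search-based"
```

So a repository with no index at all still gets navigation via `choose_navigation_mode(False, False)`, just the less accurate kind.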