DylanSp | 1 year ago

It sounds like the indexing time/complexity is increased a lot by the amount of detailed data they're storing. They mention determining which `using` statement is used to resolve each symbol reference in C++ source, to enable dead code detection; that's going to require some sophisticated analysis.

menaerus|1 year ago

Correct, you need to build an AST representation of the code that you want to index. Essentially, it's a compiler frontend pass, which is why it takes so much longer than what ctags heuristics do. Now think of millions of lines of code, multiple build configurations, the amount of RAM you need, etc. Multiple branches, or even smaller revisions/commits, are also a big computational problem.
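
To make the ctags comparison concrete, here is a small sketch of my own (not taken from Glean; the namespaces and function names are made up for illustration). A textual matcher sees the identifier `name` several times and cannot tell which `using` each call goes through. Answering that requires overload resolution, i.e. the semantic analysis a compiler frontend does:

```cpp
#include <string>

// Hypothetical namespaces, for illustration only.
namespace a { inline std::string name()    { return "a"; } }
namespace b { inline std::string name(int) { return "b"; } }
namespace c { inline std::string other()   { return "c"; } }

using a::name;      // needed by call_no_arg() below
using b::name;      // needed by call_int() below
using namespace c;  // nothing below resolves through this: a dead using-directive

// Both calls spell the same identifier; only the argument list
// decides which using-declaration the reference resolves through.
std::string call_no_arg() { return name(); }   // resolves to a::name
std::string call_int()    { return name(1); }  // resolves to b::name
```

An indexer that records the resolving `using` for each reference can flag `using namespace c;` as unused, but only after it has type-checked both call sites.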

That said, Glean seems to be reusing the indexer from LLVM/clang for C and C++ [1].

> The C++ indexer ("the clang indexer") is a wrapper over clang. The clang indexer is a drop in replacement for the C++ compiler that emits Glean facts instead of code. The wrapper is linked against libclang and libllvm.

[1] https://glean.software/docs/indexer/cxx