dons | 4 years ago

We use this to power things like find-references or jump-to-def, "symbol search" and autocomplete, or more complicated code queries and analysis (even across languages). Imagine rich LSPs without a local checkout, web-based code queries, or seeding fuzzers and static analyzers with entry points in code.

Our focus has been on very large scale, multi-language code indexing, and on low-latency query times (e.g. hundreds of microseconds), to drive highly interactive developer workflows.

gwbas1c|4 years ago

I'm really struggling to understand what Glean does, and why I would use it. Most important: your landing page should quickly show what Glean does that a typical IDE (Visual Studio, Visual Studio Code, Eclipse, etc.) doesn't.

Specifically, things like "Go to definition," and tab completion have been in industry-leading IDEs for at least 20 years.

What's novel about Glean? It seems like a lot of hoops to jump through when Visual Studio (and Visual Studio Code) can index a very large codebase in a few seconds. (And don't require a server and database to do it.)

Perhaps a 20-second video (no sound) showing what Glean does that other IDEs don't will help get the message across?

WastingMyTime89|4 years ago

> It seems like a lot of hoops to jump through when Visual Studio (and Visual Studio Code) can index a very large codebase in a few seconds.

I think you are not thinking large enough. An IDE absolutely cannot index a very large codebase and allow users to make complex queries on it. Think many millions of lines of code here. The use case is closer to "find me all the variables of this type or a type derived from it in all the projects at Facebook" than "go to this definition in the project I'm currently editing".

conradev|4 years ago

This makes a lot of sense to me through an efficiency lens.

Facebook could spend a lot of money to get engineers beefy workstations, and then have each of these workstations clone the same repository and build the same index locally.

Or, they could leverage the custom built servers in their data centers (which are already more energy-efficient than the laptops), build a single index of the repo, and serve queries on-demand from IDEs throughout the company.

I could also see an analytics angle to this if it could incorporate history and track engineering trends over time. In my experience, decision making in engineering around codebase maintenance is usually rooted in “experience” or “educated guessing” rather than in identifying areas of high churn in the codebase or whatnot.

masukomi|4 years ago

100% same take.

I'd add that I didn't want to click "get started" because I didn't know if it was a thing I wanted, and then "get started" actually took me to documentation, which is not what I expect from a "get started" button. The documentation presumed that I wanted to use it, and thus implied that I knew wtf "it" was.

I don't care about its efficiency, or declarative language, or any of that when I still don't know what we're talking about.

n_jd|4 years ago

I don't know what Glean is used for, but here are some guesses for this kind of technology:

- find references / go to definition for web tools, like when reviewing pull requests

- multi-language refactoring, e.g. modifying C bindings

- building structural static analysis tools like coccinelle, or semgrep, but better

rad_gruchalski|4 years ago

Imagine that you pulled in all your dependencies, in all their different languages, as source, plus the Windows source and the Visual Studio source. Now you want to click around that source. This is what this tool is for.

maccard|4 years ago

What size codebases do you have that Visual Studio fully indexes in a few seconds? My experience with VS on large projects is that it takes however long the project takes to compile before it's usable, and even then many functions (go to definition) can occasionally hit a file that needs to be reparsed and stall for minutes on end. I use VS2019 on a 32-core workstation with 128GB RAM, fwiw.

fnord77|4 years ago

“Go to definition” has been around even longer, since at least the early 90s

gravypod|4 years ago

I see you support Thrift and Buck. Would you also be interested in adding Proto and Bazel support? Being able to query the code based on the build graph (sort of) would be very cool.

mhitza|4 years ago

Briefly skimmed the docs and it noted that it doesn't store expressions from the parsed AST. That means it's mostly a symbol lookup system?

When doing large system refactoring searching by code patterns is the number one thing I'd like to have a tool for. For example being able to query for all for loops in a codebase that have a call to function X within their body.
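To make the kind of query described above concrete, here is a minimal sketch for Python source only, using the standard-library `ast` module. It is not Glean or Angle, and the helper names (`for_loops_calling`, `_is_call_to`) are made up for illustration. It matches calls by name only, so `x(...)` and `obj.x(...)` both count.

```python
import ast
from typing import List


def _is_call_to(node: ast.AST, func_name: str) -> bool:
    """True if `node` is a call whose callee is named `func_name`."""
    if not isinstance(node, ast.Call):
        return False
    callee = node.func
    if isinstance(callee, ast.Name):        # plain call: x(...)
        return callee.id == func_name
    if isinstance(callee, ast.Attribute):   # method-style call: obj.x(...)
        return callee.attr == func_name
    return False


def for_loops_calling(source: str, func_name: str) -> List[int]:
    """Return line numbers of `for` loops whose body calls `func_name`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.For, ast.AsyncFor)):
            continue
        # Search only the loop body, not the iterable or the else-branch.
        for stmt in node.body:
            if any(_is_call_to(n, func_name) for n in ast.walk(stmt)):
                hits.append(node.lineno)
                break
    return sorted(hits)


SAMPLE = """\
for a in items:
    x(a)

for b in items:
    y(b)

for c in items:
    if c:
        obj.x(c)

x(0)
"""

if __name__ == "__main__":
    print(for_loops_calling(SAMPLE, "x"))
```

This works per file and per language; doing the same across a whole multi-language codebase needs the expression-level AST to be stored somewhere queryable, which is exactly the part the docs say is not kept.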

progval|4 years ago

How would it perform for, say, 500TB of source code?

And what would be the disk and memory requirements for this? Could they be distributed across a handful of servers?

dmos62|4 years ago

I'd be surprised if this question had an offhand answer. It doesn't sound like something whose scalability is predictable enough for back-of-the-envelope calculations.

gricardo99|4 years ago

What on earth has this much source code? Every open source project ever?

zerr|4 years ago

Since this is HN, could you please share more technical/impl details, e.g. what makes it more scalable and faster in general and also compared to other similar engines?

soonnow|4 years ago

Does that mean you are using the shell? If not, how is it used to enable this functionality?

dons|4 years ago

Most clients hit the Glean server over the network (Thrift/JSON), mostly via language bindings to the Glean query language, Angle. The shell is more for debugging/exploration.

Imagine an IDE plugin that queries Glean over the network for symbol information about the current file, then shows that on hover. That sort of thing.
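A rough sketch of that hover flow, under heavy assumptions: this does not use Glean's real client API or Angle. `SymbolInfo`, `FakeIndexClient`, `symbol_at` and `hover_text` are hypothetical names, and an in-memory dict stands in for the network round trip.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SymbolInfo:
    name: str
    kind: str        # e.g. "function", "class"
    defined_at: str  # "path:line" of the definition


class FakeIndexClient:
    """Hypothetical stand-in for a remote index server.

    A real client would send a query over the network here; this one
    just looks the answer up in a dict so the sketch is runnable.
    """

    def __init__(self, facts: Dict[Tuple[str, int], SymbolInfo]) -> None:
        self._facts = facts

    def symbol_at(self, path: str, line: int) -> Optional[SymbolInfo]:
        return self._facts.get((path, line))


def hover_text(client: FakeIndexClient, path: str, line: int) -> str:
    """What an editor plugin might show when hovering over `path:line`."""
    info = client.symbol_at(path, line)
    if info is None:
        return "(no symbol information)"
    return f"{info.kind} {info.name} (defined at {info.defined_at})"


if __name__ == "__main__":
    client = FakeIndexClient(
        {("app/main.py", 12): SymbolInfo("parse_args", "function", "app/cli.py:40")}
    )
    print(hover_text(client, "app/main.py", 12))
    print(hover_text(client, "app/main.py", 99))
```

The point of the shape is that the plugin holds no index of its own: it only needs the current file path and cursor position, and the server does the rest.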

the_duke|4 years ago

This is really cool.

Seems like there are only indexers for Flow and Hack though.

Will there be more indexers built by Facebook, or will it rely on community contributions?

simonmar|4 years ago

There will be more indexers: we have Python, C++/Objective-C, Rust, Java and Haskell. It's just a case of getting them ready to open source. You can see the schemas for most of these already in the repo: https://github.com/facebookincubator/Glean/tree/main/glean/s...

dons|4 years ago

A bit of both I think.

pdpi|4 years ago

Been away from Fb for a few years. How does this relate to tbgs?

gaogao|4 years ago

Jump to def is nice when biggrepping a piece of code a la what you can do with codesearch, cs.android.com