Launch HN: Bloop (YC S21) – Code Search with GPT-4
Traditional code search tools match the terms in your query against the codebase, but often you don’t know the right terms to start with, e.g. ‘Which library do we use for model inference?’ (These types of questions are particularly common when you’re learning a new codebase.) bloop uses a combination of neural semantic code search (comparing the meaning - encoded in vector representations - of queries and code snippets) and chained LLM calls to retrieve and reason about abstract queries.
Ideally, an LLM could answer questions about your code directly, but there is significant overhead (and expense) in fine-tuning the largest LLMs on private data. And although prompt sizes are increasing, they are still a long way from fitting a whole organisation’s codebase.
We get around these limitations with a two-step process. First, we use GPT-4 to generate a keyword query which is passed to a semantic search engine. This embeds the query and compares it to chunks of code in vector space (we use Qdrant as our vector DB). We’ve found that using a semantic search engine for retrieval improves recall, allowing the LLM to retrieve code that doesn’t have any textual overlap with the query but is still relevant. Second, the retrieved code snippets are ranked and inserted into a final LLM prompt. We pass this to GPT-4 and its phenomenal understanding of code does the rest.
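The retrieval step can be sketched in a few lines. This is a toy illustration of the shape of the pipeline, not bloop’s actual code: the real system uses GPT-4 to generate the keyword query, a learned embedding model, and Qdrant for the vector lookup, all of which are stubbed out here with a bag-of-characters embedder and an in-memory list.

```python
import math

def embed(text):
    # Stand-in embedder: a bag-of-characters vector. A real system would use
    # a model like MiniLM; this only illustrates the vector-space comparison.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Step 1 (generating the keyword query with GPT-4) is assumed to have
    # already happened; here we embed it and rank code chunks by similarity,
    # standing in for the Qdrant lookup.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "let pair = PestParser::parse(Rule::query, query);",
    "fn render_sidebar(ui: &mut Ui) { ... }",
    "impl QueryParser { fn parse_terms(&self) -> Vec<Term> { ... } }",
]
top = retrieve("query parser library", chunks)
# Step 2: the ranked snippets are inserted into the final LLM prompt.
prompt = "Answer using these snippets:\n" + "\n".join(top)
```

Note that the top-ranked snippet shares almost no tokens with a question like ‘Which library do we use for parsing?’, which is exactly the recall win over term matching.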
Let’s work through an example. You start off by asking ‘Where is the query parsing logic?’ and then want to find out ‘Which library does it use?’. We use GPT-4 to generate the standalone keyword query: ‘query parser library’, which we then pass to a semantic search engine that returns a snippet demonstrating the parser in action: ‘let pair = PestParser::parse(Rule::query, query);’. We insert this snippet into a prompt to GPT-4, which is able to work out that pest is the library doing the legwork here, generating the answer ‘The query parser uses the pest library’.
You can also filter your search by repo or language - ‘What’s the readiness delay repo:myApp lang:yaml’. GPT-4 will generate an answer constrained to that repo and language.
We also know that LLMs are not always (at least not yet) the best tool for the job. Sometimes you know exactly what you’re looking for. For this, we’ve built a fast trigram-index-based regex search engine on top of Tantivy, so bloop is fast at traditional search too. For code navigation, we’ve built a precise go-to-ref/def engine based on scope resolution that uses Tree-sitter.
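For the curious, the trigram trick behind this kind of regex engine works roughly like this (a toy sketch of the general technique, not Tantivy’s or bloop’s implementation): index every three-character substring of each document, then answer a query by intersecting the posting lists of the query’s trigrams, so the full regex only has to scan the surviving candidates.

```python
from collections import defaultdict

def trigrams(s):
    # Every 3-character window of the string.
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for tri in trigrams(text):
            index[tri].add(doc_id)
    return index

def candidates(index, literal, n_docs):
    # Intersect posting lists for each trigram of the query literal; only
    # these candidate documents need to be scanned by the real regex engine.
    # (A production engine derives required trigrams from the regex itself.)
    result = set(range(n_docs))
    for tri in trigrams(literal):
        result &= index.get(tri, set())
    return result

docs = ["fn parse_query() {}", "fn render() {}", "let query = input;"]
idx = build_index(docs)
hits = candidates(idx, "query", len(docs))
```

The payoff is that the expensive regex scan runs over a handful of candidates instead of the whole corpus.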
bloop is fully open-source. Semantic search, LLM prompts, regex search and code navigation are all contained in one repo: https://github.com/bloopAI/bloop.
Our software is standalone and doesn’t run in your IDE. We were originally IDE-based but moved away from this due to constraints on how we could display code to the user.
bloop runs as a free desktop app on Mac, Windows and Linux: https://github.com/bloopAI/bloop/releases. On desktop, your code is indexed with a MiniLM embedding model and stored locally, meaning at index time your codebase stays private. Indexing is fast, except on the very largest repos (GPU indexing coming soon). ‘Private’ here means that no code is shared with us or OpenAI at index time, and when a search is made only relevant code snippets are shared to generate the response. (This is more or less the same data usage as Copilot).
We also have a paid cloud offering for teams ($12 per user per month). Members of the same organisation can search a shared index hosted by us.
We’d love to hear your thoughts about the product and where you think we should take it next, and your thoughts on code search in general. We look forward to your comments!
[+] [-] tkiolp4|3 years ago|reply
Wouldn’t it make more sense for any company to have only ONE interface to use GPT that is wired with some (or all) parts of the company digital assets? Text is already flexible enough to allow such a universal interface.
[+] [-] tedsanders|3 years ago|reply
Two thoughts:
First, GPT is all marginal cost, no fixed cost. So the marginal economics of 1 super app vs 10 specialized mini apps is roughly the same.
Second, imagine the same argument applied to a utility like electricity. "Why would any company pay for lightbulbs from one company, HVAC from a second company, and appliances from a third company?"
Early on, electricity was complicated and you might actually purchase all of these from the Edison Illuminating Company. But there are returns to specialization, and today the winning formula is different companies that specialize in each. The company best at manufacturing lightbulbs is not necessarily the company that is best at manufacturing washing machines.
Similarly, you could ask: "Why would any company pay for a laptop from one company, an operating system from a second company, and office apps from a third company?"
In some cases it may make sense to vertically integrate (e.g., Apple is happy to sell you a combined laptop + OS + Pages/Numbers), but in many cases the specialized players still do fine (e.g., you might buy a Dell laptop, a Microsoft OS, and Notion/Sheets).
So I think it's very much an open question as to whether the winning approach will be a Swiss army knife (single product) or a toolkit (multiple products).
[+] [-] hn_throwaway_99|3 years ago|reply
1. https://en.wikipedia.org/wiki/Facade_pattern
[+] [-] 101008|3 years ago|reply
Also, why would I as a customer care whether you use GPT or something else? Is it because it’s a buzzword?
[+] [-] jstummbillig|3 years ago|reply
Simple: You do it because you pay for what the SaaS does, not what GPT does. A user is not interested in Postgres or nginx, even though they are the tools used to build the tool they care about and pay for. If what the new tool uniquely does doesn’t add enough value for enough users, it’s going to fail.
[+] [-] jacobr1|3 years ago|reply
Or consider Codex low-code type use cases. Yes, you could generate generic code, but integrating it into a platform that knows which APIs your company uses (an internal catalog), some kind of IAM/auth platform, and a PaaS to host the results in a single tool might make sense, especially if the prompts are already injecting the boilerplate about how to interact with the ecosystem.
Or consider cases where a generative feature is just a _feature_ of a broader product.
[+] [-] karmasimida|3 years ago|reply
The moat they are going to build is going to be incredibly ... low.
The main intelligence is going to be provided by GPT models.
[+] [-] boplicity|3 years ago|reply
For example, I'm on a medical diet and need a meal plan. This diet has quite a few restrictions.
I could set up automated queries that produce a weekly meal plan and grocery list, and also check all of the ingredients against the allowable / not-allowable food lists. This could include multiple queries, one to get an initial response, another to validate it, etc. Everything could be organized nicely, so I wouldn't have to input any text to get what I need -- I'd just click buttons. There's lots of ways this could be further customized to be more useful, such as saving recipes, generating new recipes based on old favorites, generating recipes based on foods already at home, etc. (Heck, you could wire it up to a camera in the fridge.)
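That generate-then-validate chaining is simple to sketch. Here `llm()` is a placeholder for whatever chat-completion call you’d use; a stub model is included so the sketch runs as-is.

```python
def weekly_plan(llm, restrictions, banned):
    # First query: generate the initial meal plan.
    plan = llm(f"Create a 7-day meal plan. Restrictions: {restrictions}")
    # Second query: have the model check the first answer against the list.
    issues = llm(f"List any of these banned foods appearing in the plan: {banned}\n{plan}")
    return plan, issues

# Stub model so the sketch runs without an API key.
def stub_llm(prompt):
    return "day 1: lentil soup" if prompt.startswith("Create") else "none found"

plan, issues = weekly_plan(stub_llm, "low sodium", ["bacon"])
```

A button in a UI would just call `weekly_plan` with the saved restriction lists, which is the "no text input" experience described above.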
Anyways -- text input to get text output is the most basic form of this technology. There's a lot you can do with such text to make it easier to interact with for specific use cases.
[+] [-] louiskw|3 years ago|reply
This is quite different from a GPT tool for most other jobs, and I think having granular control of the interface layer certainly helps us ship a better product.
That's not to say everything needs to be an app. If the output is just conversational text, it can and probably will be some kind of 'Alexa skill' like plugin.
[+] [-] coffeebeqn|3 years ago|reply
If it’s just appending a prompt to ChatGPT then it’s certainly useless
[+] [-] menzoic|3 years ago|reply
The answer to why companies don't have one interface with GPT is that they would still need to invest time and money into building the application that uses GPT. Check out the bloop source code to get an idea of what's needed: https://github.com/bloopAI/bloop. There are also many complex things to deal with. You can't just shove an entire codebase into GPT-4; it has a limited ability to track context. The codebase needs to be indexed and stored in a way that semantic search can be done on it very quickly. Every app has its own unique challenges in getting the data into GPT and making it fast.
TL;DR companies are buying apps that use GPT-4 API, not just access to GPT-4. They'd rather buy the apps instead of building and maintaining them.
[+] [-] boplicity|3 years ago|reply
I'm not a coder but I code; not sure if that makes sense. I sort of know a bit of regex, but find it utterly painful and not worth the time.
I'm an amateur, effectively, with little time. I like AI tools for coding, because I can input a request, and get sample code. I know enough to be able to read most of the code that I get back, but not enough to be able to easily write such code on my own.
These types of tools, in my opinion, have the potential to make people like me, and even people who are much less knowledgeable than me, productive programmers. This could be transformative.
Sure, we won't be as good or productive as real programmers, but that's beside the point.
In other words, what you're referring to with "that's about it" could still transform quite a few lives.
[+] [-] cjonas|3 years ago|reply
There should be a way to only run completions when prompted.
Please upvote this issue if you run into the same problems: https://github.com/community/community/discussions/9817
[+] [-] nico|3 years ago|reply
And that builds a test first. Then iterates on the new code against the test until it passes.
ChatTDD?
[+] [-] renewiltord|3 years ago|reply
> ‘Private’ here means that no code is shared with us or OpenAI at index time, and when a search is made only relevant code snippets are shared to generate the response. (This is more or less the same data usage as Copilot).
is the reason I went from "ha, cool tool" to "okay, let me go download it".
Quite surprised that you don't actually have a login wall to download. Missing an opportunity to track downloads, upsell to paid plan, etc. etc. imho.
[+] [-] dazbradbury|3 years ago|reply
I can't find any mention of which languages are supported - can anyone point me in the right direction?
[+] [-] bebrws|3 years ago|reply
I used Tree-sitter, which I thought was pretty awesome because it allows for parsing a TON of different languages. I had to parse the languages to create the different code snippet strings - I don't want to create a code snippet of half a function, for example.
So Tree-sitter parses the code into an AST and I send each AST node to OpenAI to get a vector (I optimized this so multiple nodes of the same AST type are combined). Then I send the prompt to OpenAI to get a vector, find the most similar code snippets to the prompt, and include them at the top of a prompt to ChatGPT.
This is the same idea, right? If anyone's interested, it can be found here: https://bbarrows.com/posts/using-embeddings-ada-and-chatgpt-...
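The "never chunk half a function" idea is the key part. Here's the same whole-function chunking sketched with Python's stdlib `ast` module standing in for Tree-sitter (an illustration of the technique, not the actual project code; Tree-sitter would let you do this for many languages, while `ast` only parses Python):

```python
import ast

def function_chunks(source):
    # Walk the AST and emit whole functions as embedding chunks, so a
    # snippet is never cut off mid-function.
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append(ast.get_source_segment(source, node))
    return chunks

src = '''
def parse_query(q):
    return q.split()

def render():
    pass
'''
chunks = function_chunks(src)
# Each chunk would then be sent off to the embeddings API.
```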
https://github.com/bebrws/openai-search-codebase-and-chat-ab...
[+] [-] x-complexity|3 years ago|reply
Compared to a regular search engine, the permissions required are pretty much the same. Both this & regular search engines need to go through a repo's codebase to be even able to give results in the first place.
Privacy-wise, they could probably make it better by requiring each repo to be approved before they can be searched, but that would make for a more friction-laden developer UX. The broad permissions are likely just a consequence of not wanting to ask the user every time a new repo is to be searched through.
[+] [-] louiskw|3 years ago|reply
On bloop cloud we use the GitHub App permission system, which is more granular and only requests read access.
[+] [-] knexer|3 years ago|reply
- llm rephrases the last conversation entry into a standalone, context-free search query
- rephrased query is embedded, top-k results retrieved from the vector db
- llm selects a top-1 winner from the top-k results
- llm answers the question given conversational context and the top-1 code search result
(from https://github.com/BloopAI/bloop/blob/8905a36388ce7b9dadaedf...)
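Those four steps are easy to mock up end to end. A minimal sketch with stand-ins for the GPT-4 and Qdrant calls (the helper names here are invented for illustration, not taken from bloop's code):

```python
def answer(conversation, vector_db, llm, k=5):
    # 1. Rephrase the last turn into a standalone, context-free query.
    query = llm(f"Rewrite as a standalone search query:\n{conversation}")
    # 2. Embed it and fetch the top-k code chunks from the vector DB.
    top_k = vector_db.search(query, limit=k)
    # 3. Ask the model to pick the single most relevant result.
    winner = llm(f"Pick the most relevant snippet for {query!r}:\n" + "\n".join(top_k))
    # 4. Answer using the conversational context plus the winning snippet.
    return llm(f"Conversation:\n{conversation}\n\nCode:\n{winner}\n\nAnswer:")

# Stubs so the sketch runs without GPT-4 or Qdrant.
class FakeDB:
    def search(self, query, limit):
        return ["let pair = PestParser::parse(Rule::query, query);"][:limit]

def fake_llm(prompt):
    if prompt.startswith("Rewrite"):
        return "query parser library"
    if prompt.startswith("Pick"):
        return "let pair = PestParser::parse(Rule::query, query);"
    return "The query parser uses the pest library"

result = answer("Which library does it use?", FakeDB(), fake_llm)
```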
[+] [-] Alifatisk|3 years ago|reply
OpenAI’s Codex and the newer models should be used for actual coding-related prompts.
[+] [-] elanzini|3 years ago|reply
If multiple pieces of code from different files are being referenced in the response, it would be nice to have clickable refs that take you to that piece of code in the repo.
[+] [-] ggordonhall|3 years ago|reply
We're working on a new interface to make this clearer, where we'll display the lines of code that GPT has referenced in its answer.
[+] [-] louiskw|3 years ago|reply
I thought my Sennheiser Momentum 4's would do a better job, but even they were no match for a glass call booth.
[+] [-] swyx|3 years ago|reply
i honestly feel like a bad user for this but i have yet to adopt a semantic search engine for code for some reason, despite Codeium and Sourcegraph also offering more advanced code search thingies. any ideas on how to break force of habit?
[+] [-] sqs|3 years ago|reply
Code search historically has been adopted by <10% of devs, although it's usually binary within each company, with equilibriums at both ~1% adoption or >80%+ adoption. My model of LLMs applied to code search is that they make it so even a first-time user can use (easily, via the LLM) features that were previously only accessible to power users: regexps, precise code navigation, large-scale changes/refactors, diff search, other advanced search filters, and all kinds of other code metadata (ownership, dep graph, observability, deployment, runtime perf, etc.) that the code search engine knows. The code search engine itself is just used under the hood by the LLM to answer questions and write great code that uses your own codebase's conventions.
I bet you (and the ~90% of devs who don't use code search today) will be compelled when you see and use /that/.
[+] [-] louiskw|3 years ago|reply
For one, having the engine respond in natural language makes a big difference. The last generation of semantic code search (we built one) used transformers to retrieve code, but as a user you'd still have to read through an entire code chunk to find the answer.
Also, LLMs (probably starting with GPT-4) can now reason properly. The ability to make a search, read the results, execute an entirely new search based on its reasoning, and do this iteratively until it finds the answer is a huge jump from using semantic search on its own.
[+] [-] amcaskill|3 years ago|reply
Any plans to enable the tool to edit code for me? For example if I ask for a suggestion on how to fix something, could I just “accept” the suggestion and have the changes made to the file?