
Show HN: In-Browser Graph RAG with Kuzu-WASM and WebLLM

158 points | sdht0 | 11 months ago | blog.kuzudb.com

We show the potential of modern, embedded graph databases in the browser by demonstrating a fully in-browser chatbot that can perform Graph RAG using Kuzu (the graph database we're building) and WebLLM, a popular in-browser inference engine for LLMs. The demo retrieves from the graph via a Text-to-Cypher pipeline that translates a user question into a Cypher query, and the LLM uses the retrieved results to synthesize a response. As LLMs get better, and WebGPU and Wasm64 become more widely adopted, we expect to be able to do more and more in the browser in combination with LLMs, so many of the performance limitations we see today may matter less in the future.
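To make the two-step flow concrete, here is a minimal sketch of a Text-to-Cypher Graph RAG loop. All function names are illustrative (this is not the actual Kuzu or WebLLM API); `llm` and `runQuery` stand in for WebLLM inference and a Kuzu query call:

```javascript
// Step 1: ask the LLM to translate a natural-language question into Cypher,
// giving it the graph schema so it can produce valid node/relationship patterns.
function buildCypherPrompt(schema, question) {
  return [
    "You are a Cypher expert. Given this graph schema:",
    schema,
    "Write a single Cypher query answering the question below. Return only the query.",
    `Question: ${question}`,
  ].join("\n");
}

// Step 2: feed the rows retrieved from the graph back to the LLM so it can
// synthesize a natural-language answer grounded in those results.
function buildAnswerPrompt(question, rows) {
  return [
    `Question: ${question}`,
    `Query results: ${JSON.stringify(rows)}`,
    "Answer the question using only the results above.",
  ].join("\n");
}

// Glue: generate Cypher, execute it against the graph, then synthesize.
async function graphRAG(llm, runQuery, schema, question) {
  const cypher = await llm(buildCypherPrompt(schema, question));
  const rows = await runQuery(cypher);
  return llm(buildAnswerPrompt(question, rows));
}
```

The point of the post is that both stand-ins can now run entirely client-side: `llm` via WebLLM and `runQuery` via Kuzu compiled to Wasm.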

We will soon also be releasing a vector index as part of Kuzu that you can also use in the browser to build traditional RAG or Graph RAG that retrieves from both vectors and graphs. The system has come a long way since we open sourced it about 2 years ago, so please give us feedback about how it can be more useful!

29 comments


willguest|11 months ago

I absolutely love this. I make VR experiences that run on the ICP, which delivers wasm modules as smart contracts - I've been waiting for a combo of node-friendly, wasm-deployable tools and WebLLM. The ICP essentially facilitates self-hosting of data and provides consensus protocols for secure messaging and transactions.

This will make it super easy for me to add LLM functionality to existing webxr spaces, and I'm excited to see how an intelligent avatar or convo between them will play out. This is, very likely, the thing that will make this possible :)

If anyone wants to collab, or contribute in some way, I'm open to ideas and support. Search for 'exeud' to find more info

wkat4242|11 months ago

Why the blockchain there? I don't really see the value, but maybe I misunderstand. It's just that I tend to be pretty dismissive of products mentioning blockchain, mostly from the time when this tech was severely overhyped - like the Metaverse after it, and now of course AI. I do know there are some use cases for it; I just wonder what they are and why you chose it.

I think I like the idea but I don't think I fully understand what it is that you're doing :) But I love everything VR.

esafak|11 months ago

The example is not ideal for showcasing a graph analytics database, because they could have used a traditional relational database to answer the same query: "Which of my contacts work at Google?"

laminarflow027|11 months ago

Hi, I work at Kuzu and can offer my thoughts on this.

You're making a fair observation here, and it's true for any high-level query language - SQL and Cypher are interchangeable unless the queries are recursive, in which case Cypher's graph syntax (e.g., the Kleene star * or shortest paths) has several advantages. One could also argue that Cypher is easier for LLMs to generate because the joins are less verbose (you simply express the join as a query pattern).

This post is not necessarily about graph analytics. It's about demonstrating that it's very simple to develop a relatively complex application using LLMs and a database fully in-browser, which can potentially open up new use cases. I'm sure many people will come up with other creative ways of putting these fully in-browser technologies to use, both graph-specific and not, e.g., using vector search-based retrieval. In fact, some of our users are already doing this right now.
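To illustrate the recursion point: a variable-length pattern like the one below would require a recursive CTE in SQL, but in Cypher it's a one-line pattern (node labels and relationship names here are made up for illustration):

```cypher
// All direct and indirect reports of Alice, to any depth.
// The *1.. (Kleene-star) pattern replaces a recursive CTE in SQL.
MATCH (boss:Person {name: 'Alice'})<-[:ReportsTo*1..]-(emp:Person)
RETURN DISTINCT emp.name;
```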

beefnugs|11 months ago

We won't be seeing any AI examples that are actually anywhere near useful until we rewrite all serialization/deserialization into "natural language" as well as create layers upon layers of loops of frameworks with simulations and test cases around this nonsense.

nattaylor|11 months ago

This is very cool. Kuzu has a ton of great blog content on all the ways they make Kuzu light and fast. WebLLM (or in the future chrome.ai.* etc.) + an embedded graph database could make for some great UXes.

At one time I thought I read that there was a project to embed Kuzu into DuckDB, but bringing a vector store natively into Kuzu sounds even better.

mentalgear|11 months ago

Nice! You might also want to check out Orama, an open-source hybrid vector/full-text search engine for any JS runtime.

canadiantim|11 months ago

Could it be viable to have one or multiple kuzu databases per user? What’s the story like for backups with kuzu?

I saw you recently integrated FTS, which is very exciting. I love everything about Kuzu and want to use it, but am currently tempted to use Turso to allow for multiple SQLite DBs per user (e.g. one for each device).

Or would it be possible to use Kuzu to query data stored on sqlite?

Great work through and through tho. Really amazing to see the progress you’ve all made!

guodong|11 months ago

Hi! I work at Kuzu.

> would it be possible to use Kuzu to query data stored on sqlite?

Yes, we have a SQLite extension (https://docs.kuzudb.com/extensions/attach/rdbms/) that can read data from SQLite databases.
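Based on the linked extension docs, the usage looks roughly like this (the file and table names are placeholders):

```cypher
// Attach an existing SQLite file under an alias, then scan one of its tables.
ATTACH 'app_data.db' AS sqlite_db (dbtype sqlite);
LOAD FROM sqlite_db.users
RETURN *;
```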

> Could it be viable to have one or multiple kuzu databases per user? What’s the story like for backups with kuzu?

You can have multiple databases, but can only connect to one at a time for now. We don't have support for backups yet, but we'd like to hear more about your specific use case. It would be great if you could join our Discord (https://kuzudb.com/chat) or contact us through contact@kuzudb.com, and we can chat more there.

jasonthorsness|11 months ago

Don't the resource requirements from even small LLMs exclude most devices/users from being able to use stuff like this?

laminarflow027|11 months ago

True, but there are innovations happening in multiple dimensions all at once: WebGPU improvements that better utilize a device's compute, Wasm64, and of course LLMs becoming SLMs (smaller and smaller models) that can do a surprisingly large variety of things well.

Putting aside LLMs for a minute, even applications that do not need LLMs, but benefit from a graph database, can be unlocked to help build interactive UIs and visualizations that retain privacy on the client side without ever moving the data to a server. Loads of possibilities!

nsonha|11 months ago

Could someone please explain in-browser inference to me? So in the context of OpenAI usage (WebLLM github), this means I will send binary to OpenAI instead of text? And it will lower the cost and run faster?

a-ungurianu|11 months ago

Not exactly. If you refer to the following line:

> Full OpenAI Compatibility

> WebLLM is designed to be fully compatible with OpenAI API.

It means that WebLLM exposes an API that is identical in behaviour to the OpenAI one, so any tool that builds against that API can also target WebLLM and still work.

WebLLM by the looks of it runs the inference purely in the browser. None of your data leaves your browser.

WebLLM does need to get a model from somewhere; the demo linked here downloads the Llama 3.1 8B Instruct model [1].

[1]: https://huggingface.co/mlc-ai/Llama-3.1-8B-Instruct-q4f32_1-...
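For context, a rough sketch of what "OpenAI-compatible, fully in-browser" looks like in code (browser-only, needs WebGPU; based on WebLLM's documented API, so treat the exact names as approximate):

```javascript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads the quantized model weights into the browser (cached afterwards),
// then runs inference locally on WebGPU - no prompt data leaves the machine.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// Same request/response shape as OpenAI's chat completions API.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Which of my contacts work at Google?" }],
});
console.log(reply.choices[0].message.content);
```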

DavidPP|11 months ago

I'm new to the world of graph, and I just started building with SurrealDB in embedded mode.

If you don't mind taking a few minutes, what are the main reasons to use Kuzu instead?

laminarflow027|11 months ago

Hi, glad to help! I'm a DevRel advocate at Kuzu, and have spent a decent amount of time in other database paradigms thinking about these things. I'm familiar with SurrealDB too.

Although I cannot comment too much on SurrealDB's exact capabilities and performance at this point, I can definitely highlight things at the data modeling and query language level: Kuzu's data model is a property graph model (an actual "graph" model rather than a multi-model database), and Kuzu implements Cypher as its query language, which is already widely adopted in the industry and is very intuitive to write (for both humans and LLMs).

Although SurrealDB does indeed offer an embedded mode, Kuzu is by design 100% embedded, is super-lightweight, and can run natively in many environments, such as browsers, Android applications, and AWS Lambda (serverless), and we're especially designed to be a VERY Python-friendly graph database that integrates with pretty much all well-known Python libraries. Because of its columnar storage layer, Kuzu can seamlessly read and write different data formats, such as Pandas/Polars DataFrames, Arrow tables, and Iceberg or Delta Lake tables, and can move data to and from advanced graph analytics libraries like NetworkX. For anything related to graph computation, Kuzu is likely to have all the right tools and utilities to help you solve the problem at hand.

In my opinion, it's a myth that databases are heavy, monolithic pieces of software, and hopefully, using Kuzu will demonstrate that it's totally possible to have data in your primary store but seamlessly move it to a performant graph storage layer when required, and move the results back with minimum friction and cost. Hope that helps!

srameshc|11 months ago

This is the first time I've heard of Kuzu, an embeddable graph database - and even better with the Wasm and LLM mix.

itissid|11 months ago

Since I already have a browser connected to the Internet where this would execute, could one have the option of transparently executing the WebGPU + LLM in a cloud container communicating with the browser process?

mewim|11 months ago

I think WebGPU is mostly for running inside the browser. If one has the option of a cloud container + GPU, running LLM inference directly with CUDA/ROCm/TPU is possible and more efficient.