Sqlite-vec: Work-in-progress vector search SQLite extension that runs anywhere

314 points| brylie | 1 year ago |github.com | reply

43 comments

[+] alexgarcia-xyz|1 year ago|reply
Author here, happy to answer any questions! Been working on this for a while, so I'm very happy to get this v0.1.0 "stable" release out.

sqlite-vec works on macOS, Linux, Windows, Raspberry Pis, in the browser with WASM, and (theoretically) on mobile devices. I focused a lot on making it as portable as possible. It's also pretty fast: benchmarks are hard to do accurately, but I'd be comfortable saying that it's a very, very fast brute-force vector search solution.

One experimental feature I'm working on: you can directly query vectors that are held in memory as a contiguous block (e.g. a NumPy array), without any copying or cloning. You can see the benchmarks for that feature here under "sqlite-vec static", and it's competitive with faiss/usearch/duckdb: https://alexgarcia.xyz/blog/2024/sqlite-vec-stable-release/i...

[+] bambax|1 year ago|reply
Thank you for this, it's really super exciting!

The link in "See Installing sqlite-vec for more details", https://alexgarcia.xyz/sqlite-vec/installing.html, is a 404 (the correct link is presumably https://alexgarcia.xyz/sqlite-vec/installation.html).

The datasette link https://datasette.io/plugins/datasette-sqlite-vec returns an error 500.

On the releases page https://github.com/asg017/sqlite-vec/releases/tag/v0.1.0, can you explain the difference between vec0.dll and sqlite-vec-0.1.0-loadable-windows-x86_64.tar.gz, which also contains a similarly named vec0.dll but of a different size?

[+] rcarmo|1 year ago|reply
Great to see this. Seems simple enough, but I can't wait until ORMs like peewee incorporate support alongside things like FTS, etc. just for the sake of ease of use.
[+] cyanydeez|1 year ago|reply
Which wasm sqlite project would it be compatible with?
[+] Cieric|1 year ago|reply
I feel like I've touched a lot of things where something like this is useful (hobby projects). In my case I've done a recommendation engine, music matching (I specifically use it for matching anime to their data), and perceptual hash matching.
[+] alexgarcia-xyz|1 year ago|reply
Really curious to hear about what kind of music embedding models/tools you used! I've tried finding some good models before but they were all pretty difficult to use
[+] yard2010|1 year ago|reply
Can you elaborate on your projects please? What tools are you using?
[+] pjot|1 year ago|reply
I’ve done something similar, but using DuckDB as the backend.

https://github.com/patricktrainer/duckdb-embedding-search

[+] youngbum|1 year ago|reply
DuckDB is an excellent choice for this task, and it’s incredibly fast!

We’ve also added vector search to our product, which is really useful.

OpenAI’s official examples of embedding search use cosine similarity. But here’s the cool part: since OpenAI embeddings are unit vectors, the norms in the cosine formula are both 1, so you can just run the dot product instead!

DuckDB has a super fast dot product function that you can use with SQL.

In our product, we use duckdb-wasm to do vector searches on the client side.
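
A minimal sketch of why the dot product is enough for unit vectors, in plain Python rather than DuckDB. The vectors and helper names below are made up for illustration; real embeddings would come from an embedding model.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

# Hypothetical toy vectors, normalized to unit length
u = normalize([3.0, 4.0, 0.0])
v = normalize([1.0, 2.0, 2.0])

# For unit vectors both norms are 1, so the two scores agree
# (up to floating-point rounding).
assert math.isclose(cosine_similarity(u, v), dot(u, v), rel_tol=1e-9)
```

Skipping the two norm computations per comparison is where the saving comes from, but it only holds if the stored vectors really are normalized.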

[+] bodantogat|1 year ago|reply
This sounds useful (I do a lot of throw-away text analysis on my laptop)
[+] marvel_boy|1 year ago|reply
Could anybody give me a simple example of how to do text analysis with this vector search? Does it just search for the closest vector?
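
A rough sketch of what "searching for the closest vector" means, in plain Python and without sqlite-vec itself. The three-dimensional "embeddings" and document titles are invented for illustration; in practice an embedding model turns each text into a vector with hundreds or thousands of dimensions, and texts with similar meaning end up close together.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, items, k=2):
    """Brute-force search: score every item, return the names of the k closest."""
    scored = sorted(items.items(), key=lambda kv: l2_distance(query, kv[1]))
    return [name for name, _ in scored[:k]]

# Hypothetical embeddings; real ones come from an embedding model
docs = {
    "cat care tips": [0.9, 0.1, 0.0],
    "dog training": [0.8, 0.3, 0.1],
    "tax filing guide": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a query such as "pet advice"
query = [0.85, 0.15, 0.05]

print(nearest(query, docs))  # the two pet-related documents rank first
```

The text analysis part lives in the embedding model; the search step itself is just this distance ranking.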
[+] 1yefuwang1|1 year ago|reply
Hi, nice work. I wrote a similar vector search extension, https://github.com/1yefuwang1/vectorlite, inspired by sqlite-vss, using C++17 and hnswlib.

I'd like to do a benchmark to compare it with sqlite-vec, but I guess it is not a fair comparison given that sqlite-vec uses brute-force only.

One thing I'd recommend is to include recall rate in your benchmark data.

A brute-force approach is a good starting point, but it doesn't scale to serious production workloads.
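
For context on the recall point, a small sketch of how recall@k is commonly computed: the fraction of the true nearest neighbours (from an exact brute-force scan) that an approximate index also returns. The function name and the ID lists are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Share of the exact top-k results that the approximate search also found."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical result IDs for one query
exact_top4 = [1, 2, 3, 4]    # ground truth from a brute-force scan
approx_top4 = [1, 2, 3, 9]   # what an approximate index might return

print(recall_at_k(approx_top4, exact_top4))
```

A brute-force scan has recall 1.0 by construction, which is why a speed comparison against an approximate index only makes sense alongside the recall figure.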

[+] deepsquirrelnet|1 year ago|reply
I love this. I know how much work addressing the dependencies must be, but you’re really attacking the right problems. Looking forward to trying this out with my project.
[+] huevosabio|1 year ago|reply
Been using this for video games and it's absolutely awesome. Alex, the author, is also great and very approachable.

I've been looking for something like this for a while.

[+] bcjordan|1 year ago|reply
Curious what applications for vector search you've found interesting
[+] nattaylor|1 year ago|reply
I have a use case for this that I'm excited to try. I'm glad AlexG has put so much effort into this. Even the docs are pretty good!

My pyenv Python 3.12.2's sqlite won't load extensions, even after installing with what I think are the correct command-line flags. Argh!

My brew-installed Python 3.12's sqlite will load extensions though, so I can proceed.
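
A quick way to check which Python builds can load extensions, as a sketch using only the standard library. The helper name is made up. CPython leaves `enable_load_extension` off the connection object when it was built without the `--enable-loadable-sqlite-extensions` configure option, so the check mostly comes down to whether that method exists.

```python
import sqlite3

def can_load_extensions() -> bool:
    """Return True if this Python's sqlite3 module allows loading extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        if not hasattr(conn, "enable_load_extension"):
            # Built without loadable-extension support
            return False
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)  # switch it back off again
        return True
    except (AttributeError, sqlite3.NotSupportedError):
        return False
    finally:
        conn.close()

print(sqlite3.sqlite_version, can_load_extensions())
```

Running it under each interpreter (pyenv, brew) shows which one to use before trying to load the extension itself.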

[+] mic47|1 year ago|reply
Nice. Been waiting for this release to try it out.
[+] pietz|1 year ago|reply
Is this also what turso uses in their "AI feature"?
[+] alexgarcia-xyz|1 year ago|reply
No, libsql added custom vector search directly into their library, while sqlite-vec is a separate SQLite extension.

The libsql vector feature only works in libsql, while sqlite-vec works in all SQLite versions. The libsql vector feature works kind of like pgvector, while sqlite-vec works more like the FTS5 full-text SQLite extension.

I'd say try both and see which one you like more. sqlite-vec will soon be a part of Turso's and SQLite Cloud's products.

Turso's version: https://turso.tech/vector

[+] haolez|1 year ago|reply
What's the maximum vector size?
[+] alexgarcia-xyz|1 year ago|reply
vec0 virtual tables have a hard-coded max of 8192 dimensions, but I can raise that very easily (I wanted to reduce resource exhaustion attacks). If you're comparing vectors manually, though, `vec_distance_l2()` and related functions have no limits (besides SQLite's 1GB blob limit).
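
To illustrate comparing vectors manually, here is a sketch that does not use sqlite-vec at all: it registers a pure-Python stand-in function (hypothetically named `l2_distance_py`) with the standard `sqlite3` module and assumes vectors are stored as little-endian float32 BLOBs. Under that assumption an 8192-dimension vector takes 8192 × 4 = 32,768 bytes.

```python
import math
import sqlite3
import struct

def pack_f32(values):
    """Serialize a list of floats into a little-endian float32 BLOB."""
    return struct.pack(f"<{len(values)}f", *values)

def l2_distance_blob(a, b):
    """Euclidean distance between two float32 BLOBs of equal length."""
    if len(a) != len(b) or len(a) % 4 != 0:
        raise ValueError("vectors must be float32 BLOBs of the same length")
    n = len(a) // 4
    xs = struct.unpack(f"<{n}f", a)
    ys = struct.unpack(f"<{n}f", b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))

db = sqlite3.connect(":memory:")
# Hypothetical stand-in for an extension-provided distance function
db.create_function("l2_distance_py", 2, l2_distance_blob)

distance = db.execute(
    "SELECT l2_distance_py(?, ?)",
    (pack_f32([0.0, 0.0]), pack_f32([3.0, 4.0])),
).fetchone()[0]
print(distance)
```

A native extension function would do the same arithmetic in C over the BLOB bytes, which is why the BLOB size limit is the only bound on vector length in that path.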