Looks cool! Thoughts on exposing this through a CLI or MCP for local knowledge access for agents? For example, I use Claude Code for research and I have a local corpus of PDFs that I would like to make available as additional domain-specific information that Claude can use in addition to what's in Opus or whatever model I'm using.
Tell us more. I had Codex port this to Python so I could wrap my head around it; it's quite interesting. Why would I use this WAL-checkpointing thingamajig when I have access to sqlite-vec, Qdrant, and other embedded friends?
WAL/checkpointing is about control over durability and crash behavior, not “better vectors.”
sqlite-vec and Qdrant are storage engines first; their durability is mostly “under the hood.” If your goal is a clean
local RAG system, owning that layer can be better when you want:
1. deterministic ingest semantics (append-only event log of chunks, then materialize state),
2. fast recovery from partial writes (replay only WAL since last checkpoint),
3. precise checkpoint boundaries tuned to your app (e.g., after every batch/conv/session ingest),
4. a single-file, dependency-light artifact you can own end-to-end.
That’s why it can be better than sqlite-vec/Qdrant in this specific case: not for raw ANN quality, but for operational
predictability + composability of ingestion, retrieval, and memory lifecycle in one library.
If you don’t care about that control and are fine with a managed server/extension model, built-ins are usually the
simpler and smarter choice.
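To make points 1–3 concrete, here is a minimal, hypothetical sketch of an append-only log with checkpoint-bounded replay. None of these names come from Wax; a real store would also fsync on append and checkpoint:

```swift
// Hypothetical sketch: append-only chunk log + checkpointing.
struct ChunkEvent {
    let id: Int
    let text: String
}

struct WALStore {
    private(set) var log: [ChunkEvent] = []        // append-only event log
    private(set) var checkpointIndex = 0           // log prefix covered by the snapshot
    private(set) var snapshot: [Int: String] = [:] // state persisted at last checkpoint

    // Deterministic ingest: appending is the only way state ever changes.
    mutating func append(_ event: ChunkEvent) {
        log.append(event)
    }

    // App-chosen checkpoint boundary (e.g. after each ingest batch):
    // materialize the log suffix into the snapshot and advance the boundary.
    mutating func checkpoint() {
        for event in log[checkpointIndex...] { snapshot[event.id] = event.text }
        checkpointIndex = log.count
    }

    // Crash recovery: start from the snapshot and replay only the WAL
    // entries written since the last checkpoint, not the whole corpus.
    func recoveredState() -> [Int: String] {
        var state = snapshot
        for event in log[checkpointIndex...] { state[event.id] = event.text }
        return state
    }
}
```

The point is that recovery cost is bounded by the WAL suffix, and the application, not the storage engine, decides where checkpoint boundaries fall.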
For one, qmd uses SQLite (FTS5 and sqlite-vec, at least at some point) and then builds reranked hybrid search on top of that. It uses some cool techniques like resilient chunking and embedding, all packaged up into a TypeScript CLI. I'd say it sits at a layer above Wax.
I assume that wax uses Apple's ANE to do embeddings (so no third-party services like OpenAI are needed). Did you happen to compare search quality when using ANE embeddings vs. OpenAI's text-embedding-3-large (or another commonly used online embedding)?
Great idea. But having only an SDK in Swift makes it extremely restricted. If it had a CLI interface I could start using it right away. And it only works on macOS?
Putting out a CLI interface in the coming days.
Linux support is coming for WaxCore by next week.
Works on macOS, iOS, visionOS, watchOS.
In addition, we're working on ports to Python and Kotlin.
sqlite-vec is a great vector index — Wax actually uses SQLite under the hood too.
The difference is the layer. sqlite-vec gives you vec_distance_cosine() in SQL. Wax gives you: hand it a .mov file, get
back token-budgeted, LLM-ready context from keyframes and transcripts, with EXIF-accurate timestamps and hybrid
BM25+vector search via RRF fusion — all on-device.
It's the difference between a B-tree and an ORM. You'd still need to write the entire ingestion pipeline, media parsing,
frame hierarchy, token counting, and context assembly on top of sqlite-vec. That's what Wax is.
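For readers unfamiliar with the fusion step named above, Reciprocal Rank Fusion fits in a few lines. This is a generic sketch, not Wax's actual code:

```swift
// Generic Reciprocal Rank Fusion (RRF): each document scores
// sum over lists of 1 / (k + rank), with k conventionally around 60.
// Deterministic tie-breaking on the document id keeps output stable.
func rrfFuse(_ rankedLists: [[String]], k: Double = 60) -> [String] {
    var scores: [String: Double] = [:]
    for list in rankedLists {
        for (index, doc) in list.enumerated() {
            scores[doc, default: 0] += 1.0 / (k + Double(index + 1))
        }
    }
    return scores
        .sorted { $0.value != $1.value ? $0.value > $1.value : $0.key < $1.key }
        .map(\.key)
}
```

Feed it the BM25 list and the vector list and you get one fused ranking; a document ranked decently by both lanes beats one ranked well by only one.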
This looks rather over-engineered. I built something myself using SQLite in probably 10% of the code (or less), and queries were always running in single-digit milliseconds.
No offense intended but was this vibe coded? Have you tested and verified the code by hand?
What does verifying code by hand even mean? Do we now have artisanal testing, lovingly hand crafted? I get the current debate about LLM assisted coding but this is mudslinging and not constructive discussion.
I built Wax because every RAG solution required either Pinecone/Weaviate in the cloud or ChromaDB/Qdrant running locally. I wanted the SQLite of RAG -- import a library, open a file, query -- but for multimodal content at GPU speed.
The architecture that makes this work:
Metal-accelerated vector search -- Embeddings live directly in unified memory (MTLBuffer). Zero CPU-GPU copy overhead. Adaptive SIMD4/SIMD8 kernels + GPU-side bitonic sort = sub-millisecond search on 10K+ vectors (vs ~100ms CPU). This isn't just "faster" -- it enables interactive search UX that wasn't possible before.
Atomic single-file storage (.mv2s) -- Everything in one crash-safe binary: embeddings, BM25 index, metadata, compressed payloads. Dual-header writes with generation counters = kill -9 safe. Sync via iCloud, email it, commit to git. The file format is deterministic -- identical input produces byte-identical output.
Photo/Video RAG -- Index your photo library with OCR, captions, GPS binning, per-region embeddings. Query "find that receipt from the restaurant" searches text, visual similarity, and location simultaneously. Videos get segmented with keyframe embeddings + transcript mapping. Results include timecodes for jump-to-moment navigation. All offline -- iCloud-only photos get metadata-only indexing.
Swift 6.2 strict concurrency -- Every orchestrator is an actor. Thread safety proven at compile time, not runtime. Zero data races, zero @unchecked Sendable, zero escape hatches.
Deterministic context assembly -- Same query + same data = byte-identical context every time. Three-tier surrogate compression (full/gist/micro) adapts based on memory age. Bundled cl100k_base tokenizer = no network, no nondeterminism.
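The dual-header claim is worth unpacking. A hypothetical sketch of the read side (illustrative names, not the actual .mv2s layout):

```swift
// Hypothetical dual-header recovery: two header slots are written
// alternately, each carrying a generation counter and a checksum.
// A write killed mid-flight leaves one slot invalid; on open, the
// reader picks the valid slot with the highest generation, so the
// last fully committed state always wins.
struct Header {
    var generation: UInt64
    var payloadOffset: UInt64
    var checksumOK: Bool    // stands in for an actual checksum verification
}

func activeHeader(_ slotA: Header, _ slotB: Header) -> Header? {
    [slotA, slotB]
        .filter(\.checksumOK)
        .max { $0.generation < $1.generation }
}
```

A torn write can only ever damage the slot being overwritten, which by construction is the older of the two.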
import Wax
let brain = try await MemoryOrchestrator(at: URL(fileURLWithPath: "brain.mv2s"))
// Index
try await brain.remember("User prefers dark mode, gets headaches from bright screens")
// Retrieve
let context = try await brain.recall(query: "user display preferences")
// Returns relevant memories with source attribution, ready for LLM context
What makes this different:
Zero dependencies on cloud infrastructure -- No API keys, no vendor lock-in, no telemetry
Production-grade concurrency -- Not "it works in my tests," but compile-time proven thread safety
Multimodal from the ground up -- Text, photos, videos indexed with shared semantics
Performance that unlocks new UX -- Sub-millisecond latency enables real-time RAG workflows
## Wax Performance (Apple Silicon, as of Feb 17, 2026)
- 0.84ms vector search at 10K docs (Metal, warm cache)
- 9.2ms first-query after cold-open for vector search
- ~125x faster than CPU (105ms) and ~178x faster than SQLite FTS5 (150ms) in
the same 10K benchmark
- 17ms cold-open → first query overall
- 10K ingest in 7.756s (~1289 docs/s) with hybrid batched ingest
- 0.103s hybrid search on 10K docs
- Recall path: 0.101–0.103s (smoke/standard workloads)
Built for: Developers shipping AI-native apps who want RAG without the infrastructure overhead. Your data stays local, your users stay private, your app stays fast.
The storage format and search pipeline are stable. The API surface is early but functional. If you're building RAG into Swift apps, I'd love your feedback.
Would wax also be usable as a simple variant of a hybrid search solution? (i.e., not in the context of "agent memory" where knowledge added earlier is worth less than knowledge added more recently)
atonse|12 days ago
How does this compare to qmd [1] by Tobi Lutke?
[1] https://github.com/tobi/qmd
ElFitz|11 days ago
Can’t wait to try it on iOS if the required APIs are available.
ckarani|12 days ago
Query-adaptive hybrid fusion -- Four parallel search lanes (BM25, vector, timeline, structured memory). Lightweight classifier detects intent ("when did I..." → boost timeline, "find documentation about..." → boost BM25). Reciprocal Rank Fusion with deterministic tie-breaking = identical queries always return identical results.
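A hedged sketch of what such intent routing can look like (the keyword rules and names here are invented for illustration; Wax's actual classifier is not shown):

```swift
// Toy intent router: pick per-lane weights from surface features of the
// query; the fusion step would then scale each lane's contribution.
enum Lane { case bm25, vector, timeline, structured }

func laneWeights(for query: String) -> [Lane: Double] {
    var weights: [Lane: Double] = [.bm25: 1, .vector: 1, .timeline: 1, .structured: 1]
    let q = query.lowercased()
    if q.hasPrefix("when did i") { weights[.timeline] = 2 }       // temporal intent
    if q.contains("documentation about") { weights[.bm25] = 2 }   // keyword intent
    return weights
}
```

Because the weights are a pure function of the query string, routing stays deterministic: identical queries always produce identical lane boosts.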
GitHub: https://github.com/christopherkarani/Wax
Star it if you're tired of spinning up vector databases for what should be a library call.