pbadams | 2 years ago
In general, though, PQ is a pretty good baseline. I'm glad all these vector DB companies seem to have decided that the best form of marketing is high-quality summaries/tutorials about fundamental concepts; it's a good contribution to the community.
[1] Fig. 1 in https://www.jstage.jst.go.jp/article/mta/6/1/6_2/_pdf (2018)
fzliu | 2 years ago
As great as HNSW and other graph-based indexes are, I think PQ and other encoder/decoder-based methods are still incredibly important for ANN search in general. In particular, it should be possible to learn some sort of joint encoding with neural networks targeted towards different modalities.
jhj | 2 years ago
However, PQ does have the nice property that distance computation can be performed with lookup-adds instead of multiply-adds. In practice, though, this is slower than a normal brute-force multiply-add inner product, because gathering from a lookup table is slow on CPUs and GPUs relative to plain arithmetic. The real gain of PQ comes from dimensionality reduction: a 128-dimensional vector can be compressed into 8 PQ codes, so each distance evaluation takes only 8 lookup-adds, versus 128 FMAs for the brute-force inner product.
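To make the lookup-add idea concrete, here is a minimal NumPy sketch of asymmetric distance computation (ADC) for PQ. The codebooks are random stand-ins (in a real system they come from per-subspace k-means), and the function names are my own; the point is just that after building M small tables from the query, each database vector costs only M table lookups and adds.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M = 128, 8   # vector dim, number of subquantizers -> 8 one-byte PQ codes
Ds = D // M     # sub-vector dimension (16)
K = 256         # centroids per subspace (so each code fits in a byte)

# Stand-in codebooks: shape (M, K, Ds). Real ones are trained with k-means.
codebooks = rng.standard_normal((M, K, Ds)).astype(np.float32)

def encode(x):
    """Quantize x into M byte codes: nearest centroid in each subspace."""
    codes = np.empty(M, dtype=np.uint8)
    for m in range(M):
        sub = x[m * Ds:(m + 1) * Ds]
        d = ((codebooks[m] - sub) ** 2).sum(axis=1)  # squared L2 to each centroid
        codes[m] = np.argmin(d)
    return codes

def adc_distance(query, codes):
    """Asymmetric distance: build M K-entry tables from the query once,
    then the per-vector work is just M lookup-adds."""
    dist = 0.0
    for m in range(M):
        sub = query[m * Ds:(m + 1) * Ds]
        table = ((codebooks[m] - sub) ** 2).sum(axis=1)  # K distances
        dist += table[codes[m]]                          # lookup-add
    return dist
```

In a real index the tables are of course computed once per query and reused across all database vectors, which is where the 8-lookup-adds-per-vector cost (versus 128 FMAs) comes from; the gather into `table` is exactly the operation that is slow relative to arithmetic on modern hardware.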