Blog author. I've done some separate testing on storing ~500GB of embeddings (~1B embeddings) in a partitioned table. The partition key was built using IVFFLAT as a "coarse quantizer" (in this case, sampling the entire dataset and finding K means), storing the mean vectors in a separate table, and then loading each vector into the partition with closest center. After that, I built an IVFFLAT index on each partition. With the indexes, this added up to ~1TB storage. This was primarily a "is it possible test" vs. thorough benchmarking.
No comments yet.