(no title)
xfalcox | 3 months ago
Making the storage cost of the index 32 times smaller is the difference of being able to offer this at scale without worrying too much about the overhead.
xfalcox | 3 months ago
Making the storage cost of the index 32 times smaller is the difference of being able to offer this at scale without worrying too much about the overhead.
Someone|3 months ago
By moving the values to a single bit, you’re lumping stuff together that was different before, so I don’t think recall loss would be expected.
Also: even if your vector is only 100-dimensional, there already are 2^100 different bit vectors. That’s over 10^30.
If your dataset isn’t gigantic and has documents that are even moderately dispersed in that space, the likelihood of having many with the same bit vector isn’t large.
barrkel|3 months ago