If I understand correctly, having only 4096 bytes of data per term means that intersecting multiple terms from the same query often yields few or no results. The goal seems to be cutting cost at the expense of completeness.
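To put a rough number on "few or no results": if a truncated posting list behaves like a random sample of the term's true postings, the expected size of a two-term intersection is `a*b/N` (the hypergeometric expectation). A back-of-envelope sketch, where the 4-byte doc IDs and the 100M-document corpus size are illustrative assumptions, not figures from the thread:

```python
# Back-of-envelope: treat each truncated posting list as a random sample
# of the corpus. Expected overlap of two samples of sizes a and b from
# N documents is a*b/N. All constants below are assumptions.
DOC_ID_BYTES = 4          # assumed 4-byte document IDs
CAP_BYTES = 4096          # the per-term cap discussed above
N = 100_000_000           # assumed corpus size (e.g. a 100M-doc index)

per_term = CAP_BYTES // DOC_ID_BYTES          # doc IDs retained per term
expected_overlap = per_term * per_term / N    # expected hits for a 2-term AND

print(per_term)           # 1024 doc IDs fit under the cap
print(expected_overlap)   # ~0.01: a two-term query usually matches nothing
```

Under these assumptions each term keeps about a thousand documents, so two independent common terms almost never overlap, which matches the "little or no results" observation.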
Yeah. That seems like a design decision that will scale poorly. For reference, even in my dinky 100M index I have individual terms with several gigabytes of associated document references.
In general, hash-table-based index designs don't tend to be very efficient for this. If you use a skip list or a similar ordered structure, you can compute the intersection between sets in sublinear time.
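The sublinear intersection idea can be sketched over sorted posting lists, using galloping (exponential) search as a stand-in for skip-list forward pointers. This is an illustrative sketch, not the implementation discussed in the thread; it assumes posting lists are sorted arrays of doc IDs:

```python
from bisect import bisect_left

def gallop(postings, target, lo):
    """Find the first index >= target in a sorted list, starting from lo.
    Doubles the step to bracket the target, then binary-searches the
    bracket -- O(log d) where d is the distance skipped."""
    step, hi = 1, lo
    while hi < len(postings) and postings[hi] < target:
        lo = hi
        hi += step
        step *= 2
    return bisect_left(postings, target, lo, min(hi + 1, len(postings)))

def intersect(a, b):
    """Intersect two sorted posting lists. Iterates the shorter list and
    gallops through the longer one, so the cost is roughly
    O(len(a) * log(len(b))) -- sublinear in the longer list, similar to
    what skip-list pointers buy you."""
    if len(a) > len(b):
        a, b = b, a
    out, pos = [], 0
    for doc in a:
        pos = gallop(b, doc, pos)
        if pos == len(b):
            break
        if b[pos] == doc:
            out.append(doc)
    return out

print(intersect([2, 5, 9], [1, 2, 3, 5, 8, 9, 10]))  # [2, 5, 9]
```

When one term is rare and the other is common, the galloping pointer skips most of the long list, which is where the sublinear behavior comes from.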
marginalia_nu|2 years ago
daoudc|2 years ago