Looks neat. It would be useful to compare to other implementations: https://ann-benchmarks.com/ -- potentially not just speed, but implementation details that might change recall.
i think with small codebases like this it's less about speed and more about education in the essentials - i actually often encourage juniors to do small clones like this, feel proud, and then study the diffs with the at-scale repos and either feel humbled or feel like they have a contribution to make.
I see they are still using GloVe word embeddings for the first benchmark. Ah, the good ol' days! Nothing wrong with it, it should still yield a realistic distribution of vectors. Just brings back a lot of memories :)
swyx|10 months ago
oersted|10 months ago