I think you can, and it has some benefits. One interesting trick is to store representations of transformations of the document and then "fuse" the vectors (e.g. average them) at indexing time. You effectively get test-time augmentation, but with no extra inference overhead at query time and no increase in memory. An easy way to think about it: for similarity measures that are linear (e.g. dot product), you are now scoring the document against a weighted sum of its transformations. Test-time augmentation is a very well-known method in ML generally for improving performance, and it's applicable here. You can do the same for queries as well - akin to query expansion.
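A minimal sketch of the idea in numpy - `embed` here is a hypothetical stand-in for whatever encoder you're using, and the document "views" are made up; the point is that with a linear score, the fused vector gives exactly the average of the per-view scores:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical placeholder encoder (deterministic per input within a
    # run); swap in your real embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=8)
    return v / np.linalg.norm(v)

# Transformations ("views") of the same document, e.g. the raw text,
# a paraphrase, and a summary.
doc_views = ["original text", "paraphrased text", "summary of the text"]

# Fuse at indexing time: one averaged vector stored per document,
# so query time costs nothing extra.
fused = np.mean([embed(v) for v in doc_views], axis=0)

query = embed("some query")

# Linearity of the dot product: q . mean(d_i) == mean(q . d_i),
# i.e. scoring against the fused vector equals averaging the scores
# against each individual transformation.
score_fused = query @ fused
score_avg = np.mean([query @ embed(v) for v in doc_views])
assert np.isclose(score_fused, score_avg)
```

The same fusion applies on the query side - embed several rewrites of the query and average them before search.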
jn2clark|2 years ago