Hey, I wrote this about 6 months ago; nice to see it here! AMA, but please note that this is state-of-the-art territory and things have changed significantly since then. Notably, folks are now seeing good preliminary results with SBERT (sentence-level encodings instead of token-level): https://www.aclweb.org/anthology/D19-1410.pdf
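To make the token-level vs. sentence-level distinction concrete, here is a minimal NumPy sketch. The shapes mirror BERT-base output, but the values are random stand-ins, and mean pooling is just one common way to collapse token vectors into a document vector; models like SBERT are trained so the pooled vector is directly useful for similarity.

```python
import numpy as np

# Hypothetical token-level embeddings for one document: 7 tokens x 768 dims
# (shapes mirror BERT-base output; values are random stand-ins).
rng = np.random.default_rng(0)
token_vecs = rng.normal(size=(7, 768))

# Mean pooling: one common way to get a single sentence/document vector
# from token-level output.
doc_vec = token_vecs.mean(axis=0)

print(doc_vec.shape)  # (768,)
```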
Any experience clustering or classifying documents based on these high-dimensional vectors? Also, what is your experience with dimensionality-reduction techniques such as UMAP or good old PCA?
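For the PCA half of that question, a minimal sketch of what reduction looks like on document vectors, using plain NumPy SVD on random stand-in data (real embeddings would replace `X`):

```python
import numpy as np

# Toy stand-in for document embeddings: 100 documents x 768 dims.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 768))

# Plain PCA via SVD: center the data, decompose, keep the top-k components.
k = 50
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:k].T  # (100, 50)

# Fraction of total variance retained by the first k components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(X_reduced.shape)
```

Clustering or classification would then run on `X_reduced` instead of the full 768-dim vectors.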
You mentioned the prohibitive size of the vectorizations of documents. What role, if any, have matrix/tensor decompositions or tensor networks played in helping the search community with this?
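As a concrete instance of the decomposition idea, here is a truncated-SVD sketch: store two small rank-r factors instead of the full matrix of document vectors. The data is a random stand-in; whether rank 64 is acceptable depends entirely on the real matrix's spectrum.

```python
import numpy as np

# Toy "vectorization" matrix: 1000 documents x 768 dims (random stand-in).
rng = np.random.default_rng(0)
M = rng.normal(size=(1000, 768))

# Truncated SVD: keep rank-r factors A (1000 x r) and B (r x 768)
# instead of the full 1000 x 768 matrix.
r = 64
U, S, Vt = np.linalg.svd(M, full_matrices=False)
A = U[:, :r] * S[:r]
B = Vt[:r]
M_approx = A @ B  # low-rank reconstruction, same shape as M

# Storage comparison: full matrix vs. the two factors.
full_size = M.size
factored_size = A.size + B.size
print(full_size, factored_size)
```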
binarymax | 5 years ago
charlescearl | 5 years ago
https://cs.uwaterloo.ca/~jimmylin/publications/Nogueira_Lin_...
Do you have any thoughts on this or similar approaches in production?
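For context, the linked Nogueira & Lin line of work follows a two-stage "retrieve then re-rank" pattern. Below is a hedged sketch of that pattern only; `bm25_search` and `bert_score` are hypothetical placeholders (word-overlap toys standing in for a real BM25 index and an expensive neural cross-encoder), which only scores the top-k candidates from the cheap first stage.

```python
def bm25_search(query, k=100):
    # Placeholder first-stage retriever over a toy corpus.
    # Assumption: returns (doc, score) pairs; a real system would hit a BM25 index.
    corpus = ["neural ranking", "bm25 baseline", "query expansion"]
    qwords = set(query.split())
    return [(doc, float(len(qwords & set(doc.split())))) for doc in corpus][:k]

def bert_score(query, doc):
    # Placeholder for an expensive neural relevance score (e.g. a BERT
    # cross-encoder); here just word overlap so the sketch is runnable.
    return float(len(set(query.split()) & set(doc.split())))

def search(query, k=100, n=10):
    # Stage 1: cheap retrieval of k candidates.
    candidates = bm25_search(query, k=k)
    # Stage 2: expensive model re-ranks only those k candidates.
    reranked = sorted(candidates, key=lambda pair: bert_score(query, pair[0]), reverse=True)
    return [doc for doc, _ in reranked[:n]]

print(search("neural ranking"))
```

The production question then largely reduces to the latency of stage 2, since the neural scorer runs once per candidate.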
petulla | 5 years ago
binarymax | 5 years ago
lootsauce | 5 years ago
charleskinbote | 5 years ago
For reference: https://ai.googleblog.com/2019/06/introducing-tensornetwork-...