Hey, I wrote this about 6 months ago; nice to see it here! AMA, but please note that this is state-of-the-art territory and things have changed significantly since then. Notably, folks are now seeing good preliminary results with SBERT (sentence-level encodings instead of token-level): https://www.aclweb.org/anthology/D19-1410.pdf
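To make the token-level vs. sentence-level distinction concrete, here is a minimal NumPy sketch. The shapes mirror BERT-base output, but the values are random stand-ins, and mean pooling is just one common way to collapse token vectors into a document vector; models like SBERT are trained so the pooled vector is directly useful for similarity.

```python
import numpy as np

# Hypothetical token-level embeddings for one document: 7 tokens x 768 dims
# (shapes mirror BERT-base output; values are random stand-ins).
rng = np.random.default_rng(0)
token_vecs = rng.normal(size=(7, 768))

# Mean pooling: one common way to get a single sentence/document vector
# from token-level output.
doc_vec = token_vecs.mean(axis=0)

print(doc_vec.shape)  # (768,)
```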
Any experience clustering or classifying documents based on these high-dimensional vectors? Also, what is your experience with dimensionality-reduction techniques such as UMAP or good old PCA?
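For the PCA half of that question, a minimal sketch of what reduction looks like on document vectors, using plain NumPy SVD on random stand-in data (real embeddings would replace `X`):

```python
import numpy as np

# Toy stand-in for document embeddings: 100 documents x 768 dims.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 768))

# Plain PCA via SVD: center the data, decompose, keep the top-k components.
k = 50
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:k].T  # (100, 50)

# Fraction of total variance retained by the first k components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(X_reduced.shape)
```

Clustering or classification would then run on `X_reduced` instead of the full 768-dim vectors.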
You mentioned the prohibitive size of the vectorizations of documents. What role, if any, have matrix/tensor decompositions or tensor networks played in helping the search community with this?
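As a concrete instance of the decomposition idea, here is a truncated-SVD sketch: store two small rank-r factors instead of the full matrix of document vectors. The data is a random stand-in; whether rank 64 is acceptable depends entirely on the real matrix's spectrum.

```python
import numpy as np

# Toy "vectorization" matrix: 1000 documents x 768 dims (random stand-in).
rng = np.random.default_rng(0)
M = rng.normal(size=(1000, 768))

# Truncated SVD: keep rank-r factors A (1000 x r) and B (r x 768)
# instead of the full 1000 x 768 matrix.
r = 64
U, S, Vt = np.linalg.svd(M, full_matrices=False)
A = U[:, :r] * S[:r]
B = Vt[:r]
M_approx = A @ B  # low-rank reconstruction, same shape as M

# Storage comparison: full matrix vs. the two factors.
full_size = M.size
factored_size = A.size + B.size
print(full_size, factored_size)
```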
binarymax | 5 years ago
charlescearl | 5 years ago
https://cs.uwaterloo.ca/~jimmylin/publications/Nogueira_Lin_...
Do you have any thoughts on this or similar approaches in production?
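For context, the linked Nogueira & Lin line of work follows a two-stage "retrieve then re-rank" pattern. Below is a hedged sketch of that pattern only; `bm25_search` and `bert_score` are hypothetical placeholders (word-overlap toys standing in for a real BM25 index and an expensive neural cross-encoder), which only scores the top-k candidates from the cheap first stage.

```python
def bm25_search(query, k=100):
    # Placeholder first-stage retriever over a toy corpus.
    # Assumption: returns (doc, score) pairs; a real system would hit a BM25 index.
    corpus = ["neural ranking", "bm25 baseline", "query expansion"]
    qwords = set(query.split())
    return [(doc, float(len(qwords & set(doc.split())))) for doc in corpus][:k]

def bert_score(query, doc):
    # Placeholder for an expensive neural relevance score (e.g. a BERT
    # cross-encoder); here just word overlap so the sketch is runnable.
    return float(len(set(query.split()) & set(doc.split())))

def search(query, k=100, n=10):
    # Stage 1: cheap retrieval of k candidates.
    candidates = bm25_search(query, k=k)
    # Stage 2: expensive model re-ranks only those k candidates.
    reranked = sorted(candidates, key=lambda pair: bert_score(query, pair[0]), reverse=True)
    return [doc for doc, _ in reranked[:n]]

print(search("neural ranking"))
```

The production question then largely reduces to the latency of stage 2, since the neural scorer runs once per candidate.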
petulla | 5 years ago
binarymax | 5 years ago
lootsauce | 5 years ago
charleskinbote | 5 years ago
For reference: https://ai.googleblog.com/2019/06/introducing-tensornetwork-...