persedes | 9 months ago

Would be interesting to see how e.g. sentence-transformer models compare to this. My takeaway with the OpenAI embedding models was that they were better suited for larger chunks of text, so getting "god" + "dog" with a higher similarity might indicate that they're not good models for such small texts?

  from sentence_transformers import SentenceTransformer
  from sklearn.metrics.pairwise import cosine_similarity

  emb = SentenceTransformer("all-MiniLM-L6-v2")
  embeddings = emb.encode(["dog", "god"])
  cosine_similarity(embeddings)
  Out[16]: 
  array([[1.        , 0.41313702],
         [0.41313702, 1.0000004 ]], dtype=float32)
