adtac | 6 months ago
Vector embeddings are lossy encodings of documents, roughly in the same way a SHA-256 hash is a lossy encoding. It's virtually impossible to reverse the embedding vector to recover the original document.
Note: when vectors are combined with other components for search and retrieval, it's trivial to end up with a horribly insecure system. But vector embeddings are useful on their own, and you said "all useful AI retrieval systems are insecure by design", so I felt it necessary to disagree with that part.
sfink|6 months ago
Incorrect. With a hash, I need to have the identical input to know whether it matches. If I'm one bit off, I get no information. Vector embeddings behave differently by design: similar inputs produce similar vectors, so if you can reproduce the embedding algorithm, you can tell how close you are to the input. It's like a combination lock that tells you how many numbers match so far (and, for the ones that don't, how close they are).
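To make the hash vs. embedding contrast concrete, here is a rough sketch. `toy_embed` is a hypothetical stand-in I made up for illustration (character trigram counts), not a real embedding model, and the example strings are invented. The only point is that a near-miss input looks unrelated under SHA-256 but scores as "close" under anything embedding-like.

```python
import hashlib
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


secret = "the launch code is 4417"      # made-up example input
near_miss = "the launch code is 4418"   # one character off
unrelated = "bananas are yellow"

# Hash: a near miss only tells you "not equal", nothing about closeness.
print("hash equal (near miss):", sha256_hex(secret) == sha256_hex(near_miss))

# Embedding-like vector: the near miss scores far higher than the unrelated text.
print("similarity (near miss):", cosine(toy_embed(secret), toy_embed(near_miss)))
print("similarity (unrelated):", cosine(toy_embed(secret), toy_embed(unrelated)))
```

A real model captures meaning rather than trigrams, but the property being exploited is the same: the similarity score leaks a distance to the original.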
> It's virtually impossible to reverse the embedding vector to recover the original document.
If you can reproduce the embedding process, it is very possible (with a hot/cold type of search: "you're getting warmer!"). But also, you no longer even need to recover the exact original. You can recover something close enough (and spend more time to make it incrementally closer).
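A minimal sketch of that hot/cold search, under generous assumptions: the attacker holds only the target vector, can run the same embedding function, and knows the length of the original. `toy_embed` is again a made-up trigram-count stand-in and `hot_cold_search` is a hypothetical name. A real attack on a real model is harder and would more likely yield a close paraphrase than an exact copy.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
FILLER = "#"  # placeholder outside the alphabet for "not guessed yet"


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hot_cold_search(target_vec: Counter, length: int, passes: int = 2):
    """Greedy search: keep any single-character change that gets 'warmer'."""
    guess = [FILLER] * length
    best = cosine(toy_embed("".join(guess)), target_vec)
    for _ in range(passes):
        for i in range(length):
            for ch in ALPHABET:
                trial = guess[:i] + [ch] + guess[i + 1:]
                score = cosine(toy_embed("".join(trial)), target_vec)
                if score > best:  # warmer: keep the change
                    guess, best = trial, score
    return "".join(guess), best


# The attacker only ever sees `leaked`, never the original string.
original = "open sesame"  # made-up example input
leaked = toy_embed(original)

recovered, similarity = hot_cold_search(leaked, length=len(original))
print(repr(recovered), similarity)
```

Greedy search like this can stall in a local optimum on inputs with repeated substrings, which is where the "close enough, then refine" part comes in.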
mpeg|6 months ago
frakt0x90|6 months ago