I've been playing with embeddings and wanted to try out what results the embedding layer will produce based on just word-by-word input and addition/subtraction, beyond what many videos/papers mention (like the obvious king - man + woman = queen). So I built something that doesn't just give the first answer, but ranks the matches based on distance / cosine similarity. I polished it a bit so that others can try it out, too. For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.
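A minimal sketch of the kind of arithmetic described above, using made-up toy vectors (real models use hundreds of dimensions; the vocabulary and values here are purely illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings" -- values invented so the classic analogy works.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.0, 1.0]),
    "apple": np.array([0.0, 0.3, 0.2]),
}

def analogy(a, b, c, exclude=()):
    """Rank vocab words by cosine similarity to vec(a) - vec(b) + vec(c)."""
    target = vocab[a] - vocab[b] + vocab[c]
    return sorted(
        ((w, cosine(target, v)) for w, v in vocab.items() if w not in exclude),
        key=lambda t: -t[1],
    )

print(analogy("king", "man", "woman", exclude={"king", "man", "woman"}))
```

Ranking every candidate instead of returning only the top hit is what lets a tool like this show second- and third-best matches.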
godelski|9 months ago
n2d4|9 months ago
The prompt I used:
> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:
mathgradthrow|9 months ago
Affric|9 months ago
Curious tool but not what I would call accurate.
gweinberg|9 months ago
thatguysaguy|9 months ago
sdeframond|9 months ago
The role of the Attention Layer in LLMs is to give each token a better embedding by accounting for context.
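A toy illustration of that idea: in scaled dot-product attention, each token's output vector is a context-weighted mixture of the other tokens' vectors, so an ambiguous token ends up with an embedding that reflects its neighbors (shapes and values here are invented, not from any particular model):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: each output row is a
    context-weighted mixture of the value rows."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

# Three tokens with 4-d embeddings (random, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = attention(x, x, x)  # self-attention with identity projections
print(out.shape)  # each token's vector now carries information from the others
```

Real LLMs add learned query/key/value projections and many heads, but the mixing step is the part that makes the per-token embedding context-dependent.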
charlieyu1|9 months ago
virgilp|9 months ago
pjc50|9 months ago
montebicyclelo|9 months ago
Is the famous example everyone uses when talking about word vectors, but is it actually just very cherry picked?
I.e. are there a great number of other "meaningful" examples like this, or actually the majority of the time you end up with some kind of vaguely tangentially related word when adding and subtracting word vectors.
(Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)
(Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)
loganmhb|9 months ago
E.g. in this calculator "man - king + princess = woman", which doesn't make much sense. And "airplane - engine", which has a plausible answer of "glider", instead gives "= Czechoslovakia". Go figure.
jbjbjbjb|9 months ago
India - Asia + Europe = Italy
Japan - Asia + Europe = Netherlands
China - Asia + Europe = Soviet-Union
Russia - Asia + Europe = European Russia
calculation + machine = computer
groby_b|9 months ago
And, worse, most latent spaces are decidedly non-linear. And so arithmetic loses a lot of its meaning. (IIRC word2vec mostly avoided nonlinearity except for the loss function). Yes, the distance metric sort-of survives, but addition/multiplication are meaningless.
(This is also the reason choosing your embedding model is a hard-to-reverse technical decision - you can't just transform existing embeddings into a different latent space. A change means "reembed all")
Retr0id|9 months ago
actor - man + woman = actress
garden + person = gardener
rat - sewer + tree = squirrel
toe - leg + arm = digit
gregschlom|9 months ago
raddan|9 months ago
100%
bee_rider|9 months ago
spindump8930|9 months ago
Are you using word2vec for these, or embeddings from another model?
I also wanted to add some flavor, since it looks like many folks in this thread haven't seen something like this before: it's been known since 2013 that we can do this (but it's great to remind folks, especially with all the "modern" interest in NLP).
It's also known (in some circles!) that a lot of these vector arithmetic things need some tricks to really shine. For example, excluding the words already present in the query[1]. Others in this thread seem surprised at some of the biases present - there's also a long history of work on that [2,3].
[1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...
[2] https://arxiv.org/abs/1905.09866
[3] https://arxiv.org/abs/1903.03862
nxa|9 months ago
The dictionary is based on https://wordnet.princeton.edu/, not word2vec. It's just a plain lookup among precomputed embeddings (with mxbai-embed-large). And yes, I'm excluding words that are already present in the query.
It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).
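A rough sketch of the pipeline described above (precompute embeddings for a word list once, then answer queries by ranked lookup). The `embed` function below is a deterministic hash-based stand-in for a real model such as mxbai-embed-large, so the sketch runs without one; its geometry is meaningless and the vocabulary is invented:

```python
import hashlib
import numpy as np

def embed(word, dim=64):
    """Stand-in for a real embedding model call. Deterministic
    hash-seeded vectors, purely so the sketch is runnable."""
    seed = int.from_bytes(hashlib.sha256(word.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).normal(size=dim)

# Precompute once for the whole vocabulary (e.g. WordNet nouns).
vocab = ["king", "queen", "man", "woman", "apple", "orange"]
table = {w: embed(w) for w in vocab}

def query(plus, minus):
    """Sum/subtract word vectors, then return the precomputed word
    with the highest cosine similarity, skipping query words."""
    target = sum(table[w] for w in plus) - sum(table[w] for w in minus)
    used = set(plus) | set(minus)
    def cos(v):
        return float(target @ v / (np.linalg.norm(target) * np.linalg.norm(v)))
    return max((w for w in table if w not in used), key=lambda w: cos(table[w]))

print(query(plus=["king", "woman"], minus=["man"]))
```

Swapping in a different embedding model only means rebuilding `table`, which is why comparing models is cheap once the query side is written.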
antidnan|9 months ago
https://neal.fun/infinite-craft/
thaumasiotes|9 months ago
It provides a panel filled with slowly moving dots. Right of the panel, there are objects labeled "water", "fire", "wind", and "earth" that you can instantiate on the panel and drag around. As you drag them, the background dots, if nearby, will grow lines connecting to them. These lines are not persistent.
And that's it. Nothing ever happens, there are no interactions except for the lines that appear while you're holding the mouse down, and while there is notionally a help window listing the controls, the only controls are "select item", "delete item", and "duplicate item". There is also an "about" panel, which contains no information.
lcnPylGDnU4H9OF|9 months ago
C-x_C-f|9 months ago
ActionHank|9 months ago
jumploops|9 months ago
I built a game[0] along similar lines, inspired by infinite craft[1].
The idea is that you combine (or subtract) “elements” until you find the goal element.
I’ve had a lot of fun with it, but it often hits the same generated element. Maybe I should update it to use the second (third, etc.) choice, similar to your tool.
[0] https://alchemy.magicloops.app/
[1] https://neal.fun/infinite-craft/
lightyrs|9 months ago
bee_rider|9 months ago
> a drug (such as opium or morphine) that in moderate doses dulls the senses, relieves pain, and induces profound sleep but in excessive doses causes stupor, coma, or convulsions
https://www.merriam-webster.com/dictionary/narcotic
So we can see some element of losing time in that type of drug. I guess? Maybe I’m anthropomorphizing a bit.
grey-area|9 months ago
__MatrixMan__|9 months ago
mrastro|9 months ago
aniviacat|9 months ago
Other stuff that works: key, door, lock, smooth
Some words that result in "flintlock": violence, anger, swing, hit, impact
Retr0id|9 months ago
ttctciyf|9 months ago
Makes no sense, admittedly!
- dulcimer and - zither are both firmly in .*gun.* territory, it seems.
downboots|9 months ago
soxfox42|9 months ago
tough|9 months ago
neom|9 months ago
grey-area|9 months ago
nxa|9 months ago
Also, in case this gets buried in the comments: proper nouns need to be capitalized (Paris-France+Germany).
I am planning on patching up the UI based on your feedback.
GrantMoyer|9 months ago
[1]: https://github.com/GrantMoyer/word_alignment
rdlw|9 months ago
Or maybe they would all be completely inscrutable and man-woman would be like the 50th strongest result.
ale42|9 months ago
anonu|9 months ago
skeptrune|9 months ago
ericdiao|9 months ago
Can not personally find the connection here, was expecting father or something.
ericdiao|9 months ago
High dimension vector is always hard to explain. This is an example.
afandian|9 months ago
I’ve been unable to find it since. Does anyone know which site I’m thinking of?
halter73|9 months ago
clbrmbr|9 months ago
wine - beer = grape juice
beer - wine = bowling
astrology - astronomy + mathematics = arithmancy
galaxyLogic|9 months ago
That could be seen as trying to find the true "meaning" of a word.
nxa|9 months ago
behnamoh|9 months ago
Papers that actually say something meaningful often go unnoticed, but as soon as you say something generic like "language models can do this", it gets featured in "AI influencer" posts.
tiborsaas|9 months ago
mynameajeff|9 months ago
fallinghawks|9 months ago
(Goshawks are very intense, gyrs tend to be leisurely in flight.)
neom|9 months ago
Getting to cornbread elegantly has been challenging.
yigitkonur35|9 months ago
ignat_244639|9 months ago
But if I assume the biased answer and rearrange the operands, I get "man - criminal + black = white". Which clearly shows how biased your embeddings are!
Funny thing: fixing biases, and the ways to circumvent the fixes (while keeping good UX), might be a much more challenging task :)
TZubiri|9 months ago
gus_massa|9 months ago
Jimmc414|9 months ago
paleolith + cat = Paleolithic Age
paleolith + dog = Paleolithic Age
paleolith - cat = neolith
paleolith - dog = hand ax
cat - dog = meow
Wonder if some of the math is off or I am not using this properly
Glyptodon|9 months ago
andrelaszlo|9 months ago
e____g|9 months ago
woman + intelligence = man (77%)
Oof.
wdutch|9 months ago
car + stupid = idiot, car + idiot = stupid
nikolay|9 months ago
nxa|9 months ago
dalmo3|9 months ago
unknown|9 months ago
[deleted]
sapphicsnail|9 months ago
karel-3d|9 months ago
man+vagina=woman (ok that is boring)
2muchcoffeeman|9 months ago
cabalamat|9 months ago
iambateman|9 months ago
fallinghawks|9 months ago
Edit: these must be capitalized to be recognized.
nxa|9 months ago
dtj1123|9 months ago
ericdiao|9 months ago
Accurate.
coolcase|9 months ago
downboots|9 months ago
hacker - code = professional golf
krishna-vakx|9 months ago
love + time = commitment
boredom + curiosity = exploration
vision + execution = innovation
resilience - fear = courage
ambition + humility = leadership
failure + reflection = learning
knowledge + application = wisdom
feedback + openness = improvement
experience - ego = mastery
idea + validation = product-market fit
matallo|9 months ago
great idea, but I find the results unamusing
HWR_14|9 months ago
havkom|9 months ago
-red
and:
red-red-red
But it did not work and I did not get any response. Maybe I am stupid, but should this not work?
unknown|9 months ago
[deleted]
hagen_dogs|9 months ago
blue + red = yellow (87%) -- rgb, neat
black + {red,blue,yellow,green} = white 83% -- weird
moefh|9 months ago
Blue + red is magenta. Yellow would be red + green.
None of these results make much sense to me.
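For reference, additive (light/RGB) mixing, which is what the comment above describes, is just clamped component-wise addition; the embedding results don't track it:

```python
# Additive RGB color mixing: combine light, clamp each channel at 255.
RGB = {
    "red":     (255, 0, 0),
    "green":   (0, 255, 0),
    "blue":    (0, 0, 255),
    "magenta": (255, 0, 255),
    "yellow":  (255, 255, 0),
    "white":   (255, 255, 255),
}

def mix(a, b):
    """Clamped component-wise addition of two named RGB colors."""
    return tuple(min(255, x + y) for x, y in zip(RGB[a], RGB[b]))

print(mix("blue", "red"))   # magenta under additive mixing
print(mix("red", "green"))  # yellow under additive mixing
```

The word vectors presumably reflect how colors co-occur in text rather than either the additive (RGB) or subtractive (paint) mixing model.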
MYEUHD|9 months ago
queen - woman + man = drone
bee_rider|9 months ago
Glyptodon|9 months ago
hello_computer|9 months ago
Finbel|9 months ago
firejake308|9 months ago
fph|9 months ago
cosmicgadget|9 months ago
maxcomperatore|9 months ago
darepublic|9 months ago
kylecazar|9 months ago
zerof1l|9 months ago
female + age = male
jryb|9 months ago
x3y1|9 months ago
doubtfuluser|9 months ago
Good to understand this bias before blindly applying these models (Yes- doctor is gender neutral - even women can be doctors!!)
heyitsguay|9 months ago
blobbers|9 months ago
rice + fish + raw = meat
hahaha... I JUST WANT SUSHI!
7373737373|9 months ago
G1N|9 months ago
six (84%)
Close enough I suppose
bluelightning2k|9 months ago
tlhunter|9 months ago
downboots|9 months ago
LadyCailin|9 months ago
erulabs|9 months ago
huh
atum47|9 months ago
78% male horse 72% horseman
adzm|9 months ago
this is pretty fun
growlNark|9 months ago
ainiriand|9 months ago
That's weird.
mannykannot|9 months ago
quantum_state|9 months ago
kataqatsi|9 months ago
hmm...
woodruffw|9 months ago
dmonitor|9 months ago
insane_dreamer|9 months ago
LOL
throwaway984393|9 months ago
[deleted]
ephou7|9 months ago
[deleted]
ezbie|9 months ago
mhitza|9 months ago
unknown|9 months ago
[deleted]
spinarrets|9 months ago