top | item 44965333

(no title)

graphviz | 6 months ago

What do people learn from visualizations like this?

What is the most important problem anyone has solved this way?

Speaking as somewhat of a co-defendant.


Not everything has to be directly informative or solve a problem. Sometimes data visualization can look pretty for pretty's sake.

Dimensionality reduction/clustering like this may be less useful for identifying trends in token embeddings, but for other types of embeddings it's extremely useful.
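For anyone who wants to try the dimensionality-reduction part on their own vectors, here is a minimal sketch. It uses plain PCA (via power iteration) to project to 2D, which is an assumption on my part: the visualization under discussion may well use a different method such as UMAP or t-SNE. The `vectors` list is made-up toy data, and `pca_2d` and the other helper names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def top_component(rows, iters=200, seed=0):
    """Approximate the leading principal direction by power iteration on X^T X."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        proj = [dot(r, v) for r in rows]  # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]  # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0:  # no variance left in the data
            break
        v = [x / norm for x in w]
    return v


def deflate(rows, v):
    """Remove the component along unit vector v from every row."""
    return [[x - dot(r, v) * vj for x, vj in zip(r, v)] for r in rows]


def pca_2d(vectors):
    """Project vectors onto their first two principal directions."""
    x = center(vectors)
    pc1 = top_component(x, seed=0)
    pc2 = top_component(deflate(x, pc1), seed=1)
    return [(dot(r, pc1), dot(r, pc2)) for r in x]


# Toy 4-D "embeddings": two obvious groups (made-up numbers, not real model output).
vectors = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]

for point in pca_2d(vectors):
    print(point)
```

The sign of each axis is arbitrary, so only the relative positions matter: the first two rows should land on one side of the first axis and the last two on the other.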

diwank|6 months ago

Agreed. The fact that it has any structure at all is fascinating (and super pretty). Could hint at interesting internal structures. I would love to see a version for Qwen-3 and Mistral too!

I wonder if being trained on significant amounts of synthetic data gave it any unique characteristics.

jablongo|6 months ago

It lets you inspect what actually constitutes a given cluster. For example, the outer clusters seem to be variations of individual words and their direct translations rather than synonyms (the ones I saw, at least).

TuringNYC|6 months ago

> What do people learn from visualizations like this?

Applying the embedding model to a dataset of your own and then building a similar visualization is where it gets cool: you can look at the clusters and draw conclusions about how close the items in your own dataset are to each other.
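A rough sketch of the "closeness of items" idea, assuming you already have one embedding vector per item. The vectors below are made-up toy numbers standing in for real model output, and `nearest` and `toy_embeddings` are hypothetical names. It ranks neighbors by cosine similarity, which is one common choice and not necessarily what any particular tool uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return num / (na * nb)


def nearest(name, embeddings, k=2):
    """Return the k items most similar to `name`, most similar first."""
    query = embeddings[name]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy stand-ins for embeddings of support-ticket titles (not real model output).
toy_embeddings = {
    "refund request": [0.9, 0.1, 0.0],
    "money back": [0.8, 0.2, 0.1],
    "login failure": [0.1, 0.9, 0.2],
    "password reset": [0.0, 0.8, 0.3],
}

print(nearest("refund request", toy_embeddings, k=1))
print(nearest("login failure", toy_embeddings, k=1))
```

On a real dataset you would swap the toy dictionary for vectors produced by your embedding model. The same neighbor lists are a handy sanity check on whatever clusters a 2D plot appears to show.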

ethan_smith|6 months ago

Embedding visualizations have helped identify bias in word embeddings (Word2Vec), debug entity resolution systems, and optimize document retrieval by revealing semantic clusters that inform better indexing strategies.

graphviz|6 months ago

Interesting, glad to know it's been useful for some specific contributions. (Not questioning that interesting-looking, appealing displays as overviews for general awareness are also worthwhile.)