haxton | 2 years ago
You almost certainly want a graph-like structure (overlapping communities rather than clusters).
But unsupervised clustering was almost entirely ineffective for every use case I had :/
simonw|2 years ago
I mainly like it as another example of the kind of things you can use embeddings for.
My implementation is very naive - it's just this:
https://github.com/simonw/llm-cluster/blob/main/llm_cluster....
I imagine there are all kinds of improvements that could be made to this kind of thing. I'd love to understand if there's a good way to automatically pick an interesting number of clusters, as opposed to picking a number at the start.
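On the question of picking the number of clusters automatically: one common heuristic (not something taken from the linked code) is to try a range of k values and keep the one with the best silhouette score. A minimal sketch, assuming the embeddings are already in a NumPy array and that plain k-means from scikit-learn is an acceptable clusterer; pick_k_by_silhouette and its parameters are hypothetical names used only for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(embeddings, k_min=2, k_max=10, seed=0):
    """Return (best_k, best_score) over k_min..k_max by mean silhouette score.

    Hypothetical helper for illustration only; not part of llm-cluster.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n_samples = len(embeddings)
    best_k, best_score = None, -1.0
    # silhouette_score needs between 2 and n_samples - 1 distinct labels
    for k in range(max(2, k_min), min(k_max, n_samples - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    # Synthetic stand-in for real embeddings: three tight, well-separated blobs.
    rng = np.random.default_rng(0)
    centers = 10.0 * np.eye(8)[:3]
    data = np.vstack([c + 0.1 * rng.standard_normal((40, 8)) for c in centers])
    print(pick_k_by_silhouette(data))
```

Silhouette scores tend to favour compact, well-separated groups, so on real embeddings the "best" k may be less interesting than this toy case suggests; it is a starting point rather than an answer.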
FreakLegion|2 years ago
stefanka|2 years ago
Alternatively, there is a Bayesian GMM in sklearn (BayesianGaussianMixture). When you restrict it to diagonal covariance matrices, you should be fine in high dimensions.
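A rough sketch of what that could look like, assuming scikit-learn's BayesianGaussianMixture with covariance_type="diag" and a Dirichlet process prior. The helper name fit_bayesian_gmm, the upper bound of 10 components, and the 1% weight threshold are arbitrary choices for illustration, not recommendations:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def fit_bayesian_gmm(embeddings, max_components=10, seed=0):
    """Fit a diagonal-covariance Bayesian GMM and return (model, labels).

    max_components is only an upper bound: the prior can shrink the
    weights of components the data does not support.
    """
    model = BayesianGaussianMixture(
        n_components=max_components,
        covariance_type="diag",  # one variance per dimension, no full matrix
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
        random_state=seed,
    )
    labels = model.fit_predict(np.asarray(embeddings, dtype=float))
    return model, labels


def effective_components(model, min_weight=0.01):
    """Count components whose mixture weight exceeds min_weight (arbitrary cutoff)."""
    return int(np.sum(model.weights_ > min_weight))


if __name__ == "__main__":
    # Synthetic stand-in for real embeddings: three tight blobs in 16 dimensions.
    rng = np.random.default_rng(0)
    centers = 10.0 * np.eye(16)[:3]
    data = np.vstack([c + 0.1 * rng.standard_normal((60, 16)) for c in centers])
    model, labels = fit_bayesian_gmm(data)
    print(effective_components(model), np.unique(labels))
```

The number of components that end up with non-negligible weight then acts as the inferred cluster count, though it is sensitive to the weight concentration prior, so it is worth checking against a few settings.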
nl|2 years ago
haxton|2 years ago
visarga|2 years ago