romanfll|2 months ago
This approach ("Sine Landmark Reduction") uses linearised trilateration—similar to GPS positioning—against a synthetic "sine skeleton" of landmarks.
The main trade-offs:
It is O(N) and deterministic (solves Ax=b instead of iterative gradient descent).
It forces the topology onto a loop structure, so it is less accurate than UMAP for complex manifolds (like Swiss Rolls), but it guarantees a clean layout for user interfaces.
It can project ~9k points (50 dims) to 3D in about 2 seconds on a laptop CPU. Python implementation and math details are in the post. Happy to answer questions!
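The post has the full implementation; as a rough illustration of the linearised-trilateration step, here is a minimal NumPy sketch. The sinusoidal anchor layout, the random choice of high-dimensional reference points, and the distance normalisation are all my assumptions, not necessarily the post's actual choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the benchmark data: ~9k points in 50 dims.
X = rng.normal(size=(9000, 50))

# Hypothetical "sine skeleton": M anchor positions on a sinusoidal loop in 3D
# (the post's actual skeleton construction may differ).
M = 16
t = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
P = np.stack([np.cos(t), np.sin(t), np.sin(3.0 * t)], axis=1)    # (M, 3)

# Distances from every point to M reference points in the original space
# (chosen at random here; another assumption).
ref = X[rng.choice(len(X), size=M, replace=False)]
D = np.linalg.norm(X[:, None, :] - ref[None, :, :], axis=2)      # (N, M)
D /= D.mean()   # keep distances commensurate with the unit-scale skeleton

# Linearisation: subtracting the sphere equation ||x - p_0||^2 = d_0^2 from
# ||x - p_i||^2 = d_i^2 cancels the quadratic term, leaving A x = b.
A = 2.0 * (P[0] - P[1:])                                         # (M-1, 3)
b = (D[:, 1:] ** 2 - D[:, :1] ** 2
     + np.sum(P[0] ** 2) - np.sum(P[1:] ** 2, axis=1))           # (N, M-1)

# One deterministic least-squares solve embeds all N points at once:
# O(N), no gradient descent.
Y = np.linalg.lstsq(A, b.T, rcond=None)[0].T                     # (N, 3)
```

The Ax=b structure is what makes it deterministic: given the same anchors and reference points, the embedding is a single linear solve.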
lmeyerov|2 months ago
We see a lot of wide social, log, and cyber data where this works, anywhere from 5 to 200 dims. Our bio users are trickier, as their data can hit 1K+ dimensions pretty fast. We find success there too, mostly by getting into preconditioning tricks for those.
At the same time, I'm increasingly thinking of learned neural embeddings in general for these instead of traditional clustering algorithms. As scale goes up, the performance argument here goes up too.
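One common preconditioning trick for 1K+ dims (my example; the comment doesn't say which tricks they use) is a PCA pre-pass via truncated SVD, so downstream distance computations run in a much lower dimension:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for wide bio data: 2k samples, 1.2k dimensions.
X = rng.normal(size=(2000, 1200))

# PCA pre-pass via truncated SVD: keep the top-k principal directions so
# later distance computations run in k dims instead of 1200.
k = 100
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pre = Xc @ Vt[:k].T    # (2000, k)

# Fraction of variance retained by the k components.
retained = (S[:k] ** 2).sum() / (S ** 2).sum()
```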
abhgh|2 months ago
I have a couple of questions for now: (1) I am confused by your last sentence. It seems you're saying embeddings are a substitute for clustering. My understanding is that you usually apply a clustering algorithm over embeddings - good embeddings just ensure that the grouping produced by the clustering algo "makes sense".
(2) Have you tried PaCMAP? I found it to produce high-quality and quick results when I tried it. Haven't tried it in a while though - and I vaguely remember that it wouldn't install properly on my machine (a Mac) the last time I reached for it. Their group has some new stuff coming out too (on the linked page).
[1] https://github.com/YingfanWang/PaCMAP
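Re point (1), the usual layering - embed first, then cluster in embedding space - can be sketched with made-up blob "embeddings" and a bare-bones Lloyd's k-means (illustrative only; in practice you'd use a real embedding model and a library clusterer):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in "embeddings": three well-separated Gaussian blobs in 16 dims.
emb = np.concatenate([rng.normal(loc=c, size=(100, 16))
                      for c in (-4.0, 0.0, 4.0)])

# Clustering runs ON TOP of the embeddings: here, a minimal k-means.
k = 3

# Farthest-first initialisation (deterministic given the data).
centers = [emb[0]]
for _ in range(k - 1):
    d = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
    centers.append(emb[np.argmax(d)])
centers = np.stack(centers)

# Lloyd's iterations: assign to nearest center, recompute means.
for _ in range(50):
    labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    new = np.stack([emb[labels == j].mean(0) if np.any(labels == j)
                    else centers[j] for j in range(k)])
    if np.allclose(new, centers):
        break
    centers = new
```

Good embeddings make the blobs separable in the first place; the clustering algorithm then just names the groups.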