lmcinnes | 5 years ago
A Pretrained-CNN --> UMAP --> HDBSCAN pipeline can give fairly reasonable results, especially if the UMAP embedding you cluster on has more than 2 or 3 dimensions (5 to 20 often works well, depending on the data). You can, of course, still use a separate 2D UMAP to visualize the results. If you want such a pipeline packaged up, consider PixPlot from the Yale Digital Humanities Lab, which is designed for exactly this use case: https://github.com/YaleDHLab/pix-plot
* Disclaimer: I am highly biased, as an author of both HDBSCAN and UMAP implementations.