This is cool! I always find these visualizations helpful, but they do get quite big sometimes.
I created something like this for my data engineering class project. It was a temporal visualization of citation networks. It was fun to see different domains like computer vision and NLP start out seemingly separate and then become pretty coupled with each other as time went on.
OP is also the author of the popular dimensionality reduction algorithm UMAP.
BenoitP|1 year ago
A big surprise for me was finding the explainability cluster quite far from the causality one. But I guess it stems from a cultural facet: causality is mainly the purview of statistics (with Pearl at the helm), with a strong medical sciences focus, whereas explainability is more of a reaction to the algorithms used in industry (trees, GLMs, neural networks, etc.), which you deploy for performance and only then care about knowing the why.
pyromaker|1 year ago
[shameless plug]
I created a personalized newsletter for arXiv (which I plan to expand to other sources) so you can receive the latest research papers on the topics you care about. You can also filter by keywords (e.g. only give me LLM- or RAG-related papers).
https://app.scholars.io
lbeckman314|1 year ago
https://github.com/TutteInstitute/datamapplot
esafak|1 year ago
rdedev|1 year ago
fleischhauf|1 year ago
HanClinto|1 year ago
BenoitP|1 year ago
I guess the pipeline was embedding the documents with an LLM (or even a plain old word2vec average over the abstract might do it), and then reducing that to 2 dimensions with UMAP using a cosine metric.
I have no idea about colors and local cluster naming though. Maybe that's handcrafted.
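To make that guess concrete, here is a minimal sketch of such a pipeline. Nothing here is confirmed to be what OP did: `embed` is a toy hashed bag-of-words stand-in for an LLM or word2vec embedding, and `DIM`, `embed`, `cosine_distance` and `reduce_to_2d` are hypothetical names picked for illustration. The 2D step assumes the third-party `umap-learn` and `numpy` packages are installed.

```python
import math
import zlib
from collections import Counter

DIM = 64  # hypothetical embedding width, chosen only for illustration


def embed(text, dim=DIM):
    """Hashed bag-of-words vector: a toy stand-in for an LLM or word2vec embedding."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        # crc32 is stable across runs, unlike Python's built-in hash() for strings
        vec[zlib.crc32(token.encode("utf-8")) % dim] += count
    return vec


def cosine_distance(a, b):
    """1 - cosine similarity, i.e. the distance the guessed pipeline would use."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def reduce_to_2d(vectors, n_neighbors=15):
    """Project the vectors to 2D with UMAP under the cosine metric.

    Assumes umap-learn and numpy are installed; imported lazily so the
    rest of the sketch runs without them.
    """
    import numpy as np
    import umap  # third-party: pip install umap-learn

    reducer = umap.UMAP(n_components=2, metric="cosine", n_neighbors=n_neighbors)
    return reducer.fit_transform(np.asarray(vectors))
```

Calling `reduce_to_2d([embed(a) for a in abstracts])` on a list of abstracts should give one (x, y) point per paper. The colors and cluster names would still need a separate step.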
dkural|1 year ago