top | item 35644972

jayalammar | 2 years ago

There's a lot you can do with the vectors themselves without needing to embed any more text (e.g., clustering, exploration, and visualization after dimensionality reduction). Here's a previous embeddings exploration of top HN posts: https://txt.cohere.com/combing-for-insight-in-10-000-hacker-... A lot of that code can be reused here as well.
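As a rough illustration of working with the vectors alone, here is a minimal sketch that finds the passages most similar to one already in the dataset, so no new text has to be embedded. The toy 3-dimensional vectors and the helper names `cosine` and `nearest` are made up for this example; real embeddings would have far more dimensions and would be loaded from the released files.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(index, vectors, k=2):
    # Use an existing row as the query, so nothing new needs embedding.
    query = vectors[index]
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors) if i != index]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy stand-ins for precomputed embeddings (hypothetical values).
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.9, 0.1],
]

neighbors = nearest(0, vectors, k=2)
print(neighbors)
```

For a dataset of any real size you would probably swap the pure-Python loop for a vectorized or approximate nearest-neighbor library; the loop is only meant to show the idea.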

If you want to query for a search term, you can use a trial API key, which is free for prototyping. The embedding model itself is not open source, though. [co-author of the post here]

minimaxir|2 years ago

If that's the intent, IMO the released dataset should have more metadata (e.g. paragraph heading, article taxonomy).

jayalammar|2 years ago

How would you add that data? As new columns, you mean? Or by adding the paragraph headings to the text of the paragraphs before embedding them?
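To make the two options concrete, here is a small sketch of what each could look like for a single passage. The `Passage` class, its field names, and the `title / heading` prefix format are all hypothetical and are not the actual schema of the released dataset.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    # Hypothetical fields, chosen only for illustration.
    title: str
    heading: str
    text: str

def as_columns(p):
    # Option 1: keep metadata in separate columns; only `text` gets embedded.
    return {"text": p.text, "title": p.title, "heading": p.heading}

def as_embedding_input(p):
    # Option 2: prepend the metadata to the text before embedding,
    # so the vector itself carries the heading context.
    return f"{p.title} / {p.heading}\n{p.text}"

p = Passage(title="Example article", heading="History", text="It began in 1900.")
print(as_columns(p))
print(as_embedding_input(p))
```

The first option leaves the existing vectors valid and lets you filter or group by metadata; the second changes what gets embedded, so it would require recomputing the embeddings.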