item 41826839

Show HN: A dataset of all HN submission texts (2006-2024) in Markdown

1 point | shutty | 1 year ago | huggingface.co

At nixiesearch.ai we're building yet another search engine over HN, but we couldn't find any public dataset of the actual submission texts, so we scraped one!

TL;DR: 2.1M texts; around 55% of all stories are still available online.


No comments yet.