Show HN: Self-host Reddit – 2.38B posts, works offline, yours forever
286 points | 19-84 | 1 month ago | github.com
The key point: This doesn't touch Reddit's servers. Ever. Download the Pushshift dataset, run my tool locally, get a fully browsable archive. Works on an air-gapped machine. Works on a Raspberry Pi serving your LAN. Works on a USB drive you hand to someone.
What it does: Takes compressed data dumps from Reddit (.zst), Voat (SQL), and Ruqqus (.7z) and generates static HTML. No JavaScript, no external requests, no tracking. Open index.html and browse. Want search? Run the optional Docker stack with PostgreSQL – still entirely on your machine.
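As a rough illustration of the "dump in, static HTML out" idea, here is a minimal sketch using only the Python standard library. It is not the project's actual code. It assumes the dump has already been decompressed into newline-delimited JSON, and that each record has `title`, `author`, and `selftext` fields; the `render_posts` helper is hypothetical.

```python
import html
import json
from typing import Iterable


def render_posts(lines: Iterable[str]) -> str:
    """Render newline-delimited JSON post records as one static HTML page."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        post = json.loads(line)
        # Escape every field: archived text is untrusted user content.
        title = html.escape(post.get("title", "(untitled)"))
        author = html.escape(post.get("author", "[unknown]"))
        body = html.escape(post.get("selftext", ""))
        items.append(
            f"<article><h2>{title}</h2><p>by {author}</p><p>{body}</p></article>"
        )
    # No scripts and no external resources: the page works offline.
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>Archive</title></head><body>"
        + "".join(items)
        + "</body></html>"
    )


# Assumed sample records, shaped like decompressed dump lines.
sample = [
    '{"title": "Printer driver fix", "author": "alice", "selftext": "Use <b>v2</b>"}',
    '{"title": "Second post", "author": "bob", "selftext": ""}',
]
page = render_posts(sample)
```

Writing `page` to `index.html` gives a file that opens directly in a browser with no server involved.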
API & AI Integration: Full REST API with 30+ endpoints – posts, comments, users, subreddits, full-text search, aggregations. Also ships with an MCP server (29 tools) so you can query your archive directly from AI tools.
Self-hosting options:
- USB drive / local folder (just open the HTML files)
- Home server on your LAN
- Tor hidden service (2 commands, no port forwarding needed)
- VPS with HTTPS
- GitHub Pages for small archives
Why this matters: Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away.
Scale: Tens of millions of posts per instance. PostgreSQL backend keeps memory constant regardless of dataset size. For the full 2.38B post dataset, run multiple instances by topic.
How I built it: Python, PostgreSQL, Jinja2 templates, Docker. Used Claude Code throughout as an experiment in AI-assisted development. Learned that the workflow is "trust but verify" – it accelerates the boring parts but you still own the architecture.
Live demo: https://online-archives.github.io/redd-archiver-example/
GitHub: https://github.com/19-84/redd-archiver (Public Domain)
Pushshift torrent: https://academictorrents.com/details/1614740ac8c94505e4ecb9d...
Aurornis|1 month ago
What I'd really like is a plugin that automatically pulls from archives somewhere and replaces deleted comments and those bot-overwritten comments with the original context.
Reddit is becoming maddening to use because half the old links I click have comments overwritten with garbage out of protest for something. Ironically the original content is available in these archives (which are used for AI training) but now missing for actual users like me just trying to figure out how someone fixed their printer driver 2 years ago.
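The core of such a plugin could look roughly like the sketch below. It assumes comments are dicts with `id` and `body` keys and that the archive is available as a simple id-to-body mapping; `restore_comments` and the `restored` flag are hypothetical names. It only handles the `[deleted]` and `[removed]` placeholders; spotting bot-overwritten comments would need a separate heuristic.

```python
from typing import Dict, List

# Placeholder bodies that mark a comment as gone.
PLACEHOLDERS = {"[deleted]", "[removed]"}


def restore_comments(live: List[dict], archive: Dict[str, str]) -> List[dict]:
    """Return copies of live comments with placeholder bodies swapped for archived text."""
    restored = []
    for comment in live:
        body = comment.get("body", "")
        original = archive.get(comment.get("id"))
        if body.strip() in PLACEHOLDERS and original is not None:
            # Copy instead of mutating, and flag the substitution for the UI.
            comment = {**comment, "body": original, "restored": True}
        restored.append(comment)
    return restored


# Assumed sample data for illustration.
live = [
    {"id": "c1", "body": "[deleted]"},
    {"id": "c2", "body": "Still here"},
    {"id": "c3", "body": "[removed]"},
]
archive = {"c1": "Reinstall the driver from the vendor site."}
result = restore_comments(live, archive)
```

Comments with no archived copy, like `c3` here, are left untouched rather than guessed at.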
anonymous908213|1 month ago
accrual|1 month ago
I read "it's maddening because ... they decided to use their autonomy and..." and I stop there. So be it.
Gander5739|1 month ago
NickNaraghi|1 month ago
19-84|1 month ago
reddit: https://github.com/19-84/redd-archiver/blob/main/tools/subre...
voat: https://github.com/19-84/redd-archiver/blob/main/tools/subve...
ruqqus: https://github.com/19-84/redd-archiver/blob/main/tools/guild...
diggings|1 month ago
You've probably come across this already but there are alternative archives to PushShift that may have differing sets of posts and comments (perhaps depending on removal request coverage?)
One is Arctic Shift: https://github.com/ArthurHeitmann/arctic_shift/releases
Another is PullPush: https://pullpush.io/
m463|1 month ago
sort of like forking a project.
19-84|1 month ago
registry readme: https://github.com/19-84/redd-archiver/blob/main/docs/REGIST...
register instances: https://github.com/19-84/redd-archiver/blob/main/.github/ISS...
feconroses|1 month ago
19-84|1 month ago
https://github.com/ArthurHeitmann/arctic_shift/releases
Arctic Shift https://academictorrents.com/browse.php?search=RaiderBDev
Watchful1 https://academictorrents.com/browse.php?search=Watchful1
alcroito|1 month ago
There's no `.env.example` file to copy from. And even if the env vars are set manually, there are issues with the mentioned volumes not existing locally.
Seems like this needs more polish.
19-84|1 month ago
https://github.com/19-84/redd-archiver/commit/0bb103952195ae...
the docs have been updated with mkdir steps
https://github.com/19-84/redd-archiver/commit/c3754ea3a0238f...
elSidCampeador|1 month ago
19-84|1 month ago
twobitshifter|1 month ago
19-84|1 month ago
nick007x|1 month ago
https://huggingface.co/datasets/nick007x/pushshift-reddit
It’s handy for grabbing individual months or subreddit slices without needing to pull the full torrent. Might be useful for smaller-scale archiving or testing.
nick007x|1 month ago
[deleted]
bkovacev|1 month ago
19-84|1 month ago
data catalog readme: https://github.com/19-84/redd-archiver/blob/main/tools/READM...
reddit data: https://github.com/19-84/redd-archiver/blob/main/tools/subre...
tourist2d|1 month ago
[deleted]
justsomehnguy|1 month ago
EDIT: Is there any cheap way to search? I have an MS TechNet archive which is useless without search, so I really want to know a way to have a cheap local search w/o grepping everything.
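One cheap option for this kind of local search is SQLite's FTS5 full-text index, which ships with most Python builds. The sketch below is a generic illustration, not something from redd-archiver. It assumes your SQLite build has FTS5 enabled; the `docs` table, the `search` helper, and the sample documents are all hypothetical.

```python
import sqlite3

# In-memory for the example; pass a file path to keep the index on disk.
conn = sqlite3.connect(":memory:")

# `path` is stored but not indexed, so only the body text is searched.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("kb/printer-driver.html", "How to reinstall a printer driver on Windows"),
        ("kb/dns.html", "Configuring DNS zones"),
        ("kb/spooler.html", "Printer spooler errors"),
    ],
)


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    """Return paths of documents matching every term in the query, best match first."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


hits = search(conn, "printer driver")
```

Indexing happens once up front; after that, queries hit the index instead of scanning every file the way `grep` does.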
19-84|1 month ago
vivzkestrel|1 month ago
- details probably include the 400 million youtube accounts, channel id, name, creator url, etc
19-84|1 month ago
blks|1 month ago
unknown|1 month ago
[deleted]
blks|1 month ago
dvngnt_|1 month ago
drob518|1 month ago
tetrisgm|1 month ago
inquirerGeneral|1 month ago
[deleted]
Jordan-117|1 month ago
[deleted]
19-84|1 month ago
apstls|1 month ago
devilsdata|1 month ago
metaPushkin|1 month ago
diggyhole|1 month ago
kylehotchkiss|1 month ago
layer8|1 month ago
19-84|1 month ago
syngrog66|1 month ago
nullandvoid|1 month ago
It's an open forum, similar to here: whatever I post is in a public forum, and therefore I expect it to be used / remixed however anyone wants.
devilsdata|1 month ago
antisthenes|1 month ago