
Show HN: I bootstrapped a podcast search engine in Rust (1 yr update)

3 points | lukaesch | 2 months ago

A year ago, I shared my journey bootstrapping Audioscrape in Rust. Back then: 500 users, SQLite, 4k LoC in main.rs, running on a $7/month VM.

Today: 25,000+ transcribed episodes, knowledge graph with AI-extracted entities, and still running lean.

What changed:

Tech evolution: SQLite → PostgreSQL (scale). Added OpenSearch for full-text + semantic search. Self-hosted WhisperX on 2 GPUs (~100 episodes/hour). OpenAI for entity extraction (people, companies, topics). Still Rust/Axum, now ~15k LoC.

New features: Speaker diarization (who said what) using voice fingerprinting. Entity pages linking mentions across episodes. Timestamp-based sharing and deep linking. MCP server for AI agents to search podcasts.
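To make the entity pages and timestamp deep links a bit more concrete, here is a minimal sketch of how mentions could be indexed per entity and turned into shareable links. This is not Audioscrape's actual code: the `Mention` and `EntityIndex` types, the `/episodes/{slug}?t={secs}` URL layout, the `t` query parameter, and the lowercase name normalization are all assumptions made for illustration, and it uses only the standard library.

```rust
use std::collections::BTreeMap;

/// One place where an entity was mentioned (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Mention {
    episode_slug: String,
    start_secs: u64,
}

/// Maps a normalized entity name to every mention across episodes.
#[derive(Default)]
struct EntityIndex {
    mentions: BTreeMap<String, Vec<Mention>>,
}

/// Build a link that starts playback at `start_secs`.
/// The URL layout is an assumption, not the site's real scheme.
fn deep_link(base: &str, slug: &str, start_secs: u64) -> String {
    format!("{}/episodes/{}?t={}", base.trim_end_matches('/'), slug, start_secs)
}

/// Extract the `t` query parameter (whole seconds) from a link, if present.
fn parse_start(url: &str) -> Option<u64> {
    let query = url.split_once('?')?.1;
    // Ignore any fragment after the query string.
    let query = query.split('#').next().unwrap_or(query);
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "t")
        .and_then(|(_, value)| value.parse().ok())
}

/// Render seconds as H:MM:SS for display next to a link.
fn display_timestamp(secs: u64) -> String {
    format!("{}:{:02}:{:02}", secs / 3600, (secs % 3600) / 60, secs % 60)
}

impl EntityIndex {
    /// Record that `entity` was mentioned in an episode at `start_secs`.
    fn add(&mut self, entity: &str, episode_slug: &str, start_secs: u64) {
        self.mentions
            .entry(entity.trim().to_lowercase())
            .or_default()
            .push(Mention { episode_slug: episode_slug.to_string(), start_secs });
    }

    /// All deep links for one entity, in insertion order.
    fn links_for(&self, base: &str, entity: &str) -> Vec<String> {
        self.mentions
            .get(&entity.trim().to_lowercase())
            .map(|ms| {
                ms.iter()
                    .map(|m| deep_link(base, &m.episode_slug, m.start_secs))
                    .collect()
            })
            .unwrap_or_default()
    }
}

fn main() {
    let mut index = EntityIndex::default();
    index.add("Rust", "episode-a", 3723);
    index.add("rust ", "episode-b", 95);

    for link in index.links_for("https://example.com", "Rust") {
        let secs = parse_start(&link).unwrap_or(0);
        println!("{} ({})", link, display_timestamp(secs));
    }
}
```

A real entity index would presumably live in PostgreSQL or OpenSearch rather than in memory, and real entity matching needs more than lowercasing; the sketch only shows the shape of the data that ties a mention to a playable timestamp.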

What stayed the same:

Solo developer. Bootstrapped, no VC. Rust for everything backend. Obsessive cost optimization.

Current stats:

25,000+ transcribed episodes. Top podcasts: JRE, Lex Fridman, Huberman Lab, etc. Pipeline processes 100+ episodes/hour. Still under $100/month infra (excluding GPUs).

Learnings from year one:

Rust's async ecosystem is production-ready. SQLx migrations saved me during the PostgreSQL switch. Entity extraction is harder than transcription. SEO matters more than I expected for discovery.

2025 goals:

API access for developers. Real-time transcription for live podcasts. Improved semantic search with custom embeddings.

Try it: https://www.audioscrape.com

Search is free, no account needed. Would love feedback on the search UX and what features would make this useful for your workflow.
