Show HN: Scrape websites into queryable Gemini RAG knowledge bases

1 point | yoloshii | 2 months ago | apify.com

Simple Apify actor that scrapes websites and indexes them in Google's new Gemini File Search API (launched Nov 6).

The workflow: Scrape → Clean content → Upload to Gemini → Get permanent queryable knowledge base with automatic citations.
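To give a rough idea of what the "Clean content" step can involve, here is a minimal sketch using only Python's standard library. It is not the actor's actual cleaning code, which the post does not show. The tag list in `SKIP_TAGS` and the names `MainTextExtractor` and `clean_html` are assumptions made up for this illustration.

```python
from html.parser import HTMLParser

# Assumption: treating these tags as boilerplate is a guess for illustration,
# not the actor's real rule set.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects visible text while ignoring anything nested in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def clean_html(html: str) -> str:
    """Return the page text with navigation and similar chrome removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Calling `clean_html(page_source)` on a scraped page returns plain text that could then be uploaded as a document. A real cleaner would likely need site-specific rules on top of this.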

Technical approach:
- Intelligent scraper selection from 5 Apify-native scrapers
- Automatic content cleaning (strips nav/ads/fluff)
- Uploads to Gemini File Search (persistent storage)
- Per-page PPE pricing ($0.02 start, $0.0015/page)
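For anyone weighing the pricing, here is a small sketch of the arithmetic. It assumes the $0.02 start fee is charged once per run and the $0.0015 fee once per scraped page, with no other charges. The function name `estimate_run_cost` is made up for this example.

```python
from decimal import Decimal

START_FEE = Decimal("0.02")       # per-run start charge quoted in the post
PER_PAGE_FEE = Decimal("0.0015")  # per-page charge quoted in the post


def estimate_run_cost(pages: int) -> Decimal:
    """Estimate the cost in USD of one run, assuming start fee + linear page fee."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return START_FEE + PER_PAGE_FEE * pages


if __name__ == "__main__":
    for pages in (10, 100, 1000):
        print(f"{pages} pages: ${estimate_run_cost(pages)}")
```

Under those assumptions a 100-page documentation site comes to $0.17 per run and a 1,000-page site to $1.52. Gemini API usage would be billed separately by Google.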

Potential use cases:
- Turn documentation into AI chatbots
- Make company wikis naturally searchable
- Build RAG apps without managing vector databases

In short, a single Apify actor run handles your entire Gemini File Search RAG ingestion, all in one.

Agents can also run the actor programmatically through Apify's Actors MCP server.

Built this over a weekend for the Apify 1M Actor Challenge. It's my first Apify actor, so curious to hear if the pricing makes sense.

Note: because of the challenge's constraints, the actor currently filters out a list of banned websites. This restriction will be lifted after the challenge ends (Jan 31, 2026).
