tyropita's comments

tyropita | 2 years ago | on: Tree of Thoughts

Documentation looks really neat and in-depth, always appreciated. Looks like you’re missing a .gitignore file. Folders like __pycache__ don’t need to be checked in.

tyropita | 3 years ago | on: The world needs a non-profit search engine

Past a certain scale, I think you can let several clients crawl the same URLs (double work) and let the most common crawl result among those clients win.
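A rough sketch of the "most common result wins" idea, assuming each client submits the page content it fetched as a string. The function name `pick_consensus` and the `min_votes` threshold are made up for illustration and are not part of mwmbl.

```python
import hashlib
from collections import Counter


def pick_consensus(submissions, min_votes=2):
    """Return the most commonly submitted content for one URL.

    Returns None when there are no submissions, when the top result has
    fewer than `min_votes` votes, or when two results tie for first place.
    `min_votes` is a hypothetical tuning knob, not a known mwmbl setting.
    """
    if not submissions:
        return None

    # Compare by hash so large pages are cheap to count.
    digests = [hashlib.sha256(s.encode("utf-8")).hexdigest() for s in submissions]
    ranked = Counter(digests).most_common(2)

    top_digest, top_votes = ranked[0]
    if top_votes < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None  # tie: no clear winner

    # Hand back the original content that produced the winning hash.
    return submissions[digests.index(top_digest)]
```

A single rogue client gets outvoted as long as at least `min_votes` honest clients crawled the same URL and saw the same page.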

And since you control which URLs get crawled, you protect yourself against rogue clients sending arbitrary URLs.
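One way that check could look, as a minimal sketch: the server remembers which URLs it handed to which client and only accepts results for those. `CrawlCoordinator` and its methods are hypothetical names, not the actual mwmbl API.

```python
class CrawlCoordinator:
    """Tracks which URLs were assigned to which client (illustrative only)."""

    def __init__(self):
        # client_id -> set of URLs that client is still expected to report on
        self._assigned = {}

    def assign_batch(self, client_id, urls):
        """Record a batch of server-chosen URLs for a client and return it."""
        self._assigned.setdefault(client_id, set()).update(urls)
        return list(urls)

    def accept_result(self, client_id, url):
        """Accept a result only if this URL was assigned to this client."""
        pending = self._assigned.get(client_id, set())
        if url not in pending:
            return False  # unsolicited or already reported URL
        pending.discard(url)
        return True
```

For example, after `assign_batch("client-1", ["https://example.com/"])`, a result for `https://example.com/` from `client-1` is accepted once, and anything else from that client is rejected.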

There certainly are a lot of elegant ways to reduce spam for this particular problem imo.

tyropita | 3 years ago | on: The world needs a non-profit search engine

Quite a neat way to crawl websites using a browser extension. That by itself is a form of donation to the search engine. Maybe in the future you could offer dedicated software that users can self-host to crawl and index websites for mwmbl? Kinda like Folding@home.

How are the batches of URLs to be crawled generated/discovered and posted to your API?

How do you deal with duplicate crawls?
