ikinsey | 4 years ago

I've been developing web crawlers for the better part of a decade. I've used them for various purposes, such as cataloging sentiment/bias in news media, finding new TV shows to watch, and mapping out the Tor hidden service directory.

Currently, I am writing a web crawler application framework in Go. I'm always looking for help or new ideas on what to crawl next!

Emails welcome, check my profile.

vivegi|4 years ago

I posted this on Ask HN (https://news.ycombinator.com/item?id=30096235#30096410) a few days ago.

Your crawler could perhaps be customized to crawl and publish an index of all available Progressive Web Apps. A naive way would be to check for sites that have a web app manifest file in their root folder.

Let me know if you are interested in collaborating.
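
For illustration, the naive root-folder check suggested above might look roughly like the Go sketch below. Everything in it is an assumption rather than part of any existing crawler: the file names `manifest.json` and `manifest.webmanifest` are just common conventions, the helpers `candidateURLs`, `looksLikeManifest`, and `hasManifest` are hypothetical, and the "has a name and a start_url" test is only a heuristic. Sites that declare their manifest somewhere else through a `<link rel="manifest">` tag would be missed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// manifestNames is an assumed list of conventional root-level file names.
// A site is free to put its manifest anywhere and link to it from HTML.
var manifestNames = []string{"manifest.json", "manifest.webmanifest"}

// candidateURLs builds the root-level manifest URLs to probe for a site.
func candidateURLs(site string) ([]string, error) {
	base, err := url.Parse(site)
	if err != nil {
		return nil, err
	}
	if base.Scheme == "" || base.Host == "" {
		return nil, fmt.Errorf("not an absolute URL: %q", site)
	}
	out := make([]string, 0, len(manifestNames))
	for _, name := range manifestNames {
		u := url.URL{Scheme: base.Scheme, Host: base.Host, Path: "/" + name}
		out = append(out, u.String())
	}
	return out, nil
}

// looksLikeManifest is a heuristic: the body must be a JSON object with a
// name (or short_name) and a start_url. This is an assumption, not a rule.
func looksLikeManifest(body []byte) bool {
	var m struct {
		Name      string `json:"name"`
		ShortName string `json:"short_name"`
		StartURL  string `json:"start_url"`
	}
	if err := json.Unmarshal(body, &m); err != nil {
		return false
	}
	return (m.Name != "" || m.ShortName != "") && m.StartURL != ""
}

// hasManifest fetches each candidate URL and applies the heuristic.
func hasManifest(client *http.Client, site string) (bool, error) {
	urls, err := candidateURLs(site)
	if err != nil {
		return false, err
	}
	for _, u := range urls {
		resp, err := client.Get(u)
		if err != nil {
			continue // this sketch treats network errors as "not found"
		}
		// Read at most 1 MiB so an oversized response cannot exhaust memory.
		body, readErr := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
		resp.Body.Close()
		if readErr == nil && resp.StatusCode == http.StatusOK && looksLikeManifest(body) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	// Sites to check are passed as command-line arguments.
	for _, site := range os.Args[1:] {
		ok, err := hasManifest(client, site)
		fmt.Println(site, ok, err)
	}
}
```

Run it with one or more site URLs as arguments. A real crawl would also need politeness controls (robots.txt, rate limiting), which are left out here.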

ikinsey|4 years ago

This looks very doable. It would only require two modules: one for analysis and one for frontier management. It would be great to collaborate on something like this!
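
For readers unfamiliar with the term, a crawl "frontier" is the set of URLs waiting to be fetched. The framework discussed here is not shown in the thread, so the Go sketch below is purely hypothetical: the `Frontier` type and its `Push` and `Pop` methods are invented names, and the design (a FIFO queue plus a seen-set, with no prioritization or per-host politeness) is the simplest one that works.

```go
package main

import "fmt"

// Frontier is a hypothetical FIFO queue of URLs that ignores duplicates.
type Frontier struct {
	queue []string
	seen  map[string]bool
}

// NewFrontier returns an empty frontier.
func NewFrontier() *Frontier {
	return &Frontier{seen: make(map[string]bool)}
}

// Push enqueues u unless it was pushed before; it reports whether u was added.
func (f *Frontier) Push(u string) bool {
	if f.seen[u] {
		return false
	}
	f.seen[u] = true
	f.queue = append(f.queue, u)
	return true
}

// Pop returns the oldest queued URL, or false when the frontier is empty.
func (f *Frontier) Pop() (string, bool) {
	if len(f.queue) == 0 {
		return "", false
	}
	u := f.queue[0]
	f.queue = f.queue[1:]
	return u, true
}

func main() {
	f := NewFrontier()
	f.Push("https://example.com/")
	f.Push("https://example.org/")
	f.Push("https://example.com/") // duplicate, ignored
	for {
		u, ok := f.Pop()
		if !ok {
			break
		}
		fmt.Println(u)
	}
}
```

An analysis module would then consume each popped URL and push any newly discovered links back onto the frontier.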

fsflover|4 years ago

What's your opinion about YaCy: https://yacy.net?

ikinsey|4 years ago

YaCy is a great tool! Haven't used it all that extensively since 2012. Very good for setting up simple crawls with minimal configuration or for crawling intranets.

datavirtue|4 years ago

Any tips or resources for getting into web crawling?