mnmkng|1 year ago
Crawlee isn’t any less configurable than Scrapy. It just uses different (in my personal opinion, more approachable) patterns. That makes it easier to start with, but you can still tweak whatever you want. Btw, you can add middleware in the Crawlee Router.
mdaniel|1 year ago
Oh, then I have obviously overlooked how one would determine that a proxy has been blocked and evict it from the pool <https://github.com/rejoiceinhope/scrapy-proxy-pool/blob/b833...> . Or how to use an HTTP cache independent of the "browser" cache (e.g. to allow short-circuiting the actual request if I can prove it is not stale for my needs, which enables recrawls to fix logic bugs, or even downloading the actual request-response payloads for making better tests) https://docs.scrapy.org/en/2.11/topics/downloader-middleware...
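For readers unfamiliar with the pattern being referenced: in Scrapy, this kind of logic lives in a downloader middleware that sits between the engine and the transport. A minimal sketch of the ban-detection/eviction idea, assuming illustrative names (`ProxyPool`, `BanDetectionMiddleware` are hypothetical, not any library's real API):

```python
import random

# Status codes we treat as evidence a proxy is burned (assumption;
# real ban detection is usually site-specific).
BAN_CODES = {403, 429}

class ProxyPool:
    """Hypothetical pool of live proxies with eviction."""

    def __init__(self, proxies):
        self.alive = set(proxies)

    def pick(self):
        if not self.alive:
            raise RuntimeError("proxy pool exhausted")
        return random.choice(sorted(self.alive))

    def evict(self, proxy):
        self.alive.discard(proxy)

class BanDetectionMiddleware:
    """Scrapy-style middleware sketch: attach a proxy on the way out,
    evict it if the response looks like a ban."""

    def __init__(self, pool):
        self.pool = pool

    def process_request(self, request):
        # Tag each outgoing request with a proxy from the pool.
        request.meta["proxy"] = self.pool.pick()

    def process_response(self, request, response):
        # If the response status looks like a ban, drop that proxy
        # so later requests never pick it again.
        if response.status in BAN_CODES:
            self.pool.evict(request.meta["proxy"])
        return response
```

The point of the hook pair is that eviction happens centrally, without any spider code knowing proxies exist; an independent HTTP-cache middleware would slot into the same chain, answering from the cache before the request ever reaches the downloader.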
Unless you meant what I said about "pip install -e && echo glhf", in which case, yes, it's a "simple matter of programming" into a framework that was not designed to be extended.
8organicbits|1 year ago