(no title)
livando | 7 years ago
I disagree on this point. Starting with a single threaded model allowed my team to scale quickly and with little additional overhead. What we have lost with performance we gained in simplicity and developer productivity. That being said tuning and porting portions of the app to a multi-threaded system is slotted to take place within the next year.
Start with single threaded and simple, move to multi-threaded scrapers when the juice is worth the squeeze.
pdimitar|7 years ago
I've done several very amateur scrapers in the last several years, I am never going back to languages with a global interpreter lock, ever.
iooi|7 years ago
Now that I think about it, it's even less than 4 lines:
from multiprocess.pool import Pool (or ThreadPool)
pool = Pool()
pool.map(scrape, urls)
detaro|7 years ago