mozillas|7 years ago
I also remember a side project that involved crawling HN and displaying the books most often mentioned in the comments. I think the author was making a bit of money by linking each book to Amazon with an affiliate code. You could do something similar for popular Wikipedia articles, for example. Or you could use Reddit as a source and, instead of looking for popular books, look for sneakers or music.
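To make the "most mentioned" idea concrete, here is a rough sketch of the counting step only. It assumes you have already fetched the comments as plain strings and that you have your own list of titles to look for; `count_mentions` and both inputs are names made up for illustration, not how that project actually worked.

```python
import re
from collections import Counter


def count_mentions(comments, titles):
    """Count how many comments mention each title (case-insensitive).

    `comments` is a list of strings, `titles` a list of known titles.
    Both are assumed inputs; fetching them is out of scope here.
    """
    counts = Counter()
    for text in comments:
        for title in titles:
            # Require a non-word character (or the string edge) on both
            # sides, so "Dune" does not match inside "dunes".
            pattern = r"(?<!\w)" + re.escape(title) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[title] += 1
    return counts


sample_comments = [
    "I loved Dune and SICP",
    "The dunes were sandy",
    "SICP changed how I think, Dune too",
]
mentions = count_mentions(sample_comments, ["Dune", "SICP"])
```

Each comment counts at most once per title, so one person repeating a title ten times doesn't skew the ranking. `mentions.most_common(10)` would then give you the list to display.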
Of course, I don't know if these ideas are too simple or too complicated, interesting or not for you and the person who will give you a grade.
maceurt|7 years ago
On a related note, a lot of the comments mention scraping some sort of data from a website on a continual basis. Would I create a script that runs on a separate worker and writes its data to the database, which the web server then reads? Or would I just have the web server itself fetch the data and write it to the database?
mozillas|7 years ago
That's what I did (the web server fetching the data itself), but I might have had different requirements. If you don't have a lot to crawl and you don't have to do it very often (once a week or less), you can probably space out the requests enough that the server doesn't feel it. In that case it also helps a lot to cache the website itself. I think it depends a lot on the requirements of the project. Using two machines is probably safer, though it might complicate things a bit.
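For what it's worth, the separate-worker variant could look roughly like the sketch below: one script fetches pages with a pause between requests and writes them to a database, and the web server only ever reads from that database. SQLite, the `pages` table, and all the function names are assumptions picked to keep the example self-contained, not a recommendation for a particular stack.

```python
import sqlite3
import time
import urllib.request


def init_db(conn):
    # Hypothetical schema: one row per URL, overwritten on each crawl.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body TEXT, fetched_at REAL)"
    )
    conn.commit()


def fetch(url):
    # Plain standard-library HTTP GET, decoded as UTF-8.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_once(conn, urls, fetch_fn=fetch, delay_seconds=5.0):
    """Worker side: fetch each URL, pausing between requests."""
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # space out the requests
        body = fetch_fn(url)
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body, fetched_at) "
            "VALUES (?, ?, ?)",
            (url, body, time.time()),
        )
        conn.commit()


def latest_pages(conn):
    """Web server side: read only, never touches the remote site."""
    return conn.execute(
        "SELECT url, body FROM pages ORDER BY url"
    ).fetchall()
```

The worker would call `init_db` once and then `crawl_once` on whatever schedule fits (a weekly cron job, say), while the web server only calls something like `latest_pages`. Passing `fetch_fn` in as a parameter makes it easy to test the worker without hitting a real site.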
Keep in mind that there's probably better technical advice out there than mine. I'm a hobbyist developer.