Start mining - FREE 100MM tweet db
I have been collecting tweets for 4 days now, using an app that I have been coding for the last 5 months.
I built this app because I wanted to make user-based recommendations and do other kinds of data mining with Mahout, and I couldn't find enough data for my experiments.
About the app: I am using a single 8 GB RAM CentOS server hosted on the Rackspace cloud, at a cost of less than 15 dollars per day. It can process about 100 (90 to 105) Twitter profiles per second. It runs at an average of 2 GB of RAM and 90% CPU. It's completely fault tolerant. It can also process other social networks using a simple parse template.
I was able to collect 90+ million tweets from more than 6 million users (the db has 20MM users) using Java, memcache, MySQL, PHP (for visualization), and a non-ACID architecture with an object-like structure (NoSQL?).
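For a rough sense of scale, the figures quoted above can be turned into averages. This is only a back-of-envelope sketch: it assumes the collector ran at a steady rate for the full 4 days, it treats the rounded numbers from the post as exact, and the variable names are made up for illustration.

```python
# Back-of-envelope averages from the numbers quoted in the post.
# Assumption: steady collection over exactly 4 days; inputs are rounded bounds.
TWEETS = 90_000_000             # "90+ million tweets" (lower bound)
USERS_WITH_TWEETS = 6_000_000   # "more than 6 million users" (lower bound)
DAYS = 4                        # "collecting tweets for 4 days"
COST_PER_DAY_USD = 15           # "less than 15 dollars per day" (upper bound)

seconds = DAYS * 24 * 60 * 60
tweets_per_second = TWEETS / seconds
tweets_per_user = TWEETS / USERS_WITH_TWEETS
max_total_cost_usd = COST_PER_DAY_USD * DAYS
min_tweets_per_dollar = TWEETS / max_total_cost_usd

print(f"average tweets per second: {tweets_per_second:.0f}")
print(f"average tweets per user:   {tweets_per_user:.0f}")
print(f"server cost upper bound:   ${max_total_cost_usd}")
print(f"tweets per dollar (min):   {min_tweets_per_dollar:,.0f}")
```

Since the tweet and user counts are lower bounds and the daily cost is an upper bound, these are ballpark figures only.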
I hope this dataset helps you get into the big data world.
The current SQL dump is too big (66 GB) to host on one of my servers, so please contact me on Skype (calufaxp) or email me at calufa{a}gmail.com if you want the data. BTW, the data is FREE...
If anyone has a server where I can upload this SQL dump and let others download it, let me know.
calufa | 15 years ago
-- 350+MM rows total --
JoachimSchipper | 15 years ago
I don't want to be mean, but this doesn't strike me as a very good idea.
MrMcDowall | 15 years ago
http://discovertext.com/osamabinladen.aspx