(no title)
mmmooo|11 years ago
So ~650M daily active users and 4PB of data warehouse data created each day means ~7MB of new data per active user per day. Given that it's a data warehouse, I'm going to guess it's not images, so that seems like a lot to me. I guess it shouldn't surprise anyone that every interaction, on and off the site, is heavily tracked.
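For anyone who wants to check the back-of-the-envelope math, here is a rough sketch. The function name `per_user_mb` and the two unit constants are my own for illustration, and whether "4PB" means decimal petabytes or binary pebibytes is an assumption either way; the ~7MB figure lines up with the binary reading, while the decimal reading comes out closer to 6MB.

```python
# Rough per-user arithmetic; names here are illustrative, not from any real API.

PB_DECIMAL = 10**15  # 1 PB in bytes (SI definition)
PIB_BINARY = 2**50   # 1 PiB in bytes (binary definition)


def per_user_mb(total_bytes: float, daily_active_users: float) -> float:
    """Average new warehouse data per daily active user, in decimal megabytes."""
    return total_bytes / daily_active_users / 1e6


dau = 650e6  # ~650M daily active users, as quoted above

print(round(per_user_mb(4 * PB_DECIMAL, dau), 2))  # 6.15
print(round(per_user_mb(4 * PIB_BINARY, dau), 2))  # 6.93
```

Either way it is an average over all warehouse data, so it only bounds what could be attributed to a single user.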
nbm|11 years ago
A lot of that data is not tied to individuals, either. For example, the access logs for the CDN (which, being on a different domain by design, does not share cookies and so is not attached to an account), even reasonably heavily sampled, are probably tens of gigabytes a day, and are rolled up into efficient forms for queries in various ways. A lot of it isn't even about requests coming through the web site/API: it may just be internal inter-service request information, or inter-datacenter flow analysis, or per-machine service metrics ("Oh, look, process A on machines B through E went from 2GB resident to 24GB in 30 seconds a few seconds before the problem manifested").
(Not that it makes too much of a difference at this scale, but it is closer to 860M daily actives.)
unknown|11 years ago
[deleted]
srcmap|11 years ago
I wonder if they can predict, with some degree of accuracy, who any particular active US user might vote for today, based on the user's graph data?