Ask HN: Who's buying all the big data?
It's clear that massive players such as Facebook and Google don't have to sell their data. They can just keep it and offer targeted marketing directly to advertisers.
But what about smaller players that don't have a strong 'big data' presence or product? Examples include:
- online only NVIDIA drivers
- anti-virus software
- chat apps such as Skype, Discord, WhatsApp, etc
- online sites such as Github, StackExchange, etc
- <shady mobile app #1402>
All of the above collect data about you in some form or another; it's a staple nowadays. Who's buying all this data? For how much?
[+] [-] dsacco|8 years ago|reply
If you productize the data as a forecast, it can cost anywhere from mid four figures to low six figures per quarter per client, depending on the data and its signal. Raw data is not particularly lucrative or useful, but a predictive forecast with a very low margin of error, built on an exclusive source of data, is both.
Smaller companies that find themselves with marketable data tend to partner with well-networked financial research companies such as 7Park. Sometimes they sell directly to clients if they have the network for it. In my experience, sophisticated quantitative hedge funds mostly internalize their data sourcing initiatives instead of purchasing from external vendors. For example, off the top of my head, Renaissance Technologies and Two Sigma both have massive internal data sourcing and analysis operations. They very rarely engage with outside data vendors (particularly the former).
All of this falls under what is usually termed "alternative data." There are basically two classes of data vendors. First, you have Foursquare, Meraki (prior to its acquisition), Airmail, etc. Basically, any company that provides a free service for things like geolocation, inbox decluttering, or financial account/statement aggregation is most likely reselling your data. For example, Yodlee resells customer receipt data. These companies source their data from a public-facing product with a sufficiently strong moat to make the data exclusive.
The second class of vendor curates data from a wide variety of sources and has no public product from which to source its own data. For example, SecondMeasure provides sophisticated analysis of the kind of receipt data that Yodlee resells. YipitData is very active in this work (though they're not actually very good at it). Most of the very good companies doing this work are pretty under the radar - they tend not to be VC-funded or well-publicized. This class is more easily replicated as an internal initiative in a hedge fund, because all of the data is technically public, just very hard to find and analyze.
[+] [-] WingH|8 years ago|reply
[+] [-] muzani|8 years ago|reply
Another major use is that it builds a wall around the product's users. It helps to recognize user patterns, build a better product, and keep users from going to similar products, e.g. Netflix.
I think a lot of people look at "big data" as some kind of gold mine, but really it's just a rock mine. You have to know which rocks are in demand by whom before you start mining all of it.
[+] [-] matt_the_bass|8 years ago|reply
I think this hits the nail on the head. Knowing how to extract value is what matters, not just having the raw data.
[+] [-] trekking101|8 years ago|reply
I've written a bit about derivative uses of data if you are interested: https://post-employment.com/category/business/