top | item 16568701

Ask HN: Who's buying all the big data?

15 points| orlp | 8 years ago | reply

With every service, platform, software package, god knows what collecting data on you, I can't help but ask: who's buying all this data?

It's clear in the case of massive players such as Facebook and Google that they don't have to sell their data. They can just keep their data and directly offer targeted marketing to advertisers.

But what about smaller players, that don't have a strong 'big data' presence and product? Examples such as:

  - online only NVIDIA drivers  
  - anti-virus software  
  - chat apps such as Skype, Discord, WhatsApp, etc  
  - online sites such as Github, StackExchange, etc  
  - <shady mobile app #1402>

All of the above collect data about you in some form or another; it's a staple nowadays. Who's buying all this data? For how much?

7 comments

[+] dsacco|8 years ago|reply

Mostly hedge funds and market research companies. Less frequently, companies looking to bootstrap computer vision or natural language processing systems without sourcing their own data.

If you productize the data as a forecast, it can cost anywhere from mid four figures to low six figures per quarter per client, depending on the data and its signal. Raw data is not particularly lucrative or useful, but a predictive forecast with a very low margin of error, built on an exclusive source of data, is both.

Smaller companies who find themselves with marketable data tend to partner with well-networked financial research companies such as 7Park. Sometimes they sell directly to clients if they have the network for it. In my experience, sophisticated quantitative hedge funds mostly internalize their data sourcing initiatives instead of purchasing from external vendors. For example, Renaissance Technologies and Two Sigma both have massive data sourcing and analysis operations internally, off the top of my head. They very rarely engage with outside data vendors (particularly the former).

All of this falls under what is usually termed "alternative data." There are basically two classes of data vendors. First, you have Foursquare, Meraki (prior to its acquisition), Airmail, etc. Basically, any company which provides a free service for things like geolocation, inbox decluttering or financial account/statement aggregation is most likely reselling your data. For example, Yodlee resells customer receipt data. These companies source their data from a public-facing product with a sufficiently strong moat to make the data exclusive.

The second class of vendor curates data from a wide variety of sources and has no public product from which to source their own data. For example, SecondMeasure provides sophisticated analysis of the kind of receipt data from Yodlee. YipitData is very active in this work (though they're not actually very good at it). Most of the very good companies doing this work are pretty under the radar - they tend not to be VC-funded or well-publicized. This class is more easily replicated as an internal initiative in a hedge fund, because all of the data is technically public, just very hard to find and analyze.

[+] WingH|8 years ago|reply

Why is YipitData not very good?

[+] muzani|8 years ago|reply

I did a grocery/food app. Grocery stores buy massive amounts of product data. They need to know what to put on the shelves, what to put on what shelf, what not to put on the shelf, which products are trending. Since they make very low margins, this is one of the main differentiators in income. A single store also has cash flow in the millions, so they're willing to spend a lot.

Another major use is building a wall around the product's users. The data helps to recognize user patterns, build a better product, and keep users from going to similar products, e.g. Netflix.

I think a lot of people look at "big data" as some kind of gold mine, but really it's just a rock mine. You have to know which rocks are in demand by whom before you start mining all of it.

[+] matt_the_bass|8 years ago|reply

> I think a lot of people look at "big data" as some kind of gold mine, but really it's just a rock mine. You have to know which rocks are in demand by whom before you start mining all of it.

I think this hits the nail on the head. Knowing how to extract value is what matters, not just having the raw data.

[+] trekking101|8 years ago|reply

One person's metadata is another's data! You (maybe unintentionally) characterise it as a market where you check boxes on "datasets" to buy, but it's far from that. Monetizing by licensing/selling some kind of log data isn't really interesting, but understanding what signal can be derived and how it can be applied to an opportunity (arbitrage, value-added analysis, financial product/insurance, etc.) is where it gets interesting.

I've written a bit about derivative uses of data if you are interested: https://post-employment.com/category/business/