dsacco | 7 years ago

> Could somebody explain why so much effort is being put into quant strategies, when it seems that real-world information gathering would be a much easier way to gain an edge over others?

I used to be part of a research group that sold the so-called "alternative data" you're describing to 30 or so hedge funds in the NYC area, including several of the largest. The example I like to give is that we knew well ahead of time that Tesla would miss on the Model 3 because we knew every vehicle they were selling by model, year, configuration, date and price with <99% accuracy. I still occasionally sell forecasts like this and the methodology is straightforward enough that even a solo investor can consistently beat the market if they know how to source the data. But I've mostly lost faith in this technique as the sole differentiator of a fund's alpha.

Some funds, like Two Sigma, have large divisions with a very sophisticated pipeline for this kind of analysis. They do exactly what you describe. For the most part it works, but there are several obstacles that keep this from being the holy grail of successful trading:

1. First and foremost, this analysis is fundamentally incomplete. You are not forecasting market movements, you're forecasting singular features of market movements. What I mean by that is that you aren't predicting the future state of a price; if the price of a security is a vector representing many dimensions of inputs, you're predicting one dimension. As a simple example, if I know precisely how many vehicles Tesla has sold, I don't know how the market will react to this information, which means I have some nontrivial amount of error to account for.

2. This analysis doesn't generalize well. If I have a bunch of information about the number of cars in Walmart parking lots, the number of vehicles sold by Tesla (with configurations), the number of online orders taken by Chipotle, etc., how should I design a data ingestion and processing pipeline to deal with all of this in a unified way? In other words, my analysis is dependent upon the kind of data I'm looking at, and I'll be doing a lot of different munging to get what I need. Each new hypothesis will require a lot of manual effort. This is fundamentally antagonistic to classification, automation and risk management.

3. It's slow. Under this paradigm you're coming up with hypotheses and seeking out unique and exclusive data to test those hypotheses. That means you're missing a lot of unknown unknowns and increasing the likelihood of finding things that other funds will also be able to find pretty easily. You are only likely to develop strategies which can have somewhat straightforward and intuitive explanations for their relationship with the data.

This is not to say the system doesn't work - it very clearly works. But it's also easy to hit relatively low capacity constraints, and it's imperfect for the reasons I've outlined. You might think exclusive data gives you an edge, but for the most part it does not (except for relatively short horizons). It's actually extremely difficult to have data which no other market participant has, and information diffusion happens very quickly. Ironically, in one of the very few times my colleagues and I had truly exclusive data (Tesla), the market did not react in a way that could be predicted by our analysis.

The most successful quantitative hedge funds focus on the math, because most data has a relatively short half-life for secrecy. They don't rely on the exclusivity of the data; they rely on superior methods for efficiently classifying and processing truly staggering amounts of it. They hire people who are extraordinarily talented at the fundamentals of mathematics and computer science because they mostly don't need or want people to come up with unique hypotheses for new trading strategies. They look to hire people who can scale up their research infrastructure even more, so that hypothesis generation and testing are automated almost entirely.

This is why I've said before that the easiest way to be hired by RenTech, DE Shaw, etc. is to be on the verge of re-discovering and publishing one of their trade secrets. People like Simons never really cared about how unique or informative any particular dataset is. They cared about how many diverse sets of data they could get and how efficiently they could find useful correlations between them. The more seemingly disconnected and inexplicable, the better.

Now with all of that said, I would still wholeheartedly recommend this paradigm for anyone with technical ability who wants to beat the market on $10 million or less (as a solo investor). A single creative and competent software engineer can reproduce much of this strategy for equities with only one or two revenue streams. You can pour into earnings positions for which your forecast predicts an outcome significantly at odds with the analyst consensus. You can also use your data to forecast volatility on a per-equity basis and sell options on those which do not indicate much volatility in the near term. Both of these are competitive for holding times ranging from days to months and, with the exception of some very real risk management complexity, do not require a large investment in research infrastructure.
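
To make the earnings idea above concrete, here is a minimal sketch in Python of the screening step only: flag names where your own forecast sits far from the analyst consensus. Everything in it is an assumption for illustration, not a description of any fund's method: the tickers `AAA` and `BBB`, the numbers, the `forecast_error` field (your own estimate of how far off your forecasts typically are), and the threshold of 2.0.

```python
from dataclasses import dataclass


@dataclass
class EarningsView:
    ticker: str            # hypothetical ticker, for illustration only
    forecast: float        # your own forecast of the reported metric (e.g. units sold)
    consensus: float       # analyst consensus for the same metric
    forecast_error: float  # assumed typical size of your forecast error, same units


def surprise_score(view: EarningsView) -> float:
    """Gap between forecast and consensus, measured in units of forecast error."""
    return (view.forecast - view.consensus) / view.forecast_error


def screen(views, threshold=2.0):
    """Keep only the names where the forecast disagrees strongly with consensus."""
    picks = []
    for v in views:
        score = surprise_score(v)
        if abs(score) >= threshold:
            direction = "beat" if score > 0 else "miss"
            picks.append((v.ticker, direction, round(score, 2)))
    return picks


# Made-up inputs: AAA is forecast well below consensus, BBB is roughly in line.
views = [
    EarningsView("AAA", forecast=90.0, consensus=100.0, forecast_error=4.0),
    EarningsView("BBB", forecast=101.0, consensus=100.0, forecast_error=4.0),
]
picks = screen(views)
print(picks)
```

Note that this only ranks disagreement with consensus. Per point 1 above, it says nothing about how the market will react once the number is reported, so position sizing and risk management still sit outside this sketch.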

Bromskloss|7 years ago

> The example I like to give is that we knew well ahead of time that Tesla would miss on the Model 3 because we knew every vehicle they were selling by model, year, configuration, date and price with <99% accuracy.

Is the way in which you got that information something you can divulge? I mean, was it talking to an employee or was it something exciting and far-fetched? By the way, I presume you meant ">99%" or something similar.

> A single creative and competent software engineer can reproduce much of this strategy

By "this strategy", do you mean prediction based on a source of "alternative data"?

Interesting comment, in any case.

downandout|7 years ago

Wow! Thanks for the detailed answer. You introduced a lot of issues I hadn’t thought of, and your last paragraph gave me some ideas.

Also...generally speaking, what does this type of information sell to hedge funds for? For something like the Tesla information for example? I would assume it's probably not millions, but somewhere in the 5-6 figures?

wumpus|7 years ago

Good example: Tesla had a miss on Model 3 production for Q1, yet the stock rose significantly. And the miss was predicted by both the fan VIN tracker and Bloomberg's VIN tracker.

I used to work for D. E. Shaw & Co.; now I work in Silicon Valley and invest my money in index funds. Much better that way.

thisisit|7 years ago

Any tips/starting point for the uninitiated?