item 11948690

Numerai, a hedge fund built by a community of anonymous data scientists

178 points | joeykrug | 9 years ago | medium.com

81 comments

dzdt | 9 years ago
Some thoughts on the business model:

* traditional hedge funds have a problem with scaling: if you put more money into the same strategy, returns go down. Numerai hopes to scale the number of strategies it employs by scaling the number of researchers participating.

* by providing researchers only opaque streams of data, they prevent researchers from leaving and competing directly. If you don't know how the data corresponds to the market, you can't replicate the trading at another fund. (Some big hedge funds like D. E. Shaw do the same!)

* researchers may still leave and compete indirectly, using the same algorithms on different market features. But by paying anonymously in bitcoin, Numerai may be hoping for the reverse, that programmers from other quant funds will anonymously moonlight for Numerai using their algorithms from those other funds.

* by being opaque with the data, Numerai keeps researchers from knowing the true value their strategies are providing. That information asymmetry is in Numerai's favor, letting them underpay even strong performers.

lordnacho | 9 years ago
Quant fund insider here.

The data is pretty pure, in the sense of not telling you any metadata at all. It's literally just a bunch of numbers and 0/1 labels.

It's hard to implement a strategy without knowing what exactly you're looking at. I get the feeling this "pure dataset" is part of some framework that Numerai thinks will beat the market, given good predictors.

That's not necessarily the case. Say I assume the 0/1 means up/down over some period. Well, being able to guess 0/1 correctly would obviously help. Say I'm right 70% of the time, then I can equal weight my bets and it will be just swell. But say I'm right about 51% of the time. Then it's going to take quite a while longer for the law of large numbers to work in my favour. Remember your ML algo will only be able to give you good predictions if some of the 21 features are actually meaningful, and we have no reason to think they are actually meaningful.

Now, let's say I have some domain knowledge in finance. I want to predict over/underachievement relatively. I would be able to guess which shares go up relative to others, but not the market factor. That would require a different framework to the one I'm supposing is presented here. Is there flexibility for that?
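
A minimal sketch of what such a relative target could look like, assuming one period of returns per stock: subtract the cross-sectional mean (a crude, equal-weighted proxy for the market factor) and label a stock 1 only if it beat that mean. The tickers and returns are made up, and nothing here reflects how Numerai actually builds its labels.

```python
from statistics import mean

def relative_labels(returns_by_ticker):
    """Label each stock 1 if it beat the cross-sectional mean return, else 0.

    returns_by_ticker: dict mapping a (hypothetical) ticker to one period's return.
    The equal-weighted mean is a crude proxy for the market factor.
    """
    market = mean(list(returns_by_ticker.values()))
    # Excess return strips out the common (market) move shared by all stocks.
    excess = {t: r - market for t, r in returns_by_ticker.items()}
    return {t: int(x > 0) for t, x in excess.items()}, excess

# Made-up returns for a day on which every stock fell.
day = {"AAA": -0.010, "BBB": -0.030, "CCC": -0.020, "DDD": 0.000}
labels, excess = relative_labels(day)
print(labels)  # {'AAA': 1, 'BBB': 0, 'CCC': 0, 'DDD': 1}
```

Even though every stock fell, half are labelled 1: the target carries only the relative signal, which is the kind of prediction that pairs with a market-neutral book rather than a directional one.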

The secrecy thing makes me wonder, too. If it's just a matter of not showing your work, why don't you just have a website where people submit their daily/weekly/monthly portfolios and you keep track of the tally?

valdiorn | 9 years ago
> Say I'm right 70% of the time, then I can equal weight my bets and it will be just swell. But say I'm right about 51% of the time. Then it's going to take quite a while longer for the law of large numbers to work in my favour.

That's actually very far from being true. If you trade a single instrument, sure, the variance will kill you in anything but the very long run. But if you trade thousands of securities (say, the entire US equity market), then a 55% prediction ratio and a market-neutral strategy will absolutely crush. Even if you blindly buy/sell on every signal without doing any sort of weighting (excluding low-confidence predictions, etc.), you should see a several-sigma strategy.

It only takes a very, very small edge to make a very low risk strategy if you can diversify.

https://en.wikipedia.org/wiki/Signal_averaging

Now add on top of that the fact that they will have several low-SNR prediction signals, and the effects of signal averaging become even greater.

I'm also a "quant fund insider", as you put it...
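
The arithmetic behind both comments fits in a few lines. A minimal sketch, assuming N independent, equal-sized bets that each win +1 or lose -1 with hit rate p (real equity returns are correlated, so treat this as a best case): the expected total is N(2p-1) and its standard deviation is sqrt(N(1-(2p-1)^2)), so the edge measured in sigmas grows like sqrt(N). The function names are illustrative.

```python
import math

def edge_in_sigmas(p, n):
    """z-score of total P&L for n independent +/-1 bets with hit rate p."""
    edge = 2 * p - 1                 # expected P&L per bet
    sd = math.sqrt(1 - edge ** 2)    # standard deviation of a single +/-1 bet
    return edge * math.sqrt(n) / sd

def bets_needed(p, z=2.0):
    """Smallest number of independent bets for which the expected total reaches z sigmas."""
    edge = 2 * p - 1
    return math.ceil(z ** 2 * (1 - edge ** 2) / edge ** 2)

for p in (0.51, 0.55, 0.70):
    print(p, round(edge_in_sigmas(p, 1000), 2), bets_needed(p))
```

Under these assumptions a 55% hit rate over 1,000 independent bets is already about 3 sigma, while at 51% it takes roughly 10,000 bets to reach 2 sigma, versus about 21 at 70%. Correlation between bets shrinks the effective N, which is one reason the market-neutral construction matters.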

mikkom | 9 years ago
> I get the feeling this "pure dataset" is part of some framework that Numerai thinks will beat the market, given good predictors.

> That's not necessarily the case.

This is the part that I find most interesting. They have a hypothesis and they are testing it with real money.

They are even outsourcing computational power, which I think is very interesting, as running an ML fund with thousands of algos would probably be quite hard to scale.

gtrubetskoy | 9 years ago
I'm skeptical. There are skyscrapers in the NYCs, Londons, Singapores and Hong Kongs of this world filled with smart people who have enormous computing resources and funds, and who are paid handsomely to work on this problem with all manner of ML and AI at their disposal. The "crowd" has no advantage over them. The "closed system" is much larger than the "crowd" in this case.
mikkom | 9 years ago
Unless there are people in the "crowd" who work in machine learning / data mining in a totally different sector (say genomics/biostatistics, the example used in the article) but have no access to the hedge fund world.

The "crowd" could very well have a long list of very intelligent people who are "experts" in some other sector and who have fresh insights and want to get some anonymous extra income on top of their salary.

This said, I think their compensation seems really low.

karmacondon | 9 years ago
I don't think this is true at all. 10,000 people are just going to have more ideas and better individual ideas than 100 experts. The impact of that much creativity and perspective can be exponential, and it's hard to duplicate.

When I'm designing a system, I hate to have to try to out think everyone on the internet. If you have a known set of opponents you can predict what they might do. When you're up against anybody from anywhere, you never know what you're going to get. Global scale collaboration is a very powerful thing because it allows a complete exploration of the solution space, and it's difficult to stop.

joncooper | 9 years ago
"It is intuitively obvious that an open access hedge fund will generate more intelligence than a closed system built on a pre-internet, pre-cryptocurrency, pre-AI organizational design."

Really? Because the folks with the magic black box aren't capable of funding an Interactive Brokers account to keep 100% of their upside and 100% of their IP?

(Also: risk management and order handling are harder problems than signal generation.)

arcanus | 9 years ago
Isn't that the magic of OSS? Linux::Windows, matplotlib::mathematica, android::iPhone, etc. In each case, the free variety quickly catches up to the proprietary version, and in doing so, cuts into the profitability of the parent. Furthermore, this often breaks down monopolies, as they must innovate or die.
fpgaminer | 9 years ago
I started poking at this out of curiosity, and a desire to begin sharpening my TensorFlow axe, and one thing remains unclear. They give you two spreadsheets, one being the training data and the other the tournament data (what you need to predict on). Each entry in the spreadsheet is 21 features and a single binary class. The latter is what you predict. But for the submissions they request a probability, not a class. They don't explain what "probability" here means. Does it mean probability of class 0? Probability of class 1? Probability of the moon exploding on a Thursday?

Overall interesting idea. Undecided whether it's real/scam/fake, but definitely very interesting at face value. I just wish their documentation was more clear. Seems kind of important...

EDIT: Found a comment on Reddit that indicates that it means probability of class 1 (https://www.reddit.com/r/MachineLearning/comments/3wdr9e/num...)
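
For anyone else confused by the format: a probabilistic classifier naturally outputs P(class = 1 | features), which matches the reading in that Reddit comment. A minimal pure-Python sketch using logistic regression, assuming rows of numeric features with a 0/1 target; the toy data, learning rate, and function names are hypothetical, and the real files have 21 feature columns rather than two.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on log loss; returns (weights, bias)."""
    n_features = len(rows[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row: predicted P(class 1) minus the 0/1 label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def predict_proba_class1(w, b, rows):
    """Probability that each row belongs to class 1 (the value to submit)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in rows]

# Toy data: two features in [0, 1]; the label is driven by the first one.
train_x = [[0.1, 0.5], [0.2, 0.9], [0.3, 0.1], [0.7, 0.4], [0.8, 0.8], [0.9, 0.2]]
train_y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(train_x, train_y)
print(predict_proba_class1(w, b, [[0.15, 0.5], [0.85, 0.5]]))
```

A value near 1 means the model thinks the row is class 1; the probability of class 0 is simply 1 minus that number.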

TrickedOut | 9 years ago
Have you found any good TensorFlow examples which handle financial or time series data like this? Please do share! Most of the examples I find are either image processing or text processing. Rarely time series or traditional DB type data.
cryptokoala | 9 years ago
Numerai comes across as fraudulently abusing cryptographic buzzwords like homomorphic encryption https://medium.com/@Numerai/encrypted-data-for-efficient-mar...
alexmingoia | 9 years ago
Numerai uses order-preserving encryption, which is homomorphic. The algos are trained on the ciphertext itself.
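
One way to see why a model can train on order-preserved ciphertext: a threshold rule (the building block of decision trees) only compares values, so any strictly increasing transform leaves its predictions unchanged. The toy below uses a keyed cubic map as a stand-in for real order-preserving encryption; it is not secure, it is not Numerai's scheme, and the helper names are hypothetical.

```python
def toy_ope(x, key=(3.0, 2.0, 7.0)):
    """Strictly increasing keyed map standing in for order-preserving encryption.

    NOT secure: it only illustrates that order survives the transform.
    """
    a, b, c = key  # a, b > 0 keeps the map strictly increasing
    return a * x + b * x ** 3 + c

def fit_stump(xs, ys):
    """Pick the threshold t minimising errors for the rule 'predict 1 if x > t'."""
    values = sorted(set(xs))
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    best_t, best_err = None, float("inf")
    for t in candidates:
        err = sum(int(x > t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(t, xs):
    return [int(x > t) for x in xs]

plain_x = [-1.2, -0.4, 0.1, 0.6, 1.3, 2.0]
labels = [0, 0, 1, 0, 1, 1]
cipher_x = [toy_ope(x) for x in plain_x]

t_plain = fit_stump(plain_x, labels)    # learned on plaintext
t_cipher = fit_stump(cipher_x, labels)  # learned on "ciphertext" only
new_points = [-0.5, 0.8, 1.5]
print(predict(t_plain, new_points))
print(predict(t_cipher, [toy_ope(x) for x in new_points]))
```

The two stumps agree because neither ever adds or multiplies feature values. Models that do arithmetic on features (linear models, neural nets) only see the same ordering, not the same geometry, so how well they carry over depends on the scheme.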
bberenberg | 9 years ago
Understanding which features to create and why is significantly more impactful than just trying new models on the same dataset.
Xcelerate | 9 years ago
> Numerai was seed funded by Howard L. Morgan the co-founder of Renaissance Technologies.

Very interesting. This gives this idea some legitimacy in my opinion.

s_q_b | 9 years ago
Very much agreed.

For those who are not aware, Renaissance Technologies is a massively successful hedge fund that makes investment decisions solely from data, with perhaps the most sophisticated mathematical models in the marketplace.

Their approach was entirely novel when James Simons founded the firm. Simons is an incredible mathematician: he graduated from MIT in his teens and obtained his doctorate at 23. Before and during Renaissance, he made significant contributions to cryptology, topology, and string theory.

His firm essentially invented quantitative trading. To this day, with close to $30 billion under management, Renaissance Technologies still makes investment decisions purely algorithmically.

osullivj | 9 years ago
I see they've got Packard from Prediction Company as well. Bass, Thomas A., The Predictors, 1999, gives a good account of Packard & Doyne Farmer's work on market prediction.
powera | 9 years ago
This is where the "accredited investor" warnings are appropriate. Don't do this with your money if you aren't willing and able to lose it!

In the long run, it's impossible for people to beat the market simply by looking at historic stock prices. Impossible. If it is possible in the short run, more and more people will do it until they don't make any money at all, or a "black swan" event occurs and they go completely bankrupt. (I suppose there's a third option, that they all make so much money that the entire rest of the world goes bankrupt, but that's absurd)

So be careful!

hault | 9 years ago
As Peter Thiel says, great startup founders are those who can see the future in ways others can't. This idea certainly looks like the future to me. Very interested to see where this goes.
rgbrgb | 9 years ago
So cool. This is the first time I've heard of homomorphic encryption. In my case, Open Listings has a lot of real estate data that we're not allowed to vend programmatically (sale prices, list prices, property characteristics). It would be interesting to be able to release this data in a legally encrypted form and let data scientists train predictors. We currently have an offer creation API that's being used by algorithmic investors but they have to get their data to decide what to bid on from another source. My immediate questions maybe someone here will know the answer to...

1) Is it legal to vend a dataset that is encrypted this way if you're not allowed to vend the original? The OP implies that it is, but that seems too good to be true.

2) Is there software purpose-built for this type of thing? What's good in this domain? Our stack is mostly ruby but we're polyglots.

modeless | 9 years ago
So in exchange for giving a hedge fund a stock tip that earns 20% in a month, the guy gets $10k? That sounds like a ripoff to me! If you have the skills to do that repeatedly you can make a whole lot more than $10k doing the trades yourself.
onion2k | 9 years ago
The point is that you probably can't do it repeatedly, or predict with any confidence that you can even do it once. Numerai enables you to bet using someone else's money, with a vastly reduced reward, but no risk to you.

Numerai won this time (hence the PR piece) but I don't think we should judge their performance on one action in isolation. We should judge whether their approach works based on a year or two of trading on these predictions. Maybe longer, if reacting to unusual events (economic collapse, freak speculation on tulips, etc) is something you care about.

agorabinary | 9 years ago
Well if you have $50k capital then 20% gains = $10k is about fair. Luckily for numerai they have $1mil capital so the rewards are a bit higher...
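
The arithmetic in that comparison, as a quick sketch using the figures from this subthread ($10k payout, 20% one-month gain, $1M of Numerai capital); integer percentages keep the results exact, and the function names are illustrative.

```python
def gain(capital, percent):
    """Dollar gain from a whole-number percentage return on a capital base."""
    return capital * percent // 100

def breakeven_capital(payout, percent):
    """Capital at which trading the signal yourself would earn the same payout."""
    return payout * 100 // percent

print(breakeven_capital(10_000, 20))  # 50000
print(gain(1_000_000, 20))            # 200000
```

So the $10k payout matches what a trader with about $50k of their own capital would have made on the same call, while 20% on $1M is $200k, before costs and fees, which neither comment accounts for.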
davnn | 9 years ago
Interesting idea. Not that crowd driven investment algorithms are new, but I have not seen a machine learning one before.

What really annoys me about this kind of business is that they pay tiny prices and shut the competitions once they have found what they were looking for. However, Numerai might be completely different in that regard, and I wish them the best!

Btw: The article kind of conveys the feeling that machine learning is something new to the hedge fund business, and that's absolutely not the case. Smart people have already been working on really complex algorithms for a couple of years now.

dharmon | 9 years ago
Even more than "a couple of years". About 15 years ago I worked at a day trading firm, and we were writing models that used machine learning. At the time we thought of it more as "computational statistics", but it's basically what is now called ML and taught in ML courses (although we didn't use neural nets).

BTW, even in 2001 we were far from the first to do this.

richard_craib | 9 years ago
Every other machine learning competition I've seen is kind of one-off in nature. But stock market data is being generated all the time so I think there will always be new strategies to learn on Numerai.
shostack | 9 years ago
Can you clarify on the "pay tiny prices and shut the competitions once they have found what they were looking for" comment? Has this happened before? Is there anything here that makes it seem like this wouldn't be the case here?
thedlade | 9 years ago
Exactly, machine learning has already been applied in a number of Wall Street firms for some time now
ianpurton | 9 years ago
> When you’re standing at the beginning of a super exponential curve, that’s the time to buy insurance against any negative outcomes along that curve. So today, we’re allowing users to donate Bitcoin to the Machine Intelligence Research Institute (MIRI) as a hedge against things going horribly right.

If you're the kind of person who falls for this kind of thing, then you should know I'm also standing in front of a super exponential curve raised to the power of infinity and beyond. You can also send me bitcoin as a hedge if you wish.

dharma1 | 9 years ago
I've looked at this a few times. It's like a giant ensemble. But I'm not sure ML will be able to beat chance on average on a data source like this.

And if someone discovers they are making money consistently on numerai, I think they would set up their own fund quite quickly.

I do like the encrypted system though, could be used for other ML competitions where you don't want to give your model away
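
The "giant ensemble" intuition, and the signal-averaging point made further up the thread, can be checked with a small simulation. A minimal sketch, assuming K predictors whose errors are independent Gaussian noise around the same true signal (optimistic for real submissions, which probably share features and therefore correlate): averaging them cuts the noise standard deviation by about sqrt(K). The function name and parameters are illustrative.

```python
import random
import statistics

def ensemble_noise_sd(k, trials=5_000, noise_sd=1.0, seed=0):
    """Empirical std dev of the error of an average of k independent noisy predictors."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each predictor reports the true signal plus its own independent noise,
        # so the ensemble's error is just the mean of the individual noise draws.
        noise = [rng.gauss(0.0, noise_sd) for _ in range(k)]
        errors.append(sum(noise) / k)
    return statistics.pstdev(errors)

for k in (1, 4, 16, 64):
    print(k, round(ensemble_noise_sd(k), 3), round(1 / k ** 0.5, 3))
```

If the predictors' errors instead share a pairwise correlation rho, the variance of the average tends to rho * noise_sd**2 rather than zero as K grows, so the gain from adding submissions flattens out. Averaging also only reduces noise; it cannot create an edge that none of the predictors has.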

HappyTypist | 9 years ago
I know why they're paying out Bitcoin and keeping everything anonymous. They are hoping quant hedge fund insiders submit their model to the site.
abcampbell | 9 years ago
But why did the machine want to go long Salmar ASA?
alexmingoia | 9 years ago
The same reason any trader wants to go long Salmar ASA: Their model says so. Except with numerai, the model is a machine-learning algorithm instead of a human with their bias and intuition.
brycehidysmith | 9 years ago
Does it matter? The machine saw a pattern, and it responded to the pattern. We don't need to know.