After having spent an insane amount of time in late 2017/2018 building an HFT bot for Binance I can say this is a pretty solid article.
In our case we were doing triangle trading between BTC/ETH/USDT pairs and had our buy/sell delay down to 3-7ms. At one point we were moving 0.3-0.7% of Binance’s daily volume.
A few notes:
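For readers unfamiliar with triangle trading, here is a minimal sketch of the profitability check for one loop. It is an illustration only, not the bot described here: the function name, the example prices, and the flat 0.1% fee per trade are all assumptions.

```python
def triangle_return(btc_usdt_ask: float, eth_btc_ask: float,
                    eth_usdt_bid: float, fee: float = 0.001) -> float:
    """Net multiple from one USDT -> BTC -> ETH -> USDT cycle.

    A result above 1.0 means the loop is profitable after fees.
    Prices and the fee rate are hypothetical inputs.
    """
    keep = 1.0 - fee                     # fraction kept after each trade's fee
    btc = (1.0 / btc_usdt_ask) * keep    # buy BTC with 1 USDT at the ask
    eth = (btc / eth_btc_ask) * keep     # buy ETH with that BTC at the ask
    usdt = (eth * eth_usdt_bid) * keep   # sell ETH for USDT at the bid
    return usdt


# Hypothetical quotes: the loop is only worth taking if the result exceeds 1.0.
profitable = triangle_return(10_000.0, 0.02, 201.0) > 1.0
```

Three fees are paid per loop, so the raw price discrepancy has to exceed roughly three times the fee rate before the trade makes sense, which is part of why latency matters so much.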
* Finding an objective point of truth for value when all of the currencies are floating is hard but vital to success. This was the hardest problem we encountered. We tried taking the realtime average of BTC and ETH across all exchanges, we tried tying it to the shortest route to USD, and several other routes... but ultimately this is where we ended up “losing” most of our alpha.
* Order books are seemingly simple but the devil is in the details. This especially matters for paper trading.
* Efficiently using API limits at exchanges is an optimization problem in and of itself.
* Our model was relatively simple, but we focused on speed and edge cases. For instance, Binance would rotate IPs on their load balancers, and we’d constantly check the latency between each open SSL connection and use the fastest. Further, we wouldn’t decode the buy response to plaintext; we’d just read the raw stream.
After several epic months our entire project fell apart after a cryptic phone call about “institutional access” that didn’t follow the 1s websocket update. The access was quite expensive and we said no to it, and shortly after all of our strategies went to crap.
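The connection-selection idea in the last note can be sketched roughly as follows. This is a guess at the general shape, not the original implementation: it assumes round-trip times are already being recorded per connection, and the function name and the use of the median are illustrative choices.

```python
from statistics import median
from typing import Mapping, Sequence


def fastest_connection(rtt_samples_ms: Mapping[str, Sequence[float]]) -> str:
    """Return the name of the connection with the lowest median round-trip time.

    `rtt_samples_ms` maps a (hypothetical) connection label to its recent
    latency samples in milliseconds. Connections with no samples are skipped.
    """
    medians = {name: median(samples)
               for name, samples in rtt_samples_ms.items() if samples}
    if not medians:
        raise ValueError("no latency samples recorded")
    return min(medians, key=medians.get)
```

The median is used here rather than the mean so that a single slow round trip does not disqualify an otherwise fast connection.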
Best we could tell someone was front running us due to an artificial delay for our account (delay between trades went to ~20ms up from our prior steady speed of 3-7ms) and/or a bunch of the trades in the orderbook were bogus.
Frustrated, we tried our strategy on another account; the delay dropped again to our normal range and the strategy was profitable again (the orderbooks were slightly different between bots!).
It was in that moment we realized playing in unregulated markets is not fun or something we wanted to continue to do. Intermediary risk was something we didn’t account for.
Further we realized that there will always be a better resourced or more dedicated team willing to fight you for your alpha.
After months of effort and a ton of fun we decided it was best we went back and focused on a problem where we could build a long term competitive advantage.
Even in regulated markets there are problems. I have a friend who built an HFT algorithm that he was using for trading stocks. He had some really good results in early testing. But at some point he was confused about why many of his buy/sell orders weren't being executed despite being open for several minutes (hours?). His algo would make bids far away from the current spread, anticipating movement. He finally concluded that some institutional traders must have access to sub-penny ordering, despite it being against regulation.
I didn't believe him at first, since the more likely problem was elsewhere, but then a few months later sub-penny trading in dark pools was all over the news. This was like 5 or 6 years ago. He's since moved on to other things, having come to a similar conclusion: that trying to play such a rigged game was futile.
Really interesting, thanks!
Could you share a bit more about the 'institutional access' call? Does it mean that a select few have access to real-time book updates and all the rest are @100ms delayed? So much for a level playing field :) Is it common for Asian crypto exchanges?
> the orderbooks were slightly different between bots
That sounds like a big deal. If this is repeatable, you should document it better. Unregulated doesn't mean a license to do blatantly illegal things. Crypto exchanges certainly get taken to court.
Sure it wasn't your book-building algo and a snapshot retrieval race?
I ran an ML model years ago and had a number of great months; then, out of nowhere, no trade that I or the ML model made would work. It looked like someone was front running my orders and messing with my trades. Weird delays, trades that would take too long to go through, and all sorts of odd events on Level 2. I ended up shutting it down, and it took a good 2 months before my manual trades started going through at a normal rate again.
This is why the whole $0 trading fee and Robinhood concerns me. I'm paying for the trades and someone is still messing with me.
What's the value of HFT? If exchanges were required to add a random delay to every trade to work against high frequency traders, would anything of value be lost?
When you do this kind of (HFT) trading, what do one's tax returns look like? E.g. I assume you still have to report every trade with its gain/loss to a tax authority?
> It was in that moment we realized playing in unregulated markets is not fun or something we wanted to continue to do. Intermediary risk was something we didn’t account for.
Indeed. This is the really difficult thing about the crypto space: winnings, if you can keep them. And you can't if the house is just going to front-run all your orders.
"we realized that there will always been a better resourced or more dedicated team willing to fight you for your alpha"
- that's part of what makes financial markets such a fun and interesting challenge but agreed, intermediary risk at the timescales you were operating at in this kind of unregulated market is real and not fun
I'd love to hear what problem you moved to that you believe you can build a long term competitive advantage on if you can talk about it?
> Best we could tell someone was front running us due to an artificial delay for our account (delay between trades went to ~20ms up from our prior steady speed of 3-7ms) and/or a bunch of the trades in the orderbook were bogus.
I wouldn't be surprised, given that traditional HFT companies are building cryptotrading desks and they have a lot more capital to play with too.
Thanks for sharing. Do you think building the strategy on another exchange such as Coinbase Pro or pursuing a strategy that wasn't as latency-sensitive might've yielded more success?
I spent the last year working fulltime on a system similar to the one described here.
I trade the top ~20 cryptos on Binance. I use deep learning models (a combination of temporal, causal convnets and RNNs) with heavy data augmentation. I built my own tooling for data collection, training, backtesting and live deployment. Having a data engineering background coming into this was hugely helpful: most of my time was spent manipulating data in some way (and not playing around with the models).
One of the most demanding parts was estimating spread/slippage costs and including them in the loss function.
Most of what the author talked about, I learned the hard way.
I'm now at the point where I have run some tests (trading small amounts) live on Binance and the results are positive: I do manage to make small profits, but more importantly, the recorded live trades very closely match the backtest trades (for a given period). I'm currently scaling up my model and adding better monitoring / reporting / CI.
I'd be happy to chat with anyone having done similar projects or willing to exchange ideas.
I'd love to hear more about what kind of data augmentation you're doing. A friend of mine recently got a GAN to work for timeseries which is really interesting.
I've done a lot of work in the space and would love to chat - just emailed you :)
This post is an exemplar of the crucial relationship between domain-specific knowledge and ML competency in the ML space. The bulk of the post is detailing the tricky ins and outs of trading, and overall the author gives the impression that they're broadly knowledgeable about stock markets.
Contrast this post with those you see from ML hobbyists who delve into medicine or fake news and produce useless results, a testament to their lack of domain-specific competency.
Bingo. I've made this comment several times on Hacker News in the past as well, and in my opinion it's the number one reason I've seen ML projects fail to have impact at the companies where they're deployed: the operators (typically higher level Math/CS types) simply don't understand the domains well enough and so frequently end up making absurd recommendations/suggestions (often to the detriment of other business areas).
The successful application of ML requires a deep understanding of the domain it's being applied in.
Domain knowledge is essential to almost any project that aims for eventual commercial success. It is quite rare that an outsider will come into a field, apply some ML and make a killing.
Funnily enough I think the ML hobbyist problem is most pervasive in the "predict the stock market" domain. There was a post on HN a few days ago [1] that was overfitting the validation set and hand-waving away fees and spreads. The author concluded that "there was no subtle underlying pattern" because they failed to find one.
Great post! Very refreshing to hear about a) the honest level of effort involved in this type of endeavor, b) the amount of nonsense trading advice out there.
Maybe in a future post you could discuss the security and banking side of this in more detail? In the 6ish years I’ve played around with crypto trading (and I really mean play, nothing close to your level), I’ve had 2 exchanges hacked and lose all customer funds, another 2 had major security breaches causing days of downtime but recovered, and one site seized by the FBI.
Then there are the horror stories of banks freezing your account when you move funds in and out of exchanges. Luckily that hasn’t happened to me.
I bet you have some good stories and perspective on that side of it, I would love to hear it.
Author here. Honestly, I don't have a good answer. I spread my capital across enough exchanges so that if one runs away with it or gets hacked, it doesn't ruin me. It's just a risk I'm taking.
I'm also not trading much capital. Because the system is more on the HFT side, the actively traded capital isn't that high, and I don't care about losing it. Any profit I try to get out of the exchanges regularly. I wouldn't feel comfortable leaving large sums on those exchanges.
The banking side is becoming more mature, I think, as many exchanges like Coinbase provide custodial cold-storage options for institutional clients.
Counter-party risk always exists.
> Then there are the horror stories of banks freezing your account when you move funds in and out of exchanges.
Depends on the country. What happened to me is that a bank did not freeze my account. Instead, they simply reported it to the government and asked AML questions regarding the transfer. The government, on the other hand, wanted me to provide bookkeeping records. Otherwise, they were going to assume that every transfer coming back from a cryptocurrency exchange was pure profit.
Basically, I was not raided and my accounts were not frozen, but the government knows my wallet addresses (and I had to pay back 4 years' worth of cryptocurrency trading profits with interest applied, which also made me realize how little profit I had made in the end).
I’m impressed with this system, but I’m even more impressed with the author’s writing style. I’d love to see more technical posts written with this level of clarity.
Claiming a 4000% return while staying market neutral seems a little too good to be true.
First: those levels are insanely high, so the algo must either be taking some absurd risks and have the worst Sharpe ratio, or be getting pretty close to 100% accurate.
Second: if you can scale this across markets, and assuming the same return, that investment will turn into 12 billion in 4 years. I doubt that you'd write a blog post about it if you had found such a gold mine.
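The compounding behind that second point can be checked with a few lines. This sketch assumes the 4000% figure is an annual return that compounds unchanged for four years, which is the commenter's premise rather than anything the article claims; the function name is illustrative.

```python
def compounded_multiple(annual_return_pct: float, years: int) -> float:
    """Growth multiple after compounding a fixed annual return for `years` years."""
    growth = 1.0 + annual_return_pct / 100.0   # 4000% means multiplying by 41
    return growth ** years


multiple = compounded_multiple(4000, 4)        # 41 ** 4 = 2,825,761
implied_start = 12e9 / multiple                # stake implied by the 12 billion figure
```

Under those assumptions the stake multiplies about 2.8 million times, so the 12 billion figure corresponds to a starting amount of only a few thousand dollars.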
Naively using linear scaling on financial models provides zero guidance to how the model would actually perform... Scaling financial models is an extremely hard problem.
See RenTech limiting the size of their Medallion fund because it was getting too large to scale....
Isn't it so that many of these strategies don't scale well? When you are trading low volume you are collecting all the best trades, but as soon as you go 10x you are affecting the market way too much.
I would love to know how this fared recently in the large sell-off.
What he says about some markets possibly being predictable rings true to me. But the article was far from convincing that the BTC market is actually predictable.
The natural assumption should be that the author was in the right place at the right time. Although he went to great lengths, I'm not convinced this is anything other than luck.
Author here. Let's assume you cannot go short (which you can't in most crypto markets) and can only make money when the market goes up. The best you can do is avoid losing money when the market goes down.
But there are no pure market downturns. On a daily scale the market may be down, but that does not mean that on a millisecond or second-scale you will only see downward movement. There is just an overall downtrend, but there is still almost the same upward movement to make money. For HFT systems, it really doesn't matter if the market, on a daily scale, goes up or down. There is no difference.
In fact, on many markets the system does better in downward trends. Probably because there is more liquidity on that side of the book, a bias that may come from certain market participants.
Through hypothesis testing you can estimate that the probability this was due to luck is very low. Assuming that a monkey would have a 50% chance of profiting on a day, the chance of going a month without a losing day is less than 1 in a billion.
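That figure is easy to verify. The sketch below assumes a 30-day month and that each day's outcome is an independent coin flip; both are simplifying assumptions, and the function name is illustrative.

```python
def prob_no_losing_days(days: int, p_win: float = 0.5) -> float:
    """Probability of winning on every one of `days` independent days."""
    return p_win ** days


p = prob_no_losing_days(30)        # 0.5 ** 30
below_one_in_a_billion = p < 1e-9
```

Real daily returns are not independent, so this is a rough bound on luck under the null hypothesis rather than a full hypothesis test.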
IANAquant, but he said in the intro that he used a market neutral strategy, so he _should_ make money both when the broader market is going up or down.
It would be interesting to know though!
> The biggest edge probably comes from the effort put into building the infrastructure.
I feel like this should be in bold, but either way, I love reading that in these posts. In every way, from research to confirm your models are correct to being able to trust real-time trades, you need a solid architecture. Remember, this doesn't apply only to trading; it's the same for tons of solutions to other problems. If comment readers have other examples, I'd love to hear them in responses.
I've played with writing bots before and this post hits on so many of the edge cases I personally ran into. I have never heard it this well explained before. Brilliant.
It's interesting they suggest the higher the timeframe, the noisier the time series, when to my understanding the opposite is typically found -- the lower timeframes exhibit a more random walk and the higher timeframes exhibit trending behavior.
Can someone comment on how taxes are handled when automated trades are made like this? It's something that seems wholly absent from the cost calculations.
It is a big loss of your time, and nobody will give that back to you. I suggest you spend your time on non-zero-sum games, something that can create value for you and society. Now that you have some savings you can definitely afford it.
The next best thing to not doing it is to quit doing it now.
Disclaimer: I built a similar system in the past, took some gains and then realised the above. I then quit to build a company.
It must be said that it is a lot easier to make money in a stock market that has had low volatility and no significant, prolonged dip in the last 5+ years. My own long-term investments have earned a 20% return over the last 5 years with zero trading. I realize that this article is specifically about crypto, but the trend in all markets is generally up across the board.
As someone who just finished a BSc in computer science and started a MSc in Financial Technology and Computing this post is really interesting to me, keep ‘em coming :D!
Author here. Perhaps if you already have existing HFT infrastructure and connections to efficiently trade on such exchanges. But such infra costs millions. If you don't have this, you're probably at too large of a disadvantage to find any alpha.
At least that's my understanding based on conversations I've had; I've never traded equities.
Crypto is a relatively inefficient market. Equity markets are too efficient: there are too many smart players and alpha is very hard to find. Crypto is a little easier, though that is changing fast.
Low trading fees are what originally caused me to change gears from ETF programming trading to BTC programming trading in the early days. I imagine it is the same for most.
[+] [-] nickreese|6 years ago|reply
Edit: typos and formatting
[+] [-] milesvp|6 years ago|reply
[+] [-] tardis_thad|6 years ago|reply
[+] [-] carlsborg|6 years ago|reply
[+] [-] megaframe|6 years ago|reply
[+] [-] criddell|6 years ago|reply
[+] [-] rixrax|6 years ago|reply
[+] [-] pjc50|6 years ago|reply
[+] [-] alexcnwy|6 years ago|reply
"we realized that there will always been a better resourced or more dedicated team willing to fight you for your alpha" - that's part of what makes financial markets such a fun and interesting challenge but agreed, intermediary risk at the timescales you were operating at in this kind of unregulated market is real and not fun
I'd love to hear what problem you moved to that you believe you can build a long term competitive advantage on if you can talk about it?
[+] [-] bhl|6 years ago|reply
[+] [-] throwaway_e4WNi|6 years ago|reply
As somebody who still runs a profitable bot on Binance I find this hard to believe.
Also all order book related endpoints/streams are public, so queries/subscriptions are not tied to a specific account.
[+] [-] kami8845|6 years ago|reply
[+] [-] keyle|6 years ago|reply
[+] [-] pinouchon|6 years ago|reply
[+] [-] alexcnwy|6 years ago|reply
[+] [-] Traster|6 years ago|reply
[+] [-] thundergolfer|6 years ago|reply
[+] [-] nsainsbury|6 years ago|reply
[+] [-] jacquesm|6 years ago|reply
[+] [-] alexcnwy|6 years ago|reply
[1] https://news.ycombinator.com/item?id=21624907
[+] [-] gricardo99|6 years ago|reply
[+] [-] traK6Dcm|6 years ago|reply
[+] [-] nov272019|6 years ago|reply
[+] [-] mellosouls|6 years ago|reply
https://towardsdatascience.com/what-happened-when-i-tried-ma...
Discussed here: https://news.ycombinator.com/item?id=21624907
[+] [-] mtm7|6 years ago|reply
[+] [-] dnautics|6 years ago|reply
Interestingly Benoit Mandelbrot talks about this in "the (mis)behaviour of markets" and explicitly calls it "market time"
[+] [-] d--b|6 years ago|reply
[+] [-] crazypyro|6 years ago|reply
[+] [-] kungito|6 years ago|reply
[+] [-] onlyrealcuzzo|6 years ago|reply
[+] [-] traK6Dcm|6 years ago|reply
[+] [-] Akababa|6 years ago|reply
[+] [-] dtjohnnyb|6 years ago|reply
[+] [-] semiotagonal|6 years ago|reply
[+] [-] jackschultz|6 years ago|reply
[+] [-] latchkey|6 years ago|reply
[+] [-] mthoms|6 years ago|reply
[+] [-] adamiscool8|6 years ago|reply
[+] [-] fny|6 years ago|reply
[+] [-] lorepieri|6 years ago|reply
[+] [-] jugg1es|6 years ago|reply
[+] [-] unknown|6 years ago|reply
[deleted]
[+] [-] tatoalo|6 years ago|reply
[+] [-] KloudTrader|6 years ago|reply
[+] [-] echelon|6 years ago|reply
[+] [-] traK6Dcm|6 years ago|reply
[+] [-] smabie|6 years ago|reply
[+] [-] proverbialbunny|6 years ago|reply