An Update from Robinhood’s Founders

212 points | Beowolve | 6 years ago | blog.robinhood.com

277 comments

czbond | 6 years ago
On the professional side of this: if you're an engineer at RH in the thick of this, many have been there. It seems dire now, but in a few years the fog, panic, and haze of no sleep will become a story you tell your peers at happy hour.

Many will cast stones, but they have been there too. If they haven't, well, maybe their day will also come. You may feel bad at the moment, but the best way forward professionally is "We try our best tomorrow."

cheschire | 6 years ago
If this were an outage directly caused by a natural disaster, I could understand. This outage was an availability problem. This clearly points to some prioritization problems within the leadership layers if robust and resilient infrastructure was not emphasized.

The prioritization problems may not be due to ignorance or malice though, and may be justifiable if there are other fires that are burning brighter. It's still pointing to problems though, and I think it's completely legitimate for engineers to question the stability of the company when this sort of thing happens.

At the very least, as an engineer I would be asking some pointed questions of my leadership. Maybe not dusting off the resume yet, but I'd still want internal reassurance that the leadership problems that caused this are being addressed.

bertil | 6 years ago
Important step, though: have a retro (maybe many) and write a report explaining what was messed up and how you might mitigate it in the future. It looks like it’s going to be a good one. If you can share a sanitised version publicly, that would hopefully make it all a little bit more worth it.

I think I speak for everyone here if I say that, if that report is public and interesting, everyone on this thread will be happy to get you a drink.

vinaypai | 6 years ago
This is all true for a company that is actually pushing boundaries, as opposed to failing pathetically at a well-solved problem.
RayVR | 6 years ago
Having worked as a professional investor since 2012, I can say these outages can happen anywhere. I've seen day-long outages at exchanges where tens or hundreds of billions of dollars would have been trading, and at brokers where who knows how much would have traded. I've also experienced these outages at more established retail companies, including TD Ameritrade (I became a customer when ThinkOrSwim was acquired). I have also seen brokers screw over individuals on a significant scale without real ramifications.

The fact that Robinhood is telling people anything about the outage is only because they are the company they are, operating in the startup world/mentality.

To the people thinking they should be compensated in some way...If you are doing >$1m daily volume, maybe you can contact them to see what they can do but even then, I doubt it. The way this should be handled is to have multiple executing brokers. You can implement offsetting positions if needed and transfer positions when your main account becomes available, if you are using a broker that can clear. Right now it seems Robinhood is working to implement clearing but you could still go to neutral or put on your positions.
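A minimal sketch of the "multiple executing brokers" idea, assuming each broker is wrapped in a plain Python callable that raises a hypothetical `BrokerUnavailable` error when its API is down. `submit_with_failover` and the broker callables are illustrative names, not any broker's real API.

```python
from typing import Callable, List, Tuple


class BrokerUnavailable(Exception):
    """Hypothetical error raised when a broker's order API is down."""


def submit_with_failover(order: str,
                         brokers: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try each broker in priority order; return (broker_name, confirmation) from the first that accepts."""
    errors = []
    for name, submit in brokers:
        try:
            return name, submit(order)
        except BrokerUnavailable as exc:
            # Record why this broker failed, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise BrokerUnavailable("all brokers unavailable: " + "; ".join(errors))
```

One caveat: only fail over on errors that guarantee the first broker never accepted the order. An ambiguous failure, like a timeout after submission, could leave you with duplicate fills, so that case needs reconciliation before retrying elsewhere.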

Itsdijital | 6 years ago
I have mixed feelings of sympathy about this whole RH thing.

Anyone who has used RH regularly should be well aware of how inept it is. Any spike in volume or volatility, even on a single stock, brings it to its knees pretty often, and not just last week but even during calm periods. I've personally lost 20-30% on positions solely because RH was bugging out; thankfully I use RH just for "fun trades," usually <$100.

I cannot fathom having the balls to trade any real amount of money on the platform while being aware of these long term issues.

On the flipside, I feel for new users and perhaps even generally inactive users who weren't aware of RH's incredible flakiness. I'd imagine (or hope) that the losses of most of those users were small, assuming they were new or casual and just testing the waters.

Even if one of my small plays hit it big on RH, the money would just go to my main account on TD (which has been smooth all week shy of a few hiccups Fri morning during record volume). It's been obvious for a long time that RH should not and cannot be trusted. If you're trading options with a $60K account on RH, well, I don't even have words for that level of ignorance.

stef25 | 6 years ago
I abandoned Coinbase after having difficulties getting a few thousand bucks out of there. It worked out in the end.

Problems with my data I can tolerate up to a point. Problems with my money I absolutely cannot tolerate. As you said, it's unfathomable how people can trade money on a platform that's flaky.

0x8BADF00D | 6 years ago
It’s another example of why DevOps has become a buzzword and most teams just pay lip service to it.
jennyyang | 6 years ago
I know quite a few people that were personally affected by this and lost money due to the two outages and they are all pulling their money from Robinhood. The fact that they can't offer any compensation might be a big problem for them, since they already have zero trading fees, which is what most brokerages offer as compensation.

Personally it doesn't pass the smell test for me. The load was much higher the previous week and load problems go away once the load disappears. They probably had a lot less load the rest of the day, so the fact they were down the entire day suggests it was something else. I would need a fully transparent post mortem before I believed anything they said.

solidasparagus | 6 years ago
Failures due to high load can take a while to resolve - you often need to fix the broken infrastructure, process the backlog, and catch up to live.
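A back-of-the-envelope way to see why catching up takes a while, assuming a simple fluid model (steady arrivals, fixed capacity, no retries): with a backlog of B requests, processing capacity μ requests/s and new arrivals λ requests/s, the backlog only shrinks at the spare rate μ − λ. The numbers in the example are made up for illustration, not Robinhood's.

```latex
T_{\text{drain}} = \frac{B}{\mu - \lambda}, \quad \mu > \lambda
\qquad \text{e.g.} \quad
\frac{10^{6}\ \text{requests}}{12{,}000/\text{s} - 10{,}000/\text{s}} = 500\ \text{s}
```

If μ ≤ λ the backlog never drains at all, and client retries effectively raise λ at exactly the wrong moment.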
afc | 6 years ago
Load problems don't go away when the load disappears. If the system isn't engineered very carefully (this takes a lot of work!), you may have cascading failures that may take hours to resolve, especially if you have bad retry policies (their mention of thundering herd problem seems to indicate that they might).

We wrote a bit about this here: https://landing.google.com/sre/sre-book/chapters/addressing-...
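Since bad retry policies are the suspected amplifier here, a minimal sketch of one standard mitigation: capped exponential backoff with "full jitter," so clients that failed together don't retry together. `TransientError`, `call_with_backoff` and the default parameters are illustrative assumptions, not anything Robinhood or the SRE book prescribes.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry (e.g. timeouts, 503s)."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 3, base: float = 0.1,
                      cap: float = 5.0, sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call fn(), retrying TransientError with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of retrying forever
            # Full jitter: sleep a random fraction of the capped exponential delay,
            # which spreads out clients that all failed at the same moment.
            delay = min(cap, base * 2 ** attempt) * rng()
            sleep(delay)
    raise RuntimeError("max_attempts must be >= 1")
```

Injecting `sleep` and `rng` keeps the function testable; in production you would just use the defaults. Capping attempts matters as much as the jitter: unlimited retries turn a brief overload into a sustained one.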

I would strongly caution anyone who thinks this subject is trivial, i.e. that you can just add a bit of load shedding and be done. I wrote a bit about my team's work (including a simplified view of some of the considerations that go into how we do retries) here: https://landing.google.com/sre/sre-book/chapters/handling-ov...
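A minimal sketch of a per-client retry budget, one of the ideas the linked chapter covers: a client only retries while its retries stay below a fixed fraction of its first attempts. `RetryBudget` is a hypothetical name, the 10% default is an illustrative assumption, and a real implementation would usually count over a sliding time window rather than cumulatively as this one does.

```python
class RetryBudget:
    """Allow retries only while retries < ratio * first attempts (cumulative counts)."""

    def __init__(self, ratio: float = 0.1) -> None:
        self.ratio = ratio
        self.first_attempts = 0
        self.retries = 0

    def on_first_attempt(self) -> None:
        # Call once per new request, before any retries of it.
        self.first_attempts += 1

    def try_spend_retry(self) -> bool:
        """Return True (and count the retry) only if the budget allows another retry."""
        if self.retries < self.ratio * self.first_attempts:
            self.retries += 1
            return True
        return False
```

When the budget is exhausted, the client fails fast instead of piling more load onto a backend that is already struggling.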

rpdillon | 6 years ago
I'm not sure it's fair to assume that service gets automatically restored when load dissipates after failures due to high load.
driverdan | 6 years ago
This isn't something new, downtime is the norm for Robinhood. Anyone trusting them with more than play money is foolish.
balls187 | 6 years ago
How did they lose money?
rolltiide | 6 years ago
> The fact that they can't offer any compensation might be a big problem for them, since they already have zero trading fees

Robinhood makes more money than any other known firm on Wall Street from getting paid specifically to leak users' trades to other traders.

The SEC requires a periodic report on that, which shows the compensation.

Can't believe people are still buying Robinhood's pitch of misdirection.

RestlessMind | 6 years ago
This is such an empty update. At the very least, they should have published a detailed postmortem or committed to one by a certain date. How are we supposed to know that they have learned their lessons?
harikb | 6 years ago
I don’t work for them, but I am pretty sure we can blame the litigious nature of this industry for the lack of detail in the postmortem. Not everyone can afford to be Cloudflare :)

Even for Cloudflare, I thought the company would get sued out of existence after the proxy data leak, but the finance industry/SEC etc. is a completely different ballgame.

dilly_li | 6 years ago
Start with the email notification. They have been asking themselves the easy questions.

Just look at the top questions in their email:

* Are the funds in my account safe? Yes, your funds are safe.

* Was my personal information affected? No, your personal information was not affected.

* Can I use my Robinhood debit card? Yes. If you have a debit card, you should have been—and should still be able to—use your card, but you may have had issues receiving notifications, viewing your balance, and seeing transactions in your app.

------------

The real question is: How is Robinhood compensating for the missed trades?

Stop asking yourself the easy questions, RH.

jsf01 | 6 years ago
I’d be interested to read a deep technical post-mortem like those which have become fairly standard among other big tech companies. Hoping Robinhood does the right thing here.
0xy | 6 years ago
Still silence on the traders who lost tens of thousands of dollars? Are they going to be compensating or not?

This blog post doesn't appear to say anything. It's not an apology, it's not an explanation, it doesn't say what they're going to do in response.

This is after the incident in which there were no status updates or support availability for multiple hours. Why can't they commit to updates every hour or every 30 minutes?

SkyPuncher | 6 years ago
I'm having a really hard time understanding this argument.

Unless I have an SLA with a provider outlining penalties, they don't owe me anything if they go down. How is this any different?

crystaldev | 6 years ago
> This blog post doesn't appear to say anything. It's not an apology, it's not an explanation, it doesn't say what they're going to do in response.

On the advice of any good lawyer.

ska | 6 years ago
I agree the level of feedback isn't great, but what would people be compensated for? Did they misplace actual orders?
aloknnikhil | 6 years ago
Genuine question: With no-commission trading at places like Schwab and eTrade, is it even worth trading on Robinhood? For as long as I can remember (about 2 years), Robinhood has always failed to scale.
mjs33 | 6 years ago
Their DNS system failed? How?! Unless DNS stands for “Do Not Sell”
tempsy | 6 years ago
Sad that there isn’t an actual apology anywhere to be found in the letter at all.

And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.

CamelCaseName | 6 years ago
> And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.

This isn't really an issue because the fed rate cut impacts everyone. Other institutions will cut their interest rates as well. I know of a few banks (Canadian) that have already lowered their GIC rates.

If anything, this is actually good for RH. Now instead of comparing 1.8% at RH and 1% at another financial institution, you're comparing 1.3% and 0.5%, a much bigger multiple.
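The arithmetic behind the "bigger multiple" point, using the rates quoted above (the 1% and 0.5% comparison rates are the commenter's hypotheticals):

```latex
\frac{1.8\%}{1.0\%} = 1.8\times \;\longrightarrow\; \frac{1.3\%}{0.5\%} = 2.6\times,
\qquad 1.8 - 1.0 = 1.3 - 0.5 = 0.8 \text{ percentage points}
```

The ratio grows while the absolute spread stays at 0.8 percentage points (e.g., $80 a year on a $10,000 cash balance in both cases).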

GaryNumanVevo | 6 years ago
Yeah because if they're culpable then they can be sued via class-action
xyst | 6 years ago
The boys down in the salt mines of WSB will want a blood sacrifice.

Founders should be fired. CTO/CIO should be replaced.

vinaypai | 6 years ago
Historic... Unprecedented... Thundering herd, a bunch of excuses to explain why they couldn't handle the volume that most real brokerages handle every second.
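"Thundering herd" covers several failure modes, and nothing in the thread says which one hit Robinhood. As a minimal sketch of one common mitigation, here is request coalescing (the pattern Go's `golang.org/x/sync/singleflight` package implements): concurrent callers asking for the same key share one backend call instead of each hitting the backend. `SingleFlight` and `do` are illustrative Python names, not a real library API.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """One in-flight backend call that followers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent calls for the same key into a single call to fn()."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._inflight[key] = call
        if not is_leader:
            call.done.wait()  # follower: wait for the leader's result
            if call.error is not None:
                raise call.error
            return call.result
        try:
            call.result = fn()  # leader: the only caller that hits the backend
            return call.result
        except BaseException as exc:
            call.error = exc  # followers re-raise the same error
            raise
        finally:
            with self._lock:
                del self._inflight[key]  # later callers start a fresh call
            call.done.set()  # wake all followers
```

Coalescing only helps with identical concurrent requests; it does nothing for a herd of distinct requests, which is where load shedding and retry limits come in.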
alishan-l | 6 years ago
I heard it was related to the leap year. Apparently they had downtime 4 years ago as well.
ablekh | 6 years ago
I'm curious about your thoughts on why a technical infrastructure that, by nature of being cloud-native, is supposed to be (and likely has been) architected as a highly elastic platform has not stood the test of time in this regard.

Based on the information from Robinhood's careers site, their platform is largely based on the following technology stack:

  - Python, Django, Django Rest Framework
  - Go
  - PostgreSQL
  - Container and container orchestration technologies (Docker, Kubernetes)
  - Microservice-oriented architectures and related OSS technologies (Kafka, Celery/RabbitMQ, nginx, Redis, Memcached, Airflow, Consul)
  - Cloud-native infrastructure (AWS, GCP)
  - Infrastructure as Code and configuration management (Terraform, SaltStack, Ansible, Chef, Puppet)
  - CI/CD and test automation frameworks (Cypress.io, Jenkins, Appium, UIAutomation, Bazel)
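On the elasticity question: nothing public says whether or how Robinhood autoscales, but as an illustration, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). A simplified sketch of just that rule; `desired_replicas` is our name, and the real controller has more rules, such as stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA-style rule: scale proportionally to metric/target, clamped to bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: don't scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

What the sketch makes visible: a sudden 8x spike still gets clamped at `max_replicas`, new pods take time to start, and the rule says nothing about stateful tiers like a PostgreSQL primary, which do not scale out by adding pods. "Cloud-native" elasticity is real, but bounded and reactive.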
vbtemp | 6 years ago
On Reddit I've been trying to ask this ELI5:

Why would you use RH instead of a normal, mainstream brokerage like Vanguard, Fidelity, etc that already has (1) an app and (2) commission-free trades?

lkbm | 6 years ago
Easy answer: As someone who's used Vanguard for index funds and the like for a couple decades now, I had no idea they had an app or commission-free trades. They don't market this at all.

As a secondary answer, normal, mainstream brokerages have pretty bad tech, tbh. I don't expect it to be worse than Robinhood in terms of things like security, but I do expect the UX to be worse. (Side note: I just discovered that Vanguard actually has a secret security key option hidden under Account maintenance, so I can finally switch from SMS 2FA. +1 to Vanguard.)

fny | 6 years ago
My brother has a Fidelity account and apparently even he was blocked from putting in orders online last Thursday, so I'm not sure they're immune either.
acchow | 6 years ago
Can't wait for the post mortem
joobus | 6 years ago
I don't think we will get a postmortem. Their lawyers will kill it because it will be an admission of guilt and open them up to even more legal liability.
numlock86 | 6 years ago
Maybe in another four years when they finally realize they still haven't fixed the leap bug. Didn't work out for this year apparently. Last leap year had the exact same problem. The problem is that the ticket is very low priority because right now it is working again and won't happen again until at least 2024 ... By then it will most likely be forgotten. Again.
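Whether the outage had anything to do with the leap day is speculation in this thread, not something Robinhood has confirmed. For illustration only, a classic leap-day bug in Python: adding a year with `date.replace` works on every day except Feb 29, where it raises `ValueError`. The function names are hypothetical.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # BUG: non-leap years have no Feb 29, so this raises ValueError on leap days.
    return d.replace(year=d.year + 1)


def one_year_later(d: date) -> date:
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only Feb 29 can fail here; clamp to Feb 28 (Mar 1 is another common convention).
        return d.replace(year=d.year + 1, day=28)
```

Bugs like this stay hidden for four years at a time, which is why a test pinned to a date like `date(2020, 2, 29)` is cheap insurance.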
lemmox | 6 years ago
I'm always amazed by how tricky DNS failures can be.
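DNS failures are tricky partly because every client caches differently. One mitigation, in the spirit of RFC 8767 ("Serving Stale Data to Improve DNS Resiliency"), is to keep answering with the last known-good records when the resolver fails. This is a minimal application-level sketch; `StaleOkResolver` is a hypothetical class, and nothing here describes how Robinhood's DNS actually worked.

```python
import socket
import time
from typing import Callable, Dict, List, Optional, Tuple


class StaleOkResolver:
    """Cache lookups for `ttl` seconds; if a refresh fails, serve the last good answer."""

    def __init__(self, ttl: float = 30.0,
                 resolve: Optional[Callable[[str], List[str]]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._resolve = resolve or self._system_resolve
        self._clock = clock
        self._cache: Dict[str, Tuple[List[str], float]] = {}  # host -> (addresses, fetched_at)

    @staticmethod
    def _system_resolve(host: str) -> List[str]:
        infos = socket.getaddrinfo(host, None)
        return sorted({info[4][0] for info in infos})  # sockaddr[0] is the IP address

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(host)
        if cached and now - cached[1] < self.ttl:
            return cached[0]  # fresh cache hit
        try:
            addrs = self._resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if cached:
                return cached[0]  # resolver failed: serve the stale answer
            raise
        self._cache[host] = (addrs, now)
        return addrs
```

The trade-off: serving stale data keeps clients up during a resolver outage, but it can also keep them pointed at an address that has genuinely moved, so a cap on how stale an answer may get is worth adding.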