item 28292369

How Discord Stores Billions of Messages (2017)

696 points | ibraheemdev | 4 years ago | blog.discord.com

368 comments

[+] lmilcin|4 years ago|reply
Well... we have a 3-node MongoDB cluster and are processing up to a million trades... per second. And a trade is way more complex than a chat message. It has tens to hundreds of fields, may require enriching with data from multiple external services, then needs to be stored, be searchable with unknown, arbitrary bitemporal queries, and may need multiple downstream systems to be notified, depending on a lot of factors, when it is modified.

All this happens on the aforementioned MongoDB cluster and just two server nodes. And the two server nodes are really only for redundancy, a single node easily fits the load.

What I want to say is:

-- processing a hundred million simple transactions per day is nothing difficult on modern hardware.

-- modern servers have stupendous potential to process transactions which is 99.99% wasted by "modern" application stacks,

-- if you are willing to spend a little bit of learning effort, it is easily possible to run millions of non trivial transactions per second on a single server,

-- most databases (even one as bad as MongoDB) have the potential to handle much more load than people think they can. You just need to understand how they work and what their strengths are, and play into them rather than against them.

And if you think we are running Rust on bare metal and some super large servers -- you would be wrong. It is a normal Java reactive application running on OpenJDK on an 8-core server with a couple hundred GB of memory. And the last time I needed to look at the profiler was about a year ago.

[+] bob1029|4 years ago|reply
A million sounds impressive, but this is clearly not serialized throughput based on other comments here. Getting a million of anything to fast NVMe is trivial if there is no contention and you are a little clever with IO.

I have written experimental datastores that can hit in excess of 2 million writes per second on a Samsung 980 Pro. 1k object size, fully serialized throughput (~2 gigabytes/second, saturates the disk). I still struggle to find problem domains this kind of perf can't deal with.

If you just care about going fast, use 1 computer and batch everything before you try to put it to disk. Doesn't matter what fancy branding is on it. Just need to play by some basic rules.

Primary advantage with 1 computer is that you can much more easily enforce a total global ordering of events (serialization) without resorting to round trip or PTP error bound delays.
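The batching idea above can be sketched in a few lines. This is a hypothetical illustration, not the commenter's actual datastore: the record format, batch size, and `BatchedLog` class are all invented for the example.

```python
import os
import tempfile

RECORD_SIZE = 1024     # 1 KiB per object, matching the comment's figure
BATCH_RECORDS = 4096   # flush ~4 MiB per syscall instead of 1 KiB

class BatchedLog:
    """Append-only log that batches records before each write() syscall."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self.buf = bytearray()

    def append(self, record: bytes):
        assert len(record) == RECORD_SIZE
        self.buf += record
        if len(self.buf) >= BATCH_RECORDS * RECORD_SIZE:
            self.flush()

    def flush(self):
        if self.buf:
            os.write(self.fd, self.buf)  # one syscall per batch, not per record
            self.buf.clear()

    def close(self):
        self.flush()
        os.fsync(self.fd)
        os.close(self.fd)

path = os.path.join(tempfile.mkdtemp(), "log.bin")
log = BatchedLog(path)
for i in range(10_000):
    log.append(i.to_bytes(8, "little") * 128)  # 8 bytes * 128 = 1024 bytes
log.close()
print(os.path.getsize(path))  # 10_000 * 1024 = 10240000
```

The point is the shape, not the numbers: amortizing the syscall and fsync cost over thousands of records is what lets a single disk approach its sequential bandwidth.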

[+] kevinsundar|4 years ago|reply
That's not really what this article is about. Their problem wasn't throughput. What's the size of all the data in your MongoDB instance? And what's the latency on your reads?

In the big data world the "complexity" of the data doesn't really mean much. It's just bytes.

[+] giancarlostoro|4 years ago|reply
> -- if you are willing to spend a little bit of learning effort, it is easily possible to run millions of non trivial transactions per second on a single server,

I got into programming through the Private Server (gaming) scene. You learn that the more you optimize and refactor your code to be more efficient, the more you can handle on less hardware, including embedded systems. So yeah, it's amazing how much is wasted. I'm kind of holding out hope that things like Rust and Go focus on letting you get more out of less hardware.

[+] ljw1001|4 years ago|reply
Curious as to how many days of data you have in your cluster. It seems like it could be ~1/2 billion records per day, 125 billion per year-ish. In a few years your 3 node Mongo cluster would be getting towards volumes I associate with a 'big data' kind of solution like BigTable.
[+] dpedu|4 years ago|reply
Sounds interesting. Does this "we" have any writings about this?
[+] sumnole|4 years ago|reply
Just curious, what was the rationale for choosing MongoDB?
[+] tonymet|4 years ago|reply
Considering that a CPU can do 3 billion things a second, and a typical laptop can store 16 billion things in memory, it shouldn't take more than 5 of these to handle "billions of messages". I agree with you that modern frameworks are inefficient
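As a sanity check on that arithmetic (both figures below are assumed round numbers, not measurements):

```python
messages = 4_000_000_000        # "billions of messages" (assumed figure)
ops_per_second = 3_000_000_000  # ~3 GHz core doing one "thing" per cycle (very rough)

seconds_to_touch_all = messages / ops_per_second
print(seconds_to_touch_all)     # ~1.33 s of pure CPU to touch every message once
```

Real per-message work is of course far more than one cycle, but the gap between that bound and typical fleet sizes is the commenter's point.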
[+] kerblang|4 years ago|reply
I've used cassandra quite a bit and even I had to go back and figure out what this primary key means:

    ((channel_id, bucket), message_id)
The primary key consists of partition key + clustering columns, so this says that channel_id & bucket are the partition key, and message_id is the one and only clustering column (you can have more).

They also cite the most common cassandra mistake, which is not understanding that your partition key has to limit partition size to less than 300MB, and no surprise: They had to craft the "bucket" column as a function of message date-time because that's usually the only way to prevent a partition from eventually growing too large. Anyhow, this is incredibly important if you don't want to suffer a catastrophic failure months/years after you thought everything was good to go.

They didn't mention this part: Oh, I have to include all partition key columns in every query's "where" clause, so... I have to run as many queries as are needed for the time period of data I want to see, and stitch the results together... ugh... Yeah it's a little messy.
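The bucket derivation and query stitching described above can be sketched as follows. This assumes the article's 10-day buckets and the standard Discord snowflake epoch; `bucket_for` and `buckets_between` are hypothetical helpers, not Discord's actual code.

```python
DISCORD_EPOCH_MS = 1_420_070_400_000   # 2015-01-01, Discord's snowflake epoch
BUCKET_MS = 10 * 24 * 60 * 60 * 1000   # 10-day buckets, per the article

def bucket_for(snowflake: int) -> int:
    """Partition bucket for a message ID: snowflakes carry a millisecond
    timestamp in their top bits, so the bucket is that timestamp divided down."""
    return (snowflake >> 22) // BUCKET_MS

def buckets_between(oldest_id: int, newest_id: int) -> list:
    """Every bucket a time-range query must touch, newest first. Because the
    partition key is (channel_id, bucket), reading a range means running one
    query per bucket and stitching the results together."""
    return list(range(bucket_for(newest_id), bucket_for(oldest_id) - 1, -1))
```

For a channel whose history spans three buckets, a paginated read issues up to three queries and merges them, which is exactly the "stitch the results together... ugh" the comment describes.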

[+] habibur|4 years ago|reply
Reading the article, I was just now on the Cassandra site trying to figure out what the catch is. Surely there has to be a catch.

Well, here it is. The partitioning is manual, up to the SQL level.

[+] tester756|4 years ago|reply
Discord had like $300M invested and they created an unparalleled piece of software that ate the whole market, damn.

One of the most impressive pieces of software that I've seen and used, after years of using Ventrilo/Mumble/TeamSpeak.

[+] RHSeeger|4 years ago|reply
Honestly, I find Discord super frustrating. Can't have multiple chats open at the same time, can't close the right rail, etc. Its UX is subpar in almost every way that matters to me. I use it because _everyone_ uses it, not because I want to.
[+] yelnatz|4 years ago|reply
Well it solved a lot of pain points with the target market.

I remember my friends and I kept bickering about who would pay for this month's bill for the Vent/Mumble servers. That kept on for years until I had enough and hosted my own on a DigitalOcean droplet. None of my friends knew how to do that since they're not very technical.

With Discord you just had to click a couple buttons and it's free.

[+] sascha_sl|4 years ago|reply
Discord's smart move was emulating the concept of servers (including all the teenage drama coming from having administrators and moderators more interested in (ab)using their power than community building) while making them accessible to anyone without technical knowledge.

But it's important to remember that Discord is not that. Discord holds all your data, in luxurious detail, with no option to delete. They go as far as ignoring GDPR when people ask for their messages to be deleted. "Deleting" your account will not even anonymize your ID, it unsets your avatar, renames you, kicks you from all guilds and disables logging in. That's it. And if they ban you there is no place to move on to.

[+] kccqzy|4 years ago|reply
I've also noticed that in a lot of tech-related social circles people are increasingly choosing Discord over Slack. That's a trend I totally didn't expect: at least until a few years ago it was clear that Discord was for gamers and Slack was for work and everyone else. That changed quickly. Impressive indeed!
[+] dvt|4 years ago|reply
> ventrilo/mumble/teamspeak

To be fair, Mumble is FOSS, and Ventrilo and Teamspeak have literally not iterated since 2005. Discord is pretty mediocre software (remember when they accidentally allowed iframe XSS RCE attacks? A very amateurish mistake), but the incumbents were an absolute dumpster fire.

[+] Aeolun|4 years ago|reply
I don’t know about your definitions, but 300M is a fuckton of money to me.
[+] 5faulker|4 years ago|reply
If that's 2017, imagine what it would be in 2021.
[+] PHGamer|4 years ago|reply
It's just Slack for gaming. The UI is ripped off, as such. It's better than Ventrilo, but it's not like they're that much better; they just realized a good concept and took it.
[+] tschellenbach|4 years ago|reply
Better engineering than Slack, but not as good of a business
[+] NotAnOtter|4 years ago|reply
This is like a masterclass in how to answer system design questions. Maybe a bit verbose. They cover requirements, how to answer those requirements, relevant tech for the problem, implementation, and techniques for maintenance
[+] legerdemain|4 years ago|reply
We took a big bet on Cassandra, and then on an opinionated wrapper around Cassandra at $PASTJOB. The use case was a text search engine for syslog-type stuff.

The product we built using Cassandra was widely known as our buggiest and least maintainable, and it died a merciful death after several years of being inflicted on customers.

We didn't have a good handle on the exact perf implications of different values of read/write replication. Writing product code to handle a range of eventual consistency scenarios is challenging. The memory consumption and duration of compactions and column/node repair jobs is hard to model and accommodate. It's hard to tell what the cluster is doing at any given moment. Our experience with support plans from Datastax was also pretty dismal.

Maybe the situation has changed since 2016. In my experience with several employers since then, it seems like every enterprise architect fell in love with Cassandra around 2014-2015 and then had a long, painful, protracted breakup.

[+] umvi|4 years ago|reply
Discord is so good. I just can't imagine it can stay this good forever. My fear is that eventually it will be bought out and aggressively monetized.
[+] alpb|4 years ago|reply
If you're paying for Discord every month, it's actually fairly expensive. A lot of the good features unlock once people start boosting servers with Nitro, and those aren't cheap either. So I'd assume they aren't bleeding cash left and right on infra costs. They might actually be breaking even on the infra costs, at least.
[+] helen___keller|4 years ago|reply
For years I've had a little bet going with friends about who ends up buying them to subsidize all this. My money was on amazon, because it could work so well with twitch + amazon prime.
[+] dancemethis|4 years ago|reply
It's not good. It's user hostile software.
[+] ryanianian|4 years ago|reply
TFA states:

> we knew we were not going to use MongoDB sharding because it is complicated to use and not known for stability

But then goes on to describe using Cassandra and overcoming sharding and stability issues. I.e., changing the key, changing TTL knobs, adding anti-entropy sweepers, and considering switching to a different cassandra impl entirely.

Are these issues significantly harder to solve in MongoDB than Cassandra?

[+] dragonfax|4 years ago|reply
KKV databases (Cassandra and DynamoDB are good examples) have a common problem with hotspots or "hot partitions". The most common mistake is to use a timestamp of any kind in the range (cluster) column. Then, whatever partition represents "today" or "this hour" ends up being the hot partition.

The article mentions hot partitions becoming a problem with max partition size, but they're also a problem for scalability. Say you're writing a very high throughput of logs into the table (contrived example); then you're bottlenecked by the rate at which you can write to one partition.

Adding the bucket id (say, the current day or hour) is a common solution, and solves the max partition size issue, but not the scalability issue of hot partitions.
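One common workaround for the write-scalability half is write sharding: salt the partition key with a small shard number so one hot time bucket is spread across several partitions, at the cost of fanning reads out to every shard. A sketch with hypothetical helpers and an illustrative shard count:

```python
import random

N_SHARDS = 16  # illustrative fan-out for one hot time bucket

def write_partition_key(bucket: str):
    """Writes pick a random shard, spreading one hot bucket
    over N_SHARDS partitions instead of one."""
    return (bucket, random.randrange(N_SHARDS))

def read_partition_keys(bucket: str):
    """Reads must fan out to every shard of the bucket and merge results."""
    return [(bucket, shard) for shard in range(N_SHARDS)]
```

The trade is explicit: write throughput scales with `N_SHARDS`, while every read of a bucket becomes `N_SHARDS` queries.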

[+] geenat|4 years ago|reply
Cockroach DB recently addressed hotspots on sequence/timestamp workloads with: https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlo...

Does what it says on the tin for the primary key.

That said, hotspots are 100% the reason why Cockroach encourages UUID primary keys. The disadvantage of UUIDs is that if you want sequential data, you then need a secondary index, which you'll have to bucket anyway.

[+] andrewstuart|4 years ago|reply
I'd first reach for Postgres to do this. Anyone have any idea how Postgres would stack up in a similar challenge?
[+] c7DJTLrn|4 years ago|reply
They wouldn't need to store so many if they actually let people delete their messages on account deletion. Instead, they ban many people who attempt to do so via automated scripts.
[+] simonw|4 years ago|reply
Deletion of data at scale is a really difficult technical problem, unfortunately.

I'm not saying they shouldn't do that though - especially given regulations like GDPR. Designing systems for deletion is important! But it's also really hard, especially if you didn't design for it from the start.

There's also no way the tiny fraction of users who want to delete their data would make up a significant enough proportion of the messages that it would impact their scaling strategy.

[+] coldblues|4 years ago|reply
One of the big reasons I refuse to use Discord. Deleting your messages, whether individually or in bulk, is a right every user should have. The way it's done now just leaves users more open to malicious attacks. Whether someone archives your content before you delete it isn't the point; that can happen on any internet medium.
[+] anigbrowl|4 years ago|reply
I don't think they delete anything really. I've retrieved stuff from servers that were ostensibly deleted over a year ago.
[+] jjice|4 years ago|reply
I love Discord's tech blog. There are a few corporate tech blogs that are just fantastic. Fly.io is another one that has great writing and interesting topics.
[+] jgilias|4 years ago|reply
Can anyone share experiences with using Discord as a communications tool in a workplace? We're currently on Google Chat because it comes with the package that we pay for anyway, but it's pretty lame. So from time to time we consider jumping to Slack. But then, why not Discord?
[+] jitans|4 years ago|reply
I would have used CockroachDB, it has all the requirements listed and you don't need to know in advance the queries you will perform when deciding the database schema.
[+] vortico|4 years ago|reply
I'm curious of the 2021 measure of total disk space that Discord consumes. Servers that I'm in share images every few minutes, which must add up pretty quick.
[+] Notanothertoo|4 years ago|reply
We use Scylla for our IoT stream, one bucket per day, with a date index for second-resolution data. The current day is a hot spot, but we throw that in Redis. It's running one of the largest reinsurance providers' IoT deployments.
[+] Ansil849|4 years ago|reply
Nothing about any sort of encryption.
[+] tbarbugli|4 years ago|reply
If you can, use Scylla over Cassandra. The performance difference is tremendous in my experience and replacing can be trivial (easier if you start with Scylla on day 0)
[+] siculars|4 years ago|reply
Partition Key selection aside, if you want a better Cassandra, look at ScyllaDB. Much, much better engine, imho.

/disclaimer/ I used to work at Scylla.