The most important part of this article is the concept of back pressure and being able to detect it. It's common in a ton of other engineering disciplines, but especially important when designing fault-tolerant or load-balancing systems at scale.
Basically it is just some type of feedback so that you don't overload subsystems. One of the most common failure modes I see in load balanced systems is when one box goes down the others try to compensate for the additional load. But there is nothing that tells the system overall "hey there is less capacity now because we lost a box". So you overwhelm all the other boxes and then you get this crazy cascade of failures.
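That missing feedback signal can be sketched in a few lines (illustrative Python; `submit` and the queue size are hypothetical, not from the article): a bounded queue rejects work when the consumer falls behind, so the producer learns that capacity shrank instead of piling on.

```python
import queue

# Bounded buffer: its capacity limit *is* the backpressure signal.
jobs = queue.Queue(maxsize=3)

def submit(job):
    """Try to enqueue; fail fast instead of buffering unboundedly."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        # Feedback: the caller now knows capacity is exhausted and can
        # back off, shed load, or route elsewhere.
        return False

accepted = [submit(i) for i in range(5)]
print(accepted)  # [True, True, True, False, False]
```

Without that `queue.Full` signal, every producer keeps sending at full rate and the overload cascades exactly as described above.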
This is a good article about overload and back pressure. It also lists some tools in Erlang to solve these sorts of issues. It also mentions genstage (very) briefly.
Backpressure is more common than water across engineering disciplines, and yet it isn't an integrated part of every distributed system out there? Isn't that a bit of an oversight?
Hate to be a party pooper, but I'd like to give people here a more generic mental tool to solve this problem.
Ignoring Elixir and Erlang - when you discover you have a backpressure problem, that is, any kind of throttling on connections or req/sec, you need to immediately tell yourself "I need a queue", and more importantly "I need a queue that has prefetch capabilities". Don't try to build this. Use something that's already solid.
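"Prefetch" here means the broker hands each consumer at most N unacknowledged messages at a time (RabbitMQ exposes this as `basic_qos`). A toy in-memory simulation of the mechanism, not a real broker client:

```python
from collections import deque

class PrefetchQueue:
    """Toy broker: delivers at most `prefetch` unacked messages per consumer."""
    def __init__(self, prefetch):
        self.prefetch = prefetch
        self.pending = deque()
        self.unacked = 0

    def publish(self, msg):
        self.pending.append(msg)

    def deliver(self):
        """Hand out one message, or None if the consumer is at its limit."""
        if self.unacked >= self.prefetch or not self.pending:
            return None
        self.unacked += 1
        return self.pending.popleft()

    def ack(self):
        self.unacked -= 1

q = PrefetchQueue(prefetch=2)
for i in range(5):
    q.publish(i)

# A slow consumer only ever holds 2 messages; the rest stay queued.
first, second = q.deliver(), q.deliver()
stalled = q.deliver()          # None: prefetch window is full
q.ack()                        # consumer finishes one...
third = q.deliver()            # ...and immediately gets the next
print(first, second, stalled, third)  # 0 1 None 2
```

The prefetch window is what converts a fast producer and a slow consumer into a stable system: backlog accumulates in the broker, not in the consumer's memory.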
I solved this problem 3 years ago, pushing 5M msg/minute _reliably_ without loss of messages, and each of these messages was checked against a couple of per-user rules (to not bombard users with messages, to pick the best time to push to a user, etc.), so this adds complexity. Approved messages were later bundled into groups of 1,000 and passed on to GCM HTTP (today, Firebase/FCM).
I've used Java and Storm and RabbitMQ to build a scalable, dynamic, streaming cluster of workers.
You can also do this with Kafka but it'll be less transactional.
After tackling this problem a couple times, I'm completely convinced Discord's solution is suboptimal. Sorry guys, I love what you do, and this article is a good nudge for Elixir.
The second time I solved this, I used XMPP. I knew there were risks, because essentially I was moving from a stateless protocol to a stateful one. Eventually it wasn't worth the effort and I kept using the old system.
I think you misunderstand the problem we are solving here. We are not trying to solve this because our system can't handle the load; we are protecting it from when Firebase decides to slow down in a way that causes data to back up and OOM the system. Since these are push notifications with a time bound on their usefulness, we don't care about dumping to an external persisted queue like RabbitMQ or Kafka (we'd rather deliver newer notifications faster than wait for a backed-up buffer to flush). Firebase also only allows 1,000 concurrent connections per senderId, with 100 in-flight pushes (that have not received an ack) per connection, which means that only 100,000 can be in flight at once. Ultimately, if a remote service is applying backpressure because it is struggling, no amount of auto-scaling on your end is going to help you.
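The "deliver newer, drop older" policy maps naturally onto a bounded ring buffer. In Python, `collections.deque(maxlen=...)` silently evicts the oldest entry on append (illustrative sketch only; the actual service is Elixir/GenStage and the buffer size here is made up):

```python
from collections import deque

# Time-bounded usefulness: keep only the newest N pushes when the
# downstream (e.g. Firebase) applies backpressure and the buffer fills.
buffer = deque(maxlen=3)

for push in ["a", "b", "c", "d", "e"]:
    buffer.append(push)  # "a" and "b" are evicted as newer pushes arrive

print(list(buffer))  # ['c', 'd', 'e']
```

This is the opposite trade-off from a durable external queue, which would preserve "a" and "b" at the cost of delaying "e".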
This service buffers potential pushes for all users being messaged. It watches the presence system to determine if they are on their desktop or mobile (this is millions of presence watchers and tens of millions of buffered messages), users are constantly clearing these buffers by reading on their clients, and finally, when a user is offline or goes offline, we emit their pushes to them (which is what this article talks about). This service evolved from the push system of the game we worked on; when it did pushes only and no other logic, it could push at 1M/sec in batches, but its responsibility has changed.
Anyone have any nice links to describe more about queue prefetching as described in this case? My google skills are failing because of all the CPU related articles.
This worries me a bit. At the moment they are providing the software and hosting it all absolutely free.
I would happily still use Discord if they provided the exact same thing with a monthly fee. Hopefully at some point they throw in some extra features for a pro version and start charging.
I just use Discord for gaming and haven't used Slack a lot, but I think Discord will be great for work as soon as they release search.
If they need ideas for a pro version: I'd probably pay far too much to be able to record each user into individual audio files (recorded locally for each user and combined on my system) for podcasts, let's plays (YouTube videos), remote meetings, etc.
> Discord is always completely free to use with no gotchas. This means you can make as many servers as you want with no slot limitations.
> Wondering how we’ll make money? In the future there will be optional cosmetics like themes, sticker packs, and sound packs available for purchase. We’ll never charge for Discord’s core functionality. [0]
They have an awful lot of information about video gamers in conversation history. They could mine that data for game companies and sell it as a way to help companies build better, more addictive and mechanically pleasing games.
I sure hope one possible end-game is to re-skin it without the gaming focus and price it competitively to Hipchat/Slack.
As soon as screensharing lands it'll be an across-the-board upgrade from Hipchat at my work (the only missing feature I can think of is video calls, but quality over Hipchat has always been a bit sketchy, so we usually fall back to Hangouts).
They're in the gaming market; there's plenty of cash. Even if they just went the advertisement route, which I doubt they will, they should end up doing well.
That's awesome and it just goes to show how simple something can be that would otherwise involve a certain degree of concurrent (and distributed) programming.
"Obviously a few notifications were dropped. If a few notifications weren’t dropped, the system may never have recovered, or the Push Collector might have fallen over."
How many is a few? It looks like the buffer reaches about 50k, does a few mean literally in the single digits or 100s?
Good question. We don't have metrics on the exact number dropped. We're using an earlier version of GenStage that doesn't give any information about dropped events. Once we upgrade we'll have a better idea.
I was wondering the same thing. Dropping an unknown number of requests isn't all that impressive. It seems like a simpler approach would have been to use a Message Queue of some sort with pushers pulling items from the queue.
Not sure why you are getting downvoted; I came here to make the same comment.
QPM is a useless metric. When talking about distributed systems from an engineering point of view, you always want to use QPS. QPM is simply not fine-grained enough to show whether the traffic is bursty or not. For example, in this particular case, when you say 1M QPM that can mean anything - they might be idle for 50s and then get 100k QPS for the next ten seconds, or they might be getting 15k QPS all the time (as is visible on the graph). Distributed systems are designed for the peak workload, not for the average one. Using misleading numbers like QPM leads to bad design and sizing decisions.
The only case where you would use QPM, QPD and similar metrics is when you want to artificially show your numbers bigger than they are (10M transactions a day sounds better than 115 transactions a second). But those should be used by sales, not by engineers.
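The point is easy to see numerically: two traffic shapes with an identical ~1M QPM (the hypothetical numbers from the comment above) differ by roughly 6x in the peak QPS you actually have to size for.

```python
# Two per-second traffic shapes over one minute, both totalling ~1M queries.
steady = [16_667] * 60               # ~16.7k QPS for the whole minute
bursty = [0] * 50 + [100_000] * 10   # idle for 50s, then 100k QPS for 10s

for name, shape in [("steady", steady), ("bursty", bursty)]:
    qpm, peak_qps = sum(shape), max(shape)
    print(f"{name}: {qpm} QPM, peak {peak_qps} QPS")

# Same headline QPM; the bursty system needs ~6x the provisioned capacity.
```

Reporting only the per-minute sum erases exactly the number (peak QPS) that drives capacity planning.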
50k seems like a low bar to start losing messages at. If this was done with Celery and a decently sized RabbitMQ box, I would expect it to get into the millions before problems started happening.
These machines do more than just push. They also buffer messages for each individual user to "potentially" push if they don't read them on the desktop client. This happens before the flow this article talks about.
We currently have 3 machines doing this for millions of concurrent users. At the time this article was written it was 2 machines.
At some point, when a system has been in a failure mode for a while, it makes sense to start shedding load rather than attempting to deliver every single push notification. Also worth mentioning, a minute of downtime is already a million backed-up pushes. Beyond that, it becomes infeasible to attempt to deliver them.
Edit: Also worth mentioning, the 50k buffer is for a single server, we run multiple push servers in the cluster.
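Because a push has a time bound on its usefulness, shedding can also be expressed as an age check at dequeue time rather than a size cap. A sketch under assumed names (`drain`, `max_age`); timestamps are passed in explicitly so the example is deterministic:

```python
def drain(queue_items, now, max_age=60):
    """Deliver items younger than max_age seconds; shed the rest.

    queue_items: list of (enqueued_at, payload) tuples.
    """
    delivered, shed = [], []
    for enqueued_at, payload in queue_items:
        (delivered if now - enqueued_at <= max_age else shed).append(payload)
    return delivered, shed

backlog = [(0, "stale"), (50, "recent"), (90, "fresh")]
delivered, shed = drain(backlog, now=100, max_age=60)
print(delivered, shed)  # ['recent', 'fresh'] ['stale']
```

The nice property of shedding by age is that a long outage costs only `max_age` worth of deliveries, not an hour of queue-draining once the system recovers.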
At 15k notifications per minute, a million notifications would take 1hr to clear before the queue returns to normal. I would imagine they prefer to shed load early so notifications don't get delayed, hence the small buffer.
The issue was not the ability of their servers to handle the load, but the ability of Firebase to ingest the notifications - at least, that's how I read it.
I love Discord, and love Elixir too, so this is a pretty great post.
Unfortunate that the final bottleneck was an upstream provider, though it's good that they documented rate limits. I feel like my last attempt to find documented rate limits for GCM/APNS was fruitless, perhaps Firebase messaging has improved that?
It does, doesn't it? I used to use Ventrilo, but then they screwed our small group out of our server connection. And we happily used Dolby Axon for a while. We tried Google Hangouts... for a while, until it just really didn't work well (it disconnected and crapped out a lot). We tried using the Steam client's chat, but while it was okay for screensharing, it wasn't so great for chat.
But at some point we heard of Discord, which posed itself as a chat/vent replacement, started using it, and it just works. Which is huge, since the other stuff generally didn't (Axon was actually good).
They had a really well defined user-space, marketed at it well, and really nailed the user experience, while still being free for the typical user. There is a lot to love about Discord.
I ended up there for the first time last night and must say that there is a lot to like about it. I found some good communities and integrating media and so on all felt quite streamlined, and the system was snappy.
It's just too bad there are a dozen IMs/voice/video and a dozen slacky/feed companies.
I spend a lot of time in the PCMR Discord, which is pretty lively. The technology seems to be solid, while the UI has issues (notifications from half a day ago are really hard to find for example on mobile devices). Otherwise I'm on Discord every day and love using the service. I miss some slack features, but the VOIP is very good.
16k per second. 83k per second during peak (assuming 80/20 default traffic rule).
- 100 /s = typical limit of a standard web application (python/ruby), per core
- 1,000 /s = typical limit of an application running on a full system
- 10,000 /s = typical limit for fast systems (load balancers, DBs, haproxy, redis, tomcat...)
- Over 10,000 /s you gotta scale horizontally because a single box [shouldn't] can't take it
The difficulty depends on the architecture and what the application has to do (dunno, didn't go through the article). If you make something that can scale by just adding more boxes, then it's trivial: just add more boxes. Well, it's gonna cost money, and that's about it.
So no. Not a big deal at all... if you've done that before and you've got the experience :D
Everything is relative. In this case, it's not so much the actual load itself but rather the ability to throttle to match the upstream provider's throughput and limitations.
It is constant - but iirc, it'd be trivial to make a dynamically scaling pool. At the end of the day, a pusher is just a TCP connection. Keeping a pool of fixed size and planning capacity around scaling horizontally is a perfectly acceptable approach - given you know the potential throughput for each pusher.
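Given a known per-pusher throughput, sizing that fixed pool reduces to simple arithmetic (the rates and headroom factor below are illustrative, not Discord's numbers):

```python
import math

def pool_size(target_rate, per_pusher_rate, headroom=1.25):
    """Pushers needed to sustain target_rate, with some headroom for bursts."""
    return math.ceil(target_rate * headroom / per_pusher_rate)

# e.g. ~16,700 pushes/sec overall, each connection good for ~1,000/sec
print(pool_size(16_700, 1_000))  # 21
```

A dynamic pool would just re-run this calculation against the observed rate and open or close connections to match.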
Just wondering, what is the difference if I use two kinds of [producer, consumer] message queues (say RabbitMQ) instead of this? Does GenStage being an Erlang system make a difference?
RabbitMQ is written in Erlang too, but GenStage you use natively instead of bringing in and configuring a big dependency. It just comes with your language for free, without needing another process, etc.
How does one achieve this in Celery 4? I remember there was a Celery "batch" contrib module that allowed this kind of batching behavior, but I don't see that in 4.
> "Firebase requires that each XMPP connection has no more than 100 pending requests at a time. If you have 100 requests in flight, you must wait for Firebase to acknowledge a request before sending another."
So... get 100 firebase accounts and blast them in parallel.
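However many sender accounts you use, each connection still has to enforce the 100-pending rule locally; a counting semaphore is the usual shape. A sketch with hypothetical names (`send`, `on_ack`, `transport`), shown with a limit of 2 so the example stays small:

```python
import threading

class InFlightLimiter:
    """Cap concurrent unacknowledged requests on one connection."""
    def __init__(self, limit=100):
        self._slots = threading.BoundedSemaphore(limit)

    def send(self, request, transport):
        self._slots.acquire()       # blocks while `limit` requests are pending
        transport(request)

    def on_ack(self):
        self._slots.release()       # an ack frees a slot for the next send

sent = []
limiter = InFlightLimiter(limit=2)
limiter.send("push-1", sent.append)
limiter.send("push-2", sent.append)
limiter.on_ack()                    # without this ack, a third send would block
limiter.send("push-3", sent.append)
print(sent)  # ['push-1', 'push-2', 'push-3']
```

Fanning out across parallel connections multiplies the aggregate in-flight budget, but each one still needs its own limiter.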
Corollary: If you have 2 boxes, each of them has to be able to handle all the traffic, so you can't save money by using smaller boxes :D
Corollary #2: If you have 2 datacenters, each of them has to be able to handle all the traffic, so you burn a lot of money :D
Could you explain why using RabbitMQ is more transactional?
[1] "We've raised over $30,000,000 from top VCs in the valley like Greylock, Benchmark, and Tencent. In other words, we’ll be around for a while."
[0]: www.discordapp.com
GenStage has a lot of uses at scale. Even more so is going to be GenStage Flow (https://hexdocs.pm/gen_stage/Experimental.Flow.html). It will be a game changer for a lot of developers.
Makes me think of the Abraham Simpson quote: "My car gets 40 rods to the hogshead and that's the way I likes it!"
It seems to have totally taken over a space that wasn't even clearly defined before they got there.
The average per minute only gets used because many systems have so little load that the number per second is negligible.
I'd definitely be able to put to use things like flow limiters and queuing and such, but none of my company's projects use Elixir :(