I found the concept of affinity used for bunching messages to be new. The other concepts are not surprising; they are the typical engineering solutions.
One thing I am curious about: how does it compare to how the old postal systems used to handle Christmas and new year loads?
On a lighter note: perhaps you can predictively generate and cache messages at the receiver's end based on their contacts and their style of communication. When a sender actually sends a message, just send one bit across, and the local cache gets flushed and displayed :)
> I found the concept of affinity used for bunching messages to be new.
I'm reasonably sure that UUCP + things like INN were "batching messages per destination" for efficiency a long time ago. Nowhere near the same scale, obviously, but the same kind of concept, no?
Take the 16 most common messages, normalize them a bit, and encode them in 4 bits. I would guess that would cover over 50% of the messages sent, maybe even close to 80%.
It may be a good idea to provide a range of fixed templates for standard messages and send just the selected message ID, along with the recipient name, over the network.
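A minimal sketch of the idea in the two comments above. Everything here is made up for illustration (the codebook contents, the function names); a real system would have to agree on the codebook out of band and version it carefully.

```python
# Hypothetical codebook: the 16 most common (normalized) messages,
# each addressable by a 4-bit template ID. Anything else falls back
# to sending the full text.
CODEBOOK = [
    "happy new year", "merry christmas", "hi", "hello",
    "thanks", "ok", "lol", "good morning",
    "good night", "congrats", "happy birthday", "see you soon",
    "on my way", "call me", "miss you", "love you",
]
CODE = {msg: i for i, msg in enumerate(CODEBOOK)}

def encode(message: str):
    """Return ("template", id) if the normalized message is in the
    codebook (id fits in 4 bits), else ("raw", full_text)."""
    normalized = message.strip().lower().rstrip("!.")
    if normalized in CODE:
        return ("template", CODE[normalized])
    return ("raw", message)

def decode(kind, payload):
    # The receiver reconstructs the message from the shared codebook.
    return CODEBOOK[payload] if kind == "template" else payload
```

So "Happy New Year!" would travel as a 4-bit ID plus addressing, while an uncommon message is sent verbatim; the win depends entirely on how skewed the message distribution actually is.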
I haven't used FB Messenger, but I also hate features where they show if the other person is typing a response. I intentionally disable it in Slack, so the other person simply sees a response when I'm done with it.
Back in the day, when SMS text messages were the way to send messages over the mobile phone network (in the UK), people would jump the gun by sending 'Happy NY!' messages 5 minutes before midnight, because the moment 12am hit, any messages sent could be queued for hours as the mobile networks struggled to cope with the massive uptick in messages being sent at the same time.
I used to use bulk-sending tools (yeah, they existed as J2ME apps) to send 10-50 SMS (it's been a while, I'm not sure how many it was), and some would only arrive on January 1st quite a while into the day. At some point, it changed and everything would arrive just a few minutes after sending. I think that was when we already had iPhone and Android, and I'm not sure if it was because of messengers (were there any back then?) or because the German telcos finally upgraded their infrastructure enough.
As someone who typically works on front-end projects, this was a very interesting read. I particularly loved the discussion of “graceful degradation.” That’s the kind of collaboration across the stack that makes a service like Messenger very pleasant to use.
The photo caption says that's just the infrastructure team. I'd imagine the product team is much larger given how many features are crammed into Messenger.
Afaik the real team name is Messenger Foundation. FB has the concept of foundation teams, whose only goal is keeping a part of the service working no matter what.
/edit: been informed by some of the people in the picture that technically there are 2 teams there: Messenger Infra and Messenger Foundation
I didn't realize message queues were used for this type of task. I'm assuming you would then also use autoscaling pods that respond to the number of messages in the queue. How do you scale pods fast enough for a messaging application or anything else trying for 100ms or less per operation?
I think over-provisioning is a far more common and sane approach that can address the bulk of spikes, versus auto-scaling. Especially if you have big known events (New Year's Day, Black Friday, ...) where you can over-provision (or do controlled auto-scaling, if you will) for a short window.
My guess is they're doing both.
Anecdote time. I worked at a company where one project was over-provisioned on dedicated hardware and another auto-scaled in the cloud. The over-provisioned project was much cheaper, had significantly better response times and was easier to manage. It was load tested to handle over an order of magnitude more traffic than the all-time-peak and even though fully over-provisioned, it was cheaper than the baseline usage (and slower, and harder to manage) cloud solution.
Messaging queues are a core part of a lot of high-scale distributed systems (source: Twitter). You want enough queue space to handle the expected volume and then some. Assuming you have that, you don't need to instantly scale instances out to match the number of messages; you just need to catch up before the queue space runs out.
Message queues (or similar things like Kafka, which isn't quite a proper "message queue") are used for basically everything at this scale. Messages are passed indirectly: an event happens, it gets pushed onto a queue, and then the recipients do something with it.
One way is to over-allocate in the first place. When your spare pool drains below a watermark, you scale out. Hopefully there is enough time for that scale event to complete before the pool drains completely.
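The watermark idea above boils down to a one-line check (the 20% threshold here is invented for illustration): trigger a scale-out while there is still spare capacity left, so new instances have time to come up before the pool empties.

```python
# Hypothetical low-water mark: start scaling out once spare capacity
# drops below 20% of the total. The right threshold depends on how long
# a scale-out takes versus how fast the pool drains.
LOW_WATERMARK = 0.2

def should_scale_out(total_capacity: int, in_use: int) -> bool:
    """True when the spare pool has drained below the watermark."""
    spare_fraction = (total_capacity - in_use) / total_capacity
    return spare_fraction < LOW_WATERMARK
```

For example, with 100 instances provisioned and 85 busy, the spare fraction is 0.15 and a scale-out would be triggered; at 50 busy it would not.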
One thing I find very manipulative about Facebook is how, when someone sends you a message, the email notification has a link to open Messenger and says that Messenger is the only way you can read that message, even if you don't have Messenger installed. They are trying everything they can to get everyone to install that app. Yet of course you can just read and respond directly on their website without any app, but they don't link to that or mention it.
mbasic.facebook.com for anyone reading this and looking for a way to work around Facebook's dark UI pattern on mobile, where a click on the "messages" button wants you to install the app.
Essentially what everyone else does: distributed systems with load balancing, load balancing and more load balancing. And if that goes awry, triage, where they prioritize messages and simply time out and drop the lower-priority ones. Of course, the Messenger team is lucky in that they can drop messages, since your family and friends missing a "Happy New Years" message isn't the end of the world. Other systems (such as finance) aren't so lucky. Drop a few transactions or apply them out of order and it is the end of the world. It was an interesting read, though it would have been nice if there were more specifics, but I guess Facebook wouldn't approve that.
I don't know, a missing "Happy New Years" isn't missing dollars, but it's definitely not cool to drop such a greeting, or any message, in my opinion. It should at least be possible to store these messages and deliver them late. The baseline should be 100% deliverability, and anything less than that should be subject to intense scrutiny. I mean, how big of a Kafka cluster do you need to make this happen?
Actual messages with content are never dropped, only 'meta-messages' like read receipts. It's not critical if, in a group chat with many participants, the state of 'who's seen the last message' is not 100% correct on New Year's Eve.
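That triage policy is easy to sketch (event type names and the shape of the function are hypothetical, not Messenger's actual API): under load, meta-traffic is shed while message content is always kept.

```python
# Hypothetical load-shedding policy: actual messages are always kept;
# "meta" events like read receipts and typing indicators are droppable
# when the system is overloaded.
DROPPABLE = {"read_receipt", "typing_indicator"}

def triage(event_type: str, payload, overloaded: bool):
    """Return the event to deliver, or None if it was shed under load."""
    if overloaded and event_type in DROPPABLE:
        return None              # safe to lose; the UI catches up later
    return (event_type, payload) # always delivered
```

So on New Year's Eve a read receipt might silently disappear, but the "Happy New Year" message itself is always queued for delivery.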
> Of course the Messenger team is lucky in that they can drop messages since your family and friends missing a "Happy New Years" message isn't the end of the world. Other systems (such as finance)
Messenger is (also) a finance system. You can send money via Messenger. You can purchase products directly in Messenger. It's had all that for more than two years now.
From one of my projects (an MMORPG) I've learned that the required accuracy of non-financial transactions is often underestimated, while, on the other hand, financial transactions are often less critical than initially assumed. After all, compensation in financial transactions is often straightforward to calculate and apply. But the damage done by dropped/failed non-financial transactions is often hard to assess, and it's also more involved to find appropriate compensation.
Facebook gets its fair share of bad PR (some of it well deserved), but we shouldn't dismiss amazing engineering work because of that. This is a technical piece that highlights solutions to problems not many out there get to solve.
This is a 2018 internet-connected app, not a 1985 GSM network.
1 billion 100-byte messages adds up to an almost trivial 100 GB. That might be a technical challenge for the neighborhood web admin but not for any real company.
Off-topic: I hate those features to begin with, but I know I'm the product, not the customer, and those features exist to keep people on the app.
https://www.facebook.com/notes/facebook-engineering/chat-sta...
Are you on a web browser? Are any of your extensions mucking things up?