item 13442846

burgreblast | 9 years ago

100,000s of servers for 100,000,000s of messages/day?

I understand that half the servers aren't even doing messages, but, isn't WhatsApp doing 2 orders of magnitude more messages with 3 orders of magnitude (?) fewer servers?

Is that right? I'm curious how one would justify 10,000X worse?

So for each message, 10,000X more equipment is needed?


sametmax | 9 years ago

Also:

- WhatsApp doesn't have to allow browsing the entire history of its billions of messages;

- WhatsApp doesn't have hashtags. A tweet can go not only to 1,000,000 users, but also to any number of apps requesting updates for a single tag.

- Twitter allows advanced search, where you can browse, in real time (or back through the entire history), a complex combination of people, tags and free text, with settings such as the language or the date.

- WhatsApp has a list of messages, but Twitter has a graph: a message can be retweeted again and again, replied to and liked.

- All of those features have one impact or another on the way tweets are displayed to the user.

- Twitter's API is much more heavily used than WhatsApp's.

jraedisch | 9 years ago

WhatsApp also does not need ad-related analytics.

StreamBright | 9 years ago

WhatsApp went with Erlang from the very beginning, and it perfectly suits their needs: you can map WhatsApp messages almost 1:1 to Erlang messages. On top of that, they optimized the hell out of their stack [1].

Twitter, on the other hand, is a very different problem, where you need to broadcast messages in a 1:N fashion and N can be 100,000,000 (KATY PERRY, @katyperry: 95,366,810 followers). On top of that, they need extensive analytics on their users so they can target them in the ad system. I'm pretty sure there is some room for optimisation in their stack; I'm just not sure what percentage of those servers could be saved.

http://www.erlang-factory.com/upload/presentations/558/efsf2...

vidarh | 9 years ago

Twitter's analytics are either lossy or eventually consistent [1]. I'm sure they're resource-intensive, but they're taking shortcuts that make them very amenable to saving resources (unless they're just very buggy).

As for the broadcast problem, it's trivially handled by splitting large follower lists into trees and introducing message reflectors. Twitter's message count is high for a public IM system, but it's not that high compared to overall messaging volume for private/internal message flows. More importantly, despite the issue of large follower counts, breaking large accounts into trees of reflectors decomposes the problem neatly, and federating large message flows like this is a well-understood problem.

I've half-jokingly said in the past that you could replace a lot of Twitter's core transmission of tweets with mail servers and off-the-shelf mailing-list reflectors, plus some code to create mailboxes for accounts and reflectors to break up large follower lists. No, it wouldn't be efficient, but the point is that distributing message transfers, including reflecting messages to large lists, is a well-understood problem. Based on the mail volumes I've handled with off-the-shelf servers, I'll confidently say that hundreds of millions of messages a day handled that way is not all that hard with relatively modest server counts.

Fast delivery of tweets via reflectors for extreme accounts would be the one thing that could drive the server count up, but on the other hand there are plenty of far more efficient ways of handling those (e.g. extensive caching plus pulling, rather than pushing, for the most extreme accounts).
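A minimal sketch of the reflector-tree idea described above (a toy in-memory model; `FANOUT`, `Reflector` and `build_tree` are made-up names, not anyone's actual design): each reflector forwards to at most `FANOUT` children, so delivering to a million followers becomes a shallow tree walk rather than a single giant fan-out from one server.

```python
from dataclasses import dataclass, field

FANOUT = 1000  # assumed max direct recipients per reflector


@dataclass
class Reflector:
    # children are either nested Reflectors or plain user ids
    children: list = field(default_factory=list)

    def deliver(self, message, inboxes):
        # Forward the message down the tree; leaves land in user inboxes.
        for child in self.children:
            if isinstance(child, Reflector):
                child.deliver(message, inboxes)
            else:
                inboxes.setdefault(child, []).append(message)


def build_tree(followers):
    """Group followers into reflectors, then group the reflectors, until
    the root itself has at most FANOUT children."""
    nodes = list(followers)
    while len(nodes) > FANOUT:
        nodes = [Reflector(children=nodes[i:i + FANOUT])
                 for i in range(0, len(nodes), FANOUT)]
    return Reflector(children=nodes)


# A 1,000,000-follower account becomes a two-level tree of 1,000 reflectors.
tree = build_tree(range(1_000_000))
inboxes = {}
tree.deliver("new tweet", inboxes)
```

With `FANOUT` = 1000, even a 100M-follower account is only a three-level tree, which is why the comment treats the broadcast itself as the easy part.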

Note, I'm not saying Twitter doesn't have a legitimate need for the servers they use - their web app does a lot of expensive history/timeline generation on top of the core message exchange, for example. And the number of servers doesn't say much about their chosen tradeoffs in terms of server size/cost vs. number of servers. But the core message exchange should not be where the complexity is, unless they're doing something very weird.

[1] Taking snapshots of their analytics and of the API follower/following counts shows they don't agree, and the analytics numbers change after the fact on a regular basis.

kalleboo | 9 years ago

In WhatsApp, a typical message goes to one other person. On Twitter, it can go to millions of people.

When Twitter initially got its failwhaling under control, I recall reading that they solved it by changing from a relational "join and merge the timelines of everyone you're following on each refresh" model to a messagebox model. If that's true, maybe that naive model is now showing its limitations (I doubt they stopped there, though; it seems like they have things under control).
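The two models contrasted above can be sketched in a few lines (a toy in-memory version with illustrative names; this is not Twitter's actual implementation):

```python
from collections import defaultdict

tweets_by_author = defaultdict(list)   # author -> tweets, stored once
following = defaultdict(set)           # reader -> authors they follow
mailboxes = defaultdict(list)          # reader -> precomputed timeline


def post_join_model(author, text):
    # "Join on read": store the tweet once; every timeline refresh must
    # merge the tweets of everyone the reader follows.
    tweets_by_author[author].append(text)


def timeline_join_model(reader):
    return [t for author in following[reader] for t in tweets_by_author[author]]


def post_mailbox_model(author, text, followers):
    # "Messagebox" (fan-out-on-write): copy the tweet into every
    # follower's mailbox at post time; reads become a cheap list fetch,
    # but one celebrity post costs O(followers) writes.
    for follower in followers:
        mailboxes[follower].append(text)


def timeline_mailbox_model(reader):
    return mailboxes[reader]


# Both models produce the same timeline; they differ in where the work goes.
following["reader"].add("katy")
post_join_model("katy", "roar")
post_mailbox_model("katy", "roar", ["reader"])
```

The limitation the comment alludes to is visible in `post_mailbox_model`: with millions of followers, a single post becomes millions of writes, which is exactly the case where a pure messagebox model hurts.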

danaliv | 9 years ago

I suspect the writer was using the phrase "hundreds of millions" figuratively. When I worked there years ago there were already 14 billion API requests a day, iirc. (That number was public at the time, for the record.)

burgreblast | 9 years ago

I believe it's in the low 100s of millions of tweets per day; I've seen that stat elsewhere.

> 14 billion API requests a day

Do you mean 14B internal, services-requesting-services API requests?

Surely you can't mean 14B API requests from the outside world, can you? I'm scratching my head over how their real user base could generate anywhere near that load.

threeseed | 9 years ago

You really need to read the article.

Those servers aren't just for managing the messages; they also run their advertising and analytics platforms. And since over a third of their servers are generic Mesos nodes, those could be used for anything, e.g. development containers.

nrjdhsbsid | 9 years ago

That's only about 1,000 messages per second on average. A single database + app server could handle that load. Even assuming a bunch of other stuff is happening, 500 servers sounds generous.

Wtf are they doing that each server only handles one tweet every two minutes?

burgreblast | 9 years ago

Actually, it's 1,000-9,000 messages per server per day, or about one message every 10-90 seconds.
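The back-of-envelope numbers in this exchange check out (the inputs below are the commenters' round-figure assumptions, not official Twitter stats):

```python
# Sanity-check the thread's arithmetic with the assumed round numbers.
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400

messages_per_day = 100_000_000             # "low 100s of millions" of tweets/day
servers = 100_000                          # "100,000s of servers"

overall_rate = messages_per_day / SECONDS_PER_DAY            # ~1,157 msg/s total
per_server_per_day = messages_per_day / servers              # 1,000 msg/server/day
gap_between_messages = SECONDS_PER_DAY / per_server_per_day  # ~86 s per message
```

So ~1,000 messages/second overall, and roughly one inbound message per server every minute and a half, matching both comments.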

Of course, that's just the new messages inbound. They may need to distribute that single message to 100M people (who likely won't even see it, but still.)

Problems that are trivially solvable with one database don't simply scale by adding more DBs or machines. Scaling isn't easy, or they would have done it already. I'm in no way disparaging their team, because I don't know what kind of constraints they faced getting to this point.

Still, I'd bet it could be optimized by 2+ orders of magnitude if people sat down and re-evaluated the whole architecture at this point in time.

Regardless, is that really a priority?

They may have bigger issues on their plate now (growing revenue, growing users, making users happy). Assuming their business can generate the cash flow to overcome the inefficiencies, they may be better served to focus on growth.