
The Architecture Twitter Uses to Deal with 150M Active Users

316 points | aespinoza | 12 years ago | highscalability.com

162 comments

[+] apaprocki|12 years ago|reply
Since tweets are very much like financial ticks (fixed size, tiny), we can put the numbers in perspective. Let's compare Twitter vs say, OPRA (North American options), a single "exchange" feed.

Twitter: 300K QPS, firehose 22 MB/sec, 400 million tweets/day

OPRA: 12.73M MPS, firehose 3.36 GB/sec, 26.4 billion messages/day

http://opradata.com/specs/2012_2013_Traffic_Projections.pdf

edit: Also worth noting, the diff between OPRA 1/1/13 & 7/1/13 is ~256 MB/sec, 2.0 billion messages/day. So in just 6 months the existing firehose increased roughly 10x Twitter's entire output and roughly 5x'd the number of messages/day.
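Those growth ratios check out as rough arithmetic; a quick sanity check using the numbers quoted above:

```python
# Back-of-the-envelope check of the OPRA-vs-Twitter comparison above.
twitter_mb_s = 22         # Twitter firehose, MB/sec
twitter_msgs = 400e6      # tweets/day
opra_growth_mb_s = 256    # projected OPRA growth over 6 months, MB/sec
opra_growth_msgs = 2.0e9  # projected growth, messages/day

print(opra_growth_mb_s / twitter_mb_s)  # ~11.6x Twitter's entire firehose
print(opra_growth_msgs / twitter_msgs)  # 5.0x Twitter's messages/day
```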

[+] thezilch|12 years ago|reply
Tweets are fanned out to more than a single feed and, in the "most important" cases, to millions of feeds -- 31 million for Lady Gaga. You're comparing 300K reads, which might not even include the sub-queries, to 12.7M writes. A single tweet from a single user like that would dwarf the writes; it's unclear whether it could be done in the same one-second window.

Twitter already does 30B timeline deliveries a day, compared to OPRA's 26.4B messages. Again, there's no telling what a "timeline delivery" implies: does it include pulling the user and other secondary and tertiary objects? An HTML renderer? The article doesn't say much about the capabilities and focuses on what they do.

There's also little comparison to be made on what powers OPRA. If Twitter can simply add nodes to their architecture, are they doing it wrong?

[+] squarecog|12 years ago|reply
It's not the data coming in that makes things hard. It's the fanout of messages into their subscribers' timelines.
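A minimal Python sketch of that fanout-on-write pattern (all names are hypothetical; Twitter does this against Redis clusters, not in-process dicts):

```python
from collections import defaultdict, deque

# Fanout-on-write: push each tweet into every follower's materialized
# timeline at write time, so a timeline read is a single list fetch.
followers = defaultdict(set)    # author -> set of followers
timelines = defaultdict(deque)  # user -> their precomputed home timeline

def follow(follower, author):
    followers[author].add(follower)

def tweet(author, tweet_id):
    # O(follower count) work per write: cheap for most users,
    # 31 million pushes for a Lady Gaga tweet.
    for f in followers[author]:
        timelines[f].appendleft(tweet_id)

def home_timeline(user, n=20):
    # O(n) read: no joins, no merging across followees.
    return list(timelines[user])[:n]

follow("alice", "gaga")
tweet("gaga", "t1")
print(home_timeline("alice"))  # ['t1']
```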
[+] dchichkov|12 years ago|reply
300K QPS is pretty impressive. If you'd want to sustain that rate on a single modern machine you'll have to fit all your processing into about thirty LLC misses per request.
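That budget falls out of simple arithmetic, assuming (hypothetically) a single core and ~100 ns per last-level cache miss:

```python
# Rough derivation of the "~thirty LLC misses per request" budget.
qps = 300_000
ns_per_request = 1e9 / qps        # ~3333 ns of wall time per request
llc_miss_ns = 100                 # typical DRAM round-trip latency
budget = ns_per_request / llc_miss_ns
print(round(budget))              # ~33 misses per request
```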
[+] aroman|12 years ago|reply
> Twitter no longer wants to be a web app. Twitter wants to be a set of APIs that power mobile clients worldwide, acting as one of the largest real-time event busses on the planet.

Wait, then why are they actively destroying their third-party app ecosystem...?

[+] blazingfrog2|12 years ago|reply
The piece did not say "third party". I'm assuming these clients are the official Twitter mobile apps.
[+] Yhippa|12 years ago|reply
Probably their own mobile clients. After reading the article, I suspect Twitter felt that having control of the clients reduces variability in the stress on their system -- like bad actors accessing the firehose suboptimally.
[+] mey|12 years ago|reply
The two are not the same. It sounds like they are becoming API driven to support multiple interfaces and to put clear boundaries between systems.
[+] grbalaffa|12 years ago|reply
Yeah, somebody hasn't gotten the memo it seems.
[+] davidw|12 years ago|reply
> Your home timeline sits in a Redis cluster and has a maximum of 800 entries.

Wow, that's pretty cool. Congrats to antirez - it must be a nice feeling knowing that your software powers such a big system!
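The 800-entry cap maps naturally onto Redis's LPUSH + LTRIM idiom; here's a stand-in for that pattern in plain Python (no Redis server assumed, names illustrative):

```python
from collections import deque

# Stand-in for the Redis pattern the cap implies:
#   LPUSH timeline:<user> <tweet_id>
#   LTRIM timeline:<user> 0 799
# A deque with maxlen gives the same "keep only the newest N" behavior.
class CappedTimeline:
    def __init__(self, cap=800):
        self._entries = deque(maxlen=cap)

    def push(self, tweet_id):      # LPUSH + implicit LTRIM
        self._entries.appendleft(tweet_id)

    def latest(self, n=20):        # LRANGE 0 n-1
        return list(self._entries)[:n]

tl = CappedTimeline(cap=3)
for t in ["t1", "t2", "t3", "t4"]:
    tl.push(t)
print(tl.latest())  # ['t4', 't3', 't2'] -- oldest entry evicted
```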

[+] antirez|12 years ago|reply
Thank you David, I'm very happy indeed that many companies are using Redis to get work done; this is basically one of the biggest reasons why, after 4 years, I'm not giving up...

As you know, I'm usually not the kind of guy who stays focused on the task at hand for more than a limited timeframe before being tempted to switch to something else, but this time I'm finding the right motivation in the big user base.

[+] nasalgoat|12 years ago|reply
Redis is probably the most useful tool powering the internet after nginx. It really is an amazing piece of engineering.
[+] jacques_chester|12 years ago|reply
Figuratively speaking, Twitter has switched from a design rooted in Databases 101 (OLTP) to a design more rooted in Databases 102 (OLAP).

That is, they moved processing from query time to write time. And that's a perfectly legitimate strategy; it's the basis of data warehousing.

OLTP is about write-time speed. It's great for stuff like credit card transactions, where you only really care about the tally at the end of the banking day. The dominant mode of operation is writing.

OLAP is about read-time speed. You do a bunch of upfront processing to turn your data into something that can be queried quickly and flexibly.

One thing that's problematic about the teaching of database technology is that the read/write balance isn't explicitly taught. Your head is filled with chapter and verse of relational algebra, which is essential for good OLTP design. But the problems of querying large datasets are usually left for a different course, if they're treated at all.

[+] joshuaellinger|12 years ago|reply
Maybe at a high level, but OLAP involves precomputing aggregations and bitmap indexes. It's a pretty different beast.

OLAP is very rarely under real-time constraints and, when it is, it tends to push the heavy lifting out to OLTP.

[+] squarecog|12 years ago|reply
That's an interesting thought, but rather than focusing on write vs read balance, I would claim that OLAP and OLTP are most distinguished by the nature of queries they need to support.

OLAP is characterized by fairly low-volume aggregation queries that touch very large volumes of data ("what fraction of tweets in English come from non-English-speaking countries?"). Sure, the ingest is large, but it tends to be batch and has fairly loose real-time (in the EE sense) requirements. What makes OLAP hard is the sheer volume of data from which an aggregation must be calculated.

OLTP is characterized by very selective queries with much tighter real-time bounds ("what are the last 20 things that the 100 people I follow said?"). The overall size of the dataset might even be the same, but each individual query needs fast access to a tiny fraction of the dataset. In many applications, this is accompanied by very high QPS and in Twitter's case, extremely high write volume.
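The two query shapes can be made concrete with a toy dataset (entirely hypothetical names and values):

```python
# OLAP vs OLTP query shapes over the same dataset.
tweets = [
    {"id": i, "author": f"user{i % 50}", "lang": "en" if i % 3 else "fr"}
    for i in range(10_000)
]

# OLAP-shaped: one aggregation that must touch every row.
fraction_en = sum(t["lang"] == "en" for t in tweets) / len(tweets)

# OLTP-shaped: highly selective, served from an index built up front.
by_author = {}
for t in tweets:
    by_author.setdefault(t["author"], []).append(t["id"])

last_20 = by_author["user7"][-20:]  # touches a tiny fraction of the data

print(round(fraction_en, 2), len(last_20))
```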

[+] cwt137|12 years ago|reply
This near-hour-long video is a deep look into Twitter's backend, especially the Firehose feature, Flock, etc. They go into detail on how they use Redis and even show one of the actual data structures they store in Redis. A must-see video for anyone into high scalability.

http://www.infoq.com/presentations/Real-Time-Delivery-Twitte...

[+] mikemoka|12 years ago|reply
I'm playing armchair architect and my question is probably wrong in infinite ways, but I might learn something: why does the service have to write a tweet to two million timelines? Wouldn't it be cheaper to let the client build the page on its own via RESTful APIs?
[+] rythie|12 years ago|reply
Requesting the timelines of each of the people you follow is a slow process for the client, meaning the client has to make 100s or even 1000s of requests.

Also, much of the data would be thrown away because it's replies to people you don't follow, or too old. It's hard for clients to reconstruct the timeline that way. It would also vastly increase the number of HTTP requests and the amount of data Twitter has to ship out.

Also, it's basically impossible to make a system like that real-time, because you'd have to poll 1000 feeds to see if anything is new.

I used to write a multi-network client; the combined home timeline request is basically the only feasible method for a client to use.

[+] beagle3|12 years ago|reply
Mostly because they chose the wrong backend technology (which they have been doing repeatedly since their early days).

The right way to solve Twitter would be to have 140-byte tweets sorted by a <userid,time> 64-bit key, with a few more attributes (it all fits neatly into 256 bytes), shard them across servers, and keep everything recent in memory.

Logging in would fetch the user's following list to the front-end server, broadcast the request to all tweet servers, wait 50ms or so for responses, then merge, sort, and HTML-format them.

The front-end servers would not need any memory or disk (they could be an army of $500 servers behind a load balancer, or a few beefy ones). The backend servers would need beefy CPU and memory, but still ultra-commodity (256 bytes/tweet means 1GB = 4M tweets, so one 64GB server = 256M tweets). Shard for latency, redundancy, etc. Also, special-case the Gagas/Kutchers of this world by giving them their own server, and/or have them broadcast to and cache their tweets in the front-end servers (spend 256MB of memory on a tweet cache in each front-end server and you get 1M cached tweets, which would cover all of the popular people and then some).

Network broadcast was invented for a reason.
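The scatter/gather read path in that design can be sketched as a k-way merge of per-shard responses (all names and the (timestamp, user, body) tuple format are illustrative, not Twitter's):

```python
import heapq
from itertools import islice

# Each shard returns its recent tweets for the followed users,
# already sorted newest-first by timestamp.
shard_responses = [
    [(105, "bob", "..."), (101, "bob", "...")],     # shard holding bob's tweets
    [(104, "carol", "..."), (99, "carol", "...")],  # shard holding carol's
]

def merged_timeline(responses, n=20):
    # Since every response arrives pre-sorted, the front end only needs
    # a k-way merge, not a full re-sort of all the tweets.
    merged = heapq.merge(*responses, key=lambda t: t[0], reverse=True)
    return list(islice(merged, n))

print([ts for ts, _, _ in merged_timeline(shard_responses)])  # [105, 104, 101, 99]
```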

[+] harryh|12 years ago|reply
That is actually a very interesting question, and it turns out that whether it's better to fan out on write or on read depends on a few different things. There's a very widely read paper on the subject you might enjoy:

http://research.yahoo.net/node/3203

[+] brown9-2|12 years ago|reply
They are making a tradeoff between the cost of reading your timeline and the cost of a write at tweet time, optimizing for the former.
[+] logic|12 years ago|reply
In addition to what others have mentioned, consider passive endpoints, such as SMS and push notifications.
[+] brown9-2|12 years ago|reply
Here is a similar talk that Jeremy Cloud gave at QCon NY a few weeks ago: http://www.infoq.com/presentations/twitter-soa

Jeremy Cloud discusses SOA at Twitter, approaches taken for maintaining high levels of concurrency, and briefly touches on some functional design patterns used to manage code complexity.

[+] mistertrotsky|12 years ago|reply
I would absolutely trust someone named Jeremy Cloud on this subject.
[+] joshuaellinger|12 years ago|reply
The surprise for me is that the core component is Redis.

My first guess would have been custom C code. Yeah, you have to do everything yourself. Yeah, it would be hard to write. But you'd control every little bit of it.

Obviously, I must not fully understand the problem and what Redis buys.

Sam Puralla (if you are reading) -- do you know why Twitter didn't go with a fully custom system at its heart?

Josh

[+] smandou|12 years ago|reply
Well... it seems that highscalability doesn't scale... Error from Squarespace on that page.
[+] ampersandy|12 years ago|reply
No one else seems to have noticed that these details are out of date. Twitter has publicly stated that they currently have "well over 200M active users". The stats are also misleading in that -- I'm pretty sure -- the 300k reads and 6k writes per second only reference tweets. Flock, Twitter's graph database, handles more than 20k writes and 100k reads per second on its own (peak numbers available from the two-year-old README on GitHub).

https://blog.twitter.com/2013/celebrating-twitter7

[+] webwanderings|12 years ago|reply
> Twitter knows a lot about you from who you follow and what links you click on.

No kidding. But we don't care as we live in the glass house of a celebrity culture.

PS: downvote the quoted text if you must. My point is not obvious.

[+] mjolk|12 years ago|reply
If you expect others to misunderstand you, perhaps try to better represent your thoughts?
[+] StavrosK|12 years ago|reply
You realize it's public data, right? When I tweet something, I don't expect it to be private. HN knows a lot about me too.
[+] Myrmornis|12 years ago|reply
The meaning of your metaphor certainly seems obvious; explicit even. Are you referring to a layer of more profound non-obviousness?
[+] jebblue|12 years ago|reply
>> it can take up to 5 minutes for a tweet to flow from Lady Gaga’s fingers to her 31 million followers

Why not break the load up among a farm of servers? 5 minutes to deliver a single message? It's too bad multicast can't be made to work for this use case.

At least analyze whether there's a pattern of geographic concentration among her followers and optimize for where their datacenters are.

Use peer to peer, let the clients help distribute the messages.

[+] tschellenbach|12 years ago|reply
Their setup is very similar to what we use at Fashiolista (though of course we only have millions, not hundreds of millions, of users). We've open sourced our approach and you can see an early example here: https://github.com/tschellenbach/Feedly/
[+] kushti|12 years ago|reply
Scala is an awesome language for implementing scalable things, like Twitter's services.
[+] bpicolo|12 years ago|reply
They have a lot of RAM. Dang.
[+] druiid|12 years ago|reply
As others have said, they don't actually have a lot of RAM dedicated to Redis. You can put 1.5TB of memory in many 'inexpensive' Dell servers (single machines, not clusters), so with basically a cabinet of machines you could have over 30TB of memory available to you. Some of the design choices Twitter has made seem tailored to their (previous? not sure if they have their own hardware now) choice to run on 'cloud' services in the past.

With good hardware and a bit of a budget you can easily scale to crazy numbers of processor cores and memory. That isn't to say the software side of the solution is going to be any easier to solve.

[+] papsosouid|12 years ago|reply
Even the low end of single servers goes up to a couple TB of RAM. "Enterprise" hardware has been there for ages.
[+] papsosouid|12 years ago|reply
I really question the current trend of creating big, complex, fragile architectures to "be able to scale". These numbers are a great example of why: the entire thing could run on a single server, in a very straightforward setup. When you are creating a cluster for scalability, and it has less CPU, RAM, and IO than a single server, what are you gaining? They are only doing 6k writes a second, for crying out loud.