We reduced the cost of building Mastodon at Twitter-scale by 100x

[+] RomanPushkin|2 years ago|reply

> ...10k lines of code. This is 100x less code than the ~1M lines Twitter

I wish I didn't see this comparison, which is not fair at all. Everyone in their right mind understands that the number of features is much less, that's why you have 10k lines.

Add large-scale distributed live video support at the top of that, and you won't get any close to 10k lines. It's only one of many many examples. I really wish you compare Mastodon to Twitter 0.1 and don't do false advertising

> 100M bots posting 3,500 times per second... to demonstrate its scale

I'm wondering why 100M bots post only 3500 times per second? Is it 3500 per second for each bot? Seems like it's not, since https termination will consume the most of resources in this case. So I'm afraid it's just not enough.

When I worked in Statuspage, we had support of 50-100k requests per second, because this is how it works - you have spikes, and traffic which is not evenly distributed. TBH, if it's only 3500 per second total, then I have to admit it is not enough.

[+] nathanmarz|2 years ago|reply

We're comparing just to the original consumer product, which is about the same as Mastodon is today. That's why we said "original consumer product" and not "Twitter's current consumer product".

Mastodon actually has more features than the original Twitter consumer product like hashtag follows, global timelines, and more sophisticated filtering/muting capabilities.

Some people argue it's not so expensive to build a scalable Twitter with modern tools, which is why we also included the comparison against Threads. That's a very recent data point showing how ridiculously expensive it is to build applications like this, and they didn't even start from scratch as Instagram/Meta already had infrastructure powering similar products.

[+] WheelsAtLarge|2 years ago|reply

We see this type of post regularly. Something like, "How I built a better <pick your app> clone by myself in a month." Well, no, usually it's just a bare skeleton with the least amount of functionality. Not only that, the software is the least of the functionality. The organizational structure around the app is what matters most to keep it going. It's an attention seeking ploy and the whole thing usually disappears real quick.

[+] hosh|2 years ago|reply

How much of Twitter’s code base is dedicated to things like security, compliance, and moderation?

Granted, a decentralized platform would eliminate some of those, just by being decentralized

[+] KingOfCoders|2 years ago|reply

The 100x less code always reminds me of a hypothetical Half Life 2 game engine. Then your code would be:

   StartHalfLife2LikeGame()

and you'd replace millions LOC with one line. If there is a perfect match between the framework and your app, there is no code. The more your app diverges from the ideal app the framework was written for, the more code you have.

[+] codedokode|2 years ago|reply

Twitter has many features, but not all of them are necessary. For example, there is no need to implement popups blocking the page and demanding a user to register, or detailed telemetry collecting user data from 20 different providers.

[+] codedokode|2 years ago|reply

> When I worked in Statuspage, we had support of 50-100k requests per second

Serving one cached HTML with Nginx and serving dynamically generated content, updating a database is a different thing.

[+] littlestymaar|2 years ago|reply

> Add large-scale distributed live video support at the top of that, and you won't get any close to 10k lines.

But Twitter isn't, and was never, about live video support: this is pure feature creep and that's how you get headcount inflation and a company that can be run for 17 years without making profit (AKA terrible business).

> When I worked in Statuspage, we had support of 50-100k requests per second

Having served 150kqps in the past as part of a very small team (3 back-end eng.), this isn't necessarily as big of a deal as you make it sound: it mostly depends on your workload and whether or not you need consistency (or even persistence at all) in your data.

In practice, building scalable system is hard mostly because it's hard to get the management forgot their vanity ideas that go against your (their, actually) system's scalability.

[+] anothernewdude|2 years ago|reply

It takes more self-control and effort to reduce the number of features to the ones that matter. Twitter having more features is a liability, not a benefit.

> Add large-scale distributed live video support at the top of that,

Why? For the love of all that is good and efficient, why? Why not have a separate platform for that? Or link to a different federated video service? Why does every platform need to do all the things?

[+] otikik|2 years ago|reply

Indeed. Add a single JavaScript dependency… you will get the banana, the gorilla holding the banana, the tree holding the gorilla, and the whole jungle.

[+] smcl|2 years ago|reply

Not to mention the phrase "x times less than" doesn't really make sense the way it's often used. For it to make sense you have to reinterpret it to mean something that it doesn't based on being the opposite of "x times more than" (which is also often misused).

[+] MikePlacid|2 years ago|reply

So

>> 100M bots posting 3,500 times per second...

and

> We used the OpenAI API to generate 50,000 statuses for the bots to choose from at random.

I wonder: 100M OpenAI bots talking to each other continuously and with much vigor - how is this affecting OpenAI’s uhm… intellect?

[+] dataangel|2 years ago|reply

I do C++ backend work in a non-web industry and this entire post is Greek to me. Even though this is targeted at developers, you need a better pitch. I get "we did this 100x faster" but the obvious followup question is "how" but then the answer seems to be a ton of flow diagrams with way too many nodes that tell me approximately nothing and some handwaving about something called P-States that are basically defined to be entirely nebulous because they are any kind of data structure.

I'm not saying there's nothing here, but I am adjacent to your core audience and I have no idea whether there is after reading your post. I think you are strongly assuming a shared basis where everybody has worked on the same kind of large scale web app before; I would find it much more useful to have an overview of, "This what you would usually do, here are the problems with it, here is what we do instead" with side by side code comparison of Rama vs what a newbie is likely to hack together with single instance postgres.

[+] sdwr|2 years ago|reply

In a typical architecture, the DB stores data, and the backend calls the DB to make updates and compile views.

Here, the "views" are defined formally (the P-states), and incrementally, automatically updated when the underlying data changes.

Example problem:

Get a list of accounts that follow account 1306

"Classic architecture":

- Naive approach. Search through all accounts follow lists for "1306". Super slow, scales terribly with # of accounts.

- Normal approach. Create a "followed by" table, update it whenever an account follows / unfollows / is deleted / is blocked.

Normal sounds good, but add 10x features, or 1000x users, and it gets trickier. You need to make a new table for each feature, and add conditions to the update calls, and they start overlapping... Or you have to split the database up so it scales, but then you have to pay attention to consistency, and watch which order stuff gets updated in.

Their solution is separating the "true" data tables from the "view" tables, formally defining the relationship between the two, and creating the "view" tables magically behind the scenes.

[+] ldayley|2 years ago|reply

Nathan Marz created Apache Storm, coauthored the book "Big Data", and founded an early real-time infrastructure team at Twitter. It's likely the 'curse of knowledge' of working on this specific problem for so long is responsible for the unique and/or unfamiliar style of communication here.

EDIT: Specifics

[+] falsandtru|2 years ago|reply

> Whereas Twitter stores home timelines in a dedicated in-memory database, in Rama they’re stored in-memory in the same processes executing the ETL for timeline fanout. So instead of having to do network operations, serialization, and deserialization, the reads and writes to home timelines in our implementation are literally just in-memory operations on a hash map. This is dramatically simpler and more efficient than operating a separate in-memory database. The timelines themselves are stored like this:

> To minimize memory usage and GC pressure, we use a ring buffer and Java primitives to represent each home timeline. The buffer contains pairs of author ID and status ID. The author ID is stored along with the status ID since it is static information that will never change, and materializing it means that information doesn’t need to be looked up at query time. The home timeline stores the most recent 600 statuses, so the buffer size is 1,200 to accommodate each author ID and status ID pair. The size is fixed since storing full timelines would require a prohibitive amount of memory (the number of statuses times the average number of followers).

> Each user utilizes about 10kb of memory to represent their home timeline. For a Twitter-scale deployment of 500M users, that requires about 4.7TB of memory total around the cluster, which is easily achievable.

Isn't this where the most difficult(expensive) part is and Rama has little to do with it? It appears that the other parts also do not have to be Rama.

[+] jwmoz|2 years ago|reply

Agreed, just reading through half of it I have no idea what Rama is.

[+] buro9|2 years ago|reply

Measuring "Twitter Scale" by tweets per second seems to be not how I would measure it.

Updates per second to end users who follow the 7K tweets per second seems more realistic, it's the timelines and notifications that hurt, not the top of ingest tweets per second prior to the fan out... and then of course it's whether you can do that continuously so as not to back up on it.

[+] mping|2 years ago|reply

Congrats on the (kinda) launch. I was curious to see what you guys were up to. The blog post is pretty detailed, and with good insights. Reducing modern app development complexity to mixing data structures sounds like a good abstraction. I'm sure you thought really hard about the building blocks of Rama and you know your problems better than most of the hn crowd.

Now, the really hard part becomes selling. If companies start using your product to get ahead, that will be the real proof, otherwise its "just" tech that is good on paper.

On a side note, did you guys got any inspiration from clojure? I see lots of interesting projects propping up from clojure people...

Best of luck!

[+] nathanmarz|2 years ago|reply

Rama is written in Clojure :)

[+] Pxtl|2 years ago|reply

I've seen many people describe frameworks like this - you know, first you have the slow back-end event-driven master database that you don't query live against, then you've got eventual-consistency flows against the various data-warehouses and data-stores and partitioned sharded databases in useful query-friendly layouts that you actually read live from... and I never see it clearly explained: how do you read a change back to the user literally just after they made the change? How do you say "other views eventual-consistency is fine but for this view of this bit of info we need it updated now".

This write-up is very detailed but I couldn't find that explanation.

[+] jedberg|2 years ago|reply

The short answer is write-through cache.

You write the update directly to the cache closest to the user and into the eventually consistent queue.

We did this at reddit. When you make a comment the HTML is rendered and put straight into the cache, and the raw text is put into the queue to go into the database. Same with votes. I suspect they do this client side now, which is now the closest cache to the user, but back then it was the server cache.

[+] sixo|2 years ago|reply

I imagine you get some UUID back from your write, and effectively "block" until you see it committed to the event stream. The intent of such a system is certainly for the read-after-write latency to be not much longer than a traditional RDBMS. (This is roughly what the RDBMS is doing under the hood anyway.) Probably you can isolate latency-critical paths so they don't get stuck behind big stream processing jobs.

The advantage of the overall architecture is that nearly all application functionality (for something like a social network) can tolerate much higher latency than an RDBMS, so you really want to have architectural building blocks that let you actually use this headroom.

[+] jokethrowaway|2 years ago|reply

You can hack it and optimistically render the data you know about because your client created it - on the frontend, at no additional cost.

[+] chubot|2 years ago|reply

Yeah definitely, these ideas always sound very appealing to me, in theory -- I almost wonder why nobody has built it before

e.g. they mention "event sourcing" and "materialized views" in the post -- sounds good

But I thought I heard from a few people who were like "we ripped event sourcing" out of our codebase and so forth

And yeah your question is an obvious good one, and the Reddit answer of "write through cache" ... is less than satisfying to me

I FREQUENTLY have the problem where I reload the page and Reddit shows me stale data. It's SUPER buggy.

---

Anyway I definitely look forward to hearing people try this and what their longer term impressions are !

I basically want to know what the tradeoffs are -- it sounds good, but there are always tradeoffs

So is the tradeoff "eventual consistency" ? What are the other tradeoffs?

[+] Groxx|2 years ago|reply

One strategy (somewhat common in lambda architectures) is to query both the long-term store and the in-flight operations, and blend the results. The in-flight stuff is both small and already in memory so it's pretty often trivially fast, even if blending the data is relatively complex.

That does limit you to operations/queries you can describe in this dual format, but pretty often that's fine. Or if you can relax read-after-write you can just ignore the in-flight stuff and read from the main store and then there are no (added) limitations.

[+] lossolo|2 years ago|reply

You have the option to track the latest update time and, during the minute immediately following this update, direct all reads to come from the leader. Additionally, you could oversee the replication lag among followers and block queries on any follower that lags more than a minute behind the leader.

For the client, it's feasible to retain the timestamp of its most recent write. In this way, the system can ensure that the replica responsible for any reads related to that user incorporates updates at minimum up to that recorded timestamp. If a replica isn't adequately current, the read can either be managed by another replica or the query can wait until the replica catches up. The timestamp might take the form of a logical timestamp, signifying the order of writes (e.g., log sequence number), or it could be based on the actual system clock, where synchronized clocks become vital.

When your replicas are spread across multiple datacenters—whether for user proximity or enhanced availability—there's an added layer of complexity. Requests requiring the leader's involvement must be directed to the datacenter housing the leader.

[+] throw14082020|2 years ago|reply

It's called "read-after-write consistency". Write-through cache is one way, or just use a strongly consistent database, haha.

https://avikdas.com/2020/04/13/scalability-concepts-read-aft...

https://en.wikipedia.org/wiki/Cache_%28computing%29#Writing_...

https://cloud.google.com/blog/products/databases/why-you-sho...

[+] unknown|2 years ago|reply

[deleted]

[+] softwaredoug|2 years ago|reply

It’s a massive ask, even if the platform was 100x better, for all developers to give up every programming language and database they’ve ever used to depend on a startups closed source platform for all functionality.

It’s hard enough trusting Google or Amazons cloud offerings won’t change.

It seems that’s what they’re proposing right? What am I missing?

[+] nathanmarz|2 years ago|reply

We're actually not asking anyone to give up anything. First off, it has a simple integration API (which you'll be able to see the details of next week) that allows it to seamlessly integrate with any other backend tool (databases, monitoring systems, queues, etc.). So Rama can be incrementally introduced into any existing architecture.

Second, Rama has a pure Java API and is not a bespoke language. So no new language needs to be learned.

[+] afro88|2 years ago|reply

Looks amazing and incredibly smart. But I found the LOC and implementation time comparisons to Twitter and Threads very disingenuous. It makes me wonder what other wool will be pulled over our eyes with Rama in future (or important real world details missed / future footguns).

Still super impressive. Reminds me of when I discovered Elixir while building a social-ish music discovery app. Switching the backend from Rails to Elixir felt like putting on clothes that actually fit after wearing old sweats. Rama looks like a similar jump, but another layer up, encompassing system architecture.

[+] StephenAmar|2 years ago|reply

+1 the comparisons are not great. How much engineer-hours did it take to build Rama itself?

The numbers they got for Twitter likely include the time it took to build their infrastructure, common libraries (like finagle,…)

[+] softwaredoug|2 years ago|reply

It’s hard to construct a true randomized control trial for software engineering methods. People make many claims about programming paradigms or tools hard to validate.

It’s also unsure what we would compare a tool like this to. I doubt you could just say “compare it to Rails” given how frameworks like rails are bound to specific data models, and most realistic applications. You’d have to compare it to some other opinion about how to wire together different data structures.

[+] sharms|2 years ago|reply

The performance on the example Mastodon instance is very responsive - almost anywhere I clicked loaded nearly instantly. I created an account and the only thing I found missing was it doesn't implement full text search unless my user was tagged, but that might be a Mastodon specific item.

I think they have thought a lot about typical hard problems, such as having the timeline processing happen along side the pipeline, taking network / storage etc out of the picture. Nice work!

[+] jitl|2 years ago|reply

This architecture seems very similar to existing offerings in the "in-memory data grid" category, like Apache Ignite and Hazelcast. I'm more familiar with Ignite (I built a toy Notion backend with it over a few afternoons in 2020).

The way Ignite works overall is similar. You make a cluster of JVM processes, your data partitioned and replicated across the cluster, and you upload some JARs of business logic to the cluster to do things. Your business logic can specify locality so it runs on the same nodes as the relevant data, which ideally makes things a lot faster compared to systems where you need to pull all your data across the wire from a DB. Like Rama, Ignite uses a Java API for everything, including serializing and storing plain 'ol java objects.

Ignite's architecture isn't focused on "ETL" into "PStates". Instead it's more about distributed "caches" of data. It does have streaming for ingestion (https://ignite.apache.org/docs/latest/data-streaming), but you can transactionally update the datastore directly (https://ignite.apache.org/docs/latest/key-value-api/transact...). It also has a "continuous query" feature for those reactive queries to retrieve data (https://ignite.apache.org/docs/latest/key-value-api/continuo...).

Rama's data-structure oriented PState index seems easier to work with than building indexes yourself on top of Ignite's KV cache, but Ignite also offers an SQL language, so you can insert your data into the KV cache however, add some custom SQL functions, and then accept more flexible SQL querying of your data compared to the very purpose-built PCache things, but still be able to do lower-level or more performance-oriented logic with data locality.

Anyways, if you like some of this stuff but want to use an existing, already battle-tested open source project, you can look for these "in-memory data grid", "distributed cache", kind of projects. There's a few more out there that have similar JVM cluster computing models.

[+] theptip|2 years ago|reply

Hazelcast has been on my list to explore for a while. Anyone have pointers to a good sample project / deep-dive in the same sort of spirit as the OP here?

Also would love to hear folks’ thoughts on the sort of usecase where this data grid excels.

[+] clusterhacks|2 years ago|reply

I'm excited to see the docs for Rama. But I am also a little scared of the comment " I came to suspect a new programming paradigm was needed" from Nathan.

It's not so much that I think the comment is wrong or anything, but rather that it seems so similar to what I have heard in the past from power-lisp (or Clojure in this case) super-smart engineers.

I feel like we have reached a point in software development where "better" paradigms don't necessarily gain much adoption. But if Rama wins in the marketplace, that will be interesting. And I am quite excited to see what a smart tech leader and good team have been able to grind out given a years-long timeframe in this programming platform space . . .

[+] nathanmarz|2 years ago|reply

This is why we exposed Rama as a Java API rather than Clojure or our internal language (which is defined with Clojure macros, so it's technically also Clojure). Rama's Java dataflow API is effectively a subset of our internal language, with operations like "partitioners" being implemented using continuations.

[+] ThinkBeat|2 years ago|reply

I am confused.

This is meant to be hyped to sell your Rama platform/product/framework? That you have spent 10 years building in secret? During that time you have built a datastore and a Kafke competitor and ?

Should not those 10 years be factored into the time it took to develop this technical demo?

Is it 100x less code including every LOC in all of Rama?

I mean I am sure you picked a use cast that is well suited to creating a Twitterish architecture implementation.

If I went off and wrote a ThinkBeat platform for creating Twitterish systems and then created a Twitterish implementation on top if it, its real easy to reach low LOCs.

[+] skybrian|2 years ago|reply

It sounds like interesting technology for someone, but I wonder more about scaling down. What does a developer instance running on a laptop look like?

[+] failuser|2 years ago|reply

Is there a breakdown of effort Twitter spent doing the mastodon-level service (serving a feed of the accounts you are subscribed to) vs everything else like ads, algorithmic feed, moderation, fighting spam, copyright claims, localization, GR, PR, safety, etc?

[+] miki123211|2 years ago|reply

Is this just me, or does the code in the post feel like they've implemented what should have been a new programming language on top of Java?

Their "variables" have names that you have to keep as Java strings and pass to random functions. If you want composable code, you don't declare a function, you call .macro(). For control flow and loops, you don't use if and for, but a weird abstraction of theirs.

I feel like this code could have been a lot simpler if it was written in a specialized language (or a mainstream language with a specialized transpiler and/or Macro capabilities.)

I'd quote the old adage about every big program containing a slow and buggy implementation of Common Lisp, but considering that this thing is written in Clojure, the authors have probably heard it before.

[+] kyle-rb|2 years ago|reply

Kinda disappointed by the simulation, where are all the viral posts?

I've been digging around for a while and haven't found any posts with more than 20 faves. The accounts I've found with ~1 million followers have little to no engagement. I want to see how a post with a million faves holds up to the promises of "fast constant time".

I'm especially curious about these queries — fave-count and has-user-faved — since a couple years ago Twitter stopped checking has-user-faved when rendering posts more than a month or so old, so I imagine it was expensive at scale.

[+] NoraCodes|2 years ago|reply

I would argue that this is not "a Mastodon instance", since it is not running Mastodon - other than that, very very neat work! I'm excited for that "Source Code" link to be live :)

[+] gfodor|2 years ago|reply

Something I'm immediately thinking about with this is change management and inertia at the early stages of a new, underdefined project. Less code is great, the big question is how such a system compares to the usual hack-and-slash method of getting a v1 up and running as you search for PMF from the perspectives of ops, cost, data migrations, rapid deployments, and so on. Presumably, the idea here is to start from the beginning with Rama, skipping over the usual "monolith fetches from RDBMS" happy paths, even for your basic prototype, this way you don't slip into a situation like Twitter did where that grew slowly into an unscalable monstrosity requiring a rewrite. So an article focused on the "easy" part that's required in the beginning of rapid change, as much as it's not as important as the "simple" part that shines later at scale, seems useful.

[+] nathanmarz|2 years ago|reply

Thanks, this is a good idea for a another post.

The basic operation Rama provides for evolving an application over time is "module update". This lets you update the code for an existing module, including adding new depots, PStates, and topologies.

[+] yayitswei|2 years ago|reply

For context, nathanmarz created what is now Apache Storm, which is used for stream processing at some of the world's largest companies, so he knows a thing or two about scale.

[+] unknown|2 years ago|reply

[deleted]

[+] duped|2 years ago|reply

This is what they've been hyping on Twitter for a week?

FWIW, why hype at all? Why "We'll more in a week. Then more in two weeks." Show the code today!

355 comments