What Powers Instagram: Hundreds of Instances, Dozens of Technologies

237 points| systrom | 14 years ago |instagram-engineering.tumblr.com

41 comments

[+] jphackworth|14 years ago|reply
I'm a little surprised so many machines are used to run Instagram. TechCrunch mentioned their peak has been 50 photo uploads per second (which they say go directly to S3, so Instagram's servers only need to pass a token). Of course there are other forms of requests, but just back of the envelope it seems like it should not require anywhere near "hundreds" of machines.

Not to be too harsh - it's just three engineers, so it makes sense if the setup is still evolving.

[+] rdouble|14 years ago|reply
I was surprised they had so few... I once worked on a site with 1/6th the users and 3.5 times the number of instances.

They could do better but they'd have to manage their own datacenter and write portions of the app in C++. It's probably not worth it at this point unless they hire someone with that specific expertise.

[+] armandososa|14 years ago|reply
I like posts like this a lot. I'm just a web designer, but I find scaling web sites fascinating, like some kind of dark art or secret craft.

Where do you learn this stuff? Do you need a CS degree from Stanford or something? I like the black magic aura, it's romantic, but I'd really like to understand how to scale websites doing stuff like the OP describes.

[+] jules|14 years ago|reply
While scaling web sites is fascinating, for most people it is also unnecessary. The vast majority of web sites run just fine on 1 computer. For example, Hacker News runs not just on one computer, but actually on 1 core. So with a single 8-core box with ~100GB RAM you can get quite far and save yourself a big hassle.
[+] tkahn6|14 years ago|reply
I don't get the impression that one would need a degree to devise the scaling strategies they're employing. This would seem more the product of battle-hardened experience rather than a formal education.
[+] d_r|14 years ago|reply
I know that this is probably a recruiting-inspired post, but detailed posts like this genuinely benefit the community. Thanks for specifically mentioning the reasons for choosing particular technologies (e.g. why you switched to Gunicorn from mod_wsgi) -- this makes the already excellent post even more helpful for someone trying to build things.
[+] latchkey|14 years ago|reply
I guess my question is, how do they make money? I really like instagram images. I've used the site myself, but it certainly isn't something I'd feel the need to pay money for.
[+] gallerytungsten|14 years ago|reply
Funding total: $7.5M (per TechCrunch)

Server bill: $35k/month, $420k/year, per estimates in other comments.

Personnel, overhead, other expenses: $1.5M/year (guess).

Runway: 3.9 years to figure it out.
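
The runway arithmetic above can be checked with a short sketch. All inputs are the figures quoted in the comment; the $1.5M/year overhead is the commenter's own guess, and the variable names are only illustrative.

```python
# Figures quoted in the comment; the overhead number is the commenter's guess.
funding_usd = 7_500_000
server_bill_per_month_usd = 35_000
other_costs_per_year_usd = 1_500_000

# Annualize the server bill, then divide funding by total yearly spend.
server_bill_per_year_usd = server_bill_per_month_usd * 12
annual_burn_usd = server_bill_per_year_usd + other_costs_per_year_usd
runway_years = funding_usd / annual_burn_usd

print(f"Server bill: ${server_bill_per_year_usd:,}/year")
print(f"Runway: {runway_years:.1f} years")
```

This assumes flat costs and no revenue, which is the same simplification the comment makes.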

[+] ell|14 years ago|reply
It won't be difficult to make money if they don't try to be clever. They can display ads on their website, just like Twitpic. They can have a storage limit.
[+] latchkey|14 years ago|reply
Those Quadruple Extra Large instances are $2/hr. The 24 of them used for Postgres would be like $35k/month just for that part alone. I'm guessing they are spending >$100k/month on just hosting 100+ instances. Not to mention disk, bandwidth, DNS, S3, public IPs, etc.
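
As a rough check of the $35k/month figure, here is a minimal sketch using the comment's own numbers (24 instances at $2/hr, on-demand). The 30-day month and the helper name `monthly_cost_usd` are assumptions for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_cost_usd(instance_count, hourly_rate_usd):
    """On-demand cost of running `instance_count` instances all month."""
    return instance_count * hourly_rate_usd * HOURS_PER_MONTH


# 24 database instances at the quoted $2/hr rate.
postgres_monthly_usd = monthly_cost_usd(24, 2.00)
print(f"Postgres instances: ${postgres_monthly_usd:,.0f}/month")
```

This covers instance-hours only; storage, bandwidth, and any negotiated or reserved pricing are left out.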
[+] tptacek|14 years ago|reply
At ~35k/mo (they may have a deal here, though), that's the fully loaded headcount of 2-3 FTE devops people. In return, they get EC2's turnaround time on new instances. Not to mention that they're constantly pushing images to S3.

I would agree that EC2 isn't a no-brainer decision here, but it seems like a reasonable one.

[+] foobarbazetc|14 years ago|reply
Every time I see numbers like this, I wonder why everyone seems to think you have to use AWS or else you've failed at scaling.

They could run their operation for 10-20% of their AWS costs at a dedicated server host. And everything would be much, much faster.

[+] rkalla|14 years ago|reply
Instagram isn't paying on-demand prices; 3-year reserved is 48% cheaper than on-demand.
[+] geuis|14 years ago|reply
One thing about Instagram's load balancing that I don't like is that they rate-limit their proxies on image requests. In my recent testing, it's roughly 5-6 requests every 3 seconds or so. Any requests more frequent than that return 503 status codes. I don't entirely understand why they do this, since their load balancer simply does 302 redirects to the S3-hosted image resource.

I can guess at some of the reasons, such as they didn't foresee a user loading more than a few images at once. Perhaps they perceive rate limiting as a protective measure.

However, I've done testing on Twitpic, imgur, and yfrog and haven't run into the same issues. Twitpic, for example, generates a lot more traffic than Instagram and they don't have the same rate-limiting.
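
For readers unfamiliar with the behavior described above, here is a minimal sliding-window rate limiter sketch. It is not Instagram's implementation, which the thread does not describe; the class name, the limit of 6 requests per 3 seconds (taken from the observed numbers above), and the per-client bookkeeping are all assumptions.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests=6, window_seconds=3.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of accepted requests, oldest first

    def status_for(self, now):
        # Drop accepted requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return 302  # redirect to the stored image
        return 503  # over the limit, reject


limiter = SlidingWindowLimiter()
# Eight requests 0.1 s apart: the first six pass, the rest are rejected.
burst = [limiter.status_for(i * 0.1) for i in range(8)]
print(burst)
# Once the window has passed, requests are accepted again.
later = limiter.status_for(3.5)
print(later)
```

Timestamps are passed in explicitly so the behavior is easy to reason about; a real proxy would use the current clock and track one window per client.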

[+] ceejayoz|14 years ago|reply
> I don't entirely understand why they do this, since their load balancer simply does 302 redirects to the S3-hosted image resource.

S3 accesses cost money, so it makes sense that they'd rate limit access to them. A botnet hitting an S3 URL could incur large fees for the owner of the file very rapidly.

[+] mkjones|14 years ago|reply
Glad to see other people using vmtouch. It's also great for keeping large codebases in the filesystem cache on [shared] dev machines.
[+] cagenut|14 years ago|reply
With that big a monthly AWS bill, I could pretty easily justify my salary and the costs of building out a 4 - 10 rack colo setup. With room leftover for a dba consultant on retainer and a pro-serv budget for ad-hoc stuff.
[+] sant0sk1|14 years ago|reply
That's a lot of instances! It'd be interesting to run the numbers and get an idea of what their monthly AWS bill looks like.
[+] clarkni5|14 years ago|reply
By my math, the bill for their app and database servers would be approaching $30,000 per month. That doesn't include storage costs, bandwidth, or any of the other aspects of their infrastructure.

That's crazy, if you ask me.

[+] mcginleyr1|14 years ago|reply
For their load balancers, why aren't they assigning Elastic IPs? Then they wouldn't have to wait for DNS, just reassign the IP...
[+] vidar|14 years ago|reply
What was your take on Gunicorn over uWSGI?