I'm a little surprised so many machines are needed to run Instagram. TechCrunch mentioned their peak has been 50 photo uploads per second (which they say go directly to S3, so Instagram's servers only need to hand out a token). Of course there are other kinds of requests, but a back-of-the-envelope estimate suggests it shouldn't require anywhere near "hundreds" of machines.
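A minimal sketch of that back-of-the-envelope math, assuming (hypothetically) that handing out an S3 upload token costs about 10 ms of server CPU time per request:

```python
# Back-of-the-envelope check on upload traffic. The 10 ms cost per
# token is an assumption, not a measured figure; the photo bytes
# themselves go straight to S3 and never touch Instagram's servers.
peak_uploads_per_sec = 50      # figure quoted by TechCrunch
cpu_ms_per_upload = 10         # assumed cost of issuing an S3 token

utilization = peak_uploads_per_sec * cpu_ms_per_upload / 1000.0
print(f"CPU utilization from uploads alone: {utilization:.0%}")  # 50%
```

Under those assumptions, peak upload traffic would occupy about half of a single core, which is why the "hundreds of machines" figure seems surprising at first glance.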
Not to be too harsh - it's just three engineers, so it makes sense if the setup is still evolving.
I was surprised they had so few... I once worked on a site with 1/6th the users and 3.5 times the number of instances.
They could do better but they'd have to manage their own datacenter and write portions of the app in C++. It's probably not worth it at this point unless they hire someone with that specific expertise.
I like posts like this a lot. I'm just a web designer, but I find scaling web sites fascinating, like some kind of dark art or secret craft.
Where do you learn this stuff? Do you need a CS degree from Stanford or something? I like the black-magic aura, it's romantic, but I'd really like to understand how to scale websites the way the OP describes.
While scaling web sites is fascinating, for most people it is also unnecessary. The vast majority of web sites run just fine on one computer. Hacker News, for example, runs not just on one computer but on a single core. So with a single 8-core box with ~100 GB of RAM you can get quite far and save yourself a big hassle.
I don't get the impression that one would need a degree to devise the scaling strategies they're employing. This would seem more the product of battle-hardened experience rather than a formal education.
I know that this is probably a recruiting-inspired post, but detailed posts like this genuinely benefit the community. Thanks for specifically mentioning the reasons for choosing particular technologies (e.g. why you switched to Gunicorn from mod_wsgi) -- this makes the already excellent post even more helpful for someone trying to build things.
I guess my question is, how do they make money? I really like Instagram images. I've used the site myself, but it certainly isn't something I'd feel the need to pay money for.
It won't be difficult to make money if they don't try to be too clever. They can display ads on their website, just as Twitpic does, and they can impose storage limits.
Those Quadruple Extra Large instances are $2/hr. The 24 of them used for Postgres would run about $35k/month for that part alone. I'm guessing they're spending over $100k/month just hosting 100+ instances, not to mention disk, bandwidth, DNS, S3, public IPs, etc.
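For reference, the instance math above works out as follows (on-demand rates only, ignoring any reserved-instance discounts they may have negotiated):

```python
# Rough EC2 cost estimate for the Postgres fleet, using the
# on-demand rate quoted above.
hourly_rate = 2.00        # $/hr for a Quadruple Extra Large instance
instances = 24
hours_per_month = 730     # average hours in a month

monthly_cost = hourly_rate * instances * hours_per_month
print(f"${monthly_cost:,.0f}/month")  # $35,040/month
```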
At ~$35k/mo (though they may have a deal here), that's the fully loaded headcount of 2-3 FTE devops people. In return, they get EC2's turnaround time on new instances. Not to mention that they're constantly pushing images to S3.
I would agree that EC2 isn't a no-brainer decision here, but it seems like a reasonable one.
One thing about Instagram's load balancing that I don't like is that they rate-limit their proxies on image requests. In my recent testing, it's roughly 5-6 requests every 3 seconds; any requests more frequent than that return 503 status codes. I don't entirely understand why they do this, since their load balancer simply does 302 redirects to the S3-hosted image resource.
I can guess at some of the reasons, such as they didn't foresee a user loading more than a few images at once. Perhaps they perceive rate limiting as a protective measure.
However, I've done testing on Twitpic, imgur, and yfrog and haven't run into the same issues. Twitpic, for example, generates a lot more traffic than Instagram and they don't have the same rate-limiting.
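For illustration, the behavior described above (roughly 5-6 requests allowed per 3-second window, 503 otherwise) could be reproduced with a simple sliding-window limiter. The actual mechanism and exact numbers on Instagram's proxies are a guess:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sketch of the observed rate limiting: allow at most `limit`
    requests per `window` seconds, and reject the rest (a 503 in
    Instagram's case). The parameters are guesses from testing."""

    def __init__(self, limit=6, window=3.0):
        self.limit = limit
        self.window = window
        self.hits = deque()  # timestamps of recent allowed requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True   # serve the 302 redirect to S3
        return False      # return 503
```

With the default parameters, a burst of 8 simultaneous requests would see the first 6 allowed and the last 2 rejected, matching the behavior reported above.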
> I don't entirely understand why they do this, since their load balancer simply does 302 redirects to the S3-hosted image resource.
S3 accesses cost money, so it makes sense that they'd rate limit access to them. A botnet hitting an S3 URL could incur large fees for the owner of the file very rapidly.
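A rough sketch of that exposure, using assumed prices (actual S3 request and transfer rates vary by region and volume, and the ~500 KB average photo size is a guess):

```python
# Rough estimate of what a botnet hammering one S3 URL could cost
# the file's owner. All rates below are assumptions for illustration.
requests = 10_000_000          # 10M GETs from a botnet
image_mb = 0.5                 # assumed average photo size (MB)
price_per_10k_gets = 0.01      # assumed $ per 10,000 GET requests
price_per_gb_out = 0.12        # assumed $ per GB transferred out

transfer_gb = requests * image_mb / 1024
cost = requests / 10_000 * price_per_10k_gets + transfer_gb * price_per_gb_out
print(f"~${cost:,.0f}")  # ~$596
```

Under these assumptions the request charges are trivial; nearly all of the cost comes from outbound bandwidth, which is presumably what the rate limit is guarding against.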
With that big a monthly AWS bill, I could pretty easily justify my salary and the costs of building out a 4 - 10 rack colo setup. With room leftover for a dba consultant on retainer and a pro-serv budget for ad-hoc stuff.
By my math, the bill for their app and database servers would be approaching $30,000 per month. That doesn't include storage costs, bandwidth, or any of the other aspects of their infrastructure.
Server bill: $35k/month, $420k/year, per estimates in other comments.
Personnel, overhead, other expenses: $1.5M/year (guess).
Runway: 3.9 years to figure it out.
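Spelling out that runway arithmetic (the $1.5M personnel figure and the implied funding amount are guesses taken from the comment itself):

```python
# Burn-rate and runway math from the estimate above.
servers_per_year = 35_000 * 12   # $420k/yr server bill
other_per_year = 1_500_000       # guessed personnel/overhead
burn = servers_per_year + other_per_year

funding = 7_500_000              # hypothetical, implied by the 3.9-year runway
print(f"runway: {funding / burn:.1f} years")  # runway: 3.9 years
```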
They could run their operation for 10-20% of their AWS costs at a dedicated server host. And everything would be much, much faster.
http://stackparts.com/
http://news.ycombinator.com/item?id=2993371