The idea of being able to "rent" a supercomputer at a moment's notice (relative to buying one, at least) is interesting. Makes one wonder: what % of Amazon's capacity is this, and how high up the Top500 would all of it be?
In the article they said the 30K core machine would be #42 in the Top500. Except it wasn't submitted as a candidate.
It would be interesting to have a supercomputer config in one's EC2 portfolio for the quick CFD run or animation sequence.
It is the latter which I find an interesting question. How hard would it be to create an 'on the fly' CGI studio? Something where your artists and their workstations are around 24/7 but when it comes time to render you pop it off to EC2 and just bill the studio for the time. That could be pretty disruptive in that business.
A mere 1000-core cluster should cost a smallish fraction of a programmer's hourly rate. How to use this power to improve your productivity by at least that much? I've been wondering this for a while.
At that scale, CPU and developer time are mostly orthogonal. If you need to crunch 1000x more numbers, you won't be able to program your way out of the problem. If your code is 1000x inefficient, it should be easy to optimize a bit. If you really need 1000 machines for just one hour, setup costs dwarf compute costs. If you need 1000 machines for a year, you will need someone to administer the work and wrangle the data.
> If you created a 30,000-core cluster in a data center, that would cost you $5 million, $10 million, and you’d have to pick a vendor, buy all the hardware, wait for it to come, rack it, stack it, cable it, and actually get it working. You’d have to wait six months, 12 months before you got it running.
If you've got bursty needs for a supercomputer - simulations that need to run for a couple of days, maybe - it's going to be much, much more cost-effective this way.
Nope, $1279/hr * 98% uptime * 24 hours a day * 365 days a year = about $11.0 million a year ($11.2 million at 100%). Now, you can work out the HW split on an actual supercomputer, but my guess is that the EC2 cluster is not that far off on raw compute power but has terrible interconnects and zero resale value.
The real question is why they did not just buy time on an actual supercomputer. My guess is that supercomputers tend to have interconnects that cost a lot of cash and are overkill for many workloads. So Amazon is probably cost competitive for their workloads and, more importantly, more flexible.
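The annual figure above is easy to check. A minimal sketch, assuming the $1279 price is per hour for the whole cluster (as the multiplication implies) and that the 98% factor is simply the fraction of the year the cluster is rented; the function and constant names are made up for illustration:

```python
# Rough annual cost of renting the 30K-core cluster around the clock.
# Assumption: $1279 is the hourly price for the whole cluster.
HOURLY_RATE_USD = 1279
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_cost(hourly_rate, uptime=1.0):
    """Yearly cost in USD when the cluster runs for `uptime` of the year."""
    return hourly_rate * uptime * HOURS_PER_YEAR


full_year = annual_cost(HOURLY_RATE_USD)       # rented 100% of the time
at_98 = annual_cost(HOURLY_RATE_USD, 0.98)     # rented 98% of the time

print(f"100% uptime: ${full_year / 1e6:.1f}M per year")
print(f" 98% uptime: ${at_98 / 1e6:.1f}M per year")
```

Either way the yearly bill lands at or above the top of the $5 million to $10 million purchase range quoted from the article, which is the point being made about sustained (rather than bursty) use.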
$200/kW/month delivered gets you wholesale colo space. That includes the building, the power, the cooling, and the cage.
Modern servers have 12 cores/machine (2 sockets, 6 cores per socket), so 30K cores is 30K/12 or 2500 machines. Assuming you use 2U machines, you get 20 machines per standard 'four post' telco rack, or 125 racks. Power is probably about 8 kW per rack unless you're using a GPU in each machine; we'll assume you aren't, since Amazon doesn't have that option. So 125 racks @ 8 kW each is 1000 kW of power.
Now we can get the monthly cost of keeping those lit up: $200 * 1000 kW, or $200K/month.
There are 720 hours in a 30 day month, so $200K/720 hrs is $278/hr to run your own.
We'll assume your staff costs are nearly the same, since you've got someone to run those Amazon instances and they are going to run these machines. We'll also assume that every single one of them is exactly identical configuration-wise, so your system administration overhead boils down to making sure each machine gets the right host name and IP address.
There are other costs that are built into the Amazon number, but you've got a $1000/hr head start on them in terms of your local costs.
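The back-of-envelope numbers above can be reproduced in a few lines. This is only a sketch of the same estimate: every constant is one of the assumptions stated in this comment (12 cores per machine, 20 machines per rack, 8 kW per rack, $200 per kW per month, a 720-hour month), plus the $1279/hr EC2 price from elsewhere in the thread, and it covers colo space and power only, not the hardware itself.

```python
import math

# All inputs are the thread's own rough assumptions, not measured values.
CORES = 30_000
CORES_PER_MACHINE = 12       # 2 sockets x 6 cores
MACHINES_PER_RACK = 20       # 2U machines in a four-post rack
KW_PER_RACK = 8
USD_PER_KW_MONTH = 200       # wholesale colo: building, power, cooling, cage
HOURS_PER_MONTH = 720        # 30-day month
EC2_USD_PER_HOUR = 1279      # price quoted for the rented cluster

machines = math.ceil(CORES / CORES_PER_MACHINE)
racks = math.ceil(machines / MACHINES_PER_RACK)
power_kw = racks * KW_PER_RACK
monthly_usd = power_kw * USD_PER_KW_MONTH
hourly_usd = monthly_usd / HOURS_PER_MONTH
head_start_usd = EC2_USD_PER_HOUR - hourly_usd

print(f"{machines} machines in {racks} racks drawing {power_kw} kW")
print(f"colo cost: ${monthly_usd:,}/month, about ${hourly_usd:.0f}/hr")
print(f"head start vs EC2: about ${head_start_usd:.0f}/hr")
```

Changing any one input (say, denser racks or a different power price) shows how sensitive the $278/hr figure is to these guesses.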
That being said, the point about going "blam!" and they are there and then "blam!" and they are gone, should not be underestimated. The opportunity cost is large if you don't have a setup like this already.
[+] [-] ConstantineXVI|14 years ago|reply
[+] [-] ChuckMcM|14 years ago|reply
[+] [-] abecedarius|14 years ago|reply
[+] [-] gujk|14 years ago|reply
[+] [-] cosmez|14 years ago|reply
I don't know anything about supercomputers, but is that really cheap?
[+] [-] ceejayoz|14 years ago|reply
[+] [-] Retric|14 years ago|reply
[+] [-] ChuckMcM|14 years ago|reply
[+] [-] nobody314159|14 years ago|reply
[deleted]