item 11722882

AWS X1 instances – 1.9 TB of memory

340 points | spullara | 10 years ago | aws.amazon.com | reply

180 comments

[+] jedbrown|10 years ago|reply
Does anyone have numbers on memory bandwidth and latency?

The x1 cost per GB is about 2/3 that of r3 instances, but you get 4x as many memory channels if you spec the same amount of memory via r3 instances, so the cost per memory channel is more than twice as high for x1 as for r3. DRAM is valuable precisely because of its speed, but the speed itself is not cost-effective with the x1. As such, the x1 is really for applications that can't scale with distributed memory. (Nothing new here, but this point is often overlooked.)

Similarly, you get a lot more SSDs with several r3 instances, so the aggregate disk bandwidth is also more cost-effective with r3.
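The channel arithmetic above works out like this (a quick sketch using the normalized ratios from the comment, not actual AWS prices):

```python
# Relative cost model: r3 normalized to 1, ratios taken from the comment above.
x1_cost_per_gb = 2 / 3   # x1 costs ~2/3 as much per GB as r3
r3_cost_per_gb = 1.0
r3_channel_ratio = 4     # same RAM spread over r3 instances -> 4x the channels

# Cost per memory channel, x1 relative to r3:
x1_cost_per_channel = x1_cost_per_gb * r3_channel_ratio / r3_cost_per_gb
print(round(x1_cost_per_channel, 2))  # 2.67 -> more than twice r3's cost/channel
```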

[+] lovelearning|10 years ago|reply
This is probably a dumb question, but what does the hardware of such a massive machine look like? Is it just a single server box with a single motherboard? Are there server motherboards out there that support 2 TB of RAM, or is this some kind of distributed RAM?
[+] technologia|10 years ago|reply
We have some Supermicros with about 12TB of RAM, but the built-in fans sound like a jumbo jet taking off, so consider the noise pollution for a second there.
[+] ereyes01|10 years ago|reply
Once upon a time I hacked on the AIX kernel which ran on POWER hardware (I think they're up to POWER8 or higher now). In my time there the latest hardware was POWER7-based. It maxed out at 48 cores (with 4-way hyperthreading giving you 192 logical cores) and a max of I think 32TB RAM. Not the same hardware as mentioned in the OP, but pretty big scale nonetheless.

This shows a logical diagram of how they cobble all these cores together: http://www.redbooks.ibm.com/abstracts/tips0972.html?Open

I've seen these both opened up and racked up. They are basically split into at most 4 rackmount systems, each 2U IIRC. The 4 systems (max configuration) are connected together by a big fat cable, which is the interconnect between nodes in the Redbook I've linked above. The RAM was split 4 ways among the nodes, and NUMA really matters in these systems, since memory local to your node is much faster to access than memory across the interconnect.

This is what I observed about 5-6 years ago. I'm sure things have miniaturized further since then...

[+] dekhn|10 years ago|reply
yeah, sure, you can get a quad xeon 2U server with 2TB of RAM for around $40K. Here's a sample configurator: https://www.swt.com/rq2u.php change the RAM and CPUs to your preference and add some flash.
[+] rconti|10 years ago|reply
No insight into what Amazon uses, but we've got HP DL980s (G7s, so they're old) with 4TB of RAM, and just started using Oracle X5-8 x86 boxes with 6TB of RAM across 8 sockets. I believe 144 cores/288 threads.
[+] rodgerd|10 years ago|reply
I can stick 1.5 TB and two sockets in blades right now. Blades. Servers can carry a lot more, and it's not even especially expensive.
[+] zymhan|10 years ago|reply
4 physical CPUs and 1.9TB of RAM is doable in a 4U server for sure, and possibly in a 2U. So, it just looks like a big server.
[+] lossolo|10 years ago|reply
Intel processors support up to 1536 GB of RAM, so basically 1.5 TB per processor.
[+] rzzzt|10 years ago|reply
I think I have picked this up from an earlier thread discussing huge servers: http://yourdatafitsinram.com/

One of the links on the top points to a server with 96 DIMM slots, supporting up to 6 TB of memory in total.
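As a quick sanity check on those two numbers, the implied per-DIMM capacity:

```python
dimm_slots = 96
total_gb = 6 * 1024              # 6 TB expressed in GB
gb_per_dimm = total_gb // dimm_slots
print(gb_per_dimm)  # 64 -> fully populated with 64 GB DIMMs
```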

[+] mbesto|10 years ago|reply
IDK about AWS, but for SAP HANA, this is done via blades. I've seen 10 TB+.
[+] MasterScrat|10 years ago|reply
As a reference the archive of all Reddit comments from October 2007 to May 2015 is around 1 terabyte uncompressed.

You could do exhaustive analysis on that dataset fully in memory.

[+] ChuckMcM|10 years ago|reply
That is pretty remarkable. One of the limitations of doing one's own version of mass analytics is the cost of acquiring, installing, configuring, and then maintaining the hardware. Generally I've found AWS to be more expensive but you get to "turn it on, turn it off" which is not something you can do when you have to pay monthly for data center space.

It makes for an interesting exercise to load in your data, do your analytics, and then store the metadata. I wonder if the oil and gas people are looking at this for pre-processing their seismic data dumps.

[+] 1024core|10 years ago|reply
Spot instances are about $13 - $19/hr, depending on zone. Not available in NorCal, Seoul, Sydney and a couple of other places.
[+] dman|10 years ago|reply
Going to comment out the deallocation bits in all my code now.
[+] pritambarhate|10 years ago|reply
Question for those who have used monster servers before:

Can PostgreSQL/MySQL use this type of hardware efficiently and scale up vertically? Also, can Memcached/Redis use all this RAM effectively?

I am genuinely interested in knowing this. Most of the time I work on small apps and don't have access to anything more than 16GB of RAM on a regular basis.

[+] chucky_z|10 years ago|reply
Postgres scales great up to 256GB, at least with 9.4. After that it'll use it, but there's no real benefit. I don't know about MySQL. SQL Server scales linearly with memory even up to and past the 1TB point. I did encounter some NUMA node-spanning speed issues, but numactl tuning fixed that.

I set up a handful of pgsql and Windows servers around this size. SQL Server at the time scaled better with memory. Pgsql never really got faster after a certain point, but with a lot of cores it handled tons of connections gracefully.

[+] alfalfasprout|10 years ago|reply
I don't work on 2TB+ memory servers, but one of my servers is close to 1TB of RAM.

PostgreSQL scales nicely here. Main thing you're getting is a huge disk cache. Makes repeated queries nice and fast. Still I/O bound to some extent though.

Redis will scale nicely as well. But it won't be I/O bound.

Honestly, if you really need 1TB+ it's usually going to be for numerically intensive code. This kind of code is generally written to be highly vectorizable so the hardware prefetcher will usually mask memory access latency and you get massive speedups by having your entire dataset in memory. Algorithms that can memoize heavily also benefit greatly.

[+] adwf|10 years ago|reply
I've used Postgres out to the terabyte+ range with no probs, so it all works fine. Of course, whenever you approach huge data sizes like this, it tends to change how you access the data a little, e.g. do more threads equal more user connections, or more parallel computation? Generally though, databases aren't really hindered by CPU so much as by the amount of memory in the machine, and this new instance is huge.

No idea about MySQL, people tend to scale that out rather than up.

[+] jfindley|10 years ago|reply
For MySQL, it depends a bit what you're hoping to get out of scaling.

Scaling for performance reasons: Past a certain point, many workloads become difficult to scale due to limitations in the database process scheduler and various internals such as auto increment implementation and locking strategy. As you scale up, it's common to spend increasing percentages of your time sitting on a spinlock, with the result that diminishing returns start to kick in pretty hard.

Scaling for dataset size reasons: Still a bit complex, but generally more successful. For example, to avoid various nasty effects from having to handle IO operations on very large files, you need to start splitting your tables out into multiple files, and the sharding key for that can be hard to get right. But MySQL

In short, it's not impossible, but you need to be very careful with your schema and query design. In practice, this rarely happens because it's usually cheaper (in terms of engineering effort) to scale out rather than up.

[+] vegancap|10 years ago|reply
Finally, an instance made for Java!
[+] granos|10 years ago|reply
I dislike developing in Java. I am not a fanboy by any stretch of the imagination. That being said, someone who takes the time to understand how the JVM works and how to configure their processes with a proper operator's mindset can do amazing things in terms of resource usage.

It's easy to poke at Java for being a hog when in reality it's just poor coding and operating practices that lead to bloated runtime behavior.

[+] sievebrain|10 years ago|reply
You jest, but think about how unbelievably painful it'd be to write a program that uses >1TB of RAM in C++... any bug that causes a segfault, div by zero, or really any kind of crash at all would mean you'd have to reload the entire dataset into RAM from scratch. That's gonna take a while no matter what.

You could work around it by using shared memory regions and the like but then you're doing a lot of extra work.

With a managed language and a bit of care around exception handling, you can write code that's pretty much invincible without much effort because you can't corrupt things arbitrarily.

Also, depending on the dataset in question you might find that things shrink. The latest HotSpots can deduplicate strings in memory as they garbage collect. If your dataset has a lot of repeated strings then you effectively get an interning scheme for free. I don't know if G1 can really work well with over 1TB of heap, though. I've only ever heard of it going up to a few hundred gigabytes.
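The deduplication effect is easy to demonstrate with explicit interning; here is a minimal Python sketch of the same idea (G1's -XX:+UseStringDeduplication does the equivalent automatically, for the backing character arrays, during GC):

```python
import sys

# Two equal strings built at runtime, as you'd get when parsing a dataset
# full of repeated values: same content, distinct objects.
a = "user_" + str(12345)
b = "user_" + str(12345)
assert a == b and a is not b

# Interning collapses them to one shared object, so repeated values
# cost memory only once.
a, b = sys.intern(a), sys.intern(b)
print(a is b)  # True: one copy in memory
```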

[+] scaleout1|10 years ago|reply
When you suddenly realize that your "big" data is not really that big! Who needs a Hadoop/Spark cluster when you can run one of these bad boys?
[+] saosebastiao|10 years ago|reply
All I can think about is the 30 minute garbage collection pauses.
[+] stcredzero|10 years ago|reply
Actually, as far as VMs go, the JVM is fairly spare in comparison with earlier versions of Ruby and Python -- on a per object basis. (Because of its Smalltalk roots. Yes, I had to get that in there. Drink!) That said, I've seen those horrors of cargo-cult imitation of the Gang of Four patterns, resulting in my having to instantiate 7 freaking objects to send one JMS message.

If practice in recent decades has taught us anything, it's that performance is found in intelligently using the cache. In a multi-core concurrent world, our tools should be biased towards pass by value, allocation on the stack/avoiding allocating on the heap, and avoiding chasing pointers and branching just to facilitate code organization.

EDIT: Or, as placybordeaux puts it more succinctly in a nephew comment, "VM or culture? It's the culture."

EDIT: It just occurred to me -- Programming suffers from a worship of Context-Free "Clever"!

Whether or not a particular pattern or decision is smart is highly dependent on context. (In the general sense, not the function call one.) The difficulty with programming, is that often context is very involved and hard to convey in media. As a result, a whole lot of arguments are made for or against patterns/paradigms/languages using largely context free examples.

This is why we end up in so many meaningless arguments akin to, "What is the ultimate bladed weapon?" That's simply a meaningless question, because the effectiveness of such items is very highly dependent on context. (Look up Matt Easton on YouTube.)

The analogy works in terms of the degree of fanboi nonsense.

[+] aaronkrolik|10 years ago|reply
A small word of caution: I'd strongly recommend against using a huge Java heap size. Java GC is stop-the-world, and a huge heap can lead to hour-long GC pauses. It's much better to store data in a memory-mapped file off-heap, and access it accordingly. Still very fast.
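A minimal sketch of the off-heap, memory-mapped approach (shown in Python for brevity; the file path and flat array-of-doubles layout are made up for illustration):

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a flat file of little-endian doubles, 8 bytes each.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
n = 100_000
with open(path, "wb") as f:
    f.write(struct.pack(f"<{n}d", *range(n)))

with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # Pages are faulted in on access and live in the OS page cache, not the
    # process heap, so the GC never scans them and a crash-and-restart
    # doesn't have to rebuild the dataset.
    value = struct.unpack_from("<d", mm, 42 * 8)[0]
print(value)  # 42.0
```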
[+] tracker1|10 years ago|reply
I know that you are probably going to be modded into oblivion, but can Java address this much memory in a single application? I'm genuinely curious; I would assume, depending on the OS, that you'd have to run several (many) processes in order to even address that much RAM effectively.

Still really cool to see something like this; I didn't even know you could get close to 2TB of RAM in a single server at any kind of scale.

[+] krschultz|10 years ago|reply
A bit under $35,000 for the year.
[+] Erwin|10 years ago|reply
I'm curious about this AWS feature mentioned: https://aws.amazon.com/blogs/aws/new-auto-recovery-for-amazo...

We've experimented with something similar on Google Cloud, where an instance that is considered dead has its IP address and persistent disks taken away, then attached to another (live or freshly created) instance. It's hard to say whether this can recover from all failures without having experienced them, or even whether it works better than what Google claims it already does (migrating failing servers from hardware to hardware). Anyone with practical experience in this type of recovery where you don't duplicate your resource requirements?

[+] zbjornson|10 years ago|reply
How does this thing still have only 10 GigE (plus 10 dedicated to EBS)? It should have multiple 10 Gig NICs that could get it way beyond that.
[+] jayhuang|10 years ago|reply
Funny how the title made me instantly think: SAP HANA. After not seeing it for the first 5 paragraphs or so, Ctrl+F, ah yes.

Not too surprising given how close SAP and Amazon AWS have been ever since SAP started offering cloud solutions. Going back a couple of years to when SAP HANA was still in its infancy and they were trying it on servers with 20~100+ TB of memory, this seems like an obvious progression.

Of course there's always the barrier of AWS pricing.

[+] amazon_not|10 years ago|reply
The pricing is, surprisingly enough, not terrible. Given that dedicated servers cost $1-1.50 per GB of RAM per month, the three-year price is actually almost reasonable.

That being said, a three year commitment is still hard to swallow compared to dedicated servers that are month-to-month.

[+] bdcravens|10 years ago|reply
It's not just the commitment, but the fact that you have to cough up $52,000-$98,000 up front.
[+] manav|10 years ago|reply
Hmm, around $4/hr after a partial upfront. I'm guessing that upfront is going to be just about the cost of a server, which is around $50k.
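Amortizing the ~$98K three-year all-upfront figure quoted elsewhere in the thread over its term lands roughly at that rate:

```python
upfront = 98_000         # ~3-year all-upfront price quoted in the thread
hours = 3 * 365 * 24     # 26,280 hours in three years
rate = upfront / hours
print(f"${rate:.2f}/hr")  # $3.73/hr, in the ballpark of the ~$4/hr quoted
```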
[+] micro-ram|10 years ago|reply
What happened to the other 16 threads?

18(core) * 4(cpus) * 2(+ht) = 144

[+] ben_jones|10 years ago|reply
I'd feel guilty if I ever used something like this and underutilized the RAM.

"Ben we're not utilizing all the ram."

"Add another for loop."

[+] mrmondo|10 years ago|reply
I take it this is so people can run NodeJS or MSSQL on AWS now? Heh, sorry for the jab - what could this be used for, considering that AWS's top-tier provisioned storage IOP/s are still so low (and expensive)?

Something volatile running on a RAM disk, maybe?

[+] samstave|10 years ago|reply
~$4/hour???

That's amazing.

[+] bdcravens|10 years ago|reply
That's the reserved pricing. You have to pay $52,000+ up front to get that price. Standard on-demand is over $13/hr.
[+] tobz|10 years ago|reply
Only if you reserve them for three years, which carries an upfront cost of ~$98K.