Does anyone have numbers on memory bandwidth and latency?
The x1 cost per GB is about 2/3 that of r3 instances, but you get 4x as many memory channels if you spec the same amount of memory via r3 instances, so the cost per memory channel is more than twice as high for x1 as for r3. DRAM is valuable precisely because of its speed, but that speed is not cost-effective with the x1. As such, the x1 is really for applications that can't scale with distributed memory. (Nothing new here, but this point is often overlooked.)
Similarly, you get a lot more SSDs with several r3 instances, so the aggregate disk bandwidth is also more cost-effective with r3.
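To make the arithmetic concrete, here's a back-of-the-envelope sketch in Python. The prices and channel counts are illustrative assumptions (not quoted from AWS), chosen only to match the ratios described above:

```python
# Illustrative only: prices and channel counts are assumptions, not AWS's
# published figures. The point is the shape of the comparison, not the numbers.
X1_RAM_GB = 1952          # x1.32xlarge memory
R3_RAM_GB = 244           # r3.8xlarge memory
X1_CHANNELS = 16          # assumed: 4 sockets x 4 channels
R3_CHANNELS = 8           # assumed: 2 sockets x 4 channels

X1_HOURLY = 13.338        # assumed on-demand $/hr, for illustration
R3_HOURLY = 2.66          # assumed on-demand $/hr, for illustration

# Cost per GB: x1 comes out cheaper per GB...
x1_per_gb = X1_HOURLY / X1_RAM_GB
r3_per_gb = R3_HOURLY / R3_RAM_GB

# ...but matching 1952 GB with r3 takes 8 instances, giving 8 x 8 = 64
# channels versus the x1's 16, so cost per channel flips the other way.
n_r3 = X1_RAM_GB / R3_RAM_GB      # = 8
x1_per_channel = X1_HOURLY / X1_CHANNELS
r3_per_channel = (n_r3 * R3_HOURLY) / (n_r3 * R3_CHANNELS)

print(f"per GB:      x1 ${x1_per_gb:.4f}  r3 ${r3_per_gb:.4f}")
print(f"per channel: x1 ${x1_per_channel:.3f}  r3 ${r3_per_channel:.3f}")
```

With these assumed figures, x1 is about 2/3 the cost per GB but roughly 2.5x the cost per channel, matching the ratios claimed above.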
Not sure I quite understand your math here. The largest R3 instance is the r3.8xlarge with 244 GB of memory. 4 times that would only get you to 1 TB. Also, "DRAM is valuable precisely because of its speed" is wrong (https://en.wikipedia.org/wiki/Dynamic_random-access_memory).
This is probably a dumb question, but what does the hardware of such a massive machine look like? Is it just a single server box with a single motherboard? Are there server motherboards out there that support 2 TB of RAM, or is this some kind of distributed RAM?
For example, Dell sells 4U servers straight out of their webshop which max out at 96x32GB (that's 3TB) of RAM with 4 CPUs (max 18 cores/CPU => 72 cores total). They seem to have some (training?) videos on YouTube that show the internals, if you are curious.
We have some Supermicros that have about 12TB of RAM, but the built-in fans sound like a jumbo jet taking off, so consider the noise pollution for a second there.
Once upon a time I hacked on the AIX kernel, which ran on POWER hardware (I think they're up to POWER8 or higher now). In my time there the latest hardware was POWER7-based. It maxed out at 48 cores (with 4-way hyperthreading giving you 192 logical cores) and, I think, a max of 32TB of RAM. Not the same hardware as mentioned in the OP, but pretty big scale nonetheless.
I've seen these both opened up and racked up. They are basically split into at most 4 rackmount systems, each 2U, IIRC. The 4 systems (in the max configuration) are connected together by a big fat cable, which is the interconnect between nodes in the Redbook linked elsewhere in this thread. The RAM is split 4 ways among the nodes, and NUMA really matters in these systems, since memory local to your node is much faster to access than memory across the interconnect.
This is what I observed about 5-6 years ago. I'm sure things have miniaturized further since then...
Yeah, sure, you can get a quad-Xeon 2U server with 2TB of RAM for around $40K. Here's a sample configurator:
https://www.swt.com/rq2u.php
Change the RAM and CPUs to your preference and add some flash.
No insight into what Amazon uses, but we've got HP DL980s (G7s, so they're old) with 4TB of RAM, and we just started using Oracle X5-8 x86 boxes with 6TB of RAM and 8 sockets. I believe 144 cores/288 threads.
That is pretty remarkable. One of the limitations of doing one's own version of mass analytics is the cost of acquiring, installing, configuring, and then maintaining the hardware. Generally I've found AWS to be more expensive but you get to "turn it on, turn it off" which is not something you can do when you have to pay monthly for data center space.
It makes for an interesting exercise to load in your data, do your analytics, and then store out the metadata. I wonder if the oil and gas people are looking at this for pre-processing their seismic data dumps.
Question for those who have used monster servers before:
Can PostgreSQL/MySQL use this type of hardware efficiently and scale up vertically? Also, can Memcached/Redis use all this RAM effectively?
I am genuinely interested in knowing this. Most of the time I work on small apps and don't have access to anything more than 16GB of RAM on a regular basis.
Postgres scales great up to 256GB, at least with 9.4. After that it'll use it, but there's no real benefit. I don't know about MySQL. SQL Server scales linearly with memory, even up to and past the 1TB point. I did encounter some NUMA node-spanning speed issues, but numactl tuning fixed that.
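For what it's worth, tuning on a large-memory Postgres box tends to concentrate on a handful of settings. A postgresql.conf sketch (illustrative values only, not a recommendation):

```
# postgresql.conf sketch for a large-memory box -- illustrative values only
shared_buffers = 64GB            # Postgres's own cache; beyond tens of GB,
                                 # returns diminish, as noted above
effective_cache_size = 1500GB    # tells the planner the OS page cache is huge
maintenance_work_mem = 16GB      # speeds up index builds and vacuums
huge_pages = try                 # reduces TLB pressure with big shared_buffers
```

The surprise for many people is that most of the RAM ends up doing its work as OS page cache rather than inside shared_buffers.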
I set up a handful of pgsql and Windows servers around this size. SQL Server at the time scaled better with memory. Pgsql never really got faster after a certain point, but with a lot of cores it handled tons of connections gracefully.
I don't work on 2TB+ memory servers, but one of my servers is close to 1TB of RAM.
PostgreSQL scales nicely here. Main thing you're getting is a huge disk cache. Makes repeated queries nice and fast. Still I/O bound to some extent though.
Redis will scale nicely as well. But it won't be I/O bound.
Honestly, if you really need 1TB+ it's usually going to be for numerically intensive code. This kind of code is generally written to be highly vectorizable so the hardware prefetcher will usually mask memory access latency and you get massive speedups by having your entire dataset in memory. Algorithms that can memoize heavily also benefit greatly.
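The memoization point is easy to illustrate: with this much RAM you can simply let a cache grow without bound. A toy Python sketch (the recurrence is hypothetical, just to show the pattern):

```python
from functools import lru_cache

# With 2 TB of RAM, a cache like this can be allowed to grow enormous:
# maxsize=None trades memory for recomputation, which is exactly the
# trade these instances are built for. Toy recurrence, not a real workload.
@lru_cache(maxsize=None)
def ways(n: int) -> int:
    """Count the ways to climb n steps taking 1 or 2 at a time."""
    if n <= 1:
        return 1
    return ways(n - 1) + ways(n - 2)

# Effectively instant with memoization; exponential time without it.
print(ways(200))
```

The same trade shows up in real workloads as giant lookup tables, pre-joined datasets, or materialized intermediate results kept resident instead of recomputed.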
I've used Postgres out to the terabyte+ range with no probs, so it all works fine. Of course, whenever you approach huge data sizes like this, it tends to change how you access the data a little. E.g., do more threads mean more user connections, or more parallel computation? Generally, though, databases aren't really hindered by CPU so much as by the amount of memory in the machine, and this new instance is huge.
No idea about MySQL, people tend to scale that out rather than up.
For MySQL, it depends a bit what you're hoping to get out of scaling.
Scaling for performance reasons: Past a certain point, many workloads become difficult to scale due to limitations in the database process scheduler and various internals such as auto increment implementation and locking strategy. As you scale up, it's common to spend increasing percentages of your time sitting on a spinlock, with the result that diminishing returns start to kick in pretty hard.
Scaling for dataset size reasons: Still a bit complex, but generally more successful. For example, to avoid various nasty effects from having to handle IO operations on very large files, you need to start splitting your tables out into multiple files, and the sharding key for that can be hard to get right. But MySQL
In short, it's not impossible, but you need to be very careful with your schema and query design. In practice, this rarely happens because it's usually cheaper (in terms of engineering effort) to scale out rather than up.
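As a sketch of the sharding-key problem mentioned above, here's a minimal hash-based router in Python (table names and shard count are hypothetical):

```python
import hashlib

# Hypothetical sketch of key-based routing across table shards. The hard
# part alluded to above is choosing the shard key so that rows queried
# together land together, and so that no shard gets disproportionately hot.
N_SHARDS = 16

def shard_for(shard_key: str) -> str:
    """Map a key to a shard table name using a stable hash (not Python's
    built-in hash(), which is randomized per process)."""
    h = int(hashlib.md5(shard_key.encode()).hexdigest(), 16)
    return f"orders_{h % N_SHARDS:02d}"

# Sharding by customer keeps one customer's orders co-located in one table;
# sharding by order id would scatter them across all 16 tables instead.
print(shard_for("customer:1234"))
```

Picking the wrong key here is exactly the kind of mistake that's cheap to make and expensive to undo, which is part of why scaling out often wins on engineering effort.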
I dislike developing in Java. I am not a fanboy by any stretch of the imagination. That being said, someone who takes the time to understand how the JVM works and how to configure their processes with a proper operator's mindset can do amazing things in terms of resource usage.
It's easy to poke at Java for being a hog when in reality it's just poor coding and operating practices that lead to bloated runtime behavior.
You jest, but think about how unbelievably painful it'd be to write a program that uses >1TB of RAM in C++: any bug that causes a segfault, div-by-zero, or really any kind of crash at all would mean you'd have to reload the entire dataset into RAM from scratch. That's gonna take a while no matter what.
You could work around it by using shared memory regions and the like but then you're doing a lot of extra work.
With a managed language and a bit of care around exception handling, you can write code that's pretty much invincible without much effort because you can't corrupt things arbitrarily.
Also, depending on the dataset in question you might find that things shrink. The latest HotSpots can deduplicate strings in memory as they garbage collect. If your dataset has a lot of repeated strings then you effectively get an interning scheme for free. I don't know if G1 can really work well with over 1TB of heap, though. I've only ever heard of it going up to a few hundred gigabytes.
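The mapped-file workaround mentioned a few comments up can be sketched in a few lines of Python: a crashed worker re-maps the file instead of rebuilding the dataset. Toy sizes and record layout, purely illustrative:

```python
import mmap
import os
import struct
import tempfile

# Sketch of the shared-memory / mapped-file workaround: keep the big dataset
# in a file-backed mapping so a restarted process can re-map it instead of
# rebuilding 1 TB from scratch. Toy sizes here.
RECORD = struct.Struct("<qd")            # (int64 id, float64 value)
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")

# "Loader" process: populate the backing file once.
with open(path, "wb") as f:
    for i in range(1000):
        f.write(RECORD.pack(i, i * 0.5))

# "Worker" process (possibly restarted after a crash): map, don't reload.
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    off = 42 * RECORD.size               # random access by record index
    rec_id, value = RECORD.unpack_from(mm, off)
    print(rec_id, value)
    mm.close()
```

The OS page cache keeps hot pages resident, so after the first touch this is close to plain in-memory access, while surviving any crash of the worker process.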
Actually, as far as VMs go, the JVM is fairly spare on a per-object basis in comparison with earlier versions of Ruby and Python. (Because of its Smalltalk roots. Yes, I had to get that in there. Drink!) That said, I've seen those horrors of cargo-cult imitation of the Gang of Four patterns, resulting in my having to instantiate 7 freaking objects to send one JMS message.
If practice in recent decades has taught us anything, it's that performance is found in intelligently using the cache. In a multi-core concurrent world, our tools should be biased towards pass by value, allocation on the stack/avoiding allocating on the heap, and avoiding chasing pointers and branching just to facilitate code organization.
EDIT: Or, as placybordeaux puts it more succinctly in a nephew comment, "VM or culture? It's the culture."
EDIT: It just occurred to me: programming suffers from a worship of context-free "Clever"!
Whether or not a particular pattern or decision is smart is highly dependent on context. (In the general sense, not the function call one.) The difficulty with programming is that the context is often very involved and hard to convey in media. As a result, a whole lot of arguments are made for or against patterns/paradigms/languages using largely context-free examples.
This is why we end up in so many meaningless arguments akin to, "What is the ultimate bladed weapon?" That's simply a meaningless question, because the effectiveness of such items is very highly dependent on context. (Look up Matt Easton on YouTube.)
The analogy works in terms of the degree of fanboi nonsense.
A small word of caution: I'd strongly recommend against using a huge Java heap size. Java GC is stop-the-world, and a huge heap can lead to hour-long GC sessions. It's much better to store data off-heap in a memory-mapped file and access it accordingly. Still very fast.
I know that you are probably going to be modded into oblivion, but can Java address this much memory in a single application? I'm genuinely curious, as I would assume, depending on the OS, that you'd have to run several (many) processes in order to even address that much RAM effectively.
Still really cool to see something like this. I didn't even know you could get close to 2TB of RAM in a single server at any kind of scale.
We've experimented with something similar on Google Cloud, where an instance that is considered dead has its IP address and persistent disks taken away, then attached to another (live or just-created) instance. It's hard to say whether this can recover from all failures, however, without having experienced them, or whether it works better than what Google claims it already does (moving workloads off failing hardware). Anyone with practical experience in this type of recovery, where you don't duplicate your resource requirements?
Funny how the title made me instantly think: SAP HANA. After not seeing it for the first 5 paragraphs or so, Ctrl+F, ah yes.
Not too surprising given how close SAP and Amazon AWS have been ever since SAP started offering cloud solutions. Going back a couple of years, when SAP HANA was still in its infancy, they were trying it on servers with 20~100+ TB of memory, so this seems like an obvious progression.
Of course there's always the barrier of AWS pricing.
The pricing is, surprisingly enough, not terrible. Given that dedicated servers cost $1-1.5 per GB of RAM per month, the three-year price is actually almost reasonable.
That being said, a three year commitment is still hard to swallow compared to dedicated servers that are month-to-month.
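A quick sanity check of that comparison, using only the $1-1.5/GB/month dedicated-server figure quoted above (the AWS side is left out, since its price isn't given here):

```python
# Rough sanity check using the $1-1.5/GB/month dedicated-server figure
# quoted above. Only the dedicated side is computed; the X1's own price
# isn't stated in this thread.
ram_gb = 2048                      # ~2 TB, matching the instance class
low, high = 1.0, 1.5               # $/GB of RAM per month (quoted above)

monthly = (ram_gb * low, ram_gb * high)
three_year = (monthly[0] * 36, monthly[1] * 36)

print(f"dedicated, monthly: ${monthly[0]:,.0f} - ${monthly[1]:,.0f}")
print(f"dedicated, 3 years: ${three_year[0]:,.0f} - ${three_year[1]:,.0f}")
```

So a 2TB dedicated box runs very roughly $74K-$111K over three years at that rate, which is the ballpark the three-year reservation has to beat, minus the flexibility of month-to-month terms.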
I'm taking it this is so people can run NodeJS or MSSQL on AWS now? Heh, sorry for the jab. What could this be used for, considering that AWS's top-tier provisioned-storage IOPS are still so low (and expensive)?
The Dell videos mentioned earlier in the thread:
https://www.youtube.com/watch?v=vS47RVrfBvE (main system board)
https://www.youtube.com/watch?v=_poMPOUGRa0 (memory risers)
Edit: Supermicro has several 2TB boards, and even some 3TB ones: http://www.supermicro.com/products/motherboard/Xeon1333/#201...
(Disclaimer: AWS employee, no relation to EC2)
Sure, http://www.supermicro.com/products/motherboard/Xeon/C600/X10... supports 3TB in a 48 x 64GB DIMM configuration.
This shows a logical diagram of how they cobble all these cores together: http://www.redbooks.ibm.com/abstracts/tips0972.html?Open
4 CPUs, 60 cores, 120 threads (cloud cores), 3TB RAM, 90TB SSD, 4 x 40Gb Ethernet, 4RU. $120K.
Same price as the AWS instance for one year of on-demand.
One of the links at the top points to a server with 96 DIMM slots, supporting up to 6 TB of memory in total.
http://www.diablo-technologies.com/memory1/
You could do exhaustive analysis on that dataset fully in memory.
Scala _beats_ Java in most of the benchmarks: http://benchmarksgame.alioth.debian.org/u64q/scala.html
18 (cores) x 4 (CPUs) x 2 (HT) = 144 threads
"Ben we're not utilizing all the ram."
"Add another for loop."
Something volatile running on a RAM disk, maybe?
That's amazing.