I mean yeah, but I doubt it's going to be mainstream anytime soon.
At the moment it's cheaper to just use forced air and yolo it. Running at half density is expensive, but not as expensive as backfilling everything with liquid cooling.
Also, the fact that we've been able to run 2-socket blades at full bore without liquid cooling kinda suggests it's not actually needed.
Having radiators directly on the front/back of the rack works really well. We had it on our render farm. Combine that with enforced hot aisle/cold aisle and you can reduce the need for aircon dramatically, without multiplying your leak risk by at least 96 times.
The massive problem here is that it's really difficult to hotswap anything. Those cooling pipes need to be removed before you can pull out the server. Unless they are self-sealing (like hydraulic lines), you need to drain the loop first. That costs a shitload of money at scale.
>The massive problem here is that it's really difficult to hotswap anything.
And this problem has been there since day 1 (for well over a decade), and no one seems to have solved it. It makes sense if you treat the whole rack as a single entity with water cooling. But for an individual server I still don't understand how and why liquid cooling would be a benefit.
I would like to have a quiet and powerful machine for deep learning. The best one I found comes from the NVIDIA DGX Station. It uses liquid cooling and runs at under 35 dB. The problem is that it is too expensive and, last I checked, I think it requires buying a support contract from a third party.

If liquid cooling is what it takes to have a quiet, powerful machine, I hope it will be mainstream.

Source: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...
I cannot help but chuckle at how what is old is new again here. In the '70s and early '80s all of IBM's mainframes supported liquid cooling. Basically, when a computer "uses" X kW of power, it really means that it generates that many kW of heat while it is operating. Removing heat at scale has been a thing for a long time.
And what was alluded to in the video is that the thermal mass of 'air' kind of sucks. So the old design of chilling air down to 67 degrees and filling a room with that air so that it can circulate around electronics putting out prodigious amounts of heat, and then be collected and re-cooled, is not nearly as efficient as one would like.
Cooling water, piping it to the heat exchanger in the back door of the rack and then (unlike the video's idea) sucking air through it first, and then pushing the cooled air over the electronics to 're-heat' it, works better. Then you don't really care what temperature the air in the data center itself is as long as the heat exchanger can remove 'x' watts of heat from it before it gets blown over the computers. Suck air in from the floor (the coolest air) and blow it out the top (where it continues on to the ceiling).
Still, that only doubles the power capacity of the racks (maybe 2.5x if the heat exchanger is filled with actively chilled water).
Prior to heat exchanger doors people would have "cold" aisles and "hot" aisles. The cold air from the CRAC units would come up from the floor behind the servers, get sucked through them and exhausted forward into the "hot" aisle. There is a whole little mini industry of "cold air containment" which has stuff to build doors/covers for the cold aisle so that all of that air is sucked through servers.
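To put rough numbers on the thermal-mass point: a quick back-of-the-envelope sketch in Python, using textbook properties of air and water (the 10 kW rack and 10 K coolant temperature rise are made-up illustrative figures, not from the article):

    # How much air vs. water does it take to carry away a given heat load?
    # Uses approximate room-temperature properties; numbers are illustrative.
    AIR_DENSITY = 1.2        # kg/m^3
    AIR_CP = 1005.0          # J/(kg*K)
    WATER_DENSITY = 998.0    # kg/m^3
    WATER_CP = 4186.0        # J/(kg*K)

    def volumetric_flow(watts, delta_t_k, density, cp):
        """Volume flow (m^3/s) needed to move `watts` of heat at a coolant rise of `delta_t_k`."""
        return watts / (density * cp * delta_t_k)

    heat = 10_000.0  # hypothetical 10 kW rack
    dt = 10.0        # hypothetical 10 K coolant temperature rise

    air = volumetric_flow(heat, dt, AIR_DENSITY, AIR_CP)
    water = volumetric_flow(heat, dt, WATER_DENSITY, WATER_CP)
    print(f"air:   {air:.3f} m^3/s (~{air * 2119:.0f} CFM)")
    print(f"water: {water * 1000:.2f} L/s (~{water * 15850:.1f} GPM)")
    print(f"same heat, ~{air / water:.0f}x less volume with water")

That roughly 3500x volumetric advantage is the whole argument for getting the liquid as close to the silicon as possible.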
I'm curious why liquid cooling for computers still uses compression fittings and other odd methods.
Compression fittings are widely out of favor due to their tendency to leak, and nowadays everything is copper, PEX, or metal + flared fittings.
I wonder if there's a PC running PEX-A for tubing.
Further thoughts:
Brake lines use flared fittings, with either metal tubing or plastic tubing with metal ends. The caliper end uses a special hollow bolt that lets fluid pass through into the caliper, and is removable.
I would imagine something like that could work to make the lines to the individual servers serviceable without relying on flaky plastic "quick connects".
RGB-life liquid cooling != data center liquid cooling
They have precision dry-break quick connects. They use crazy jacketed hoses with PTFE liners and copolymer sheathing, sometimes with reinforcement layers. The coolant is pumped through at an operating pressure of 10 bar at 200°C. These applications exceed the limits of potable water and hydronic heating systems.
Compression fittings are generally more convenient and cost effective than PEX or copper in small volumes and with little experience. The only downside, as you note, is leaks if someone doesn't do them right (or as time passes), but that tends to be more of a problem at higher household service-line pressures (60+ psi).
If it is a low-pressure recirculation system, you can get by with almost no tools, and most people, if they read the instructions at all, probably won't mess it up badly enough to have a leak.
The brake line connectors could be interesting, but aren't likely to work well stock - brakes are very low flow, high pressure, as they are 'dead-ended' hydraulic systems used to transfer force, not loops for cooling purposes. Also those bolts can work loose, and then you have leaks. Brakes require experienced technicians or people die, so folks working on brakes tend to be the more qualified and experienced mechanics, and they make sure to torque to spec, clean surfaces, do proper prep, etc., which is not what you want to have to worry about from an idiot-proofing perspective.
I'm about to get myself proper industrial quick disconnects (the hard part is the disconnecting, not the connecting, as otherwise a simple valve on each end and a screwed fitting would suffice) for my workstation, so that I can put a large radiator outside my room and still have an easy-to-carry system when I need to be somewhere else and have a proper computer. It's just no longer silent at that point.
These quick disconnects are polymer, but that's in large part due to corrosion resistance. They are also rated to ~8 bar, and to vacuum, should one use the latter.
Also, for flexible tubing, compression fittings don't seem like such a bad idea. Yes, don't use them for rigid pipes (PEX counts there, for connecting purposes), but the only thing better than squishing the hose into a seal-barb seems to be welding the hose to the connector...
A lot of high-power plasma processing equipment is water cooled: RF and DC generators, matches, etc. Stuff that requires a coax cable as thick as your arm. The switch from air to water cooling happens around 0.5-1 kW, which is the same range they seem to be targeting for datacenter stuff.
Aside from leaks, the building maintenance staff really needs to stay on top of the cooling water chemistry. Cooling loops tend to make slime and if you let it get out of control it clogs up everything and becomes a real problem to clean.
Liquid cooling also enables heat recovery and free cooling all year long.
Here's a project using liquid cooling to recover energy from data centers to heat greenhouses.
https://www.qscale.com/
It seems odd to discuss liquid cooling but spend the first part of the article talking about AC PDUs and their issues. If you are able to use liquid cooling in a data center, aren't you also sophisticated enough for modern DC power distribution to the point of load?
I can see why supermicro needs to pursue this direction, but it seems inevitable that their business will get eaten up by OCP and integration at the rack, row, and building level.
One place I worked used Fluorinert to cool the circuit cards of an integrated circuit tester.
Each card has a 'cold plate' where the outside of the cold plate has a brass tube brazed to the plate in a serpentine pattern, leading to two quick-disconnect connectors that automatically engage/disengage when the card is inserted or removed from the card cage.
The other side of the cold plate (the 'inside') has spring loaded fingers to pull the heat away from chips. We had some chips that dissipated over 100W.
This worked great - you'd lose maybe 1 drop of fluid per card insertion. We used Fluorinert, but DI water, mineral oil, or water with ethylene glycol could be used as well.
Technically, the best solution is to distribute. Specifically, compute stuff on users' own devices, as opposed to moving everything to these centralized clouds controlled by just a few large corporations.

Too bad that's unlikely to happen.
The "cloud" can be far more energy efficient than personal computing devices, and geographically distributed computation has costs as well, if you're doing computation that requires a lot of data that isn't all in one place.
I think keeping one's own data on one's own local hardware has other significant privacy/security/control benefits, but energy efficiency is probably not among them.

Not every workload is that latency sensitive, presumably.
I've literally heard this is coming to the datacenter for 20 YEARS and it hasn't yet.

Liquids like mineral oil, etc. were supposed to be the cooling agent, yet it never gained measurable adoption.

And with so many servers now being centrally managed by just a few major cloud providers (AWS, Azure, etc.) - unless you can break into those few accounts - how will this time be any different than the past?

From 2003: https://www.hitachi.com/New/cnews/E/2003/0217/index.html
> Liquids like mineral oil, etc. were supposed to be the cooling agent, yet it never gained measurable adoption.
I've never seen a serious attempt to do mineral oil cooling outside of funny gag setups. It was never really pitched as the future of cooling. There have been other attempts at other liquids for submersion cooling, but that always seems like a bit of a stretch.
Beyond that, liquid cooling has taken over. That Hitachi post, for example, seems to be about AIOs which are now very common on desktops with the AIO market exploding over the last ~5 years. Nearly every laptop also employs a form of "liquid cooling" these days with either heatpipes or vapor chambers.
As for "what makes this time different" for the datacenter, well, heat density continues to rise. >100 CPU cores per 2U server are trivial to achieve now, and more are readily plausible. GPU accelerators are also more common than ever, and are pushing staggering power draws (like the 8x 400W NVIDIA A100s in a single 4U chassis the article talks about). You can keep spending more power on fans, but there are diminishing returns there, not to mention the power consumption of the fans themselves becomes non-trivial.
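The diminishing returns on fans fall out of the fan affinity laws: airflow scales roughly linearly with fan speed, but fan power scales with its cube. A minimal sketch (the 10 W baseline fan is a hypothetical number):

    # Fan affinity laws: flow ~ rpm, static pressure ~ rpm^2, power ~ rpm^3.
    # So doubling the airflow through a chassis costs roughly 8x the fan power.
    BASE_FAN_POWER_W = 10.0  # hypothetical power draw at baseline speed

    for flow_multiplier in (1.0, 1.5, 2.0, 3.0):
        power = BASE_FAN_POWER_W * flow_multiplier ** 3
        print(f"{flow_multiplier:.1f}x airflow -> ~{power:5.1f} W per fan")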
Cool stuff. I remember seeing liquid cooling solutions that immersed the entire system in some non-conductive fluid, like mineral oil. Of course, that's a bit messy, but I wonder, if all components were designed to be immersed, whether it'd be feasible to consider the entire server enclosure as the watertight unit, e.g. in a rack, rather than running hoses to individual servers.
Immersing the rack as a unit means committing to servicing the rack as a unit.
Some organizations do that. Bring the rack up, do a burn-in test, then run it until enough bits are broken that it makes sense to take the whole thing down to swap parts, or until it's obsolete enough to justify replacing the whole thing. But a lot of organizations want to replace storage or RAM or ?? as it fails without turning off the whole rack.
Would bring more meaning to the phrase 'draining' a server/rack/data center, though.
Actually reading the article, they have immersion chambers which look more or less like a rack rotated 90° to be horizontal. Which means you can pull a rack-mount server out vertically to service it, but it also takes up a lot more floorspace (at least 3x to my eye), which reduces density. If you were OK with reduced density, you could leave your racks 2/3rds empty or use 2U servers instead of 1U servers, etc. Power and cooling requirements are already the bottleneck for most datacenters rather than floorspace, and liquid cooling doesn't really help a lot there (unless you can pump your working fluid outdoors, but usually it's just dumping the heat elsewhere in the room more effectively, sometimes with more power for a pump than for fans).
Submersion cooling systems are shown on page 3 of the article. The page selector is at the bottom of the text. I like how they have to have hoists to raise and lower the servers into the bath because they're so heavy.
Direct liquid cooling makes a shitload of sense from a purely academic standpoint.
At what point in power density would it not even matter if the datacenter was at meat locker temperatures? I feel like we are getting pretty damn close.
The efficiency gains at scale must be really good. I can see why this is not super popular though.
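Rough numbers on why inlet temperature stops mattering past some density: the exhaust rise at a fixed airflow is what it is, regardless of how cold the room starts (the 100 CFM figure for a 1U server is a ballpark assumption):

    # Air temperature rise across a server at fixed airflow:
    # delta_T = Q / (rho * cp * V), with approximate air properties.
    AIR_DENSITY = 1.2   # kg/m^3
    AIR_CP = 1005.0     # J/(kg*K)
    M3S_PER_CFM = 1 / 2119.0

    airflow = 100 * M3S_PER_CFM  # hypothetical 1U server at ~100 CFM
    for watts in (500, 1000, 2000, 4000):
        rise = watts / (AIR_DENSITY * AIR_CP * airflow)
        print(f"{watts:4d} W -> ~{rise:4.0f} K rise")

At several kW per U, even meat-locker inlet air comes out scalding, and the chip-to-air interface becomes the real bottleneck anyway.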
Data centers have been using chilled water to cool air for a long time. It would be interesting to see cooling delivered directly to the CPU/GPU from a chiller.
No thanks - the Equinix DC I have dealt with has had so many chiller outages that I am glad the fans were able to keep the temp low enough to prevent damage.
If I lost all cooling the equipment would literally be toast.
I think datacenters will opt for more ARM architectures before introducing liquid at scale. Things like Graviton and Apple Silicon are changing the landscape there.
Graviton and Apple Silicon still don't offer significant performance-per-watt improvements versus EPYC. I don't think it will make or break the need for liquid cooling.

Even with power efficient architectures, the push to scale up core count and increase density will still drive toward more liquid cooling.
You still have a 280W CPU; instead of 64 cores you now have 128. And if thermals permit, they would like 128 x86 cores or 256 ARM cores.

Samsung just announced their work on 7xx GB of DDR5 per stick, with a future roadmap of 1TB per stick at the request of hyperscalers. Along with 400W or 500W GPUs.

And when you have many new packaging technologies where each die could be 100W TDP, you quickly run into all sorts of thermal limits. And there are all sorts of HPC and niche markets that want super powerful servers.
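For a sense of scale, a hypothetical power budget for a dense accelerator box, using the wattages mentioned in this thread (the DIMM and overhead figures are my own rough guesses):

    # Hypothetical power budget for a dense 4U accelerator server.
    # CPU/GPU figures echo numbers from the thread; the rest are guesses.
    components = {
        "2x 280W CPU": 2 * 280,
        "8x 400W GPU": 8 * 400,
        "32x DDR5 DIMM (~10W each)": 32 * 10,
        "NICs, storage, fans, VRM losses (est.)": 600,
    }
    total = sum(components.values())
    for name, watts in components.items():
        print(f"{name:40s} {watts:5d} W")
    print(f"{'total':40s} {total:5d} W (~{total / 1000:.1f} kW in 4U)")

Call it roughly 1.2 kW per rack unit, which is exactly the kind of density driving these liquid-cooling designs.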
I’m pretty sure AMD offers perf/watt that is in the ballpark of ARM chips, and I’m guessing Intel, if they ever get on a similar process node again, will be able to get close.
None of that changes the need for accelerators though. Nvidia et al want to make the biggest chips they can, and at some point that means water cooling.
I wonder if this waste heat could be used to power active/phase change cooling.

The challenge is that it's really low-grade heat (<100°C). DC waste heat would probably find its best use in district heating applications (though I did read about some cases using a heat pump to boost the waste heat stream even higher - still for district heating).
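The heat-pump boost mentioned above is relatively cheap in thermodynamic terms, because the lift from DC waste heat to district-heating temperature is small. A sketch against the Carnot limit (the 60°C/90°C temperatures and the 50%-of-Carnot factor are ballpark assumptions):

    # Coefficient of performance for boosting waste heat with a heat pump.
    # COP_carnot = T_hot / (T_hot - T_cold), temperatures in kelvin.
    def cop_carnot(t_cold_c, t_hot_c):
        t_hot_k = t_hot_c + 273.15
        return t_hot_k / (t_hot_k - (t_cold_c + 273.15))

    ideal = cop_carnot(60.0, 90.0)  # 60C waste heat -> 90C district loop
    realistic = 0.5 * ideal         # real machines hit very roughly half of Carnot
    print(f"Carnot COP: {ideal:.1f}, realistic: ~{realistic:.1f}")
    # ~6 units of heat delivered per unit of electricity, versus roughly 2
    # for a heat pump lifting cold ambient air up to the same temperature.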