IMO the coolest thing about Summit/Sierra is that the GPUs and CPUs share a fully coherent single address space, with all memory available to the GPUs by default. That means your stack- and malloc-allocated variables can be used directly from the GPUs.
However, maybe this is just me, but I don't completely trust this to work without losing a certain amount of peak memory performance. I hope they at least leave the option to turn it off, so we can verify the impact it has on a per-application basis.
> In a media briefing ahead of today’s announcement at Oak Ridge, the partners revealed that Frontier will span more than 100 Shasta supercomputer cabinets, each supporting 300 kilowatts of computing.
So 30 megawatts of computing, plus cooling and other supporting services. How do you power something like this? Does ORNL have their own power station (given they have reactor(s) on site)? If power comes from an external station do they coordinate with the station operator when bringing a system like this online?
As has been noted in other comments, we do not have a power station at ORNL. We buy power from TVA at about 5.5 cents per kWh, in part because of the lab's proximity to TVA power plants.
TVA recently completed a 210 MW substation on ORNL's campus to better serve our needs. We do not need to coordinate with them for large runs on the machines.
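Back-of-the-envelope, the quoted rate puts the machine's electricity bill in the low tens of millions per year. A sketch of that arithmetic, assuming a steady 30 MW draw at 5.5 cents/kWh and ignoring demand charges and cooling overhead:

```python
# Rough annual electricity cost for a 30 MW machine at the quoted TVA rate.
# Assumptions: steady 30 MW draw, 5.5 cents/kWh, no demand charges.
power_mw = 30
rate_per_kwh = 0.055          # dollars per kWh
hours_per_year = 24 * 365     # 8760

annual_kwh = power_mw * 1000 * hours_per_year
annual_cost = annual_kwh * rate_per_kwh
print(f"{annual_kwh:,} kWh/year -> ${annual_cost / 1e6:.1f}M/year")
```

That works out to roughly $14-15M per year for the compute load alone.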
Oak Ridge National Laboratory was built where it is partly because they could get lots of cheap power from the TVA, so probably from that. (TVA is a regional electricity provider that operates a lot of hydro plants.)
For those who are curious, a typical American home draws on the order of a kilowatt, time-averaged (10,400 kWh per year ≈ 1.2 kW). So 30 MW is roughly the average power usage of a city of about 25,000 homes, or around 65,000 people, although total capacity will be larger to handle fluctuations.
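The arithmetic above, spelled out using the 10,400 kWh/year figure:

```python
# Average power of a typical US home, and how many such homes 30 MW covers.
kwh_per_home_per_year = 10_400
hours_per_year = 24 * 365                              # 8760
avg_home_kw = kwh_per_home_per_year / hours_per_year   # ~1.19 kW time-averaged
homes = 30_000 / avg_home_kw                           # 30 MW = 30,000 kW
print(f"{avg_home_kw:.2f} kW per home, ~{homes:,.0f} homes")
```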
Most supercomputers today have their own power station on site. I know Blue Waters at UIUC had one, which I believe caused a power outage at one point.
So - on a more "applies to ordinary mortals" level - the fact that they are going to use all AMD components is intriguing.
In reference to AI, NVidia has things "locked up" with CUDA, versus 2nd cousin AMD's OpenCL.
From what I understand, it is possible to recompile TensorFlow (for instance - not that ORNL will be using TF) for OpenCL - but I don't know how well it works. Personally, I've only used TF with CUDA.
Does this mean we might see greater/better support for OpenCL in the AI realm? Might we see it become on par with CUDA because of this collaboration for this HPC system?
Or will things stay as-is, at least "down here" in the consumer/business realm of AI hardware and applications? Do things like this trickle down, or are things so customized and/or proprietary for the needs of HPC at ORNL (or elsewhere) that anything to do with AI on this machine will have little to no bearing outside of the lab?
Ultimately, I'd just like to see another choice (a lower-cost choice!) for GPUs in the world of consumer/enthusiast/hobbyist AI/DL/ML. While today's higher-end GPUs, no matter the manufacturer, tend to be fairly expensive, AMD still has an edge here that makes them attractive to users (not to mention the fact that their Linux drivers are open source, which is also a plus).
I doubt they are going to run much AI on that machine. The national labs mostly run "traditional HPC" workloads such as fluid codes that simulate (magneto)hydrodynamics in one way or another.
I'd tend to assume, with the amount of resources going into this, that the software will be coded at a lower level here than it would be in a typical dev environment and so NVidia's library advantages will be less salient?
"greater than 1.5 exaflops" of performance will likely correspond to greater than 1 Exaflop of sustained performance on HPL (used for the top-500 ranking), making this a likely candidate for the first 'true' exascale computer.
Out of curiosity, what makes this an exascale computer and not, say, an AWS or Azure datacenter? Just the fact that they are open about benchmarking petaflops?
PCIe provides communication but isn't intended to provide memory coherency. There's a lot of work that goes on in figuring out which cache(s) have a copy of which cache line and figuring out how to resolve conflicting access needs.
Infinity Fabric / HyperTransport is generally lower level and lower latency than PCIe. It's aimed more for use as a front-side bus than a peripheral interconnect. A better analogue would be Intel's QuickPath Interconnect.
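To make the bookkeeping concrete, here is a toy sketch of what a directory-based coherence protocol has to track. This is a deliberately simplified, hypothetical model, not how Infinity Fabric or any real MESI/MOESI implementation actually works; it only shows why coherency is more than raw communication:

```python
# Toy directory-based cache coherence: the directory tracks which caches
# hold a copy of each line, and a write must invalidate all other copies
# before it can proceed. Real protocols add states, ownership, and races.
class Directory:
    def __init__(self):
        self.sharers = {}   # line address -> set of cache IDs holding a copy

    def read(self, cache_id, addr):
        # A read simply adds this cache to the line's sharer set.
        self.sharers.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        # A write invalidates every other sharer and takes sole ownership.
        invalidated = self.sharers.get(addr, set()) - {cache_id}
        self.sharers[addr] = {cache_id}
        return invalidated

d = Directory()
d.read(0, 0x1000)
d.read(1, 0x1000)
print(d.write(2, 0x1000))   # caches 0 and 1 must drop their copies: {0, 1}
```

All of this invalidation traffic is exactly what plain PCIe is not designed to carry, which is why coherent fabrics like Infinity Fabric exist.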
Moore's law might have stopped with the clock gains when Dennard scaling gave out, but we've still got energy efficiency gains. Koomey's law[1] is holding strong. I don't know if it'll get us all the way to Landauer's limit[2], but I hope so.
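For scale, the Landauer limit mentioned above is a standard textbook calculation: the minimum energy to erase one bit is k*T*ln(2), which at room temperature is around 3 zeptojoules:

```python
import math

# Landauer's principle: minimum energy to erase one bit of information.
k_B = 1.380649e-23          # Boltzmann constant, J/K (exact, SI 2019)
T = 300                     # room temperature, K
e_min = k_B * T * math.log(2)
print(f"{e_min:.3g} J per bit erased")   # ~2.87e-21 J
```

Current hardware dissipates many orders of magnitude more than this per operation, so there is a lot of theoretical headroom left.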
It's probably not fair to compare the theoretical numbers of Frontier to the achieved numbers on the Green500. Achieved FLOPS is pretty much always considerably lower than theoretical FLOPS: Titan is a 27 PFLOPS machine that achieves 17.6, Sequoia is a 20 PFLOPS machine that achieves 17, Summit is a 200 PFLOPS machine that achieves 143, ...
~ 37 GFLOPS/W is probably a better projection if we assume (out of nowhere) that the theoretical/achieved ratio of Frontier is comparable to Summit (75%). Still very impressive.
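That projection, spelled out, assuming the announced 1.5 EF peak, the ~30 MW figure from the article, and Summit's roughly 75% HPL efficiency carried over to Frontier:

```python
# Projected Green500-style efficiency for Frontier under stated assumptions.
peak_flops = 1.5e18       # announced peak performance, FLOPS
power_w = 30e6            # ~30 MW from the article (compute only)
hpl_efficiency = 0.75     # Summit's rough achieved/theoretical ratio

achieved_flops = peak_flops * hpl_efficiency
gflops_per_watt = achieved_flops / power_w / 1e9
print(f"~{gflops_per_watt:.1f} GFLOPS/W")
```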
I wonder if that will be the case on Frontier.
I feel these things always go back and forth in cycles in the industry.
https://www.anandtech.com/show/14302/us-dept-of-energy-annou...
What interconnects do these sorts of machines use? I assume even 100GbE isn't enough?
Just curious. It's interesting what exists in the "so far beyond my price range as to be ludicrous" category.
[1]https://en.wikipedia.org/wiki/Koomey%27s_law
[2]https://en.wikipedia.org/wiki/Landauer%27s_principle