IMO the coolest thing about Summit/Sierra is that the GPUs and CPUs share a fully coherent single address space, with all memory available to the GPUs by default. That means your stack- and malloc-allocated variables can be used directly from the GPUs.
However, maybe this is just me, but I don't completely trust this to work without losing a certain amount of peak memory performance. I hope they at least leave the option to turn it off, so we can verify the impact it has on a per-application basis.
> In a media briefing ahead of today’s announcement at Oak Ridge, the partners revealed that Frontier will span more than 100 Shasta supercomputer cabinets, each supporting 300 kilowatts of computing.
So 30 megawatts of computing, plus cooling and other supporting services. How do you power something like this? Does ORNL have their own power station (given they have reactor(s) on site)? If power comes from an external station do they coordinate with the station operator when bringing a system like this online?
As has been noted in other comments, we do not have a power station at ORNL. We buy power from TVA at about 5.5 cents per kWh, in part because of the lab's proximity to TVA power plants.
TVA recently completed a 210 MW substation on ORNL's campus to better serve our needs. We do not need to coordinate with them for large runs on the machines.
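Back-of-the-envelope, the quoted rate puts the machine's electricity bill in the low tens of millions per year. A sketch of that arithmetic, assuming a steady 30 MW draw at 5.5 cents/kWh and ignoring demand charges and cooling overhead:

```python
# Rough annual electricity cost for a 30 MW machine at the quoted TVA rate.
# Assumptions: steady 30 MW draw, 5.5 cents/kWh, no demand charges.
power_mw = 30
rate_per_kwh = 0.055          # dollars per kWh
hours_per_year = 24 * 365     # 8760

annual_kwh = power_mw * 1000 * hours_per_year
annual_cost = annual_kwh * rate_per_kwh
print(f"{annual_kwh:,} kWh/year -> ${annual_cost / 1e6:.1f}M/year")
```

That works out to roughly $14-15M per year for the compute load alone.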
Oak Ridge National Laboratory was built where it is partly because they could get lots of cheap power from the TVA, so probably from that. (TVA is a regional electricity provider that operates a lot of hydro plants.)
For those who are curious, a typical American home draws on the order of a kilowatt, time-averaged (10,400 kWh per year ≈ 1.2 kW). So 30 MW is roughly the average power usage of a city of about 25,000 homes, or around 65,000 people, although total capacity will be larger to handle fluctuations.
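The arithmetic above, spelled out using the 10,400 kWh/year figure:

```python
# Average power of a typical US home, and how many such homes 30 MW covers.
kwh_per_home_per_year = 10_400
hours_per_year = 24 * 365                              # 8760
avg_home_kw = kwh_per_home_per_year / hours_per_year   # ~1.19 kW time-averaged
homes = 30_000 / avg_home_kw                           # 30 MW = 30,000 kW
print(f"{avg_home_kw:.2f} kW per home, ~{homes:,.0f} homes")
```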
Most supercomputers today have their own power station on site. I know Blue Waters at UIUC had one, which I believe caused a power outage at one point.
So - on a more "applies to ordinary mortals" level - the fact that they are going to use all AMD components is intriguing.
In reference to AI, NVidia has things "locked up" with CUDA, versus 2nd cousin AMD's OpenCL.
From what I understand, it is possible to recompile TensorFlow (for instance - not that ORNL will be using TF) for OpenCL - but I don't know how well it works. Personally, I've only used TF with CUDA.
Does this mean we might see greater/better support for OpenCL in the AI realm? Might we see it become on par with CUDA because of this collaboration for this HPC system?
Or will things stay as-is, at least "down here" in the consumer/business realm of AI hardware and applications? Do things like this trickle down, or are things so customized and/or proprietary for the needs of HPC at ORNL (or elsewhere) that anything to do with AI on this machine will have little to no bearing outside of the lab?
Ultimately, I'd just like to see another choice (a lower-cost choice!) for GPUs in the world of consumer/enthusiast/hobbyist AI/DL/ML. While today's higher-end GPUs, no matter the manufacturer, tend to be fairly expensive, AMD still has an edge here that makes them attractive to users (not to mention the fact that their Linux drivers are open source, which is also a plus).
I doubt they are going to run much AI on that machine. The national labs mostly run "traditional HPC" workloads such as fluid codes that simulate (magneto)hydrodynamics in one way or another.
I'd tend to assume, with the amount of resources going into this, that the software will be coded at a lower level here than it would be in a typical dev environment and so NVidia's library advantages will be less salient?
"greater than 1.5 exaflops" of performance will likely correspond to greater than 1 Exaflop of sustained performance on HPL (used for the top-500 ranking), making this a likely candidate for the first 'true' exascale computer.
Out of curiosity, what makes this an exascale computer and not, say, an AWS or Azure datacenter? Just the fact that they are open about benchmarking petaflops?
PCIe provides communication but isn't intended to provide memory coherency. There's a lot of work that goes on in figuring out which cache(s) have a copy of which cache line and figuring out how to resolve conflicting access needs.
Infinity Fabric / HyperTransport is generally lower level and lower latency than PCIe. It's aimed more for use as a front-side bus than a peripheral interconnect. A better analogue would be Intel's QuickPath Interconnect.
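To make the bookkeeping concrete, here is a toy sketch of what a directory-based coherence protocol has to track. This is a deliberately simplified, hypothetical model, not how Infinity Fabric or any real MESI/MOESI implementation actually works; it only shows why coherency is more than raw communication:

```python
# Toy directory-based cache coherence: the directory tracks which caches
# hold a copy of each line, and a write must invalidate all other copies
# before it can proceed. Real protocols add states, ownership, and races.
class Directory:
    def __init__(self):
        self.sharers = {}   # line address -> set of cache IDs holding a copy

    def read(self, cache_id, addr):
        # A read simply adds this cache to the line's sharer set.
        self.sharers.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        # A write invalidates every other sharer and takes sole ownership.
        invalidated = self.sharers.get(addr, set()) - {cache_id}
        self.sharers[addr] = {cache_id}
        return invalidated

d = Directory()
d.read(0, 0x1000)
d.read(1, 0x1000)
print(d.write(2, 0x1000))   # caches 0 and 1 must drop their copies: {0, 1}
```

All of this invalidation traffic is exactly what plain PCIe is not designed to carry, which is why coherent fabrics like Infinity Fabric exist.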
Moore's law might have stopped with the clock gains when Dennard scaling gave out, but we've still got energy efficiency gains. Koomey's law[1] is holding strong. I don't know if it'll get us all the way to Landauer's limit[2], but I hope so.
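For scale, the Landauer limit mentioned above is a standard textbook calculation: the minimum energy to erase one bit is k*T*ln(2), which at room temperature is around 3 zeptojoules:

```python
import math

# Landauer's principle: minimum energy to erase one bit of information.
k_B = 1.380649e-23          # Boltzmann constant, J/K (exact, SI 2019)
T = 300                     # room temperature, K
e_min = k_B * T * math.log(2)
print(f"{e_min:.3g} J per bit erased")   # ~2.87e-21 J
```

Current hardware dissipates many orders of magnitude more than this per operation, so there is a lot of theoretical headroom left.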
It's probably not fair to compare the theoretical numbers of Frontier to the achieved numbers on the Green500. Achieved FLOPS is pretty much always considerably lower than theoretical FLOPS: Titan is a 27 PFLOPS machine that achieves 17.6, Sequoia is a 20 PFLOPS machine that achieves 17, Summit is a 200 PFLOPS machine that achieves 143, ...
~ 37 GFLOPS/W is probably a better projection if we assume (out of nowhere) that the theoretical/achieved ratio of Frontier is comparable to Summit (75%). Still very impressive.
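That projection, spelled out, assuming the announced 1.5 EF peak, the ~30 MW figure from the article, and Summit's roughly 75% HPL efficiency carried over to Frontier:

```python
# Projected Green500-style efficiency for Frontier under stated assumptions.
peak_flops = 1.5e18       # announced peak performance, FLOPS
power_w = 30e6            # ~30 MW from the article (compute only)
hpl_efficiency = 0.75     # Summit's rough achieved/theoretical ratio

achieved_flops = peak_flops * hpl_efficiency
gflops_per_watt = achieved_flops / power_w / 1e9
print(f"~{gflops_per_watt:.1f} GFLOPS/W")
```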
I wonder if that will be the case on Frontier.
I feel these things always go back and forth in cycles in the industry.
https://www.anandtech.com/show/14302/us-dept-of-energy-annou...
What interconnects do these sorts of machines use? I assume even 100GbE isn't enough?
Just curious. It's interesting what exists in the "so far beyond my price range as to be ludicrous" category.
[1]https://en.wikipedia.org/wiki/Koomey%27s_law
[2]https://en.wikipedia.org/wiki/Landauer%27s_principle