What's the definition of "one supercomputer" for the purposes of TOP500?
For example, why doesn't one of Google's warehouses qualify? Or the whole of Google, for that matter. A bit of googling didn't find me anything very satisfactory.
One thing Google et al. are missing from a typical supercomputer is InfiniBand-style interconnects. These integrate with parallel programming libraries like MPI and offer "3D" networking that takes the physical distance between nodes into account, and they can do single-rack mesh networking to avoid the overhead of switching. Despite Google having lots of compute power, they probably can't leverage it in the way the LINPACK benchmark needs.
I believe anything that can perform the LINPACK benchmark is eligible, though to qualify the owner of the computer would have to voluntarily run the benchmark and submit their results. Google has chosen not to submit any results, probably because they have better things to do with their warehouses than run benchmarks.
The difference between a supercomputer and a data center is how "connected" the computations are; a supercomputer optimizes the communication between nodes. To put it another way, a data center does a lot of work, but most of the time for different applications (services) whose dependencies are "sparse".
Because they don't submit results. You have to enter to win.
Tangentially, Top500 results are based on one benchmark (the time to factor and solve an enormous double-precision dense linear system), which is relatively far removed from what Google is optimizing for.
Definition of supercomputer varies, but TOP500 is based on LINPACK. I am not aware of Google running any LINPACK benchmarks on their hardware (except maybe in Cloud VMs)?
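To make concrete what a LINPACK (HPL) number measures, here is a minimal single-node sketch in NumPy. This is only an illustration: real HPL is a distributed, blocked LU factorization run across the whole machine, and the (2/3)n^3 flop count used below is the standard HPL convention, not something measured.

```python
import time
import numpy as np

# LINPACK-style microbenchmark: time a dense double-precision solve Ax = b.
# Real HPL distributes a blocked LU factorization across thousands of nodes;
# this only exercises one node's BLAS/LAPACK.
n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)  # LU factorization + triangular solves
elapsed = time.perf_counter() - t0

flops = (2.0 / 3.0) * n**3  # conventional HPL flop count for the solve
gflops = flops / elapsed / 1e9
residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))
print(f"{gflops:.1f} GFLOP/s, scaled residual {residual:.2e}")
```

The scaled residual check mirrors how HPL validates a run before the flop rate counts.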
As others will say, the classic Google warehouses weren't really supercomputers, but more like massive clusters with a high cross-sectional bandwidth, but with very high latency, and they didn't run an MPI stack.
TOP500 doesn't include distributed systems. Essentially, every computer on TOP500 is a single computer that you can log onto. By contrast, Google's data warehouse would qualify as a large cluster of individual systems.
Note that not all supercomputers are on TOP500. Blue Waters is perhaps the most notable one to not bother reporting its performance (it would probably have been #1 had it done so when it came out, and today it would fall around 13th or so).
The number one supercomputer on the TOP500 list, Summit, is able to majority-attack about 95% of cryptocurrencies that are GPU-mined: https://twitter.com/zorinaq/status/1007005472505978880 That's one advantage that ASIC-mined currencies have over them. Specialized chips raise the security bar so high that the pre-existing installed base of GPUs cannot attack them.
That sounds good in theory, but I can't help but wonder if this has actually contributed to the huge power draw. Sure, they're more efficient, but their limited availability encourages the big players to consolidate, knowing the barriers to entry are very high for would-be competitors. This results in a technological arms race among the biggest players, confident that no new competitors will come out of nowhere. As they add to their holdings, they also necessarily increase their power consumption, imposing an opportunity cost on other uses of that otherwise cheap power.
I wish that you could buy a simple computer where all processing is integrated. The cores form a pyramid, with a few really fast ones on top and tons and tons of slow ones below. Everything is exposed through a very low-level raw API, with no speculation. Abstractions like speculation are layers on top, a la Vulkan.
This might be a good time to ask: my main reservation about TensorFlow is that it's a subset of general-purpose computing, so it will always be limited to niches like AI, physics simulations, or protein folding. If we look at something like MATLAB (or GNU Octave) as general-purpose vector computing, then we need some kind of bridge between the two worlds. I couldn't find much other than this:
https://www.quora.com/How-can-I-connect-Matlab-to-TensorFlow
Does anyone have any ideas for moving towards something more general?
If you are not confined to the MATLAB platform, then there are a few options. For example, Julia [1] is a general-purpose numerical computing language with more or less MATLAB-like syntax for vector/matrix computation, and the Julia package TensorFlow.jl [2] allows you to call TensorFlow from Julia. There are also quite a few packages in development to adapt Julia to GPU-based computation.
And, to be fair, the NumPy/SciPy stack in Python can also be seen as a general-purpose vector computing platform. My feeling is that MATLAB had its day in the 90s; the most cutting-edge technologies just don't seem to be developed in MATLAB any more.
[1] https://github.com/JuliaLang/julia
[2] https://github.com/malmaud/TensorFlow.jl
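As a hedged illustration of that point, most everyday MATLAB/Octave vector idioms translate almost one-to-one into NumPy (the values here are just examples):

```python
import numpy as np

# MATLAB-style vector/matrix idioms in NumPy.
A = np.array([[4.0, 1.0], [1.0, 3.0]])  # A = [4 1; 1 3]
b = np.array([1.0, 2.0])                # b = [1; 2]

x = np.linalg.solve(A, b)        # x = A \ b  (backslash solve)
evals = np.linalg.eigvalsh(A)    # eig(A) for a symmetric matrix
t = np.linspace(0.0, 1.0, 101)   # t = linspace(0, 1, 101)
y = np.sin(2 * np.pi * t)        # element-wise, like sin(t) in MATLAB
C = A @ A.T                      # matrix product, like A * A'

assert np.allclose(A @ x, b)     # the solve really satisfies Ax = b
```

The main adjustment coming from MATLAB is 0-based indexing and `@` for matrix multiplication versus `*` for element-wise.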
Maybe something like Jupyter or plain Python with GPU enabled numpy is what you’re looking for?
Tensorflow is a library, not a language; it's meant to be used from a host language that is Turing complete. Its goal is to make building graphs of vector evaluations easier and more performant, not to provide a general-purpose computation environment; it's assumed you're calling it from one.
So, if you’re using Tensorflow, you normally have general purpose computing available to you, with the option to bake your vector tasks into graphs easily and/or speed them up.
Jupyter is becoming a decent alternative to Matlab, and you have many options for running vector computations from python, with or without a GPU.
I don't know about TensorFlow in particular, but there are little-known methods of running "general purpose" parallel (MIMD) programs on GPUs. Specifically, H. Dietz's MOG, "MIMD On GPU":
http://aggregate.org/MOG/
It's a shame the project hasn't gotten more attention, IMO. See https://en.wikipedia.org/wiki/Flynn%27s_taxonomy for an explanation of the terms.
What do you mean by "TensorFlow is a subset of general purpose computing, and thus will always be limited to niches"? It's not clear to me at all what one could mean by this. Doesn't TensorFlow have to use matrix math deep down (just like any other digital computing system)?
Why do you think TensorFlow is a subset of general purpose computing? What do you think is missing?
I think nothing is really missing; it's just maybe more difficult to perform certain kinds of tasks. But compared to Matlab/Octave, I don't really see much lacking (in the platform). I would even say the opposite: the Matlab/Octave platform seems to me like a subset of what TensorFlow offers.
At work, we have a bunch of vectorized computations that we run in Tensorflow, as it's a convenient way to get GPU-optimized code, so that is still an option (albeit an awkward one).
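As a concrete (non-ML) example of the kind of vectorized computation that ports well to such frameworks, here is a pairwise-distance kernel written purely with array operations. It is shown in NumPy for self-containedness; the same expression maps nearly symbol-for-symbol onto TensorFlow ops, which is what makes GPU offload convenient.

```python
import numpy as np

# Pairwise squared Euclidean distances via pure array ops, using the identity
# ||x_i - y_j||^2 = ||x_i||^2 + ||y_j||^2 - 2 <x_i, y_j>.
# Kernels in this shape are exactly what GPU frameworks accelerate well.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))   # 500 points in 3-D
Y = rng.standard_normal((400, 3))   # 400 points in 3-D

sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
sq = np.maximum(sq, 0.0)  # clamp tiny negatives caused by rounding

# Spot-check one entry against the direct formula.
assert np.isclose(sq[0, 0], ((X[0] - Y[0]) ** 2).sum())
```

The whole computation is one matrix multiply plus broadcasts, with no Python-level loop over the 500x400 pairs.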
You could also use something like CUDA, or OpenGL to do this; there are some Python libraries to do basic numerical work, such as PyCUDA or gnumpy.
Since TFA talks about deep learning so much, I wonder how many of the applications run on these machines actually are deep learning, or can make use of the tensor cores in some other way.
A lot of people are using GPUs for many things other than ML. The big advantage is the number of cores, and people who run on supercomputers write algorithms that are highly parallelized (otherwise what's the point). GPUs are getting fast enough that their core counts give them an edge. Also, their memory is MUCH faster than CPU memory, but the trade-off is that you get less of it (around 20GB compared to 256GB).
As for the TPUs, one big advantage for ML is that they work in float16/float32 (versus the normal float32/float64; in ML you care very little about precision) and are optimized for tensor calculations. For anything where you don't need the extra resolution and are doing tensor operations (lots of math/physics is tensor-heavy), they will give you an advantage. (I'm not aware of anyone using them for things other than ML, but I wouldn't be surprised if people did.) For other things you need more precision, and those won't use the TPUs (AFAIK).
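The memory-speed point above can be made concrete with a STREAM-style "triad" measurement. This is a rough single-node sketch; absolute numbers vary wildly by machine, and GPU HBM would be measured the same way with a GPU array library.

```python
import time
import numpy as np

# STREAM-style "triad" (a = b + s*c): bandwidth-bound, so it measures how
# fast memory can feed the cores rather than peak arithmetic throughput.
n = 10_000_000
b = np.ones(n)
c = np.full(n, 2.0)
s = 3.0

t0 = time.perf_counter()
a = b + s * c
elapsed = time.perf_counter() - t0

# Three arrays of 8-byte doubles move through memory: read b, read c, write a.
gb_per_s = 3 * 8 * n / elapsed / 1e9
print(f"~{gb_per_s:.1f} GB/s effective bandwidth")
```

On typical hardware the same kernel runs several times faster out of GPU memory than out of CPU DRAM, which is the advantage being described.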
Given that the top one is at Oak Ridge National Lab, my guess would be that they're not exploring deep learning. They've got other applications in mind.
One other point not mentioned in other comments: some work was presented at GTC regarding using tensor cores for a low precision solution followed by iterative refinement to a fp64-equivalent solution. IIRC, 2-4x speed up for fp64 dense system solvers.
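That GTC result follows the classic mixed-precision iterative-refinement recipe: factor and solve cheaply in low precision, then correct with residuals computed in full precision. A minimal sketch, using float32 as a stand-in for the tensor cores' low precision (NumPy has no fp16 solver), and re-solving instead of reusing one factorization as a real implementation would:

```python
import numpy as np

def refine_solve(A, b, iters=5):
    """Solve Ax = b: cheap float32 solves, float64 residual corrections.

    Mirrors the tensor-core recipe (low-precision factorization plus
    iterative refinement); real implementations factor A once and reuse it.
    """
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                  # residual in float64
        dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += dx                                        # low-precision correction
    return x

rng = np.random.default_rng(2)
n = 300
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)

x = refine_solve(A, b)
err = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
assert err < 1e-10  # fp64-level accuracy despite float32 factorizations
```

Refinement converges when the matrix is well-conditioned relative to the low precision, which is why the GTC work pairs it with well-behaved dense systems.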
My guess would be the vast majority. In addition to being an area that has everyone's interest right now, the hardware is getting more and more specialized, so it just doesn't benefit general-purpose computing. Just as FPU enhancements target a fraction of computing tasks, GPUs target an even smaller fraction, and tensor cores / 16-bit FP smaller still.
So using a supercomputer for a trivially parallelizable, low-communication task like crypto mining would be wasteful - very low bang for the buck.
(There are other cryptanalysis workloads that do benefit, though, e.g. the parallel number field sieve.)
There must be something similar for AMD processors too, but I can't find it with a quick DuckDuckGo search. Perhaps someone else can link it?
Just a silly thing for comparing your PC with the big dogs.