item 11565036

GPUCC – An Open-Source GPGPU Compiler

195 points | haberman | 10 years ago | research.google.com

53 comments

[+] haberman|10 years ago
I don't know much about this (it's not my area of expertise), but I thought this G+ post was interesting: https://plus.google.com/u/0/+VincentVanhoucke/posts/6RQmgqcm...

It says that much of the reason TensorFlow initially lagged in performance is that many of those performance issues only manifested under NVCC, whereas internally they had been using gpucc.

[+] namtrac|10 years ago
This is part of llvm trunk (upcoming 3.9 release) now: http://llvm.org/docs/CompileCudaWithLLVM.html
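For anyone curious what that looks like in practice, the linked LLVM doc describes an invocation roughly like the following (the GPU arch value and CUDA install path are assumptions that will vary per machine):

```shell
# Compile a CUDA source file directly with upstream clang (sketch).
# --cuda-gpu-arch selects the target GPU; the library path points at
# a typical CUDA toolkit install and may differ on your system.
clang++ axpy.cu -o axpy \
  --cuda-gpu-arch=sm_35 \
  -L/usr/local/cuda/lib64 \
  -lcudart -ldl -lrt -pthread
```

The point being: no nvcc in the loop for compilation itself, though the CUDA runtime library is still linked in.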
[+] svensken|10 years ago
Thanks for the link! Pretty exciting stuff.

Can anyone comment on the following quote:

> The list below shows some of the more important optimizations for GPUs... A few of them have not been upstreamed due to lack of a customizable target-independent optimization pipeline.

So the LLVM version of gpucc will be incomplete? Will there be a release of the original stand-alone gpucc?

[+] m_mueller|10 years ago
Looking forward to a CUDA Fortran frontend for this. Does it exist already?
[+] ashitlerferad|10 years ago
If only it didn't still need the proprietary CUDA SDK.
[+] EliRivers|10 years ago
I see Eli Bendersky's name on this; his site ( http://eli.thegreenplace.net/ ) has a number of interesting C++ articles, some of which I've even carefully printed out and taped into my notebook of really useful things. If you're a C++ programmer, there are a lot of useful reads on there.

I don't see anything specifically about this in the archives, but maybe that's something to look forward to.

[+] wmf|10 years ago
One wonders why they didn't invest that effort in making an awesome OpenCL 2.1 compiler instead.
[+] joe_the_user|10 years ago
I'm looking at building a GPGPU program.

When I look at CUDA code, it seems to be a big loop targeting the GPU memory with standard c code, allocating memory with standard functions and specifying where code lives with simple defines.
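To make that concrete, here is a minimal sketch of the pattern being described: a kernel marked with a simple qualifier, malloc-style allocation, and a one-line launch. All names here are illustrative, and `cudaMallocManaged` assumes a GPU recent enough for unified memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// "Where code lives" is a simple qualifier: __global__ marks a
// function that runs on the GPU but is launched from the host.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Allocation looks like standard C, just via a CUDA call.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch one thread per element, in blocks of 256.
    add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

That's essentially the whole mental model: the "big loop" is implicit in the grid of threads, and everything else is ordinary C++.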

When I look at OpenCL, it is... I don't know what it is. I haven't figured it out after considerable scanning, and that has cemented my decision to avoid it, because I don't have infinite time to scan obscurity.

For example, here is a standard "first OpenCL program" - ~200 lines of boilerplate, and no simple example of many cores working together to do something brutally simple and useful, like adding two vectors. Just "hello world" from the GPU.

As far as I can tell, as the product of a multitude of vendors, all of which ship different stuff, OpenCL is a monstrosity where a wide variety of functionalities is supported but none of them is guaranteed to be present - hence the 200 lines of boilerplate. Kind of like the umpteen Unix flavors back in the day: "open standards" that bridge only semi-compatible hardware have generally been doomed abortions, discarded in favor of a single best approach that all vendors are forced to adopt.

So it seems like the best thing is jettisoning the monstrosity and cloning CUDA for other hardware.

https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBoo...

[+] mattnewton|10 years ago
I think they still need NVIDIA's libraries (cuDNN specifically) alongside this compiler, which AFAIK don't have good OpenCL equivalents yet.
[+] yzh|10 years ago
Not a compiler guy, but a GPU programmer. This is exciting! I attended a lecture by one of the authors a while ago. Although at this point I assume gpucc is super-optimized for deep learning (by which I mean dense matrix multiplication), this is very good for the community: in the future, people can work on versions that focus on better general performance, or on different feature sets for specific applications.
[+] Alphasite_|10 years ago
Just as a point of interest: is there any limitation to supporting CUDA on AMD hardware (were this to be compiled with the AMDGPU backend)? Setting aside the obvious lack of libraries, etc.
[+] barneso|10 years ago
The Tensorflow code mentions "GCUDACC" in several places, and from the surrounding comments it seems to be targeted at OpenCL as well as CUDA. So it seems that this has been at least considered.
[+] fooblaster|10 years ago
I suspect that this compiler is generating PTX and not true native binaries for NVIDIA's architectures. NVIDIA's proprietary compiler stack is still heavily involved in the conversion of PTX IR to native binaries. Essentially, this isn't a fully open-source stack.
[+] magicalist|10 years ago
> I suspect that this compiler is generating ptx and not true native binaries for nvidia's architectures

It would take all of getting to page 2 of the article to confirm this instead of speculating...

OTOH, there is an intriguing footnote that

> We are also experimenting compiling [virtual ISA] PTX to [Nvidia's proprietary Shader ASSembler] SASS before program execution and embedding the SASS directly into the resultant binary

but the paper mentions in the conclusion that a SASS spec is not publicly available. It would be interesting for someone involved to comment more on that. Experiments on reverse engineering the compiled PTX results?

If implementing a replacement for nvcc gave these gains, I would imagine being able to control an offline version of the (normally JIT) compilation to SASS would also yield large benefits. It would likely be incredibly architecture dependent, but for the big machine learning projects that still might be worth the expense.

[+] wujingyue|10 years ago
You are right that gpucc still depends on NVIDIA's ptxas tool, which translates PTX to native binaries. NVIDIA does not publish the specification of their native binaries. Other than that, it is fully open-source.
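A sketch of the split being described, assuming a working clang plus the CUDA toolkit's ptxas on PATH (file names and the arch value are illustrative): clang handles everything down to PTX, and NVIDIA's assembler does the last, undocumented step:

```shell
# Open-source half: clang lowers the CUDA device code to PTX,
# the virtual ISA with a published spec.
clang++ --cuda-device-only -S axpy.cu -o axpy.ptx --cuda-gpu-arch=sm_35

# Proprietary half: NVIDIA's ptxas assembles PTX into native SASS
# (a cubin), whose encoding is not publicly specified.
ptxas -arch=sm_35 axpy.ptx -o axpy.cubin
```

So "fully open-source" holds for the compiler proper; only the final PTX-to-SASS assembly step remains a black box.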
[+] rsp1984|10 years ago
What are the target GPUs for this? Will it run only on NVIDIA cards? What about mobile GPUs?
[+] maaku|10 years ago
I presume it will run everywhere CUDA is supported. Draw your own conclusions.
[+] wujingyue|10 years ago
It currently generates NVIDIA's PTX only.
[+] varelse|10 years ago
Clang crashed on impact trying to compile some of my CUDA code - as in, on the very first .cu file. Not a good start IMO.