I wonder if this is as big a win as it sounds. Regardless of what language you're using, you have to "think GPU" to get any performance from GPUs. The additional overhead of using CUDA/OpenCL syntax seems pretty small in comparison.
Yes, and it requires quite a low-level understanding of the architecture to "think GPU". SIMD, warps, blocks, threads, different memory types, no branching/identical branching per core, ... Some of this could probably be abstracted away but you definitely need to be aware and adjust algorithms appropriately. You can't just convert code and hope for the best, unless you just want a slow co-processor.
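A toy Java sketch (illustrative only, not Rootbeer code) of the kind of algorithm adjustment the comment above means: replacing a data-dependent branch with branch-free arithmetic, so every thread in a warp executes identical instructions.

```java
public class BranchFree {
    // Branchy version: GPU threads in a warp diverge on this condition.
    static int thresholdBranchy(int x, int t) {
        return (x > t) ? x : 0;
    }

    // Branch-free version: a sign-bit mask gives the same result with no
    // divergent branch (assumes t - x doesn't overflow an int).
    static int thresholdBranchFree(int x, int t) {
        int mask = (t - x) >> 31;   // -1 (all ones) if x > t, else 0
        return x & mask;
    }

    public static void main(String[] args) {
        for (int x = -3; x <= 3; x++) {
            if (thresholdBranchy(x, 0) != thresholdBranchFree(x, 0)) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("branchy and branch-free versions agree");
    }
}
```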
That's my experience in a nutshell. The cost of doing a cudaMemcpy() far outweighs the advantages for computationally small tasks. The surprising bit for me was what's classed as a small task.
Decompressing a 5MP jpg then applying various filters is too lightweight a task to benefit. I thought that would be more or less a perfect GPU task, not so.
Running on OpenCL, a CPU with vector instructions handily outperforms a small GPU on this problem.
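A back-of-envelope sketch of why (the bandwidth and size figures below are assumptions, not measurements): a decompressed 5 MP RGB image is roughly 15 MB, and shipping it to the GPU and back over PCIe costs milliseconds before any filtering even starts.

```java
public class TransferEstimate {
    public static void main(String[] args) {
        double bytes = 5_000_000 * 3.0;   // 5 MP at 3 bytes/pixel, decompressed
        double pcieBytesPerSec = 6e9;     // assumed effective PCIe bandwidth
        double roundTripMs = 2.0 * bytes * 1000.0 / pcieBytesPerSec;
        // A simple CPU filter pass over ~15 MB can finish in comparable time,
        // so the copies alone can cancel out the GPU's compute advantage.
        System.out.println("round trip: ~" + roundTripMs + " ms");
    }
}
```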
Probably not: typical Java code is dominated by branchy, indirect code. Such code typically operates on mutable data, which is not what GPUs are designed to handle efficiently.
This seems like a pretty amazing project if the claims are true, though - I wasn't aware that CUDA was able to express so many of the concepts used to implement Java applications. The performance data in the slides is certainly compelling!
It's nothing compared to the pain of debugging in a language that doesn't have/encourage proper namespaces.
Ruby has Modules, but many, many common libraries do not use them. I had fun recently debugging a project that (through transitive dependencies) relied on two different "progress bar" libraries, both with a class called "Progress", and neither of which was namespaced. Namespaces solve a real problem.
Does it really matter? Namespaces/packages only serve to:
- make things unique
- group things logically (which makes the systems design more explicit)
This applies to all programming languages. It's just that there's a convention in the Java community to prefix namespaces with a FQDN, which adds to the length. But you're free to choose another convention if you fancy, although I wouldn't recommend it, since it's not a major issue, especially considering IDE support.
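The "Progress" collision described above is exactly what Java packages prevent. A small illustrative sketch (names are made up, with nested classes standing in for two separate libraries):

```java
public class NamespaceDemo {
    // Stand-ins for two unrelated libraries that both define a "Progress"
    // class. In real Java they would live in separate packages, e.g.
    // com.example.download.Progress vs com.example.render.Progress.
    static class DownloadLib { static class Progress { int percent = 50; } }
    static class RenderLib   { static class Progress { double fraction = 0.5; } }

    public static void main(String[] args) {
        // The qualified names keep the two types distinct; no collision possible.
        DownloadLib.Progress d = new DownloadLib.Progress();
        RenderLib.Progress r = new RenderLib.Progress();
        System.out.println(d.percent + " " + r.fraction);
    }
}
```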
The only reason you see the fully qualified class names is because the IDE adds them automatically for you. If there were no IDEs, everyone would just be using wildcard imports (import com.example.*;).
In other words, with that and the right frontend, you can take Language X, compile it to LLVM IR, and run it through the PTX backend to get code for CUDA GPUs.
However, in the grand scheme of things, this probably doesn't make GPU programming significantly easier for your average developer (as you still have to deal with big complicated parallel machines); what it really does is ease integration into various codebases in those different languages.
Looking through the code, this seems to do the exact same thing as Aparapi. I'm surprised this received funding, given the high-quality implementation AMD has already put together.
The headline is misleading; only a small subset of Java can be ported to the GPU. It works great for inner math loops and such, but not for higher level problems. Even if the author managed to find a way to translate more complicated problems (I see object locking in the list of features), they would be better suited to run on a CPU, or refactored to avoid locks.
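For contrast, an illustrative example (not taken from Rootbeer's test suite) of the kind of code that does port well: a flat inner math loop over primitive arrays, with no branches, locks, or object allocation.

```java
public class Saxpy {
    // Classic GPU-friendly inner loop: y[i] = a * x[i] + y[i].
    // Every iteration is independent, so each could map to one GPU thread.
    static void saxpy(float a, float[] x, float[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        saxpy(2f, x, y);
        System.out.println(java.util.Arrays.toString(y));
    }
}
```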
"Rootbeer was created using Test Driven Development and testing is essentially important in Rootbeer"
I'm not sure what the "essentially" means here, but this is the first "big" program I'm aware of that name-checks TDD, and a counter-example to my theory that programs where much of the programmer effort goes into algorithms and data structures are not suited to TDD.
Was the TDD approach "pure"? (Only one feature implemented at a time, with absolutely no design thought given to what language features might need to be implemented in the future.)
I'd think a project like this is ideally suited to TDD: you know what the results should be for most operations, and they are easily testable. It's the same reason the Perl 6 test suite has been so valuable to the various Perl 6 compiler projects. (Not that any of them claim to use TDD.)
The GPU in a desktop is the only interesting kind of GPU. The built-in GPU in servers is ten years behind the current cutting edge on desktops. Though servers can have PCI Express slots for modern GPU installation.
I think most of the GPU-based bitcoin farmers are using desktop hardware, but I might be wrong.
IMHO this is indeed the central question here, since parallelizing an algorithm is not an easy task.
So I guess you still end up writing your algorithm in OpenCL/CUDA and maybe use the serialization provided by this lib.
Update: (Just read the hpcc_rootbeer.pdf slides.) You write your _parallelized_ implementation of an algorithm in Java - and it will be executed on the GPU.
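A hypothetical sketch of what such a parallelized Java kernel might look like; the Kernel interface and all names below are stand-ins mimicking the style described in the slides, not the verified Rootbeer API.

```java
// Illustrative stand-in for a Rootbeer-style kernel interface
// (the real one lives in the rootbeer runtime packages).
interface Kernel {
    void gpuMethod();
}

class ScaleKernel implements Kernel {
    private final float[] data;
    private final int index;

    ScaleKernel(float[] data, int index) {
        this.data = data;
        this.index = index;
    }

    // One kernel instance per element; Rootbeer would serialize these
    // objects to the GPU and run gpuMethod() on parallel threads.
    public void gpuMethod() {
        data[index] *= 2.0f;
    }
}

public class KernelDemo {
    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f};
        // On the CPU we just loop; on the GPU each instance becomes a thread.
        for (int i = 0; i < data.length; i++) {
            new ScaleKernel(data, i).gpuMethod();
        }
        System.out.println(java.util.Arrays.toString(data));
    }
}
```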
This was my first thought on reading this! As other people have said, you'd have to think in GPU terms, but a quick glance at the code suggests you could just look into the compile functions. I hope someone out there beats me to this because they'll do a better job :D
> The license is currently GPL, but I am planning on changing to a more permissive license. My overall goal is to get as many people using Rootbeer as possible.
It would be bad to compromise the freedoms of the users in order to be able to limit the freedoms of more of them.
Any reason why the GPLv3 would be considered unsuitable? How about the LGPLv3?
// In Java, every instance of Object is also a monitor.
// See https://en.wikipedia.org/wiki/Monitor_%28synchronization%29
Object myMonitor = new Object();
// To "enter" a monitor, you use 'synchronized'.
synchronized (myMonitor) {
    // Inside the monitor, we could do a Thread.sleep
    // (note: sleep throws the checked InterruptedException).
    try {
        Thread.sleep(100);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
This is a very simplistic problem case. However, it is very possible for this to become a bigger problem. Because I can call arbitrary code when "inside" a monitor, it is very possible to call a method that does a sleep incidentally. (e.g. many implementations of IO operations will require some sort of sleep.)
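A minimal sketch of that trap (class and method names are made up): nothing in get() looks like a sleep, yet the monitor is held for the entire duration of the blocking call inside it.

```java
public class IncidentalSleep {
    private static final Object lock = new Object();

    // A method that "incidentally" blocks, e.g. simulating a slow IO call.
    static String slowFetch(String key) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    // The caller holds the monitor for the whole fetch, so every other
    // thread wanting the lock also waits out the hidden sleep.
    static String get(String key) {
        synchronized (lock) {
            return slowFetch(key);
        }
    }

    public static void main(String[] args) {
        System.out.println(get("a"));
    }
}
```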
The host-side application could already be written in Java, at least for OpenCL applications[1] (the kernel--that is, the GPU code--was still written in OpenCL). My only concern is that Java will make it more difficult to find out exactly what's going on in the kernel code, and hence more difficult to optimise.
Now, this also doesn't solve the issue of needing to consider the parallel architecture when coding the kernel to actually make use of the hardware. Nevertheless, kudos to the guys behind this.
This is already possible, but it never was a problem. The GPU is just a slave to the CPU.
The biggest problem I could see would be when your computer becomes part of a botnet. Your computer could be used for brute-force encryption cracking. Again, this was also possible with just CUDA or OpenCL.
_3u10|13 years ago
This should be a performance nightmare.
kevingadd|13 years ago
import edu.syr.pcpratts.rootbeer.testcases.rootbeertest.serialization.MMult;
pjmlp|13 years ago
Don't blame the language for what some guys do with it.
tmurray|13 years ago
http://nvidianews.nvidia.com/Releases/NVIDIA-Contributes-CUD...
AnthonBerg|13 years ago
If Rootbeer or something similar allows me to program CUDA stuff in Clojure, then I am impressed and excited.
skardan|13 years ago
It would be interesting to see how functional languages designed for parallelism perform on GPUs.
[1] http://code.google.com/p/javacl/