I wonder if this is as big a win as it sounds. Regardless of what language you're using, you have to "think GPU" to get any performance from GPUs. The additional overhead of using CUDA/OpenCL syntax seems pretty small in comparison.
Yes, and it requires quite a low-level understanding of the architecture to "think GPU". SIMD, warps, blocks, threads, different memory types, no branching/identical branching per core, ... Some of this could probably be abstracted away but you definitely need to be aware and adjust algorithms appropriately. You can't just convert code and hope for the best, unless you just want a slow co-processor.
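A toy Java sketch (illustrative only, not Rootbeer code) of the kind of algorithm adjustment the comment above means: replacing a data-dependent branch with branch-free arithmetic, so every thread in a warp executes identical instructions.

```java
public class BranchFree {
    // Branchy version: GPU threads in a warp diverge on this condition.
    static int thresholdBranchy(int x, int t) {
        return (x > t) ? x : 0;
    }

    // Branch-free version: a sign-bit mask gives the same result with no
    // divergent branch (assumes t - x doesn't overflow an int).
    static int thresholdBranchFree(int x, int t) {
        int mask = (t - x) >> 31;   // -1 (all ones) if x > t, else 0
        return x & mask;
    }

    public static void main(String[] args) {
        for (int x = -3; x <= 3; x++) {
            if (thresholdBranchy(x, 0) != thresholdBranchFree(x, 0)) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("branchy and branch-free versions agree");
    }
}
```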
That's my experience in a nutshell. The cost of doing a cudaMemcpy() far outweighs the advantages for computationally small tasks. The surprising bit for me was what's classed as a small task.
Decompressing a 5MP jpg then applying various filters is too lightweight a task to benefit. I thought that would be more or less a perfect GPU task, not so.
Running on OpenCL, a CPU with vector instructions handily outperforms a small GPU on this problem.
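A back-of-envelope sketch of why (the bandwidth and size figures below are assumptions, not measurements): a decompressed 5 MP RGB image is roughly 15 MB, and shipping it to the GPU and back over PCIe costs milliseconds before any filtering even starts.

```java
public class TransferEstimate {
    public static void main(String[] args) {
        double bytes = 5_000_000 * 3.0;   // 5 MP at 3 bytes/pixel, decompressed
        double pcieBytesPerSec = 6e9;     // assumed effective PCIe bandwidth
        double roundTripMs = 2.0 * bytes * 1000.0 / pcieBytesPerSec;
        // A simple CPU filter pass over ~15 MB can finish in comparable time,
        // so the copies alone can cancel out the GPU's compute advantage.
        System.out.println("round trip: ~" + roundTripMs + " ms");
    }
}
```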
Probably not: typical Java code is dominated by branchy, indirect code. Such code typically operates on mutable data, which is not what GPUs are designed to handle efficiently.
This seems like a pretty amazing project if the claims are true, though - I wasn't aware that CUDA was able to express so many of the concepts used to implement Java applications. The performance data in the slides is certainly compelling!
It's nothing compared to the pain of debugging in a language that doesn't have/encourage proper namespaces.
Ruby has Modules, but many, many common libraries do not use them. I had fun recently debugging a project that (through transitive dependencies) relied on two different "progress bar" libraries, both with a class called "Progress", and neither of which was namespaced. Namespaces solve a real problem.
Does it really matter? Namespaces/packages only serve to:
- make things unique
- group things logically (which makes the systems design more explicit)
This applies to all programming languages. It's just that there's a convention in the Java community to prefix namespaces with a FQDN, which adds to the length. But you're free to choose another convention if you fancy, although I wouldn't recommend it, since it's not a major issue, especially considering IDE support.
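The "Progress" collision described above is exactly what Java packages prevent. A small illustrative sketch (names are made up, with nested classes standing in for two separate libraries):

```java
public class NamespaceDemo {
    // Stand-ins for two unrelated libraries that both define a "Progress"
    // class. In real Java they would live in separate packages, e.g.
    // com.example.download.Progress vs com.example.render.Progress.
    static class DownloadLib { static class Progress { int percent = 50; } }
    static class RenderLib   { static class Progress { double fraction = 0.5; } }

    public static void main(String[] args) {
        // The qualified names keep the two types distinct; no collision possible.
        DownloadLib.Progress d = new DownloadLib.Progress();
        RenderLib.Progress r = new RenderLib.Progress();
        System.out.println(d.percent + " " + r.fraction);
    }
}
```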
The only reason you see the fully qualified class names is because the IDE adds them automatically for you. If there were no IDEs, everyone would just be using wildcard imports (import com.example.*;).
In other words, with that and the right frontend, you can take Language X, compile it to LLVM IR, and run it through the PTX backend to get code for CUDA GPUs.
However, in the grand scheme of things, this probably doesn't make GPU programming significantly easier for your average developer (as you still have to deal with big complicated parallel machines); what it really does is ease integration into various codebases in those different languages.
Looking through the code, this seems to do the exact same thing as Aparapi. I'm surprised this received funding, given the high-quality implementation AMD has already put together.
The headline is misleading; only a small subset of Java can be ported to the GPU. It works great for inner math loops and such, but not for higher level problems. Even if the author managed to find a way to translate more complicated problems (I see object locking in the list of features), they would be better suited to run on a CPU, or refactored to avoid locks.
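For contrast, an illustrative example (not taken from Rootbeer's test suite) of the kind of code that does port well: a flat inner math loop over primitive arrays, with no branches, locks, or object allocation.

```java
public class Saxpy {
    // Classic GPU-friendly inner loop: y[i] = a * x[i] + y[i].
    // Every iteration is independent, so each could map to one GPU thread.
    static void saxpy(float a, float[] x, float[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        saxpy(2f, x, y);
        System.out.println(java.util.Arrays.toString(y));
    }
}
```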
"Rootbeer was created using Test Driven Development and testing is essentially important in Rootbeer"
I'm not sure what the "essentially" means here, but this is the first "big" program I'm aware of that name-checks TDD, and a counter-example to my theory that programs where much of the programmer effort goes into algorithms and data structures are not suited to TDD.
Was the TDD approach "pure"? (Only one feature implemented at a time, with absolutely no design thought given to what language features might need to be implemented in the future.)
I'd think a project like this is ideally suited to TDD: you know what the results should be for most operations, and they are easily testable. It's the same reason the Perl 6 test suite has been so valuable to the various Perl 6 compiler projects. (Not that any of them claim to use TDD.)
The GPU in a desktop is the only interesting kind of GPU. The built-in GPU in servers is ten years behind the current cutting edge on desktops. Though servers can have PCI Express slots for modern GPU installation.
I think most of the GPU-based bitcoin farmers are using desktop hardware, but I might be wrong.
IMHO this is indeed the central question here, since parallelizing an algorithm is not an easy task.
So I guess you still end up writing your algorithm in OpenCL/CUDA and maybe use the serialization provided by this lib.
Update: (Just read the hpcc_rootbeer.pdf slides.) You write your _parallelized_ implementation of an algorithm in Java - and it will be executed on the GPU.
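A hypothetical sketch of what such a parallelized Java kernel might look like; the Kernel interface and all names below are stand-ins mimicking the style described in the slides, not the verified Rootbeer API.

```java
// Illustrative stand-in for a Rootbeer-style kernel interface
// (the real one lives in the rootbeer runtime packages).
interface Kernel {
    void gpuMethod();
}

class ScaleKernel implements Kernel {
    private final float[] data;
    private final int index;

    ScaleKernel(float[] data, int index) {
        this.data = data;
        this.index = index;
    }

    // One kernel instance per element; Rootbeer would serialize these
    // objects to the GPU and run gpuMethod() on parallel threads.
    public void gpuMethod() {
        data[index] *= 2.0f;
    }
}

public class KernelDemo {
    public static void main(String[] args) {
        float[] data = {1f, 2f, 3f};
        // On the CPU we just loop; on the GPU each instance becomes a thread.
        for (int i = 0; i < data.length; i++) {
            new ScaleKernel(data, i).gpuMethod();
        }
        System.out.println(java.util.Arrays.toString(data));
    }
}
```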
This was my first thought on reading this! As other people have said, you'd have to think in GPU terms, but a quick glance at the code suggests you could just look into the compile functions. I hope someone out there beats me to this because they'll do a better job :D
> The license is currently GPL, but I am planning on changing to a more permissive license. My overall goal is to get as many people using Rootbeer as possible.
It would be bad to compromise the freedoms of the users in order to be able to limit the freedoms of more of them.
Any reason why the GPLv3 would be considered unsuitable? How about the LGPLv3?
// In Java, every instance of Object is also a monitor.
// See https://en.wikipedia.org/wiki/Monitor_%28synchronization%29
Object myMonitor = new Object();
// To "enter" a monitor, you use 'synchronized'.
synchronized (myMonitor) {
    // Inside the monitor, we could do a Thread.sleep
    // (note: sleep throws the checked InterruptedException).
    try {
        Thread.sleep(100);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
This is a very simplistic problem case. However, it is very possible for this to become a bigger problem. Because I can call arbitrary code when "inside" a monitor, it is very possible to call a method that does a sleep incidentally. (e.g. many implementations of IO operations will require some sort of sleep.)
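A minimal sketch of that trap (class and method names are made up): nothing in get() looks like a sleep, yet the monitor is held for the entire duration of the blocking call inside it.

```java
public class IncidentalSleep {
    private static final Object lock = new Object();

    // A method that "incidentally" blocks, e.g. simulating a slow IO call.
    static String slowFetch(String key) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    // The caller holds the monitor for the whole fetch, so every other
    // thread wanting the lock also waits out the hidden sleep.
    static String get(String key) {
        synchronized (lock) {
            return slowFetch(key);
        }
    }

    public static void main(String[] args) {
        System.out.println(get("a"));
    }
}
```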
The host-side application could already be written in Java, at least for OpenCL applications[1] (the kernel--that is, the GPU code--was still written in OpenCL). My only concern is that Java will make it more difficult to find out exactly what's going on in the kernel code, and hence more difficult to optimise.
Now, this also doesn't solve the issue of needing to consider the parallel architecture when coding the kernel to actually make use of the hardware. Nevertheless, kudos to the guys behind this.
This is already possible, but it never was a problem. The GPU is just a slave to the CPU.
The biggest problem I could see would be when your computer becomes part of a botnet. Your computer could be used for brute-force encryption cracking. Again, this was also possible with just CUDA or OpenCL.
_3u10|13 years ago
This should be a performance nightmare.
kevingadd|13 years ago
import edu.syr.pcpratts.rootbeer.testcases.rootbeertest.serialization.MMult;
pjmlp|13 years ago
Don't blame the language for what some guys do with it.
tmurray|13 years ago
http://nvidianews.nvidia.com/Releases/NVIDIA-Contributes-CUD...
AnthonBerg|13 years ago
If Rootbeer or something similar allows me to program CUDA stuff in Clojure, then I am impressed and excited.
skardan|13 years ago
It would be interesting to see how functional languages designed for parallelism perform on GPUs.
[1] http://code.google.com/p/javacl/