item 10103846

EasyOpenCL – The easiest way to get started with GPU programming

60 points | Gladdyu | 10 years ago | github.com

30 comments

[+] scott_s | 10 years ago
There are actually quite a few of these kinds of libraries floating around, although I'm not sure how many are still actively supported.

Thrust: http://thrust.github.io/

VexCL: http://ddemidov.github.io/vexcl/

Boost.Compute: http://boostorg.github.io/compute/

The author of VexCL provided a comparison of them two years ago: http://stackoverflow.com/questions/20154179/differences-betw...

[+] oneofthose | 10 years ago
There is another library authored by me and some colleagues. It is called Aura: https://github.com/sschaetz/aura

I blogged about these kinds of libraries here (overview): http://www.soa-world.de/echelon/2014/04/c-accelerator-librar...

A new addition is welcome as we still have not found the perfect API for accelerator programming. EasyOpenCL seems very simple and easy to use but I feel like it is very restricted.

For getting started with OpenCL development these days I would recommend PyOpenCL. Since everything is in Python, data can be generated easily and results can be plotted using well-known Python tools, which simplifies debugging. Kernels developed in PyOpenCL can be copied directly to other APIs (the raw OpenCL C API or one of the other C/C++ wrappers) and reused in production code.
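A minimal sketch of that workflow (this assumes pyopencl and numpy are installed and an OpenCL driver is present; it falls back to a plain numpy computation when pyopencl is missing so the shape of the code is still visible):

```python
import numpy as np

# The kernel source is plain OpenCL C, so it could be reused as-is from the C/C++ APIs.
KERNEL = """
__kernel void double_it(__global const float *a, __global float *out) {
    int i = get_global_id(0);
    out[i] = 2.0f * a[i];
}
"""

def double_on_gpu(a):
    try:
        import pyopencl as cl
    except ImportError:
        return 2.0 * a  # reference result when no OpenCL stack is available
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
    prg = cl.Program(ctx, KERNEL).build()
    prg.double_it(queue, a.shape, None, a_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    return out

print(double_on_gpu(np.arange(4, dtype=np.float32)))
```

Because the host side is Python, the same arrays feed straight into matplotlib for inspection, which is the debugging convenience described above.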

[+] paulmd | 10 years ago
Thrust is for CUDA. But there's also the Bolt framework for OpenCL.

EasyOpenCL sounds quite similar to these STL-style libraries.

[+] bratsche | 10 years ago
What's the license on this? There doesn't seem to be anything about that in the GitHub repo.
[+] exDM69 | 10 years ago
This seems to be a library that makes it really easy to invoke a single GPU kernel on some input buffers copied from the CPU (a std::vector). Unfortunately, most practical GPGPU tasks aren't like that.

The latency of getting data from the CPU to the GPU and back is bad enough that for a small quantity of data (low megabytes), it's better just to compute it on the CPU. More practical tasks usually involve several kernel invocations, and keeping the data at the GPU is essential for any kind of decent performance.
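The break-even arithmetic can be sketched with a toy cost model; the bandwidth, launch-overhead, and CPU-throughput figures below are assumptions for illustration, not measurements:

```python
# Assumed figures for illustration only:
PCIE_BW = 12e9           # bytes/s, effective PCIe 3.0 x16 bandwidth
LAUNCH_OVERHEAD = 20e-6  # s, driver/launch latency per transfer
CPU_THROUGHPUT = 4e9     # elements/s, a simple per-element op on one core

def gpu_round_trip_seconds(nbytes):
    # Copy in plus copy out, ignoring the (fast) kernel itself.
    return 2 * (nbytes / PCIE_BW + LAUNCH_OVERHEAD)

def cpu_seconds(nbytes, bytes_per_elem=4):
    return (nbytes / bytes_per_elem) / CPU_THROUGHPUT

for mb in (1, 4, 64):
    n = mb * 2**20
    print(mb, "MB: GPU round trip %.0f us, CPU compute %.0f us"
          % (gpu_round_trip_seconds(n) * 1e6, cpu_seconds(n) * 1e6))
```

Under these assumptions the transfer alone costs more than just doing a trivial per-element operation on the CPU, which is the point: a single cheap kernel over a few megabytes rarely pays for the round trip.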

But there are cases where executing a single kernel over some buffers would be useful (especially in early development or prototyping). In those cases, I'd like to write ZERO host-side code and use a CLI or GUI tool to run the code. So what I'd like to see is something like:

    $ cl-cli --kernel=frobnicate.cl --input0=foos.bin --input1=bars.bin --output0=bazs.bin
Does such a tool exist already?

It would be even better if this would allow building proper pipelines of multi-kernel programs by defining the inputs and outputs to kernels using a directed acyclic graph.
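A toy version of that DAG idea, with kernels stubbed out as plain Python callables (the Pipeline class and its method names are invented for illustration, not part of any existing tool):

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

class Pipeline:
    """A multi-kernel program as a DAG: intermediate buffers stay 'on device' between stages."""
    def __init__(self):
        self.kernels = {}  # node name -> callable standing in for a GPU kernel
        self.inputs = {}   # node name -> names of upstream buffers it reads

    def add(self, name, fn, inputs=()):
        self.kernels[name] = fn
        self.inputs[name] = list(inputs)

    def run(self, sources):
        # sources maps external buffer names (e.g. input files) to their contents
        buffers = dict(sources)
        order = TopologicalSorter({k: set(v) for k, v in self.inputs.items()}).static_order()
        for name in order:
            if name in buffers:
                continue  # an external source, nothing to execute
            args = [buffers[dep] for dep in self.inputs[name]]
            buffers[name] = self.kernels[name](*args)
        return buffers

p = Pipeline()
p.add("square", lambda xs: [x * x for x in xs], inputs=["foos"])
p.add("total", lambda xs: sum(xs), inputs=["square"])
print(p.run({"foos": [1, 2, 3]})["total"])  # 14
```

The topological sort guarantees every kernel runs after its producers, which is exactly the scheduling a real tool would derive from the declared inputs and outputs.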

I do not intend to dishearten you, OP, but think about this when you consider future directions for your project.

[+] matthiasv | 10 years ago
We built something like this for image processing tasks: https://github.com/ufo-kit/ufo-core

There is also a CLI that in principle allows what you want to do, e.g.

     $ ufo-launch read path=foos.bin ! opencl filename=frobnicate.cl kernel=frobnicate ! fft ! blur ! write filename=bars.tif
[+] Gladdyu | 10 years ago
The framework allows for partial data updates - for instance, for a 3D renderer it suffices to push the new position to the GPU while the vertex data remains in GPU memory. If you invoke the kernel function again it will neither recompile the kernel nor re-upload the vertex data.
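The caching behaviour described here can be sketched like this (a hypothetical stand-in, not EasyOpenCL's actual implementation; "compiling" and "uploading" are just counters):

```python
class CachedKernel:
    def __init__(self, source):
        self.source = source
        self._binary = None
        self._device_args = {}  # arg name -> object uploaded last time
        self.compiles = 0
        self.uploads = 0

    def launch(self, **args):
        if self._binary is None:  # compile only once, on the first launch
            self._binary = "compiled:" + self.source
            self.compiles += 1
        for name, value in args.items():
            # Identity check: re-upload only arguments bound to a new object.
            if self._device_args.get(name) is not value:
                self._device_args[name] = value
                self.uploads += 1

vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
k = CachedKernel("__kernel void draw(...) {}")
k.launch(vertices=vertices, position=(0.0, 0.0, 1.0))
k.launch(vertices=vertices, position=(0.0, 0.0, 2.0))  # only position re-uploaded
print(k.compiles, k.uploads)  # 1 3
```

The second launch skips both the compile and the vertex upload; only the changed position argument moves across the bus, which is the 3D-renderer scenario described above.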

The DAG idea sounds fun to build and very useful - I have some spare time anyway, so I'll see what I can whip up. As for the command-line interface: it too sounds pretty useful, and it should only require a bit of parsing since all the OpenCL-related code has been written already, but ufo-launch already performs pretty much the same function, so it's not very high on my todo list.

[+] oneofthose | 10 years ago
Interesting idea. It should be only a few lines in PyOpenCL to build something like this.

But if you're already in PyOpenCL, I guess you would also prefer to generate the bin files there (maybe using numpy) and evaluate the output (matplotlib, possibly). For optimization you could run the kernel in a loop, time the runtime, and vary the number of global and local work groups.
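That tuning loop might look like this in outline (the kernel is faked with a plain function here; with PyOpenCL you would enqueue the real kernel and wait for its event before reading the clock):

```python
import time

def best_work_group_size(run_kernel, candidates, repeats=10):
    """Time run_kernel(local_size) over candidate sizes; return the fastest and all timings."""
    timings = {}
    for local_size in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            run_kernel(local_size)
        timings[local_size] = (time.perf_counter() - start) / repeats
    return min(timings, key=timings.get), timings

# Fake "kernel" whose cost depends on the chosen size, for demonstration only.
best, timings = best_work_group_size(lambda n: sum(range(100 * n)), [16, 64, 256])
print(best, sorted(timings))
```

On real hardware the fastest local size depends on the device, so sweeping a handful of candidates like this is a reasonable first pass before anything fancier.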

[+] gjulianm | 10 years ago
Seems nice! I would use this to avoid all the OpenCL boilerplate code. However, there's one inconvenience: why restrict the vectors to all be the same size? I see that it is used to set the workgroup size. I think that allowing arrays of any size and letting the client set the workgroup size wouldn't add much complexity to the code or to the API.
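For reference, the usual way libraries lift that restriction (a generic pattern, not something EasyOpenCL does today) is to round the global size up to a multiple of the client's local size and have the kernel guard against the padding with an `if (gid >= n) return;`:

```python
def rounded_global_size(n, local_size):
    # Smallest multiple of local_size that covers n work items; the kernel
    # must bounds-check, because up to local_size - 1 launched items are padding.
    return ((n + local_size - 1) // local_size) * local_size

print(rounded_global_size(1000, 64))  # 1024
```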

Apart from that, really nice work; the code is well written and commented. It's a joy to read things like that.

[+] Gladdyu | 10 years ago
I still have some plans to auto-derive the optimal work/global/local group sizes; however, that still takes some work.

Therefore, I just implemented the most basic straightforward alternative (which is indeed rather restrictive at the moment) as a temporary solution.

[+] pen2l | 10 years ago
CUDA is probably the way to go: if you have to use a GPU anyway, you might as well get one of the new NVIDIA GPUs.
[+] Gladdyu | 10 years ago
CUDA has a nice toolchain, but the point of this is to remove all of the low-level stuff. For performance (even on NVIDIA cards) it doesn't really matter whether you use CUDA or OpenCL [1], and as a bonus OpenCL runs everywhere.

[1] http://pds.ewi.tudelft.nl/pubs/papers/icpp2011a.pdf

[+] TsiCClawOfLight | 10 years ago
No thanks, I'd rather use something that works on more than NVIDIA hardware. Better or not, that's too proprietary for me.