jfumero | 6 years ago

Totally agree. OpenCL code is portable, but its performance is not. That's why TornadoVM specializes the generated OpenCL code for the target device. For FPGAs we apply many more optimizations than for GPUs, such as tuning thread scheduling, better loop unrolling and loop flattening, and use of local memory. All of these optimizations are performed automatically on the compiler IR (Graal IR) before the actual OpenCL C code is generated.

With those compiler specializations, we aim to close the performance gap between hand-tuned code and generated code.
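
To make two of the transformations named above concrete, here is a rough plain-Java sketch of what loop flattening and loop unrolling do to a simple reduction. This is only an illustration written by hand: it does not use the TornadoVM API, it is not the code TornadoVM generates, and the class name, method names, and the unroll factor of 4 are all made up for the example.

```java
// Hypothetical illustration only: hand-written versions of two loop
// transformations. Not TornadoVM code and not its generated output.
final class LoopTransformSketch {

    // Baseline: nested loops over a row-major matrix stored in a 1D array.
    static float sumNested(float[] m, int rows, int cols) {
        float acc = 0f;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                acc += m[i * cols + j];
            }
        }
        return acc;
    }

    // Loop flattening: the two nested loops collapse into a single loop
    // over rows * cols iterations, removing the inner-loop control overhead.
    static float sumFlattened(float[] m, int rows, int cols) {
        float acc = 0f;
        int n = rows * cols;
        for (int k = 0; k < n; k++) {
            acc += m[k];
        }
        return acc;
    }

    // Loop unrolling (assumed factor 4) applied to the flattened loop.
    // The additions keep the original order, so the result is unchanged.
    static float sumUnrolled4(float[] m, int rows, int cols) {
        float acc = 0f;
        int n = rows * cols;
        int k = 0;
        for (; k + 3 < n; k += 4) {
            acc += m[k];
            acc += m[k + 1];
            acc += m[k + 2];
            acc += m[k + 3];
        }
        // Remainder loop for the iterations that do not fill a group of 4.
        for (; k < n; k++) {
            acc += m[k];
        }
        return acc;
    }

    public static void main(String[] args) {
        int rows = 3, cols = 5;
        float[] m = new float[rows * cols];
        for (int i = 0; i < m.length; i++) {
            m[i] = i + 1;
        }
        System.out.println(sumNested(m, rows, cols));
        System.out.println(sumFlattened(m, rows, cols));
        System.out.println(sumUnrolled4(m, rows, cols));
    }
}
```

All three methods compute the same sum; only the loop structure differs. On an FPGA, the flattened and unrolled forms typically map to a simpler, wider pipeline, which is the kind of device-specific choice a compiler can make on the IR rather than asking the programmer to write it by hand.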
