jfumero | 6 years ago
TornadoVM compiles Java bytecode to OpenCL as well, but additionally it optimizes and specializes the code by interleaving Graal compiler optimizations (e.g., partial escape analysis, canonicalization, loop unrolling, constant propagation) with GPU/CPU/FPGA-specific optimizations (e.g., parallel-loop exploration, automatic use of local memory, and parallel skeletons such as reductions). TornadoVM generates different OpenCL code depending on the target device: the code generated for GPUs differs from the code generated for FPGAs and for multi-core CPUs. This is because OpenCL code is portable across devices, but its performance is not. TornadoVM addresses this challenge by applying compiler specialization per device.
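As a rough illustration, consider a vector-add kernel of the kind TornadoVM consumes. It is ordinary Java; from this single method the compiler can emit different OpenCL per target (e.g., a thread-per-element mapping on GPUs versus a pipelined loop on FPGAs). This sketch runs sequentially so it stays self-contained without the TornadoVM dependency; in real TornadoVM code the loop would carry the @Parallel annotation:

```java
public class VectorAdd {
    // A kernel written as plain Java. Under TornadoVM, the loop below would be
    // annotated with @Parallel so the compiler can map each iteration to a
    // GPU thread (or pipeline it on an FPGA). Here it just runs sequentially.
    public static void add(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) { // @Parallel in TornadoVM
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        float[] c = new float[a.length];
        add(a, b, c);
        System.out.println(java.util.Arrays.toString(c)); // [5.0, 7.0, 9.0]
    }
}
```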
Additionally, TornadoVM performs live task migration between devices: it decides where to execute the code to increase performance (if possible), switching devices when it knows the new device offers better performance. As far as we know, this is not available in Aparapi, where device selection is static. With task migration, TornadoVM's approach is to switch device only if it detects that the application can run faster than the CPU execution using the code compiled by C2 or the Graal JIT; otherwise it stays on the CPU. So TornadoVM can be seen as a complement to C2 and Graal, because no single piece of hardware executes all workloads best. GPUs are very good at SIMD-style applications, and FPGAs are very good at pipelined applications. If your application follows those models, TornadoVM will likely select heterogeneous hardware; otherwise, it stays on the CPU using the default compilers (C2 or Graal).
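The selection heuristic can be pictured as follows. This is a conceptual sketch in plain Java, not TornadoVM's actual implementation: time the task on each available device (with the CPU baseline being the C2/Graal-compiled code) and only migrate when an accelerator measurably wins:

```java
import java.util.Map;
import java.util.function.Supplier;

public class DeviceSelector {
    // Conceptual sketch of dynamic device selection. Each supplier returns a
    // measured execution time in nanoseconds for one device. We default to
    // the CPU (the C2/Graal-compiled baseline) and switch only if some
    // accelerator is strictly faster.
    public static String pickFastest(Map<String, Supplier<Long>> timedRuns) {
        String best = "CPU";
        long bestNanos = timedRuns.get("CPU").get();
        for (Map.Entry<String, Supplier<Long>> e : timedRuns.entrySet()) {
            if (e.getKey().equals("CPU")) continue; // CPU is the baseline
            long t = e.getValue().get();
            if (t < bestNanos) {
                bestNanos = t;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

If no accelerator beats the baseline, the method returns "CPU", mirroring the behavior described above: TornadoVM stays on the CPU unless migrating actually pays off.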
Some references:
* Compiler specializations: https://dl.acm.org/doi/10.1145/3237009.3237016
* Parallel skeletons: https://dl.acm.org/doi/10.1145/3281287.3281292
* Live task-migration: https://dl.acm.org/doi/10.1145/3313808.3313819