(no title)
bradcray | 2 years ago
Re Intel support: That's definitely in our plans. However, there are also many other areas where we are actively working to add features, fix bugs, and improve performance. When prioritizing, we typically make decisions based on what our current and potential users might need in the language. Frankly, we have not seen a big push for Intel GPU support so far, so it is currently not near the top of our priorities. If you (or other readers) have cases where the lack of Intel support is a blocker for trying out Chapel and/or its GPU support, definitely let us know.
Re implicit serialization: To clarify: the serialization based on order-dependence is not implicit. Users should use a `for` loop if their loop is order-dependent and `foreach` (or `forall`) if their loop is order-independent. In other words, the Chapel compiler doesn't make decisions about order-dependence. In particular, for GPU execution a `for` loop will never turn into a GPU kernel.
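As a rough sketch of that distinction (my own illustrative Chapel, not code from this thread):

```chapel
config const n = 10;
var a: [1..n] int;

// Order-dependent: each iteration reads the element written by the
// previous one, so the programmer writes a serial `for`; this loop
// will never be turned into a GPU kernel.
for i in 2..n do
  a[i] = a[i-1] + 1;

// Order-independent: iterations don't interact, so the programmer
// writes `foreach` (or `forall`), making the loop a candidate for
// GPU kernel launch when targeting a GPU locale.
foreach i in 1..n do
  a[i] = i * 2;
```

The key point is that the choice of loop form is the programmer's assertion about order-(in)dependence; the compiler acts on that assertion rather than inferring it.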
There are, however, some cases where a `foreach` does not turn into a kernel. You may be referring to those cases, but they're not related to order-dependence. Some Chapel features cannot execute on a GPU, and if your `foreach` loop's body uses any of them, it will not be launched as a kernel even though `foreach` signals order-independence. A subset of those features makes an order-independent loop GPU-ineligible only because we haven't had a chance to address them properly yet. Another subset will remain blockers for longer, maybe forever. For example, your `foreach` loop could be calling an external host function.
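For instance, a GPU-ineligible `foreach` of the kind mentioned above might look like this (an illustrative sketch; `c_log` is a hypothetical extern name, not from the thread):

```chapel
config const n = 10;
var a: [1..n] real;

// A C function compiled for the host; there is no GPU version of it.
extern proc c_log(x: real): real;

// Even though `foreach` asserts order-independence, the call to an
// external host procedure makes the loop ineligible for GPU launch,
// so it executes on the CPU instead.
foreach i in 1..n do
  a[i] = c_log(i: real);
```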
bradcray | 2 years ago
Re-reading this Q+A this morning, I also wanted to clarify one thing, which is that when a 'foreach' or 'forall' does end up being executed on the CPU, that doesn't mean it has been serialized. 'foreach' loops on the CPU are candidates for vectorization while 'forall' loops typically result in multicore task-parallelism with each task also being a candidate for vectorization.
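In illustrative Chapel (my own sketch, assuming execution on a CPU-only locale):

```chapel
config const n = 10;
var a: [1..n] int;

// On a CPU, `forall` typically creates one task per core, each
// iterating over a chunk of the range, with each chunk also being
// a candidate for vectorization -- not serial execution.
forall i in 1..n do
  a[i] += 1;

// `foreach` on a CPU runs within the current task, but its
// order-independence makes it a candidate for vectorization.
foreach i in 1..n do
  a[i] *= 2;
```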