top | item 40714384

enisberk | 1 year ago

This is really cool work! Congrats on both the paper and the graduation! A long time ago, I worked on optimizing broadcast operations on GPUs [1]. Coming up with a strategy that promises high throughput across different array dimensionalities is quite challenging. I am looking forward to reading your work.

[1]https://scholar.google.com/citations?view_op=view_citation&h...

zfnmxt | 1 year ago

> Congrats on both the paper and the graduation!

Thanks! Although I still have to actually graduate and the paper is in review, so maybe your congratulations are a bit premature! :)

> A long time ago, I worked on optimizing broadcast operations on GPUs [1].

Something similar happens in Futhark, actually. When something like `[1,2,3] + 4` is elaborated to `map (+) [1,2,3] (rep 4)`, the `rep` is eliminated by pushing the `4` into the `map`: `map (+4) [1,2,3]`. Futhark then compiles that to efficient CUDA/OpenCL/whatever.
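The rewrite described above can be sketched in Python (this is an illustration of the idea, not Futhark's actual compiler pass; `rep` here is a hypothetical helper standing in for replication):

```python
# Sketch of broadcast elaboration and rep-elimination.

def rep(n, x):
    # Replicate: n copies of the scalar x (broadcasting it to an array).
    return [x] * n

xs = [1, 2, 3]

# Elaborated form: zip the array with the replicated scalar and add pointwise,
# i.e. map (+) [1,2,3] (rep 4).
elaborated = [a + b for a, b in zip(xs, rep(len(xs), 4))]

# After rep-elimination: the scalar is pushed into the map,
# i.e. map (+4) [1,2,3], so the intermediate array is never materialized.
fused = [a + 4 for a in xs]

assert elaborated == fused == [5, 6, 7]
```

The point of the rewrite is that the replicated array never needs to exist in memory; the scalar is just applied directly inside the map.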