jrevels|7 years ago
Author here; the arXiv version can be found at https://arxiv.org/abs/1810.08297. Not much different from OP's linked version, but it includes citations to other interesting Julia AD/TPU-related papers that use this technique.
Happy to answer any questions, at least until I turn in for the night :)
rsp1984|7 years ago
Can you explain, from a very high level, what problem is being solved, or which part of the machine learning stack is improved?
I have basic knowledge of machine learning and can handle the maths, but I have never heard the term 'broadcasting' in this context, for instance.
I'm not trying to scrutinize your work, just being genuinely curious and trying to learn.
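(Editorial aside, not part of the thread: "broadcasting" here means applying a scalar function elementwise over arrays of different shapes by virtually repeating any size-1 axis until the shapes match. Julia spells this with dot syntax, e.g. `f.(x, y)`; NumPy applies essentially the same rule implicitly. A minimal sketch of the shape rule itself, in plain Python:)

```python
from itertools import zip_longest

def broadcast_shape(a, b):
    """Result shape of broadcasting shapes a and b (NumPy/Julia-style rule)."""
    out = []
    # Align shapes from the trailing axis; missing leading axes count as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or x == 1 or y == 1:
            out.append(max(x, y))  # the size-1 axis stretches to match
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(out))

print(broadcast_shape((3, 1), (1, 4)))  # (3, 4): a column meets a row
print(broadcast_shape((5, 4), (4,)))    # (5, 4): the vector repeats per row
```

Fusing a whole chain of such elementwise operations into one GPU kernel (a "broadcast kernel") is the setting the paper's AD technique targets.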
joe_the_user|7 years ago
AD is basically a code transformation method.
What's the most notable way the GPU in particular comes into play?
How does caching come into play? What about intrinsic condensing functions?
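(Editorial aside, not part of the thread: the simplest concrete instance of AD as a code transformation is forward mode via dual numbers — each primitive operation is overloaded to propagate a derivative alongside its value. This is only an illustration of the general idea; the paper itself concerns reverse/mixed-mode AD of GPU broadcast kernels.)

```python
from dataclasses import dataclass

@dataclass
class Dual:
    val: float  # primal value
    der: float  # derivative with respect to the seeded input

    def _coerce(self, other):
        return other if isinstance(other, Dual) else Dual(float(other), 0.0)

    def __add__(self, other):
        other = self._coerce(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._coerce(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f and df/dx at x by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).der

# f(x) = x^2 + 3x has f'(x) = 2x + 3, so f'(2) = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

Because the transformation operates on the program rather than on a symbolic expression, the same machinery extends to control flow and, as in the paper, to fused elementwise GPU kernels.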