helgie | 8 years ago

Using the 'cuda.jit' method as linked does require you to manually set the number of threads and blocks, though one could argue it is still easier than doing it in CUDA C.
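To make "manually setting threads and blocks" concrete, here is a rough sketch of the arithmetic involved. It is plain Python so it runs without a GPU; the helper name 'blocks_needed' and the 256-thread block size are just illustrative choices of mine, not anything numba defines. In numba the two numbers would then go in square brackets on the kernel call, roughly kernel[blocks, threads](args).

```python
def blocks_needed(n_elements, threads_per_block=256):
    """Smallest number of blocks whose threads cover n_elements (ceiling division)."""
    if n_elements < 0 or threads_per_block <= 0:
        raise ValueError("need n_elements >= 0 and threads_per_block > 0")
    return (n_elements + threads_per_block - 1) // threads_per_block


n = 1_000_000
threads = 256  # illustrative block size, not a numba default
blocks = blocks_needed(n, threads)

# The grid usually overshoots n, which is why a kernel has to check its
# global index against n before touching memory.
surplus_threads = blocks * threads - n
```

With 'vectorize' and 'guvectorize' this bookkeeping is handled for you, which is the main reason to reach for them first.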

However, numba's 'vectorize' and 'guvectorize' decorators can also run code on the GPU. The current documentation doesn't show good GPU examples, but here are examples from the documentation for the deprecated numbapro (the CUDA features of numbapro were later merged into numba): https://docs.continuum.io/numbapro/CUDAufunc

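  import math
  from numba import vectorize

  # target='gpu' is the numbapro spelling; current numba uses target='cuda'
  @vectorize(['float32(float32, float32, float32)',
              'float64(float64, float64, float64)'],
             target='gpu')
  def cu_discriminant(a, b, c):
    return math.sqrt(b ** 2 - 4 * a * c)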

The 'float32/64' type signatures are mainly there to pin down the output type (for example, so that 32-bit float inputs don't come back as 64-bit floats). If given no signature, numba compiles lazily: it builds a new kernel each time the function is called with a new combination of argument types. Check this for the GPU target, though: as far as I can tell the numba docs describe this lazy compilation only for the default CPU target, so on the GPU you may still need the explicit signatures. Without signatures the function would become (but in current numba 'gpu' should be replaced with 'cuda'):

  @vectorize(target='gpu')
  def cu_discriminant(a, b, c):
    return math.sqrt(b ** 2 - 4 * a * c)

'vectorize' is a little limited in that the function body only ever sees scalars; numba then broadcasts that scalar operation over the input arrays.
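A rough way to picture that restriction, in plain Python with no numba involved: the kernel body only sees one element of each input at a time, and something outside it does the looping. 'broadcast_apply' below is a hypothetical stand-in for that outer machinery, and it is much cruder than real NumPy broadcasting (it only repeats bare scalars against equal-length lists).

```python
import math
from itertools import repeat


def discriminant(a, b, c):
    # Scalar-only body: this is all a vectorize-style kernel ever sees.
    return math.sqrt(b ** 2 - 4 * a * c)


def broadcast_apply(func, *args):
    """Hypothetical helper: call func elementwise, repeating bare scalars."""
    lengths = {len(x) for x in args if isinstance(x, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError("sequence arguments must have the same length")
    if not lengths:
        return func(*args)  # all scalars: just a normal call
    streams = [x if isinstance(x, (list, tuple)) else repeat(x) for x in args]
    return [func(*row) for row in zip(*streams)]


# a is a scalar, b and c are sequences; a is reused for every element.
roots = broadcast_apply(discriminant, 1.0, [2.0, 3.0, 5.0], [1.0, 2.0, 4.0])
```

Because 'discriminant' never sees its neighbours, anything that needs a window of elements can't be written this way.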

'guvectorize' is more powerful and can operate on arrays directly, so something like a convolution or a moving average is possible, but it is slightly more complicated to use than 'vectorize'.
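Here is a sketch of what a moving-average kernel looks like in the style 'guvectorize' expects: whole arrays come in, and the result is written into an output argument instead of being returned. It is shown undecorated and on plain lists so it runs without numba or a GPU. Under numba I'd expect to decorate it with something like @guvectorize(['void(float64[:], int64, float64[:])'], '(n),()->(n)'), but treat that signature and layout string as an assumption to check against the docs for your version.

```python
def move_mean(a, window, out):
    """Trailing moving average, guvectorize-style: results are written into out."""
    running = 0.0
    for i in range(len(a)):
        running += a[i]
        if i >= window:
            running -= a[i - window]  # drop the element that just left the window
        count = min(i + 1, window)    # the window is still filling near the start
        out[i] = running / count


data = [1.0, 2.0, 3.0, 4.0, 5.0]
result = [0.0] * len(data)  # caller allocates the output, as guvectorize would
move_mean(data, 3, result)
```

The extra complication compared to 'vectorize' is mostly the layout string ('(n),()->(n)' would read as: a 1-D array and a scalar in, an array of the same length out) and remembering to fill the output argument.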

Update: fixed code formatting