Hacker News | item 41601730

CuPy: NumPy and SciPy for GPU

377 points | tanelpoder | 1 year ago | github.com

106 comments

[+] gjstein|1 year ago|reply
The idea that this is a drop in replacement for numpy (e.g., `import cupy as np`) is quite nice, though I've gotten similar benefit out of using `pytorch` for this purpose. It's a very popular and well-supported library with a syntax that's similar to numpy.

However, the AMD-GPU compatibility for CuPy is quite an attractive feature.
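The drop-in pattern can be as small as an aliased import plus a host transfer at the end. A minimal sketch (not from the thread): it uses CuPy when a CUDA device is actually present and silently falls back to plain NumPy otherwise, so the array code itself is identical on both backends.

```python
import numpy as np

try:
    import cupy as xp  # GPU-backed, NumPy-compatible API
    xp.cuda.runtime.getDeviceCount()  # raises if no CUDA device is present
except Exception:
    xp = np  # no CuPy/CUDA available: the same code runs on the CPU

# Identical code for both backends:
a = xp.linspace(0.0, 1.0, 5)
b = xp.sqrt(a) + 1.0

# Move results back to host memory only when downstream code needs NumPy.
b_host = b if xp is np else xp.asnumpy(b)
```

The transfer at the end is where the performance traps mentioned below tend to hide: each host/device copy can cost more than the computation it brackets.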

[+] ogrisel|1 year ago|reply
Note that NumPy, CuPy and PyTorch are all involved in the definition of a shared subset of their API:

https://data-apis.org/array-api/

So it's possible to write array API code that consumes arrays from any of those libraries and delegate computation to them without having to explicitly import any of them in your source code.

The only limitation for now is that PyTorch (and to some lower extent cupy as well) array API compliance is still incomplete and in practice one needs to go through this compatibility layer (hopefully temporarily):

https://data-apis.org/array-api-compat/
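The dispatch idea can be illustrated with a toy stand-in for what `array_namespace()` in array-api-compat does: resolve the namespace from the array object itself, so the function never hard-codes an import of NumPy, CuPy, or PyTorch. This is a sketch of the pattern, not the real compatibility layer.

```python
import importlib
import numpy as np

def array_namespace(x):
    # Toy version of array_api_compat.array_namespace(): look up the
    # top-level module ("numpy", "cupy", "torch", ...) from the array type.
    return importlib.import_module(type(x).__module__.split(".")[0])

def softmax(x):
    xp = array_namespace(x)  # no hard-coded backend import here
    e = xp.exp(x - xp.max(x))
    return e / xp.sum(e)

probs = softmax(np.array([0.0, 1.0, 2.0]))  # the same call works on a CuPy array
```

The real `array_namespace()` additionally validates that the object actually implements the standard, which this toy lookup does not.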

[+] KeplerBoy|1 year ago|reply
One could also `import jax.numpy as jnp`. All those libraries have more or less complete implementations of NumPy (and SciPy) functionality; I believe CuPy has the most functions, especially when it comes to SciPy.

Also: you can just mix and match all those functions and tensors thanks to the `__cuda_array_interface__`.

[+] Narhem|1 year ago|reply
As nice as it is to have a drop-in replacement, most of the cost of GPU computing is moving memory around. I wouldn't be surprised if this catches unsuspecting programmers in a few performance traps.
[+] BiteCode_dev|1 year ago|reply
Wondering why AMD isn't currently investing heavily in creating tons of adapters like this to help the transition from CUDA.
[+] paperplatter|1 year ago|reply
Hm. Tempted to try PyTorch on my Mac for this. I have an Apple Silicon chip rather than an Nvidia GPU.
[+] WCSTombs|1 year ago|reply
> However, the AMD-GPU compatibility for CuPy is quite an attractive feature.

Last I checked (a couple months ago) it wasn't quite there, but I totally agree in principle. I've not gotten it to work on my Radeons yet.

[+] sspiff|1 year ago|reply
It only supports AMD cards supported by ROCm, which is quite a limited set.

I know you can enable ROCm for other hardware as well, but it's not supported and quite hit or miss. I've had limited success with running stuff against ROCm on unsupported cards, mainly having issues with memory management IIRC.

[+] hedgehog|1 year ago|reply
It's kind of unfortunate that EagerPy didn't get more traction to make that kind of switching even easier.
[+] amarcheschi|1 year ago|reply
I'm supposed to finish my undergraduate degree with an internship at the Italian national research center, where I'll have to use PyTorch to turn ML models from papers into code. I've tried looking at the tutorials, but I feel like there's a lot to grasp. Until now I've only used NumPy (and pandas in combination with NumPy). I'm quite excited, but a bit on edge because I can't know whether I'll be up to the task or not.
[+] curvilinear_m|1 year ago|reply
I'm surprised to see PyTorch and JAX mentioned as alternatives, but not Numba: https://github.com/numba/numba

I've recently had to implement a few kernels to lower the memory footprint and runtime of some PyTorch functions: it's been really nice, because Numba kernels have type-hint support (as opposed to raw CuPy kernels).

[+] killingtime74|1 year ago|reply
Numba doesn't support GPU though
[+] meisel|1 year ago|reply
When building something that I want to run on both CPU and GPU, depending, I’ve found it much easier to use PyTorch than some combination of NumPy and CuPy. I don’t have to fiddle around with some global replacing of numpy.* with cupy.*, and PyTorch has very nearly all the functions that those libraries have.
[+] setopt|1 year ago|reply
Interesting. Any links to examples or docs on how to use PyTorch as a general linear algebra library for this purpose? Like a “SciPy to PyTorch” transition guide if I want to do the same?
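I'm not aware of an official "SciPy to PyTorch" transition guide, but `torch.linalg` deliberately mirrors `numpy.linalg`/`scipy.linalg` names quite closely. A hedged sketch of the pattern (assumes PyTorch is installed; CUDA is used only if a device is present):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float64  # match NumPy/SciPy's default double precision

# Build a symmetric positive-definite system, as you might with SciPy.
n = 50
A = torch.randn(n, n, device=device, dtype=dtype)
A = A @ A.T + n * torch.eye(n, device=device, dtype=dtype)
b = torch.randn(n, device=device, dtype=dtype)

# Analogue of scipy.linalg.cho_factor / cho_solve:
L = torch.linalg.cholesky(A)
x = torch.cholesky_solve(b.unsqueeze(1), L).squeeze(1)

residual = torch.linalg.norm(A @ x - b).item()
```

The main differences from SciPy in practice are the explicit `device`/`dtype` arguments and the fact that solvers live under `torch.linalg` rather than being split across `numpy.linalg` and `scipy.linalg`.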
[+] johndough|1 year ago|reply
CuPy is probably the easiest way to interface with custom CUDA kernels: https://docs.cupy.dev/en/stable/user_guide/kernel.html#raw-k...

And I recently learned that CuPy has a JIT compiler now if you prefer Python syntax over C++. https://docs.cupy.dev/en/stable/user_guide/kernel.html#jit-k...
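For reference, a `RawKernel` is only a few lines. A sketch (assumptions: it runs the CUDA kernel only when CuPy and a GPU are actually available, and otherwise falls back to the equivalent NumPy expression, so it is illustrative rather than a GPU benchmark):

```python
import numpy as np

kernel_src = r'''
extern "C" __global__
void add_one(const float* x, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) out[i] = x[i] + 1.0f;
}
'''

n = 1024
try:
    import cupy as cp
    add_one = cp.RawKernel(kernel_src, 'add_one')
    x = cp.arange(n, dtype=cp.float32)
    out = cp.empty_like(x)
    # Launch: grid of n/256 blocks, 256 threads each.
    add_one(((n + 255) // 256,), (256,), (x, out, np.int32(n)))
    out = cp.asnumpy(out)
except Exception:
    # No CuPy/CUDA in this environment: same computation on the CPU.
    out = np.arange(n, dtype=np.float32) + 1.0
```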

[+] einpoklum|1 year ago|reply
> probably the easiest way to interface with custom CUDA kernels

In Python? Perhaps. Generally? No, it isn't. Try: https://github.com/eyalroz/cuda-api-wrappers/

Full power of the CUDA APIs including all runtime compilation options etc.

(Yes, I wrote that...)

[+] towerwoerjrrwer|1 year ago|reply
CuPy was first, but at this point you're better off using JAX.

JAX has a much larger community, a big push from Google Research, and, unlike PFN's Chainer (of which CuPy is the computational base), it is not semi-abandoned.

Kind of sad to see the CuPy/Chainer ecosystem die: not only did they pioneer the PyTorch programming model, they also stuck to the NumPy API like JAX does (though the AD is layered on top in Chainer, IIRC).

[+] WanderPanda|1 year ago|reply
JAX still has the

"This is a research project, not an official Google product. Expect bugs and sharp edges. Please help by trying it out, reporting bugs, and letting us know what you think!"

disclaimer in its README. This is quite scary, especially coming from Google, which is known to abandon projects out of the blue.

[+] low_tech_love|1 year ago|reply
I tried Jax last year and was not impressed. Maybe it was just me, but everything I tried to do (especially with the autograd stuff) involved huge compilation times and I simply could not get the promised performance. Maybe I’ll try again and read some new documentation, since everyone is excited about it.
[+] kmaehashi|1 year ago|reply
CuPy isn't semi-abandoned either, obviously :)
[+] p1esk|1 year ago|reply
I agree. Though it's good to have options for GPU accelerated Numpy. Especially if Google decides to discontinue Jax at some point.
[+] __mharrison__|1 year ago|reply
I taught my NumPy class to a client who wanted to use GPUs. Installation (at that time) was a chore, but afterwards using this library was really smooth. Big gains with minimal to no code changes.
[+] SubiculumCode|1 year ago|reply
An aside, since I was trying to install CuPy the other day and ran into issues:

Open projects on GitHub often (at least superficially) require specific versions of the CUDA Toolkit (and all the specialty NVIDIA packages, e.g. cuDNN), TensorFlow, etc., and changing the default versions of these for each little project, or each step in a processing chain, is ridiculous.

pyenv et al. have made local, project-specific versions of Python packages much easier to manage, but I haven't seen a similar solution for the CUDA Toolkit and its associated packages, and the solutions I've encountered seem terribly hacky. I'm sure this is a common issue, though, so what do people do?

[+] kmaehashi|1 year ago|reply
As a maintainer of CuPy and also a user of several GPU-powered Python libraries, I empathize with the frustrations and difficulties here. Indeed, one thing CuPy values is making the installation step as easy and universal as possible. We strive to keep the binary package footprint small (currently less than 100 MiB), keep dependencies to a minimum, support a wide variety of platforms including Windows and aarch64, and avoid requiring a specific CUDA Toolkit version.

If anyone reading this message has encountered a roadblock while installing CuPy, please reach out. I'd be glad to help you.

[+] ttyprintk|1 year ago|reply
You can jam in the argument `cudatoolkit=1.2.3` when creating conda environments.

NB I’m using Miniforge.
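Concretely, pinning the CUDA runtime per environment looks something like this (version numbers are illustrative; on conda-forge the pin is now expressed through the `cuda-version` metapackage, where older channels used `cudatoolkit`):

```shell
# One environment per project, each with its own pinned CUDA runtime.
conda create -n proj-a -c conda-forge python=3.11 cupy cuda-version=12.4
conda create -n proj-b -c conda-forge python=3.10 cupy cuda-version=11.8
conda activate proj-a
```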

[+] mardifoufs|1 year ago|reply
One way to do it is to explicitly add the link to, say, the PyTorch+CUDA wheel from the PyTorch repos in your requirements.txt instead of using the normal PyPI package. Which also sucks, because you then have to do some other tweaks to make your requirements.txt portable across different platforms...

(And you can't just add another index for pip to look at if you want to use python-build, so it has to be explicitly linked to the right wheel, which absolutely sucks, especially since you cannot get the CUDA version from PyPI.)

[+] welder|1 year ago|reply
Yes, you need to install the right version, or CuPy hangs forever when installing via pip:

    pip install cupy-cuda12x
[+] coeneedell|1 year ago|reply
Ugh… docker containers. I also wish there was a simpler way but I don’t think there is.
[+] m_d_|1 year ago|reply
conda provides cudatoolkit and associated packages. Does this solve the situation?
[+] whimsicalism|1 year ago|reply
in real life everyone just uses containers, might not be the answer you want to hear though
[+] sdenton4|1 year ago|reply
Why not Jax?
[+] johndough|1 year ago|reply
> Why not Jax?

- JAX Windows support is lacking

- CuPy is much closer to CUDA than JAX, so you can get better performance

- CuPy is generally more mature than JAX (fewer bugs)

- CuPy is more flexible thanks to cp.RawKernel

- (For those familiar with NumPy) CuPy is closer to NumPy than jax.numpy

But CuPy does not support automatic gradient computation, so if you do deep learning, use JAX instead. Or PyTorch, if you do not trust Google to maintain a project for a prolonged period of time https://killedbygoogle.com/

[+] bee_rider|1 year ago|reply
Real answer: CuPy has a name that is very similar to SciPy. I don't know GPUs; that's why I'm using this sort of library, haha. The branding for CuPy makes it obvious. Is JAX the same thing, but implemented better somehow?
[+] palmy|1 year ago|reply
CuPy came out long before JAX; I remember using it in a project for my BSc around 2015-2016.

Cool to see that it's still kicking!

[+] setopt|1 year ago|reply
I’ve been using CuPy a bit and found it to be excellent.

It's very easy to replace some slow NumPy/SciPy calls with the appropriate CuPy calls, sometimes with literally a 1000x performance boost for about 10 minutes of work. It's also easy to write "hybrid code" that switches between NumPy and CuPy depending on what's available.
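Hybrid code usually hinges on `cupy.get_array_module`, which returns the right namespace for whatever arrays it is handed. A sketch, with a NumPy-only fallback for machines without CuPy:

```python
import numpy as np

try:
    import cupy
    get_array_module = cupy.get_array_module  # returns numpy or cupy per argument
except ImportError:
    def get_array_module(*arrays):
        return np  # CuPy absent: every array here is a NumPy array

def zscore(x):
    # Works unchanged on NumPy arrays (CPU) and CuPy arrays (GPU).
    xp = get_array_module(x)
    return (x - xp.mean(x)) / xp.std(x)

z = zscore(np.array([1.0, 2.0, 3.0, 4.0]))
```

Passing a CuPy array to `zscore` would run the identical code on the GPU with no changes.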

[+] glial|1 year ago|reply
Are you able to share what functions or situations result in speedups? In my experience, vectorized numpy is already fast, so I'm very curious.
[+] markkitti|1 year ago|reply
It's funny how much easier GPU support, especially vendor-agnostic GPU support, is in Julia.
[+] einpoklum|1 year ago|reply
Your comment would be more useful with some elaboration and links.
[+] adancalderon|1 year ago|reply
If it ran in the background it could be CuPyd
[+] whimsicalism|1 year ago|reply
I was just thinking we didn’t have enough CUDA-accelerated numpy libraries.

JAX, PyTorch, vanilla TF, Triton. They just don't cut it

[+] bee_rider|1 year ago|reply
As good a place as any to ask, I guess. Do any of these GPU libraries have a BiCGStab (or similar) solver that handles multiple right-hand sides? CuPy seems to have GMRES, which would be fine, but as far as I can tell it only does one right-hand side.
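I'm not aware of a built-in block/multi-RHS Krylov solver in these libraries either; the usual workaround is a column loop. A sketch against the SciPy API (`cupyx.scipy.sparse.linalg.gmres` mirrors it, so the same loop should work on GPU arrays; `gmres_multi` is a hypothetical helper name, not a library function):

```python
import numpy as np
from scipy.sparse.linalg import gmres  # cupyx.scipy.sparse.linalg mirrors this API

def gmres_multi(A, B, **kwargs):
    # Solve A @ x_j = B[:, j] independently for each right-hand side column.
    cols = []
    for j in range(B.shape[1]):
        x, info = gmres(A, B[:, j], **kwargs)
        if info != 0:
            raise RuntimeError(f"GMRES did not converge for column {j} (info={info})")
        cols.append(x)
    return np.stack(cols, axis=1)

A = 2.0 * np.eye(4)
B = np.arange(8.0).reshape(4, 2)
X = gmres_multi(A, B)
```

A column loop gives up the cache/bandwidth sharing that a true block method would exploit, so it is a correctness workaround rather than a performance one.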
[+] kunalgupta022|1 year ago|reply
Is anyone aware of a pandas-like library that is based on something like CuPy instead of NumPy? It would be great to have the ease of use of pandas with the parallelism unlocked by the GPU.
[+] lmeyerov|1 year ago|reply
We are fans! We mostly use cuDF/cuML/cuGraph (GPU dataframes, etc.) in the pygraphistry ecosystem, and when things get a bit tricky, CuPy is one of the main escape hatches.