cuDF is the most impressive DataFrame implementation I've seen, and one I've been recommending for years. The API is exceptionally close to pandas (just a couple of different function arguments here and there), much more so than PyArrow or Modin. Two years ago, throughput and energy efficiency were often 10x that of PyArrow running on a state-of-the-art CPU with comparable TDP [1].
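To illustrate how close the APIs are, here is a minimal sketch (hypothetical column names; the try/except fallback is mine, so the snippet also runs on machines without a GPU):

```python
# The same groupby code runs under cuDF or pandas; only the import differs.
# Hedged sketch: falls back to pandas when cuDF is not installed.
try:
    import cudf as pd  # GPU-backed, near drop-in for pandas
except ImportError:
    import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
out = df.groupby("key").val.sum()
print(out.to_dict())  # {'a': 4, 'b': 6}
```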
Does this accelerate on an M1? I know this says it is for cuda and that obviously means Nvidia GPU, but lots of ML projects have a port to Apple silicon. I would love to try this on my Mac and see what kind of acceleration my pandas tools get for free.
I wish we could commit to not conflating NVIDIA with GPU. It wouldn't hurt a soul to call it "cuDF - NVIDIA DataFrame Library." To answer your question, it will probably run on the CPU.
How does this compare to duckdb/polars? I wonder if a GPU-based compute engine is a good idea. GPU memory is expensive and limited, and the bandwidth between the GPU and main memory isn't great either.
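To put rough numbers on the transfer concern, a back-of-envelope calculation (the ~32 GB/s figure is my assumption for PCIe 4.0 x16; real hosts vary):

```python
# Back-of-envelope cost of shipping a DataFrame over the host-GPU bus.
# Assumed bandwidth figure; actual sustained throughput is typically lower.
PCIE4_X16_GBPS = 32.0  # GB/s, nominal PCIe 4.0 x16

def transfer_seconds(dataset_gb: float, bandwidth_gbps: float = PCIE4_X16_GBPS) -> float:
    """Seconds to move `dataset_gb` gigabytes over the bus one way."""
    return dataset_gb / bandwidth_gbps

# A 10 GB table costs ~0.31 s per one-way copy: cheap if you then run many
# GPU operations on it, expensive if you bounce data per operation.
print(transfer_seconds(10.0))  # 0.3125
```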
The same group (Nvidia/Rapids) is working on a similar project but with Polars API compatibility instead of Pandas. It seems to be quite far from completion, though.
This and Rapids.ai is the single reason that NVIDIA is the leader in AI.
They made GPU processing at scale accessible to everyone, I have been a long term user of Rapids and found that even as a data engineer I can do things on an old consumer GPU that would otherwise require a 20+ node cluster to do in the same time.
This is actually a good callout. While speeding up transform pipelines is hugely important, lots of other fundamental Python tools for viz, model examination, etc. are built on a different foundation and don't benefit from pandas improvements.
I get it: some of these are legacy, others are hand-optimized Python since default pandas is so slow. But I'm hoping that, over time, we'll improve the runtime of the other stages of analysis too.
ashvardanian | 1 year ago
[1]: https://www.unum.cloud/blog/2022-09-20-pandas
fbdab103 | 1 year ago
Unless cudf has implemented some clever dask+cudf kind of situation which can intelligently push data in/out of GPU as required?
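The idea behind such out-of-core schemes can be sketched in plain Python (a toy stand-in, no GPU involved): partition the data so that only one chunk has to be resident in device memory at a time, reduce each chunk, then combine the partial results.

```python
# Toy out-of-core aggregation: only one partition is "resident" at a time,
# mimicking how dask-style engines stream chunks through limited GPU memory.
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of up to `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def out_of_core_sum(values, chunk_size=4):
    total = 0
    for chunk in chunked(values, chunk_size):  # "copy partition to device"
        total += sum(chunk)                    # reduce on "device"
        # the partition goes out of scope here, freeing "device memory"
    return total

print(out_of_core_sum(range(10)))  # 45
```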
xrd | 1 year ago
zamalek | 1 year ago
kiratp | 1 year ago
https://developer.apple.com/metal/jax/
And MLX
https://github.com/ml-explore/mlx
killingtime74 | 1 year ago
mgt19937 | 1 year ago
pbib | 1 year ago
See discussion: https://news.ycombinator.com/item?id=39930846
xs83 | 1 year ago
__mharrison__ | 1 year ago
The ability to run code 100-1000x faster with this is just icing on the cake.
(I've run through this with most of my Pandas training material and it just works with no code changes.)
mafro | 1 year ago
I like pandas, and python.
skenderbeu | 1 year ago
We have like 12 different types of it in the wild. I think it's time we came up with one or two GPU hardware standards, similar to what we have for CPUs.
CarRamrod | 1 year ago
mwexler | 1 year ago
hack_ml | 1 year ago
HoloViews, hvPlot, Datashader, Plotly, Bokeh, Seaborn, Panel, PyDeck, cuxfilter, node-RAPIDS