Python Pandas Ditches NumPy for Speedier PyArrow (thenewstack.io)
18 points | blacktulip | 9 months ago | 4 comments | item 44177752

joshlk | 9 months ago
> [numpy] stores everything in rows
This isn't true. Pandas uses NumPy to store columns of data. There are quite a few technical errors in the article.

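Here is a minimal sketch of the columnar layout joshlk describes. It assumes pandas with its default NumPy-backed dtypes, and the column names and values are made up for illustration. Each column can be pulled out as its own 1-D ndarray that keeps its own dtype. Converting the whole frame to one plain 2-D array instead forces a single common dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one int64 column and one float64 column.
df = pd.DataFrame({
    "id": np.array([1, 2, 3], dtype=np.int64),
    "price": np.array([9.5, 3.25, 7.0], dtype=np.float64),
})

# With the default (NumPy) backend, each column comes back as a 1-D ndarray
# that keeps that column's own dtype.
for name in df.columns:
    values = df[name].to_numpy()
    print(name, type(values).__name__, values.dtype, values.shape)

# A plain 2-D NumPy array has a single dtype, so the whole-frame conversion
# has to upcast the int64 column to float64.
print(df.to_numpy().dtype)
```

Per-column dtypes that survive intact are what you would expect from column-wise storage. A single row-major array of mixed records could not keep them.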
constantcrying | 9 months ago
This is an insane article. I do not think the author has any idea what is going on. The comparison of NumPy reading CSV to Arrow reading Parquet is completely bizarre and totally misses the point of switching out the underlying data format.

kbrkbr | 9 months ago (reply to constantcrying)
Or maybe you did not read it properly?
> Reading in that CSV file into memory would take Python 55.8 seconds, but PyArrow did the work in 11.8 seconds.
It's later clarified that PyArrow does load CSV here, though the numbers don't fully add up. The format change is also explained.

agons | 9 months ago
It gets worse the further you go; this was where I had to bail:
> the format is much favored by AI frameworks such as TensorFlow and PyCharm.

LNGBandit77 | 9 months ago
[deleted]