datanecdote | 5 years ago

> 2. I don't entirely follow this point. Perhaps using PyArrow's parser would be faster than what is timed here, but is that what the typical Python data science user would do?

I am a Python data science user. If data gets big enough such that loading time is a bottleneck, I use parquet files instead of CSV, and PyArrow to load them into pandas. It’s a one line change. The creator of Pandas is now leading the Arrow project. It’s very seamless. Don’t know if I’m typical but that’s me.

ViralBShah | 5 years ago

Perhaps not directly relevant to your point here, but thought it would be interesting to anyone following along.

Jacob Quinn (karbacca) also has a Julia package for integrating Julia into the Arrow ecosystem: https://github.com/JuliaData/Arrow.jl

datanecdote | 5 years ago

Thanks Viral. To be clear, I’m a Python user who’s cheering for Julia, because I live with the problems of Python daily and do see the potential of Julia as a better path. But unfortunately I’m not prepared to be the early adopter (at least in my day job), and will wait until other, braver users have sanded off the rough edges. Godspeed and good luck.