jointpdf | 3 years ago
import pandas
df = pandas.read_parquet('foo.parquet')
df.to_csv('foo.csv')
df.to_json('foo.json')
(no sarcasm)—how could it be simpler than that? What problems have you encountered that make it unusable?
wenc | 3 years ago
Pandas and Arrow are dependencies like any other. Pandas is like a DSL for working with tabular data, much as numpy is a DSL for working with arrays and numerical algebra. No one working with linear algebra would insist on using only the Python standard library built-ins.
If you're distributing a smallish Python app that only needs to read and manipulate smallish amounts of data, then I agree there are simpler solutions like SQLite.
But if you're doing consulting work, dealing with large tabular datasets, and need SQL-style window functions and aggregations, then Parquet is a better fit, and the disk space required for adding a Pandas dependency is trivial. If one is using Anaconda, Pandas is batteries-included. It really depends on what is being optimized for.
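To illustrate the window-function point: pandas covers SQL's `OVER (PARTITION BY ...)` pattern via `groupby(...).transform(...)` and `rank()`. A minimal sketch (the DataFrame here is an in-memory toy standing in for a `pd.read_parquet(...)` call; the column names are made up for the example):

```python
import pandas as pd

# Toy stand-in for a large Parquet dataset; in practice this would be
# something like pd.read_parquet('sales.parquet') (hypothetical file).
df = pd.DataFrame({
    'region': ['east', 'east', 'west', 'west', 'west'],
    'amount': [10, 20, 5, 15, 25],
})

# SQL: SUM(amount) OVER (PARTITION BY region)
# transform('sum') broadcasts each group's total back onto every row.
df['region_total'] = df.groupby('region')['amount'].transform('sum')

# SQL: ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount)
# rank(method='first') numbers rows within each group by amount.
df['rank_in_region'] = (
    df.groupby('region')['amount'].rank(method='first').astype(int)
)

print(df)
```

The same pattern scales to the aggregations one would otherwise push into a SQL engine, which is the trade-off the comment is pointing at.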