I've also been looking for that. In an ideal world there would be a small, fast, standalone CLI tool that can convert CSV to Parquet. There is a (sadly, unfinished) Rust Parquet writer library in the Arrow repository that looks promising. All approaches I've tried so far (Spark, pyarrow, Drill, ...) require everything and the kitchen sink. So far I've settled on a Java CLI tool that uses Jackson + org.apache.parquet internally, but it's CPU-bound and has a huge number of Maven dependencies.
meritt|5 years ago
MrPowers|5 years ago
import pandas as pd

# to_parquet needs a Parquet engine installed (pyarrow or fastparquet)
df = pd.read_csv('data/us_presidents.csv')
df.to_parquet('tmp/us_presidents.parquet')