top | item 40794913

doppenhe | 1 year ago

Could HyParquet's approach be extended to other data formats beyond Parquet?

platypii | 1 year ago

I definitely think that UX is an underappreciated area for machine learning data. I want to make a set of libraries and tools that make it easier for people to work with ML data in the browser. The first step of good data science is to become one with your data.

I started with Parquet because most datasets for modern LLMs are in Parquet format. But other formats, like JSONL, are common too.