dmw_ng | 1 year ago
> pickling
Sounds like, if this is the tooling and the task at hand, the most complex thing that should be passing through the pickler is a partitioned list of filenames rather than raw data. E.g. you can have each partition write its own Parquet file and combine them in a final step (pyarrow.concat_tables() looks useful), or, if you were working with some other format, potentially send flat arrays back to the parent process as giant bytestrings or similar.
This is not to say the limitations don't suck, just that there are very often simple approaches that avoid most of the pain.