dmw_ng | 1 year ago

> converting huge amount of xml files

> pickling

If this is the tooling and the task at hand, the most complex thing that should be passing through the pickler is a partitioned list of filenames, not raw data. E.g. each worker can turn its partition into a Parquet file, and a final step combines them (pyarrow.concat_tables() looks useful). If you were working with some other format, you could potentially send flat arrays back to the parent process as giant bytestrings or similar.
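A minimal stdlib-only sketch of the bytestring variant, assuming a hypothetical XML schema where every `<value>` element holds one float. Only lists of filenames get pickled into the workers. Each worker packs its results with `array.tobytes()`, and the parent decodes them with `array.frombytes()`. The names `partition`, `parse_partition` and `convert_all` are illustrative only, not from any library.

```python
import xml.etree.ElementTree as ET
from array import array
from multiprocessing import Pool


def partition(filenames, n_parts):
    """Split the filename list into n_parts chunks, round-robin."""
    return [filenames[i::n_parts] for i in range(n_parts)]


def parse_partition(filenames):
    """Worker: parse every XML file in this partition and return all values
    as one flat bytestring. Only the filenames were pickled on the way in."""
    values = array("d")  # flat buffer of C doubles
    for name in filenames:
        # Hypothetical schema: each <value> element holds one float.
        for elem in ET.parse(name).iter("value"):
            values.append(float(elem.text))
    return values.tobytes()  # one cheap bytes object goes back through the pickler


def convert_all(filenames, n_parts=4):
    """Parent: fan partitions out to worker processes, then stitch the
    returned bytestrings back into a single flat array."""
    result = array("d")
    parts = [p for p in partition(filenames, n_parts) if p]
    if not parts:
        return result  # nothing to do, so don't start a pool
    with Pool(processes=len(parts)) as pool:
        # pool.map preserves partition order; values are grouped per partition
        for chunk in pool.map(parse_partition, parts):
            result.frombytes(chunk)
    return result


if __name__ == "__main__":
    import sys

    print(len(convert_all(sys.argv[1:])))
```

The Parquet route has the same shape: the worker writes its partition to a file and returns the path instead of bytes, and the parent combines the tables at the end. For very large inputs, `ET.iterparse` would let each worker process a file incrementally instead of building the whole tree first.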

This is not to say the limitations don't suck, just that there are very often simple approaches that avoid most of the pain.