
How to process 100 GB TSV and XML files?

2 points| anindha | 1 year ago

I am trying to parse a music data file that is close to 100 GB. What app or programming language is best for handling a file like this?

Thanks!

6 comments


FlyingAvatar|1 year ago

It really depends on what you need to do with the data, but in most cases Python can handle this pretty easily: csv.reader (with a \t delimiter for TSV) or xml.etree.ElementTree.iterparse (for XML) reads the file in streaming fashion, so you never load the whole thing at once.
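For anyone who wants a starting point, here is a rough sketch of the streaming approach using only the standard library. The function names, the idea of just counting records, and the choice of csv.QUOTE_NONE are my own assumptions for illustration; swap the counting for whatever per-record work you actually need.

```python
import csv
import xml.etree.ElementTree as ET


def count_tsv_rows(path):
    """Stream a TSV file row by row; only the current row is kept in memory."""
    count = 0
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE is an assumption: it treats quote characters as plain
        # data, which suits dumps that contain unescaped quotes.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            count += 1  # replace with real per-row processing
    return count


def count_xml_elements(path, tag):
    """Stream an XML file and count elements named `tag` (hypothetical name)."""
    count = 0
    context = ET.iterparse(path, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1  # replace with real per-element processing
            # Drop already-processed children so memory use stays bounded.
            root.clear()
    return count
```

Calling count_tsv_rows("data.tsv") or count_xml_elements("data.xml", "track") walks the file once from start to finish; the file names and the "track" tag are placeholders, since the thread doesn't say how the data is laid out.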

datadrivenangel|1 year ago

What kind of single music data file is 100 GB?

Also, how is it structured? If it's actually a tab-separated values file, consider using something like Polars or DuckDB.