bede | 4 months ago

For BAM this could be a good place to start: https://www.htslib.org/benchmarks/CRAM.html

Happy to discuss further.

terrelln | 4 months ago

Amazing, thank you!

I will take a look as soon as I get a chance. Looking at the BAM format, it looks like the tokenization portion will be easy, which means I can focus on the compression side, which is the more interesting part.
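
For anyone following along, here is a rough sketch of what that tokenization step might look like, assuming "tokenization" means splitting each alignment record into per-field streams that a compressor can then model separately. The field layout follows the fixed 32-byte core of a BAM alignment record as described in the SAM/BAM specification. The names `split_records`, `CORE`, and `FIELDS` are made up for illustration, and the input is assumed to be already BGZF-decompressed record bytes with the file header stripped. It is not OpenZL code.

```python
import struct
from collections import defaultdict

# Fixed-size fields that follow block_size in each BAM alignment record.
# Little-endian, 32 bytes in total.
CORE = struct.Struct("<iiBBHHHIiii")
FIELDS = (
    "ref_id", "pos", "l_read_name", "mapq", "bin", "n_cigar_op",
    "flag", "l_seq", "next_ref_id", "next_pos", "tlen",
)


def split_records(buf):
    """Split uncompressed BAM alignment records into per-field streams.

    `buf` holds back-to-back records, each prefixed by an int32 block_size.
    Returns a dict mapping a field name to a list with one entry per record.
    """
    streams = defaultdict(list)
    offset = 0
    while offset < len(buf):
        (block_size,) = struct.unpack_from("<i", buf, offset)
        offset += 4
        end = offset + block_size

        core = dict(zip(FIELDS, CORE.unpack_from(buf, offset)))
        for name, value in core.items():
            streams[name].append(value)

        # Variable-length fields follow the core in a fixed order.
        p = offset + CORE.size
        name_len = core["l_read_name"]      # includes the trailing NUL
        cigar_len = 4 * core["n_cigar_op"]  # one uint32 per CIGAR op
        seq_len = (core["l_seq"] + 1) // 2  # 4-bit packed bases
        qual_len = core["l_seq"]            # one byte per base

        for field, length in (
            ("read_name", name_len),
            ("cigar", cigar_len),
            ("seq", seq_len),
            ("qual", qual_len),
        ):
            streams[field].append(buf[p:p + length])
            p += length

        # Whatever remains in the block is the optional aux/tag data.
        streams["aux"].append(buf[p:end])
        offset = end
    return dict(streams)
```

Each resulting stream (all `pos` values, all quality strings, and so on) tends to be much more homogeneous than the interleaved records, which is the property the compression side can then exploit.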

fwip | 4 months ago

Another format that might be worth looking at in the bioinformatics world is HDF5. It's sort of a generic file format, often used for storing multiple related large tables. It has some built-in compression (gzip IIRC) but supports plugins. There may be an opportunity to integrate the self-describing nature of the HDF5 format with the self-describing decompression routines of OpenZL.