top | item 16679738

vamin | 8 years ago

An RNA sequencing run generates on the order of 10 GB of data, a typical study requires many runs (treatments, controls, replicates, etc.), and most biology journals require that the raw data be posted. I'm not surprised that there is over 1 PB of data available to curate.
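For a rough sense of scale, here is a back-of-envelope sketch of that arithmetic. The 10 GB per run and 1 PB total come from the estimate above; the decimal units and the helper name `runs_to_reach` are my own assumptions for illustration, not anything from a real dataset or tool.

```python
# Back-of-envelope: how many ~10 GB runs add up to 1 PB?
# Assumes decimal units (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
GB = 10**9
PB = 10**15


def runs_to_reach(total_bytes: int, bytes_per_run: int) -> int:
    """Smallest number of runs whose combined size reaches total_bytes."""
    # Ceiling division using integers only, so there is no float rounding.
    return -(-total_bytes // bytes_per_run)


runs_needed = runs_to_reach(1 * PB, 10 * GB)
print(runs_needed)  # 100000
```

So roughly a hundred thousand runs of that size are enough to cross 1 PB, which is not many once every published study deposits all of its runs.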

dekhn|8 years ago

Oh, you mean BAM files? Get yourself a retention policy; you don't need to keep RNA BAM files that long.

I thought you meant derived data.

vamin|8 years ago

I'm talking about the raw reads, which are important if you want to try a different alignment or base-calling method. You can debate how important it is to be able to do that, but I'm not arguing that the data should be kept. I was just explaining why the total size of publicly available RNA-seq data (which the parent is attempting to organize) runs into the petabytes.