gbletr42 | 2 years ago
How many blocks (and their respective fragments) are interleaved is controlled by the -l argument. I'll paste the relevant info from the manpage here.
> The number of blocks (specifically, their fragments) to interleave. The default is 3, as it protects against a single bad burst corrupting both the block in front of a given block and the block behind it.
In general, you can approximate the burst-size ratio needed to destroy a set of interleaved blocks beyond repair with the equation below (this may not be fully accurate, as it's napkin math).
Let n = number of blocks to interleave, B = number of bytes per fragment, m = number of parity fragments, and k = number of fragments required to rebuild a block.
((n - 1) * m * B + n - 1) / (n * (k + m) * B)
This is because a given burst could traverse a whole fragment and then bleed into the next, taking some other fragments with it in the unluckiest case. If you remove the B terms and take the limit as n goes to infinity, the ratio approaches m/(k + m); in other words, as you interleave more and more blocks, the maximal survivable burst size gets closer and closer to the total size of your parity fragments.
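To make the limit concrete, here's a small numeric sketch of the ratio above, using the same symbols (n, B, m, k). The parameter values are illustrative, not the tool's actual defaults beyond the mentioned interleave of 3:

```python
def burst_ratio(n, B, m, k):
    """Approximate fraction of an interleaved set that a single
    worst-case burst can corrupt before a block becomes unrecoverable,
    per the napkin-math formula: ((n-1)*m*B + n-1) / (n*(k+m)*B)."""
    return ((n - 1) * m * B + n - 1) / (n * (k + m) * B)

# Illustrative parameters: k=4 data fragments, m=2 parity fragments,
# 4 KiB fragments (assumed values, not the format's real defaults).
k, m, B = 4, 2, 4096
for n in (3, 10, 100, 1000):
    print(n, burst_ratio(n, B, m, k))
# As n grows, the ratio approaches m / (k + m) = 2/6, i.e. ~0.333
```

With these numbers, the default n=3 tolerates a burst of only about 22% of the set, while pushing n up approaches the 33% ceiling set by the parity fraction.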
Also, the header specifies the exact size of each fragment; we don't depend on the fragments themselves to tell us their size. In fact, the only metadata a fragment carries is its hash and the number of padded bytes for the whole interleaved set of fragments. As a design principle, the format is (almost, ignoring padded bytes) completely described by the header. That's why a whole fragment can turn to garbage and we don't care; we just skip ahead to the next fragment.
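A rough sketch of that "header describes everything" principle: if the header gives a fixed fragment size and interleave layout, every fragment's offset is computable without reading the fragments, and a garbage fragment (hash mismatch) can simply be skipped. All names and the layout below are hypothetical, not the tool's actual on-disk format:

```python
import hashlib

def fragment_offset(data_start, frag_size, set_index, frag_index, frags_per_set):
    """Byte offset of a fragment, derived purely from (assumed)
    header fields: fragments laid out contiguously, set by set."""
    return data_start + (set_index * frags_per_set + frag_index) * frag_size

def fragment_ok(data, offset, frag_size, expected_hash):
    """Check a fragment against its stored hash (sha256 assumed here).
    On failure, a reader can just move on to the next computed offset
    instead of trusting anything inside the corrupted fragment."""
    frag = data[offset:offset + frag_size]
    return hashlib.sha256(frag).digest() == expected_hash
```

The point of the sketch is only that offsets come from header arithmetic, so one destroyed fragment never desynchronizes the reader.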
edit: thinking on it, I could add a mode that recovers as much as possible when a specific block is corrupted, rather than the current behavior of quitting the stream as soon as it finds a corrupted block.
myself248 | 2 years ago
> in other words as you interleave more and more blocks, you get a maximal burst size closer and closer to the size of your total number of parity fragments.
I think this could be called out more clearly, as my default assumption was this limit case: the file grows by X amount, so I can recover from up to X amount of corruption. It sounds like that's only true if the interleave is set to maximum, which is not the default.
Is that because, as you increase the interleave, it increases the amount of the stream you need to read in order to repair any corruption?
gbletr42 | 2 years ago
Yes: you need to read the entire interleaved set to recover any block in it, so to keep memory consumption low we don't want to interleave too many blocks. I could raise the default to something higher, though.
Regarding the tape problem, I was being a bit daft in my response: if you lose a chunk of tape, you also lose an indeterminate amount of data with it, which my format currently isn't capable of recovering from. I'll try to fix that to some degree in the next version, at the cost of not guaranteeing backwards compatibility, as I feel it falls under the "major problem" comment I made in the v0.1 release.