aurelian15 | 5 years ago
The idea is to make blocks (slightly) variable in size. Block boundaries are determined based on a limited window of preceding bytes. This way a change in one location will only have a limited impact on the following blocks.
octoberfranklin | 5 years ago
There isn't a way to advertise some "rolling hash value" such that other people with a differently-aligned copy can notice that you and they have some duplicated byte ranges.
Rolling hashes only work when one person (or two people engaged in a conversation, like rsync) already has both copies.
btschaegg | 5 years ago
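The rolling hash is used to find the chunk boundaries: hash the window of bytes preceding each position (which is cheap with a rolling hash) and compare the result against a defined bit mask. For example: check if the lowest 20 bits of the hash are zero. If the hash bits are roughly uniform, that happens at about one position in 2^20, so you'd get chunks with about 2^20 bytes (1 MiB) average length. Because the boundaries depend only on the content of the window, two differently-aligned copies end up with mostly the same chunks, and those can be compared by an ordinary per-chunk hash.
For a good explanation, I'd encourage you to look at borgbackup's internals documentation: https://borgbackup.readthedocs.io/en/stable/internals.html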
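To make the boundary rule concrete, here is a minimal sketch in Python. It is not borgbackup's chunker: the polynomial rolling hash, the names (`chunk`, `WINDOW`, `MASK_BITS`, `BASE`, `MOD`) and all constants are assumptions picked for illustration, and the mask uses 12 bits (about 4 KiB average chunks) instead of 20 so that a small test buffer yields many chunks.

```python
from typing import List

# All constants below are illustrative assumptions, not values from any real tool.
WINDOW = 48                  # number of preceding bytes that decide a boundary
MASK_BITS = 12               # low 12 bits zero -> a boundary about every 4 KiB
MASK = (1 << MASK_BITS) - 1
BASE = 257                   # multiplier of the polynomial rolling hash
MOD = (1 << 61) - 1          # prime modulus, so the low hash bits are well mixed
# Weight of the oldest byte in the window; used to remove it when the window slides.
OUT_FACTOR = pow(BASE, WINDOW - 1, MOD)


def chunk(data: bytes) -> List[bytes]:
    """Split data at positions where the rolling hash of the last WINDOW bytes
    has its low MASK_BITS bits equal to zero."""
    chunks = []
    start = 0   # index of the first byte of the current chunk
    h = 0       # hash of the (up to) WINDOW bytes ending at the current byte
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Slide the window: drop the byte that just fell out of it.
            h = (h - data[i - WINDOW] * OUT_FACTOR) % MOD
        # Shift the remaining bytes up by one position and add the new byte.
        h = (h * BASE + b) % MOD
        # Enforce a minimum chunk size of WINDOW bytes, then test the mask.
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])   # trailing bytes form the last chunk
    return chunks
```

If you chunk a buffer, insert a few bytes somewhere in the middle and chunk it again, the chunks before the edit are identical, the chunk around the edit changes, and the boundaries should line up again shortly afterwards because the hash only sees the last `WINDOW` bytes. Real chunkers usually also enforce a maximum chunk size, which this sketch leaves out.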