top | item 46706784

(no title)

zigzag312 | 1 month ago

> which can be done fast enough to appear instant on the CPU

Big scanned PDFs can be problfrom more efficient processing (if it had HW support for such technique)

> Your link shows CRC32 at 7963.20 MiB/s (~7.77 GiB/s) which indicates it's either very old or isn't measuring pure CRC32 throughput

It may not be fastest implementation of CRC32, but it's also done on old Ryzen 5 3350G 3.6GHz. Below the table are results done on different HW. On Intel i7-6820HQ CRC32 achieves 27.6 GB/s.

> measures 85 GB/s (GB, GiB, eh close enough) on the Apple M1. That's fast enough that I'm comfortable calling it limited by memory bandwidth on real-world systems.

That looks incredibly suspicious since Apple M1 has maximum memory bandwidth of 68.25 GB/s [1].

> I have personally seen bitrot and network transmission errors that were not caught by xxhash-type hash functions, but were caught by higher-level checksums. The performance properties of hash functions used for hash table keys make those same functions less appropriate for archival.

Your argument is meaningless without more details. xxhash supports 128 bits, which I doubt wouldn't be able to catch an error in you case.

SHA256 is an order of magnitude or more slower than non-cryptographic hashes. In my experience archival process usually has big enough effect on performance to care about it.

I'm beginning to suspect your primary reason for disliking xxhash is because it's not de facto standard like CRC or SHA. I agree that this is a big one, but you constantly imply like there's more to why xxhash is bad. Maybe my knowledge is lacking, care to explain? Why wouldn't 128 bit xxhash be more than enough for checksums of files. AFAIK the only thing it doesn't do is protect you against tampering.

> I don't know what kopia is, but according to your link it looks like their wire protocol involves each client downloading a complete index of the repository content, including a CAS identifier for every file. The semantics would be something like Git? Their list of supported algorithms looks reasonable (blake, sha2, sha3) so I wouldn't have the same concerns as I would if they were using xxhash or cityhash.

Kopia uses hashes for block level deduplication. What would be an issue, if they used 128 bit xxhash instead of 128 bit cryptographic hash like they do now (if we assume we don't need to protection from tampering)?

[1] https://en.wikipedia.org/wiki/Apple_M1

discuss

order

minitech|1 month ago

> What would be an issue, if they used 128 bit xxhash instead of 128 bit cryptographic hash like they do now (if we assume we don't need to protection from tampering)?

malicious block hash collisions where the colliding block was introduced by some way other than tampering (e.g. storing a file created by someone else)

zigzag312|1 month ago

That's a good example. Thanks! It would be kind of an indirect tampering method.