top | item 19292644

(no title)

ivmaykov | 7 years ago

Disclaimer: I’m one of the authors of the paper/blog post/code.

If you want to use signatures over the hash as proof of data set integrity, you need two things. 1) you need to make sure that hash({a}) + hash({b}) == hash({a, b}). 2) ensure that hash() is collision resistant - in other words, it needs to be computationally infeasible to find hash(S) == hash(T), S != T for any sets S and T. We prove that LtHash with our choice of parameters has this property in the paper (which is linked from the blog post).

discuss

order

dfox|7 years ago

My reading of the post is that the Hash({a, b}) is infact computed as Hash'(a) + Hash'(b) given that a and b are "rows". And thus my question is why Hash' has to have any special properties.

ivmaykov|7 years ago

I can think of hash functions that are homomorphic, but are not secure. A simple example is something like “sha256 each element separately and XOR all the resulting hashes together.” This would not be collusion resistant.

We offer a proof that LtHash with our choice of parameters provides over 200 bits of security. You would have to read the paper for the details.

rstuart4133|7 years ago

Just so I have this clear: is the issue the simple solution above doesn't work (at all) with multisets, and the birthday problem combined with potentially billions of rows means even sha256 will have a significant chance generating a collision?

scottlocklin|7 years ago

Are you guys going to use this in your blockchain?