top | item 2438327


bayes | 15 years ago

This makes me worried about hash collisions as well. The article implies that a file whose hash matches something they already have will never even reach their servers - so presumably I just have to keep my fingers crossed that the file they're synchronising to all my machines is the one I uploaded, and not some other user's completely different file that happens to have the same hash?

thomaswmeyer | 15 years ago

SHA-256, which Dropbox uses, has 2^256, or around 10^77, possible hashes. That's roughly a hundred quadrillion quadrillion quadrillion quadrillion quadrillion possible values. So I wouldn't worry about hash collisions if I were you. If SHA-256 collisions could be produced easily, that would be very big news for the security community, and much more serious services than Dropbox would be affected.
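To put a number on that, here's a quick birthday-bound estimate in Python. The file count of 10^18 is a made-up, deliberately extreme scale (a billion users with a billion files each), not anything Dropbox has published:

```python
import math

def collision_probability(n: int, hash_bits: int = 256) -> float:
    # Birthday-bound approximation: for small probabilities, the chance of at
    # least one collision among n distinct inputs is roughly n^2 / (2 * 2^bits).
    space = 2 ** hash_bits
    return n * n / (2 * space)

# Hypothetical scale: 10^18 distinct files.
p = collision_probability(10 ** 18)
print(p)  # about 4.3e-42 -- still vanishingly small
```

Even at that absurd scale, the odds of a single accidental SHA-256 collision are around 10^-42.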

ceejayoz | 15 years ago

Deduping with hash and filesize in bytes should make a collision unlikely enough.
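A minimal sketch of that idea, keying the dedup store on the (hash, size) pair. The names and dict-backed "store" are purely illustrative, not how Dropbox actually implements it:

```python
import hashlib

# Illustrative in-memory store: maps (sha256 hex digest, size in bytes) -> data.
store: dict[tuple[str, int], bytes] = {}

def dedup_key(data: bytes) -> tuple[str, int]:
    return (hashlib.sha256(data).hexdigest(), len(data))

def upload(data: bytes) -> bool:
    """Return True if the bytes were stored, False if they were deduplicated."""
    key = dedup_key(data)
    if key in store:
        return False  # same hash AND same size already seen: skip the upload
    store[key] = data
    return True

print(upload(b"hello world"))  # True  -- first copy is stored
print(upload(b"hello world"))  # False -- deduplicated away
```

A collision now requires two different files to match on both the full hash and the exact byte length.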

gojomo | 15 years ago

This is a common sentiment, but not really sensible. Filesizes are far from uniformly distributed, so they carry much less than their nominal bit count of information. If you want to store (for example) another 64 bits' worth of discriminating information, you would always be better off with 64 more bits of some strong hash than with 64 bits of filesize.

sukuriant | 15 years ago

But still possible. Whenever a file has more bits than the hash, the pigeonhole principle guarantees that collisions exist. They could use the Chinese Remainder Theorem, but that would only go so far (maybe far enough to remove substantial doubt? The link below seems to suggest so.)

Relevant to the discussion: http://stackoverflow.com/questions/622930/purposely-create-t...
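The pigeonhole argument is easy to see with a deliberately tiny hash. This sketch truncates SHA-256 to 16 bits, where collisions among distinct inputs are guaranteed once you hash more than 2^16 of them (and in practice appear much sooner, by the birthday effect):

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    # Deliberately weak 16-bit hash: first two bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")

seen: dict[int, bytes] = {}
collision = None
for i in range(2 ** 16 + 1):  # more inputs than hash values: collision guaranteed
    data = str(i).encode()
    h = tiny_hash(data)
    if h in seen and seen[h] != data:
        collision = (seen[h], data)
        break
    seen[h] = data

print(collision)  # two different inputs sharing the same 16-bit hash
```

With the full 256 bits the same argument still holds in principle; it just takes astronomically many files before it matters.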

schwanksta | 15 years ago

At first, that was what I thought the flaw would be -- providing a file whose hash collides with another file's gets you that file.

But it seems to me you would need to know the exact contents of the file in question to make that happen, which makes the point moot. Perhaps I'm wrong on that.

avdempsey | 15 years ago

Do they check filesize too? What are the odds of a hash collision + identical filesize? We might need Carl Sagan to answer that one.

Groxx | 15 years ago

Given that they do chunks of updates, rather than the whole file, presumably they check multiple hashes to ensure this doesn't happen. I could be wrong, though.
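Chunk-level hashing could look something like this. The 4 MB chunk size and the function names here are assumptions for illustration, not confirmed details of Dropbox's protocol:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MB chunks, for illustration only

def chunk_hashes(data: bytes) -> list[str]:
    """Hash each fixed-size chunk independently, as chunk-level sync implies."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

# With per-chunk hashes, silently substituting a whole file would require
# every chunk hash to collide simultaneously, not just one whole-file hash.
```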