top | item 39477527

(no title)

daviesliu | 2 years ago

JuiceFS relies on the object store to provide integrity for data. Besides that, JuiceFS stores the checksum of each object as tags in S3, and verifies that when downloading the objects.

Inside the metadata service, it uses merkle tree (hash of hash) to verify the integrity of whole namespace (including id of data blocks) between RAFT replicas. Once we store the hash (4 bytes) of each objects into metadata, it should provide the integrity of the whole namespace.

discuss

order

amluto|2 years ago

Does JuiceFS allow the user to specify the hash of a file when uploaded? And then to read that hash back later?

Otherwise there’s no end-to-end integrity check.

daviesliu|2 years ago

The S3 API allow user to specify the hash of content as HTTP header, it will be verified by the JuiceFS gateway and persisted into JuiceFS as ETag.

With POSIX API or HDFS, there is no such API to do that, unfortunately.

markhahn|2 years ago

surely you mean that the FS should calculate the hash on file creation/update, not take some random value from the user. but I agree that a FS that maintains file-content hash should allow clients to query it.