From what I've seen, the likelihood of triggering a pathological case with real-world, non-malicious data is low enough to ignore, provided the rolling hash function is well designed. I agree that crafting malicious data to break deduplication in Dolt should be relatively easy, but I don't see how that could lead to a DoS on, e.g., a hosted Dolt platform. If I understand correctly, your proposed attack would only reduce the deduplication rate and, by extension, increase the disk space used, and I would expect a hosted Dolt platform to enforce strict disk-space limits or use storage-based billing.
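To make the "only deduplication suffers" point concrete, here is a toy sketch of content-defined chunking. It is not Dolt's chunker: the window-sum hash, the constants, and the names `chunk` and `shared_after_insert` are all assumptions made up for illustration, and the hash is deliberately weak so the crafted input is easy to see. The crafted data uses only even bytes, so the window sum is never odd and a content boundary never fires; every cut is then forced at the maximum chunk size, and a one-byte insertion shifts all of them.

```python
import random

# Toy parameters, chosen for illustration only (not Dolt's real values).
WINDOW = 16      # bytes covered by the toy rolling hash
MAX_CHUNK = 256  # forced cut when no content boundary is found
MODULUS = 64
TARGET = 63      # cut when (sum of window bytes) % MODULUS == TARGET


def chunk(data: bytes) -> list:
    """Split data with a toy content-defined chunker (window-sum hash)."""
    chunks = []
    start = 0
    window_sum = 0
    for i, byte in enumerate(data):
        # Maintain the sum of the last WINDOW bytes.
        window_sum += byte
        if i >= WINDOW:
            window_sum -= data[i - WINDOW]
        length = i + 1 - start
        at_boundary = length >= WINDOW and window_sum % MODULUS == TARGET
        if at_boundary or length == MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_after_insert(data: bytes) -> tuple:
    """Insert one byte at the front; count how many chunks are reused."""
    before = set(chunk(data))
    after = chunk(b"\x00" + data)
    shared = sum(1 for c in after if c in before)
    return shared, len(after)


rng = random.Random(0)
N = 20_000
ordinary = bytes(rng.randrange(256) for _ in range(N))
# Only even bytes: the window sum is always even, so it never equals 63 mod 64.
crafted = bytes(2 * rng.randrange(128) for _ in range(N))

ordinary_shared, ordinary_total = shared_after_insert(ordinary)
crafted_shared, crafted_total = shared_after_insert(crafted)
print(f"ordinary: {ordinary_shared}/{ordinary_total} chunks reused")
print(f"crafted:  {crafted_shared}/{crafted_total} chunks reused")
```

In this sketch the chunker still makes a single linear pass over the crafted input, so the cost shows up as chunks that cannot be reused (more storage), not as extra CPU time. Whether the same holds for Dolt's actual implementation is something I'm assuming, not something this toy proves.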