top | item 47131069

everforward | 5 days ago

If the dictionary's version hash changes, you have to re-download the dictionary, which costs about as much as re-downloading the whole page.

Reddit/NYT would have to publish their changes without changing the dictionary, meaning new content would be largely absent from the dictionary and compress worse than with plain gzip. That's probably fine for NYT; something like Reddit might actually see worse ratios than gzip in that case.

superb_dev | 5 days ago

Or you could use the previous version to generate the dictionary for the current version?

I would assume chunks that didn’t benefit from the dictionary would receive the standard compression, so you can’t get worse than gzip.

everforward | 5 days ago

Maybe? That gets sort of awkward for frequently updated things like Reddit, where there might be 10 dictionary versions between what you have and the current one. You'd need something that decides whether to fetch an incremental update or a fresh dictionary, and the host has to keep all the old dictionaries around. Feels like more trouble than it's worth.

You could fall back to gzip when the dictionary doesn't work well, but to my understanding gzip works by compressing repetition. There's less repetition in smaller chunks, so the ratios get worse. E.g. compressing each comment individually gives a worse overall ratio than compressing all the comments at once.

It would also be annoying to merge a bunch of individually compressed blocks back together, but it's certainly an option.
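
The per-comment vs whole-thread claim above is easy to check directly (the comment text here is invented; any corpus of small, similar chunks shows the same effect):

```python
import zlib

# Many small, similar chunks -- stand-ins for individual comments.
comments = [
    b"I think the dictionary approach is a great idea for static sites.",
    b"I think the dictionary approach breaks down for dynamic sites.",
    b"The dictionary approach only helps if the hash rarely changes.",
] * 20

# Compressing each chunk on its own: per-stream overhead, and no chunk
# can reference repetition that lives in a different chunk.
individually = sum(len(zlib.compress(c)) for c in comments)

# Compressing everything as one stream: cross-chunk repetition is
# available to the compressor, so the total is far smaller.
together = len(zlib.compress(b"".join(comments)))

assert together < individually
```

Shared dictionaries are essentially a way to recover that cross-chunk redundancy without having to ship the chunks in one stream.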