dbrueck | 1 year ago

Interesting! It's worth noting though that HTTP actually works very well for reliably downloading large immutable files.

And since this proposed protocol operates over TCP, there's relatively little that can be done to achieve the performance goals vs what you can already do with HTTP.

And because "everything" already speaks HTTP, you can get pretty close to max performance just via client side intelligence talking to existing backend infrastructure, so there's no need to try to get people to adopt a new protocol. Modern CDNs have gobs of endpoints worldwide.

A relatively simple client can do enough range requests in parallel to saturate typical last-mile pipes, and more intelligent clients can do fancy things to get max performance.
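
A minimal sketch of that simple approach, in Python with just the standard library (the URL and chunk size are placeholders, and a real client would also retry and handle servers that ignore Range):

    # Minimal sketch of a parallel range-request downloader.
    # Assumes the server honors Range requests; URL and sizes are placeholders.
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "https://example.com/big-file.iso"    # hypothetical
    CHUNK = 8 * 1024 * 1024                     # 8 MiB per range request

    def total_size(url):
        # HEAD request to learn the content length
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            return int(resp.headers["Content-Length"])

    def fetch_range(url, start, end):
        # One Range request for bytes [start, end], inclusive
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return start, resp.read()

    def download(url, path, workers=8):
        size = total_size(url)
        ranges = [(s, min(s + CHUNK, size) - 1) for s in range(0, size, CHUNK)]
        with open(path, "wb") as out, ThreadPoolExecutor(workers) as pool:
            for start, data in pool.map(lambda r: fetch_range(url, *r), ranges):
                out.seek(start)
                out.write(data)

    download(URL, "big-file.iso")

The fancier tricks below layer on top of this same primitive.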

For example, some clients will do range requests against all IPs returned from DNS resolution to detect which servers are "closer" or less busy, and for really large downloads they'll repeat this throughout the download to constantly meander towards the fastest sources. Another variation (which might be less common these days): if the initial response is a redirect, that may imply redirects are being used as a load-distribution mechanism, so again clients can re-ask throughout the download to see if a different set of servers gets offered up as potentially faster sources. Again, all of this works today with plain old HTTP.
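
A rough sketch of that multi-IP probing idea (the host and path are made up here, and real clients are considerably more careful about timeouts, HTTPS, and re-probing):

    # Sketch: probe every IP returned by DNS with a tiny Range request
    # and rank the addresses by response time. Host/path are placeholders.
    import http.client
    import socket
    import time

    HOST = "downloads.example.com"   # hypothetical
    PATH = "/big-file.iso"           # hypothetical

    def probe(ip):
        # Connect directly to the IP, but keep the Host header so virtual hosting still works
        start = time.monotonic()
        conn = http.client.HTTPConnection(ip, 80, timeout=5)
        try:
            conn.request("GET", PATH, headers={"Host": HOST, "Range": "bytes=0-0"})
            conn.getresponse().read()
            return time.monotonic() - start
        except OSError:
            return float("inf")
        finally:
            conn.close()

    ips = {info[4][0] for info in socket.getaddrinfo(HOST, 80, proto=socket.IPPROTO_TCP)}
    ranked = sorted(ips, key=probe)
    print("fastest source:", ranked[0])

A downloader can re-run this ranking periodically during a long transfer and shift its range requests toward whichever address is currently responding fastest.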

arp242 | 1 year ago

Last year I set up some QEMU VMs to test some things. I struggled mightily to get the FreeBSD one up and running. QEMU flags are not the easiest – lots of knobs to turn and levers to pull – but after quite a lot of time trying to get it to work, it turned out the installer ISO was simply damaged. D'oh. It's impossible to say why or how that happened, but probably during the download(?)

Since then I've started to check the sums after downloading, just to be sure.

I wish every binary format would include a hash of the content.

Also, this is something that could be part of HTTP – it's kind of silly that I need to manually download a separate sum file and run a command to check it. Servers could send a header, and user agents could verify the hash. I don't know why this isn't part of HTTP already; it seems pretty useful to me.
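
As a rough illustration of what that could look like client-side (the header name "X-Content-Sha256" is entirely made up here, just to show the shape of the idea):

    # Sketch of a user agent verifying a content hash sent by the server.
    # "X-Content-Sha256" is a hypothetical header name, purely for illustration.
    import hashlib
    import urllib.request

    def download_and_verify(url, path):
        h = hashlib.sha256()
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            expected = resp.headers.get("X-Content-Sha256")  # hypothetical header
            while chunk := resp.read(1 << 20):
                h.update(chunk)
                out.write(chunk)
        if expected and h.hexdigest() != expected.lower():
            raise IOError(f"checksum mismatch for {url}")

    download_and_verify("https://example.com/big-file.iso", "big-file.iso")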

varenc | 1 year ago

TCP has built-in checksums that catch most data corruption. I believe this is why it's not part of HTTP: TCP should already be doing this for you.

I’m guessing that for your very large file download you had an unusually high number of corrupted TCP packets and some of those were extra unlucky and still had valid checksums.
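
For context, the TCP checksum is only a 16-bit ones'-complement sum (RFC 1071), not a cryptographic hash, so a randomly corrupted segment has very roughly a 1-in-65536 chance of still checksumming correctly. A simplified sketch (payload only, no pseudo-header):

    # Simplified sketch of the 16-bit Internet checksum used by TCP (RFC 1071):
    # a ones'-complement sum of 16-bit words, with the carry folded back in.
    def internet_checksum(data: bytes) -> int:
        if len(data) % 2:
            data += b"\x00"                            # pad to an even length
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)   # end-around carry
        return ~total & 0xFFFF

    # Different payloads can collide easily, e.g. swapping two 16-bit words
    # leaves the checksum unchanged:
    a = b"\x12\x34\x56\x78"
    b = b"\x56\x78\x12\x34"
    print(hex(internet_checksum(a)), hex(internet_checksum(b)))  # same value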