top | item 43727328

JackC | 10 months ago

> I work for an org with close ties to arXiv, and just like us they are getting a lot more demand due to AI crawling

Funny, I also work on academic sites (much smaller than arXiv) and we're looking at moving from AWS to bare metal for the same reason. The $90/TB AWS bandwidth exit tariff can be a budget killer if people write custom scripts to download all your stuff; better to slow down than 10x the monthly budget.

(I never thought about it this way, but Amazon charges less to same-day deliver a 1TB SSD drive for you to keep than it does to download a TB from AWS.)
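
To put rough numbers on the budget point, here is a minimal sketch of the arithmetic. It assumes a flat $90/TB rate as quoted above; real pricing is tiered and varies by region, so the constant is illustrative. The function names are made up for this example.

```python
# Flat per-TB egress rate, taken from the figure quoted in the comment.
# Assumption: no tiering, no free allowance, no regional differences.
EGRESS_USD_PER_TB = 90.0


def egress_cost_usd(terabytes: float, usd_per_tb: float = EGRESS_USD_PER_TB) -> float:
    """Bandwidth bill for serving `terabytes` of data at a flat per-TB rate."""
    return terabytes * usd_per_tb


def full_downloads_within_budget(
    budget_usd: float, corpus_tb: float, usd_per_tb: float = EGRESS_USD_PER_TB
) -> int:
    """How many complete copies of the corpus can be downloaded before the budget is spent."""
    return int(budget_usd // egress_cost_usd(corpus_tb, usd_per_tb))


if __name__ == "__main__":
    # A hypothetical 2 TB corpus and a hypothetical $1000/month bandwidth budget.
    print(egress_cost_usd(2.0))
    print(full_downloads_within_budget(1000.0, 2.0))
```

Under these assumptions, a handful of scripts each mirroring the whole corpus is enough to use up a monthly budget.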

Imustaskforhelp|10 months ago

I don't understand, why don't you use Cloudflare? Don't they have an unlimited egress policy with R2?

It's way more predictable, in my opinion, since you only pay a fixed monthly amount for your storage. It also helps that it's on the edge, so users would get files much faster than with, say, bare metal (unless you provision a multi-server setup, in which case I'd guess you'd be using Kubernetes, and that might be a mess to handle).

sitkack|10 months ago

Regardless, if you are delivering PDFs, you should be using a CDN.

If crawling is a problem: (1) it is pretty easy to rate limit crawlers, (2) point them at a Requester Pays bucket, and (3) offer a torrent with anti-leech.
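
For the rate-limiting option, one common approach is a per-client token bucket. The sketch below is only an illustration of that technique, not anything a particular site or CDN uses: the class names, the choice of client key (IP, user agent, API key), and the rate values are all assumptions.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class CrawlerLimiter:
    """Keeps one bucket per client identifier (hypothetical key, e.g. an IP)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = CrawlerLimiter(rate=1.0, capacity=2.0)  # 1 request/s, burst of 2
    for i in range(4):
        print(i, limiter.allow("203.0.113.7"))
```

A request that gets `False` would typically be answered with HTTP 429. In practice this logic usually lives in the reverse proxy or CDN rather than in application code.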

mcmcmc|10 months ago

Could have something to do with Cloudflare’s abhorrent sales practices.

ryao|10 months ago

The two are not comparable. The 1TB of transit at Amazon can be subdivided over many recipients, while the solid state drive is empty and can only be sent to one.

That said, I agree that transit costs are too high.

fc417fc802|10 months ago

So order multiple drives, transfer the data to them, and drop them in the mail to the client. That should always be the higher bandwidth option, but in a sane world it would also be less cost effective given the differences in amount of energy and sorts of infrastructure involved.

The reason to switch away from fiber should be sustained aggregate throughput, not transfer cost.
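
As a back-of-the-envelope check on the throughput side, the effective bandwidth of mailed drives is the total payload divided by door-to-door time. This is a sketch under stated assumptions: decimal terabytes, a made-up function name, and a transit time that ignores how long it takes to copy data onto and off the drives.

```python
def effective_mbps(terabytes: float, transit_hours: float) -> float:
    """Average throughput, in megabits per second, of shipping `terabytes` in `transit_hours`.

    Assumes 1 TB = 10**12 bytes and ignores time spent writing and reading the drives.
    """
    bits = terabytes * 1e12 * 8
    seconds = transit_hours * 3600
    return bits / seconds / 1e6


if __name__ == "__main__":
    # One 1 TB drive with a hypothetical 24-hour delivery time.
    print(round(effective_mbps(1.0, 24.0), 2))
    # Ten such drives in the same parcel scale the throughput linearly.
    print(round(effective_mbps(10.0, 24.0), 2))
```

Throughput grows with the number of drives in the box while delivery time stays the same, so shipping only beats a link on sustained aggregate volume.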