That reminds me of the entertaining "I just want to serve 5 terabytes. Why is this so difficult?" video that someone made inside Google. It satirizes the difficulty of getting things done at production scale.
Nothing in that video is about scale. Or the difficulty of serving 5TB. It's about the difficulty of implementing n+1 redundancy with graceful failover inside cloud providers.
User: "I want to serve 5TB."
Guru: "Throw it in a GKE PV and put nginx in front of it."
Congratulations, you are already serving 5TB at production scale.
The interesting thing is that there are also paradoxes of large scale: things that get more difficult with increasing size.
Medium- and smaller-scale can often be more flexible because they don't have to incur the pain of nonuniformity as scale increases. While they may not be able to afford optimizations or discounts with larger, standardized purchases, they can provide more personalized services large scale cannot hope to provide.
Depends on what exactly you want to do with it. Hetzner has very cheap Storage boxes (10TB for $20/month with unlimited traffic) but those are closer to FTP boxes with a 10 connection limit. They are also down semi-regularly for maintenance.
For rock-solid public hosting Cloudflare is probably a much better bet, but you're also paying 7 times the price. More than a dedicated server to host the files, but you get more on other metrics.
> Hetzner has very cheap Storage boxes (10TB for $20/month with unlimited traffic)
* based on fair use
at 250 TB/mo:
> In order to continue hosting your servers with us, the traffic use will need to be drastically reduced. Please check your servers and confirm what is using so much traffic, making sure it is nothing abusive, and then find ways of reducing it.
That's if you use their CDN. Cloudflare R2 doesn't charge for egress bandwidth. If you have 100TB/mo to serve, try it and see what happens. I haven't heard of anyone being kicked off of R2 for using too much egress bandwidth yet.
At scale, you'll pay a couple thousand dollars for Class B operations on R2, and another bunch for storing the 10 TB in the first place, but that's relatively cheap compared to other offerings where you'd pay for metered egress bandwidth.
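As a rough sketch of that math (the rates below are assumptions based on R2's published pricing at the time; verify against the current price list and calculator linked below):

```python
# Back-of-envelope R2 bill. Assumed rates: ~$0.015/GB-month storage,
# ~$0.36 per million Class B (read) operations. Egress itself is $0 on R2.
STORAGE_PER_GB_MONTH = 0.015   # USD, assumed
CLASS_B_PER_MILLION = 0.36     # USD per million reads, assumed

def r2_monthly_cost(stored_tb: float, class_b_ops: int) -> float:
    storage = stored_tb * 1000 * STORAGE_PER_GB_MONTH
    reads = class_b_ops / 1_000_000 * CLASS_B_PER_MILLION
    return storage + reads

# Example: 10 TB stored, 100M object reads in a month:
print(round(r2_monthly_cost(10, 100_000_000), 2))  # 186.0
```

Scale the read count up into the billions and you get the "couple thousand dollars of Class B" scenario, still with zero metered egress.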
I'd suggest looking into "seedboxes" which are intended for torrenting.
I suspect the storage will be a bigger concern.
Seedhost.eu has dedicated boxes with 8TB storage and 100TB bandwidth for €30/month. Perhaps you could have that and a lower spec one to make up the space.
Prices are negotiable so you can always see if they can meet your needs for cheaper than two separate boxes.
> I'd suggest looking into "seedboxes" which are intended for torrenting.
Though be aware that many (most?) seedbox arrangements have no redundancy; in fact some are running off RAID0 arrays or similar. If the host has a problem like a dead drive, bang goes your data. Some are very open about this (after all, for the main use case cheap space is worth the risk), some far less so…
Of course if the data is well backed up elsewhere or otherwise easy to reproduce or reobtain this may not be a massive issue and you've just got restore time to worry about (unless one of your backups can be quickly made primary so restore time is as little as a bit of DNS & other configuration work).
Yep, resellers of dedicated machines rent servers in bulk so you can often get boxes for way cheaper than you would directly from the host. Take a look at https://hostingby.design as an example.
It's impossible to answer this question without more information. What is the use profile of your system? How many clients, how often, what's the burst rate, what kind of reliability do you need? These all change the answer.
"Impossible", yet many others have succeeded commendably... explore what they can do but you cannot. Or else offer examples wherein your constraints exist and drive another solution. "No solution without more info" is a cop-out.
Consider storing the data on Backblaze B2 ($0.005/GB/month) and serving content via Cloudflare (egress from B2 to Cloudflare is free through their Bandwidth Alliance).
(No affiliation with either; just a happy customer for a tiny personal project)
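For scale, the storage side of that combination is cheap (using the $0.005/GB/month rate quoted above; egress from B2 to Cloudflare is free under the Bandwidth Alliance):

```python
# Monthly B2 storage bill for 10 TB at the $0.005/GB-month rate quoted above.
stored_gb = 10 * 1000
b2_storage_usd = stored_gb * 0.005
print(round(b2_storage_usd, 2))  # 50.0 -> ~$50/month; B2 -> Cloudflare egress is free
```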
Man, thanks so much for this. I’m using Wasabi with a Yarkon front end right now and it’s great, but Backblaze/Cloudflare is looking like a serious contender.
BuyVM has been around a long time and have a good reputation. I’ve used them on and off for quite a while.
They have very reasonably priced KVM instances with unmetered 1G (10G for long-standing customers) bandwidth that you can attach “storage slabs” up to 10TB ($5 per TB/mo). Doubt you will find better value than this for block storage.
At some point you still need a seed for that 10TB of data with some level of reliability. WebTorrent only solves the monthly bandwidth iff you've got some high capacity seeds (your servers or long-term peers).
And they just added TCP client sockets in Workers. We are just one step away from being able to serve literally anything on their amazing platform (listener sockets).
They don't really maintain the regular Sync client anymore, only the expensive enterprise Connect option. My wife and I used Resilio Sync for years, but had to migrate away, since it had bugs and issues with newer OS versions that they didn't care to fix, let alone develop new features for.
If price is a consideration, you might consider two 10 TB hard drives on machines on two home gbps Internet connections. It's highly unlikely that both would go down at the same time, unless they were in the same area, on the same ISP.
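A hedged sketch of why two sites help, assuming uncorrelated failures and an illustrative 99% per-site uptime (real home connections may fail in correlated ways, as the comment notes):

```python
# Probability both home nodes are down at once, assuming independent failures.
per_site_uptime = 0.99
both_down = (1 - per_site_uptime) ** 2
print(round(both_down, 6))                 # 0.0001
print(round(both_down * 365 * 24, 1))      # roughly an hour of joint downtime per year
```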
I would also like to ask everyone about suggestions for deep storage of personal data, media etc. 10TB with no need for access unless in case of emergency data loss. I'm currently using S3 intelligent tiering.
I like to use rsync.net for backups. You can use something like borg, rsync, or just sftp/sshfs mount. It's not as cheap as something like S3 deep (in terms of storage) but it is pretty convenient. The owner is an absolute machine and frequently visits HN too.
S3 is tough to beat on storage price. Another plus is that the business model is transparent, i.e., you don't need to worry about the pricing being a teaser rate or something.
Of course the downside is that, if you need to download that 10TB, you'll be out $900! If you're worried about recovering specific files only this isn't as big an issue.
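Where the ~$900 figure comes from, assuming S3's typical ~$0.09/GB internet-egress tier (tiered discounts at higher volumes ignored for simplicity):

```python
# Rough S3 egress cost for a full 10 TB restore at ~$0.09/GB.
restore_gb = 10 * 1000
egress_usd = restore_gb * 0.09
print(round(egress_usd))  # 900
```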
Wasabi is the best option for you. 10TB would be around $60/month, and they offer free egress up to the amount of your storage, so you can download up to 10TB per month.
Glacier Deep Archive is exactly what you want for this, that would be something like $11/month ongoing, then about $90/TB in the event of retrieval download. Works well except for tiny (<150KB) files.
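The economics roughly check out; a hedged sketch using assumed list prices of $0.00099/GB-month at rest, plus a small bulk-retrieval fee and ~$0.09/GB egress on the way out:

```python
# Hedged Deep Archive math (assumed list prices, verify before relying on them).
at_rest_usd = 10_000 * 0.00099                # 10 TB stored
per_tb_restore_usd = 1_000 * (0.0025 + 0.09)  # bulk retrieval + internet egress
print(round(at_rest_usd, 2))         # ~9.90/month, in line with "about $11"
print(round(per_tb_restore_usd, 2))  # ~92.50/TB, in line with "about $90/TB"
```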
Note that there are both Glacier and Glacier Deep Archive. The latter is cheaper but has longer minimum storage periods. You can use it via a lifecycle rule.
I helped run a wireless research data archive for a while. We made smaller data sets available via internet download but for the larger data sets we asked people to send us a hard drive to get a copy. Sneakernet can be faster and cheaper than using the internet. Even if you wanted to distribute 10TB of _new_ data every month, mailing hard drives would probably be faster and cheaper, unless all your customers are on Internet2 or unlimited fiber.
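The effective bandwidth of sneakernet is easy to underestimate; as an illustration (assuming a 10 TB drive and roughly 48 hours in transit):

```python
# Effective throughput of mailing a 10 TB drive with ~48 hours in transit.
tb = 10
transit_seconds = 48 * 3600
mbit_per_s = tb * 1e12 * 8 / transit_seconds / 1e6
print(round(mbit_per_s))  # ~463 Mbit/s sustained, faster than most home uplinks
```

Terrible latency, of course, but for bulk one-off transfers that rarely matters.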
The answer to this question depends entirely on the details of the use case. For example, if we're talking about an HTTP server where a small number of files are more popular and are accessed significantly more frequently than most others, you can get a bunch of cheap VPS with low storage/specs but a lot of cheap bandwidth to use as cache servers to significantly reduce the bandwidth usage on your backend.
I always assumed having a raspberry pi with a couple HDs in raid1 with IPFS or torrent would be the best way to do this.
Giving another one of these raid1 rpis to a friend could make it reasonably available.
I am very interested to know if there are good tools around this though, such as a good way to serve a filesystem (nfs-like for example) via torrent/ipfs and if the directories could be password protected in different ways, like with an ACL. That would be the revolutionary tech to replace huggingface/dockerhub, or Dropbox, etc.
If you just want to be able to sync a directory between multiple devices with encryption options, I'd recommend Syncthing. It's dead easy to set up; I've currently got it on a rpi backing up all my photos from my phone while syncing my Obsidian vault between my phone and desktop.
Hell, make me a fair offer and I'll throw it up on ye olde garage cluster. That thing has battery backup, a dedicated 5 Gbps pipe, and about 40 TB free space on Ceph. I'll even toss in free incident response if your URL fails to resolve. But it'll probably be your fault, cause I haven't needed a maintenance window on that thing in like three years.
Spend some time on https://www.webhostingtalk.com/ and you will find a lot of info. For example, https://www.fdcservers.net/ can give you 10TB storage and 100GB bw for around $300... but keep in mind: the lower the price you pay, the lower the quality, just like any other product.
OVH is probably your best bet and should be the cheapest both for hosting and serving the files. You'd be hard pressed to beat the value there without buying your own servers and colocating in eastern Europe.
Most of their storage servers have 1gbps unmetered public bandwidth options and that should be sufficient to serve ~4TB per day, reliably.
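A quick sanity check on that claim (idealized; real-world utilisation will be lower than line rate):

```python
# A fully saturated 1 Gbit/s port moves ~10.8 TB/day,
# so ~4 TB/day is a comfortable ~37% average utilisation.
bytes_per_day = 1e9 / 8 * 86400
tb_per_day = bytes_per_day / 1e12
print(round(tb_per_day, 1))  # 10.8
```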
Surprised no one has said Cloudflare Pages. Might not work though depending on your requirements since there’s a max of 20,000 files of no more than 25 mb per project. But if you can fit under that, it’s basically free. If your requirements let you break it up by domain, you can split your data across multiple projects too. Latency is amazing too since all the data is on their CDN.
Smaller VPS providers are a good value for this. I'm currently using ServaRICA for a 2TB box, $7/mo. I use it for some hosting, but mostly for incremental ZFS backups. Storage speed isn't amazing, but it suits my use case.
I'm using cloudflare R2 for a couple hundred GB, where I needed something faster.
I think 2x 1 Gb/s symmetric home fibers + a SuperMicro 12x SATA Atom Mini-ITX with Samsung drives can solve this fairly cheaply and durably, depending on write intensity.
That said, anything above 80 TB looks hard to keep running forever, unless you can provide backup power and endure the noise of spinning drives.
You could do this for about $1k/mo with Linode and Wasabi.
For FastComments we store assets in Wasabi and have services in Linode that act as an in-memory+on disk LRU cache.
We have terabytes of data but only pay $6/mo for Wasabi, because the cache hit ratio is high and Wasabi doesn't charge for egress until your egress is more than your storage or something like that.
The rest of the cost is egress on Linode.
The nice thing about this is we get lots of storage and downloads are fairly fast; most assets are served from memory in userspace.
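The cache-hit arithmetic explains why the origin bill stays tiny (illustrative numbers, not FastComments' real figures):

```python
# Only cache misses ever touch origin storage, so with a high hit ratio
# origin egress stays under Wasabi's "egress <= storage" free allowance.
total_egress_tb = 20
hit_ratio = 0.98
origin_egress_tb = total_egress_tb * (1 - hit_ratio)
print(round(origin_egress_tb, 2))  # 0.4 TB/month actually leaves Wasabi
```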
Following thread to look for even cheaper options without using cloudflare lol
Hetzner has excellent connectivity: https://www.hetzner.com/unternehmen/rechenzentrum/
They are always working to increase their connectivity. I'd even go so far to claim that in many parts of the world they outperform certain hyperscalers.
Sounds like you could find someone with a 1Gbps symmetric fiber net connection, and pay them for it and colo. I have 1Gbps and push that bandwidth every month. You know, for yar har har.
And that's only 309 Mbit/s (or ~39 MB/s).
And with a used refurbished server you can easily get loads of RAM, cores out the wazoo, and dozens of TBs for under $1000. You'll need a rack, router, switch, and battery backup. Shouldn't cost much more than $2000 for all of this.
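Where the ~309 Mbit/s figure comes from: 100 TB/month spread evenly over a 30-day month.

```python
# 100 TB/month expressed as a sustained transfer rate.
bits = 100e12 * 8
mbit_per_s = bits / (30 * 86400) / 1e6
print(round(mbit_per_s))      # 309 Mbit/s
print(round(mbit_per_s / 8))  # ~39 MB/s
```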
I once had a Hetzner dedicated server that held about 1 TB of content and did some terabytes of traffic per month (record being 1 TB/24 hours). Hetzner charged me 25€/month for that server and S3 would've been like $90/day at peak traffic.
you can definitely do this at home on the cheap. As long as you have a decent internet connection, that is ;)
10TB+ hard disks are not expensive; you can put them in an old enclosure together with a small industrial or NUC PC in your basement.
I current have 45 WUH721414ALE6L4 drives in a Supermicro JBOD SC847E26 (SAS2 is way cheaper than SAS3) connected to an LSI 9206-16e controller (HCL reasons) via hybrid Mini SAS2 to Mini SAS3 cables. The SAS expanders in the JBOD are also LSI and qualified for the card. The hard drives are also qualified for the SAS expanders.
I tried this using Pine ROCKPro64 to possibly install Ceph across 2-5 RAID1 NAS enclosures. The problem is I can't get any of their dusty Linux forks to recognize the storage controller, so they're $200 paperweights.
I wrote a SATA HDD "top" utility that brings in data from SMART, mdadm, lvm, xfs, and the Linux SCSI layer. I set monitoring to look for elevated temperature, seek errors, scan errors, reallocation counts, offline reallocation, and probational count.
> If your monthly egress data transfer is less than or equal to your active storage volume, then your storage use case is a good fit for Wasabi’s free egress policy
> If your monthly egress data transfer is greater than your active storage volume, then your storage use case is not a good fit for Wasabi’s free egress policy.
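The quoted policy as a simple predicate (simplified; Wasabi compares monthly egress against active storage):

```python
# "Free egress" fit test, per the quoted Wasabi policy (simplified).
def fits_wasabi_free_egress(stored_tb: float, egress_tb_month: float) -> bool:
    return egress_tb_month <= stored_tb

print(fits_wasabi_free_egress(10, 8))    # True
print(fits_wasabi_free_egress(10, 100))  # False: 100 TB/mo out of 10 TB stored doesn't fit
```

So the 10 TB storage / 100 TB bandwidth ask in this thread is exactly the case the policy excludes.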
10TB storage + 100TB bandwidth on S3 will easily be 1000+ USD per month, while there are solutions out there that are fast and secure with unrestricted bandwidth for less than 100 USD per month. A magnitude cheaper with the same grade of "enterprisey".
kens|2 years ago
https://www.youtube.com/watch?v=3t6L-FlfeaI
fragmede|2 years ago
https://developers.cloudflare.com/r2/pricing/ https://r2-calculator.cloudflare.com/
throwaway2990|2 years ago
Support seems nonexistent; no one answers emails or web chat…
qeternity|2 years ago
Throw that in RAID10 and you'll have 12TB usable space with > 300TB bandwidth.
indigodaddy|2 years ago
https://buyvm.net/block-storage-slabs/
wwwtyro|2 years ago
[0] https://github.com/webtorrent/webtorrent
andai|2 years ago
It's like Dropbox except peer to peer. So it's free, limited only by your client side storage.
The catch is it's only peer to peer (unless they added a managed option), so at least one other peer must be online for sync to take place.
callamdelaney|2 years ago
See: https://wasabi.com/cloud-storage-pricing/#cost-estimates
They could really do with making the bandwidth option on this calculator better.
sireat|2 years ago
That is yourdomain.com -> IP_ISP1, IP_ISP2
Going the other way from yourserver -> outside would indicate some sort of bonding setup.
It is not trivial for a home lab.
I use 3 ISPs at home and just keep each network separate (different hardware on each) even though in theory the redundancy would be nice.
chaxor|2 years ago
Anyone know of or are working on such tech?
ignoramous|2 years ago
I briefly looked at services selling storage on FileCoin / IPFS and Chia, but couldn't find anything that inspired confidence.
justinclift|2 years ago
Also, any idea on the number of users (both average and peak) you'd expect to be downloading at once?
Does latency of their downloads matter? eg do downloads need to start quickly like a CDN, or as long as they work is good enough?
cheeseprocedure|2 years ago
https://www.datapacket.com/pricing
j45|2 years ago
Assuming the simplest need is making files available:
1) Sync.com provides unlimited hosting and file sharing from it.
Sync is a decent Dropbox replacement with a few more bells and whistles.
2) Backblaze business lets you deliver files for free via their CDN: $5/TB per month storage plus free egress via their CDN.
https://www.backblaze.com/b2/solutions/developers.html
Backblaze claims to be 70-80% cheaper than S3.
Traditional "best practice" cloud paths are optimized to generate profit for the cloud provider.
Luckily, you're rarely alone or the first to have a need.
mindcrash|2 years ago
You might want to check out OVH or - like mentioned before - Hetzner.
api|2 years ago
Cloud bandwidth is absolutely enormously overpriced.
sacnoradhq|2 years ago
I can't justify colo unless I can get 10U for $300/month with 2kW of PDU, 1500 kWh, and 1 GbE uncapped.
amluto|2 years ago
http://he.net/colocation.html
qeternity|2 years ago
This is still crazy expensive. Cloud providers have really warped people’s expectations.
winrid|2 years ago
You could just run Varnish with the S3 backend. Popular files will be cached locally on the server, and you'll pay a lot less for egress from Wasabi.
KaiserPro|2 years ago
Who are you serving it to?
How often does the data change?
Is it read only?
What are you optimising for: speed, cost, or availability? (pick two)
slashdev|2 years ago
https://wasabi.com/paygo-pricing-faq/
aborsy|2 years ago
If you recover only small amounts of data, it's also not expensive. The only problem is recovering a large amount; that would be a major cost.