Cloud Storage for $2 per TB per month

659 points| beedrillzzzzz | 6 years ago |blog.sia.tech

334 comments

[+] kmod|6 years ago|reply
I worked on the design of Dropbox's exabyte-scale storage system, and from that experience I can say that these numbers are all extremely optimistic, even with their "you can do it cheaper if you only target 95% uptime" caveat. Networking is much more expensive, labor is much more expensive, space is much more expensive, depreciation is faster than they say, etc etc. I don't think the authors have ever done any actual hardware provisioning before.

I didn't read all their math but I expect their final result to be off by a factor of 2-5x. Hard drives are a surprisingly low percentage of the cost of a storage system.

[+] Taek|6 years ago|reply
Author here. A lot of these numbers are drawn from experience in the mining world, where people realized that when cost is the ultimate bottom line, a lot of corners can be cut.

Sia systems don't need a ton of networking. I ran the networking buildout costs by some networking people, and again it comes down to cutting corners. If you only need 10 Gbps per rack, if you don't mind having extra milliseconds added, etc, you can get away with very scrappy setups. The whole point is that it's not a highly reliable facility.

[+] chx|6 years ago|reply
> I didn't read all their math but I expect their final result to be off by a factor of 2-5x.

Can't be more than 2.5 because Backblaze B2 already gives you $5/TB/Mo.
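
A quick sketch of the arithmetic behind that bound, assuming the $2/TB/month target from the article title and the $5/TB/month B2 price quoted in this comment. The function name is made up for illustration.

```python
def implied_price(base_usd_per_tb_month: float, factor: float) -> float:
    """Scale a per-TB monthly price estimate by an error factor."""
    return base_usd_per_tb_month * factor

SIA_TARGET = 2.0  # $/TB/month, from the article title
B2_PRICE = 5.0    # $/TB/month, as quoted in this comment

# Largest error factor that keeps the estimate at or below B2's price.
max_factor = B2_PRICE / SIA_TARGET

for factor in (2.0, max_factor, 5.0):
    price = implied_price(SIA_TARGET, factor)
    print(f"{factor:.1f}x -> ${price:.2f}/TB/month")
```

The upper end of the "2-5x" range would put the estimate at $10/TB/month, which is above what B2 already charges.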

[+] vikiomega9|6 years ago|reply
> exabyte-scale storage system

Somewhat of a random question, can you point me to some state of the art research?

[+] z3t4|6 years ago|reply
Storage can be far cheaper when decentralized. Sending data over the Atlantic is super expensive compared to LAN networking. Almost all content providers peer with ISPs with onsite hardware. But why stop there? Put the "racks" in ppl's basements. Data storage is very compact nowadays; you can probably fit 100 TB in a shoebox.
[+] winfred|6 years ago|reply
>I didn't read all their math but I expect their final result to be off by a factor of 2-5x.

I looked at their parts list and it's obvious they aren't serious. CPU is missing, memory is missing, SAS to SATA cables, but no SAS controller, no mounting for the system board. Low effort at best.

[+] notyourday|6 years ago|reply
We have done this calculation, and even if you put your gear into Equinix/Digital Realty in the most expensive places and use a Backblaze-type setup (which is not optimized and is bought at retail), bringing 10Gbit to every 4U, the price for double-writes on 5TB disks is $10/year per TB.
[+] FalconSensei|6 years ago|reply
> Hard drives are a surprisingly low percentage of the cost of a storage system

THIS! There's no use in having 2TB of storage if you can't upload/download that amount each month

[+] late2part|6 years ago|reply
I agree with you. Not 2-5x but they are rounding down on costs and optimistic on risks.
[+] walrus01|6 years ago|reply
I work in telecom/datacenter infrastructure and this is fanciful. The whole way they take the wattage load of one machine and then hand wave away all of the rest of the costs of either building and running a datacenter, or paying ongoing monthly colocation costs... is just scary. I truly don't mean to offend anyone but this looks like a bunch of enthusiastic dilettantes.

Generators?

UPS?

Cooling costs?

Square footage costs for the real estate itself?

Security and staffing?

At the scale they intend to accomplish they will need at minimum several hundred kilowatts of datacenter space. Even assuming somewhere with a very low kWh cost of electricity, that much space for bare metal things isn't cheap. Go price a lot of square footage and 300kW of equipment load in Quincy, WA or anywhere else comparable, the monthly recurring dollar figure will be quite high.

And all of that is before you even start to look into network costs to build a serious IP network and interconnect with transits and peers.

[+] Ajedi32|6 years ago|reply
They're not talking about a datacenter. Datacenters need to be reliable. Sia storage pools don't, because security and reliability is achieved at the global network level, not at the level of individual systems or storage pools. 95% reliability means you can be down for two whole weeks out of every year and still be well within acceptable uptime requirements.

Generators? Who needs those? Just wait for the power to come back on. UPS? Why bother? Square footage? Stick some wooden shelves in the cheapest building possible. Cooling? Locate in a cold climate and buy some window fans.

This isn't anything like the sort of infrastructure you're used to dealing with. Think Bitcoin mining farm, not Backblaze datacenter. Any corners that can be cut will be.
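
For reference, a small sketch of how an uptime target translates into a yearly downtime budget. It ignores leap years, and the function name is only illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years

def allowed_downtime_days(uptime: float) -> float:
    """Days per year a host may be offline while still meeting `uptime`."""
    if not 0.0 <= uptime <= 1.0:
        raise ValueError("uptime must be a fraction between 0 and 1")
    return (1.0 - uptime) * HOURS_PER_YEAR / 24

for target in (0.95, 0.99, 0.99999):
    days = allowed_downtime_days(target)
    print(f"{target:.5f} uptime -> {days:.4f} days/year offline")
```

At 95% the budget works out to roughly 18 days a year, so two full weeks of downtime does fit inside it.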

[+] Taek|6 years ago|reply
It's super interesting to dive into the world of cryptocurrency mining, where some DCs are getting PUE of 1.1 or better with a buildout that basically amounts to shelves and box fans.

No generators, just eat the downtime. No batteries. No 24/7 staff. No racks, just shelves (folded sheet metal is cheap). Security varies from farm to farm.

These servers don't need to run cool, as long as you are in a climate that doesn't get over 100 degrees you can get away with fans and no AC.
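
To make the PUE figure concrete: PUE is total facility power divided by IT equipment power, so the overhead is whatever exceeds 1.0. The sketch below uses the 300 kW load mentioned elsewhere in this thread; the 1.6 comparison value is an assumption chosen only for contrast.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

def overhead_kw(it_load_kw: float, pue_value: float) -> float:
    """Power spent on everything that is not IT gear (cooling, losses, lighting)."""
    return it_load_kw * (pue_value - 1.0)

IT_LOAD_KW = 300.0  # equipment load figure mentioned upthread

# 1.1 is the figure from this comment; 1.6 is a hypothetical comparison point.
for value in (1.1, 1.6):
    print(f"PUE {value}: {overhead_kw(IT_LOAD_KW, value):.0f} kW of overhead")
```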

[+] jrockway|6 years ago|reply
I think it's interesting to dive into the economics of 95% uptime. Maybe you don't need full datacenter cooling; you can just locate in a cool climate and have a fan in the window blowing cold air in. If there's a blizzard, you lose your drives because snow blows in and melts on your drives. If the chance of the blizzard * price of drives is less than needing 750W of cooling, then you win. Yeah, sometimes everything shorts out.

Power is similar. Maybe you just use solar and turn the drives off when it's cloudy. With enough distribution throughout the world, it will probably be sunny somewhere.

I haven't done the math and I'm not saying it will work out favorably. I also don't have a use case for 95% availability. (That's two weeks a year where your data is gone!) But it's something that someone with the right needs could consider, and maybe come out ahead of someone shooting for 5 nines and drives that aren't covered in snow.
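
The comparison described above can be sketched as an expected-cost check. Only the 750W figure comes from the comment; the blizzard probability, drive price, and electricity rate are placeholder assumptions, and the capital cost of cooling equipment is left out.

```python
HOURS_PER_YEAR = 365 * 24

def expected_annual_loss(p_event_per_year: float, replacement_cost_usd: float) -> float:
    """Expected yearly cost of an event that destroys the drives."""
    return p_event_per_year * replacement_cost_usd

def annual_cooling_cost(cooling_watts: float, usd_per_kwh: float) -> float:
    """Yearly electricity cost of running cooling continuously."""
    return cooling_watts / 1000.0 * HOURS_PER_YEAR * usd_per_kwh

P_BLIZZARD = 0.02          # ASSUMPTION: chance per year that snow kills the drives
DRIVE_COST = 32 * 120.0    # ASSUMPTION: 32 drives at a hypothetical $120 each
USD_PER_KWH = 0.10         # ASSUMPTION: hypothetical electricity rate
COOLING_WATTS = 750.0      # figure from the comment

loss = expected_annual_loss(P_BLIZZARD, DRIVE_COST)
cooling = annual_cooling_cost(COOLING_WATTS, USD_PER_KWH)
print(f"expected loss ${loss:.2f}/yr vs cooling ${cooling:.2f}/yr")
print("skip the cooling" if loss < cooling else "buy the cooling")
```

With different inputs the answer flips, which is the point of the comment: it depends entirely on the local numbers.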

[+] wmf|6 years ago|reply
They're not talking about a real datacenter; they're talking about a deathtrap crypto mine with hard disks instead of GPUs/ASICs. Can't you run a million hard disks off a consumer cable modem? ;-)

At the (slow) rate Sia is growing, I don't think there will ever be enough demand to justify this design anyway.

[+] snazz|6 years ago|reply
It sounds like they're intentionally forgoing backup power and UPSs, as well as 24/7 staffing. However, you're right that this is probably pretty optimistic.
[+] late2part|6 years ago|reply
If you’re paying more than $300k/mo for 300 kW of power in Quincy, the DC sales guy probably bought your purchasing people a boat.
[+] late2part|6 years ago|reply
I don’t really agree. Retail Datacenter pricing all in should be well under $500/kW; half of what they suggested.
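
The per-kW arithmetic in these two replies, as a short sketch. The inputs are the figures quoted in the comments; the function names are illustrative.

```python
def usd_per_kw_month(monthly_bill_usd: float, load_kw: float) -> float:
    """Colocation price expressed per kW of load per month."""
    return monthly_bill_usd / load_kw

def monthly_bill(load_kw: float, usd_per_kw: float) -> float:
    """Monthly bill for a given load at a given per-kW price."""
    return load_kw * usd_per_kw

LOAD_KW = 300.0

# $300k/month for 300 kW is the ceiling named in the first reply.
print(usd_per_kw_month(300_000.0, LOAD_KW), "USD per kW per month")

# $500/kW is the "all in" retail figure from the second reply.
print(monthly_bill(LOAD_KW, 500.0), "USD per month at $500/kW")
```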
[+] mtlynch|6 years ago|reply
In 2018, I spent about six weeks running a series of tests to measure Sia's real world costs. At that time, storage cost ~$4.50/TB on Sia to back up large real world files (backups of DVDs and Blu-Rays).[0] Community members have re-run my tests every few months, most recently in October 2019, when the cost was measured at $1.31/TB, though it's worth noting that recent tests use synthetic data optimized to minimize Sia's cost.[1] It's also unclear how much the market value of Sia's utility token affects these costs, as the price of Siacoin has fallen by ~80% since I conducted my original set of tests.

The calculations in today's blog post account for the labor cost of assembling hardware, but leave out major other labor costs:

1. You need an SRE to keep the servers online. Sia pushes out updates every few months, and the network penalizes you if you don't upgrade to the latest version. In addition, to optimize costs, you need to adjust your node's pricing in response to changes in the market.

2. You need a compliance officer to handle takedown requests. Since Sia allows anyone to upload data to your server without proving their identity, there's nothing stopping anyone from uploading illegal data to the network. If Sia reached the point where people are building $4k hosting rigs, then it's safe to assume clients would also be using Sia to store illegal data. When law enforcement identifies illegal data, they would send takedown notices to all hosts who are storing copies of it, and those hosts would need someone available to process those takedowns quickly.

[0] https://blog.spaceduck.io/load-test-wrapup/

[1] https://siastats.info/benchmarking

[+] reggieband|6 years ago|reply
I'm going through Sia's website now. It seems this article is meant to bolster the claim on their website which states "When the Sia network is fully optimized, pricing will fall somewhere around $2/TB/month." [1]

Call me skeptical but it seems that they aren't committing to building out this infrastructure themselves or providing a specific amount of storage at this pricing. They seem to be outlining a potential infrastructure that some enterprising individual (or corporation) could use to provide storage at that price to "renters" within their marketplace.

I guess I'll just wait until someone puts their money where their mouth is. Given that this is a marketplace, the fact that a theoretical setup could be built to provide some service doesn't necessarily guarantee it will be built.

1. https://support.sia.tech/article/thvymhf1ff-about-renting

[+] tastroder|6 years ago|reply
Since you're bringing up the website... I don't get this marketing strategy. The cryptocurrency angle is just as off putting as telling me as a potential customer that my data will be stored on janky servers in unreliable places, no matter if the uptime is the same. OP even claims they have reliability and experience I'd consider but those aspects sure send signals that make me not want to deal with that stack.

Just looking at the website for Sia I see a bunch of fluffy marketing stuff, fair enough, that's normal these days. But where is the selling point? https://sia.tech/technology tells me my data is stored securely and in a redundant manner, great, just like any storage provider. That is followed by "Renters And Hosts Pay With Siacoin" and talks about payment channels, which links to a Wikipedia article and not something that tells me how I would even pay them, not to even talk about how much (I saw the calculator thingy on my way to that site, the messaging is still weird).

The "Getting started" call to action is a similar experience, a bunch of downloads, cool - I don't even know if you're right for me yet. I'm five levels deep into the "Getting started guide" linked there and so far found that I'd apparently have to deal with weird crypto exchanges to pay somebody for this, plus I couldn't use most of my pretty standard tooling anymore (at least not without involving one of those proxy things on the getting started page that cover a few use cases, some of which seem to be operated by others?).

[+] jakear|6 years ago|reply
> That means about 2 hours of labor per rig. We’ll call that $50

Does that seem low to anyone else? I don’t really have any background in the area, but 25/hr cost to the company would be less than 20/hr pay for the skilled labor. Other countries are different of course, but in US I could make that much flipping burgers in the right area.

[+] eyegor|6 years ago|reply
It's outrageously low. They're also fancifully assuming cpu tdp = electrical power, cooling = 0w, and another 0w to the motherboard/network cards. And each box has 1 non-redundant, schmuck-grade $80 psu, as well as a consumer grade mobo. This would never be anywhere near their uptime.
[+] Taek|6 years ago|reply
Assembling computers and replacing drives is generally speaking unskilled labor, right there next to factory workers.
[+] lucb1e|6 years ago|reply
If you're from the USA, perhaps just ×2 any price you read if that helps you get through the article. Work that needs doing doesn't always need to be done in high-income/high CoL places.

I earn about €26/h before taxes in western Europe, an income which lets me live in relative luxury (not "private jet" luxury, but I literally do anything I want and still save more than a third of my income with a 36-hour work week), and that's for security consultancy which is way more specialised than the job you're talking about. I think it's also above the national average, but I don't have the statistics on hand. Not sure what the cost to the company is, I think they put in another hundred a month for health insurance or pension or something (they pay 50% and I pay 50%, though I don't see why keeping 50% off my payslip helps anyone, an employer will just deduct that from the salary they can offer) plus some overhead for accounting and whatever, but it's probably not that far off.

[+] chrisseaton|6 years ago|reply
I don't think they mean engineer time here - it's assembling the servers, so technicians, and yeah, I guess like most people they aren't paid like in-demand software engineers. This is how most people have to get by!
[+] benhurmarcel|6 years ago|reply
In some developed countries you can get qualified labor that cheap. For example in Southern or Eastern Europe. Not to mention developing countries of course.
[+] ComputerGuru|6 years ago|reply
There is way too much hand-waving and assuming going on in this article. It is a load of BS that does not take into account real-world inefficiencies, e.g. sometimes buying in bulk is more expensive than buying at retail, esp. when you need consistent supply. Sure, you may need only an hour of sysadmin time a day, but what sysadmin will let you employ them an hour a day? The buildout did not list a CPU. The assumptions about uptime are over-amortized: an outage given the resources they quote may average out to 95% uptime, but their latency for getting systems back up is going to be absolutely terrible and I’d be surprised if outages were shorter than a day or two on average. They aren’t factoring in cooling. They aren’t factoring in the drastically reduced lifetime of drives in their ridiculously cramped and under-ventilated cubbies. They are completely ignoring diagnostic time, presuming they can only quote actual repair times, which is an absolute joke given the lack of smart hardware and enterprise DC management. They think they can average out throughput over the number of drives, not taking into account per-channel limitations. They are not taking into account the extra time to build and dismantle systems in their hacked-together IKEA shelves. They are underestimating the costs of electricity at commercial rates. I could go on and on, but suffice to say that I would never, ever use their network for any purpose without another backup (which they don’t factor into their costs, of course ;). I thought B2 was risky; this is taking it to an entirely different level.
[+] growt|6 years ago|reply
I feel like Backblaze has already done most of this and has it in production [1], whereas this is just a back-of-the-napkin calculation.

[1] https://www.backblaze.com/b2/storage-pod.html

[+] TheDong|6 years ago|reply
Backblaze has tried to make their datacenter as efficient as possible, and still only ends up hitting $5/tb/mo for their b2 service, as a point of reference.
[+] atsmyles|6 years ago|reply
According to your link, backblaze hardware costs are around 10x cheaper than OP's estimate. $2.56 per TB (based on .05 per GB) vs $25 per TB.
[+] reilly3000|6 years ago|reply
To be fair, it’s an extremely large napkin.
[+] aresant|6 years ago|reply
Top of Hacker News and there's nothing clickable above the fold that takes me to the SIA website.

Content marketers and technical marketers - don't miss the opportunity on Medium and other platforms to at the VERY LEAST link to your homepage in the first section.

In fact, what sits at the top of this awesome piece of content marketing is a "Sign Up" button for Medium . . .

[+] pbhjpbhj|6 years ago|reply
I ended up at https://github.com/NebulousLabs/Sia and there's no activity in the last two years, the latest issues are a few "you broke my wallet with the update and my password doesn't work" from 2018.
[+] basch|6 years ago|reply
I just removed the word blog so there was no subdomain, and it worked. I mostly agree with what you're saying, but don't neglect the URL as part of the user interface. The information was there and a click away.
[+] TecoAndJix|6 years ago|reply
The word "Sia" in the first sentence links to their website
[+] border43|6 years ago|reply
I've been using Sia for about three months to back up some personal files. Nothing crazy, but it seems to work well.

I'm looking forward to seeing this project mature as well as have some more layers built on top of it moving forward. I really wish the client offered synchronization or access across multiple devices. For now you have to try third party layers on top of Sia to accomplish this.

[+] AaronFriel|6 years ago|reply
Really smart people make this mistake a lot, so I'm wondering what Sia is doing to decorrelate failure rates. If hedge fund quants can turn mortgage tranches into a machine for massive correlated economic losses, can blockchain quants turn storage tranches into a machine for massive correlated storage losses?

Or if one of the major hyperscalers or datacenter operators decides to start selling storage to Sia, it seems likely that their control plane across datacenters could result in correlated failures. A networking outage for their AS could result in multiple datacenters appearing offline concurrently, for example.

[+] TheDong|6 years ago|reply
This analysis entirely omits the cost of a sysadmin to manage the storage servers. Even if sia is assumed to do almost everything, and even if we only want 95% uptime, you still need someone to deal with software updates, hard drive monitoring, etc etc.

The profit of $570/year/box is not enough to pay a part-time sysadmin and still have any useful profit.
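
A rough break-even sketch for that point. The $570/year/box figure is from the comment; the $30,000/year part-time sysadmin cost is a made-up placeholder, as is the idea of reserving a share of profit for staff.

```python
import math

PROFIT_PER_BOX = 570.0  # $/year per box, figure quoted in the comment

def boxes_to_cover(annual_staff_cost_usd: float, staff_share_of_profit: float = 1.0) -> int:
    """Boxes needed so the given share of total profit pays the staff cost."""
    if not 0.0 < staff_share_of_profit <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return math.ceil(annual_staff_cost_usd / (PROFIT_PER_BOX * staff_share_of_profit))

SYSADMIN_COST = 30_000.0  # ASSUMPTION: hypothetical part-time sysadmin, $/year

print(boxes_to_cover(SYSADMIN_COST), "boxes just to break even on staff")
print(boxes_to_cover(SYSADMIN_COST, 0.5), "boxes if staff may use only half the profit")
```

Under these placeholder numbers the operation needs dozens of rigs before any profit is left over.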

[+] simias|6 years ago|reply
>If we assume that the 30 hosts go offline independently

I wonder how reasonable this assumption really is. For regular CPU-bound crypto-mining we see that it tends to centralize geographically in zones where electricity, workforce and real-estate space to build a datacenter are cheap.

Assuming that Sia ends up following a similar distribution, it wouldn't be surprising if several of these hosts ended up sharing a single point of failure.

Beyond that, if only copying stuff around three times to provide tolerance is enough to lower the costs to $2/TB/Mo, why aren't centralized commercial offerings already offering something like that? Just pool three datacenters with 95+% uptime around the world and you should get the same numbers without the overhead of the decentralized solution, no? Surely the overhead of accounting for hosts going offline and redistributing the chunks alone must be very non-trivial. With a centralized, trusted solution it would be much simpler to deal with.

Or is the real catch that Sia has very high latency?
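
A sketch of why the independence assumption matters so much. It assumes a file is recoverable whenever at least 10 of its 30 hosts are online (a 10-of-30 scheme is an assumption here, consistent with the "three times" overhead mentioned above), and it models correlation very crudely as a single shared outage with a hypothetical probability.

```python
from math import comb

def p_unavailable(n_hosts: int, k_needed: int, host_uptime: float) -> float:
    """Probability that fewer than k_needed of n_hosts are online,
    with each host online independently with probability host_uptime."""
    return sum(
        comb(n_hosts, i) * host_uptime**i * (1.0 - host_uptime) ** (n_hosts - i)
        for i in range(k_needed)
    )

def p_unavailable_with_shared_outage(p_independent: float, p_shared: float) -> float:
    """Mix in a common-cause outage that takes every host offline at once."""
    return p_shared + (1.0 - p_shared) * p_independent

N_HOSTS = 30        # from the quoted article text
K_NEEDED = 10       # ASSUMPTION: any 10 of the 30 pieces recover the file
HOST_UPTIME = 0.95  # uptime target discussed in this thread
P_SHARED = 0.01     # ASSUMPTION: hypothetical chance of a shared outage

independent = p_unavailable(N_HOSTS, K_NEEDED, HOST_UPTIME)
correlated = p_unavailable_with_shared_outage(independent, P_SHARED)
print(f"independent hosts:  {independent:.3e}")
print(f"with shared outage: {correlated:.3e}")
```

In this toy model the shared-outage term dominates completely: the result is essentially whatever probability you assign to the common failure, no matter how many hosts hold the data.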

[+] WrtCdEvrydy|6 years ago|reply
I'm guessing there's not a lot of 95% datacenters that don't have heavy generators or UPS on site. You'd have to basically build a datacenter that has lower guarantees.
[+] scrooched_moose|6 years ago|reply
Wait, how are they connecting 32 drives to that motherboard? They seem to be implying they are splitting each SATA plug 4 ways, which as far as I know is impossible.

The adapter they're linking to is SFF-8087 to 4x SATA, not SATA to 4x SATA (which shouldn't exist). That motherboard doesn't have SFF-8087, it has 8 SATA3 connections.

Unless I've missed something big, SFF-8087 cannot be plugged into SATA3.

[+] theamk|6 years ago|reply
I don't think it is correct to say that the only options are "host failures are truly independent" or "world war three".

The hosts are not ever going to be fully independent. There will be hundreds, if not thousands, of hosts co-located in the same location -- likely of the cheapest grade, without any extras like fire alarms or halon extinguishers or redundant power feeds. A single fire (flood, broken power station) has a chance of taking out thousands of hosts simultaneously.

And there is the management system as well -- AWS has thousands of engineers working on security. Will there be one at this super-cheap farm? What are the chances there will be farms with default passwords and password-less VNC connections? And since machines are likely to be cloned, any compromise affects thousands of hosts.

... and all of those things are made worse by the fact that if you store hundreds of thousands of files, your failure probability rises significantly. If a data center burns down, at least a few of your files may be unlucky enough to be lost.
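
The last point can be sketched with the usual "at least one" formula, 1 - (1 - p)^N. The per-file loss probability below is a hypothetical placeholder, and the formula treats files as independent, which is itself generous given the correlated failures described above.

```python
import math

def p_any_file_lost(p_per_file: float, n_files: int) -> float:
    """Probability that at least one of n_files is lost, files independent.

    Uses log1p/expm1 so tiny per-file probabilities do not round away.
    """
    if not 0.0 <= p_per_file < 1.0:
        raise ValueError("p_per_file must be in [0, 1)")
    return -math.expm1(n_files * math.log1p(-p_per_file))

P_PER_FILE = 1e-6  # ASSUMPTION: hypothetical chance of losing any single file

for n in (1, 1_000, 100_000, 1_000_000):
    print(f"{n:>9} files -> {p_any_file_lost(P_PER_FILE, n):.6f}")
```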

[+] sigstoat|6 years ago|reply
at a minimum the facility will need some power conditioning and/or insurance. you don't want a brief power surge to eat all of your capital, and lockup fees, in one go.

> For a 32 HDD system, you expect about 5 drives to fail per year. This takes time to repair and you will need on-site staff (just not 24/7). To account for these costs, we will budget $50 per year per rig.

will you not also lose 6TB (times utilization) of your lockup every time a drive dies?

> 8x 4 way SATA data splitters

you've linked to SAS breakout cables. they don't plug into SATA ports, they plug into SFF-8087 SAS ports.

they cannot plug into the motherboard you've listed. nor have I ever seen one listed for retail sale that has 8 SFF-8087 ports.

the cheapest way to get 8 SFF-8087 ports is with some SAS expander card, and a SAS HBA. even scraping off eBay that's another $50 per host, and two more components to fail.

there are also actual SATA expanders out there, but they last about 3 months before catastrophic failure in my experience.
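
The numbers in this comment can be checked with a few lines. Everything here comes from the quoted article text and the 4-drives-per-breakout-cable layout described above; nothing is measured.

```python
import math

DRIVES_PER_RIG = 32
EXPECTED_FAILURES_PER_YEAR = 5   # from the quoted article text
REPAIR_BUDGET_USD = 50.0         # per rig per year, from the quoted article text
DRIVES_PER_BREAKOUT = 4          # one SFF-8087 port fans out to 4 SATA drives

# Implied annualized failure rate per drive.
annual_failure_rate = EXPECTED_FAILURES_PER_YEAR / DRIVES_PER_RIG

# Labor budget available per failed drive.
budget_per_failure = REPAIR_BUDGET_USD / EXPECTED_FAILURES_PER_YEAR

# SFF-8087 ports needed to attach every drive through breakout cables.
ports_needed = math.ceil(DRIVES_PER_RIG / DRIVES_PER_BREAKOUT)

print(f"implied failure rate: {annual_failure_rate:.1%} per drive per year")
print(f"repair budget: ${budget_per_failure:.2f} per failed drive")
print(f"SFF-8087 ports needed: {ports_needed}")
```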

[+] johnklos|6 years ago|reply
Big deal. I charge $5 per TB per month and I'm not even trying to be cheap.

The economies of scale should make this much less expensive. Colocating your own machine in a real datacenter and hosting your own data shouldn't still be cheaper than practically all of "the cloud" offerings, but it is. What does that tell you about "the cloud"? It's marketing bullshit.

Sure, it's fine for occasional use, but anyone using the hell out of "the cloud" can easily save money by using anything else.

[+] krick|6 years ago|reply
I don't know anything about the subject, so no idea if these claims are realistic. But whatever, either they deliver or they don't.

My (or their, actually) problem is I don't really get what they are offering right now. There is an impressive landing page with big numbers and pretty pictures which explains pretty much nothing. Project seems to be in production for at least 3 years, there are some apps, but I don't actually see if I can use it to backup/store some data and how much it costs right now. I mean, they say "1TB of files on Sia costs about $1-2 per month" right there on the main page, but it cannot be true, right? It's just what they promise in the hypothetical future, not current price-tag?

The only technical question I'm interested here is why they actually need blockchain? This is always suspicious and I don't remember if I saw any startup at all that actually needs it for things other than hype. It is basically their internal money system to enable actual money exchange between storage providers and their customers, right? So, just a billing system akin to what telecom and ISP companies have? Is it cheaper to implement it on blockchain than by conventional means? How so?

[+] standardUser|6 years ago|reply
On a related topic, I've had a ton of problems finding a cloud storage system that will reliably handle files around 100-200 GB. Does anyone have a recommendation for a service that can handle that file size with ease?