Filecoin, StorJ and the problem with decentralized storage (2019)

126 points| arthur2e5 | 4 years ago |randomoracle.wordpress.com

114 comments

[+] brutaltruth|4 years ago|reply
Not very interesting. There are ways to solve all of these concerns--no fees are paid during the first X months, nodes periodically tested for bandwidth, etc. etc. Just because the author didn't bother to think it through doesn't mean it's not straightforward.

The whole idea of a blockchain is reputation CAN be built up over time, because every transaction is recorded and there might even be incentives to associate nodes to show X nines of pool reliability.

Is it a bad idea to use this as your only source of storage? Of course. Is it less useful than AWS if you do large local burst queries? Of course.

But none of the concerns mentioned in the post are relevant. This can be cheaper than AWS because they charge an enormous markup, and this automatically provides worldwide replication and accessibility.

Amazon already erases books from Kindles, Google scans your Docs for blasphemy against Fauci, and both have been known to throw businesses off their services. In a few years it will seem crazy not to have an extra copy of your data on here.

[+] 300bps|4 years ago|reply
What do you base AWS storage having "an enormous markup" on? AWS has a type of storage to meet any access need / budget. S3 Glacier Deep Archive, for example, is perfect for storing archival items and costs $0.00099 per GB-month. So you can store a TB for about $1 per month.
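To make that arithmetic concrete, here is a rough sketch. The rate is the one quoted in this comment, not a current price list, and the function name is just for illustration. It covers storage only and ignores retrieval, request, and egress fees.

```python
# Back-of-the-envelope check of the per-TB figure quoted above.
# The rate is taken from the comment; it is not looked up from a live price list.
PRICE_PER_GB_MONTH = 0.00099  # USD per GB-month


def monthly_cost_usd(gigabytes: float, price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Storage-only cost per month; ignores retrieval, request, and egress fees."""
    return gigabytes * price_per_gb_month


if __name__ == "__main__":
    for label, gb in (("1 TB (1000 GB)", 1000), ("1 TiB (1024 GB)", 1024)):
        print(f"{label}: ${monthly_cost_usd(gb):.2f}/month")
```

Either way of counting a terabyte lands at roughly a dollar a month for storage alone.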

There are even options like S3 Intelligent Tiering that enable you to store objects on S3 Standard and it will intelligently move them to lower-cost options based on the usage patterns of the data.

There are six different pricing options of S3 storage. There are EBS volumes, EFS volumes, FSx for Lustre and many other options for storage.

I used to mine BTC, currently mine Ethereum and looked into mining Filecoin and was left wondering if the people buying into it have any idea what cloud storage options are actually out there. There is so much competition and the prices are sooooo low that I don't see any viable use case for Filecoin outside of the one you mentioned about being concerned about being shut down by a single vendor.

[+] gruez|4 years ago|reply
>Amazon already erases books from Kindles, Google scans your Docs for blasphemy against Fauci, and both have been known to throw businesses off their services. In a few years it will seem crazy not to have an extra copy of your data on here.

Isn't this a non-issue because you can encrypt your data prior to uploading?

[+] qyi|4 years ago|reply
For backup scenarios, you can encrypt your data before sending it to Google/Amazon. Filecoin even accommodates (or at least plans to accommodate, from what I've read) nodes that want to store only certain types of content, based on legality or whatever.
[+] boogoob|4 years ago|reply
I agree that all the concerns can be mitigated. But they're not just theoretical, they're also issues with the current implementations—some of which remain a year and a half after this was written.
[+] ethn|4 years ago|reply
Not true, they are vulnerable to Sybil attacks. I could create several digital identities to build a fake reputation.
[+] Dig1t|4 years ago|reply
> Google scans your Docs for blasphemy against Fauci

What is this referencing? I'm just curious, this is something I haven't heard about.

[+] Geee|4 years ago|reply
One important property of decentralized storage is that it prevents data monopolies. For example, if you host your web app on Sia Skynet, then users own their data instead of being locked in by the app developer. It also enables a model where different apps can easily access the same data. The data is in the "cloud", but still owned by the user.
[+] tymekpavel|4 years ago|reply
Do you know why Sia is overlooked so often? I see a lot of shills on forums and the founder seems to tweet bitterly about Filecoin, but folks seem optimistic about the tech. So what am I missing that makes Sia not get adoption vs. Filecoin?
[+] theamk|4 years ago|reply
I think that this paper is more about file storage, not web apps.

And for file storage, there are no data monopolies already. Take S3-compatible storage, for example -- there are multiple providers (AWS, Backblaze, or a local MinIO) and many, many clients in all sorts of languages.

[+] wmf|4 years ago|reply
You could implement the same Solid-style model using centralized storage or you could implement lock-in on top of decentralized storage. It's mostly orthogonal.
[+] omginternets|4 years ago|reply
I've noticed the debate around (de)centralization always seems to focus on purely technical issues, or on economic issues (which are technicalities of another sort). In other words, it's always the following two points that are debated:

1. Economies of Scale

2. Network Properties (robustness, scalability, latency, etc.)

It seems to me that a crucial element is missing from these analyses: organizational policy. All other things being equal (which they may not be), it seems like decentralization appeals to those trying to restrict top-down control. Historically, this has been an ideological thing (e.g. bittorrent's copyleft slant), but more recently I've seen mainstream endorsement of concepts such as "flat hierarchy", "distributed workforce", "collaborative work", etc. Interestingly, I don't see this discourse being picked up by decentralization evangelists, and I'm not sure what to make of that.

Off the cuff, it seems like decentralized storage tech has a compelling story to tell about small distributed teams, working in a loose, peer-to-peer organizational structure. In this context, my mind immediately goes to the functional centralization of cloud storage. In general, the Cloud tries to concentrate control over computing resources in a single department (i.e. the devops team). But what happens when a bunch of freelancers collaborate on a project-by-project basis? From what I've seen, either (1) they use cloud providers, but the account is under the client's name or (2) they use some kind of SaaS solution with support for sharing or other forms of collaboration. With decentralized storage, it seems like a "bring your own cloud" approach is possible in principle, where these freelancers would pool resources (storage, compute, etc.). Are there any pain-points here? Is there perhaps a business problem that needs solving?

The question for me is whether these new patterns of collaboration are really becoming a thing, or whether it's a fad amongst business-school graduates.

At any rate, this comment is an invitation for y'all to opine on the subject. I feel like a good analysis of organizational policy could help us decide whether a decentralized web is economically viable, but I'm kinda stumped.

[+] ChainOfFools|4 years ago|reply
a classic essay discussing the real ground truth problems of decentralized anything is by Jo Freeman, called "the tyranny of structurelessness" [0].

though most of the punches land toward the end, it is a fascinating and short read, in which she dissects, with the sober disappointment of a former optimistic evangelist, exactly why structureless ("decentralized" before that term came into vogue) movements are never what they claim to be, and indeed cannot be.

why? because structureless coordination of groups above a handful of members doesn't scale beyond the achievement of the simplest kinds of goals, such as consciousness-raising (or, in crypto-promoter terms, spreading "adoption").

any group objective requiring specialization of labor, coordination of resources and responsibility inevitably succumbs to the insidious creep of informal but officially disavowed structural networks of insiders.

it is especially pernicious for newcomers, who have been heavily messaged about "decentralization" or "structurelessness" and engage themselves with the group's identity without any awareness of these cryptic back channel networks that actually govern its operation. only after they are committed do they slowly find out there really is a hierarchy and a command structure.

i highly recommend a quick read for anyone considering the merits of a given group's claims of being decentralized.

[0] https://www.jofreeman.com/joreen/tyranny.htm

[+] omginternets|4 years ago|reply
Addendum:

I think I could get excited about FileCoin if it allowed me to pool storage with friends and colleagues. I've been wanting to have a distributed hacker-garage for a while [0].

Importantly, I want it to be independent of any cloud provider because (1) it's hard to freely experiment/prototype when you have to keep track of billing and (2) functional centralization makes the AWS/GCE admin de facto responsible for everything running under the account. This is great for large enterprises that want to virtualize their IT department, but it really goes against the grain for hackers, freelancers, and -- IMHO -- startup founders.

Returning to Filecoin, I feel like all the *coin ventures make the same fundamental mistake of over-estimating how much we care about money. I really don't care about pimping out my hard-drive for a couple of cents. I'm motivated by gains on the order of several hundred to a couple of thousand dollars. The counter-argument I usually hear is that there will be more interest in developing nations, but then I don't want to store my data in e.g. Pakistan. I want data for pet projects in my own hacker-garage, and I want production data close to the end-user.

/rant

[0] Incidentally, I'm working on a project to do exactly this. It's a way of clustering a set of computers over the internet to produce a virtual cloud. It would be super cool to use FileCoin as a block storage layer in this context, but restricted to the hardware we own.

[+] yosito|4 years ago|reply
As someone who just set up Storj to back up my Nextcloud installation, I am very interested in this topic. Storj was a cheap alternative to AWS S3, and, at least out of the box, zero-knowledge encryption was easier, though I know there are ways to achieve zero-knowledge encryption with any S3 provider. I hadn't considered Glacier, but I may look into it.

Edit: One thing I liked about Storj was that geographic redundancy is built in. I know Amazon has that too, but if three data centers caught fire on the same day, Amazon could lose your data. Super unlikely with Amazon, but virtually impossible with Storj, and most S3 providers have less geographic redundancy than Amazon.

> knowing your data is there does not mean that you can get it back when needed. This is the subject of the next blog post.

This is something I'd love to hear more about.

[+] nemo1618|4 years ago|reply
Presumably the argument is that a storage host might dutifully provide proofs of storage, but refuse to actually transfer your data to you later. This is true, but it's also true of centralized storage providers. The difference is that traditional providers are typically bound by SLAs -- something that a blockchain can't really offer.

Instead, decentralized storage gives you the ability to store your data with dozens of independent entities; as long as the behavior of these entities is sufficiently uncorrelated, then it is highly likely that you will always be able to retrieve your data from some subset of them.
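A rough way to put numbers on "highly likely" is a sketch under assumptions that are mine, not from any specific network: the file is erasure-coded into `n` pieces held by `n` different hosts, any `k` pieces are enough to rebuild it, and each host independently responds with probability `p`. Real hosts are never perfectly uncorrelated, so treat the result as an upper bound.

```python
from math import comb


def retrieval_probability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent hosts respond.

    n, k, and p are hypothetical parameters for illustration:
    n pieces stored, any k sufficient, each host available with probability p.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Binomial tail: sum over every count of responding hosts from k up to n.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Compare one flaky host with many flaky hosts plus redundancy.
    for n, k in ((1, 1), (10, 5), (30, 10)):
        print(f"n={n:2d} k={k:2d} p=0.80 -> {retrieval_probability(n, k, 0.80):.6f}")
```

The point is the shape of the curve: with enough redundancy, individually unreliable hosts can add up to a reliable whole, but only if their failures really are independent.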

[+] GTP|4 years ago|reply
The next blog post is already there, have a look. It's basically about the incentives that could, in certain scenarios, make it more profitable to never give the data back to the one who purchased the storage.
[+] thebean11|4 years ago|reply
I looked into Glacier pricing a while back and it's... tricky. It looks super cheap up front, but gets extremely expensive if you ever need to get the data out of Glacier.
[+] liamsargent|4 years ago|reply
Why is Sia overlooked every single time this topic is brought up? It is shocking to me that Filecoin, an example of an ICO darling with no working product, is always at the top of the list. I have used Sia and Skynet to build several useful decentralized media sharing applications with zero loss of data for over a year.
[+] ranguna|4 years ago|reply
Sia currently costs $8/TB/month; Storj costs a fixed-forever* $4/TB/month.

* Not counting USD inflation. They don't lock the storage price to the coin's value (with its crazy volatility); they fix it against USD and convert it to their coin when paying their nodes.

[+] twostorytower|4 years ago|reply
Sia has been operating at a large scale for years and has a running decentralized CDN service (Skynet).
[+] de6u99er|4 years ago|reply
Just so everybody is on the same page.

Blockchain is not used for distributed storage itself. Existing concepts like IPFS are used for this. Blockchain is used to confirm that someone is storing certain data and to verify its integrity through an additional protocol.

From ipfs.io about Blockchain

>With IPFS, you can address large amounts of data and put immutable, permanent links in transactions — timestamping and securing content without having to put the data itself on-chain.

Please correct me if I am wrong.

[+] omginternets|4 years ago|reply
You are correct; I think the article is just clumsily worded, though. I think what he's saying is that IPFS gives you references to immutable data, which you are then free to commit to your favorite blockchain.
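For anyone unfamiliar with content addressing, a toy sketch follows. This is not the IPFS CID format (real IPFS uses multihash-based CIDs and splits files into a DAG of blocks); `ContentStore` is a made-up class that only shows why the reference is immutable: the address is derived from the bytes, so changing the bytes changes the address.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store; the key is the SHA-256 hex digest of the data.

    Hypothetical illustration only. It does not reproduce IPFS CIDs or chunking.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Anyone can re-hash what they received and detect tampering.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored bytes do not match their address")
        return data


if __name__ == "__main__":
    store = ContentStore()
    address = store.put(b"large document that stays off-chain")
    # Only this short address would need to go into a transaction.
    print(address, len(store.get(address)))
```

Only the short address has to be committed on-chain; the data itself lives wherever the storage layer keeps it.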
[+] qyi|4 years ago|reply
Interesting article. While reading about Filecoin the first time, I had to stop every paragraph and think "but what if the node just doesn't give you your data back when you finally ask for it?". And as pointed out by part 2 of the article, not only can they choose not to give it back, but they are incentivized not to.

edit: Wouldn't erasure codes just completely fix it (as described by the Storj guy)?

[+] hanklazard|4 years ago|reply
As someone who's been interested in filecoin mining for a while, my main question is around legal liability for miners. For instance, if I am storing other people's data and it turns out someone was storing something illegal (unbeknownst to me), could I be held liable under US law?
[+] qyi|4 years ago|reply
Why do you think this is different than hosting a normal website with file uploads? Literally millions of people have made such websites (especially around 2000-2010).
[+] fwip|4 years ago|reply
IPFS is mostly fine technology for storing data in a content-addressable way.

FileCoin is, at best, an inefficient way to incentivize people to store your data in an imaginary trustless world. (At worst, it's an obvious scam.)

You can imagine any hosting provider offering regular IPFS hosting. Sign up on our website, send us a list of data to store, pay us money, and we'll guarantee you can get it back or you can sue us - same as the guarantees by any conventional hosting provider. This works fine and well without gluing Filecoin to it.

[+] jfrunyon|4 years ago|reply
> Cloud services are ruled by a ruthless economy of scales. This is where Amazon, Google, MSFT and a host of other cloud providers shine, reaping the benefits of investment in data-centers and petabytes of storage capacity.

I dunno about that. Amazon and Google, at least, charge significantly more than I could go out and buy a few drives for on sale. (AND, of course, they charge it perpetually. Dunno how StorJ works.)

[+] SavantIdiot|4 years ago|reply
Economies of scale in the cloud have one thing going for them: massive reliability.

As someone responsible for my company's data, I cannot make a sound argument to convince management (or myself) to use anything but Amazon S3 (or Google or Microsoft) cloud. The data is simply too important to trust to a smaller entity.

Maybe coin storage can boost Storj's reputation. And I certainly favor decentralization for Crypto.

[+] jtolds|4 years ago|reply
Hi! CTO of Storj here. This article makes some reasonable points, and it makes some unreasonable ones.

First, on the reasonable: absolutely, privacy and security should be layered on top of whatever storage platform you use using your own encryption. No objection! That's the right approach. It is always better to bring your own encryption, and having it layered so your storage provider can't have access is great. I have much love for @cperciva's Tarsnap. It's awesome.

The downside with a product that punts encryption to the user is that users /typically don't do this/. In fact, users like to share things and serve content to others. Bringing your own encryption that the sharing infrastructure doesn't understand means it is much harder to actually use the product for sharing. This is why Storj includes hierarchical encryption for delegated sharing. It's not that embedding encryption into the framework is better encryption by any means, but it is a better default. Users are by default protected by their own keys that we don't have access to, which is a better default than the cloud. Should people use something else also? Sure! If it suits them.

On the unreasonable:

The author says:

> [I]t is very unlikely that a decentralized storage market can offer an alternative that can compete against centralized providers— AWS, Google, Azure— when measured on any of these [cost, speed, reliability] dimensions.

This is a testable assumption, and the author didn't test it (post is from December of 2019, we were in late beta then, working very well).

Storj is cheaper ($4/TB/mo, was $10/TB/mo in December of 2019), provides 11 9s of durability (no SLA offered until March 2020, but we had and still have never lost a single object out of billions), and is equivalent in speed to providers like Backblaze (and getting faster). It's also multiregion by default (obviously). This is a screaming deal on all dimensions. Try it out for yourself. Our metrics are delivering on all three dimensions.

This author is picking and choosing between architectures of Filecoin and Storj, which is fine, but each should be evaluated in isolation. Filecoin and Storj make very different design decisions with very different payoffs. The author argues Storj can't deliver on its promises with appeals to Filecoin's architecture.

Filecoin absolutely uses blockchains and proof of storage and whatever else. In fact, Filecoin does fall victim to requiring lots of resources (see https://docs.filecoin.io/mine/hardware-requirements/).

But this is just true of Filecoin, this isn't a problem with decentralized storage in general, and Storj is an excellent counter example. Storj does not require lots of resources (and in fact many of our node operators use Raspberry Pis). Storj is not a blockchain, uses statistical audits that are low-effort and low-CPU for storage node operators, and works great for idle capacity.
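To give a feel for why a spot-check audit can be cheap, here is a generic challenge-response sketch. To be clear, this is not Storj's actual audit protocol, which the comment above does not describe in detail; the function names are hypothetical. The idea is that the auditor precomputes a few salted hashes before handing the data over, then later reveals one salt at a time and checks the host's answer.

```python
import hashlib
import secrets


def make_challenges(data: bytes, count: int) -> list[tuple[bytes, str]]:
    """Auditor side, run before upload: keep (nonce, expected digest) pairs."""
    challenges = []
    for _ in range(count):
        nonce = secrets.token_bytes(16)
        challenges.append((nonce, hashlib.sha256(nonce + data).hexdigest()))
    return challenges


def respond(stored: bytes, nonce: bytes) -> str:
    """Host side: answering requires hashing the data actually held."""
    return hashlib.sha256(nonce + stored).hexdigest()


def audit(challenge: tuple[bytes, str], response: str) -> bool:
    """Auditor side: one constant-time string comparison per audit."""
    _, expected = challenge
    return secrets.compare_digest(expected, response)


if __name__ == "__main__":
    data = secrets.token_bytes(4096)
    challenges = make_challenges(data, count=3)
    nonce = challenges[0][0]
    print("honest host passes:", audit(challenges[0], respond(data, nonce)))
    print("host that lost the data passes:", audit(challenges[0], respond(b"", nonce)))
```

Each challenge can only be used once, since a host could cache the answer afterwards. Note that this only shows the data is held, which is a separate question from whether it will be handed back.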

If you're interested in how we can achieve these things, you might take a look at an older blog post I wrote explaining (without naming) why Filecoin's architecture is fundamentally flawed: https://www.storj.io/blog/replication-is-bad-for-decentraliz...

Storj and Filecoin are both decentralized storage products, but they are really fundamentally very different, and their differences are worth understanding.

[+] Trias11|4 years ago|reply
Leaving aside the imperfections of each, for a user with extra bandwidth and storage, which one offers the better path to monetization?
[+] ruggeri|4 years ago|reply
The article raises a question I've wondered about before: how do you know that anyone will transfer you your data when you need it? You can require people to prove that they stored it, but how can you require people to transfer it out to you?

You presumably agree with the node to pay for the transfer out, but if both counterparties are anonymous and do not trust each other, how is this accomplished?

[+] boogoob|4 years ago|reply
One general solution is to have persistent nodes in the network that develop reputation over a duration of time, who perform operations that can't be done entirely trustlessly. In Storj's case these are called Satellites.

> Whenever a Satellite on the Storj network has a less than stellar payment, demand generation, or performance history, there is a strong incentive for the storage nodes to avoid accepting its data. When a new Satellite joins the network, the participating storage nodes will commence their own vetting process. This process limits their exposure to the new and unknown Satellite, while building trust over time to highlight which of the Satellites have the best payment record.

They also incentivize complete file delivery by paying for chunks of bandwidth as they're used to deliver the file, rather than all up front.

https://www.storj.io/storj.pdf
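A simplified sketch of the pay-as-you-go idea is below. It is an illustration under my own assumptions (fixed-size chunks, a flat price per chunk, payment released right after each chunk arrives) and does not reproduce Storj's actual bandwidth accounting; `stop_after` is a hypothetical knob that simulates a host cutting the transfer short.

```python
from typing import Optional, Sequence


def incremental_download(
    chunks: Sequence[bytes],
    price_per_chunk: int,
    stop_after: Optional[int] = None,
) -> tuple[bytes, int]:
    """Simulate a download where payment is released one chunk at a time.

    Returns (bytes received, total paid). Prices are integer units to avoid
    floating-point money.
    """
    received = []
    paid = 0
    for index, chunk in enumerate(chunks):
        if stop_after is not None and index >= stop_after:
            break  # host stops serving; nothing further is paid
        received.append(chunk)
        paid += price_per_chunk  # pay only for a chunk that actually arrived
    return b"".join(received), paid


if __name__ == "__main__":
    file_chunks = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    print(incremental_download(file_chunks, price_per_chunk=2))
    print(incremental_download(file_chunks, price_per_chunk=2, stop_after=1))
```

In this model a host that withholds the rest of the file simply forfeits the rest of the payment, so neither side is ever exposed for more than one chunk.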

[+] wmf|4 years ago|reply
The answer is incentives: if you're paying to retrieve data, why shouldn't they take your business?
[+] wallacoloo|4 years ago|reply
> Another option is requiring service providers to put up a surety bond. Before they are allowed to participate in the ecosystem, they must set aside a lump sum on the blockchain held in escrow. These funds would be used to compensate any customers harmed by failure to honor storage contracts. But this has the effect of creating additional barriers to entry and locking away capital in a very unproductive way.

If bonds solve the bad actor problem, then this use of capital would directly enable safe data storage. I guess it depends on your definition of “productive”, but that definitely feels like a good use of capital to me.

[+] msgilligan|4 years ago|reply
Article looks like it is from December 2019. Shouldn't (2019) be in the title?
[+] iR5ugfXGAE|4 years ago|reply
Is there any sensible study of these systems' energy consumption?
[+] rmtech|4 years ago|reply
Since this was written, Filecoin is up about 10,000% or something stupid.

If you pay attention to the naysayers, you're gonna have fun staying poor.

[+] kfprt|4 years ago|reply
A better solution would be payment based on a minimum GB and bandwidth where the data is constantly being streamed back over the network with a payout period of ~5min. This way you can guarantee that you will get all your data back at least once a week. You can also guarantee via network latency that your data is not being collocated. Sounds crazy but it would be more energy efficient than bitcoin.