Introducing Cloudflare’s IPFS Gateway

[+] jgrahamc|7 years ago|reply

This is my current favourite use case (from the blog https://blog.cloudflare.com/e2e-integrity/)

I wanted to provide an example of the kinds of secure, performant applications that are possible with IPFS, and this made building a search engine seem like a prime candidate. Rather than steal Protocol Labs' idea of 'Wikipedia on IPFS', we decided to take the Kiwix archives of all the different StackExchange websites and build a distributed search engine on top of that. You can play with the finished product here: https://ipfs-sec.stackexchange.cloudflare-ipfs.com

[+] nathcd|7 years ago|reply

Neat! Looks like this generally works by building a static index and publishing it to IPFS [1], then having some client-side JS do a lookup in the index, and getting a metadata file for each result [2]?

I made something kinda similar a while ago [3], where I gathered a bunch of metadata about files on IPFS at the time by scraping a Searx instance, with a query like "site:ipfs.io/ipfs" or something like that. It was a quick hack job but was fun, and it's cool to see something similar on a bigger scale!

[1] like https://ipfs-sec.stackexchange.cloudflare-ipfs.com/ai/_index...

[2] like https://ipfs-sec.stackexchange.cloudflare-ipfs.com/ai/_index...

[3] https://ipfs.io/ipfs/QmYo5ZWqNW4ib1Ck4zdm6EKteX3zZWw1j4CVfKt..., https://github.com/doesntgolf/ipfs-search, https://github.com/doesntgolf/ipfs-searx-scraper

[+] aw3c2|7 years ago|reply

Considering I want to use this as it is meant to be used, distributed, how do I pin all of say https://ipfs-sec.stackexchange.cloudflare-ipfs.com/ham/ (including the search) to my local IPFS node so I can use it offline?

[+] nikkwong|7 years ago|reply

How does IFPS index new content that's added to the stackexchange network daily? And/or how long does it take to add new data?

[+] wumms|7 years ago|reply

Stack Exchange for Kiwix, yes! I can't find Stack Overflow though - has anybody tried scraping it with https://github.com/openzim/sotoki ?

[+] styfle|7 years ago|reply

Here is one really important part that I almost missed when reading this announcement:

> Using Cloudflare's gateway, you can also build a website that’s hosted entirely on IPFS, but still available to your users at a custom domain name.

This is huge step forward for the distributed web!

[+] fiatjaf|7 years ago|reply

The killer app for IPFS in this early days should be people who are paying $10/mo to host their half-dozen HTML files on WhateverHost companies everywhere. Bonus points if they are hosting heavy files, like podcast files, for example.

Is there a Soundcloud clone on IPFS anywhere? (One that doesn't provide you with cloud storage, only the public index)

[+] koalalorenzo|7 years ago|reply

You can use any gateway to do that :)

[+] str33t_punk|7 years ago|reply

Can you?

You will still need to run a node to host your own content on IPFS. Unless you just simply plan on using data that already exists in the wild

[+] robotmay|7 years ago|reply

Any plans to support Dat in the future too? I've had a bunch of distributed projects in mind recently, and I've still not decided which platform to start experimenting with properly, but Dat is definitely one of the other interesting ones alongside IPFS.

[+] bren2013|7 years ago|reply

No plans currently. I'm not really familiar with Dat. Can it do something that IPFS can't?

[+] orliesaurus|7 years ago|reply

Total noob questions, would love some answers if anyone has them.

Here it says:

"IPFS is a peer-to-peer file system composed of thousands of computers around the world, each of which stores files on behalf of the network. These files can be anything: cat pictures, 3D models, or even entire websites"

- If I run an IPFS node, will I be hosting other people's data? What if these files are illegal? How can my "computer" (I guess a node is the terminology but whatever) be sure I won't be hosting some bad stuff?

Then it says:

"The content with a hash of QmXnnyufdzAWL5CqZ2RnSNgPbvCc1ALT73s6epPrRnZ1Xy could be stored on dozens of nodes, so if one node that was caching that content goes down, the network will just look for the content on another node."

- How long does it take for 1 specific file to be "re-distributed" on another node? and how many times is a file rehosted/distributed on a different node? It would be stupid to have the same file hosted 1.000.000 times - what's the break off point? Is there like a healthcheck so to speak which ensures that a certain file is in danger of disappearing from the network and therefore is automatically pushed for re-distribution

And ultimately, how can I know for sure that CloudFlare won't play the game where as acting as a proxy they will modify some of the files served? Imagine I want to retrieve cat.gif and cloudflare intercepts my request and serves me cat1.gif - I guess it all boils down to trusting them? But hold up, isn't p2p file system like this all about trusting the network and not 1 server?

[+] diggan|7 years ago|reply

The IPFS network works in a pull-model, not push. So when nodes add content, they only add it locally. It's not until some other nodes request it, as it actually transfers.

So if you start a IPFS node locally, it won't get content pushed to it (well, some DHT data which is basically routing information, but not the content itself).

> ultimately, how can I know for sure that CloudFlare won't play the game where as acting as a proxy they will modify some of the files served?

You'll have to add some checks on your end for this as HTTP/browsers cannot provide these guarantees currently (maybe in the future?). So, download file A and add it to IPFS from your end. If you end up with the same hash, the content was not modified.

> isn't p2p file system like this all about trusting the network and not 1 server

Yes, IPFS <> IPFS is trustless as this verification happens automatically and the clients can verify content locally when it gets fetched. IPFS <> HTTP is harder as there is no functionality for doing this verification.

[+] furi|7 years ago|reply

> And ultimately, how can I know for sure that CloudFlare won't play the game where as acting as a proxy they will modify some of the files served? Imagine I want to retrieve cat.gif and cloudflare intercepts my request and serves me cat1.gif - I guess it all boils down to trusting them?

The hash serves that purpose, the file is identified by the hash for the purposes of retrieval and coming up with another file with the same hash (especially one that's usefully different, like changing the overall message in somebody's blog post) is computationally infeasible. Now you might have a problem there in that in a web only implementation like this the code that checks the hash is probably also provided by Cloudflare, but the functionality is there, you can download the file, compute the hash and compare it against the one you retrieved it from.

> But hold up, isn't p2p file system like this all about trusting the network and not 1 server?

Ideally yes but 1. it's very difficult to get more than a handful of users to use anything that doesn't run in a web browser, 2. IPFS' implementation for Windows is effectively unusable (it's a command line interface like the Linux one, which is a non-starter on Windows), 3. attempts to built a complete IPFS node in the browser are currently incomplete, I believe because DHT discovery isn't possible in the browser presently. These gateways should probably be viewed as a stop-gap solution until such a time as we can have full IPFS nodes in the browser.

[+] bren2013|7 years ago|reply

> And ultimately, how can I know for sure that CloudFlare won't play the game where as acting as a proxy they will modify some of the files served?

https://blog.cloudflare.com/e2e-integrity/

[+] acover|7 years ago|reply

It's like a torrent. You don't host data that you haven't accessed. For content to stay hosted on your computer forever you have to explicitly pin it.

The number of people hosting it is the number of people accessing it.

CloudFlare can't modify the files. On IPFS you request files by a hash. If they give you a file with a different hash you know they gave you the wrong file.

[+] oscargrouch|7 years ago|reply

> - If I run an IPFS node, will I be hosting other people's data? What if these files are illegal? How can my "computer" (I guess a node is the terminology but whatever) be sure I won't be hosting some bad stuff?

Thats some of the reasons why i prefer the 'Dat' and torrent(be it classic or DHT) model of distribution per content, instead of IPFS which is per block.

The IPFS model make a lot of sense for cloud storage.. You might create a whole CDN based on its model where the clients dont even need to care about your backend storage layer.

But in the per-content p2p distribution model, we have the 'page-rank' effect, where popular or important stuff, curated by human reasoning of whats popular and important.

This model has worked pretty well with torrent, and public interest should drive the p2p storage model, where by using blocks, you lose some of the control and the context. And to see how this is a problem, the IPFS folks created the Filecoin to deal with the problem of the incentives, which on the 'per-content' model of Dat and Torrent, its a non-issue because of the natural page ranking indexing of content drive by public interest.

If something doesnt matter much, it will go away with time, just like our human brain works.

(Of course, if IPFS add some form of indexing of content by public interest on top of the block system, and even let you control what you serve. That problem would be solved)

[+] kpcyrd|7 years ago|reply

I really didn't expect this to happen, congrats to the ipfs team! A demo link over here:

https://cloudflare-ipfs.com/ipfs/QmS4ustL54uo8FzR9455qaxZwuM...

[+] joshfraser|7 years ago|reply

This is awesome. We've been running an IPFS gateway at Origin for about a year and are huge fans of the technology. It's nice to see Cloudflare investing in decentralized technology despite contributing to the over-centralization of the web with their core business. There are quite a few public IPFS gateways available, but not all of them are broadly advertised. A few others that I know of:

ipfs.io

ipfs.infura.io

siderus.io

ipfs.jes.xxx

gateway.originprotocol.com

[+] burdakovd|7 years ago|reply

Just want to add an alternative gateway that I've been working on recently: dapps.earth

It is somewhat different in the sense that it redirects from `gateway/ipfs/hash` to `hash.ipfs.gateway` to ensure different ipfs websites are hosted at different origins (Cloudflare also talked about origin issues in their announcement).

Now with what Cloudflare release, they do seem to solve some of the similar problems, but they still ask users to register the domains by themselves (or access in unsafe way with `/ipfs/hash` urls), which means those domains become a centralization point, and also most of the content is served from the same domain.

It would be fantastic if they allowed access in form `hash.cloudflare-ipfs.com` or `ens-name.cloudflare-ipfs.com`.

In the meantime I may just abandon the idea of serving ipfs content myself, and instead can just operate a small DNS-server that will return CNAME record pointing to Cloudflare and TXT record pointing to ipfs hash for each subdomain.

[+] diggan|7 years ago|reply

I just went through our "Public IPFS Gateway Checker" and added a bunch of new ones (include the Cloudflare) one. You can see it running here: https://ipfs.github.io/public-gateway-checker/

And of course, the source code: https://github.com/ipfs/public-gateway-checker/

[+] aclelland|7 years ago|reply

This looks great and will hopefully help IPFS grow!

I do wonder if there are any limits which CF are going to impose on these files? A few years ago there was an image hosting site [1] who was told by CF that they were using too much bandwidth on the free plan. With this gateway, can't anyone start doing exactly the same and not even have to be a CF customer?

[1] https://news.ycombinator.com/item?id=12825719

[+] zapita|7 years ago|reply

Is Cloudflare doing this in partnership with Protocol Labs, the company backing IPFS? Or are they doing it on their own?

I remember that the way Cloudflare communicated around Nginx rubbed a lot of people the wrong way, because they aggressively positioned themselves as "the de-facto Nginx company", to the point where they would sometimes announce a new Nginx feature on their blog before the Nginx developers themselves had a chance to announce it... And of course nobody was more pissed off than Nginx.com, the actual official sponsor of Nginx.

I'm wondering if they will behave the same way with Protocol Labs and the existing IPFS community? I sure hope not, I like Cloudflare and love IPFS, and would like to see everyone collaboratig in good spirits, and not competing for cheap marketing points.

[+] headcrack|7 years ago|reply

...and the devil introduces an own church. IPFS is about decentralisation! And Cloudflare is the devil of centralization!

[+] dividuum|7 years ago|reply

You can easily replace their domain with any other IPFS gateway. On top of that they add further validation to the idea that IPFS is a good idea. Seems like an overall win to me.

[+] dboreham|7 years ago|reply

If you analyze any decentralized system long enough you'll discover that it can't be used in practice by regular end-users without some element of centralization.

(Dboreham's Law)

[+] ilaksh|7 years ago|reply

I agree. I think the fact that we need a gateway is a big problem. And relying on a big company like Cloudflare is a bigger one.

I think that overall this might help for adoption as long as people don't start using Cloudflare URLs. But ultimately we need IPFS somehow integrated into Firefox and Chromium I believe. Or some other seamless integration software.

[+] auslander|7 years ago|reply

... and with access to plaintext, SSL stripped data of all traffic

[+] ClassAndBurn|7 years ago|reply

Cloudflare is great. I love that they spend resources on projects like this. This would normally be a Moonshot or someone's side projects that would get limited/no release.

[+] souterrain|7 years ago|reply

This is cool from an implementation standpoint.

However, doesn't this defeat the purpose of using IPFS? Using Cloudflare to front-end content stored within IPFS makes Cloudflare the choke point for all traffic, effectively re-centralizing the distributed content.

[+] jgrahamc|7 years ago|reply

It doesn't change IPFS fundamentally, it just provides a simple way to get to IPFS content and a place where it gets cached. We hope it help legitimize IPFS.

[+] detaro|7 years ago|reply

You'll need a "centralized" version of a web site for those that don't use IPFS today. That's what a gateway provides. Clients using the IPFS protocol will ignore them and fetch the distributed version.

[+] patrickg_zill|7 years ago|reply

Yes and cloudflare has already demonstrated both their ability and willingness to de-platform.

See https://news.ycombinator.com/item?id=15031922

[+] komali2|7 years ago|reply

Somehow, this is the first time I've heard of IPFS. Seems really awesome! From a total novice standpoint, can someone help me understand:

>With IPFS, every single block of data stored in the system is addressed by a cryptographic hash of its contents, i.e., a long string of letters and numbers that is unique to that block. When you want a piece of data in IPFS, you request it by its hash. So rather than asking the network “get me the content stored at 93.184.216.34,” you ask “get me the content that has a hash value of QmXnnyufdzAWL5CqZ2RnSNgPbvCc1ALT73s6epPrRnZ1Xy.”

It seems like you must know the "URL" of the "website" you want to visit (files in IPFS) beforehand? But in the case of IPFS, there's no like, DNS service, so you can't type "www.google.com." Basically, it'd be like if to navigate the modern internet, you'd need to know the IP address of whatever site you visit? Is that true of IPFS? Is there any way around that?

It seems like a strong limitation, unless someone can make some sort of IPFS search engine that happens to hash out at QN000000000000000000000 or some really memorizable hash... which seems extremely unlikely!

[+] Communitivity|7 years ago|reply

IPFS has a name resolution protocol, called IPNS. Conceptually it works similar to DNS. More advanced discovery protocols are being worked. The thing with IPFS though is that, because the raw identifier (not address) is the hash of the content, multiple nodes can serve the same content.

A key idea to to focus on de-centralization, so for example when you want a particular piece of content you could send a query out asking your known nodes for content with that hash, they then ask others, etc., propagating the request like gossip. Caching can make this more efficient. IPNS allows you to register your node as a provider of named content that has a given name. The biggest benefits of this are (1) that you can update the content (giving it a new hash) and people can still find it by name, and (2) most mere mortals can't remember hashes but names are much easier to remember.

A good introduction to IPFS can be found on HackerNoon, at https://hackernoon.com/a-beginners-guide-to-ipfs-20673fedd3f.

A good library to start with is Libp2p, https://github.com/libp2p

[+] brtknr|7 years ago|reply

Does anyone know the answer to whether large content that is widely available becomes quicker to download similar to peers in BitTorrent?

[+] r1ch|7 years ago|reply

I'm a bit worried about the abuse potential here. It seems like a great way to distribute movies, warez, etc since not only is Cloudflare paying for the bandwidth but they're also caching the content so you won't be bottlenecked by IPFS itself.

[+] ezoe|7 years ago|reply

After quickly glancing the IPFS document, I can't find the superiority over... say, BitTorrent with DHT and Magnet link(no tracker server or torrent file required.)

Save for data deduplication and IPNS maybe.

And isn't it rather contradictory? A gateway for P2P-based filesystem.

[+] blacksoil|7 years ago|reply

The part I'm still confused is: how does the cache being updated? For example in the traditional web architecture if I go to www.example.com/index.htm, the host server of example.com tells the hash of index.htm and depending on that my web browser decides to use its cache or do a fresh request.

How would this work in IPFS? How about dynamic pages? Does that mean my browser still has to contact www.example.com to get the latest version's hash but then has the option to request the file from IPFS instead of www.example.com? What about if example.com goes down?

[+] dark_glass|7 years ago|reply

This seems very similar to Freenet. The only difference I see at a glance is that Cloudflare is running gateways so that everyone does not have to run a node.

[+] kpcyrd|7 years ago|reply

On freenet other people are sending you files to store, on ipfs you only store your own files, files pinned (basically seeding) and files from your cache. Freenet is built for anonymity, while ipfs is built for performance.

[+] yoava|7 years ago|reply

Being from wix and managing petabytes of media files, we always wondered if a distributed web solution for websites can actually work.

Even with the best of clouds (more then one of the big names) we faced downtimes caused by multiple cloud failures.

Just imagine, a store that is Geo distributed, resilient and just works.

This is super exciting for the web and for customers of websites!!!

Big thumb up.

[+] dgreensp|7 years ago|reply

Is IPFS really a store, let alone a store with any particular performance or availability properties? If I want to be sure a file is available, don’t I have to host it or pay someone to host it? And if I want to make sure it is geo distributed, I have to geo distribute it or pay a service to? What is the big change?

[+] frio|7 years ago|reply

This is excellent. I've currently got a hacked together static page for my personal site, which is fronted by CloudFlare, but forwards the actual request onto ipfs.io (which, in turn, serve the files -- ultimately -- from where I've pinned them on eternum). This will let me take a step out of that chain.

[+] antpls|7 years ago|reply

That's a nice initiative.

BitTorrent files are also content addressable, and the DHT allows for routing to several provider nodes. Only files that are downloaded are shared too.

What's the plus-value of IPFS over BitTorrent in this scenario?

157 comments