
Wikipedia’s Switch to HTTPS Has Successfully Fought Government Censorship

401 points | rbanffy | 8 years ago | motherboard.vice.com

119 comments

[+] shpx|8 years ago|reply
It won't last, at least for China. Their government is working on a clone of wiki, scheduled for 2018[0]. Once that's done they'll likely completely ban the original.

Wikipedia publishes database dumps every couple of days[1]. So it shouldn't be that expensive for smaller governments to create and host their own censored mirror. You'd maintain a list of banned and censored articles, then pull from wikipedia once a month. You'd have to check new articles by hand (maybe even all edits), but a lot of that should be easily automated, and if you only care about wikipedia in your native tongue (and it's not english) that's much less work.
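The censored-mirror idea can be sketched in a few lines; the dump snippet and blocklist below are invented for illustration (real dumps wrap these elements in the MediaWiki export XML namespace and are far too large to parse in one pass, so you'd stream them with iterparse() instead):

```python
import xml.etree.ElementTree as ET
from io import StringIO

def filter_dump(dump_xml, banned_titles):
    """Yield (title, text) for every page that is not on the blocklist."""
    banned = set(banned_titles)
    for page in ET.parse(StringIO(dump_xml)).getroot().iter("page"):
        title = page.findtext("title")
        if title not in banned:
            yield title, page.findtext("revision/text")

# Toy stand-in for a pages-articles dump.
dump = """<mediawiki>
  <page><title>Cat</title><revision><text>Cats purr.</text></revision></page>
  <page><title>Tank Man</title><revision><text>...</text></revision></page>
</mediawiki>"""

# Everything except the banned titles survives into the mirror.
kept = dict(filter_dump(dump, ["Tank Man"]))
```

New and edited pages since the last pull are the part that still needs review, which is where the by-hand (or automated) checking above comes in.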

The academics will bypass censorship anyway, since it's so easy[2], so an autocrat won't worry about intellectually crippling their country by banning wikipedia. Maybe they don't do this because the list of banned articles would be trivial to get.

Better machine translation might solve this by helping information flow freely[3]. We have until 2018 I guess.

[0] https://news.vice.com/story/china-is-recruiting-20000-people...

[1] https://dumps.wikimedia.org/backup-index.html

[2] https://www.wired.co.uk/article/china-great-firewall-censors...

[3] https://blogs.wsj.com/chinarealtime/2015/12/17/anti-wikipedi...

[+] Markoff|8 years ago|reply
wut? Everyone in China already uses Baike instead of Wikipedia; nobody really understands why they are making another website.
[+] Yizahi|8 years ago|reply
Russia will follow soon. They are already heavily editing the Russian Wikipedia to remove "inconvenient" information.
[+] wfunction|8 years ago|reply
Why hasn't it been done yet? It's not like Wikipedia is a new thing.
[+] wodenokoto|8 years ago|reply
I thought China already blocked HTTPS, so switching to HTTPS-only would effectively ban/block Wikipedia.
[+] awinter-py|8 years ago|reply
Can an expert comment on side-channel attacks on HTTPS and whether they're less viable on HTTP/2?

My assumption is that because wikipedia has a known plaintext and a known link graph it's plausible to identify pages with some accuracy and either block them or monitor who's reading what.

I also assume that the traffic profile of editing looks different from viewing.

[+] chimeracoder|8 years ago|reply
> My assumption is that because wikipedia has a known plaintext and a known link graph it's plausible to identify pages with some accuracy

At least in theory, the latest versions of TLS should not be vulnerable to a known plaintext attack. TLS also is capable of length-padding, which would reduce the attack surface here as well for an eavesdropper.

My understanding is that HTTP/2 makes it even more difficult to construct an attack on this basis, because HTTP/2 means multiple requests can get rolled into one.

Of course, all this is assuming an eavesdropper without the ability to intercept and modify traffic. In practice, governments will probably just MITM the connection - we have precedent for governments abusing CAs like this in the past - and unless Wikipedia uses HPKP and we trust the initial connection and we trust that the HPKP reporting endpoint isn't blocked, then it's still possible to censor pages, without anybody else knowing[0].

[0] ie, the government censors will know, and the person who attempted to access the page will know, but neither Wikipedia nor the browser vendor would be able to detect the censorship automatically.

[+] cyphar|8 years ago|reply
And one thing to note is that people generally don't randomly pad the length of articles, so it's not _very_ difficult to figure out what articles you might be reading -- even over TLS.
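The size-based fingerprinting described here can be sketched as a lookup against publicly known article sizes; the sizes below are invented, and a real attack would also match the sizes of linked resources such as images:

```python
# Hypothetical map from article title to its publicly known transfer size
# (anyone can fetch Wikipedia themselves and record these).
article_sizes = {
    "Cat": 48_213,
    "Dog": 51_907,
    "Tiananmen Square": 96_404,
}

def guess_article(observed_bytes, tolerance=512):
    """Return the articles whose known size is within `tolerance` bytes of
    the observed TLS traffic volume. TLS hides the content, not the length,
    so an unpadded response narrows the candidates dramatically."""
    return [title for title, size in article_sizes.items()
            if abs(size - observed_bytes) <= tolerance]
```

Length padding (mentioned above) works precisely by breaking this size-to-article correspondence.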
[+] shif|8 years ago|reply
The government could force PC manufacturers to deploy a root CA that it controls and then run a MITM proxy to read everything the user is doing. It could also redirect the Wikipedia domain to another domain that acts as a reverse proxy, with a legitimate cert deployed on that other site.
[+] darkhorn|8 years ago|reply
There were a few censored pages on the Turkish Wikipedia when it was on HTTP: the "vagina" article and an election-prediction article. Only those pages were censored.

Last month there were some articles on the English Wikipedia about ISIS-Erdoğan links (I don't care whether they're true or not). Then they blocked all of Wikipedia (all languages), because they were unable to block those individual pages.

[+] thr0w__4w4y|8 years ago|reply
Yup. I was there 2 weeks ago working with a group of Turkish engineers. I went online to get some technical information about a particular stream cipher, and WHOOPS! Wikipedia is blocked, completely.

Fired up my VPN, accessed the page, thank you very much.

"The Net interprets censorship as damage and routes around it." - John Gilmore

[+] rocky1138|8 years ago|reply
How do governments censor only parts of Wikipedia when the site is encrypted? How do they know which pages you are browsing if they can't see the URL?
[+] zeta0134|8 years ago|reply
That's just it; they can't! When you visit Wikipedia over HTTPS, the only thing actually visible in plain text is wikipedia.org, and that's only because your browser sends it via Server Name Indication (SNI).

Since the rest of the request, including the URL, is hidden, governments and other malicious agents between you and the server cannot directly see which pages you're requesting. They can only see that you are accessing wikipedia.org and transmitting some data. You may still be somewhat vulnerable to timing attacks that try to identify what pages you're viewing, but censorship can't happen at the page level over HTTPS; you have to block the whole thing in one go.
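What SNI actually leaks can be seen by encoding and decoding the server_name extension body from RFC 6066. This is a hand-built extension body for illustration, not a full TLS handshake; note that only the domain appears, never the page path:

```python
import struct

def build_sni(hostname: bytes) -> bytes:
    """Encode the body of a TLS server_name extension (RFC 6066)."""
    entry = struct.pack("!BH", 0, len(hostname)) + hostname   # type 0 = host_name
    return struct.pack("!H", len(entry)) + entry              # server_name_list

def parse_sni(body: bytes) -> str:
    """Decode the hostname back out -- exactly what a censor on the wire does."""
    (_list_len,) = struct.unpack("!H", body[:2])
    name_type, name_len = struct.unpack("!BH", body[2:5])
    assert name_type == 0  # host_name
    return body[5:5 + name_len].decode("ascii")

# These bytes travel in cleartext before encryption starts.
wire_bytes = build_sni(b"zh.wikipedia.org")
```

This is also how subdomain-level blocking works: the censor reads the SNI hostname from the ClientHello and drops the connection if it matches.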

[+] rgbrenner|8 years ago|reply
According to the paper, the answer is subdomains. For example, in one instance China blocked zh.wikipedia.org (the entire subdomain; they can't see which page you're visiting) but left the other 291 subdomains unblocked.
[+] varenc|8 years ago|reply
Governments can't censor parts of Wikipedia when it's all encrypted, that's sort of the point of the article.
[+] azernik|8 years ago|reply
That's exactly the point of the article?
[+] ekarulf|8 years ago|reply
Who says they can't see the URL? A sufficiently motivated government would probably be able to create forged certificates, and mass interception isn't really out of the question. Especially with browsers homogenizing on fast ciphers (AES-GCM and ChaCha20-Poly1305), I bet it's much more economical than you would think.

Cert Pinning or HPKP is one type of solution, but it's tricky to get right especially for a large site like wikipedia.

[+] gwern|8 years ago|reply
After reading through the whole paper, I would have to say that there is far less censorship of WP, HTTPS or HTTP, than I guessed.
[+] enzolovesbacon|8 years ago|reply

  Critics of this plan argued that this move would just result in more 
  total censorship of Wikipedia and that access to some information 
  was better than no information at all
I'm no critic of this plan, but I still don't understand why this wouldn't result in more total censorship. Can someone explain?
[+] dTal|8 years ago|reply
Because Wikipedia is too useful. Note that it required a certain self-confidence that this was the case for Wikipedia to implement this strategy. And it's self-fulfilling - if Wikipedia allowed itself to be censored, then it would have fewer contributors and its usefulness would suffer.

There's a rather interesting analogy to be made with the GPL here. Critics argue that companies shy away from it because they cannot control it. Yet its entire goal is to not be controlled, and it draws its strength from the conviction that the body of GPL software is too useful to ignore. And again, that's self-fulfilling.

It takes courage, but it's important to know when you have the power to say "all of me, or none of me".

[+] samcheng|8 years ago|reply
If a censor can't tell which specific parts of wikipedia someone is trying to access, then they will be more likely to simply block the entire site.

HTTPS encrypts the URL and the content, but it does not mask the DNS lookup or the identity of the server being connected to.
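One way to visualize that split: given a URL, separate what an on-path observer learns (the host, via DNS and the connection itself) from what stays inside the TLS tunnel (the path and query). A minimal sketch:

```python
from urllib.parse import urlsplit

def wire_visibility(url):
    """Return (visible, hidden) for a URL fetched over HTTPS.

    The host leaks via the DNS lookup and SNI; the path and query string
    travel only inside the encrypted tunnel."""
    parts = urlsplit(url)
    visible = parts.hostname
    hidden = parts.path + (("?" + parts.query) if parts.query else "")
    return visible, hidden
```

So a censor watching the wire sees `en.wikipedia.org`, but not which article after the slash -- hence whole-site blocking.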

[+] shusson|8 years ago|reply
TIL: HTTPS encrypts the URL.
[+] blhack|8 years ago|reply
I think it's a fun/educational process to interact with some daemons over telnet. You can telnet into port 80 and create an HTTP request, for instance.

Certificate negotiation happens before the GET request is sent, which means that the "URL" (or rather, everything after the domain) is encrypted.

You can also see some of this process with curl. So:

     curl -vvv https://www.google.com/
[+] Matt3o12_|8 years ago|reply
For all those who are not aware of what HTTPS encrypts:

HTTPS encrypts basically the whole protocol. This includes your request (the URL, your fingerprint -- e.g. browser, installed plugins, preferred languages) and the response (the content, the type of the response (text, video, audio file), and some other less important things).

What HTTPS does not encrypt is the domain and IP. The domain is leaked through DNS. DNSSEC will not help either, because it does not encrypt the DNS request; it merely signs it so that you can be sure it is authentic (not tampered with), but everyone can still read it. That includes the wifi hotspot you use, your ISP, your government, and anyone who taps the wires (theoretically even your neighbor and nearby people if you use mobile data, since the encryption between your device and your carrier is not very strong[1]).

Even if you encrypted the DNS traffic (or just used the host's IP directly), someone intercepting your traffic could simply build a database mapping IP addresses to DNS entries (or do a reverse lookup; however, not every IP address has a reverse lookup configured for the domain you are visiting).

In Wikipedia's case, this can still be pretty bad. For instance, if an oppressive government notices that you visit a particular language version of Wikipedia pretty frequently (compared to the rest of the population), it might make assumptions about you and profile you. When you visit the German Wikipedia, you are actually visiting de.wikipedia.org instead of en.wikipedia.org, which can be intercepted and seen.

This gets worse for static file servers that serve different images from different subdomains (e.g. static512.domain.tld). If DNS requests are made to static523, static123, static721, and static132, an attacker might be able to guess which article you are reading (or narrow down the choices), because there will not be many articles whose images are served by exactly those file servers. Thankfully Wikipedia does not do that; everything is served through upload.wikimedia.org. But newspapers, forums, etc. might not, and an article may even pull from a unique domain (e.g. an embedded chart/video that comes from a third party and is loaded automatically).
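The static-subdomain leak just described can be sketched as a set-matching attack; the article-to-subdomain map below is entirely invented for illustration:

```python
# Hypothetical mapping: which image-server subdomains each article pulls from.
# An attacker can build this table by crawling the site themselves.
article_cdns = {
    "Article A": {"static512", "static123", "static721"},
    "Article B": {"static512", "static123", "static721", "static132"},
    "Article C": {"static999"},
}

def candidates(observed_dns):
    """Articles whose resource subdomains exactly match the observed
    cleartext DNS lookups from one page load."""
    observed = set(observed_dns)
    return [title for title, subs in article_cdns.items() if subs == observed]
```

Serving everything from one domain, as Wikipedia does with upload.wikimedia.org, collapses every article to the same DNS footprint and defeats this.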

So all in all, HTTPS is pretty good, but you still leave behind a lot of metadata (the DNS requests are just the tip of the iceberg) that can be used to learn a lot about you. If you want to be safe, use Tor or a VPN. If you use a VPN, be aware that you are just shifting trust from your current location to another one: the VPN provider, their ISP, and the government where the VPN server is located can read all that metadata, which might be no big deal or even worse, depending on where you actually live. Furthermore, some VPNs have been known to be easily broken, and your ISP/government can still see that you are using a VPN or Tor.

[1]: One exception is LTE, but an attacker could still downgrade the connection to 3G or EDGE to intercept the domain.

[+] SpacePotatoe|8 years ago|reply
I just wonder what UK government has against German metal bands
[+] stordoff|8 years ago|reply
Not, strictly speaking, the UK government. The Internet Watch Foundation, a non-governmental organisation, placed the article/image in question on its blacklist, a list which most major UK ISPs use (notable exceptions at the time were the UK universities' and military networks IIRC).

AFAIK, whether or not the image is actually illegal under English law is somewhat unclear (the definition of "indecent" is rather woolly), though it's certainly a poor choice for an album cover.

Edit: "to its blacklist" -> "on"; added "a non-governmental organisation"

[+] vbezhenar|8 years ago|reply
Currently, HTTPS sends the domain in clear text before establishing a connection. This allows hosting (and blocking) websites by domain, not by IP. Maybe HTTPS should have an optional extension to send the URI in clear text before establishing a connection. That way, if censors decided to block Wikipedia, users could opt in to this behaviour and have an unblocked Wikipedia except for a few selected articles.
[+] knome|8 years ago|reply
Absolutely not. The response to censorship should not be to make things easier for the censor.

Anyway, the idea is unworkable, as the user's client could simply lie about which URI it's going to request once the encrypted connection is set up.

[+] avaer|8 years ago|reply
Two problems:

- Unlike the host, URIs are a property of the request, not the connection, so sending it as part of the connection handshake doesn't really make sense.

- Unlike the host, there is a very long history of putting secret things into the URI. Even if the extension is built with this in mind, the number of security breaches that will result is greater than zero, with probability one. That's probably not the correct price to pay for convenient censorship infrastructure.

[+] unscaled|8 years ago|reply
And how would you make sure Wikipedia honors that clear-text URI (instead of a different encrypted URI inside the request)?

Even with SNI (the optional extension that sends the domain name in cleartext), the web server is fully entitled to ignore it.