Security advisory: Breach and Django

260 points| Lightning | 12 years ago |djangoproject.com | reply

139 comments

[+] brokentone|12 years ago|reply
Correct me if I'm wrong, but it appears as though Django isn't the only framework/technology that is vulnerable to such an attack, they're just one of the first to provide a mitigation strategy (resulting in this post).
[+] steveklabnik|12 years ago|reply
Any website that meets all three of these criteria:

  * Is served from a server that uses HTTP-level compression
  * Reflects user input in HTTP response bodies
  * Reflects a secret (such as a CSRF token) in HTTP response bodies
is vulnerable, regardless of technology.

The mitigation strategies were given in the original paper[1]; this announcement is just a repeat of what's in there. That said, it's exactly the right thing to do, so that's not a knock on Django.

1: http://breachattack.com/#mitigations

[+] jacobian|12 years ago|reply
Yes, that's correct, in theory BREACH can be used to target any sort of secret embedded in the body of an HTTP response. CSRF tokens are the most common type of secret in that category, but there are others. However, we can't speak authoritatively to The Web or All Web Frameworks or anything, but we can advise our users on how they can protect themselves.
[+] artificialidiot|12 years ago|reply
You are right. The thing is, django doesn't have an image problem so it is fine to announce the attack that way.
[+] gojomo|12 years ago|reply
Currently, the Django templating tag:

  {% csrf_token %}
...results in an insert like...

  <input type="hidden" name="csrfmiddlewaretoken" 
    value="566e4606b2094c7c48e5d04b58236f51">
I suspect that the particular mitigation strategy the BREACH authors describe as "Randomizing secrets per request" could be implemented by having {% csrf_token %} instead emit:

  <input type="hidden" name="random_data" 
    value="91178a84e0bc6e08a2fda853eef2d2c8">
  <input type="hidden" name="csrfmiddlewaretoken_xor" 
    value="e0b594e902c7fe6b1748d13aefaf63aa">
...where the random_data changes every response, the emitted csrfmiddlewaretoken_xor is the real token XORed with the random_data, and upon submission the server will again XOR the two values together to get the real CSRF token.

There may be other secrets that need protection in other ways, and maybe this would make any random-source issues more exploitable... but this would seem to protect the CSRF token, in a cheap and minimal way.

UPDATE: Thinking further, though, maybe the attacker can probe for both values at the same time, and thus determine the probability of certain pairs, and thus this only slows the attack? I'd appreciate an expert opinion, as this was the first mitigation that came to mind, and if it's wrong-headed I'd like to bash my intuition into better shape with a clue-hammer.

[+] homakov|12 years ago|reply
Your UPDATE is right: an attacker can probe a..z twice and just choose the letter that compressed in both runs, ignoring random compressions.
[+] Erwin|12 years ago|reply
So to be clear:

1) The attacker must be on the same network as you, or at least be able to detect how large the compressed and encrypted replies are.

If you are on the same network, it seems to me there are far more MITM and whatnot attacks that are more likely to succeed, if you do not use HSTS (or secure DNS if that helps).

2) The attacker must be able to get your browser to rapidly generate many (how many?) requests to the site. It takes "30 seconds" they claim, but is that at a rate of 100 requests per second?

3) Each request must carry something that will be reflected by the body of that particular page when it's rendered. I suppose it could be an error message or search string that's echoed.

It seems to me that unless you generate a CSRF token unconditionally on every page, the subset of pages that both reflect something with no protection (e.g. search results) and have a protected form (e.g. change my email address to XYZ) might be small.

4) The secret that can be extracted is what's in the reply body and not the headers -- headers are not compressed, since the TLS compression is now universally disabled post-CRIME.

Personally I use Referer header checking as well. IME all the browsers of my users do send them. So if you extract the CSRF token, it's useless by itself unless you also can make the browser send the right Referer header (and AFAIK, all the holes such as Flash have been plugged).

Other than that -- it seems that if you are normally generating e.g. a 32 byte CSRF key, you could interleave it with 32 bytes of good randomness per request?
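
The Referer check mentioned above could look roughly like the sketch below. It is framework-neutral, and `TRUSTED_ORIGINS` and `referer_is_trusted` are hypothetical names chosen for illustration, not part of Django or any other library. It also assumes a strict policy where a missing Referer is rejected.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; replace with the origins your site is served from.
TRUSTED_ORIGINS = {("https", "example.com")}

def referer_is_trusted(referer_header):
    """Return True only if the Referer names a trusted scheme and host."""
    if not referer_header:
        return False  # strict policy (an assumption): missing Referer is rejected
    parts = urlsplit(referer_header)
    return (parts.scheme, parts.hostname) in TRUSTED_ORIGINS

print(referer_is_trusted("https://example.com/account/email"))
print(referer_is_trusted("https://evil.example.net/attack"))
print(referer_is_trusted(None))
```

With a check like this in place, a leaked CSRF token is only useful to an attacker who can also forge the Referer header from the victim's browser.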

[+] RyanZAG|12 years ago|reply
Switch off all GZIP..? That feels very extreme, I'm sure there are better workarounds than that one.

EDIT: The following workarounds should be very simple to implement and seem like more viable alternatives for production?

  Length hiding (by adding random amount of bytes 
    to the responses)
  Rate-limiting the requests
Mitigations 6 and 7 taken from http://breachattack.com/
[+] derefr|12 years ago|reply
All these framework-vendor guides will recommend switching off Gzip, because it's a content-neutral workaround; it works everywhere, for every instance of the attack, no matter how you've coded your app. There are more specific workarounds, but they require changing how you encode secrets into your page, so there can't really be a vendor guide on how to do that; the vendor doesn't know how and where your app sticks secrets into its views, after all.
[+] tptacek|12 years ago|reply
I'm sure everyone is going to come up with workarounds that re-enable compression, but they'll be context-dependent and will involve code; in the meantime, the attack is straightforward and viable. Think of disabling compression as a stopgap.
[+] wim|12 years ago|reply
Yeah turning off compression completely is kind of crude but it works without having to go into app-specifics, I'm sure the Django folks are on it ;). When it comes to things like CSRF tokens, probably secret masking (4) is the easiest to implement? so something like

  import hmac, uuid
  csrf = '42be455e20e64d7294eee8d1806d14a9'
  p = uuid.uuid4().hex  # random response-specific pad
  # zero-pad to 32 hex digits so leading zeros survive the round trip
  xord = "%032x" % (int(csrf, 16) ^ int(p, 16))
  request_token = "%s%s" % (p, xord)
  print("<input type='hidden' name='token' value='%s'>" % request_token)
  half = len(request_token) // 2
  v_unxord = "%032x" % (int(request_token[half:], 16) ^ int(request_token[:half], 16))
  if hmac.compare_digest(v_unxord, csrf):  # constant-time comparison
      print("yay, valid CSRF")
[+] kansface|12 years ago|reply
Length hiding (by adding random noise) was shown to be ineffective in the article. Perhaps a fixed-length response would work better, or perhaps one that is heavily quantized? Really, production environments are not the place to try unvetted academic crypto research.
[+] maciejgryka|12 years ago|reply
You can still safely compress your static files. So, assuming that you don't send any secrets in your CSS, JS etc., you can configure your server to enable gzip only for these resources.

For example, when using nginx with gzip off globally, you can do:

    location /static/ {
        gzip on;
        ...
    }
[+] ineedtosleep|12 years ago|reply
Would it be possible to just have some sort of dynamic compression scheme and not gzip when you'll potentially be transmitting sensitive information?
[+] jrochkind1|12 years ago|reply
Adding a random number of bytes to the response may break your caching, in a variety of ways.
[+] danso|12 years ago|reply
A few days ago, Meldium's announcement of a Ruby gem that provides an inexpensive partial protection (i.e. not disabling gzip) made it to the HN front page:

http://blog.meldium.com/home/2013/8/2/running-rails-defend-y...

The two protective measures are masking the Rails CSRF token and appending an HTML comment to every HTML doc to slow down plaintext recovery. How easy is this to include in a Django plugin?

[+] mhurron|12 years ago|reply
Is a partial workaround really better than a guaranteed workaround?
[+] level09|12 years ago|reply
This would cause a big problem for us. Our mobile web service serves around 3-4k concurrent requests on average. Without compression, our API would see a 300%-900% increase in delay.

Are there any alternatives? I would like to know what Cloudflare would do, as their CDN is based on compressed nginx responses.

[+] pudquick|12 years ago|reply
As gzip compression only applies to the content of the page, not the headers, I would assume that prefixing your page with content that is variably compressible and of varying lengths would throw a monkey wrench in the attacks.

The compressed content of any part of a page very much depends on what came before it. Altering the content to include a script comment block full of random text and various common HTML and JavaScript elements (Markov chains anyone?) would definitely change how a page is compressed.

If the compressed length of the replies varies significantly with every request - even if the request content is identical - attacks like this can no longer reveal hidden information.

Edit:

You could improve this significantly by including false positive matches as well. If your HTML content has: csrf="45a7..." in it, you could hash that content into enough material to generate 19 or so identical looking code blocks embedded in a script comment. You've now provided a 95% chance they attack the wrong one / increased the number of attacks they'll need to try by 20x.

This method (minus the above part) would actually be cacheable by smart CDNs like Cloudflare.

[+] z-factor|12 years ago|reply
The attacker has to be able to issue requests on behalf of the user with injected "canary" strings. I fail to see a practical exploit where one can do this and wouldn't have access to the secret in the response anyway. What am I missing?
[+] tptacek|12 years ago|reply
Does any GET or POST URI endpoint in your application accept parameters? Do none of those parameters impact the output of the application? That set of circumstances is extraordinarily common.
[+] veesahni|12 years ago|reply
I'm in the same boat - if the attacker could inject strings into requests pre-compression, then wouldn't the client already be compromised?
[+] sehrope|12 years ago|reply
How about having the CSRF token change with each request? If it's encrypted/signed by the server for each request with a random IV then it would be different in each request. It would be a bit more processing on the server (decrypt vs just HMAC verify) but it would be completely different each time. It seems kind of belt and suspenders as you're encrypting data within an encrypted channel but I think it gets around this issue.
[+] dvogel|12 years ago|reply
If the CSRF token changes with each page view then opening a second page (perhaps an explanation for a form field) in a new tab/window would invalidate the form in the original tab/window.
[+] pquerna|12 years ago|reply
Has anyone looked at mitigating the attack by changing the behavior of chunked transfer encoding?

Chunked Transfer encoding is basically padding that a server can easily control, without having to change content or behavior of a backend application. A web server could easily insert an order of magnitude more chunks, and randomly place them in the response stream.

[+] donaldstufft|12 years ago|reply
I'm not sure I fully understand the proposed fix here, how does it differ from the application simply including random chunks of data inside the response?

This area isn't my strong suit, but assuming that this is analogous to just adding random data to the response, I believe it can be worked around by making more requests and using statistics to factor out the noise introduced.

If my understanding is wrong then excuse me :)

[+] tomp|12 years ago|reply
So, it seems that even if I encrypt everything, a lot of information is still present in the size of encrypted message; in case of VOIP, it's possible to guess speech that is being transferred over an encrypted transport, in the case of text, it's possible to figure out secrets if the attacker can modify an equally-sized part of the message.

Is there any general way of preventing this kind of attacks? Inserting random data could work, but its distribution would have to be exactly right for the attack to be impossible over longer periods of time. For the BREACH case, we could solve it by not compressing user input, but what about the VOIP case?

Also, why does the site http://breachattack.com/ says that "Randomizing secrets per request" is less effective than disabling compression?

[+] sdevlin|12 years ago|reply
> Is there any general way of preventing this kind of attacks?

Disabling compression is a 100%-effective countermeasure for compression oracle attacks.

> Also, why does the site http://breachattack.com/ says that "Randomizing secrets per request" is less effective than disabling compression?

Putting random data in the server response will only slow down the attack. With enough requests, the noise from that random data will wash out.

Disabling compression will stop the attack cold. The whole thing is predicated on analyzing the size of the compressed text. No compression, no compression oracle.
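
As a rough illustration of why random padding only slows things down, here is a toy simulation rather than a measurement of any real attack. The sizes (500 and 499 bytes), the padding range, and the sample count are all made-up assumptions. Each observation adds uniformly random padding to the true size, yet averaging many observations recovers the one-byte gap.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def observed_size(true_size, max_pad=32):
    """One observation: the real compressed size plus random padding."""
    return true_size + rng.randint(0, max_pad)

def mean_observed(true_size, samples):
    """Average of many padded observations of the same true size."""
    return sum(observed_size(true_size) for _ in range(samples)) / samples

# Hypothetical sizes: a correct guess compresses one byte smaller.
wrong_guess, right_guess = 500, 499

gap = mean_observed(wrong_guess, 20000) - mean_observed(right_guess, 20000)
print(round(gap, 2))  # close to the true one-byte difference
```

The padding raises the number of requests needed, but the signal is still there once the noise is averaged away.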

[+] cschmidt|12 years ago|reply
I'm sure it will come, but I'd appreciate a layman's terms explanation of this. What is the threat, and how do you go about fixing things in Django?
[+] IvyMike|12 years ago|reply
Imagine you're going to send a compressed and encrypted message to a friend, and I (the attacker) can do two things:

1) Append a bit to the message before it is compressed and encrypted. 2) See the size of the final message.

So I start by appending the string "4179174b19e0cdc91bf4" to your plaintext message. I see the final encrypted message size is 500 bytes.

Then, I redo the experiment, but this time, I append the string "[email protected]" to the message. The final encrypted message size is now 480 bytes. The string I injected was the same size, but the compression worked better this time, and I can guess it's because the string I picked is redundant with something in your plaintext.

Mix in a bunch of complicated math and a bit of javascript, and you've got an exploit.

This threat isn't specific to Django: it's being billed as a TLS attack, but any encryption system that uses compression the same way is vulnerable.
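
The size difference described above can be reproduced with a few lines of Python and `zlib`. This is a minimal sketch under stated assumptions: the page template, the token value, and the `compressed_size` helper are invented for illustration, encryption is left out because it does not hide length, and a real attack recovers the secret piece by piece rather than guessing it whole.

```python
import zlib

SECRET = "4179174b19e0cdc91bf4"  # hypothetical token embedded in the page

def compressed_size(injected):
    """Size of the compressed page when `injected` is reflected into it."""
    page = (
        "<html><body><p>You searched for: " + injected + "</p>"
        "<form><input type='hidden' name='csrf' value='" + SECRET + "'>"
        "</form></body></html>"
    )
    return len(zlib.compress(page.encode("ascii"), 9))

# Two injected strings of equal length: one repeats the secret, one does not.
right = compressed_size("value='" + SECRET)
wrong = compressed_size("value='" + "zyxwvutsrqponmlkjihg")
print(right, wrong)  # the matching guess compresses to fewer bytes
```

The attacker never sees the plaintext; the smaller size alone signals that the injected string overlaps with something already in the response.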

[+] steveklabnik|12 years ago|reply
Here's a simple explanation: http://arstechnica.com/security/2013/08/gone-in-30-seconds-n...

It's not Django, but us over at Rails have been discussing various parts of BREACH and how we'll handle it: https://github.com/rails/rails/pull/11729

The important two comments are here: https://github.com/rails/rails/pull/11729/#issuecomment-2206... and https://github.com/rails/rails/pull/11729/#issuecomment-2208...

  > Let's let this stew for a while with security researchers doing their
  > analysis on various approaches and wait and see what the security community
  > as a whole recommends.
  >
  >  My only concern to rushing out a release is that we do something equally 
  >  dumb and end up creating a different problem for our users. 
  > 
  > We can roll out fixes as it becomes clear what the consensus is as to the
  > best solution for a generalised framework like Rails.
As http://breachattack.com/ says:

  * Be served from a server that uses HTTP-level compression
  * Reflect user-input in HTTP response bodies
  * Reflect a secret (such as a CSRF token) in HTTP response bodies
These things are easy to tell about your application, but are much harder for frameworks to detect generally, which is why projects like Django and Rails will take some time to evaluate exactly how to best handle this at the framework level.
[+] jeffasinger|12 years ago|reply
I'm not all that familiar with BREACH, so please correct me on the parts I'm wrong about, but it seems that it's an attack that allows one to recover some data sent over TLS if compression on TLS and the protocol level is enabled.

In Django, this means that attackers could recover the CSRF token that's used to prevent cross site requests. This means anyone between you and a client could later have that client automatically make authenticated requests to your app, simply by visiting a site they control, without the knowledge of the user.

To protect yourself, the Django team recommends turning off compression at both the TLS level and the HTTP level.

[+] skizm|12 years ago|reply
The link is like 5 sentences long and 2 of them are recommendations for stopping the attack.
[+] Hovertruck|12 years ago|reply
This advisory is pretty understandable, I think.

The big bold text ("BREACH may be used to compromise Django's CSRF protection") is a strong warning of the threat (becoming vulnerable to CSRF). They list two steps that they recommend taking: disable the gzip middleware in your settings.py, and disable gzip for responses from your web server.
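
For the settings.py step, a sketch of what that might look like follows. The surrounding middleware list is illustrative only and your project's list will differ; the point is simply that `django.middleware.gzip.GZipMiddleware` is absent. Django 1.5 names this setting `MIDDLEWARE_CLASSES`; newer releases call it `MIDDLEWARE`.

```python
# settings.py (sketch; the other entries are an example, not a recommendation)
MIDDLEWARE_CLASSES = (
    "django.middleware.common.CommonMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # "django.middleware.gzip.GZipMiddleware",  # disabled as a BREACH stopgap
)
```

Compression configured in the web server in front of Django is separate and has to be switched off there as well.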

[+] STRML|12 years ago|reply
Could somebody help me understand how this attack would be viable?

It seems like the attack has the following requirements:

  1. You want a secret that appears in the response body, like a 
     CSRF token.

  2. The web server always responds with the exact same response 
     for a request.

  3. The response body contains data that you send to the server, 
     e.g. url params.

  4. The attacker has access to an environment where he can send requests 
     under your browser session (otherwise, the user would be
     unauthenticated and there would be no secrets to steal).
Given (4.), how is this a real concern? If I, an attacker, am able to make 3000+ requests while logged in under your session and modify the request character by character pre-encryption, doesn't it logically follow that I have your cookies anyway?
[+] e12e|12 years ago|reply
Looking at https://github.com/django/django/blob/ffcf24c9ce781a7c194ed8... I'm a little confused about how the csrf-token is generally used in Django -- but if I understand the code correctly, it looks for a cookie with the csrf_token, and compares that to a POSTed value (or x-header in case of an Ajax request).

If the system has a decent random-implementation there is no secret involved, just a (pseudo)random string -- essentially a csrf cookie is given the client on one request, and compared on the next request(s).

Is there any reason one couldn't simply use the rotate_token()-function on every (n) request(s)?
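
The cookie-versus-submitted-value comparison described above can be sketched generically as below. This is not Django's actual code: `new_csrf_token` and `csrf_check` are hypothetical helpers, and the 16-byte token length is an assumption.

```python
import hmac
import secrets

def new_csrf_token():
    """A fresh random token; hypothetical stand-in for rotating the token."""
    return secrets.token_hex(16)

def csrf_check(cookie_token, submitted_token):
    """Accept the request only if the cookie and the submitted value match."""
    if not cookie_token or not submitted_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(cookie_token, submitted_token)

token = new_csrf_token()
print(csrf_check(token, token))             # matching pair is accepted
print(csrf_check(token, new_csrf_token()))  # a different token is rejected
```

Rotating on every request would mean calling something like `new_csrf_token()` per response and resetting the cookie, which is where the multi-tab concern raised elsewhere in the thread comes in.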

[+] sbov|12 years ago|reply
Just to make sure I understand this correctly: is this only a security issue if you include sensitive information on a page by default?

For instance, if you had a search field, the contents of what users puts in that search field will not be compromised. However, if you include a csrf token with the search field form, that can be compromised since it will be there every time the attacker gets the victim to make a request.

[+] homakov|12 years ago|reply
im a rabbit
[+] jacobian|12 years ago|reply
To your second point, [email protected]. It's documented in a bunch of places; where'd you look for it? I'll add it there, too :)

To the first point, we believe that Django's CSRF protection is as strong as session-linked CSRF protection, and adds CSRF protection to anonymous users (users without a session as well). In other words, it's a design decision, one that we believe doesn't compromise CSRF protection. If you believe otherwise, please get in touch (see above).

[+] JshWright|12 years ago|reply
>And i was trying to report it, but didn't find a contact/email.

How hard did you try? Django uses the industry standard security@ address for reporting security issues.

A quick googling results in this page pretty easily: https://docs.djangoproject.com/en/1.5/internals/security/

EDIT: I described the link as 'first' in the Google results, but that was because Google was being helpful and promoting a page I've visited a lot before... In reality, it's a few links down.

[+] donaldstufft|12 years ago|reply
Django's CSRF protection is perfectly fine other than issues with BREACH.

In any case where you can edit the CSRF token, you can already execute a much stronger attack (MITM, XSS, etc.). If you have a way to set your own arbitrary cookie that doesn't require a much stronger attack, one that already includes the ability to make arbitrary requests without them needing to be cross-origin, then I heartily suggest you report it.

[+] homakov|12 years ago|reply
P.S. I am not into Django, but if you have a clue how to contact the authors, please tell them to put the CSRF token into the session cookie. That must be fixed first: BREACH is 100 times harder and slower, while cookie forcing is a completely viable attack with an active MITM. Or perhaps it was fixed? I checked it on Bitbucket the last time.
[+] dangayle|12 years ago|reply
Disable compression altogether? That's craptastic.