How Not to Encrypt a File – Courtesy of Microsoft

88 points| rakel_rakel | 8 years ago |medium.com

61 comments

[+] GreaterFool|8 years ago|reply
The author could spend less time bashing the original article and a little bit more explaining how to do things right.

This:

> Suggestion to use the encryption key as the IV

is a second sub-heading while the words "initialization vector" don't appear until much later. Initialization vector is pretty obvious, "IV" isn't.

Also the author spends time complaining that the original article misunderstands the use of initialization vector while providing no explanation of how it should be used.

After reading the post I haven't learned anything useful other than that the original article was bad.

[+] Bartweiss|8 years ago|reply
I... sort of have mixed feelings on this.

I agree that the article could do far more to explain what's good, both in content (talking about why these things are bad) and in style (defining all terms immediately).

But holy shit, the MSDN article is bad. It's so hideously bad that I think there's nontrivial social value in bashing it extensively to discourage people from writing docs like this without getting them sanity-checked.

In short, I think this article is largely useless to people reading guides and trying to avoid the pitfalls of the original source, but is aimed at people writing crypto guides who have no business doing so.

[+] jlebar|8 years ago|reply
Explaining to you how to do it the right way is not an obligation of anyone who says article X is wrong.

"This article on global warming could spend less time bashing governments for inaction and more time talking about how I can reduce my emissions."

"This bad restaurant review could spend less time bashing the chef's food and more time telling me where the good restaurants are."

Similarly maybe the author didn't explain what "IV" means because their audience understands that term.

"This article in CACM uses 'NVRAM' in the heading, while the words "non-volatile" don't appear until much later. Non-volatile is pretty obvious, 'NVRAM' isn't."

[+] Sophira|8 years ago|reply
While I'm sure the article is correct, it doesn't even attempt to link to resources to say how these things are misunderstandings. For example, I myself don't really understand IVs, and from my perspective I'm left with no clearer an idea about why IVs shouldn't be considered secret, or why the IV isn't required to be able to decrypt the file again.

Regardless, it's obvious that bad encryption advice in an MSDN article is horrifying.

[+] fpgaminer|8 years ago|reply
> I myself don't really understand IVs

Time to drop some knowledge!

IVs are used in a number of places in cryptography, so I'll just pick one (easy) example.

Consider the stream cipher ChaCha20. You can think of ChaCha20 as a black box. You input a key and an IV and out you get a really, really long stream of uniformly random bytes. (This is a simplification but sufficient here). ChaCha20 works in such a way that having any or all of the output stream doesn't help you figure out what the inputs were. It's irreversible. ChaCha20 is also deterministic; the same input will give the same output.

You can then use the output of random bytes to encrypt a message by XORing with your plaintext. To later decrypt, you feed the same key and IV, get the same stream, XOR the ciphertext with it, and by the property of XOR you'll get the plaintext.

Now why is there an IV? Let's consider a ChaCha without an IV. The system works like so:

    R = ChaCha(Key)
    Ciphertext = Message ^ R
So let's encrypt two different messages:

    R = ChaCha(Key)
    Ciphertext1 = Message1 ^ R
    R = ChaCha(Key)
    Ciphertext2 = Message2 ^ R
Notice how R is the same for both messages? Again, ChaCha is deterministic; the output is the same for the same inputs. Since the key is the same, R is the same. Now an attacker, knowing this, can do this

    Q = Ciphertext1 ^ Ciphertext2
What does Q end up being? Let's look:

    Ciphertext1 ^ Ciphertext2
    = Message1 ^ R ^ Message2 ^ R
    = Message1 ^ Message2
So Q ends up being equal to the XOR of the two messages. That's really bad. The XOR of two messages might be enough to tell the attacker what the messages are, especially if the messages are predictable (like English text). But maybe that's not scary enough. Well, there's another attack. What if you're encrypting a data format with a header? Headers often have the same data in the same places. So the attacker knows part of the message. Uh oh...

    R = Ciphertext1 ^ Message1
If the attacker knows the message (or any parts of it) they can recover the R of those parts. And now, since your key is always the same and your R is always the same, all the other messages you encrypt will have those bytes exposed.
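
Both attacks above can be tried out in a few lines of Python. This is only a sketch: ChaCha20 isn't in Python's standard library, so `toy_keystream` is a hypothetical stand-in (SHA-256 in counter mode) for the `ChaCha(Key)` black box, and the key and messages are made up for illustration.

```python
import hashlib

def toy_keystream(key: bytes, length: int) -> bytes:
    """Hypothetical stand-in for ChaCha(Key): SHA-256 in counter mode.
    For illustration only, not a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"k" * 32                          # made-up key
message1 = b"HEADER:v1;attack at dawn"   # made-up messages of equal length
message2 = b"HEADER:v1;retreat at six"

# No IV: both messages are encrypted with the same keystream R.
r = toy_keystream(key, len(message1))
ciphertext1 = xor(message1, r)
ciphertext2 = xor(message2, r)

# Attack 1: R cancels out, leaving the XOR of the two plaintexts.
q = xor(ciphertext1, ciphertext2)
print(q == xor(message1, message2))      # True

# Attack 2: an attacker who knows message1 recovers R without the key,
# then uses it to decrypt any other message encrypted the same way.
recovered_r = xor(ciphertext1, message1)
recovered_message2 = xor(ciphertext2, recovered_r)
print(recovered_message2 == message2)    # True
```

Note that the attacker never touches `key` in either attack; the ciphertexts (plus one known plaintext for the second attack) are enough.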

This is where IVs come in:

    R = ChaCha(Key, IV)
IV should be unique per message. That means that every R is different! None of the above attacks work anymore. XORing two ciphertexts together returns gibberish:

    R1 = ChaCha(Key, IV1)
    Ciphertext1 = Message1 ^ R1
    R2 = ChaCha(Key, IV2)
    Ciphertext2 = Message2 ^ R2

    Ciphertext1 ^ Ciphertext2
    = Message1 ^ R1 ^ Message2 ^ R2
And if the attacker knows the message, all they can recover is R1 or R2 (or any R). But that's useless, because since all your IVs are unique that R will never be seen ever again.
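
Here is the same idea as a rough Python sketch. As an assumption for illustration, `toy_keystream` is a hypothetical stand-in for `ChaCha(Key, IV)` built from SHA-256 in counter mode (ChaCha20 isn't in the standard library), and the IVs are drawn from `os.urandom`; any scheme that keeps them unique would do.

```python
import hashlib
import os

def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Hypothetical stand-in for ChaCha(Key, IV): SHA-256 in counter mode.
    For illustration only, not a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"k" * 32                          # made-up key
message1 = b"HEADER:v1;attack at dawn"   # made-up messages
message2 = b"HEADER:v1;retreat at six"

# A fresh IV per message gives a fresh keystream per message.
iv1, iv2 = os.urandom(8), os.urandom(8)
r1 = toy_keystream(key, iv1, len(message1))
r2 = toy_keystream(key, iv2, len(message2))
ciphertext1 = xor(message1, r1)
ciphertext2 = xor(message2, r2)

# XORing the ciphertexts no longer cancels the keystream:
# the result is Message1 ^ Message2 ^ R1 ^ R2.
q = xor(ciphertext1, ciphertext2)
print(q == xor(message1, message2))  # False, unless the two keystreams coincide

# Knowing message1 only reveals R1, which does not decrypt ciphertext2.
recovered_r1 = xor(ciphertext1, message1)
print(xor(ciphertext2, recovered_r1) == message2)  # False, for the same reason

# The legitimate receiver, holding the key and the IV, decrypts as usual.
print(xor(ciphertext1, toy_keystream(key, iv1, len(ciphertext1))) == message1)  # True
```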

That's the point of IVs.

> why the IV isn't required to be able to decrypt the file again.

It is required. Obviously you need all the inputs to ChaCha to get the byte stream again, to decrypt the message.

Now sometimes the IV is known from the protocol. So say you're using ChaCha to encrypt network traffic. You might set the IV equal to the packet number. So both sides already know the packet number.

But you always need the IV to decrypt.
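
A small sketch of the packet-number trick, under some loud assumptions: `toy_keystream` is a hypothetical SHA-256-based stand-in for ChaCha20, `nonce_for`, `seal`, and `open_packet` are names invented here, and packets are assumed to arrive in order so both sides agree on the counter.

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Hypothetical stand-in for ChaCha20: SHA-256 in counter mode.
    For illustration only, not a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def nonce_for(packet_number: int) -> bytes:
    # The IV is just the packet number, encoded as 8 bytes.
    return packet_number.to_bytes(8, "little")

def seal(key: bytes, packet_number: int, payload: bytes) -> bytes:
    return xor(payload, toy_keystream(key, nonce_for(packet_number), len(payload)))

open_packet = seal  # XOR with the same keystream undoes the encryption

key = b"k" * 32                            # made-up shared key
payloads = [b"hello", b"hello", b"world"]  # made-up traffic

# Sender: the packet number is the IV, so it is never sent explicitly.
wire = [seal(key, n, p) for n, p in enumerate(payloads)]

# Receiver: keeps its own packet counter and derives the same IVs.
received = [open_packet(key, n, c) for n, c in enumerate(wire)]
print(received == payloads)  # True
```

The first two payloads are identical, but each is encrypted under a different keystream because the packet number differs.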

> and from my perspective I'm left with no clearer an idea about why IVs shouldn't be considered secret,

Consider again ChaCha20 as a blackbox. Key+IV goes in, stream of bytes comes out. There's no way to reverse that without the key (and IV). Since the attacker doesn't know the Key, they can't reverse it. Knowing the IV doesn't help.

Another way to think about it is that, instead of accepting a 256-bit key and a 64-bit IV, it's really just a 320-bit key. Knowing 64-bits of a 320-bit key doesn't help break a cipher. The cipher is still 256-bits strong. So you can share the IV without affecting security.

BIG NOTE: It's important that an IV is always unique. If an IV is ever re-used, the above attacks become available again because R will be the same for two messages.

Hope that helps. This is only one way that IVs are used. In ChaCha20 it's called a nonce, because ChaCha20 is geared towards usage on network protocols where the above trick of using packet number is applied. For block ciphers there are various cipher modes that get used, and most of them need an IV. The purpose is always the same; to make this "session" of encryption unique.

There's another way to use IVs, and I think it re-affirms the concept of what an IV actually is. Let's say you have a cipher that only accepts a key! No IV (like AES). You still want to make your encryption sessions unique. A way to do that is this:

    TempKey = HMAC (IV, Key)
And then use TempKey. HMAC is a form of hash. In this case it lets us combine a Key and IV in an irreversible way, yielding a new key. TempKey will be the right size key for the cipher (say, 256-bits). What this is doing is giving us a unique key for every encryption session. And that's the heart of IVs. And in many ways, ChaCha20 is doing exactly that. It's hashing together Key and IV and using the output hash to generate a long stream of random data that can't be reversed back to the key+IV.
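
A minimal sketch of that derivation using Python's standard `hmac` module. Two assumptions to flag: `derive_temp_key` is a name made up here, and the long-term key is used as the HMAC key with the IV as the message (Python's `hmac.new` takes the key first); HMAC-SHA256 is picked because its 32-byte output matches a 256-bit cipher key.

```python
import hashlib
import hmac
import os

def derive_temp_key(key: bytes, iv: bytes) -> bytes:
    """Combine the long-term key and a per-session IV into a 256-bit key.
    Assumption: long-term key as the HMAC key, IV as the message."""
    return hmac.new(key, iv, hashlib.sha256).digest()

key = os.urandom(32)                       # long-term secret key
iv1, iv2 = os.urandom(16), os.urandom(16)  # per-session IVs, not secret

temp_key1 = derive_temp_key(key, iv1)
temp_key2 = derive_temp_key(key, iv2)

print(len(temp_key1) * 8)                  # 256
print(temp_key1 != temp_key2)              # True: each session gets its own key

# The receiver, knowing the key and the (public) IV, derives the same TempKey.
print(hmac.compare_digest(temp_key1, derive_temp_key(key, iv1)))  # True
```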

(and in case you're wondering, yes, you can use a cryptographically secure hash function alone to build a stream cipher like ChaCha. It'll just be _really_ _really_ slow, because hash functions are really, really slow compared to ChaCha.)

[+] jdcarter|8 years ago|reply
In addition to fpgaminer's excellent explanation, I highly recommend the book "Cryptography Engineering: Design Principles and Practical Applications" by Niels Ferguson, Bruce Schneier, and Tadayoshi Kohno. It's an excellent overview of how to use crypto primitives and why to use them that way.
[+] rakoo|8 years ago|reply
I know you're not just looking for answers but a pointer to some better documentation, and I can't provide you with those, but:

> why IVs shouldn't be considered secret

The less that is considered secret, the less can be leaked and cause problems.

> why the IV isn't required to be able to decrypt the file again

The IV is required to decrypt the file again. In the linked document's design the IV is actually the encryption key, which means it is known by the receiver, which is why it's not included. But that is just a special case that should never be reproduced.

[+] sixothree|8 years ago|reply
Agreed. Where is the pointer to the correct article to use when encrypting a file in C#?
[+] pacaro|8 years ago|reply
Note: All my information re: Microsoft is from no later than 2013.

This is indicative of a classic challenge in the industry.

To ship code that uses crypto at Microsoft you have to go through an auditing process. To ship code that uses novel crypto, or works directly with crypto primitives, you have to be reviewed by a specialist crypto review board — that contains security and crypto people from across the company, names that you might know (e.g. Niels Ferguson was there last time I needed a review. Hi Niels!)

Samples and documentation aren't held to the same standard.

[+] unscaled|8 years ago|reply
As someone in charge of reviewing all crypto code for a sizable chunk of my company, I've yet to see a single case of naive developers using encryption primitives correctly. To tell the truth, I don't think I've ever seen a single example of IVs used correctly.

At the very best of times I get AES-CBC-HMAC-SHA1 (usually Encrypt-AND-MAC) with binary keys and secret static IV.

I'm still waiting for the developer that will botch AES-GCM with a random nonce so I can have first world problems, but we're not there yet.

I wanted to call Microsoft sneaky for pulling out this article, but considering basically every top-ranked "how do I encrypt with AES" question on StackOverflow is full of bad advice, I'm glad they at least did something.

[+] jwilk|8 years ago|reply
The article says that DES "can be brute forced in a single digit number of days by a modern computer".

  2**56 keys / 9 days ≈ 92.7 Gkeys/s
Can modern computers actually compute DES that fast?
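
The arithmetic can be checked with a few lines of Python. The 9-day figure is just the upper end of "single digit number of days"; the halved rate assumes the key turns up, on average, after searching half the keyspace.

```python
# DES has a 56-bit effective key, so there are 2**56 candidate keys.
keyspace = 2 ** 56
seconds = 9 * 24 * 60 * 60          # 9 days, in seconds

worst_case_rate = keyspace / seconds
average_case_rate = worst_case_rate / 2   # expect to hit the key halfway through

print(f"{worst_case_rate / 1e9:.1f} Gkeys/s")    # 92.7 Gkeys/s
print(f"{average_case_rate / 1e9:.1f} Gkeys/s")  # 46.3 Gkeys/s
```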
[+] Strategizer|8 years ago|reply
The article author is complaining about an MSDN article not being updated. The content even says it applies to VS 2005 at its highest. That's a hint of how old it is. Is he going to get the print version and complain about that next? If programmers are using this without thought, that is on them, not the example code.
[+] BusinessInsider|8 years ago|reply
That's pretty disturbing. Though to be fair, the article in question was written a while ago (since it targets .NET 2005), and to be less fair, MS doesn't really review their documentation very well, at all.
[+] duke360|8 years ago|reply
You are probably too young. In the past, when the internet wasn't so ubiquitous, having the MSDN documentation on CD was a lifesaver. The docs that have serious content today descend directly from those days; the rest, as others have already said, are just boilerplate autogenerated docs that nobody maintains anymore, simply because the technology moves too fast. So this doc page about using DES is probably straight from 1990 or so, and in those days it was probably good enough.
[+] TheSpecialist|8 years ago|reply
It does seem useless to make the IV the same as the key. But is there a reason making the IV the same as the key is worse than using 0 as an IV?

Just asking.

[+] norcimo5|8 years ago|reply
To encrypt: tar cz foo | openssl aes-256-cbc -salt -out foo.enc

To decrypt: openssl aes-256-cbc -d -in foo.enc | tar xz

(foo can be a file or directory)

[+] snakeanus|8 years ago|reply
This does not contain a MAC though, does it? Also why CBC? Why not CTR/GCM instead? And why AES256 instead of Chacha20-Poly1305 or some other modern AEAD?
[+] snakeanus|8 years ago|reply
I feel disgusted after reading this. I wonder how many people applied the advice given by the original article because they made the bad decision to trust the official documentation by MS.
[+] bartread|8 years ago|reply
Oh, come on: whatever Microsoft's faults might be they have a very long track record, stretching back decades, of providing overall high quality documentation for developers.

Yes, there are errors. Yes, sometimes there is deeply misguided advice. But, on the whole, MSDN and its ilk have helped me far more often than they've hurt me.

Key point: compared with much other vendor and OSS documentation, Microsoft are absolutely streets ahead.

[+] wintorez|8 years ago|reply
I always look at Microsoft in order to learn how not to do anything /s
[+] giancarlostoro|8 years ago|reply
>It’s a good thing the caesar shift isn’t available in their library or it would probably have ended up in this tutorial.

https://docs.python.org/2/library/codecs.html#python-specifi...

Python does rot13 :)

[+] proaralyst|8 years ago|reply
But that's in the codecs library, not a cryptography library.
[+] Sean1708|8 years ago|reply
To be fair that's not a tutorial on how to encrypt and decrypt a file, it's a reference on the possible encodings you can use for a string.