Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team.
A lot of people really love the idea of cryptography. To computer geeks like us there is nothing cooler than the idea that computing relatively simple arithmetic on a message can enable you to communicate secretly with anyone in the world, even if there are eavesdroppers. Unfortunately, this means that there are a lot of computer people out there who are enamoured with the idea of creating their own cryptosystems, or their own implementations of the cryptosystems invented by others. It also means there are a lot of software product managers who are convinced that security equals cryptography, and that their products will magically become "secure" if only there is more crypto in them. I've gotten a fair number of questions over the years about how to add crypto to applications -- a subject that I am actually not an expert on -- and most of the time the answer is "don't". Many times the question comes from a developer who, though an expert developer of applications in their line of business, doesn't "get" the basic idea of cryptography. As a public service, let me lay it out for you. The fundamental idea of modern cryptography is this:
The strength of the security of a large quantity of data -- known as the "plaintext" -- against discovery or modification by a motivated attacker depends upon the security of a small quantity of data -- known as the "key". (*)
That is, modern crypto is essentially a form of mechanical advantage. With a gearing system or a lever you can turn a small motion into a large motion. With a strong cryptosystem you can turn the security of a 1 KB key file into the security of a 10 MB data file. Cryptosystems do not manufacture new security, any more than a lever manufactures new motion. Cryptosystems turn the security of one thing (the key) into the security of another much larger thing (the plaintext).
It is the failure to understand that fundamental idea that underlies most of the questions I get from non-crypto experts about implementing crypto in their applications. Non-crypto experts get enamoured with the math of the cryptosystem but that is not the part that provides the security. The part that provides the security is the security of the key. The cryptosystem itself (or its implementation) might be weak or strong, but even the strongest modern cryptosystem depends fundamentally on the correct management of the keys for its security.
Before the 1970s, all cryptosystems were "shared key" systems. That is, if Alice wanted to send a message to Bob over an insecure channel that had attackers attempting to either read or modify the message, then first Alice and Bob would somehow communicate a secret shared key over a secure channel. Once Alice and Bob both had the shared key, Alice could encrypt the message with the shared key, send it over the insecure channel, and Bob could decrypt it with the shared key. Bob would then have the plaintext message, but the eavesdropper would only have the encrypted message. (There are also techniques in shared-key systems whereby Bob could verify that no one tampered with the encrypted message while it was in the insecure channel.)
Clearly the security of this system depends entirely on there being a secure channel over which Alice and Bob can negotiate what their small shared key is. If they try to exchange the secret key over an insecure channel then an eavesdropper can determine what the secret key is, and it's no longer a secret. (As we'll see later, a more aggressive "man in the middle" can cause even more havoc.)
You might immediately wonder why Alice and Bob need to use crypto at all if a basic assumption is that they have a secure method for key exchange. Why not just use the required-to-be-secure channel for sending the unencrypted text? The point is that the secure channel might be extremely expensive, or it might be available only at certain times. For example, the "secure channel" might be that Alice and Bob meet for coffee in Seattle in their hidden underground bunker, decide on a secret key, and then Alice moves to Switzerland and Bob moves to the Cayman Islands, and they can no longer cheaply meet for coffee. But they can still communicate securely after their key exchange, even if they are no longer in the same room together.
The security of the system also depends upon both Alice and Bob being able to keep that key a secret. If Alice or Bob reveals the key to a third party -- say Eve, the eavesdropper -- then Eve can discover the plaintext of every message sent over the insecure channel. Worse, if the key is discovered by Mallory -- the "man in the middle" modifying the message -- then she can not only discover the plaintext but can modify the plaintext, encrypt the modified message with the secret key, and send the fake message to Alice or Bob, purporting to be the other.
What about public key cryptosystems? Don't they solve this problem? Turns out, no. Now you have four key management problems. Let me explain:
The idea of a public key cryptosystem is that there are two keys. One, the private key, is kept secret. The other, the public key, is -- you guessed it -- public. A message encrypted with the public key cannot be decrypted with the public key, but it can be decrypted with the private key, and vice versa. Alice and Bob both have a key pair; they do not know each other's private keys, but they do know each other's public keys. To send a message to Bob, Alice encrypts it with her private key(**), and then encrypts that with Bob's public key. She sends the twice-encrypted message over the insecure channel to Bob. Bob decrypts it with his private key, and then decrypts the result with Alice's public key. Now he knows that only he could read the message, because only he has his private key. And he knows that it was not tampered with, because only Alice's public key could decrypt the message. Therefore the message was neither read by Eve nor tampered with by Mallory.
Like I said, we now have four key management problems whereas before we had one. Before, Alice and Bob had to securely exchange their private key and then keep it secret. Now Alice and Bob both have to keep their private keys secret. If Bob's private key is compromised then Eve can read any message sent by Alice to Bob. And if Alice's private key is compromised then Mallory can send a fake message to Bob purporting to come from Alice.
That's two; what are the other two key management problems? I completely glossed over the most important part of the system! The detail that the security of the entire system rests on is that bit above where I said "but they do know each other's public keys". Somehow they had to securely exchange public keys. Why's that?
Suppose Alice and Bob are sending each other their public keys. How do they do that? If they do it over an insecure channel, it does not matter if Eve can read the public keys. They are public, after all. But it matters very much if Mallory can modify the keys in transit! If Alice sends her public key to Bob over an insecure channel then Mallory can intercept the message and replace Alice's public key with Mallory's public key. Mallory now knows Alice's public key, and Bob believes that Mallory is Alice! When Alice sends a message to Bob, Mallory can intercept it and replace it with a different message encrypted with Mallory's private key and Bob's public key. Bob decrypts the message with his private key and Mallory's public key, believing it to be Alice's public key, and Mallory has tricked Bob into thinking that she is Alice.
Similarly, when Bob sends his public key to Alice, Mallory can intercept it and replace it with Mallory's public key. When Alice sends a message to Bob she encrypts it with her private key and Mallory's public key, thinking that it is Bob's public key. Mallory can intercept it, decode it with her private key, read the message, and re-send it along to Bob, encrypted with Bob's real public key.
Somehow there has to be a secure channel by which public keys are exchanged. Remember, the entire point of crypto is to turn the security of a small amount of data into the security of a large amount of data. If the keys aren't secure -- either because the private keys are compromised or the public keys are not correctly associated with their real owners -- then the communication is not secure at all.
This is not a merely theoretical concern; we have to solve this problem in real life. When you go to a web site that is going to take your credit card number, you want some assurance that, first, no eavesdropper is going to be able to see that credit card number when it goes over the wire, and second, that when you send your credit card number, you really are communicating with the real web site, not a fake hacker web site that looks just like it. You might not know the public key of the web site! Without a secure mechanism to obtain that public key, you might be obtaining the man-in-the-middle's public key instead.
This problem is in practice solved by both the client (you) and the server (the web site) agreeing to trust a third party called the Certifying Authority, or CA. When the web site sends you the certificate containing their name and public key, the certificate is encrypted with the CA's private key. By decrypting it with the CA's public key, you can verify that the CA vouches for that web site actually being associated with that public key. Of course, if you do not trust the certifying authority, or the web site does not obtain certification from that certifying authority, then the key negotiation fails and the browser gives you some error message about the certificate not being verified.
But hold on a minute. Now we have yet another key management problem. The CA has to keep their private key a secret, and the CA has to somehow tell you what their public key is without being attacked by a man in the middle purporting to be the CA! We seem to have not solved the problem at all, but rather just pushed it off another level. How do we finally solve this problem once and for all? Can we use even more crypto?
Nope. This is the point where the crypto stops. (***) It has to stop somewhere; again, the point of crypto is to turn the security of a small thing into the security of a large thing; that security has to come from somewhere originally, just as the energetic motion of a lever has to come from somewhere outside the lever mechanism. The secure transmission of the CA's public key is the piece that has to be accomplished without using any crypto. How you accomplish that is up to you; typically the operating system comes pre-installed with a list of known public keys of certifying authorities that have been verified independently by the operating system vendor. New CA "root" certificates can also be installed by your machine or network administrator. The CA roots are the organizations you trust to make security-impacting decisions on your behalf.
I seem to have strayed somewhat from my original point, but I hope I've made myself clear: the hard part of a secure design that uses crypto is not the math. When adding crypto to an application, the fundamental question you should be asking yourself is not "what math am I going to do on the plaintext to encrypt it?" The fundamental question is "how are my users going to generate keys, securely exchange secret keys or public keys, and keep the secret keys private?" The security of the entire system rests upon being able to leverage the secure generation and distribution of the keys into the security of the message, so solve the key management problem first.
(*) A cryptosystem is strong or weak to the extent that it delivers on this fundamental goal. If the security of the message can be compromised without the user compromising the key then the cryptosystem is weak. The goal of modern crypto is to create cryptosystems that deliver on this promise.
(**) In a realistic scenario she encrypts a hash and appends that to the message, because that is higher performance. But let's gloss over that detail.
(***) Just to complete the sketch: the way HTTPS actually works is that a shared "session" key is securely exchanged between the client and the server. The shared key is then used to encrypt and decrypt all the messages sent between the client and the server. As we already discussed, to exchange that shared key we need a secure channel. To obtain a secure channel, the client and the server agree upon a mutually trusted CA. The server convinces the client that the agreed-upon CA vouches for the server's public key, and so the client now has the real public key of the server. The client can then suggest a shared session key to use for the rest of the communication, and send it to the server encrypted with the server's public key. So: the secure transmission of the shared session key relies upon (1) the secrecy of the server's private key, (2) the secrecy of the CA's private key, and (3) that the client and the server both have an accurate copy of the CA's public key, obtained through some non-crypto-based secure channel. The cryptosystem leverages the secrecy of two private keys and the successful transmission of one public key into the security of the messages transmitted in the session. If any of those keys are compromised then the whole system falls apart. (To deal with that problem, CAs also provide the service of publishing a "revocation list" of servers that have lost control of their private keys, so that you know not to trust those guys.)
Shouldn't Alice be performing her second encryption with Bob's public key (not her public key)?
That was a typo -- thanks for the note. -- Eric
Excellent article as always, Eric! Thank you very much.
In the sentence "Before, Alice and Bob had to securely exchange their private key and then keep it secret." shouldn't it be "generate" instead of exchange?
Another option is to assume that most people aren't Eves.
You still need a secure channel (this might mean meeting people in the real world, which could be unsettling for some of our readers) but instead of assigning all trust to one root CA (which may become defunct or compromised) you could ask a lot of people to sign your public key, and offer to sign theirs.
Now you have a set of known public keys and each (including your own) is signed by a group of people. When you receive (through a non-secure channel) a new public key that has been signed by others you can trace them back.
You have to acquire enough keys and make the assumption we started out with, but this added complexity shields you from what happens when a CA goes all DigiNotar on you. Of course, someone might be paying all the people you know to sign a fake public key, but if that happens I think you have worse problems...
What if an OS distribution downloaded illegally over the internet deliberately contained messed-up CA certificates?
Is this even possible? And what would the implications be?
What if lots of people were downloading a Linux distribution through an insecure internet connection, and then installing without checking the authenticity of the CA keys contained therein?
<sarcasm>But surely people would never do that!</sarcasm>
It's very easy to do. Just generate a key of your own, install it in the operating system as an additional root CA certificate, and then distribute an image of that "corrupted" operating system. Now you can start creating phishing sites with certificates that are signed by your fake CA certificate.
The implications would be that users of your corrupted operating system would be led to trust the phishing sites as being who you claim them to be. If you do nothing to change the code in the operating system to hide your corrupted root CA certificate from casual investigation, then your corruption may go undetected by the user for a long time.
Interestingly, wrong security is still better than no security at all in most (but not all) applications.
First, it will discourage most (if not all, depending on how important your data is) eavesdroppers -- even a simple XOR is able to stop most of them.
Second, it gives you something to fight with in court; I don't think there is a crime of eavesdropping or packet sniffing as such, but circumventing a security system is one.
Third, the eavesdropping cannot be claimed to be done "by accident".
As an analogy, I guess you don't have a self-sustaining, backed-up power station just to power the laser beams pointing at your house door, with snipers on the roof and a constant military guard for your house. I guess a lock and optionally a simple antitheft system is enough for you. That's completely insecure against expert and determined thieves, but it's enough to keep away most if not all of the thieves who are realistically going to rob your house. On the other hand, keeping the door unlocked will leave you with an empty house in a few days.
Sure, and I have made this argument myself; the "script encoding" algorithm we added to VBScript and JScript is trivially broken. It was added to stop casual inspection, and to show that stealing intellectual property in a script had to have been done deliberately in violation of a licensing agreement. But you can go too far in applying your argument. It is one thing to say "this security system was deliberately designed to provide cheap-but-weak security against casual threats". It is another thing entirely to say "I have a bulletproof cryptosystem that provides strong security against determined attackers" and have it not live up to that promise because of a design or implementation error. A weak security system might be good enough but you have to make that choice as an informed consumer; relying on a weak security system that you've been sold on as a strong security system is an extremely vulnerable position to be in! -- Eric
I'm sorry...is it "256 bits" instead of "256 bytes" perhaps?
Thank you, anyway.
Key sizes are typically measured in bits, and file sizes are typically measured in bytes; I had to choose to either be inconsistent with standard crypto jargon or inconsistent in the comparison. That said, a 256-byte = 2048 bit key is possibly too small a key to use in RSA, and too large a key to use in a typical symmetric cryptosystem. -- Eric
Tried in vain to explain this exact problem to my computer security professor eight years ago; perhaps I'll email her your article :)
Excellent article, Eric ... it's worth noting that sometimes even the "experts" get the design or implementation of such systems wrong. Two recent examples (out of many) that come to mind are the vulnerabilities recently exposed in the SSL/TLS system that underlies the secure HTTP protocol, as well as the password vulnerabilities in OS X Lion. These kinds of flaws in widely used, extensively studied and tested systems should give developers pause ... if the security of your system actually matters, it's worth getting an expert to help design and implement it.
Nice read as always.
Sounds like this may have been motivated by: stackoverflow.com/.../7540173
Did you mean 128 bytes = 1024 bits or 256 bytes = 2048 bits?
Apparently my mental bit shifting was off by one. I've corrected the error. -- Eric
Just a nitpick, but: It's a pet hate of mine when people say that with a "shared key" (symmetric) system, you have to share a key *before* you can encrypt the message and send it over. In fact, you can pick whatever key you want - encrypt the message and send it over - and then share that key *later*. Or you might not share that key *at all*, unless and until some special event occurs. None of that precludes you from "encrypting the message and sending it over".
And all those problems with CAs signing basically anything as long as you pay their fee perfectly illustrate the point of this article :/
@Ron Warholic: Great link :D
To quote: "so the client message will contain: data to be send, client's public key, client name encrypted with the client's private key."
aka: "Here's the sensitive data and this is the key you should use to make sure that nobody changed anything in it!"
"She sends the twice-encrypted message over the insecure channel to Bob. Bob decrypts it with his private key, and then decrypts the result with Alice's public key. Now he knows that only he could read the message, because only he has his private key. And he knows that it was not tampered with, because only Alice's public key could decrypt the message."
This confuses me a bit. Bob can't decrypt messages encrypted with Alice's private key using her public key, right? I was under the impression that you encrypt with a public key and decrypt with a private key - though I'm not knowledgeable at all when it comes to this.
In a public key cryptosystem there are two keys, A and B. A message encrypted with A can be decrypted by B. A message encrypted by B can be decrypted by A. The cryptosystem does not distinguish between the two keys; you can use either of them to encrypt a message, which can then be decrypted by the other. The fact that you make one of them public and one of them private is a fact about you, not about the math of the cryptosystem. -- Eric