Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
OK, we want to sketch out an authentication system which is sufficiently secure against common attacks even if all the details of the system are known to the attacker. Let's start with a simple system, take a look at what its vulnerabilities are, and see if we can mitigate them:
The client transmits the username and password "in the clear" to the server. The server consults a list of usernames and passwords, and grants or denies access accordingly.
There are two big problems with such a system. First, it's susceptible to eavesdropping if the client and the server are not the same machine. If someone can listen in on the message sent from the client to the server with a network packet sniffer then the eavesdropper learns a valid username and password. Second, if the server is ever compromised and the password list is read, the attacker learns everyone's username and password.
Can we mitigate these problems? Let's look at the second vulnerability first. Does the server need to store the password? How could it NOT store the password? What would it compare against?
We can eliminate the need for the server to store the password if we have a hash function. A hash function is a function which takes a string as its argument and produces a number, usually a hundred--or-so-bit integer, as its result.
A good hash algorithm has the property that slightly different inputs produce very different outputs. A one-bit change in the input should cause on average 50% of the output bits to change. Because of this, it should be extremely difficult to deduce the an input that produces a given output, even given partial information about the input. (There are other hash algorithm properties that are important for cryptographic operations such as document signing, but that's a topic for another day.)
Proving that a given algorithm actually has this property can be quite tricky, but we have some industry-standard hash algorithms which have withstood rigorous testing and deep analysis by professional cryptographers and are widely believed to be "one way functions" -- it's easy to go from string to number, and very hard to go backwards.
The client sends the username and password to the server. The server has a list of usernames and the hashes of passwords, but not the passwords themselves. (When the user first created the password, the system hashed the password and then discarded it, saving only the hash.) The server hashes the client-supplied password and compares the hashes.
This is better; now the server is not storing the passwords, just the hashes. If an attacker compromises the server, they can't easily go from the hash back to the password. (It also has the nice property that every entry in the password table is now the same size. Longer passwords do not have longer hashes.)
But there are two new problems with this system. First, any two users who have the same password have the same hash. If one of those users is evil and compromises the server, they immediately learn who has the same password as they do.
Second, this system is susceptible to dictionary attacks. An attacker can hash every word in a large dictionary, compromise the server, and then compare every password hash to every word in the dictionary. Since dictionary words are likely passwords, the attacker will probably be able to figure out at least a few passwords.
And of course we still haven’t mitigated the fact that eavesdroppers could be listening in on the conversation between the client and the server.
Next time, we'll add a little salt to the mix in an attempt to mitigate the dictionary attack and same-password vulnerabilities. Then we'll see if we can use some of the hash technology to mitigate the eavesdropping attack.
A recent question I got about the .NET CLR's hashing algorithm for strings is apropos of our discussion
>> A good hash algorithm has the property that similar inputs produce different outputs.
If that's the case, how can you be sure that the hash of the password submitted by the user attempting to log in will match the hash of that password that was stored in the database?
By "similar" I mean "similar but not identical". That is, the hash for "banana123" should be completely unrelated to the hash for "bananb123". Clearly the hashes of two identical inputs must be identical!
Sorry for the confusion.