I have talked in the past about how passwords for SQL logins are protected in SQL Server (see this post). I would like to describe this scheme in a more generic way and compare it with the alternative of encrypting the passwords, because I have seen people wondering which method they should use.

First, what is authentication? Authentication is the process we go through to verify the identity of a user. It should not be confused with authorization, which is about what actions we allow an already identified user to do. Authorization happens after authentication and relies on its result.

Authentication schemes are based on information falling in three large categories: what the user knows (passwords, PINs, birth date, mother's maiden name), what the user has (id card, badge, debit card, credit card, smartcard), and what the user is (photograph, retina scan, fingerprint, voice recognition). Sometimes combinations are used - a debit card requires knowing its PIN and an identification card will have some biometrics information like a photograph and maybe height and weight information as well.

The goal of password authentication is therefore to verify a user's identity based on a password associated with him that only he should know. If more people know the password, then we cannot really authenticate the user - we're authenticating the group of people that know the password. The good thing about a password is that we can commit it to memory and we can later recall it when we need it - the fact that it resides only in memory is appealing from a security point of view because others can't yet read our minds reliably. The bad thing about a password is that we have to commit it to memory and we may fail to recall it later when we need it - this is an inconvenience that may not have any security impact by itself, but it usually leads to one because, to avoid the situation, we'll usually pick something easy to remember which might also be easy to guess. Writing it down on a piece of paper is not that bad as long as we keep that piece of paper secure. A password is also vulnerable during the moment it leaves our mind to be presented for authentication - by typing it on a keyboard or by saying it on the telephone. Note that all these observations apply to any authentication method based on what the user knows - in fact, the other ones I listed have more problems than passwords - PINs are shorter, and the birth date and the mother's maiden name are not only known to one person.

So let's look at how we can perform password authentication.

Method 1 - Store: Store the user password somewhere in the system. Whenever the user provides his password, we compare it with the stored one and, if they match, we have successfully authenticated him.

Problems: One, but big: the password is stored in clear, meaning that whoever has access to the storage place can learn the passwords stored there. This is a big problem because it means the password can be accessed by bypassing the application and just examining the disk; it will be there if the disk breaks down and gets replaced, for example. How can we improve this? A particularly bad direction would be to try to come up with our own obfuscation scheme for not storing the passwords in clear. We'll not go there, so let's look at proven technologies. We can use an encryption algorithm that has already proved its usefulness in the field.

Method 2 - Store encrypted: Have some encryption key created for the purpose of encrypting passwords. We store the user password encrypted with this key and whenever the user will provide us with his password, we'll decrypt the stored one and compare them.

Problems: A couple smaller ones: the first problem is about finding a way to protect the encryption key, the second problem is that anybody with access to the encryption key will still have access to all the passwords. Finding a way to protect the encryption key is tricky - as I mentioned in other posts, encryption doesn't really solve the problem of protecting information - it just changes the problem of protecting a lot of information into a problem of protecting a tiny bit of information - the encryption key. Whether we address this issue or not, there is no denying that it adds some complexity to the solution. I also don't like the idea that administrators of the system could know my password - yes they could send it back to me if I forget it, but they could also leak it or just use it to find out more about how I choose my passwords. So, I am not happy with this solution - how can we improve it? Well, here comes hashing! Hashing works one-way - you can get a hash from some information, but you cannot retrieve that information from the hash.

Method 3 - Store hash: Choose a hashing algorithm and store the hash of the user password. When the user provides his password, hash it and compare the result against the stored hash.

Problems: One tiny one: Hash collisions - it is possible for two different passwords to have the same hash value, but cryptographic hash algorithms make this very unlikely. Also, there's no way we can find out a password if the user forgets it - we'll have to instead change his password and provide him with the new password and a method of changing it. Of course, before doing this we have to figure out another way of authenticating him - Web applications usually solve this problem by sending the password to the email account associated with the user.

What did we gain with this method? The password is not stored in clear, we don't have any encryption keys to manage, and we have better guarantees that the system and its administrators are not able to retrieve the passwords from the stored hashes. The administrators can still monitor the system when you are providing the password, but that is more difficult than just decrypting the encrypted passwords and is an issue that all these password authentication methods have anyway.

But it's not over yet! We can make this better. To understand why, let's look at how we can attack this method and whether we can thwart some attacks.

A simple way to attempt to retrieve passwords from hashes would be to build a large collection of possible passwords - we won't be able to have all possible passwords in it, but we can collect a lot of likely passwords and we can grow this collection over time. Once we have this collection, we can hash all these passwords using the same algorithm that the application uses (we can reverse engineer it by examining the application or we can simply use the application itself to generate the hashes by creating accounts for all those potential passwords). Whatever the method, we can end up with a collection of hashes and corresponding passwords - we now have a dictionary that we can use to attempt to break other hashes by simply looking them up in it. Of course, we won't be able to break any password, but if we break one and it happens to be an administrator account, it's enough.

This sounds bad, right? You're bound to have some users with poor passwords, maybe even administrators (sa with empty password was a bane of many SQL Server installations) and this attack will break those passwords. It's important to remember that a prerequisite for this attack is to have access to the password hashes in the first place, which is not trivial to get except for the administrators. But this is still not very nice and there is actually a little trick we can do that will make a dictionary attack much more costly.

Method 4 - Store salted hash: The trick we'll use is to add some randomness to the hash generation process - this is called "salting" the hash. What we're going to do is to randomly generate a piece of information called the "salt" for each password that is set for a user. This salt will be combined with the password and we'll store both the salt and the hash in the system. Whenever the user will submit a password, we'll lookup the salt and hash, we'll combine the salt with the submitted password, we'll hash the result and we'll compare it against the stored hash.

How can we combine the salt with the password? One simple way is to prepend the salt to the password - this will work well with most hashing algorithms. Appending the salt may not be as good - it may end up affecting only part of the hash, depending on how the hash algorithm works. If you want to get creative here, look for more information about the hashing algorithm in a cryptography book. 

How does this help with the dictionary attack? Well, because for each hash we have a salt value, a single dictionary won't cut it anymore - the attacker would have to build a dictionary for each salt value - and that is expensive because hashing is not a cheap operation.

How large should a salt be? If you make it too small, 1 byte, for example, it is still feasible to build 256 dictionaries to cover all possible values of that salt. So salts should be picked up to be larger values. For example, a 4 byte salt would require more than 4 billion dictionaries - this is the value used in SQL Server 2005.