If you're trying to encrypt data using a password, how do you convert the password into a key for symmetric encryption? The easiest way might be to simply convert the password to a byte array, and use this array as your key. However, this is a very bad idea and will lead to an easily cracked system. First of all, for a 256 bit encryption algorithm your passwords would all have to be exactly 32 bytes, or you would end up with not enough bits for the key; or worse, too many bits for the key, meaning that every password that starts with the same eight characters will work to decrypt the data.
In the English language, passwords will probably only contain the characters a-z, A-Z, 0-9, and perhaps some symbols. This means that only 70-75 of the possible 256 bit combinations for each byte will be used. In addition, not every symbol will be used with even frequency. Various estimates for the entropy in a single byte of an English language password vary from 1.3 to 4 bits. This means that in order to generate a good 256 bit key, you'll need a password that's anywhere from 64 to 197 bytes long!
Fortunately the .Net Framework has provided several ways for you to convert passwords to keys, all of which are much better alternatives than the simple method mentioned above. All of the methods I mention below will always generate the same key given the same set of inputs, so they can be use to effectively create password-based encryption in your code.
The PasswordDeriveBytes class is available in current releases of the framework, and contains two methods for you to generate keys.
One way of using PasswordDeriveBytes is as a simple wrapper around CAPI's CryptDeriveKey function. This is done by calling the appropriately named CryptDeriveKey method of PasswordDeriveBytes. When calling CryptDeriveKey, you'll be using the password found in the class constructor, but you'll need to pass in the cryptographic algorithm that you're generating a key for, the size of the key that you'd like to use, and the name of the hash algorithm you'd like to use to generate the key. When calling CryptDeriveKey, the salt and iteration count that are set on the PasswordDeriveBytes object are not used, so even having different salts and iteration counts will produce the same key given that the rest of the inputs are also the same. (I'll discuss the use of the salt and iteration count later.)
Some sample code might clear up the use of CryptDeriveKey.
The sample generates a 128 bit key to use with the RC2 algorithm. This key is generated by using the SHA1 hash algorithm. The IV parameter to CryptDeriveKey is actually an out parameter -- it's supposed to represent the IV that you can use with your algorithm. However, current implementations just set this to an array of zeros, so you'll need an IV anyway. (Obviously my IV and password aren't the strongest in the world.)
I occasionally get asked what algorithms you can pass to CryptDeriveKey. Since CryptDeriveKey is a wrapper around CAPI, you need to pass a parameter that CAPI can understand. This means, it must be a string that CryptoConfig will map to a *CryptoServiceProvider class. (Actually, this is simplified a bit .... any algorithm that has the same OID as a CSP class will work, but for all intents and purposes, stick with the CryptoServiceProvider classes). In v1.1 of the framework, these classes are:
The other use of PasswordDeriveBytes is as an implementation of the PBKDF1 algorithm, specified in RFC 2898, section 5.1. (PBKDF stands for Password Based Key Derivation Function). PBKDF1 is a pretty simple algorithm:
As you can see, PBKDF1 uses a salt to reduce the risk of a dictionary attack. Having a large salt will reduce the risk that an attacker can create a list of the output keys for a set of given passwords. Instead of just having to calculate one key, the attacker would have to calculate one key for each salt. RSA, who developed the algorithm as a part of PKCS #5, recommends a salt of at least 64 bits. A 64 bit salt would mean the attacker would need to generate 2^64 keys for each password in order to use a dictionary attack.
In addition, an iteration count is required. In general, the larger the iteration count, the stronger the resulting key. Here, RSA recommends a value of at least 1000. PBKDF1 uses either MD4, MD5, or SHA1 as its underlying hash algorithm, although PasswordDeriveBytes will not care if you use another hash algorithm.
In order to generate your key, you call GetBytes passing in the number of bytes you need for a key. One neat thing about the output of GetBytes is that getting two smaller byte arrays is the same as getting one larger one. For instance, two arrays of 20 bytes put together will be the same as a call to GetBytes(40). You can call GetBytes as many times as you need. PBKDF1 will only produce the number of bytes that the hash algorithm generates, but GetBytes will extend the result further, allowing you to get a large number of bytes out of your password.
The following sample is a rewrite of the CryptDeriveKey sample, using PBKDF1. In this case, I'm using a simple salt, an iteration count of 1000, and the SHA1 hash algorithm. I also use PasswordDeriveBytes to generate an IV for me.
With the release of Whidbey, the frameworks will have an implementation of PBKDF2, in the class Rfc2898DeriveBytes. PBKDF2 is also defined in RFC 2898, in section 5.2. The big advantage of PBKDF2 over PBKDF1 is that the output from PBKDF2 is not bounded to the size of a hash algorithm's output. Although the .Net implementation of PBKDF1 does not impose this limitation on you, in order to be more secure I'd recommended that you move to PBKDF2 when you make the move to Whidbey. As you'll see, moving from PasswordDeriveBytes to Rfc2898DeriveBytes is trivial since Rfc2898DeriveBytes works in much the same way as PasswordDeriveBytes. You start by setting up a password, salt, and iteration count (PBKDF2 uses HMACSHA1 as an underlying pseudo-random generator). Then you call GetBytes repeatedly until you have enough data to encrypt with. The following code is a rewrite of the PBKDF1 code to use PBKDF2:
As you can see, the only change to the code is replacing the PasswordDeriveBytes class with an Rfc2898DeriveBytes class.
Other changes for Whidbey include the ability to set a byte array password (on both PasswordDeriveBytes and Rfc2898DeriveBytes), which increases security by allowing you more control over the password object. (More on that in a future post.)
Hopefully this has provided some useful examples of how to generate a good cryptographic key using a password.