Pop security quiz: What’s the most secure way to store a secret?
a) Encrypt it with a strong symmetric cryptographic algorithm such as AES, using a 256-bit key.
b) Encrypt it with a strong asymmetric cryptographic algorithm such as RSA, using a 4096-bit key.
c) Encrypt it using a cryptographic system built into your platform, like the Data Protection API (DPAPI) for Windows.
Have you made your choice? The correct answer is actually:
d) Don’t store the secret at all!
Ok, it was a trick question. But the answer is valid: thieves can’t steal what you don’t store. Let’s apply this principle to the action of authentication – that is, logging into a web site. If a site never stores its users’ passwords, then even if the site is breached, those passwords can’t be stolen. But how can a site authenticate users without storing their passwords? The answer is for the site to store (and subsequently compare) cryptographic hashes of the passwords instead of the plaintext passwords themselves. (If you’re unfamiliar with the concept of hashes, we recommend reading http://msdn.microsoft.com/en-us/library/92f9ye3s.aspx#hash_values before continuing.) By comparing hashes rather than plaintext, the site can still validate that the user does indeed know his or her password – otherwise, the hashes wouldn’t match – but it has no need to ever actually store that password. It’s an elegant solution, but there are a few design considerations you’ll need to implement to ensure you don’t inadvertently weaken the strength of the system.
The first design issue is that simply hashing the passwords alone isn’t enough protection: you also need to add a random salt to each password before you compute its hash value. Remember that for a given hash function, an input value will always hash to the same output value. With enough time, an attacker could compute a table of plaintext strings and their corresponding hash values. In fact, many of these tables (known as “rainbow tables”) already exist and are freely downloadable on the Internet. Armed with a rainbow table, if an attacker could manage to gain access to the list of password hashes on the web site by any means, he could use that table to easily determine the original plaintext passwords. When you salt hashes, you take this weapon out of the attackers’ hands. It’s also important to generate (and store) a unique salt for every user – don’t just use the same salt for everyone. If you did always use the same salt, an attacker could build a new rainbow table using that single salt value, and eventually extract out the passwords.
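A minimal sketch of per-user salted hashing in Python (the function names and 16-byte salt length are illustrative choices, not part of the original post):

```python
import hashlib
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Generate a unique random salt for this user and return (salt, hash)."""
    salt = secrets.token_bytes(16)  # unique per user -- never shared
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the salted hash and compare it against the stored value."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return secrets.compare_digest(candidate, stored)
```

Because each user gets a fresh random salt, two users with the same password end up with different stored hashes, which is exactly what defeats a precomputed rainbow table.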
Figure 1: Comparing salted hashes
The next important design consideration is to use a strong cryptographic hash algorithm. MD5 may be a popular choice, but cryptographers have demonstrated weaknesses in it, and it has been considered an unsafe, “broken” algorithm for years. SHA-1 is stronger, but it too is beginning to show cracks, and cryptographers now recommend avoiding SHA-1 as well. The SHA-2 family of hash algorithms is currently considered the strongest, and is the only family of hash algorithms approved for use in Microsoft products per the Microsoft Security Development Lifecycle (SDL) cryptographic standards policy.
Instead of hardcoding your application to use SHA-2, an even better approach is to build in “cryptographic agility,” allowing you to change the hash algorithm even after the application has been deployed into production. After all, cryptographic algorithms go stale over time: cryptographers find weaknesses, and computing power increases to the point where brute force approaches become feasible. Someday SHA-2 may be considered just as weak as MD5, so planning for this eventuality early may save you a lot of trouble down the road. An in-depth look at hashing agility is beyond the scope of this post, but you can read more about a proposed solution in the MSDN Magazine article Cryptographic Agility. And just as the SDL mandates the use of strong cryptographic algorithms in Microsoft products, it also encourages product teams to use crypto agility where feasible, so that teams can more nimbly migrate to new algorithms in the event that a current strong algorithm is broken.
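One simple way to get this kind of agility is to record the algorithm name alongside the salt and digest, so that old records can still be verified after the configured default changes. This sketch uses a `$`-delimited record format of my own invention, not any standard:

```python
import hashlib
import secrets

# The preferred algorithm is configuration, not code, so it can be changed
# after deployment without invalidating records hashed under the old one.
CURRENT_ALGORITHM = "sha256"

def hash_password(password: str, algorithm: str = CURRENT_ALGORITHM) -> str:
    salt = secrets.token_hex(16)
    digest = hashlib.new(
        algorithm, bytes.fromhex(salt) + password.encode("utf-8")
    ).hexdigest()
    # Store which algorithm produced this hash alongside the salt and digest.
    return f"{algorithm}${salt}${digest}"

def verify_password(password: str, record: str) -> bool:
    algorithm, salt, digest = record.split("$")
    candidate = hashlib.new(
        algorithm, bytes.fromhex(salt) + password.encode("utf-8")
    ).hexdigest()
    return secrets.compare_digest(candidate, digest)
```

When the default is later upgraded, records can be re-hashed under the new algorithm the next time each user successfully logs in.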
So far, we’ve talked about what to hash (the password and a random unique salt value) and how to hash (using a cryptographically strong hash algorithm in the SHA-2 family, and preferably configurable to allow for future change), but we haven’t talked about where to hash. You might think that performing the hashing on the client tier would be a significant improvement in security, since you’d only need to send the hash over the wire to the server and never the plaintext password itself. However, this doesn’t buy you as much benefit as you’d think. If an attacker has a means of sniffing network traffic, he could still intercept the call and pass the hash to the server himself, thus spoofing the user and taking over his session. At this point, the hash essentially becomes the plaintext password. The only real benefit to this approach is that if the victim is using the same password on multiple web sites, the attacker won’t be able to compromise the victim’s account on those other sites as well, since knowing the hash of a password tells you nothing about the password itself. A better way of defending against this attack is just to perform the hashing on the server side, but to ensure that the password and all credential tokens such as session cookies are always transmitted over SSL/TLS. We’ll explore the topic of secure credential transmission (and other aspects of password management such as password complexity and expiration) in future blog posts.
By following a few simple guidelines, you can help to ensure that your application’s users’ credentials remain secure, even if your database is compromised:
- Never store plaintext passwords; store cryptographic hashes of them instead.
- Generate a unique random salt for each user, and combine it with the password before hashing.
- Use a strong hash algorithm from the SHA-2 family, and design for cryptographic agility so you can change algorithms later.
- Perform the hashing on the server side, and always transmit passwords and credential tokens such as session cookies over SSL/TLS.
It would be nice if Windows itself followed these guidelines for user passwords!
(IIRC, it doesn't use salt, doesn't use SHA-2, and doesn't have a cryptographically agile design at all. Maybe this has changed, though.)
I don't follow the recommendation to hash on the server side. If the program hashes on the client, I would assume that would be a local call, so there would be no network traffic to intercept. Why would sending the hash over TLS be less good than sending the password over TLS and hashing on the server?
If you’re storing a hash and sending a hash, that’s not much different than storing a plaintext password and sending a plaintext password: the hash effectively becomes the password in this case. If an attacker finds a way to steal the credential database, he could then trivially impersonate any user just by bypassing any client-side code and sending the password hash directly to the server.
i want to know how to keep someone from running macros and backing up all my files? keep them out when they are in my computer?
Implicitly, hashing on the client side also means that you cannot effectively salt the password before hashing.
Surely the recommendation should then be (for applications which are concerned that its passwords may be reused on other sites) to double hash it: to hash it on the client, transmit over a secure channel, and to salt+hash it again on the server before doing the compare.
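The double-hash scheme described in this comment could be sketched as follows. The per-site constant used as the client-side salt is a hypothetical stand-in; the function names are illustrative:

```python
import hashlib
import secrets

def client_side_hash(password: str, site: str) -> str:
    # The client sends this instead of the password, so the real password
    # never leaves the machine; "site" is an assumed per-site constant.
    return hashlib.sha256((site + ":" + password).encode("utf-8")).hexdigest()

def server_store(client_hash: str) -> tuple[bytes, bytes]:
    # The server still salts and hashes what it receives before storing it.
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + client_hash.encode("utf-8")).digest()
    return salt, digest

def server_verify(client_hash: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.sha256(salt + client_hash.encode("utf-8")).digest()
    return secrets.compare_digest(candidate, stored)
```

Under this scheme, a breach of the server's database reveals neither the password nor the value the client transmits, though as the reply below notes, it does not help if the attacker already has the plaintext password from another site.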
I also echo the original L's comments about Microsoft following these guidelines. E.g., I believe that authenticators such as session cookies are also transmitted over non-secure channels.
If an attacker steals your password, no amount of hashing either client- or server-side is going to prevent them from spoofing you. Let’s say that Alice uses the same password to log into foo.com as she does at bar.com. Now let’s say that foo.com is storing its users’ passwords in plaintext, and Eve exploits a SQL injection vulnerability on foo.com to extract out the contents of the credential database. Even if bar.com does implement the double-hash mechanism you describe, that still won’t prevent Eve from logging in as Alice on bar.com (assuming bar.com is using single-factor authentication). And foo.com wouldn’t have needed to implement a double-hash to provide an effective defense; use of SSL/TLS and a single salted server-side hash with a strong algorithm would have been sufficient.
What about hashing time? General purpose hashes are designed to be very fast, and that is not what we want in the case of credential storage. Of course, given a strong enough secret, the search space is so big that even a single iteration of SHA-256 makes brute force infeasible. But what about algorithms like bcrypt, scrypt, or even Rfc2898DeriveBytes? Isn't a deliberately time-consuming hash function better than a plain, general-purpose hash?
By using time-consuming functions such as Password-Based Key Derivation Function 2 (PBKDF2) with thousands of iterations, you do indeed make brute force efforts much less feasible, especially for local credential storage. But then you need to think about the load on the server side and protection against denial of service.
The server-side load shouldn't be an issue with bcrypt, scrypt, or PBKDF2. These are all scalable, so you can pick a cost value that is acceptable even under what you consider to be a high but non-malicious load. Then, implement login delays (e.g., a 1-second delay after a failed login) and CAPTCHA to prevent automated denial of service. Some experts recommend picking a cost value that makes hashing take something like 1/5 of a second, but this is pretty aggressive and would be easy to DoS even with CAPTCHA and delays. At 1/1000 or 1/100 of a second, delays and CAPTCHA should be plenty to stop most DoS. If an attacker can send thousands of bots after your site, he can probably achieve DoS anyway.
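A minimal sketch of the iterated approach discussed here, using PBKDF2 from the Python standard library (the iteration count shown is an illustrative value; it should be tuned to your own acceptable per-login cost):

```python
import hashlib
import secrets

# Tune so a single hash costs an acceptable fraction of a second on your
# server hardware; raise it over time as hardware gets faster.
ITERATIONS = 100_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, ITERATIONS)
    return secrets.compare_digest(candidate, stored)
```

The iteration count acts as the tunable work factor: an attacker who steals the database must pay the same per-guess cost for every candidate password in a brute-force attempt.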
Bcrypt in particular has been deployed as the default in OpenBSD and as an option in Linux for a long time and DoS has not been an issue.