A recent question I got about the .NET CLR's hashing algorithm for strings is apropos of our discussion from January on using salted hashes for security purposes. (If you missed it, you can read it here: Part One, Part Two, Part Three and Part Four.) The question was basically "my database of password hashes doesn't seem to work with .NET v2.0, what's up with that?"
To make a long story short, the answer is UNDER NO CIRCUMSTANCES SHOULD YOU USE THE .NET STRING HASH ALGORITHM (that is, String.GetHashCode()) FOR SECURITY PURPOSES. That is not what it was designed for. If it hurts when you do that then stop doing it!
The slightly longer version goes like this. Suppose you want to store some secrets in a database, but you only need to be able to confirm that the user knows the secret. As I discussed in my series on salted hashes, a hash is a commonly used tool for this task because a cryptographic hash has some nice properties. Namely, it is a fixed number of bits (typically in the hundreds), small changes to the input produce huge changes in the output, and it is very difficult to go from the hash back to the original secret. Another nice property that I didn't call out in my earlier articles is that there are industry-standard hash algorithms, so you can be reasonably sure that any two implementations will produce the same results when given the same set of bits.
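As a rough illustration of those properties, here is a minimal Python sketch. Python and SHA-256 are my choices for the example, not something from the original question; any standard cryptographic hash would make the same point. The helper names are made up for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hash the UTF-8 bytes of text with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = sha256_hex("my secret")
b = sha256_hex("my secres")          # only the last character changed
long_input = sha256_hex("x" * 10_000)

# Digest size in bits is the same no matter how long the input is.
print(len(a) * 4, len(long_input) * 4)

# A one-character change flips a large share of the 256 output bits.
print(differing_bits(a, b))
```

The same input always yields the same digest, on any conforming implementation, which is exactly what you want from values you intend to keep in a database.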
The .NET CLR string hash algorithm has none of these nice properties, and is therefore completely unsuitable as a cryptographic hash function. Specifically:
- The string hash algorithm was designed to be blindingly fast rather than hard to run backwards, so it is likely that a mathematically astute attacker will be able to rapidly deduce facts about the input knowing only the hash.
- Worse, because the hash is only 32 bits, finding a message that produces a given hash by brute force becomes doable in an afternoon with a PC, rather than taking a trillion years.
- Finally, the string hash algorithm is not an industry standard and is not guaranteed to produce the same behaviour between versions. And in fact it does not. The .NET 2.0 CLR uses a different algorithm for string hashing than the .NET 1.1 CLR. If you are saving .NET 1.1 CLR hash values in a database then you will not be able to match them when you upgrade to 2.0. The hash algorithm was specifically NOT designed to be forward/backward compatible and we called that out in the documentation because we knew that it probably would not be. Unlike last week, I have no problem whatsoever with making breaking changes when we call out in the documentation that you cannot rely on version-to-version identical output for a function.
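To put rough numbers on the "afternoon versus a trillion years" point, here is a back-of-the-envelope calculation in Python. The guess rate is an assumption I picked purely for illustration, and 160 bits stands in for a typical cryptographic digest size; real attack speeds will differ.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def worst_case_seconds(hash_bits: int, guesses_per_second: float) -> float:
    """Time to try one candidate for every possible hash value of this width."""
    return 2 ** hash_bits / guesses_per_second

RATE = 1_000_000  # assumed guesses per second; purely illustrative

hours_32 = worst_case_seconds(32, RATE) / 3600
years_160 = worst_case_seconds(160, RATE) / SECONDS_PER_YEAR

print(f"32-bit hash:  about {hours_32:.1f} hours")
print(f"160-bit hash: about {years_160:.1e} years")
```

At that assumed rate the 32-bit search takes a bit over an hour, while the 160-bit search takes vastly more than a trillion years. Making the attacker a thousand times faster changes the first number to seconds and barely dents the second.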
So please don't do that; you're just asking for a world of hurt if you do. Use SHA1 or MD5 or some other algorithm designed for that purpose. (Yes, I know that weaknesses have been discovered in these algorithms, but they are still orders of magnitude better than a hash designed for hash table balancing!)
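For contrast, here is a minimal sketch of the salted-hash approach from the earlier series, built on a standard algorithm from a library rather than on a hash-table hash. It is written in Python with SHA-256, which is my substitution for the algorithms named above, and `make_record` and `check_secret` are hypothetical names for illustration only.

```python
import hashlib
import hmac
import secrets

def make_record(secret: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the secret itself."""
    salt = secrets.token_bytes(16)  # fresh random salt for each secret
    digest = hashlib.sha256(salt + secret.encode("utf-8")).digest()
    return salt, digest

def check_secret(candidate: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the salted hash for candidate and compare with the stored digest."""
    recomputed = hashlib.sha256(salt + candidate.encode("utf-8")).digest()
    return hmac.compare_digest(recomputed, digest)  # constant-time comparison

salt, digest = make_record("open sesame")
print(check_secret("open sesame", salt, digest))  # True
print(check_secret("open sesamf", salt, digest))  # False
```

Because the algorithm is specified independently of any particular runtime, digests stored this way should still match after the platform underneath is upgraded.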