More on Hashcodes

One of the devs on the BCL team just added a bit to my recent post on hashcodes  Enjoy!

 

Brad's comment above applies to Object's GetHashCode implementation, which most interesting classes override, providing their own hash function. We believe GetHashCode should be used as a hash function that returns a seemingly-random value that could be negative or duplicated for multiple values. In V1, Object's GetHashCode unfortunately gave some stronger guarantees than this that a few people wanted to depend on, but that wasn't in the contract of the method. Their code is already broken on version 2 (we think the only people that depended on this were internal the company, and they would have long since found their bug & corrected it).

Note that we also don't want user code taking a dependency on our existing hash function implementations for any type - ideally we could change them every time we build the product. To elaborate on that, let's look at String.

String uses a different hash function that looks at each character, XOR'ing in the new character with a (presumably prime) number. We'll change String's hash function in a future version so it both executes faster and produces a better distribution. This will improve lookups in hash tables when using strings as keys. But because we'll change the hash function, it is also important to not depend on one particular version's implementation of GetHashCode. IE, never write the values you get back from GetHashCode to disk and read them back later, or sort values based on their hash function then persist that data to a file or send it over a network.

Brian Grunkemeyer
MS CLR Base Class Library team

 

Published 06 October 03 01:22 by BradA
Filed under:

Comments

# Frank Hileman said on October 8, 2003 10:55 AM:
Currently the BCL hash functions and hash table is based on the ancient Knuth-style "xor and mod prime" hashing algorithm and data structure, which uses a prime number of buckets. Although still taught in schools, this type of hashing has been obsolete for some time. It would be nice if a faster, modern hash algorithm/data structure were adopted, such as this one: <a href="http://www.isthe.com/chongo/tech/comp/fnv/">http://www.isthe.com/chongo/tech/comp/fnv/</a> This would require a change in the current hashcode algorithm recommendations. XORing the bytes together is not a good way to construct a hashcode for a modern hashtable, which does not use a prime number of buckets. If someone from the BCL team could comment that be nice!
# Gabriel Fogante said on December 15, 2003 9:29 AM:
One question... Does the following code allways return unique values? Guid.NewGuid().GetHashCode()
# Ken Brubaker said on February 20, 2004 10:46 AM:
For my own reference, I thought I'd compile a quick list of design guidelines added by Brad Abrams, et al.
New Comments to this post are disabled

Search

Go

This Blog

Syndication

Page view tracker