
Some thoughts on Object.GetHashCode()

One of the architects on the CLR team just reminded me of a detail about Object.GetHashCode(): you should NOT use it as a unique ID for an object. Two different objects can return the same value.
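If you really do need a per-object ID, one option is a registry keyed on reference identity rather than on the hash code. The sketch below uses Java as an analogue, because Java's `Object.hashCode` / `System.identityHashCode` carry the same "not guaranteed unique" caveat. `java.util.IdentityHashMap` compares keys with `==`, so colliding hash codes only cost a little lookup time and never merge two objects. The `ObjectIds` class and `idOf` method are hypothetical names for illustration, not a CLR or JDK API.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper: hands out IDs that are unique per object instance.
class ObjectIds {
    // IdentityHashMap compares keys with ==, so two distinct objects never
    // share an entry even if their identity hash codes happen to collide.
    private final Map<Object, Long> ids = new IdentityHashMap<>();
    private long next = 1;

    synchronized long idOf(Object o) {
        Long id = ids.get(o);
        if (id == null) {
            id = next++;      // first time we see this object: assign a fresh ID
            ids.put(o, id);
        }
        return id;
    }

    public static void main(String[] args) {
        ObjectIds registry = new ObjectIds();
        Object a = new Object();
        Object b = new Object();
        System.out.println(registry.idOf(a)); // 1
        System.out.println(registry.idOf(b)); // 2
        System.out.println(registry.idOf(a)); // 1 again: same object, same ID
    }
}
```

One design caveat: the map holds strong references, so registered objects are never garbage collected. A production version would need weak keys or an explicit removal step.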

 

By the contract of GetHashCode, if two objects have different hash codes, they are not the same object. But if two objects have the same hash code, they are not necessarily the same object (though it is very likely that they are).
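In practice this contract means a hash code is a fast way to rule objects out, never a way to rule them in. Here is a minimal Java sketch of that pattern, assuming Java's `hashCode`/`equals` contract as a stand-in for .NET's `GetHashCode`/`Equals`. For a plain `Object`, `equals` is reference identity, so the same check answers "is this the same object?". The helper `sameValue` is hypothetical. The strings `"Aa"` and `"BB"` are a well-known collision under Java's documented `String.hashCode` formula.

```java
// Java analogue of the GetHashCode contract: different hash codes prove two
// values differ; equal hash codes prove nothing on their own.
class HashContractDemo {
    // Hypothetical helper: cheap rejection on hash mismatch, full check otherwise.
    static boolean sameValue(Object x, Object y) {
        if (x.hashCode() != y.hashCode()) {
            return false;       // different hash codes: definitely not equal
        }
        return x.equals(y);     // same hash code: still has to compare for real
    }

    public static void main(String[] args) {
        String p = "Aa";
        String q = "BB";
        System.out.println(p.hashCode() + " " + q.hashCode()); // 2112 2112
        System.out.println(sameValue(p, q));                   // false
        System.out.println(sameValue(p, new String("Aa")));    // true
    }
}
```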

 

Today we use the index of the sync block as the hash code. This has some perf issues: we create the sync block lazily so that it is allocated only when needed, and with V1.0 and V1.1 this means calling GetHashCode forces a sync block to be created. In a future version we will change GetHashCode to return essentially a random number. The number is, of course, stable: invoking Object.GetHashCode repeatedly on any given object will always return the same value. But it does make the contract described above more apparent, so you are well advised to code with that in mind today.
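To make "random but stable" concrete, here is a hypothetical Java illustration of the observable behavior only. It is not how the CLR (or the JVM) actually implements object hash codes. The value is picked once, on the first request, and returned unchanged afterwards. Two objects can still end up with the same number.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical class whose hash code is random but stable per instance.
// Illustrates the behavior described above; not the CLR's implementation.
class RandomHashed {
    private int hash;
    private boolean assigned;   // separate flag, since 0 is a legal random value

    @Override
    public synchronized int hashCode() {
        if (!assigned) {
            hash = ThreadLocalRandom.current().nextInt(); // chosen on first request
            assigned = true;
        }
        return hash;            // every later call returns the same value
    }

    public static void main(String[] args) {
        RandomHashed a = new RandomHashed();
        int first = a.hashCode();
        System.out.println(first == a.hashCode()); // true: stable for one object
        RandomHashed b = new RandomHashed();
        // a and b will usually differ, but nothing rules out a collision.
        System.out.println(a.hashCode() + " " + b.hashCode());
    }
}
```

Since `equals` is not overridden here, equality is still reference identity, and a per-instance random hash is consistent with it.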

 

Published 30 September 03 01:52 by BradA

Comments

# jeff clausius said on September 30, 2003 5:53 PM:
brad: point noted. one quick question: do you happen to know which hashing algorithm is used by the System.String class? i don't believe the value coming from GetHashCode there is from the sync block, but rather a hash of the string's contents. am i wrong?
# Brian Grunkemeyer said on September 30, 2003 11:05 PM:
Brad's comment above applies to Object's GetHashCode implementation, which most interesting classes override, providing their own hash function. We believe GetHashCode should be used as a hash function that returns a seemingly random value that could be negative or duplicated for multiple values. In V1, Object's GetHashCode unfortunately gave some stronger guarantees than this that a few people wanted to depend on, but those guarantees weren't in the contract of the method. Their code is already broken on version 2 (we think the only people who depended on this were internal to the company, and they would have long since found their bug and corrected it).

Note that we also don't want user code taking a dependency on our existing hash function implementations for any type; ideally we could change them every time we build the product. To elaborate on that, let's look at String. String uses a different hash function that looks at each character, XOR'ing in the new character with a (presumably prime) number. We'll change String's hash function in a future version so it both executes faster and produces a better distribution. This will improve lookups in hash tables when using strings as keys. But because we'll change the hash function, it is also important not to depend on one particular version's implementation of GetHashCode. That is, never write the values you get back from GetHashCode to disk and read them back later, or sort values by their hash codes and then persist that data to a file or send it over a network.

Brian Grunkemeyer
MS CLR Base Class Library team
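For readers curious what a "look at each character, XOR it in, multiply by a prime" hash looks like, here is a sketch of the well-known 32-bit FNV-1a scheme applied per UTF-16 character. It is an assumption chosen for illustration, not the BCL's actual String hash. As Brian says, the real function is an implementation detail that may change between versions.

```java
// FNV-1a-style hash over UTF-16 chars. Illustrative only: NOT the .NET String hash.
class CharHash {
    static int fnv1a(String s) {
        int h = 0x811C9DC5;             // FNV-1a 32-bit offset basis (2166136261)
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);           // XOR in the next character
            h *= 0x01000193;            // multiply by the FNV prime 16777619
        }
        return h;                       // int overflow wraps mod 2^32, as intended
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(fnv1a("a")));   // e40c292c
        System.out.println(fnv1a("hello") == fnv1a("hello")); // true
    }
}
```

Even a deterministic function like this is only safe to compare within one run of one build, so values like these should be recomputed rather than written to disk or sent over the wire.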
# damien morton said on November 14, 2003 6:12 PM:
A useful addition to the BCL would be a standardised way of combining hash codes that is a little more sophisticated than the hashcode-XOR strewn through the MSDN documentation. A simple 32-bit linear congruential hash might suffice.
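The weakness of plain XOR is easy to demonstrate. XOR is symmetric, so swapping two fields gives the same result, and equal fields cancel to zero. A multiply-then-add combiner, in the spirit of the linear congruential suggestion, is order-sensitive. The Java sketch below mirrors the shape of `java.util.Arrays.hashCode` (start at 1, then `31 * h + x`), which is also what `java.util.Objects.hash` produces for two `Integer` arguments. The class and method names are hypothetical.

```java
// Compares a naive XOR combiner with a multiply-add combiner (Java analogue).
class CombineDemo {
    // Naive combiner: XOR of the field hashes.
    static int xorCombine(int a, int b) {
        return a ^ b;
    }

    // Multiply-add combiner, same shape as java.util.Arrays.hashCode.
    static int mulAddCombine(int a, int b) {
        int h = 1;
        h = 31 * h + a;
        h = 31 * h + b;
        return h;
    }

    public static void main(String[] args) {
        // XOR is symmetric and cancels equal inputs, so these collide:
        System.out.println(xorCombine(1, 2) == xorCombine(2, 1)); // true
        System.out.println(xorCombine(5, 5));                     // 0
        // Multiply-add is order-sensitive:
        System.out.println(mulAddCombine(1, 2));                  // 994
        System.out.println(mulAddCombine(2, 1));                  // 1024
    }
}
```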
# Sriram Krishnan said on September 6, 2004 8:05 AM:
Chasing the hash code
# Hackward and Foreword said on October 3, 2004 7:28 AM: