How can I speed up hashtable lookups with struct object as keys?

How can I speed up hashtable lookups with struct object as keys?

Rate This
  • Comments 22

When you have struct objects as the key in a hashtable, the lookup operation of the hashtable performs miserably. This can be attributes to the GetHashCode() function which is used internally to do the lookup.

If a struct contains only simple value types (int, short, etc.), the algorithm which computes the GetHashCode creates hashes which fall into mostly the same bucket.

Example, lets say, the hashtable creates 10 buckets. Then, most probably, all the keys are being put into the same bucket. Hence when a lookup is performed, the .NET runtime has to traverse through this entire bucket to get the value.

BUCKET1 - Value1, Value2, Value3,...,valuen
BUCKET2 - (empty)
BUCKET3 - (empty)
BUCKET4 - (empty)
BUCKET5 - (empty)
BUCKET6 - (empty)
BUCKET7 - (empty)
BUCKET8 - (empty)
BUCKET9 - (empty)
BUCKET10- (empty)

Hence, instead of the lookup operation being O(1), it becomes O(n) on an average case.

To overcome this drawback, consider overriding the GetHashCode() function and making the life easier for the .NET Runtime.

An example would be to create a string by merging all your value types in the struct and joining them by using a special character as a demarcator.

Since your struct is a lookup criteria, it is sure that all the struct values will be different, and hence the string generated is guaranteed unique.

Now the string generated has a method(GetHashCode()), since it is derived from System.Object (like all other objects). Just return the output of this API. A code example will help to understand better.

struct
{
	int a;
	short b;
	public struct(int _a, short _b)
	{
		a = _a;
		b = _b;
	}
	public override int GetHashCode()
	{
		string hash = a.ToString() + ":" b.ToString();
		return hash.GetHashCode();
	}
}

Since you are generating hashcode explicitly which is guaranteed to be unique, it will boost the performance of your lookup.

Reference: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemobjectclassgethashcodetopic.asp

[author: Vipul Patel, C# MVP]

Leave a Comment
  • Please add 3 and 3 and type the answer here:
  • Post
  • Hi Patel,

    Will it actually slow things down as you have to concatenate two strings while have to get the hashcode from string? It seems to be an expensive operation. Would it be better to multiply the value with some prime value like in other sample?

    public override int GetHashCode()
    {
    return a.GetHashCode() * 29 + b.GetHashCode();
    }
  •  
    Check out
    http://blogs.msdn.com/csharpfaq/archive/2006/03/20/556192.aspx 
    for a FAQ on...
  • Hi Nat,

    I agree that concatenate operations on strings are expensive.

    But the lookup on struct if GetHashCode() is not overridden is very time-consuming. I got the idea of an faq about facing a real-life problem.

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemobjectclassgethashcodetopic.asp mentions the above point.
  • After thinking about this for a while I wonder what you're trying to accomplish.

    My first problem is your statement:

    "Since you are generating hashcode explicitly which is guaranteed to be unique, it will boost the performance of your lookup."

    Hashes are not unique.
    http://en.wikipedia.org/wiki/Hash_function
  • Hi mashiharu,

    The purpose of this faq is to enable better lookup for structs. I will get in touch with the csharpfaq team to get the subject title of this post corrected.

    Basically when you are trying to use struct objects as-is in the hashtable as a the key, the lookup is terrible. The purpose of the above code snippet is an example on how to generate hash code explicitly. Here, I have taken example of creating a string by concatenating all the fields of the struct and joining them using a predetermined key ':'.

    Using string is just an example, you can use whatever mechanism you want, as long it guarantees
    a. A hashcode generated for a struct type is unique and same, no matter how many times it is generated.
    b. If your two struct objects are same, then the hash code generated should be the same.

    Hope it clarifies. Please feel free to suggest any changes which will result in a better example. We are all here to learn and share.

    Thanks
    Vipul
  • My point is that using string to do hashing may actually slow your code down rather than making it faster especially in scenario of small data points or when the first field in the struct is well dispersed.

    BTW, your perception on GetHashCode may be a little bit incorrect. GetHashCode on struct (or ValueType in C#) is overridden inside ValueType object. The default implementation is to get hash code from the first field only. Please read this for more detail... http://www.informit.com/content/images/0321245660/items/wagner_item10.pdf

    But in Roter, it seems to use every field to calculate hash code though...
    http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/fx/bcl/valuetype_8cs-source.html
  • Hi mashiharu,

    Asnwering your original question, the string I am generating is bound to be unique as it is being generated from a set of struct objects which are used as a key for the hashtable. And since the string is unique, the GetHashCode on the string will definitely generate a unique hash code.

    Thanks
    Vipul
  • hi Nat,

    <quote>
    especially in scenario of small data points or when the first field in the struct is well dispersed.
    <quote>

    This is explicitly the condition which was not available to me for a real-life situation. My struct was composed of 11 fields(short, int, short, int,....)

    and many(about 15%) of the value types had the same first field, they used to differ on the other fields. With 1,000,000+ records, my performance was dismal.  

    <quote>
    GetHashCode on struct (or ValueType in C#) is overridden inside ValueType object. The default implementation is to get hash code from the first field only<quote>
    In the reference, you have a string as the first member, which is quaranteed to be unique. That may not the case with other structs on other examples.
  • btw, good news or not I don't know... .NET 2.0 seems to change the behavior to be what Rotor does. So you get the same behavior but it will be slower as it uses reflection to do the job.
  • "And since the string is unique, the GetHashCode on the string will definitely generate a unique hash code."

    Wrong.

    Try these examples:

    Console.WriteLine("0:93121".GetHashCode());
    Console.WriteLine("0:2546870".GetHashCode());

  • Or if you are on .net 1.1:

    Console.WriteLine("0:22980214".GetHashCode());
    Console.WriteLine("0:12538469".GetHashCode());
  • mashiharu,

    yes, there is something fishy about ':'.  I beleive it has something to do with the formating specifier.

    I tried concatenating with '%' and '_' and it generated unique codes. Can you please verify, if they are also prone to a loop-hole. After your response, I will update the FAQ article accordingly. We need readers to have the best solution.
  • No, there is no problem with the ":" and it was a clever way to ensure the string are unique.  The hash however is not (see wikipedia link above).

    Example: You have a 2-bit number (0-3) which you want to generate a 1-bit (0-1) unique key for.

    0 -> 0
    1 -> 1
    2 -> ?
    3 -> ?

    Can't be done.  

    You get the same problem when you try to take a 32 + 16 = 48 bit number and turn into a 32-bit unique key.
  • Mashiharu, can you please suggest a good workaround which can be used in the FAQ sample?
Page 1 of 2 (22 items) 12