A question about the Hash algorithm in Hashtable [Kit George]

A question about the Hash algorithm in Hashtable [Kit George]

  • Comments 7
A question was raised regarding the hash algorithm used in Hashtable by Frank Hileman:
"Any chance that the Hashtable will be changed from the xor/mod prime oldskool type to something modern like this: http://www.isthe.com/chongo/tech/comp/fnv/"

Frank, its unlikely. We may certainly explore the opportunity for a new collection which supports an alternate algorithm, but changing the behavior of the existing class is probably not going to happen.

  • Thanks for the response. A different hashtable class could be created by anyone. But the FNV is not a new data structure, as much as a new way of computing hash codes. So to use these more modern types of tables, you need different GetHashCode algorithms, in many cases. I asked because the more GetHashCode functions that are written, the more we are locked into the old type of hash table. And the newer algorithms, such as FNV, are supposed to be much more efficient in distribution (which means faster and smaller tables). Many OS hashtables are much faster thanks to FNV.
  • FNV isnt the only possible string hash function, and, in fact, FNV produces bad bit distributions amongst the higher bits. See "Performance in Practice of String Hashing Functions" by M.V. Ramakrishna and Justin Zobel for a thoughrogh review of string hashing functions. http://citeseer.nj.nec.com/ramakrishna97performance.html See also http://www.burtleburtle.net/bob/hash/evahash.html by Bob jenkins for a thoughrogh review of hash testing approaches.
  • Sorry for accidental posting on question form, I decided to ask here instead but it submitted for some reason, please discard. In MSDN it is stated that a Hashtable can support one writer and multiple readers concurrently. On the other topic it is written that "To support one or more writers, all operations on the Hashtable must be done through the wrapper returned by the Synchronized method.". Which is true? If first, aren't there performance issues for single-threaded application, or how do you achieve one-writer-many-readers without copying data locally? If second, there is a bug in SyncHashtable not locking indexer getter...
  • FYI - we have thought about changing the hash function on String, and we might do that sometime depending on some performance testing. It looks like we already have tweaked it a little bit since V1.1. We do hope that no one depends on the current behavior of GetHashCode, as we'll probably change it. Just ensure that you don't persist hash functions to disk or try to reverse engineer our hash function - you'll be broken later if you do. BTW, we have changed the behavior of Object::GetHashCode and ValueType::GetHashCode for performance reasons - they still obey the contract that they must obey, but if you are somehow counting on an object to give you the same value between different versions of the CLR, you have a problem. (Note that Hashtable implements an OnDeserializationCallback which calls GetHashCode on every key again, so this isn't a problem for serialized Hashtable instances.) The upshot is we reserve the right to reimplement GetHashCode on every object differently in every build of the CLR.
  • damien: yes, I saw those papers. The Zobel one seems a little old, testing perf on a sparc 20? I am not saying FNV is the best, but if you look around, you will see where it has been used and made a big difference. The main point is that more support is needed from the BCL because if people write their own GetHashCode functions they may be optimizing to a particular type of data structure. Also regardless of your favorite hash algorithm/structure the mod prime one is dated.
  • 问题的引出:最近在阅读SSCLI的BCL的代码,解析Hash过程自然是至关重要了:)本文需要的基础1.知道Hashtable的概念和过程2.知道类型系统不知道Hashtable?...

  • 对一些类型产生HashCode的探索,Hope that helps :)

Page 1 of 1 (7 items)