Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team.
There are all kinds of interesting things in the Unicode standard. For example, the block of characters from U+A000 to U+A48F is for representing syllables in the "Yi script". Apparently it is a writing system for the Yi languages of China, said to have been developed during the Tang Dynasty.
One particular string drawn from this block has an unusual property. The string consists of just two characters, both the same: two copies of character U+A0A2:
string s = "ꂢꂢ";
Or, if your browser can't hack the Yi script, that's the equivalent of the C# program fragment:
string s = "\uA0A2\uA0A2";
What curious property does this string have?
I'll leave some hints in the comments, and post the answer next time.
UPDATE: A couple people have figured it out, so don't read the comments too far if you don't want to be spoiled. I'll post a follow-up article on Friday.
GetHashCode in general can only express inequality within a single AppDomain.
Code usually relies on a low number of collisions to get a large performance gain. But any code that relies on `GetHashCode()` being unique for correctness is broken.
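To illustrate that point, here is a minimal sketch, not anything from the post itself. It assumes a hypothetical `ConstantHashComparer` that deliberately gives every string the same hash code, and a hypothetical helper `CountDistinct`. A `HashSet<string>` built on that comparer still gives the right answer, because the hash code only narrows down the candidates and `Equals` makes the final decision:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer for illustration only: every string gets hash code 0,
// so every key collides with every other key.
sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string obj) => 0;
}

static class HashDemo
{
    // Counts distinct strings using a set whose hash function is as bad as possible.
    public static int CountDistinct(IEnumerable<string> items)
    {
        var set = new HashSet<string>(new ConstantHashComparer());
        foreach (var item in items)
        {
            set.Add(item); // a colliding hash is fine; Equals decides whether it is a duplicate
        }
        return set.Count;
    }

    static void Main()
    {
        // Three distinct strings, one of them added twice.
        Console.WriteLine(CountDistinct(new[] { "\uA0A2\uA0A2", "", "a", "a" })); // prints 3
    }
}
```

The result is correct, but every insertion has to be compared against the keys already sharing that one bucket, so the set behaves more like a linear list. That lost speed is the only thing a poor hash distribution should ever cost.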
Have there been any DoS attacks on IIS by passing in a bunch of these strings of varying lengths? Non-uniform hashes sometimes lead to nasty problems like that if data structures rely on a good distribution of hash values.
Remembered this post while reading the above article.