What curious property does this string have?

There are all kinds of interesting things in the Unicode standard. For example, the block of characters from U+A000 to U+A48F is for representing syllables in the "Yi script". Apparently it is a writing system for the Yi languages of China, developed during the Tang Dynasty.

One string drawn from this block has an unusual property. The string consists of just two characters, both the same: the character U+A0A2, repeated:

string s = "ꂢꂢ";

Or, if your browser can't hack the Yi script, that's the equivalent of the C# program fragment:

string s = "\uA0A2\uA0A2";
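
If you want to convince yourself that the escaped form really is two copies of one character from the Yi block, here is a minimal sketch. The class name YiCheck and the helper IsYiSyllable are made up for illustration; the block boundaries are the ones quoted above.

```csharp
using System;

static class YiCheck // hypothetical helper, not part of any library
{
    // True when c lies in the block U+A000 through U+A48F described above.
    public static bool IsYiSyllable(char c) => c >= '\uA000' && c <= '\uA48F';

    static void Main()
    {
        string s = "\uA0A2\uA0A2";
        Console.WriteLine(s.Length);           // 2
        Console.WriteLine(s[0] == s[1]);       // True
        Console.WriteLine(IsYiSyllable(s[0])); // True
    }
}
```

None of this reveals the curious property; it only confirms the string is what the fragment above says it is.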

What curious property does this string have?

I'll leave some hints in the comments, and post the answer next time.

UPDATE: A couple of people have figured it out, so don't read the comments too far if you don't want to be spoiled. I'll post a follow-up article on Friday.

  • Hint #1: The curious property is platform-dependent; you'll want to be using a 32-bit version of CLR v4.

  • Hint #2: The curious property is also a property of a much more commonly-used string.

  • It has the same hash code as String.Empty?

    Wow, you are fast!

    Hint #3: You are on the right track; there is an even more curious property this string has which is related to its hash code being equal to that of String.Empty. -- Eric

  • "s.ToUpper() == s.ToLower()" is true. Though that's not that curious.

    Indeed, I think all strings in Chinese-style languages have this property. -- Eric

  • I am not seeing any curious property either. Although I did notice that it doesn't have a case difference, like [ICR].

  • Is it something to do with byte order marks?  Something like it matches an empty string in a different encoding with byte order marks?

  • Bingo to Eyal - in a 32-bit process, s.GetHashCode() == string.Empty.GetHashCode(), but not in an x64 process. I would guess this is a lesson in depending on hash codes? :-)

  • I'm at work right now and only have (easy) access to a 2008 R2, ergo a 64-bit CLR. I'll give it a look when I get home.

  • Well, Eyal is correct: on the x86 v4 CLR, "\uA0A2\uA0A2".GetHashCode() == "".GetHashCode(). Though technically that doesn't meet the criterion of hint number 2. Unless the curious property that "\uA0A2\uA0A2" shares with a much more commonly used string (i.e. string.Empty) is 'having the hash code 757602046'. But I don't know, I just don't find that property all that curious.

  • As far as I can tell, the hash codes also match on x64.

  • Collisions like this happen all the time in real systems that use GetHashCode(). However, a lot of the framework's comparing and sorting infrastructure, LINQ for example, depends on it.

    For me this model of equality is (in some scenarios) broken. I would like to see a much improved hashing algorithm, with a lower collision probability, implemented in future versions of .NET.

  • @Paul Irwin

    Hashcodes are meant to collide! Not a problem depending on them! :)

  • @James Hart

    Maybe the curious property is "sharing a hash code with string.Empty."  Interestingly, Eric only seems to claim that the property is curious on s, but not on the "much more commonly-used string."

  • @iCe

    Hash codes are not meant to express equality (a good reason why that would be a broken equality model), but they surely can express inequality.

  • guessing here:

    it's the shortest (in code points) string whose hashcode matches? or the only 2 codepoint string to match?

    The smallest (if treated as an unsigned number) legal UTF-16 string which shares the hashcode?

    Or is it the fact that it's a palindrome (and string.Empty is essentially a palindrome in root case) and the only such palindrome to share the hash code?
