I'm often asked why particular strings fail the IdnToAscii function.  The answer is "because it's not a legal IDN name, that's why" :-)  But why aren't some strings legal IDN names?

Some rules act at the "label" level.  In www.microsoft.com, "www", "microsoft" and "com" are each separate labels.

  • IDN prohibits some characters, such as space characters, control characters, etc.
  • IDN is currently at Unicode 3.2, so characters added after that point are prohibited.  Eventually that'll probably be upgraded to Unicode 5 or later, but for now a number of issues, security among them, are holding that up.
  • Our APIs also disallow the ASCII control and Backspace characters even if the IDN_USE_STD3_ASCII_RULES flag is not set, since control characters don't make sense in domain names.  The null character (U+0000) in particular is bad in strings.
  • There are rules regarding mixing bidirectional characters.  Basically, if a label contains any right-to-left (RTL) characters, then it may not contain any left-to-right (LTR) characters.  Characters that aren't specifically RTL or LTR can be mixed with either RTL or LTR characters.
  • If the label has RTL characters, the first and last characters must be RTL as well.  Other characters that aren't explicitly RTL or LTR may be in the string, but not in the first or last positions.
  • Punycode labels are longer than their source Unicode strings, so some Unicode strings will convert to Punycode labels that exceed the allowable DNS label length (63 octets) or the overall name length.  Those are therefore illegal as well.
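
To make a few of these rules concrete, here is a rough sketch in Python rather than against the Windows API.  It is not what IdnToAscii does internally: the function names are made up for illustration, it skips the nameprep mapping and normalization steps entirely, and it only approximates the Unicode 3.2, bidirectional and length rules from the list above.  It leans on Python's bundled Unicode 3.2 tables (unicodedata.ucd_3_2_0) and its built-in "punycode" codec.

```python
import unicodedata

# Python ships a frozen copy of the Unicode 3.2 character database,
# which is the version the IDN rules above are tied to.
ucd32 = unicodedata.ucd_3_2_0

RTL_CLASSES = {"R", "AL"}   # bidi classes treated as right-to-left here
LTR_CLASSES = {"L"}         # bidi class treated as left-to-right here
MAX_LABEL_OCTETS = 63       # DNS limit on a single label


def assigned_in_unicode_3_2(ch):
    """True if the character already existed in Unicode 3.2."""
    return ucd32.category(ch) != "Cn"   # "Cn" means unassigned


def bidi_ok(label):
    """Hypothetical helper: apply the two bidi rules to one label."""
    classes = [ucd32.bidirectional(ch) for ch in label]
    if not any(c in RTL_CLASSES for c in classes):
        return True                      # no RTL characters, nothing to check
    if any(c in LTR_CLASSES for c in classes):
        return False                     # RTL and LTR mixed in one label
    # With RTL present, the first and last characters must be RTL too.
    return classes[0] in RTL_CLASSES and classes[-1] in RTL_CLASSES


def ace_length(label):
    """Hypothetical helper: label length once Punycode-encoded, with xn--."""
    if label.isascii():
        return len(label)                # ASCII labels are left alone
    return len("xn--") + len(label.encode("punycode"))


def label_fits(label):
    """True if the encoded label stays within the DNS label limit."""
    return ace_length(label) <= MAX_LABEL_OCTETS
```

For example, ace_length("bücher") is 13 (the encoded form is xn--bcher-kva) even though the source string is only 6 characters, and a label of 63 non-ASCII characters can never fit, because Punycode needs at least one output character per input character plus the 4-character prefix.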

For more information about IDN, you can look at the RFCs and sources: