<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>I'm not a Klingon (&lt;span style="font-family:pIqaD,code2000"&gt; &lt;/span&gt;) : Unicode and Code Pages/Encodings</title><link>http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx</link><description>Tags: Unicode and Code Pages/Encodings</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Alternate encoding names recognized by .Net / IE</title><link>http://blogs.msdn.com/shawnste/archive/2009/08/18/alternate-encoding-names-recognized-by-net-ie.aspx</link><pubDate>Tue, 18 Aug 2009 20:07:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9874285</guid><dc:creator>shawnste</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/9874285.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=9874285</wfw:commentRss><description>&lt;P&gt;If you run the sample from &lt;A href="http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx" mce_href="http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx"&gt;http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx&lt;/A&gt;&amp;nbsp;then you can get a list of what Microsoft .Net thinks each Encoding/Code Page's name is.&amp;nbsp; (WebName is more consistent with what's used in charset.) For example:&lt;/P&gt;&lt;CODE&gt;
&lt;P&gt;&lt;SPAN style="COLOR: blue"&gt;using&lt;/SPAN&gt; System;&lt;BR&gt;&lt;SPAN style="COLOR: blue"&gt;using&lt;/SPAN&gt; System.Text;&lt;BR&gt;&lt;BR&gt;&lt;SPAN style="COLOR: blue"&gt;public&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;class&lt;/SPAN&gt; SamplesEncoding&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="COLOR: blue"&gt;public&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;static&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;void&lt;/SPAN&gt; Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="COLOR: green"&gt;// For every encoding, print the code page, the registered name and the WebName, one per line.&lt;/SPAN&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="COLOR: blue"&gt;foreach&lt;/SPAN&gt;( EncodingInfo ei &lt;SPAN style="COLOR: blue"&gt;in&lt;/SPAN&gt; Encoding.GetEncodings() )&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Encoding e = ei.GetEncoding();&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( &lt;SPAN style="COLOR: maroon"&gt;"{0,-6} {1,-25} {2}"&lt;/SPAN&gt;, ei.CodePage, ei.Name, e.WebName );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;}&lt;/P&gt;&lt;/CODE&gt;
&lt;P mce_keep="true"&gt;However, Encoding.GetEncoding() recognizes several other names as well, similar to what IE would recognize in a charset tag.&amp;nbsp; I'm not sure if there's a way to get at the full list of aliases programmatically, but this is what you'd get for these input strings:&lt;/P&gt;
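&lt;P&gt;If you want to check a label yourself, here's a minimal sketch (not part of the MSDN sample) that passes a few names to Encoding.GetEncoding() and prints the code page each one resolves to.&amp;nbsp; The first three labels come from the table in this post; "not-a-charset" is a made-up name to show the failure case, and I'm assuming the desktop .Net Framework here.&lt;/P&gt;&lt;CODE&gt;
&lt;P&gt;using System;&lt;BR&gt;using System.Text;&lt;BR&gt;&lt;BR&gt;public class AliasLookup&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp; public static void Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Three labels from the table, plus one hypothetical name that shouldn't resolve.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string[] labels = { "latin1", "us-ascii", "utf-8", "not-a-charset" };&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; foreach( string label in labels )&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; try&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // GetEncoding(string) accepts aliases as well as the primary name.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Encoding e = Encoding.GetEncoding( label );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( "{0,-15} {1,-6} {2}", label, e.CodePage, e.WebName );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; catch( ArgumentException )&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Thrown when the name isn't a recognized encoding name.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( "{0,-15} (not recognized)", label );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;}&lt;/P&gt;&lt;/CODE&gt;
&lt;P&gt;A name that isn't recognized throws an ArgumentException, which is one way to probe whether a given label is an alias.&amp;nbsp; The labels and the code pages they map to are in the table that follows.&lt;/P&gt;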
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TH&gt;Label&lt;/TH&gt;
&lt;TH&gt;Code Page&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"437"&lt;/TD&gt;
&lt;TD&gt;437&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ANSI_X3.4-1968"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ANSI_X3.4-1986"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"arabic"&lt;/TD&gt;
&lt;TD&gt;28596&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ascii"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ASMO-708"&lt;/TD&gt;
&lt;TD&gt;708&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"Big5"&lt;/TD&gt;
&lt;TD&gt;950&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"Big5-HKSCS"&lt;/TD&gt;
&lt;TD&gt;950&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID00858"&lt;/TD&gt;
&lt;TD&gt;858&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID00924"&lt;/TD&gt;
&lt;TD&gt;20924&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01140"&lt;/TD&gt;
&lt;TD&gt;1140&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01141"&lt;/TD&gt;
&lt;TD&gt;1141&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01142"&lt;/TD&gt;
&lt;TD&gt;1142&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01143"&lt;/TD&gt;
&lt;TD&gt;1143&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01144"&lt;/TD&gt;
&lt;TD&gt;1144&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01145"&lt;/TD&gt;
&lt;TD&gt;1145&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01146"&lt;/TD&gt;
&lt;TD&gt;1146&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01147"&lt;/TD&gt;
&lt;TD&gt;1147&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01148"&lt;/TD&gt;
&lt;TD&gt;1148&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CCSID01149"&lt;/TD&gt;
&lt;TD&gt;1149&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"chinese"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cn-big5"&lt;/TD&gt;
&lt;TD&gt;950&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CN-GB"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP00858"&lt;/TD&gt;
&lt;TD&gt;858&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP00924"&lt;/TD&gt;
&lt;TD&gt;20924&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01140"&lt;/TD&gt;
&lt;TD&gt;1140&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01141"&lt;/TD&gt;
&lt;TD&gt;1141&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01142"&lt;/TD&gt;
&lt;TD&gt;1142&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01143"&lt;/TD&gt;
&lt;TD&gt;1143&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01144"&lt;/TD&gt;
&lt;TD&gt;1144&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01145"&lt;/TD&gt;
&lt;TD&gt;1145&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01146"&lt;/TD&gt;
&lt;TD&gt;1146&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01147"&lt;/TD&gt;
&lt;TD&gt;1147&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01148"&lt;/TD&gt;
&lt;TD&gt;1148&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP01149"&lt;/TD&gt;
&lt;TD&gt;1149&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp037"&lt;/TD&gt;
&lt;TD&gt;37&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp1025"&lt;/TD&gt;
&lt;TD&gt;21025&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP1026"&lt;/TD&gt;
&lt;TD&gt;1026&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp1256"&lt;/TD&gt;
&lt;TD&gt;1256&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP273"&lt;/TD&gt;
&lt;TD&gt;20273&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP278"&lt;/TD&gt;
&lt;TD&gt;20278&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP280"&lt;/TD&gt;
&lt;TD&gt;20280&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP284"&lt;/TD&gt;
&lt;TD&gt;20284&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP285"&lt;/TD&gt;
&lt;TD&gt;20285&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp290"&lt;/TD&gt;
&lt;TD&gt;20290&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp297"&lt;/TD&gt;
&lt;TD&gt;20297&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp367"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp420"&lt;/TD&gt;
&lt;TD&gt;20420&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp423"&lt;/TD&gt;
&lt;TD&gt;20423&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp424"&lt;/TD&gt;
&lt;TD&gt;20424&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp437"&lt;/TD&gt;
&lt;TD&gt;437&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP500"&lt;/TD&gt;
&lt;TD&gt;500&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp50227"&lt;/TD&gt;
&lt;TD&gt;50227&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp819"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp850"&lt;/TD&gt;
&lt;TD&gt;850&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp852"&lt;/TD&gt;
&lt;TD&gt;852&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp855"&lt;/TD&gt;
&lt;TD&gt;855&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp857"&lt;/TD&gt;
&lt;TD&gt;857&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp858"&lt;/TD&gt;
&lt;TD&gt;858&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp860"&lt;/TD&gt;
&lt;TD&gt;860&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp861"&lt;/TD&gt;
&lt;TD&gt;861&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp862"&lt;/TD&gt;
&lt;TD&gt;862&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp863"&lt;/TD&gt;
&lt;TD&gt;863&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp864"&lt;/TD&gt;
&lt;TD&gt;864&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp865"&lt;/TD&gt;
&lt;TD&gt;865&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp866"&lt;/TD&gt;
&lt;TD&gt;866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp869"&lt;/TD&gt;
&lt;TD&gt;869&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP870"&lt;/TD&gt;
&lt;TD&gt;870&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP871"&lt;/TD&gt;
&lt;TD&gt;20871&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp875"&lt;/TD&gt;
&lt;TD&gt;875&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cp880"&lt;/TD&gt;
&lt;TD&gt;20880&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"CP905"&lt;/TD&gt;
&lt;TD&gt;20905&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csASCII"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csbig5"&lt;/TD&gt;
&lt;TD&gt;950&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csEUCKR"&lt;/TD&gt;
&lt;TD&gt;51949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csEUCPkdFmtJapanese"&lt;/TD&gt;
&lt;TD&gt;51932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csGB2312"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csGB231280"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM037"&lt;/TD&gt;
&lt;TD&gt;37&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM1026"&lt;/TD&gt;
&lt;TD&gt;1026&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM273"&lt;/TD&gt;
&lt;TD&gt;20273&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM277"&lt;/TD&gt;
&lt;TD&gt;20277&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM278"&lt;/TD&gt;
&lt;TD&gt;20278&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM280"&lt;/TD&gt;
&lt;TD&gt;20280&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM284"&lt;/TD&gt;
&lt;TD&gt;20284&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM285"&lt;/TD&gt;
&lt;TD&gt;20285&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM290"&lt;/TD&gt;
&lt;TD&gt;20290&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM297"&lt;/TD&gt;
&lt;TD&gt;20297&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM420"&lt;/TD&gt;
&lt;TD&gt;20420&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM423"&lt;/TD&gt;
&lt;TD&gt;20423&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM424"&lt;/TD&gt;
&lt;TD&gt;20424&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM500"&lt;/TD&gt;
&lt;TD&gt;500&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM870"&lt;/TD&gt;
&lt;TD&gt;870&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM871"&lt;/TD&gt;
&lt;TD&gt;20871&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM880"&lt;/TD&gt;
&lt;TD&gt;20880&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBM905"&lt;/TD&gt;
&lt;TD&gt;20905&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csIBMThai"&lt;/TD&gt;
&lt;TD&gt;20838&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISO2022JP"&lt;/TD&gt;
&lt;TD&gt;50221&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISO2022KR"&lt;/TD&gt;
&lt;TD&gt;50225&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISO58GB231280"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatin1"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatin2"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatin3"&lt;/TD&gt;
&lt;TD&gt;28593&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatin4"&lt;/TD&gt;
&lt;TD&gt;28594&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatin5"&lt;/TD&gt;
&lt;TD&gt;28599&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatin9"&lt;/TD&gt;
&lt;TD&gt;28605&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatinArabic"&lt;/TD&gt;
&lt;TD&gt;28596&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatinCyrillic"&lt;/TD&gt;
&lt;TD&gt;28595&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatinGreek"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csISOLatinHebrew"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csKOI8R"&lt;/TD&gt;
&lt;TD&gt;20866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csKSC56011987"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csPC8CodePage437"&lt;/TD&gt;
&lt;TD&gt;437&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csShiftJIS"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csUnicode11UTF7"&lt;/TD&gt;
&lt;TD&gt;65000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"csWindows31J"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"cyrillic"&lt;/TD&gt;
&lt;TD&gt;28595&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"DIN_66003"&lt;/TD&gt;
&lt;TD&gt;20106&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"DOS-720"&lt;/TD&gt;
&lt;TD&gt;720&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"DOS-862"&lt;/TD&gt;
&lt;TD&gt;862&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"DOS-874"&lt;/TD&gt;
&lt;TD&gt;874&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-ar1"&lt;/TD&gt;
&lt;TD&gt;20420&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-be"&lt;/TD&gt;
&lt;TD&gt;500&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-ca"&lt;/TD&gt;
&lt;TD&gt;37&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-ch"&lt;/TD&gt;
&lt;TD&gt;500&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"EBCDIC-CP-DK"&lt;/TD&gt;
&lt;TD&gt;20277&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-es"&lt;/TD&gt;
&lt;TD&gt;20284&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-fi"&lt;/TD&gt;
&lt;TD&gt;20278&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-fr"&lt;/TD&gt;
&lt;TD&gt;20297&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-gb"&lt;/TD&gt;
&lt;TD&gt;20285&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-gr"&lt;/TD&gt;
&lt;TD&gt;20423&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-he"&lt;/TD&gt;
&lt;TD&gt;20424&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-is"&lt;/TD&gt;
&lt;TD&gt;20871&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-it"&lt;/TD&gt;
&lt;TD&gt;20280&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-nl"&lt;/TD&gt;
&lt;TD&gt;37&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"EBCDIC-CP-NO"&lt;/TD&gt;
&lt;TD&gt;20277&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-roece"&lt;/TD&gt;
&lt;TD&gt;870&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-se"&lt;/TD&gt;
&lt;TD&gt;20278&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-tr"&lt;/TD&gt;
&lt;TD&gt;20905&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-us"&lt;/TD&gt;
&lt;TD&gt;37&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-wt"&lt;/TD&gt;
&lt;TD&gt;37&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-cp-yu"&lt;/TD&gt;
&lt;TD&gt;870&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"EBCDIC-Cyrillic"&lt;/TD&gt;
&lt;TD&gt;20880&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-de-273+euro"&lt;/TD&gt;
&lt;TD&gt;1141&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-dk-277+euro"&lt;/TD&gt;
&lt;TD&gt;1142&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-es-284+euro"&lt;/TD&gt;
&lt;TD&gt;1145&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-fi-278+euro"&lt;/TD&gt;
&lt;TD&gt;1143&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-fr-297+euro"&lt;/TD&gt;
&lt;TD&gt;1147&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-gb-285+euro"&lt;/TD&gt;
&lt;TD&gt;1146&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-international-500+euro"&lt;/TD&gt;
&lt;TD&gt;1148&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-is-871+euro"&lt;/TD&gt;
&lt;TD&gt;1149&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-it-280+euro"&lt;/TD&gt;
&lt;TD&gt;1144&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"EBCDIC-JP-kana"&lt;/TD&gt;
&lt;TD&gt;20290&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-Latin9--euro"&lt;/TD&gt;
&lt;TD&gt;20924&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-no-277+euro"&lt;/TD&gt;
&lt;TD&gt;1142&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-se-278+euro"&lt;/TD&gt;
&lt;TD&gt;1143&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ebcdic-us-37+euro"&lt;/TD&gt;
&lt;TD&gt;1140&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ECMA-114"&lt;/TD&gt;
&lt;TD&gt;28596&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ECMA-118"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ELOT_928"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"euc-cn"&lt;/TD&gt;
&lt;TD&gt;51936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"euc-jp"&lt;/TD&gt;
&lt;TD&gt;51932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"euc-kr"&lt;/TD&gt;
&lt;TD&gt;51949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"Extended_UNIX_Code_Packed_Format_for_Japanese"&lt;/TD&gt;
&lt;TD&gt;51932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"GB18030"&lt;/TD&gt;
&lt;TD&gt;54936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"GB2312"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"GB2312-80"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"GB231280"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"GBK"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"GB_2312-80"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"German"&lt;/TD&gt;
&lt;TD&gt;20106&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"greek"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"greek8"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"hebrew"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"hz-gb-2312"&lt;/TD&gt;
&lt;TD&gt;52936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM-Thai"&lt;/TD&gt;
&lt;TD&gt;20838&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM00858"&lt;/TD&gt;
&lt;TD&gt;858&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM00924"&lt;/TD&gt;
&lt;TD&gt;20924&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01047"&lt;/TD&gt;
&lt;TD&gt;1047&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01140"&lt;/TD&gt;
&lt;TD&gt;1140&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01141"&lt;/TD&gt;
&lt;TD&gt;1141&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01142"&lt;/TD&gt;
&lt;TD&gt;1142&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01143"&lt;/TD&gt;
&lt;TD&gt;1143&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01144"&lt;/TD&gt;
&lt;TD&gt;1144&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01145"&lt;/TD&gt;
&lt;TD&gt;1145&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01146"&lt;/TD&gt;
&lt;TD&gt;1146&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01147"&lt;/TD&gt;
&lt;TD&gt;1147&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01148"&lt;/TD&gt;
&lt;TD&gt;1148&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM01149"&lt;/TD&gt;
&lt;TD&gt;1149&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM037"&lt;/TD&gt;
&lt;TD&gt;37&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM1026"&lt;/TD&gt;
&lt;TD&gt;1026&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM273"&lt;/TD&gt;
&lt;TD&gt;20273&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM277"&lt;/TD&gt;
&lt;TD&gt;20277&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM278"&lt;/TD&gt;
&lt;TD&gt;20278&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM280"&lt;/TD&gt;
&lt;TD&gt;20280&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM284"&lt;/TD&gt;
&lt;TD&gt;20284&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM285"&lt;/TD&gt;
&lt;TD&gt;20285&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM290"&lt;/TD&gt;
&lt;TD&gt;20290&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM297"&lt;/TD&gt;
&lt;TD&gt;20297&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM367"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM420"&lt;/TD&gt;
&lt;TD&gt;20420&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM423"&lt;/TD&gt;
&lt;TD&gt;20423&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM424"&lt;/TD&gt;
&lt;TD&gt;20424&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM437"&lt;/TD&gt;
&lt;TD&gt;437&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM500"&lt;/TD&gt;
&lt;TD&gt;500&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ibm737"&lt;/TD&gt;
&lt;TD&gt;737&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ibm775"&lt;/TD&gt;
&lt;TD&gt;775&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ibm819"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM850"&lt;/TD&gt;
&lt;TD&gt;850&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM852"&lt;/TD&gt;
&lt;TD&gt;852&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM855"&lt;/TD&gt;
&lt;TD&gt;855&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM857"&lt;/TD&gt;
&lt;TD&gt;857&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM860"&lt;/TD&gt;
&lt;TD&gt;860&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM861"&lt;/TD&gt;
&lt;TD&gt;861&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM862"&lt;/TD&gt;
&lt;TD&gt;862&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM863"&lt;/TD&gt;
&lt;TD&gt;863&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM864"&lt;/TD&gt;
&lt;TD&gt;864&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM865"&lt;/TD&gt;
&lt;TD&gt;865&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM866"&lt;/TD&gt;
&lt;TD&gt;866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM869"&lt;/TD&gt;
&lt;TD&gt;869&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM870"&lt;/TD&gt;
&lt;TD&gt;870&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM871"&lt;/TD&gt;
&lt;TD&gt;20871&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM880"&lt;/TD&gt;
&lt;TD&gt;20880&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"IBM905"&lt;/TD&gt;
&lt;TD&gt;20905&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"irv"&lt;/TD&gt;
&lt;TD&gt;20105&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO-10646-UCS-2"&lt;/TD&gt;
&lt;TD&gt;1200&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-2022-jp"&lt;/TD&gt;
&lt;TD&gt;50220&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-2022-jpeuc"&lt;/TD&gt;
&lt;TD&gt;51932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-2022-kr"&lt;/TD&gt;
&lt;TD&gt;50225&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-2022-kr-7"&lt;/TD&gt;
&lt;TD&gt;50225&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-2022-kr-7bit"&lt;/TD&gt;
&lt;TD&gt;50225&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-2022-kr-8"&lt;/TD&gt;
&lt;TD&gt;51949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-2022-kr-8bit"&lt;/TD&gt;
&lt;TD&gt;51949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-1"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-11"&lt;/TD&gt;
&lt;TD&gt;874&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-13"&lt;/TD&gt;
&lt;TD&gt;28603&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-15"&lt;/TD&gt;
&lt;TD&gt;28605&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-2"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-3"&lt;/TD&gt;
&lt;TD&gt;28593&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-4"&lt;/TD&gt;
&lt;TD&gt;28594&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-5"&lt;/TD&gt;
&lt;TD&gt;28595&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-6"&lt;/TD&gt;
&lt;TD&gt;28596&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-7"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-8"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO-8859-8 Visual"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-8-i"&lt;/TD&gt;
&lt;TD&gt;38598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-8859-9"&lt;/TD&gt;
&lt;TD&gt;28599&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-100"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-101"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-109"&lt;/TD&gt;
&lt;TD&gt;28593&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-110"&lt;/TD&gt;
&lt;TD&gt;28594&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-126"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-127"&lt;/TD&gt;
&lt;TD&gt;28596&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-138"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-144"&lt;/TD&gt;
&lt;TD&gt;28595&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-148"&lt;/TD&gt;
&lt;TD&gt;28599&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-149"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-58"&lt;/TD&gt;
&lt;TD&gt;936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso-ir-6"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO646-US"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso8859-1"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso8859-2"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_646.irv:1991"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso_8859-1"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-15"&lt;/TD&gt;
&lt;TD&gt;28605&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso_8859-1:1987"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso_8859-2"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"iso_8859-2:1987"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-3"&lt;/TD&gt;
&lt;TD&gt;28593&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-3:1988"&lt;/TD&gt;
&lt;TD&gt;28593&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-4"&lt;/TD&gt;
&lt;TD&gt;28594&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-4:1988"&lt;/TD&gt;
&lt;TD&gt;28594&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-5"&lt;/TD&gt;
&lt;TD&gt;28595&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-5:1988"&lt;/TD&gt;
&lt;TD&gt;28595&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-6"&lt;/TD&gt;
&lt;TD&gt;28596&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-6:1987"&lt;/TD&gt;
&lt;TD&gt;28596&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-7"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-7:1987"&lt;/TD&gt;
&lt;TD&gt;28597&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-8"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-8:1988"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-9"&lt;/TD&gt;
&lt;TD&gt;28599&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ISO_8859-9:1989"&lt;/TD&gt;
&lt;TD&gt;28599&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"Johab"&lt;/TD&gt;
&lt;TD&gt;1361&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"koi"&lt;/TD&gt;
&lt;TD&gt;20866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"koi8"&lt;/TD&gt;
&lt;TD&gt;20866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"koi8-r"&lt;/TD&gt;
&lt;TD&gt;20866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"koi8-ru"&lt;/TD&gt;
&lt;TD&gt;21866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"koi8-u"&lt;/TD&gt;
&lt;TD&gt;21866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"koi8r"&lt;/TD&gt;
&lt;TD&gt;20866&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"korean"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ks-c-5601"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ks-c5601"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"KSC5601"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"KSC_5601"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ks_c_5601"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ks_c_5601-1987"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ks_c_5601-1989"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ks_c_5601_1987"&lt;/TD&gt;
&lt;TD&gt;949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"l1"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"l2"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"l3"&lt;/TD&gt;
&lt;TD&gt;28593&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"l4"&lt;/TD&gt;
&lt;TD&gt;28594&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"l5"&lt;/TD&gt;
&lt;TD&gt;28599&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"l9"&lt;/TD&gt;
&lt;TD&gt;28605&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"latin1"&lt;/TD&gt;
&lt;TD&gt;28591&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"latin2"&lt;/TD&gt;
&lt;TD&gt;28592&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"latin3"&lt;/TD&gt;
&lt;TD&gt;28593&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"latin4"&lt;/TD&gt;
&lt;TD&gt;28594&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"latin5"&lt;/TD&gt;
&lt;TD&gt;28599&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"latin9"&lt;/TD&gt;
&lt;TD&gt;28605&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"logical"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"macintosh"&lt;/TD&gt;
&lt;TD&gt;10000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ms_Kanji"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"Norwegian"&lt;/TD&gt;
&lt;TD&gt;20108&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"NS_4551-1"&lt;/TD&gt;
&lt;TD&gt;20108&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"PC-Multilingual-850+euro"&lt;/TD&gt;
&lt;TD&gt;858&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"SEN_850200_B"&lt;/TD&gt;
&lt;TD&gt;20107&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"shift-jis"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"shift_jis"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"sjis"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"Swedish"&lt;/TD&gt;
&lt;TD&gt;20107&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"TIS-620"&lt;/TD&gt;
&lt;TD&gt;874&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"ucs-2"&lt;/TD&gt;
&lt;TD&gt;1200&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"unicode"&lt;/TD&gt;
&lt;TD&gt;1200&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"unicode-1-1-utf-7"&lt;/TD&gt;
&lt;TD&gt;65000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"unicode-1-1-utf-8"&lt;/TD&gt;
&lt;TD&gt;65001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"unicode-2-0-utf-7"&lt;/TD&gt;
&lt;TD&gt;65000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"unicode-2-0-utf-8"&lt;/TD&gt;
&lt;TD&gt;65001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"unicodeFFFE"&lt;/TD&gt;
&lt;TD&gt;1201&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"us"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"us-ascii"&lt;/TD&gt;
&lt;TD&gt;20127&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"utf-16"&lt;/TD&gt;
&lt;TD&gt;1200&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"UTF-16BE"&lt;/TD&gt;
&lt;TD&gt;1201&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"UTF-16LE"&lt;/TD&gt;
&lt;TD&gt;1200&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"utf-32"&lt;/TD&gt;
&lt;TD&gt;12000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"UTF-32BE"&lt;/TD&gt;
&lt;TD&gt;12001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"UTF-32LE"&lt;/TD&gt;
&lt;TD&gt;12000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"utf-7"&lt;/TD&gt;
&lt;TD&gt;65000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"utf-8"&lt;/TD&gt;
&lt;TD&gt;65001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"visual"&lt;/TD&gt;
&lt;TD&gt;28598&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1250"&lt;/TD&gt;
&lt;TD&gt;1250&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1251"&lt;/TD&gt;
&lt;TD&gt;1251&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1252"&lt;/TD&gt;
&lt;TD&gt;1252&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1253"&lt;/TD&gt;
&lt;TD&gt;1253&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"Windows-1254"&lt;/TD&gt;
&lt;TD&gt;1254&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1255"&lt;/TD&gt;
&lt;TD&gt;1255&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1256"&lt;/TD&gt;
&lt;TD&gt;1256&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1257"&lt;/TD&gt;
&lt;TD&gt;1257&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-1258"&lt;/TD&gt;
&lt;TD&gt;1258&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"windows-874"&lt;/TD&gt;
&lt;TD&gt;874&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-ansi"&lt;/TD&gt;
&lt;TD&gt;1252&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-Chinese-CNS"&lt;/TD&gt;
&lt;TD&gt;20000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-Chinese-Eten"&lt;/TD&gt;
&lt;TD&gt;20002&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp1250"&lt;/TD&gt;
&lt;TD&gt;1250&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp1251"&lt;/TD&gt;
&lt;TD&gt;1251&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20001"&lt;/TD&gt;
&lt;TD&gt;20001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20003"&lt;/TD&gt;
&lt;TD&gt;20003&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20004"&lt;/TD&gt;
&lt;TD&gt;20004&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20005"&lt;/TD&gt;
&lt;TD&gt;20005&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20261"&lt;/TD&gt;
&lt;TD&gt;20261&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20269"&lt;/TD&gt;
&lt;TD&gt;20269&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20936"&lt;/TD&gt;
&lt;TD&gt;20936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp20949"&lt;/TD&gt;
&lt;TD&gt;20949&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-cp50227"&lt;/TD&gt;
&lt;TD&gt;50227&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"X-EBCDIC-KoreanExtended"&lt;/TD&gt;
&lt;TD&gt;20833&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-euc"&lt;/TD&gt;
&lt;TD&gt;51932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-euc-cn"&lt;/TD&gt;
&lt;TD&gt;51936&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-euc-jp"&lt;/TD&gt;
&lt;TD&gt;51932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-Europa"&lt;/TD&gt;
&lt;TD&gt;29001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-IA5"&lt;/TD&gt;
&lt;TD&gt;20105&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-IA5-German"&lt;/TD&gt;
&lt;TD&gt;20106&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-IA5-Norwegian"&lt;/TD&gt;
&lt;TD&gt;20108&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-IA5-Swedish"&lt;/TD&gt;
&lt;TD&gt;20107&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-as"&lt;/TD&gt;
&lt;TD&gt;57006&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-be"&lt;/TD&gt;
&lt;TD&gt;57003&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-de"&lt;/TD&gt;
&lt;TD&gt;57002&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-gu"&lt;/TD&gt;
&lt;TD&gt;57010&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-ka"&lt;/TD&gt;
&lt;TD&gt;57008&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-ma"&lt;/TD&gt;
&lt;TD&gt;57009&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-or"&lt;/TD&gt;
&lt;TD&gt;57007&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-pa"&lt;/TD&gt;
&lt;TD&gt;57011&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-ta"&lt;/TD&gt;
&lt;TD&gt;57004&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-iscii-te"&lt;/TD&gt;
&lt;TD&gt;57005&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-arabic"&lt;/TD&gt;
&lt;TD&gt;10004&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-ce"&lt;/TD&gt;
&lt;TD&gt;10029&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-chinesesimp"&lt;/TD&gt;
&lt;TD&gt;10008&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-chinesetrad"&lt;/TD&gt;
&lt;TD&gt;10002&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-croatian"&lt;/TD&gt;
&lt;TD&gt;10082&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-cyrillic"&lt;/TD&gt;
&lt;TD&gt;10007&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-greek"&lt;/TD&gt;
&lt;TD&gt;10006&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-hebrew"&lt;/TD&gt;
&lt;TD&gt;10005&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-icelandic"&lt;/TD&gt;
&lt;TD&gt;10079&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-japanese"&lt;/TD&gt;
&lt;TD&gt;10001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-korean"&lt;/TD&gt;
&lt;TD&gt;10003&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-romanian"&lt;/TD&gt;
&lt;TD&gt;10010&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-thai"&lt;/TD&gt;
&lt;TD&gt;10021&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-turkish"&lt;/TD&gt;
&lt;TD&gt;10081&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-mac-ukrainian"&lt;/TD&gt;
&lt;TD&gt;10017&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-ms-cp932"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-sjis"&lt;/TD&gt;
&lt;TD&gt;932&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-unicode-1-1-utf-7"&lt;/TD&gt;
&lt;TD&gt;65000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-unicode-1-1-utf-8"&lt;/TD&gt;
&lt;TD&gt;65001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-unicode-2-0-utf-7"&lt;/TD&gt;
&lt;TD&gt;65000&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-unicode-2-0-utf-8"&lt;/TD&gt;
&lt;TD&gt;65001&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;"x-x-big5"&lt;/TD&gt;
&lt;TD&gt;950&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P mce_keep="true"&gt;There's one really egregious name here.&amp;nbsp; UnicodeFFFE is actually Big Endian UTF-16.&amp;nbsp; The name looks like the byte order mark (BOM) for UTF-16BE read in little endian order.&amp;nbsp; Try to use UTF-16BE instead :)&lt;/P&gt;
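&lt;P mce_keep="true"&gt;As a rough sketch of what the table implies (assuming the .NET Framework, where these names are built in; the class name is just for illustration), both names can be looked up and their code pages compared:&lt;/P&gt;&lt;CODE&gt;
&lt;P&gt;using System;&lt;BR&gt;using System.Text;&lt;BR&gt;&lt;BR&gt;public class UnicodeFFFESample&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp; public static void Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Per the table above, both names should map to code page 1201.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Encoding odd = Encoding.GetEncoding( "unicodeFFFE" );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Encoding clear = Encoding.GetEncoding( "UTF-16BE" );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( "{0} {1}", odd.CodePage, clear.CodePage );&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // The preamble (BOM) of big endian UTF-16 is the bytes FE FF.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( BitConverter.ToString( odd.GetPreamble() ) );&lt;BR&gt;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;}&lt;/P&gt;&lt;/CODE&gt;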
&lt;P mce_keep="true"&gt;Note that historically lots of data on the web has been mis-tagged, or isn't tagged at all.&amp;nbsp; Data from Windows machines is often in the Windows system code page, such as windows-1252.&amp;nbsp; So sometimes browsers may attempt to use the current system code page, or try to guess (with varying degrees of success) the actual code page.&amp;nbsp; Additionally there are differences between different vendors' code page behavior, causing further ambiguity.&lt;/P&gt;
&lt;P mce_keep="true"&gt;See also:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;&lt;A href="http://blogs.msdn.com/shawnste/pages/code-pages-unicode-encodings.aspx"&gt;Code Pages, Unicode &amp;amp; Encodings&lt;/A&gt;&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;&lt;A href="http://blogs.msdn.com/controlpanel/blogs/The%20History%20of%20Code%20Pages" mce_href="http://blogs.msdn.com/controlpanel/blogs/The History of Code Pages"&gt;The History of Code Pages&lt;/A&gt;&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;&lt;A href="http://blogs.msdn.com/shawnste/archive/2007/03/20/some-reasons-to-make-your-application-unicode.aspx"&gt;Some Reasons to Make Your Application Unicode&lt;/A&gt;&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9874285" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Unicode, IDN (IDNA), EAI (IMA) and Homograph Security</title><link>http://blogs.msdn.com/shawnste/archive/2009/07/07/unicode-idn-idna-eai-ima-and-homograph-security.aspx</link><pubDate>Tue, 07 Jul 2009 22:33:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9823241</guid><dc:creator>shawnste</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/9823241.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=9823241</wfw:commentRss><description>&lt;P&gt;I wrote about IDN &amp;amp; Security before &lt;A href="http://blogs.msdn.com/shawnste/archive/2005/03/03/384692.aspx"&gt;http://blogs.msdn.com/shawnste/archive/2005/03/03/384692.aspx&lt;/A&gt; but thought I'd share some of my more&amp;nbsp;updated views about security of URLs/IDN/Unicode/Email addresses.&lt;/P&gt;
&lt;P&gt;People haven't really bothered much with DNS or character-based security when it was limited to ASCII.&amp;nbsp; I'm not sure if this is because people just didn't think about it, or if they thought there wasn't a problem, or whatever.&amp;nbsp; What security attacks did happen have been regarded more as "oh, that's curious" rather than a real concern.&amp;nbsp; Basically there seems to be a presumption that a script, like the ASCII subset of Latin, is inherently secure.&amp;nbsp; Therefore it would seem reasonable that if ASCII Latin can be secure, and other scripts or mixed-script environments have homographs, then those scenarios must be insecure and are therefore broken.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Latin and ASCII aren't Secure&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The problem with that logic is that it's flawed.&amp;nbsp; Homographs exist in Latin/ASCII, however &lt;A href="http://rnicrosoft.com/"&gt;http://rnicrosoft.com&lt;/A&gt; tends to be regarded as "quaint and amusing" rather than a security problem.&amp;nbsp; (There used to be a web page there, dunno what happened).&amp;nbsp; Similarly g00gle or MlCROSOFT or whatnot can all happen in ASCII.&amp;nbsp; Some things can be done to ASCII to limit the risk, such as choosing fonts or making things lowercase, but that's not always possible.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Strings are Typed and Read by Humans&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Even if the scripts themselves are perfect, the strings we use with the scripts are not.&amp;nbsp; For example, users have to type them in, and they may or may not use upper or lower case (in cased scripts).&amp;nbsp; I heard one computer expert indicate that users should just figure out how to enter URLs in lower case, in Unicode Normalization Form C.&amp;nbsp; (Instead of addressing the problem we should educate all the users).&amp;nbsp; I wish he were joking.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Depending on the context, there are things you can do to ASCII only strings that can confuse users.&amp;nbsp; For example &lt;A href="http://microsoft.secure.com/"&gt;http://microsoft.secure.com&lt;/A&gt; isn't going to necessarily go to a Microsoft site.&amp;nbsp; &lt;A href="http://secure.com/microsoft.com"&gt;http://secure.com/microsoft.com&lt;/A&gt; is a similar trick.&lt;/P&gt;
&lt;P mce_keep="true"&gt;DNS isn't the only subject of these problems.&amp;nbsp; I get mail all the time in the form &lt;A href="mailto:company@mail-servicing.com"&gt;company@mail-servicing.com&lt;/A&gt; where "company" is a legitimate company and "mail-servicing" is the people they've contracted to send their bulk mail.&amp;nbsp; So it's impossible for me to determine if that's actually a good address for the company.&amp;nbsp; Even worse is when the mail contains a link.&amp;nbsp; "Provide feedback about your recent warranty support to&amp;nbsp;&lt;A href="http://feedback-surveys.com/OEMsupport"&gt;http://feedback-surveys.com/OEMsupport&lt;/A&gt;"&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Strings aren't Even Strings&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Sometimes what we click on isn't even related to where we end up going.&amp;nbsp; We've all seen phishing attacks that look like &lt;A href="http://207.46.232.182/" mce_href="http://207.46.232.182"&gt;mybank.com&lt;/A&gt; but go to an IP address, and no one can tell if it's real or not.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Strings aren't Always Specific&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;In some environments strings often aren't even very specific.&amp;nbsp; I'm pretty certain that if I want a live.com account that I won't get shawn or shawns or even shawnsteele.&amp;nbsp; Instead I'll be shawn7935 or something.&amp;nbsp; There's another Shawn here at work that gets some of my mail from simple typos, let alone malicious intent.&amp;nbsp; There's a pretty good chance that&amp;nbsp;Fred8374&amp;nbsp;could pass himself off as Fred8347 if he really wanted to.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P mce_keep="true"&gt;We've even been trained that strings don't even have to be close.&amp;nbsp; If I buy something on eBay from "JoesBestStuff", it takes some faith for me to pay SallySewing7@live.com (apologies if those are real accounts).&amp;nbsp; I've sometimes been quite amused at the variation between the "seller's name" and the email.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Even when we expect them to be the same, there are many spellings for some words.&amp;nbsp; "Mohammed" is often transliterated differently to Latin.&amp;nbsp; Unless you deal with one quite often, you're likely to assume most spellings are the same.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Globalization of Strings&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Now we've figured out that strings aren't secure, and we'll get tricked even if they were secure.&amp;nbsp; How does that change in a global environment, such as with IDNA or EAI/IMA strings?&amp;nbsp; Not much.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Sticking to Latin, you suddenly gain a bunch of look-alikes (homographs) by allowing non-ASCII values.&amp;nbsp; Strings like mícrosoft, mïcrosoft and mıcrosoft are all “close enough” to be confused, particularly at a quick glance, even more so if the user is conditioned to expect the "real" string.&amp;nbsp; E.g.:&amp;nbsp; "Important security update for windows, go download it from Mícrosoft.com"&amp;nbsp; We're already expecting to see microsoft, so the few different pixels are easily missed.&lt;/P&gt;
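&lt;P mce_keep="true"&gt;However alike they look, these are different names on the wire.&amp;nbsp; A small sketch, assuming .NET's System.Globalization.IdnMapping (IDNA2003 rules by default) and an illustrative class name, prints the ASCII form that DNS would actually see for each spelling:&lt;/P&gt;&lt;CODE&gt;
&lt;P&gt;using System;&lt;BR&gt;using System.Globalization;&lt;BR&gt;&lt;BR&gt;public class HomographSample&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp; public static void Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; IdnMapping idn = new IdnMapping();&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string[] names = { "microsoft.com", "mícrosoft.com", "mïcrosoft.com" };&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; foreach( string name in names )&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // GetAscii returns the Punycode (xn--) form for labels with non-ASCII characters.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( "{0} : {1}", name, idn.GetAscii( name ) );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;}&lt;/P&gt;&lt;/CODE&gt;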
&lt;P mce_keep="true"&gt;For other scripts the problem can be much more severe.&amp;nbsp; Complex scripts can have similar-looking strings, and many include numerous characters.&amp;nbsp; Chinese, for example, has enough characters available that it can be fairly easy in some cases to find a rare character that is similar in appearance to a common character which people have been preconditioned to expect.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;"I Solved Homographs"&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;This leads to a&amp;nbsp;typical problem for developers, particularly "Western" Latin-script based developers.&amp;nbsp; We tend to expect that if we solve script mixing so that we can't mix up Cyrillic and Latin, that we've solved the homograph problem.&amp;nbsp; Instead, we've barely scratched the surface and effectively buried our heads in the sand.&lt;/P&gt;
&lt;P mce_keep="true"&gt;In some cases the "solution" can be worse than the problem.&amp;nbsp; For example, some browsers decide that I don't understand Cyrillic since my user locale is en-US (or Klingon), and then print out Punycode.&amp;nbsp; That's mildly useful to me as a warning, however it does the same thing for Chinese.&amp;nbsp; It's very unlikely that I'm going to confuse Chinese with Latin, but I'll get Punycode in the address bar anyway.&amp;nbsp; Now I have no chance of finding out what the actual URL is supposed to look like.&amp;nbsp; Punycode is all gibberish, but I could probably decipher a Chinese glyph enough to see if it looked similar to what I expected.&amp;nbsp; With Punycode strings, I don't even need homographs to confuse me; any Chinese would look the same.&amp;nbsp; For that matter I could be expecting Chinese, but it could actually be Japanese or Korean, or even Cyrillic.&amp;nbsp; I'm not trying to say that the browsers' approach is "wrong", just that while this approach may address some problems, it can also cause new ones.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Most of the "solutions" to Homographs that I've seen are similar in my opinion.&amp;nbsp; They may address a specific issue, but don't solve the entire problem globally.&amp;nbsp; I also think some approaches are unnecessarily limiting.&amp;nbsp; Mitigations that reduce the surface area for an attack are useful, however developers should recognize the limitations of those approaches and make sure they aren't spending tons of effort "shutting the window, but leaving the front door wide open."&amp;nbsp; That only provides a false sense of security, which can be far worse than the original problem.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Comprehensive Solutions&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;So instead of thinking that strings like URLs are somehow inherently secure if they're ASCII, and focusing on the differences from ASCII, like Cyrillic homographs, we should rather assume that ANY URL might not take us to a place we want to go.&amp;nbsp; Even an ASCII one.&lt;/P&gt;
&lt;P mce_keep="true"&gt;A much better solution to URL security is one that addresses the entire system rather than focusing on Homographs.&amp;nbsp; IE, for example, detects malicious web sites (I don't know exactly how it works, but I gather there's blacklisting and bad behavior detection, kinda like virus checking for web sites).&amp;nbsp; This is far more effective than preventing mixed scripts, and has the advantage of working with ASCII-only URLs.&amp;nbsp; It also does a good job against homographs, pretty much making the Punycode-in-the-address-bar irrelevant.&amp;nbsp; It also works with many forms of attack, even non-obvious ones.&amp;nbsp; &lt;/P&gt;
&lt;P mce_keep="true"&gt;My opinion is that if you do a "good job" of detecting any phishing/spoofing type web site, even ASCII-only, then the need for Homograph detection is much reduced.&amp;nbsp; And if you can't do that, then the attackers will merely add an extra label or something to get around your homograph detection.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Mitigation by Protocol&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;For things like IDN, it is interesting to consider how the protocol itself approaches security.&amp;nbsp; Some things are "obvious" as not being interesting for a name.&amp;nbsp; Compatibility characters, control characters, etc. could somewhat readily be excluded.&amp;nbsp; Some things are generally considered technically "obvious" to some users, but may frustrate others.&amp;nbsp; It is generally considered that lower casing the DNS name causes less confusion (can't mix up lower case l with capital I), but I doubt that AAA.com prefers lower casing.&amp;nbsp; Similarly IDNA2003 allows Unicode "symbols," which are widely regarded as being useless, particularly since they're hard to type, but I suspect that someone would like I♥NY.&amp;nbsp; So there's a gray area that gets a bit confusing.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Consideration for other protocols is similar.&amp;nbsp; EAI (email) is interesting because it basically defers "correctness" to the registrar (whoever runs the mail server).&amp;nbsp; IDN provides some restriction by protocol and more at the registrar level.&lt;/P&gt;
&lt;P mce_keep="true"&gt;One problem with restricting valid characters at the protocol level is that it works OK for a small set, but once you get to a global audience the rules get very complicated.&amp;nbsp; Domain names allowed (most) English names when they were restricted to ASCII, but German and French had difficulties.&amp;nbsp; With IDN additional languages are supported, but perhaps the needs of an English registrar and a German one differ.&amp;nbsp; A complete set of rules applicable world-wide for all strings in all languages may not be possible (e.g. Turkish i), but even if it were, the rules would be very complex and difficult to implement for every application adopting a protocol.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Mitigation by Registrar&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Restriction at the registrar can be more effective, though perhaps less consistent.&amp;nbsp; A registrar could be like a domain name registrar, but for these purposes you could also think of the person that assigns user accounts at a business, or&amp;nbsp;email address registration from your ISP.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Registrars can restrict languages to those used in the country they support.&amp;nbsp; They can bundle or block homographs or alternate spellings (like Traditional and Simplified Chinese spellings of the same word).&amp;nbsp; In a business they could have certain rules (first name, last initial or first initial, last name is common for user accounts in many companies, at least until they get too many employees).&lt;/P&gt;
&lt;P mce_keep="true"&gt;IDN has some restrictions by protocol, but allows much tighter restriction at the registrar level.&amp;nbsp; Ironically, a label at a lower level could then have different "rules" than at the higher level.&amp;nbsp; EAI allows the local part to be determined entirely by the provider/registrar rather than the protocol.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Rules at the "registrar" level can still be very complex for a complete set of rules, however cases with conceptual differences can still be adopted as applicable for the registrar's environment, whereas a protocol level rule has to either be too flexible, or disallow one registrar's legitimate scenario.&amp;nbsp; Rules at the registrar level can also be adjusted more readily than at the protocol level.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;Mitigation by Application&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;An application can also decide to be more comprehensive than the protocol.&amp;nbsp; An application may also have more information,&amp;nbsp;such as blacklists or user settings.&amp;nbsp; They can make choices for some users like "they only read English, so don't bother with Cyrillic then," and a different choice for a different user.&amp;nbsp; Applications can also potentially be grayer in their behavior.&amp;nbsp; Instead of "allowing" and "disallowing" strings, they can say "gee, I'm not so sure, you really want to do this?", or flag it and continue.&amp;nbsp; They can also be dynamic, such as when you add a sender to a junk mail filter.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;IDN vs EAI/IMA vs Unicode&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Pretty much this entire "strings aren't secure" concept applies to any Unicode (or for that matter any other code page) string.&amp;nbsp; That could be an IDN domain name, an EAI mail address, a user account name, etc.&amp;nbsp; Some environments may be more amenable to certain solutions than others, but the types of attacks that impact a Unicode IDN label could also succeed with the local (user name) part of a Unicode EAI email address.&amp;nbsp; The general concepts are portable.&lt;/P&gt;
&lt;P mce_keep="true"&gt;I used IDN heavily as an example, but the same things happen to EAI addresses, user account names, logon credentials, etc.&amp;nbsp; Anything that uses Unicode, or strings, needs to realize that strings can't be expected to be inherently "secure."&lt;/P&gt;
&lt;P mce_keep="true"&gt;There's more info on some thinking about Unicode Security in Unicode TR#39 &lt;A href="http://www.unicode.org/draft/reports/tr39/tr39.html"&gt;http://www.unicode.org/draft/reports/tr39/tr39.html&lt;/A&gt;.&amp;nbsp; TR39 addresses the appropriate use of Unicode characters and homographs, but this is at best a mitigation of the more general security concerns of identifier strings.&amp;nbsp; Phishing and spoofing would still happen even in plain ASCII.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Hope this was helpful, or at least interesting,&lt;/P&gt;
&lt;P mce_keep="true"&gt;Shawn&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9823241" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/IDN+_2800_Internationalized+Domain+Names_2900_/default.aspx">IDN (Internationalized Domain Names)</category><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category><category domain="http://blogs.msdn.com/shawnste/archive/tags/eMail+Address+Internationalization/default.aspx">eMail Address Internationalization</category></item><item><title>Writing "fields" of data to an encoded file.</title><link>http://blogs.msdn.com/shawnste/archive/2009/06/01/writing-fields-of-data-to-an-encoded-file.aspx</link><pubDate>Tue, 02 Jun 2009 03:38:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9683075</guid><dc:creator>shawnste</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/9683075.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=9683075</wfw:commentRss><description>&lt;P&gt;The moral here is "Use Unicode," so you can skip the details below if you want :)&lt;/P&gt;
&lt;P&gt;A common problem when storing string data in various fields is how to encode it.&amp;nbsp; Obviously you can store the Unicode as Unicode, which is a good choice for an XML file or text file.&amp;nbsp; However, sometimes data gets mixed with other non-string data or stored in a record, like a database record.&amp;nbsp; There are several ways to do that, but some common formats are delimited fields, fixed width fields, counted fields.&amp;nbsp; I'm going to ignore more robust protocols like XML for this problem.&lt;/P&gt;
&lt;P&gt;Delimited fields have a character between them that indicates that one field has ended and another has started.&amp;nbsp; Common delimiters are null (0), comma, and tab.&amp;nbsp; Using delimited fields, a list of names would look something like "Joe,Mary,Sally,Fred".&lt;/P&gt;
&lt;P&gt;A fixed width field would be a field of a known size regardless of the input data size.&amp;nbsp; Generally data that is too short is padded with a space or null, and data that is too long is clipped.&amp;nbsp; If our "names" field was of fixed size four, then the previous list could look something like "Joe_MarySallFred".&amp;nbsp; Note the _ to pad the 3 character name, that Sally is clipped, and that the other names are "run together".&lt;/P&gt;
&lt;P&gt;A counted field would indicate the field size for each piece of data before outputting the data.&amp;nbsp; The advantage is that it doesn't have the size restriction/clipping of fixed width fields, nor does it have to waste space with unnecessary padding.&amp;nbsp; (It could still be clipped for large strings, as the count is likely restricted to some number of bits.)&amp;nbsp; Similarly delimiters aren't a problem.&amp;nbsp; Generally the count is binary, but I'll show an example using numbers: "3Joe4Mary5Sally4Fred"&lt;/P&gt;
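&lt;P&gt;The three layouts can be sketched in a few lines.&amp;nbsp; This is only an illustration on plain strings (the class name and the '_' pad character are my own choices, and the count here is simply String.Length), before any encoding gets involved:&lt;/P&gt;&lt;CODE&gt;
&lt;P&gt;using System;&lt;BR&gt;using System.Text;&lt;BR&gt;&lt;BR&gt;public class FieldFormatsSample&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp; public static void Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string[] names = { "Joe", "Mary", "Sally", "Fred" };&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Delimited: a comma between fields.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string delimited = String.Join( ",", names );&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; StringBuilder fixedWidth = new StringBuilder();&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; StringBuilder counted = new StringBuilder();&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; foreach( string name in names )&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Fixed width of four: pad short names with '_' and clip long ones.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fixedWidth.Append( name.PadRight( 4, '_' ).Substring( 0, 4 ) );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Counted: write the length (in UTF-16 code units) before the data.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; counted.Append( name.Length ).Append( name );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( delimited );&amp;nbsp;&amp;nbsp; // Joe,Mary,Sally,Fred&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( fixedWidth );&amp;nbsp; // Joe_MarySallFred&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( counted );&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // 3Joe4Mary5Sally4Fred&lt;BR&gt;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;}&lt;/P&gt;&lt;/CODE&gt;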
&lt;P&gt;A somewhat&amp;nbsp;obvious way to store and read Unicode char or Unicode string data in the above formats is to write it in Unicode.&amp;nbsp; Counted fields can just count the Unicode code points to be read in.&amp;nbsp; Fixed width fields can similarly check for the space available and use Unicode character counts.&amp;nbsp;&amp;nbsp; Delimited fields can also use Unicode.&lt;/P&gt;
&lt;P&gt;When the desired output isn't Unicode (UTF-16)&amp;nbsp;however, then you start running into some interesting problems.&amp;nbsp; Encodings (code pages) don't have a 1:1 relationship with UTF-16 code points, so you have to be careful.&amp;nbsp; Additionally some encodings shift modes and maintain state through shift or escape sequences.&lt;/P&gt;
&lt;P&gt;For all of the fixed, counted, and delimited techniques, shift states cause an additional problem in that the writer has to either terminate the sequence or persist the state until the next field.&amp;nbsp; Consider 2 fields where field 1 has some ASCII data that looks like "Joe" followed by a shift sequence, then a Japanese character, and field 2 has "Kelly" in what looks like ASCII.&amp;nbsp; If the decoder retains the state between reading the 2 fields, it may accidentally read in "Kelly" as Japanese and presumably corrupt the output.&amp;nbsp; Alternatively, if "Kelly" was really intended to be read in "Japanese" mode, then any application starting to read at field 2 gets confused since it didn't see the shift at the end of field 1.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;For that reason I like to make sure the fields are "complete", flushing the encoder at the end of each field (this is different than writing a pure-text document like XML).&amp;nbsp; So then field 1 above would have a shift-back-to-ASCII sequence at the end.&lt;/P&gt;
&lt;P&gt;For fixed fields this could introduce another problem because the shift-back-to-ASCII sequence may exceed the allowed field size.&amp;nbsp; In that case the string would have to be made smaller before encoding to allow enough room for flushing.&lt;/P&gt;
&lt;P&gt;For delimited fields there's an additional problem in that the delimiter could accidentally look like part of an encoded sequence.&amp;nbsp; Delimiters should only be tested on the decoded data.&lt;/P&gt;
&lt;P&gt;For counted fields you start having trouble if the count isn't in encoded bytes.&amp;nbsp; If you count the Unicode code points and then encode those code points, you don't know how many bytes to read back in when decoding.&amp;nbsp; It isn't possible to "just guess" when to stop reading data because there may or may not be some state-changing data that you are expected to either ignore or read.&amp;nbsp; For example "Joe++", where each + is a Japanese character, could look like:&lt;/P&gt;
&lt;P&gt;4&amp;lt;shift-to-ascii&amp;gt;Joe&amp;lt;shift-to-Japanese&amp;gt;&amp;lt;+&amp;gt;&amp;lt;+&amp;gt;, or&lt;BR&gt;4&amp;lt;shift-to-ascii&amp;gt;Joe&amp;lt;shift-to-Japanese&amp;gt;&amp;lt;+&amp;gt;&amp;lt;+&amp;gt;&amp;lt;shift-to-ascii&amp;gt;, or&lt;BR&gt;4&amp;lt;shift-to-ascii&amp;gt;Joe&amp;lt;shift-to-Japanese&amp;gt;&amp;lt;+&amp;gt;&amp;lt;+&amp;gt;&amp;lt;shift-to-mode-q&amp;gt;&amp;lt;shift-to-mode-z&amp;gt;&amp;lt;shift-to-mode-x&amp;gt;&lt;/P&gt;
&lt;P&gt;where "4" represents the count, &amp;lt;+&amp;gt; represents the encoded character, and &amp;lt;shift...&amp;gt; indicates some sort of state change that doesn't cause output directly by itself.&lt;/P&gt;
&lt;P&gt;Since the application doesn't know whether to expect the trailing &amp;lt;shift&amp;gt; sequence(s), it may not read enough data, and then may try to use &amp;lt;shift-to-ascii&amp;gt; as the count of the next field.&amp;nbsp; Similarly if it does see a &amp;lt;shift-to-ascii&amp;gt; and tries to read it in, then maybe it'll be confused if that was actually the count of the next field that just happened to look like a mode change.&lt;/P&gt;
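&lt;P&gt;To see the mismatch concretely, here is a hedged sketch that encodes "Joe" plus two Japanese characters with a stateful encoding and prints both counts.&amp;nbsp; It assumes iso-2022-jp is available by name (it is on the .NET Framework), and the class name is illustrative:&lt;/P&gt;&lt;CODE&gt;
&lt;P&gt;using System;&lt;BR&gt;using System.Text;&lt;BR&gt;&lt;BR&gt;public class CountedFieldSample&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp; public static void Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // "Joe" followed by two Japanese characters.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string field = "Joe\u65E5\u672C";&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Assumption: this stateful encoding can be looked up by name on this platform.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Encoding stateful = Encoding.GetEncoding( "iso-2022-jp" );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; byte[] encoded = stateful.GetBytes( field );&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // The character count says nothing about how many bytes to read back.&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( "chars: {0}", field.Length );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( "bytes: {0}", encoded.Length );&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Console.WriteLine( BitConverter.ToString( encoded ) );&lt;BR&gt;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;}&lt;/P&gt;&lt;/CODE&gt;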
&lt;P&gt;So the moral is: Use UTF-16 because that's what the strings look like so they're less likely to get shifty about their sizes.&amp;nbsp; &lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use Unicode.&amp;nbsp; Either UTF-16, or maybe UTF-8, though it can still change size and you have to be careful, but at least each encoded sequence represents a Unicode code point.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;If you must count, try to count the actual encoded data size, not the unencoded form since that'll be confusing when decoding.&lt;/LI&gt;
&lt;LI&gt;Be good and flush your encoder if you must encode, so that the state gets back into a known state (usually ASCII) and then the decoding application doesn't get confused if they don't reset their decoder.&lt;/LI&gt;
&lt;LI&gt;Make sure you say which encoding you used.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Of course you may be talking to a GPS or something where you don't get to define the standard.&amp;nbsp; In that case you can just watch out for these caveats.&amp;nbsp; Should you be designing such a protocol however, make sure to use Unicode.&amp;nbsp; If that cannot happen, at least make sure to pay attention to the impact of encoding and decoding the data when the protocol's used.&lt;/P&gt;
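&lt;P&gt;As a rough illustration of the bullets above, here's a minimal sketch in Python (chosen only because it's short; it isn't the .Net API used elsewhere on this blog).&amp;nbsp; It prefixes each field with the size of the encoded bytes and uses ISO-2022-JP as an example of a stateful encoding.&amp;nbsp; The helper names write_field and read_field and the 2-byte big-endian prefix are assumptions made up for the example, not part of any real protocol.&lt;/P&gt;
&lt;PRE&gt;
```python
import codecs

ENCODING = "iso2022_jp"  # a stateful encoding that uses escape sequences


def write_field(text):
    """Encode text and prefix it with the size of the encoded bytes."""
    payload = text.encode(ENCODING)      # str.encode() flushes back to ASCII
    return len(payload).to_bytes(2, "big") + payload


def read_field(buffer):
    """Return (text, rest) using the byte count stored in the prefix."""
    size = int.from_bytes(buffer[:2], "big")
    end = 2 + size
    return buffer[2:end].decode(ENCODING), buffer[end:]


name = "Joe\u65e5\u672c"                 # "Joe" plus two Japanese characters
stream = write_field(name) + write_field("next")

# An encoder that is not flushed leaves the stream in Japanese mode.
encoder = codecs.getincrementalencoder(ENCODING)()
unflushed = encoder.encode(name)          # no trailing shift back to ASCII
flush = encoder.encode("", final=True)    # the shift sequence on its own

first, rest = read_field(stream)
second, rest = read_field(rest)
print(len(name), "characters,", len(name.encode(ENCODING)), "encoded bytes")
print(first == name, second == "next", flush)
```
&lt;/PRE&gt;
&lt;P&gt;Because the count is the encoded size, the reader never has to guess whether a trailing shift sequence belongs to the current field or the next one.&lt;/P&gt;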
&lt;P&gt;-Shawn&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9683075" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category><category domain="http://blogs.msdn.com/shawnste/archive/tags/System.Text/default.aspx">System.Text</category></item><item><title>Don't use MB_COMPOSITE, MB_PRECOMPOSED or WC_COMPOSITECHECK</title><link>http://blogs.msdn.com/shawnste/archive/2009/05/06/don-t-use-mb-composite-mb-precomposed-or-wc-compositecheck.aspx</link><pubDate>Thu, 07 May 2009 05:39:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9592652</guid><dc:creator>shawnste</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/9592652.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=9592652</wfw:commentRss><description>&lt;P&gt;This pretty much&amp;nbsp;demonstrates another reason to Use Unicode, but if you do need to use some&amp;nbsp;non-Unicode encoding until you can convert to Unicode, please&amp;nbsp;don't use these flags.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MultiByteToWideChar() and WideCharToMultiByte() provide some interesting sounding flags that are actually useless, slow, badly broken, or far worse.&amp;nbsp; All of these flags would be expected to behave like Unicode Normalization, so you should instead use NormalizeString() to handle the desired behavior, either Form C for composed strings or Form D for decomposed strings.&lt;/P&gt;
&lt;P&gt;MB_PRECOMPOSED is the simplest to address:&amp;nbsp; Basically this flag doesn't really do anything.&amp;nbsp; Nominally it would put data into something like Normalization Form C, however most code pages are already in a composed form, so there's little real impact.&amp;nbsp; Just to make sure, the flag's ignored internally :)&lt;/P&gt;
&lt;P&gt;MB_COMPOSITE is my most hated of these flags.&amp;nbsp; First of all, it nominally pretends to put the data into something like Normalization Form D, decomposed into a base character and combining characters.&amp;nbsp; To me that's the opposite of "Composite".&amp;nbsp; Indeed, I've seen numerous code examples that seem to be passing MB_COMPOSITE expecting Form C data, and pretty much zero examples expecting Form D data.&amp;nbsp; Windows leans towards Form C internally (though you may use Form D or mixed data), so this flag probably isn't that helpful.&amp;nbsp; If you really want to decompose your data, then use NormalizeString with Form D instead of this flag.&lt;/P&gt;
&lt;P&gt;MB_COMPOSITE also is very slow because it does a lookup in some data tables.&amp;nbsp; NormalizeString with&amp;nbsp;Form D is probably faster.&lt;/P&gt;
&lt;P&gt;MB_COMPOSITE also has some horrible behavior for many code points:&lt;BR&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Several code points will not round trip if this flag is set, even if WC_COMPOSITECHECK is used when converting back to the code page.&lt;/LI&gt;
&lt;LI&gt;Additionally, its data tables are incomplete and inconsistent with Unicode normalization.&lt;/LI&gt;
&lt;LI&gt;Worse, some characters are decomposed into nonsensical sequences.&lt;/LI&gt;
&lt;LI&gt;Lastly some sequences decompose to strange choices, breaking some text.&amp;nbsp; Japanese is particularly impacted.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;WC_COMPOSITECHECK basically has all of the problems of MB_COMPOSITE (it's used in the other direction).&amp;nbsp; Its name isn't as annoying to me though.&amp;nbsp; Nominally WC_COMPOSITECHECK puts the data into Normalization Form C before encoding.&amp;nbsp; Since most code pages are in a composed form, Normalization Form C isn't a bad idea; however, please use NormalizeString with Form C instead of this flag.&lt;/P&gt;
&lt;P&gt;WC_COMPOSITECHECK is also very slow because of the way it does lookup.&amp;nbsp; NormalizeString with Form C is probably faster.&lt;/P&gt;
&lt;P mce_keep="true"&gt;WC_COMPOSITECHECK also has horrible behavior for many code points:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;It will convert sensible sequences into a form that, when round tripped by MB_COMPOSITE will end up in nonsensical forms.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Sequences of&amp;nbsp;3 code points created by&amp;nbsp;MB_COMPOSITE aren't correctly decoded by WC_COMPOSITECHECK back into their single code point form, resulting in extra ? when round tripping data.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Several sequences map to a single code point, which MB_COMPOSITE will map back to a single form, so they won't round trip.&amp;nbsp; If you really need similar behavior try Normalization Form C, or KC if you really need the multiple mappings.&amp;nbsp; KC causes data to not round trip, so it might not be appropriate for all applications.&amp;nbsp; (Of course converting to the code page will also&amp;nbsp;likely cause data to be lost so that may not matter so much).&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Again some sequences are composed in a strange form based on appearance rather than linguistics.&amp;nbsp; This could cause some unexpected behavior.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Some scripts, like Japanese, are particularly impacted.&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;Hopefully I've terrified you and you'll stop using these flags, perhaps using NormalizeString() if you really need similar behavior.&amp;nbsp; Most applications don't even really need that though.&amp;nbsp; Of course you always have the option of Using Unicode!&lt;/P&gt;
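&lt;P mce_keep="true"&gt;For a feel of what "normalize first, then convert without the flags" looks like, here's a small sketch.&amp;nbsp; It uses Python's unicodedata.normalize as a stand-in for NormalizeString (an assumption for illustration; the Win32 call has a different signature), and the helper name encode_composed is made up.&lt;/P&gt;
&lt;PRE&gt;
```python
import unicodedata


def encode_composed(text, code_page="cp1252"):
    """Normalize to Form C first, then encode strictly (no best fit)."""
    return unicodedata.normalize("NFC", text).encode(code_page)


composed = "caf\u00e9"        # e with acute as a single code point
decomposed = "cafe\u0301"     # e followed by U+0301 COMBINING ACUTE ACCENT

form_d = unicodedata.normalize("NFD", composed)     # Form D: decomposed
form_c = unicodedata.normalize("NFC", decomposed)   # Form C: composed

print(form_d == decomposed, form_c == composed)
print(encode_composed(decomposed))
```
&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;Encoding the decomposed string directly would fail here, since code page 1252 has no combining acute accent; composing first lets the whole string fit.&lt;/P&gt;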
&lt;P mce_keep="true"&gt;'til next time,&lt;BR&gt;Shawn&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9592652" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Front page uses windows-1252, shouldn't it be iso-8859-1?</title><link>http://blogs.msdn.com/shawnste/archive/2008/11/13/front-page-uses-windows-1252-shouldn-t-it-be-iso-8859-1.aspx</link><pubDate>Thu, 13 Nov 2008 23:41:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9067400</guid><dc:creator>shawnste</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/9067400.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=9067400</wfw:commentRss><description>&lt;P&gt;I received this question:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;I use Frontpage for my webpage design and FP automatically inserts the meta tag "&amp;lt;meta http-equiv="Content-Type" content="text/html; charset=windows-1252"&amp;gt;".&lt;BR&gt;&amp;nbsp;&lt;BR&gt;Should I have reference to ISO-8859-1 ?&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I'm not a FrontPage expert, and I can't answer all&amp;nbsp;questions like this, however this is a&amp;nbsp;common confusion.&amp;nbsp; Windows-1252 is very similar to ISO-8859-1, but they aren't identical.&amp;nbsp; Web sites and browsers have historically often treated these as equivalent, but they aren't, which is a great reason to &lt;A class="" href="http://blogs.msdn.com/shawnste/archive/2007/03/20/some-reasons-to-make-your-application-unicode.aspx" mce_href="http://blogs.msdn.com/shawnste/archive/2007/03/20/some-reasons-to-make-your-application-unicode.aspx"&gt;use unicode&lt;/A&gt; for your encoding.&amp;nbsp; (No, I don't know how to make FrontPage use UTF-8, but that'd be the best solution).&amp;nbsp; Looking on &lt;A class="" href="http://search.live.com/results.aspx?q=windows-1252+iso-8859-1&amp;amp;src=IE-SearchBox" mce_href="http://search.live.com/results.aspx?q=windows-1252+iso-8859-1&amp;amp;src=IE-SearchBox"&gt;search.live.com&lt;/A&gt;&amp;nbsp;(of course) for iso-8859-1 and windows-1252 will find some discussion of the differences.&amp;nbsp; Wikipedia has some articles (they change so I won't quote them directly, but their encoding related articles are usually informative and often accurate.)&lt;/P&gt;
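&lt;P&gt;To see where the two actually differ, here's a quick sketch that compares Python's cp1252 and latin-1 tables byte by byte.&amp;nbsp; Python's tables are an assumption here; they may not match every detail of what Windows or a given browser does, and the function name differing_bytes is made up.&lt;/P&gt;
&lt;PRE&gt;
```python
def differing_bytes():
    """Return the byte values that windows-1252 and ISO-8859-1 decode differently."""
    result = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")
        try:
            win1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            win1252 = None   # byte is unassigned in Python's cp1252 table
        if win1252 != latin1:
            result.append(value)
    return result


diff = differing_bytes()
print([hex(value) for value in diff])

sample = bytes([0x80, 0x93, 0x94])
print(sample.decode("cp1252"))           # euro sign and curly quotes
print(repr(sample.decode("latin-1")))    # C1 control characters
```
&lt;/PRE&gt;
&lt;P&gt;With these tables the differences are confined to the range 0x80 through 0x9F, which is where windows-1252 puts characters such as the euro sign and curly quotes.&lt;/P&gt;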
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9067400" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Where to Look Up Information About Microsoft Code Pages?</title><link>http://blogs.msdn.com/shawnste/archive/2008/06/04/where-to-look-up-information-about-microsoft-code-pages.aspx</link><pubDate>Wed, 04 Jun 2008 20:12:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8573643</guid><dc:creator>shawnste</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/8573643.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=8573643</wfw:commentRss><description>&lt;P&gt;First of all, remember to &lt;A class="" title="Unicode and Code Pages" href="http://blogs.msdn.com/shawnste/pages/code-pages-unicode-encodings.aspx" mce_href="http://blogs.msdn.com/shawnste/pages/code-pages-unicode-encodings.aspx"&gt;Use Unicode&lt;/A&gt; when practical :)&amp;nbsp; Sometimes older applications don't allow Unicode, although they usually then don't allow Microsoft code pages as well (usually being ASCII or Latin-1, which are different).&lt;/P&gt;
&lt;P&gt;But when you do have a question about how Microsoft's "ANSI" (They're not really ANSI) code pages behave, there are a few places you can look.&amp;nbsp; Unicode has a mapping on their server at &lt;A href="http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/"&gt;http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/&lt;/A&gt;&amp;nbsp;of the Microsoft "ANSI" code pages.&amp;nbsp; They're also mentioned on the Microsoft globaldev &lt;A href="http://www.microsoft.com/globaldev/reference/cphome.mspx"&gt;http://www.microsoft.com/globaldev/reference/cphome.mspx&lt;/A&gt;&amp;nbsp;site, and in &lt;A class="" title="Appendix H, Developing International Software" href="http://msdn.microsoft.com/en-us/library/cc195051.aspx" mce_href="http://msdn.microsoft.com/en-us/library/cc195051.aspx"&gt;Appendix H&lt;/A&gt; of Developing International Software.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8573643" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Unicode use on the web</title><link>http://blogs.msdn.com/shawnste/archive/2008/05/05/unicode-use-on-the-web.aspx</link><pubDate>Mon, 05 May 2008 22:57:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8461315</guid><dc:creator>shawnste</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/8461315.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=8461315</wfw:commentRss><description>&lt;P&gt;Google posted a blog about unicode use on the web &lt;A href="http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html"&gt;http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html&lt;/A&gt;&amp;nbsp; They also announced that they now support Unicode 5.1, which is probably a good thing, but I found the graph most interesting &lt;A href="http://bp1.blogger.com/_Ap14FtNN91w/SBzrtHJfLnI/AAAAAAAAA5U/TV7_g2_sWq0/s1600-h/Unicode2.gif"&gt;http://bp1.blogger.com/_Ap14FtNN91w/SBzrtHJfLnI/AAAAAAAAA5U/TV7_g2_sWq0/s1600-h/Unicode2.gif&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;The graph shows UTF-8 as now being the most prevalent encoding on the web, with a steady decline of ASCII and declining trends for other common encodings.&amp;nbsp; That's got to be a good thing for character portability between machines on the web.&amp;nbsp; Lack of proper declaration and use of encodings is one of the biggest problems with interoperability on the web, so it's nice to see Unicode gaining ground.&amp;nbsp; The rate of growth is also really good, a nice strong&amp;nbsp;curve there at the end&amp;nbsp;:)&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8461315" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Server 2008 U+FFFD behavior for unknown or illegal UTF-8 sequences.</title><link>http://blogs.msdn.com/shawnste/archive/2008/03/10/server-2008-u-fffd-behavior-for-unknown-or-illegal-utf-8-sequences.aspx</link><pubDate>Mon, 10 Mar 2008 21:24:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8135910</guid><dc:creator>shawnste</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/8135910.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=8135910</wfw:commentRss><description>&lt;P&gt;In my post &lt;A href="http://blogs.msdn.com/shawnste/archive/2006/06/16/634666.aspx" mce_href="http://blogs.msdn.com/shawnste/archive/2006/06/16/634666.aspx"&gt;Change to Unicode Encoding for Unicode 5.0 conformance&lt;/A&gt;&amp;nbsp;I mentioned that the behavior of illegal characters has changed for Unicode 5 conformance in Windows Vista / .Net 2.0+.&amp;nbsp; Those changes have also been inherited by Server 2008.&lt;/P&gt;
&lt;P&gt;Also check out my collection of code page related articles at &lt;A href="http://blogs.msdn.com/shawnste/pages/code-pages-unicode-encodings.aspx"&gt;Code Pages, Unicode &amp;amp; Encodings&lt;/A&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8135910" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Code pages and security issues</title><link>http://blogs.msdn.com/shawnste/archive/2008/01/17/code-pages-and-security-issues.aspx</link><pubDate>Fri, 18 Jan 2008 03:34:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7144990</guid><dc:creator>shawnste</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/7144990.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=7144990</wfw:commentRss><description>&lt;P&gt;One of the reasons I always suggest "Use Unicode!" is that there are security problems converting between code pages.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;In short, if data is going to be converted between code pages after some sort of security validation is done, then that validation could be negated. This is true of lots of data transformations, but it seems to surprise people a lot when applied to code page transformations. &lt;/P&gt;
&lt;P&gt;There are lots of reasons for this, but some are:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Transformations can have "best-fit" mappings.&amp;nbsp; For example, if I test for "C:\windows" in some form, but a later transformation maps the&amp;nbsp;fullwidth compatibility characters (around U+FF00, such as U+FF43 (ｃ)) to their ASCII look-alikes, then the security check gets invalidated.&lt;/LI&gt;
&lt;LI&gt;Similar things happen with different versions of code pages.&amp;nbsp; For example, ASCII is sometimes handled by "dropping" the high bit so 0xc3 ends up being 0x43.&amp;nbsp; Alternatively sometimes&amp;nbsp;applications just "pretend" that it was really ANSI and map it to whatever code page the user is using.&lt;/LI&gt;
&lt;LI&gt;Code pages aren't always tagged correctly, so an ANSI validation using a default system&amp;nbsp;code page on a server may&amp;nbsp;yield different results than the same code with a different default system code page on a client.&lt;/LI&gt;
&lt;LI&gt;Some code pages also differ between systems.&amp;nbsp; Windows itself&amp;nbsp;has slightly different&amp;nbsp;behavior between MLang (often used by IE)&amp;nbsp;and MultiByteToWideChar().&lt;/LI&gt;
&lt;LI&gt;Different systems handle unexpected, unassigned or illegal code points in different manners.&amp;nbsp; Sometimes that means ? or the equivalent.&amp;nbsp; Sometimes gibberish, sometimes dropped data (so then your C:\windows test on C?:?\?W?i?n?d?o?w?s doesn't work if all the ? disappear).&lt;/LI&gt;
&lt;LI&gt;Sometimes behavior changes, such as &lt;A href="http://blogs.msdn.com/shawnste/archive/2006/06/16/634666.aspx"&gt;&lt;STRONG&gt;&lt;FONT color=#006bad&gt;Change to Unicode Encoding for Unicode 5.0 conformance&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/A&gt; .&amp;nbsp; In this case the change is to the Unicode parsing itself, but still any security test should be done after reading the input data.&lt;/LI&gt;
&lt;LI&gt;Escape sequences might modify data in some environments that provide escaping mechanisms for characters that a code page doesn't support.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;A related problem is the IDN and code page parsing that browsers sometimes do.&amp;nbsp; &amp;amp;amp; named and numeric entities in HTML can end up with a different appearance.&amp;nbsp; % escaping is common in URLs, and IDN xn-- encoding happens in domain names.&amp;nbsp; An application may decode these, even at unexpected times, and cause problems if the data was assumed to be in a different state before the decoding.&lt;/P&gt;
&lt;P&gt;So the moral is: Do any security tests after any conversions have been done.&amp;nbsp; If you have to&amp;nbsp;retransmit the data,&amp;nbsp;try to use an encoding like Unicode that&amp;nbsp;has&amp;nbsp;fewer&amp;nbsp;edge case behaviors that could trip you up.&amp;nbsp; If possible, revalidate the data after the transmission if it has to be decoded.&amp;nbsp;&lt;/P&gt;
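&lt;P&gt;Here's a minimal sketch of the "check after converting" moral.&amp;nbsp; Python has no Windows best-fit tables, so NFKC normalization is used as a stand-in for a conversion that folds fullwidth characters to ASCII; that substitution, the is_blocked check and the convert helper are all assumptions made up for the example.&lt;/P&gt;
&lt;PRE&gt;
```python
import unicodedata


def is_blocked(path):
    """Hypothetical check: reject paths under C:\\windows."""
    return path.lower().startswith("c:\\windows")


def convert(text):
    """Stand-in for a lossy conversion: NFKC folds fullwidth forms to ASCII."""
    return unicodedata.normalize("NFKC", text)


# Starts with U+FF43 FULLWIDTH LATIN SMALL LETTER C rather than ASCII "c".
sneaky = "\uff43:\\windows\\system32"

checked_too_early = is_blocked(sneaky)         # validation before conversion
checked_after = is_blocked(convert(sneaky))    # validation after conversion

print(checked_too_early, checked_after)
```
&lt;/PRE&gt;
&lt;P&gt;The early check lets the path through, and the conversion then turns it into exactly the string the check was supposed to stop.&lt;/P&gt;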
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=7144990" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Michael has a blog about converting apps from ANSI to Unicode</title><link>http://blogs.msdn.com/shawnste/archive/2007/11/21/michael-has-a-blog-about-converting-apps-from-ansi-to-unicode.aspx</link><pubDate>Thu, 22 Nov 2007 00:18:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6459945</guid><dc:creator>shawnste</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/6459945.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=6459945</wfw:commentRss><description>&lt;P&gt;Lots of apps are now Unicode, but some need to make the shift from ANSI (like Japanese shift-jis) to Unicode.&amp;nbsp; Michael has a series of blog posts about a project conversion.&amp;nbsp; &lt;A href="http://blogs.msdn.com/michkap/archive/2007/01/05/1413001.aspx"&gt;http://blogs.msdn.com/michkap/archive/2007/01/05/1413001.aspx&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I recently had a customer question about removing a shift_jis dependency and moving to Unicode, so I thought I'd blog about it, but I've been busy, so in the meantime I thought maybe Michael's blog would help :0)&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=6459945" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Are we going to update or maintain the best fit &amp;/or code page mappings?</title><link>http://blogs.msdn.com/shawnste/archive/2007/09/24/are-we-going-to-update-or-maintain-the-best-fit-or-code-page-mappings.aspx</link><pubDate>Mon, 24 Sep 2007 20:30:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:5103419</guid><dc:creator>shawnste</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/5103419.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=5103419</wfw:commentRss><description>&lt;P&gt;People wonder if we're going to update our best fit code page mappings, or even our code page mappings.&amp;nbsp; The answer is no.&amp;nbsp; Changing character mappings causes difficulties for applications and our experience has been that doing so breaks as much as it "fixes".&amp;nbsp; We'd prefer applications move to Unicode, then you don't have to worry about best-fit, or if a character is supported.&lt;/P&gt;
&lt;P&gt;Best fit is the behavior where some code pages map Unicode characters they don't support to a supported character that someone thought was similar.&amp;nbsp; Examples would be mapping ｋ (U+FF4B, fullwidth k) to k, or ĩ (U+0129, Latin small letter i with tilde) to i, or ∞ (U+221E, infinity symbol) to 8.&amp;nbsp; Some of these seem reasonable; however, we aren’t consistent in our mappings, most break the meaning, and some mappings (∞-&amp;gt;8) change the meaning completely.&lt;/P&gt;
&lt;P&gt;The best fit mappings were created “a long time ago”, contained “omissions”, and haven’t been updated to include new Unicode characters.&amp;nbsp; “Newer” code pages don’t necessarily include the same best fit mappings, and, by now, the mappings are fairly inconsistent and incomplete.&amp;nbsp; So we don’t recommend that the mappings be used, and we don’t intend to change or “fix” the best fit behaviors.&lt;/P&gt;
&lt;P&gt;We also don’t like to change other code page data either.&amp;nbsp; “unassigned” code points can have arbitrary behavior or map to Unicode PUA code points.&amp;nbsp; Some applications use those code points (perhaps unwisely) as formatting codes or to cause special behavior.&amp;nbsp; Adding a mapping could break such an application.&amp;nbsp; Other applications or systems may provide a glyph for an unassigned code point that round trips, however that might not be the designed intent, and changing the code point behavior could break those applications or fonts.&lt;/P&gt;
&lt;P&gt;Code page standards are also sometimes extended, modified, or corrected.&amp;nbsp; Changing the behavior, however, impacts all applications using that behavior, and our experience is that such changes across the installed Windows code base cause as much trouble as they solve.&lt;/P&gt;
&lt;P&gt;So we like to keep the code page mappings stable.&amp;nbsp; My recommendations for code page use are:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use Unicode unless explicitly required for some standard or protocol (and try to upgrade the standards or protocols to allow Unicode).&lt;/LI&gt;
&lt;LI&gt;If you can’t use Unicode, explicitly specify the mapping that is used.&amp;nbsp; (Some applications or standards presume whatever the OS uses, ie: windows ANSI code page, which causes serious interoperability problems.)&lt;/LI&gt;
&lt;LI&gt;&lt;A class="" href="http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx" mce_href="http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx"&gt;Avoid best fit mappings&lt;/A&gt;.&amp;nbsp; At best they cause spelling errors or offend customers.&amp;nbsp; At worst they can cause security problems.&lt;/LI&gt;
&lt;LI&gt;Avoid unassigned code points.&amp;nbsp; Their behavior is undefined and could cause difficulty if different machines or software have a different interpretation.&lt;/LI&gt;
&lt;LI&gt;Use care when using the Unicode private use area (PUA).&amp;nbsp; Its use is private.&amp;nbsp; If data is persisted in the PUA, then there is a risk that future versions or other machines may not read the data correctly.&amp;nbsp; Eventually migration of data between different PUA mappings may become necessary, and migrating such data is rarely trivial.&amp;nbsp; The Hong Kong HKSCS mappings are an example of such a difficulty.&lt;/LI&gt;
&lt;LI&gt;&lt;A class="" href="http://blogs.msdn.com/shawnste/archive/2006/06/16/634666.aspx" mce_href="http://blogs.msdn.com/shawnste/archive/2006/06/16/634666.aspx"&gt;Don’t rely on illegal or undefined code page behavior&lt;/A&gt;.&amp;nbsp; Illegal sequences might change between versions or software.&amp;nbsp; Shift modes that aren’t implemented could be implemented on other machines, etc.&lt;/LI&gt;
&lt;LI&gt;Don't presume that illegal or undefined code page behavior will remain stable.&lt;/LI&gt;
&lt;LI&gt;&lt;A class="" href="http://blogs.msdn.com/shawnste/archive/2005/09/26/474105.aspx" mce_href="http://blogs.msdn.com/shawnste/archive/2005/09/26/474105.aspx"&gt;Don’t pretend binary data is text&lt;/A&gt; in some code page (or Unicode).&amp;nbsp; Variations in code page mappings could then prevent the data from round tripping, particularly if the binary data ends up in undefined or illegal code point behavior.&lt;/LI&gt;&lt;/UL&gt;
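&lt;P&gt;One way to avoid relying on best fit is to name the code page explicitly and find out up front which characters it can't represent, rather than letting a conversion substitute look-alikes.&amp;nbsp; Here's a small sketch in Python, whose cp1252 codec is strict and doesn't apply best-fit mappings; the helper name unsupported_characters is made up for the example.&lt;/P&gt;
&lt;PRE&gt;
```python
def unsupported_characters(text, encoding):
    """Return the characters in text that the named encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)          # strict: raises instead of substituting
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# k, fullwidth k (U+FF4B), i with tilde (U+0129), infinity (U+221E)
sample = "k\uff4b\u0129\u221e"

missing = unsupported_characters(sample, "cp1252")
print([hex(ord(ch)) for ch in missing])
```
&lt;/PRE&gt;
&lt;P&gt;An application can then decide what to do about the unsupported characters (reject the input, ask the user, or move to Unicode) instead of silently turning an infinity sign into an 8.&lt;/P&gt;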
&lt;P&gt;Hope that helps, more&amp;nbsp;posts about&amp;nbsp;common code page concerns are at &lt;A href="http://blogs.msdn.com/shawnste/pages/code-pages-unicode-encodings.aspx"&gt;http://blogs.msdn.com/shawnste/pages/code-pages-unicode-encodings.aspx&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Shawn&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=5103419" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>UTF-16, UTF-8 &amp; UTF-32 update to conform with Unicode 5.0's security concerns.</title><link>http://blogs.msdn.com/shawnste/archive/2007/07/23/utf-16-utf-8-utf-32-update-to-conform-with-unicode-5-0-s-security-concerns.aspx</link><pubDate>Mon, 23 Jul 2007 19:07:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4011544</guid><dc:creator>shawnste</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/4011544.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=4011544</wfw:commentRss><description>&lt;P&gt;My post &lt;A href="http://blogs.msdn.com/shawnste/archive/2006/06/16/634666.aspx" mce_href="http://blogs.msdn.com/shawnste/archive/2006/06/16/634666.aspx"&gt;Change to Unicode Encoding for Unicode 5.0 conformance&lt;/A&gt; now applies to .Net 2.0 with &lt;A href="http://www.microsoft.com/technet/security/Bulletin/ms07-040.mspx" mce_href="http://www.microsoft.com/technet/security/Bulletin/ms07-040.mspx"&gt;MS07-040&lt;/A&gt; applied.&amp;nbsp; Updates include a list of known issues; please see the list of known issues for &lt;A href="http://www.microsoft.com/technet/security/Bulletin/ms07-040.mspx"&gt;MS07-040&lt;/A&gt; described in &lt;A href="http://support.microsoft.com/kb/931212"&gt;KB 931212&lt;/A&gt; for more information.&amp;nbsp; &lt;A href="http://support.microsoft.com/kb/940521/"&gt;KB 940521&lt;/A&gt; describes this behavior in particular.&amp;nbsp; This fix reduces the chance of spoofing similar strings.&amp;nbsp; Unicode 5.0 specifies this change due to security concerns regarding spoofing.&lt;/P&gt;
&lt;P&gt;As mentioned in the KB:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Before this change, invalid characters in the middle of text strings would only be silently removed. For example, the string "Ad\xD800min\xDC00istrator" would change to "Administrator" as the Unicode characters U+D800 and U+DC00 are invalid . This could cause a security problem for some programs. After you install the security update MS07-040, this string would now become "Ad\xFFFDmin\xFFFDistrator", and decode to "Ad�min�istrator" where the � is the Unicode replacement character.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The first time we introduced this behavior was in Vista, and since then I've received several reports of issues with the new behavior.&amp;nbsp; In nearly all of those cases there were usually some flawed assumptions contributing to the problems.&amp;nbsp; Some examples&amp;nbsp;were:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Programs trying to convert byte[] arrays to Unicode (see &lt;A href="http://blogs.msdn.com/shawnste/archive/2005/09/26/474105.aspx"&gt;Avoid treating binary data as a string&lt;/A&gt;) and then having problems when the data didn't round trip.&amp;nbsp; Note that prior to this change the data didn't round trip either, data was lost, but after the change it is more obvious since the FFFD's are present (which is the point of the security aspect of the change by the Unicode consortium).&lt;/LI&gt;
&lt;LI&gt;Doing something like that, then trying to make a hash of the resulting value.&amp;nbsp; After the update the hash doesn't match.&amp;nbsp; Note that even prior to the update a very large number of values have the same hash, so this was not nearly as secure as the application had hoped.&lt;/LI&gt;
&lt;LI&gt;Some applications made oopses with the behavior of Unicode, accidentally decoding extra byte(s) instead of pairs causing illegal UTF-16 or UTF-8.&amp;nbsp; Those were ignored and the app worked despite the bug, but the update prevents the error from working.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Note that&amp;nbsp;before the update .Net 2.0 on Vista and .Net 2.0&amp;nbsp;RTM had different Unicode decoding behavior.&amp;nbsp;&amp;nbsp;With the&amp;nbsp;update applied&amp;nbsp;they have the same behavior.&lt;/P&gt;
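&lt;P&gt;The difference between the old and new behavior can be sketched with Python's UTF-8 decoder, where errors="ignore" silently drops invalid bytes and errors="replace" substitutes U+FFFD.&amp;nbsp; This is only an analogy to the .Net change; the invalid 0xFF bytes below stand in for the unpaired surrogates in the KB example.&lt;/P&gt;
&lt;PRE&gt;
```python
# 0xFF can never appear in well-formed UTF-8, so both bytes are invalid.
raw = b"Ad\xffmin\xffistrator"

dropped = raw.decode("utf-8", errors="ignore")     # old style: silently removed
replaced = raw.decode("utf-8", errors="replace")   # new style: U+FFFD inserted

print(dropped == "Administrator")    # the spoofed name slips through
print(replaced == "Administrator")   # the replacement characters give it away
print(replaced.count("\ufffd"), "replacement characters")
```
&lt;/PRE&gt;
&lt;P&gt;A comparison against "Administrator" passes when the bad bytes are dropped and fails once they're replaced, which is the point of the change.&lt;/P&gt;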
&lt;P&gt;Hope this is helpful,&lt;/P&gt;
&lt;P&gt;Shawn&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=4011544" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>I see my favorite Ansi function has the behavior I want.</title><link>http://blogs.msdn.com/shawnste/archive/2007/06/15/i-see-my-favorite-ansi-function-has-the-behavior-i-want.aspx</link><pubDate>Fri, 15 Jun 2007 20:04:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3316009</guid><dc:creator>shawnste</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/3316009.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=3316009</wfw:commentRss><description>&lt;P&gt;Occasionally I am asked about the A version of a W function.&amp;nbsp; Ie: GetLocaleInfoA does something that appears more convenient to some user than GetLocaleInfoW.&amp;nbsp; The implied thought is that maybe they should just use the A version.&lt;/P&gt;
&lt;P&gt;For the most part our A functions are just wrappers for the W functions, so any perceived benefit is probably not real.&amp;nbsp; Additionally, since the A function is just a wrapper, under the hood we have to convert any input and output strings to and from Unicode.&amp;nbsp; That'll probably make the call take several times longer, if nothing else.&lt;/P&gt;
&lt;P&gt;For inputs, the A version doesn't really restrict the code to a particular Unicode subset.&amp;nbsp; Sometimes it's perceived that some Unicode character(s) are undesirable in the input stream.&amp;nbsp; In this case either the application already restricts the input to that subset, in which case using Unicode makes no difference to the supported character set; or the application passes unknown data, relying on the Ansi to Unicode conversion to filter unwanted data.&amp;nbsp; Of course, then the unwanted data is just mangled, converted to ? or whatnot.&amp;nbsp; So the unwanted data isn't really stripped and, even worse, it can be corrupted in a manner that causes a security hole.&amp;nbsp; For example, I've seen a password hashing algorithm that then converted the hash to code page 1252.&amp;nbsp; Quite often this caused a bunch of ??? in the resulting hash.&amp;nbsp; Since many combinations cause the ???, the password hash would match a very large number of inputs, pretty much defeating whatever security was provided originally.&lt;/P&gt;
&lt;P&gt;For outputs, the A version also doesn't prevent any Unicode code points from being read; they're just converted to junk like ? when the call returns.&amp;nbsp; So the results are restricted to a subset of Unicode, but the restriction is done in a fairly useless manner.&amp;nbsp; I've seen configuration values being stored like this (imagine a user name), and then they pretty much just end up as ???? when read back.&lt;/P&gt;
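&lt;P&gt;To make that mangling concrete, here is a minimal C# sketch.&amp;nbsp; It is only an illustration under some assumptions: Latin-1 (iso-8859-1) with an explicit "?" replacement fallback stands in for an Ansi code page, the AnsiRoundTrip class and ThroughAnsi helper are hypothetical names, and a real A function may best-fit some characters rather than emit ? for all of them.&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Text;

public static class AnsiRoundTrip
{
    // Assumption: Latin-1 with an explicit "?" replacement fallback stands in
    // for whatever Ansi code page an A function would convert through.
    static readonly Encoding Ansi = Encoding.GetEncoding(
        "iso-8859-1",
        new EncoderReplacementFallback("?"),
        new DecoderReplacementFallback("?"));

    // Hypothetical helper: what a Unicode string looks like after it has been
    // converted to the code page and back again.
    public static string ThroughAnsi(string text)
    {
        return Ansi.GetString(Ansi.GetBytes(text));
    }

    public static void Main()
    {
        // Two different five-letter names, one Greek and one Cyrillic.
        // Latin-1 cannot represent any of these letters.
        string greek = "\u039D\u03AF\u03BA\u03BF\u03C2";
        string cyrillic = "\u0418\u0432\u0430\u043D\u0430";

        Console.WriteLine(ThroughAnsi(greek));
        Console.WriteLine(ThroughAnsi(cyrillic));

        // Distinct inputs are indistinguishable after the conversion.
        Console.WriteLine(ThroughAnsi(greek) == ThroughAnsi(cyrillic));
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;Both names come back as ?????, so the comparison prints True: nothing was filtered, two different values were just collapsed into the same junk.&lt;/P&gt;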
&lt;P&gt;So, in short, use Unicode! :)&lt;/P&gt;
</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category></item><item><title>Why can't we strip the diacritics?</title><link>http://blogs.msdn.com/shawnste/archive/2007/06/08/why-can-t-we-strip-the-diacritics.aspx</link><pubDate>Sat, 09 Jun 2007 05:20:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3149153</guid><dc:creator>shawnste</dc:creator><slash:comments>5</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/3149153.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=3149153</wfw:commentRss><description>&lt;P&gt;We have some "best-fit" behavior which we generally consider to be "bad".&amp;nbsp; Any loss of data is generally a bad thing, so we recommend storing data in Unicode (so you don't lose anything).&amp;nbsp; Assuming you can't use Unicode, why is it so bad to just make everything ASCII-like?&amp;nbsp; Maybe you have a publishing house or direct marketing firm that can't handle Unicode, so you'll just get rid of those annoying decorations.&lt;/P&gt;
&lt;P&gt;In American English the diacritics are effectively quaint decorations.&amp;nbsp; Many people naïvely assume that when Word auto-corrects naive to naïve, this is just a prettiness factor.&amp;nbsp; When they resume spell checking their résumé, the diacritics become more important.&amp;nbsp; In English it's fair to spell résumé as resume, but it seems cooler to add the accents.&amp;nbsp; Since we stole (borrowed is more politically correct) the word from French, we have a French-like pronunciation of résumé, and aren't likely to confuse it with resume.&lt;/P&gt;
&lt;P&gt;In most other languages diacritics aren't optional.&amp;nbsp; You wouldn't exchange a z with an s in English just because they look similar.&amp;nbsp; "A real singer" is a lot different from "a real zinger".&lt;/P&gt;
&lt;P&gt;Recently I encountered the following example: a user wanted to get around those pesky diacritics by mapping to ASCII.&lt;/P&gt;
&lt;P&gt;The suggested input was:&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; último año de carrera&lt;/P&gt;
&lt;P&gt;The desired output was:&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ultimo ano de carrera&lt;/P&gt;
&lt;P&gt;My Spanish is nearly non-existent; however, Word's spell checker tells me these are all legitimate Spanish words, even without the accents.&amp;nbsp; The meaning goes from something like "the last year of the race" to "I completed the anus of the race."&lt;/P&gt;
&lt;P&gt;Now imagine that you're trying to reach a new market and you do that to your customers' names or potential customers' names.&amp;nbsp; How long will they remain your customers?&lt;/P&gt;
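&lt;P&gt;For the curious, this is roughly how that kind of mapping tends to get done.&amp;nbsp; The sketch below is an assumption on my part, not what that user actually ran: it decomposes the string with String.Normalize (Form D) and drops the non-spacing marks.&amp;nbsp; The DiacriticDemo class and StripDiacritics helper are hypothetical names, and the point is to show the data loss, not to recommend the technique.&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticDemo
{
    // Hypothetical helper: decompose to Form D, drop the combining marks,
    // then recompose whatever is left.
    public static string StripDiacritics(string text)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            // Non-spacing marks are the accents split off by Form D.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // The suggested input from the example above, written with escapes.
        string original = "\u00FAltimo a\u00F1o de carrera";
        Console.WriteLine(StripDiacritics(original));

        // Two different Spanish words are now the same string.
        Console.WriteLine(StripDiacritics("a\u00F1o") == "ano");
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;The first line printed is the "desired" output, ultimo ano de carrera, and the second is True: once the marks are gone there is no way to tell which word the user originally typed.&lt;/P&gt;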
&lt;P&gt;- Shawn&lt;/P&gt;
</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category><category domain="http://blogs.msdn.com/shawnste/archive/tags/System.Text/default.aspx">System.Text</category></item><item><title>Encoder/Decoder Encoding fallbacks fail after 2GB of data has been converted</title><link>http://blogs.msdn.com/shawnste/archive/2007/06/07/encoder-decoder-encoding-fallbacks-fail-after-2gb-of-data-has-been-converted.aspx</link><pubDate>Thu, 07 Jun 2007 23:19:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3148083</guid><dc:creator>shawnste</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/3148083.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=3148083</wfw:commentRss><description>&lt;P&gt;We have an unfortunate bug in .Net v2.0+ that causes encoding or decoding of more than 2GB of data to fail.&amp;nbsp; That's a lot of data, but it still shouldn't do that.&amp;nbsp; This is a problem with our built-in fallbacks.&lt;/P&gt;
&lt;P&gt;Ironically, if you encounter bad bytes then the bug is reset and you're "good" for another 2GB.&amp;nbsp; This bug happens to most of our code pages for valid data, but some optimizations make it unlikely to happen&amp;nbsp;in Unicode, ASCII &amp;amp; Latin-1.&amp;nbsp; There are some workarounds.&amp;nbsp; Some of these don't work if you're insulated from the decoder/encoder (like using a StreamWriter):&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Change the encoder and decoder fallbacks to custom fallbacks, or use the built-in EncoderExceptionFallback and DecoderExceptionFallback.&amp;nbsp; If you have known-good data, the exception fallbacks would be a good choice.&lt;/LI&gt;
&lt;LI&gt;Use UTF-8 or UTF-16.&amp;nbsp; I think this nearly completely solves the problem.&amp;nbsp; At a minimum it raises the limit by enough orders of magnitude that your computer would probably die of hardware failure before you hit the bug.&lt;/LI&gt;
&lt;LI&gt;Unconvertible data resets the bug, so you have another 2GB before it'll die.&amp;nbsp; You may be able to occasionally introduce an unconvertible code point (like U+FFFD).&lt;/LI&gt;
&lt;LI&gt;This only happens when the encoder/decoder fallback buffers aren't reset.&amp;nbsp; Using Encoding.GetBytes/GetChars won't fail unless you try a string longer than 2GB.&amp;nbsp; If you are using short text segments that don't need the Encoder or Decoder state, this would be a good approach.&amp;nbsp; For example, if you're piping a bunch of messages to the console, you might consider just sending one line at a time using the Encoding class.&lt;/LI&gt;
&lt;LI&gt;Getting a new Encoder or Decoder object when possible will&amp;nbsp;give you a fresh start.&amp;nbsp; For example if you process a bunch of smaller documents you might change encoders/decoders between documents, or between records or whatever.&lt;/LI&gt;&lt;/UL&gt;
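&lt;P&gt;A couple of the workarounds above can be sketched in C# as follows.&amp;nbsp; This is a minimal sketch and does not reproduce the bug: the FallbackWorkarounds class and its two methods are hypothetical names, and "iso-8859-1" is only a placeholder encoding name chosen because it is available everywhere; in practice you would pass the name of the code page you actually use.&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Text;

public static class FallbackWorkarounds
{
    // First workaround: ask for the encoding with the built-in exception
    // fallbacks instead of the default fallbacks.
    public static Encoding WithExceptionFallbacks(string name)
    {
        return Encoding.GetEncoding(
            name,
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
    }

    // Last workaround: take a fresh Encoder for every document so that no
    // state is carried from one document to the next.
    public static int EncodeDocuments(Encoding encoding, string[] documents)
    {
        int totalBytes = 0;
        foreach (string document in documents)
        {
            Encoder encoder = encoding.GetEncoder();
            char[] chars = document.ToCharArray();
            byte[] bytes = new byte[encoding.GetMaxByteCount(chars.Length)];

            // flush = true because each document is converted in one call.
            totalBytes += encoder.GetBytes(chars, 0, chars.Length, bytes, 0, true);
        }
        return totalBytes;
    }

    public static void Main()
    {
        // Placeholder encoding name; substitute your own code page.
        Encoding encoding = WithExceptionFallbacks("iso-8859-1");
        string[] documents = { "first document", "second document" };
        Console.WriteLine(EncodeDocuments(encoding, documents));
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;With the two sample strings this prints 29, the total number of bytes written.&amp;nbsp; Keep in mind that with the exception fallbacks any unconvertible character throws instead of being replaced, which is why they suit known-good data.&lt;/P&gt;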
&lt;P&gt;Hope that helps,&lt;/P&gt;
&lt;P&gt;Shawn&lt;/P&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/Unicode+and+Code+Pages_2F00_Encodings/default.aspx">Unicode and Code Pages/Encodings</category><category domain="http://blogs.msdn.com/shawnste/archive/tags/System.Text/default.aspx">System.Text</category></item></channel></rss>