<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx</link><description>Notepad has to guess the encoding and can be tricked into guessing wrong.</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95296</link><pubDate>Wed, 24 Mar 2004 15:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95296</guid><dc:creator>Nate S</dc:creator><description>Notepad does a good job of detecting these variations, in my experience.&lt;br&gt;&lt;br&gt;However, it's confusing when Microsoft documentation refers to UCS-2 (or is it UTF-16 now?) as &amp;quot;Unicode&amp;quot;. I've seen a lot of people who think that Unicode means &amp;quot;two bytes per character&amp;quot;, which isn't even true of UTF-16. 
UTF-8 and UTF-7 are no less Unicode than UCS-2/UTF-16.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95332</link><pubDate>Wed, 24 Mar 2004 16:22:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95332</guid><dc:creator>David Cumps</dc:creator><description>Hey, thanks for the nice explanation!</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95334</link><pubDate>Wed, 24 Mar 2004 16:25:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95334</guid><dc:creator>Mike Dunn</dc:creator><description>As I understand it, UCS-2 != UTF-16&lt;br&gt;UCS-2 can only encode U+0000 to U+FFFF (2 bytes per wide char, no more)&lt;br&gt;UTF-16 can encode all Unicode code points.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95336</link><pubDate>Wed, 24 Mar 2004 16:26:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95336</guid><dc:creator>Bertg</dc:creator><description>That makes a lot of sense...&lt;br&gt;but...&lt;br&gt;If Notepad is unsure, shouldn't it ask the user whether the text is displayed correctly and, if not, try another encoding?
&lt;br&gt;(sorry about the spelling)</description></item><item><title>Notepad file encoding</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95364</link><pubDate>Wed, 24 Mar 2004 20:16:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95364</guid><dc:creator>Deep Thoughts...</dc:creator><description /></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95425</link><pubDate>Wed, 24 Mar 2004 18:54:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95425</guid><dc:creator>Simon Cooke [exMSFT]</dc:creator><description>Nate: Just wondering... did UCS-2 / UTF-16 even exist back when Unicode was in v1.0? IIRC, when Unicode first started out (and Microsoft implemented it from v1), there was only UCS-2. That explains why they call it &amp;quot;Unicode&amp;quot;: when they started implementing, that WAS Unicode.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95491</link><pubDate>Wed, 24 Mar 2004 20:06:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95491</guid><dc:creator>RichB</dc:creator><description>Speaking of Notepad, why does it reformat itself when I save, yet forget to repaint itself? On save it re-wraps the text using its line-wrapping algorithm, which wraps differently than it did before the save.&lt;br&gt;&lt;br&gt;Notepad has been like this since at least NT4.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95505</link><pubDate>Wed, 24 Mar 2004 20:29:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95505</guid><dc:creator>Nate S</dc:creator><description>I think you're right, Simon. 
At the time the Unicode people were thinking it would be a 2-byte encoding. Still, that was a long time ago. Even newer systems like C# are still using UCS-2 and calling it Unicode.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95568</link><pubDate>Wed, 24 Mar 2004 22:04:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95568</guid><dc:creator>Joe</dc:creator><description>Unicode is nothing but a big table. Therefore nothing is really &amp;quot;Unicode&amp;quot; except the Unicode table. In the computer world we deal with encodings of the values in the Unicode table, and in that sense UCS-2 is a Unicode encoding just as much as UCS-4 is. That is, UCS-4 isn't &amp;quot;Unicode&amp;quot; in the same way that UCS-2 isn't &amp;quot;Unicode.&amp;quot; But UCS-2 is a perfectly valid Unicode encoding. Microsoft chose UCS-2 as its internal Unicode encoding; it could have chosen UTF-8 and still called it &amp;quot;Unicode&amp;quot;.&lt;br&gt;&lt;br&gt;This is a very handy site that deals with UTF-8 but also covers a lot of the topics surrounding Unicode: &lt;a target="_new" href="http://www.cl.cam.ac.uk/~mgk25/unicode.html"&gt;http://www.cl.cam.ac.uk/~mgk25/unicode.html&lt;/a&gt;</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95575</link><pubDate>Wed, 24 Mar 2004 22:12:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95575</guid><dc:creator>asdf</dc:creator><description>Search for &amp;quot;EM_GETHANDLE wrap&amp;quot; (without the quotes) on Google Groups for an explanation of how Notepad (and other apps) implement word wrapping with the edit control. 
Basically it flashes because it destroys and recreates the edit control.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95592</link><pubDate>Wed, 24 Mar 2004 22:31:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95592</guid><dc:creator>Nate S</dc:creator><description>Exactly, Joe. In Raymond's post above, you can see some confusion. He lists the available encodings as ANSI, Unicode, UTF-8, UTF-7, and so on.&lt;br&gt;&lt;br&gt;But &amp;quot;Unicode&amp;quot; is not an encoding. UTF-8, UTF-7, etc. are encodings. The encoding he refers to as Unicode is properly called UCS-2.&lt;br&gt;&lt;br&gt;This is a small but nagging problem in Microsoft documentation.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95634</link><pubDate>Wed, 24 Mar 2004 23:45:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95634</guid><dc:creator>Norman Diamond</dc:creator><description>Actually, the encodings that have no special prefix and that Notepad still supports are the traditional ANSI encoding (i.e., Shift-JIS, ANSI code page 932) and the Unicode (little-endian) encoding with no BOM.&lt;br&gt;&lt;br&gt;But in the example posted by David Cumps, Notepad did not choose between those two encodings. Notepad chose a wildly different encoding. Notepad used its usual Japanese font for display, and it chose a total of eight items: five double-byte full-width Kanji characters and three single-byte non-displayable characters, which it displayed as three half-width black rectangles. But the encoding is not a Japanese encoding, so Notepad's choice of characters was nonsense.&lt;br&gt;&lt;br&gt;Here's another example, related to a recently discussed IE security bug. 
You'll probably need a tool other than Notepad to create the file: a single byte with value 0x01, i.e. Ctrl-A. Open the file in Notepad and it displays a British pound sign. If you have a Japanese font, it displays a full-width British pound sign, as if it were a double-byte character. But move the text cursor to the right and it moves only halfway through the character, because there's only a single byte. Add some single-byte half-width characters after that, and you can see Notepad get really confused about which characters are which. Add some full-width characters and you can watch Notepad move the text cursor through the midpoints of characters instead of between them.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95653</link><pubDate>Thu, 25 Mar 2004 00:30:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95653</guid><dc:creator>Peter Evans</dc:creator><description>Yeah, Joe and Nate,&lt;br&gt;&lt;br&gt;The confusing thing about Unicode is that everyone treats its various encodings as if they were Unicode itself.&lt;br&gt;&lt;br&gt;In reality, Unicode (ISO/IEC 10646) defines an ordered set of code values assigned to character names and properties, plus rules for mapping those code values and properties to glyphs.&lt;br&gt;&lt;br&gt;The code values and properties, together with other Unicode rules, can be used to manipulate the character entities of various writing systems and natural languages.&lt;br&gt;&lt;br&gt;Even more confusion comes from the need to translate from the various UTF-X encodings or UCS-2 code values into full-spectrum UCS-4 code values.&lt;br&gt;&lt;br&gt;In the end, all the encoding machinery just amounts to a precise lookup of the code value and character properties from the byte encoding, whether that is UTF-7, UTF-8, UTF-16, or UTF-32.
&lt;br&gt;&lt;br&gt;I think the confusion is more of UNICODE naming / branding problem than a Microsoft documenation issue.  Unicode really is the brand for compliance to the standard and not the encoding of the standard.&lt;br&gt;&lt;br&gt;Most coders do not care about UNICODE all they need to know is how do I decode/encode it and how do I detect it and what APIS work with it.&lt;br&gt;Afer all isn't that what the purpose of the Uniscribe APIs were.  Not sure what there equivalent is in .NET yet.</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95750</link><pubDate>Thu, 25 Mar 2004 04:38:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95750</guid><dc:creator>Simon Cooke [exMSFT]</dc:creator><description>&amp;gt; Most coders do not care about UNICODE all &lt;br&gt;&amp;gt; they need to know is how do I decode/encode &lt;br&gt;&amp;gt; it and how do I detect it and what APIS work &lt;br&gt;&amp;gt; with it. &lt;br&gt;&amp;gt; Afer all isn't that what the purpose of the &lt;br&gt;&amp;gt; Uniscribe APIs were. Not sure what there &lt;br&gt;&amp;gt; equivalent is in .NET yet. &lt;br&gt;&lt;br&gt;No equivalent in .NET, alas. And Uniscribe is one scary, badly documented, weak-on-samples API.&lt;br&gt;&lt;br&gt;It's fine if you're doing single-style text, but the moment you go for something with more to it, or start worrying about resolution independent layout, and you start biting your fingernails...</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#95757</link><pubDate>Thu, 25 Mar 2004 04:51:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:95757</guid><dc:creator>/dev/null</dc:creator><description>That was insightful !
&lt;br&gt;&lt;br&gt;But how come other editors open the file correctly (Wordpad, emacs)? I mean, how do they come to know what encoding is it.&lt;br&gt;&lt;br&gt;And if they can figure out, then why not Notepad?&lt;br&gt;</description></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#96087</link><pubDate>Thu, 25 Mar 2004 15:52:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:96087</guid><dc:creator>Jordan Russell</dc:creator><description>Notepad would get it right too if it didn't employ IsTextUnicode's documented-to-be-unreliable &amp;quot;statistical analysis&amp;quot; (IS_TEXT_UNICODE_STATISTICS).</description></item><item><title>re: Notepad bug? Encoding issue?</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#96239</link><pubDate>Thu, 25 Mar 2004 21:52:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:96239</guid><dc:creator>David Cumps</dc:creator><description /></item><item><title>re: Some files come up strange in Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#97312</link><pubDate>Sat, 27 Mar 2004 08:23:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:97312</guid><dc:creator>Raymond Chen</dc:creator><description>Bertg: On why Notepad doesn't prompt if it's not sure: Because it's almost never 100% sure. The file &amp;quot;Hi&amp;quot; sure looks like 8-bit ASCII but maybe it's the single Unicode character U+6948. The UTF-8 file &amp;quot;Hi&amp;quot; might actually be an 8-bit file that happens to begin with EF BB BF (&amp;quot;&amp;#239;&amp;#187;&amp;#191;&amp;quot;).&lt;br&gt;&lt;br&gt;If Notepad prompted if there was ambiguity, it would be prompting an awful lot.
&lt;br&gt;&lt;br&gt;You can try to override the autodetector from the File.Open dialog if you find that it detected incorrectly.</description></item><item><title>How to Determine Text File Encoding</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#464618</link><pubDate>Tue, 13 Sep 2005 13:49:21 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:464618</guid><dc:creator>CodeSnipers.com</dc:creator><description>With the explosion of international text resources brought by the Internet, the standards for determining file encodings have become more important. This is my attempt at making the text file encoding issues digestible by leaving out some of the unimporta</description></item><item><title>The Binary Cult Blog  </title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#634945</link><pubDate>Sat, 17 Jun 2006 06:34:32 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:634945</guid><dc:creator>The Binary Cult Blog  </dc:creator><description>PingBack from &lt;a rel="nofollow" target="_new" href="http://blog.binarycult.com/?p="&gt;http://blog.binarycult.com/?p=&lt;/a&gt;</description></item><item><title>My Wierd Wired World  &amp;raquo; Blog Archive   &amp;raquo; Buggy Notepad</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#638827</link><pubDate>Tue, 20 Jun 2006 11:36:04 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:638827</guid><dc:creator>My Wierd Wired World  » Blog Archive   » Buggy Notepad</dc:creator><description>PingBack from &lt;a rel="nofollow" target="_new" href="http://ychittaranjan.wordpress.com/2006/06/20/buggy-notepad/"&gt;http://ychittaranjan.wordpress.com/2006/06/20/buggy-notepad/&lt;/a&gt;</description></item><item><title>El M??dem  &amp;raquo; Blog Archive   &amp;raquo; Referencia a Bush en Windows - el 
motivo</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#1712328</link><pubDate>Mon, 19 Feb 2007 09:22:06 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1712328</guid><dc:creator>El M??dem  » Blog Archive   » Referencia a Bush en Windows - el motivo</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://www.elmodem.com/archivo/2007/02/19/referencia-a-bush-en-windows-el-motivo/"&gt;http://www.elmodem.com/archivo/2007/02/19/referencia-a-bush-en-windows-el-motivo/&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>The Notepad file encoding problem, redux</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#2164898</link><pubDate>Tue, 17 Apr 2007 20:58:59 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2164898</guid><dc:creator>The Old New Thing</dc:creator><description>&lt;p&gt;Let's take another look.&lt;/p&gt;</description></item><item><title>' + title + ' - ' + basename(imgurl) + '(' + w + 'x' + h +')</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#3310043</link><pubDate>Fri, 15 Jun 2007 12:35:20 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3310043</guid><dc:creator>' + title + ' - ' + basename(imgurl) + '(' + w + 'x' + h +')</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://kent.ppolis.com/?p=130"&gt;http://kent.ppolis.com/?p=130&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>BOM BOM BOM &amp;laquo; ???????????????</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#4994356</link><pubDate>Wed, 19 Sep 2007 11:21:51 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4994356</guid><dc:creator>BOM BOM BOM « ???????????????</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://atedev.wordpress.com/2007/09/19/bom-bom-bom/"&gt;http://atedev.wordpress.com/2007/09/19/bom-bom-bom/&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>Mails that I really, really don't want to receive - 2</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#5122511</link><pubDate>Tue, 25 Sep 2007 18:19:12 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:5122511</guid><dc:creator>MSDN Blog Postings  » Mails that I really, really don't want to receive - 2</dc:creator><description>&lt;P&gt;PingBack from &lt;A href="http://blogs.msdn.com/pranavwagh/archive/2007/09/25/mails-that-i-really-really-don-t-want-to-receive-2.aspx"&gt;http://blogs.msdn.com/pranavwagh/archive/2007/09/25/mails-that-i-really-really-don-t-want-to-receive-2.aspx&lt;/A&gt;&lt;/P&gt;</description></item><item><title>Microsoft crazy facts - WCCFtech.com</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#8434779</link><pubDate>Mon, 28 Apr 2008 11:07:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8434779</guid><dc:creator>Microsoft crazy facts - WCCFtech.com</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://wccftech.com/forum/microsoft-crazy-facts-18198.html#post212534"&gt;http://wccftech.com/forum/microsoft-crazy-facts-18198.html#post212534&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>Conspirația teoriei &amp;laquo; Fly on the Windscreen</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#9043695</link><pubDate>Wed, 05 Nov 2008 10:59:03 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9043695</guid><dc:creator>Conspirația teoriei &amp;laquo; Fly on the Windscreen</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://dorinlazar.wordpress.com/2008/11/05/conspiratia-teoriei/"&gt;http://dorinlazar.wordpress.com/2008/11/05/conspiratia-teoriei/&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>StUF &amp;#8211; receiving data from a provider where UTF-8 is in fact ISO-8859 &amp;laquo; The Wiert Corner</title><link>http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx#9599710</link><pubDate>Sun, 10 May 2009 01:38:37 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9599710</guid><dc:creator>StUF &amp;#8211; receiving data from a provider where UTF-8 is in fact ISO-8859 &amp;laquo; The Wiert Corner</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://wiert.wordpress.com/2009/05/08/stuf-receiving-data-from-a-provider-where-utf-8-is-in-fact-iso-8859/"&gt;http://wiert.wordpress.com/2009/05/08/stuf-receiving-data-from-a-provider-where-utf-8-is-in-fact-iso-8859/&lt;/a&gt;&lt;/p&gt;
</description></item></channel></rss>