<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx</link><description>(The alternate title should be spoken with either a circa-1982 Jeff Spicoli or circa-1989 Theodore "Ted" Logan mannerism and accent) U+feff has two jobs in the Unicode standard: Job #1 , and its namesake, is as a ZERO WIDTH NO-BREAK SPACE. The name kind</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357393</link><pubDate>Thu, 20 Jan 2005 18:32:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357393</guid><dc:creator>Mike Dunn</dc:creator><description>Strange things are afoot at the U+004B U+20DD ;)</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357523</link><pubDate>Thu, 20 Jan 2005 21:07:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357523</guid><dc:creator>Centaur</dc:creator><description>The XML specification explicitly permits a UTF-16 BOM at the beginning of the file or stream. Otherwise, it must start with the XML declaration (&amp;lt;?xml version=…&amp;gt;), no whitespace or other characters allowed. 
At least that’s how I’d interpret sections 4.3.3 and 2.1.</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357597</link><pubDate>Thu, 20 Jan 2005 22:45:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357597</guid><dc:creator>Dean Harding</dc:creator><description>Heh, I used to work for Unisys.  I always felt bad correcting people when they thought I said &amp;quot;Unicef&amp;quot;, cause suddenly I'm not such the good samaritan that they thought I was...</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357804</link><pubDate>Fri, 21 Jan 2005 01:04:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357804</guid><dc:creator>Michael Kaplan</dc:creator><description>Mike Dunn -- something not Kosher? :-)&lt;br&gt;&lt;br&gt;Centaur -- the XML spec allows the BOM; it even describes it. So anyone who does not allow it does so at their peril....</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357818</link><pubDate>Fri, 21 Jan 2005 01:43:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357818</guid><dc:creator>Michael Kaplan</dc:creator><description>The Unicode FAQ talks about this issue a bit, also.
&lt;br&gt;&lt;br&gt;&lt;a target="_new" href="http://www.unicode.org/faq/utf_bom.html#BOM"&gt;http://www.unicode.org/faq/utf_bom.html#BOM&lt;/a&gt;&lt;br&gt;&lt;br&gt;With the number of bytes wasted in web/email communication over a character that takes up only 2-4 bytes in storage and no visible space, it is no wonder that people find Unicode to be complicated!</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357872</link><pubDate>Fri, 21 Jan 2005 04:17:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357872</guid><dc:creator>Michael Grier [MSFT]</dc:creator><description>Back in visual studio, we had a few people who were really focussed on getting the editors to be really good Unicode citizens.  My (possibly revisionist) history is that we actually introduced use of the utf-8 BOM over there around the time of win98 (vs 6).  NT caught up when visual studio users were creating &amp;quot;text files&amp;quot; (whatever the heck /that/ means... :-)  that other people couldn't open in notepad.&lt;br&gt;&lt;br&gt;Re: so much attention:&lt;br&gt;&lt;br&gt;My 1st dev mgr at Microsoft always noted that it was the little picayune issues that drew the most heated debates because everyone felt they understood /all/ the issues.&lt;br&gt;&lt;br&gt;to quote Kosh: the avelance has started, it is too late for the pebbles to vote.&lt;br&gt;&lt;br&gt;UTF-8 has a BOM and people just need to learn to love it.  (The tricky question is when to preserve/not preserve a BOM found in a byte stream...)  I think you're right; just because something is 8-bit clean doesn't make it a good utf-8 citizen.  It has to be very careful not to split an encoding (just like a good UTF-16 citizen has to know not to split high/low surrogates...)
&lt;br&gt;&lt;br&gt;</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357910</link><pubDate>Fri, 21 Jan 2005 06:29:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357910</guid><dc:creator>Michael Kaplan</dc:creator><description>Interesting! I had not heard this before... but I guess the timing is right. I never remember trying UTF-8 in VS6, did it really work?</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357941</link><pubDate>Fri, 21 Jan 2005 07:42:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357941</guid><dc:creator>Mo</dc:creator><description>I think the confusion reigns because people expect saving a file as UTF-8 to mean &amp;quot;Save it as UTF-8 if it contains non-ASCII characters, and ASCII otherwise&amp;quot;, so they expect the BOM to be only present if characters with values greater than 127 are contained within the file.</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357968</link><pubDate>Fri, 21 Jan 2005 08:38:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357968</guid><dc:creator>Serge Wautier</dc:creator><description>What is supposed to be the caret behaviour when encountering such a character ?&lt;br&gt;&lt;br&gt;I pasted the sponsor message into Notepad and I noticed that even though you don't see the BOM, you can definitely 'feel' it when moving the caret : You need to press the arrow key twice between the 2 &amp;quot;.
&lt;br&gt;&lt;br&gt;Does it mean that it's not completely true to say that apps may safely ignore it, especially at the beginning of a doc: If the app provides edition of the contents, users will have a weird experience and bug reports will flood in !&lt;br&gt;&lt;br&gt;Also, how does text rendering work ? The BOM is not in the font I use in Notepad.</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#357982</link><pubDate>Fri, 21 Jan 2005 09:39:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357982</guid><dc:creator>TLKH</dc:creator><description>As far as I remember from the time when I implemented unicode line breaking algorithm for my editor, U+200d allows breaking before/after it.&lt;br&gt;The real zero-width-non-breaking-space character (except for BOM) is U+2060, not mentioned in this article.</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#358099</link><pubDate>Fri, 21 Jan 2005 14:38:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:358099</guid><dc:creator>Michael Kaplan</dc:creator><description>TLKH is right -- U+2060 (WORD JOINER) is the preferred character that took on the job formerly occupied by Job #1 of the ZWNBSP. 
(I will put a correction on the page).</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#358103</link><pubDate>Fri, 21 Jan 2005 14:45:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:358103</guid><dc:creator>Michael Kaplan</dc:creator><description>Serge -- hard to say what the caret behavior should be here -- after all it *is* a space, even though it is zero width. The fact that it is deprecated makes it even less likely that implementations will do much more than ignore it....</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#358591</link><pubDate>Sat, 22 Jan 2005 07:51:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:358591</guid><dc:creator>Robert</dc:creator><description>&amp;quot;For the record, it has occurred to me in the past that it would not be a bad idea to add an option to save files without the BOM.&amp;quot;&lt;br&gt;&lt;br&gt;It would be convenient if UTF-8 could be selected as the &amp;quot;ANSI&amp;quot; codepage in the control panel's advanced regional and language options. Then Notepad and many applications designed for ANSI would automatically support UTF-8 (without BOM). I would prefer this because nowadays I rarely create text files with legacy ANSI encoding.
&lt;br&gt;&lt;br&gt;For those few applications that make specific assumptions about the ANSI codepage (hard-coded strings with character codes &amp;gt;= 128 etc.), AppLocale provides a good solution:&lt;br&gt;&lt;br&gt;&lt;a target="_new" href="http://www.microsoft.com/globaldev/tools/apploc.mspx"&gt;http://www.microsoft.com/globaldev/tools/apploc.mspx&lt;/a&gt;&lt;br&gt;&lt;br&gt;(A UTF-8 &amp;quot;ANSI&amp;quot; codepage may cause problems if the API implementation depends on assumptions like &amp;quot;ANSI character &amp;lt;= double-byte&amp;quot;).&lt;br&gt;</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#358597</link><pubDate>Sat, 22 Jan 2005 08:40:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:358597</guid><dc:creator>Michael Kaplan</dc:creator><description>Unfortunately, this is not possible -- there are too many bugs in Windows and in apps for components that will not work with UTF-8 here....</description></item><item><title>UTF-8, BOM, Micrisoft</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#358927</link><pubDate>Sun, 23 Jan 2005 16:08:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:358927</guid><dc:creator>zmx's Weblog/鍾明勳的部落格</dc:creator><description>在 Michael Kaplan 那看到 Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!) 
解釋為甚麼 Windows 2000 以後的 Notepad 存 UTF-8 的檔案會加上 BOM(Byte Order Mark, U+FEFF), 主要是因為 UTF-8 和 ASCII 是相容的, 為了避免使用者自己忘記用甚麼存, 造成 UTF-8 檔案用 ASCII...</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#359123</link><pubDate>Sun, 23 Jan 2005 23:22:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:359123</guid><dc:creator>Michael Grier [MSFT]</dc:creator><description>Re: vs6 and utf-8:&lt;br&gt;&lt;br&gt;It did in the new shell (&amp;quot;vegas&amp;quot; as I recall).  Only Visual Interdev and Visual J++ used the new shell.&lt;br&gt;</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#359128</link><pubDate>Sun, 23 Jan 2005 23:42:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:359128</guid><dc:creator>Michael Kaplan</dc:creator><description>Ah yes, that part is true, and I have actually used that before to look at some of the collation source files back before the VS.NET shell was solid enough for daily use. &lt;br&gt;&lt;br&gt;I never knew it drove the Notepad feature, though -- thats cool. Fascinating how one piece of the company drives another, sometimes....</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#360851</link><pubDate>Wed, 26 Jan 2005 16:26:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:360851</guid><dc:creator>Chris Walker</dc:creator><description>I'm been maintaining Notepad for Windows NT/2000/XP/Server 2003 for more than 10 years, so I know some of the history of it. 
&lt;br&gt;&lt;br&gt;First off, any additional complexity to the interface has always had heavy pushback from management and user. Second, it has had to respond to various changes in commonly used character sets over the years. &lt;br&gt;&lt;br&gt;Notepad has to guess the character encoding if it does not know what it is. It uses the &lt;a title="IsTextUnicode" href="http://msdn.microsoft.com/library/en-us/intl/unicode_81np.asp" target="_blank"&gt;IsTextUnicode&lt;/a&gt;() API to help it, but in the end, it is still a guess. It may be worth a blog entry to discuss just this API and its (mis)usage. &lt;br&gt;&lt;br&gt;Notepad only edits Unicode, so all other formats are converted to Unicode when the file is opened and converted back if possible to its original format when saved. If the saved format is some form of Unicode, it will also output the BOM at the beginning of the file. If the file has a BOM, then there is no need to call the unreliable IsTextUnicode() API the next time it is opened. &lt;br&gt;&lt;br&gt;Notepad remembers what the format of the file was when it was read in and uses this format as the default to save. If the edited file can not be saved in the same format without data loss, the user is warning when saving. Otherwise, no UI is thrown. The net result is ASCII files stay ASCII and Unicode files stay Unicode &lt;br&gt;&lt;br&gt;Notepad will never send the Edit Control the BOM. It will skip the BOM if it exists. &lt;br&gt;&lt;br&gt;History: &lt;br&gt;&lt;br&gt;NT 3.1 shipped with an ASCII only Notepad. In the fall of 1993, several applications were converted to use Unicode. At this time, Notepad started using the BOM. I can't tell you how this decision was made; my memory isn't that good. We also converted other applications like Cardfile and Paintbrush. These first shipped on NT 3.5. &lt;br&gt;&lt;br&gt;With the advent of the popularity of the Internet, other character formats needed to be supported. 
This is where support for Big Endian Unicode came from. I'm sorry for the name that was used, but that was the best we could do at the time. The BOM helped a lot for this. I believe the first time Big Endian Unicode support was shipped was NT 5.0, er, Windows 2000. &lt;br&gt;&lt;br&gt;It would be easy enough to have Notepad not output a BOM. It would *just* be a UI change in the SaveFile dialog. &lt;br&gt;&lt;br&gt;The performance of Notepad on large files is not completely related to the fact that it reads the whole file into memory. Remember that in the ASCII file case, it has to convert the whole file to Unicode before it can start. As it turns out, this is very fast compared to the CPU bound work that the Multiline Edit Control does to build up some internal data structures. Of course, if reading the file into memory requires the OS to page to the pagefile, you are going to be hurting. Even after you get a big file swallowed, your experience editing this file will not be pretty. Just try adding or deleting a character at a time. The Edit Control exposes the memory to the application and is required to be ready to save at any time. So you can imagine that adding one character will shuffle all the characters above it down one character. Fine for small files, but a killer algorithm for a large file. &lt;br&gt;&lt;br&gt;One could add complexity to Notepad and solve some of these problems. One could add a &amp;quot;BOM&amp;quot; checkbox to the save dialog complete with an explanation as to what a BOM is and why the end user should care. One could add options to save ASCII files in various code pages complete with code page documentation. One could add a preview on the Open Dialog and allow the user to pick the proper character encoding. One could scan files looking for encoding text and if found use that as the default. &lt;br&gt;&lt;br&gt;Notepad is not just a wrapper for the Edit control. 
In addition to the file encoding problems, it also has reasonable printing and a text search capability which has some interesting International issues. It hosts just about every Common Dialog (Find, ReplaceText, Print, PageSetup, Open, Save). As far as I can figure, it is only missing ChooseColor. I know this because bugs in Common Dialogs are often reported to me first. &lt;br&gt;&lt;br&gt;An interesting blog entry would discuss the different End of Line sequence standards. Windows uses carriage-return/linefeed pairs as the legal EOL, while Un*x implementations tend to use newline. You can see this in Notepad when you load a file that just uses the newline as the EOL since the Edit Control uses the Windows standard and bare newline characters are not considered EOLs.</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#360861</link><pubDate>Wed, 26 Jan 2005 16:39:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:360861</guid><dc:creator>Michael Kaplan</dc:creator><description>Awesome info, Chris! And I agree that some of the entries you mention would make good future posts. &lt;br&gt;&lt;br&gt;Especially &lt;a title="IsTextUnicode" href="http://msdn.microsoft.com/library/en-us/intl/unicode_81np.asp" target="_blank"&gt;IsTextUnicode&lt;/a&gt; (which is kind of a pain for us since we do not own it but everyone assumes we do), and the EOL issue in *nix-created files (which gets some complaints but much less often than the BOM issue).&lt;br&gt;&lt;br&gt;Sorry if any of the backtalk about &amp;quot;just an edit control&amp;quot; caused grief -- it is a shorthand we sometimes use when talking about rendering issues to distinguish it from WordPad (which is &amp;quot;just&amp;quot; a wrapper around RichEdit, in the same sense). 
&lt;br&gt;&lt;br&gt;Obviously both are more than &amp;quot;just&amp;quot; a wrapper so I will choose those words with more care. As a team that finds most of its bug reported through applications that use us even if the bug is not ours, I can understand bugs being reported that actually are not ours....&lt;br&gt;&lt;br&gt;But thankis for the info, it makes the post itself markedly better to have the full story from one who knows rather than after-the-fact guesses and suppositions.&lt;br&gt;&lt;br&gt;I don't suppose you know who we should contact now if we wanted to push the BOM issue? (feel free to send it to me offline if you're prefer). :-)</description></item><item><title>Getting exactly ONE Unicode code point out of UTF-8</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#420709</link><pubDate>Sat, 21 May 2005 10:52:19 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:420709</guid><dc:creator>Sorting It All Out</dc:creator><description>Now this is a question that I would make into an interview question, if only there were some way to do...</description></item><item><title>unicodeFFFE... is Microsoft off its rocker?</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#463555</link><pubDate>Sun, 11 Sep 2005 10:23:49 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:463555</guid><dc:creator>Sorting It All Out</dc:creator><description>This is an issue that has been around for a long time.
&lt;br&gt;Back in February (geez, I really have been blogging...</description></item><item><title>Working hard to detect code pages</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#463573</link><pubDate>Sun, 11 Sep 2005 11:19:09 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:463573</guid><dc:creator>Anonymous</dc:creator><description>Yesterday, Buck Hodges was talking about how TFS Version Control determines a file's encoding: ...</description></item><item><title>Return of the Mark</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#527386</link><pubDate>Wed, 08 Feb 2006 11:05:56 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:527386</guid><dc:creator>Sorting It All Out</dc:creator><description>Well, the RIGHT-TO-LEFT MARK (and its cousin the LEFT-TO-RIGHT MARK), that is!&lt;br&gt;(apologies to those of...</description></item><item><title>7-bit UTF-8?</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#608510</link><pubDate>Sat, 27 May 2006 04:02:04 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:608510</guid><dc:creator>Sorting It All Out</dc:creator><description>A while back, regular reader 'Maurits' noted in the Suggestion Box: &lt;br&gt;&lt;br&gt;Just submitted my first PSS support...</description></item><item><title>Writing a Unicode file via perl ...</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#620988</link><pubDate>Wed, 07 Jun 2006 20:54:38 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:620988</guid><dc:creator>183</dc:creator><description>This is a small piece of code and a lengthy description of how it works, for creating / writing a Unicode file via a perl script.</description></item><item><title>You can just byte me</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#890106</link><pubDate>Sat, 28 Oct 2006 23:35:58 
GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:890106</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;Evan asked in one of the many programming aliases: Hi: Anyone knows why there are 3 extra characters added&lt;/p&gt;
</description></item><item><title>No byte order marks when using encodings in StreamWriters?</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#1108173</link><pubDate>Mon, 20 Nov 2006 13:41:26 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1108173</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;Sometimes the convenient shortcuts blind us to the functionality we actually need, forcing us to work&lt;/p&gt;
</description></item><item><title>Not a kernel guy » Забавное чтиво про Unicode BOM.</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#1136100</link><pubDate>Fri, 24 Nov 2006 08:00:42 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1136100</guid><dc:creator>Not a kernel guy » Забавное чтиво про Unicode BOM.</dc:creator><description>&lt;P&gt;PingBack from &lt;A href="http://blog.not-a-kernel-guy.com/2006/11/23/104" target=_new rel=nofollow&gt;http://blog.not-a-kernel-guy.com/2006/11/23/104&lt;/A&gt;&lt;/P&gt;</description></item><item><title>RichTextBox breaking ranks?</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#1460191</link><pubDate>Sat, 13 Jan 2007 11:06:28 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1460191</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;The other day someone noticed something about the WinForms RichTextBox . What he noticed was that the&lt;/p&gt;
</description></item><item><title>El Módem  &amp;raquo; Blog Archive   &amp;raquo; Referencia a Bush en Windows - el motivo</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#1712329</link><pubDate>Mon, 19 Feb 2007 09:22:11 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1712329</guid><dc:creator>El Módem  » Blog Archive   » Referencia a Bush en Windows - el motivo</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://www.elmodem.com/archivo/2007/02/19/referencia-a-bush-en-windows-el-motivo/"&gt;http://www.elmodem.com/archivo/2007/02/19/referencia-a-bush-en-windows-el-motivo/&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>If a bunch of specific Unicode characters can no longer live in the same apartment together, can they really claim that they needed their space?</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#2692394</link><pubDate>Thu, 17 May 2007 12:46:02 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2692394</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;So dipaksmistry asks (in an off-topic manner in response to this post): Hi, I have written a small programme.&lt;/p&gt;
</description></item><item><title>BOM BOM BOM « 就是愛程式</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#4994357</link><pubDate>Wed, 19 Sep 2007 11:21:57 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4994357</guid><dc:creator>BOM BOM BOM « 就是愛程式</dc:creator><description>&lt;P&gt;PingBack from &lt;A href="http://atedev.wordpress.com/2007/09/19/bom-bom-bom/" target=_new rel=nofollow&gt;http://atedev.wordpress.com/2007/09/19/bom-bom-bom/&lt;/A&gt;&lt;/P&gt;</description></item><item><title>Byte Order Mark - BOM</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#6934225</link><pubDate>Tue, 01 Jan 2008 13:14:01 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6934225</guid><dc:creator>?????????????? - ?????????? ?????????????? » Byte Order Mark - BOM</dc:creator><description>&lt;P&gt;PingBack from &lt;A href="http://www.almashroo.com/articles/byte-order-mark-bom/" target=_new rel=nofollow&gt;http://www.almashroo.com/articles/byte-order-mark-bom/&lt;/A&gt;&lt;/P&gt;</description></item><item><title>Why are UTF-8 encoded Unix shell scripts *ever* written or edited in Notepad?</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#8152233</link><pubDate>Tue, 11 Mar 2008 16:23:28 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8152233</guid><dc:creator>Sorting it all Out</dc:creator><description>&lt;p&gt;Everybody hates Microsoft. Well, not everybody. But hating Microsoft seems awfully popular.... It seems&lt;/p&gt;
</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#9471291</link><pubDate>Thu, 12 Mar 2009 06:34:50 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9471291</guid><dc:creator>Yuhong Bao</dc:creator><description>&lt;p&gt;&amp;quot;NT 3.1 shipped with an ASCII only Notepad. In the fall of 1993, several applications were converted to use Unicode. At this time, Notepad started using the BOM. I can't tell you how this decision was made; my memory isn't that good. We also converted other applications like Cardfile and Paintbrush. These first shipped on NT 3.5.&amp;quot;&lt;/p&gt;
&lt;p&gt;More on this at:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://msdn.microsoft.com/en-us/library/cc194798.aspx"&gt;http://msdn.microsoft.com/en-us/library/cc194798.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;BTW, look at how hard it was to actually insert a Unicode-only character back then. For comparison, today, I use an English Windows PC with Chinese IMEs installed and I use Notepad to create Chinese text files, and the code page of English Windows is 437, which means that I have to save my Chinese text files as Unicode.&lt;/p&gt;</description></item><item><title>re: Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#9472580</link><pubDate>Fri, 13 Mar 2009 05:53:34 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9472580</guid><dc:creator>Yuhong Bao</dc:creator><description>&lt;p&gt;&amp;quot;NT 3.1 shipped with an ASCII only Notepad.&amp;quot;&lt;/p&gt;
&lt;p&gt;Do you mean ANSI-only?&lt;/p&gt;</description></item><item><title>When keeping things on a level Plane[ 1] doesn't work anymore</title><link>http://blogs.msdn.com/michkap/archive/2005/01/20/357028.aspx#9716289</link><pubDate>Tue, 09 Jun 2009 18:42:37 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9716289</guid><dc:creator>Sorting it all Out</dc:creator><description>&lt;p&gt;It has been over three years (in Every character has a story #4: U+feff (alternate title: UTF-8 is the&lt;/p&gt;
</description></item></channel></rss>