<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Harder intermediate forms of characters</title><link>http://blogs.msdn.com/b/michkap/archive/2006/05/14/597198.aspx</link><description>In the post Getting intermediate forms, I gave an example of three character sequences that look the same and that are canonically equivalent according to Unicode: 
 
 ễ U+1ec5 LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE 
 ễ U+0065 U+0302 U+0303</description><dc:language>en-US</dc:language><generator>Telligent Evolution Platform Developer Build (Build: 5.6.50428.7875)</generator><item><title>The whole truth about MB_PRECOMPOSED and MB_COMPOSITE</title><link>http://blogs.msdn.com/b/michkap/archive/2006/05/14/597198.aspx#9644437</link><pubDate>Wed, 27 May 2009 17:02:24 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9644437</guid><dc:creator>Sorting it all Out</dc:creator><description>&lt;p&gt;As a by the way, this blog does NOT represent anything beyond my own personal thoughts. You could even&lt;/p&gt;
</description></item><item><title>Frost's The Form Not Taken</title><link>http://blogs.msdn.com/b/michkap/archive/2006/05/14/597198.aspx#9220171</link><pubDate>Mon, 15 Dec 2008 14:31:45 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9220171</guid><dc:creator>Sorting it all Out</dc:creator><description>&lt;p&gt;Over in the Suggestion Box, Aaron asked: Hi again - question about one of your favorite codepages - 1258&lt;/p&gt;
</description></item><item><title>re: Harder intermediate forms of characters</title><link>http://blogs.msdn.com/b/michkap/archive/2006/05/14/597198.aspx#598578</link><pubDate>Tue, 16 May 2006 06:53:42 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:598578</guid><dc:creator>Dean Harding</dc:creator><description>Igor: That's probably the font more than anything. I'd say that for #1 and #3, Uniscribe is finding a precomposed glyph 'a' with ogonek in the font, then &amp;quot;manually&amp;quot; putting the acute in place, while for #2 and #4 it's doing it the other way around.&lt;br&gt;&lt;br&gt;And my guess is that the ogonek on the precomposed glyph looks slightly different from the non-precomposed ogonek.&lt;br&gt;&lt;br&gt;But that's just a guess...</description></item><item><title>re: Harder intermediate forms of characters</title><link>http://blogs.msdn.com/b/michkap/archive/2006/05/14/597198.aspx#598179</link><pubDate>Mon, 15 May 2006 21:12:42 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:598179</guid><dc:creator>Igor Tandetnik</dc:creator><description>Re: modern fonts ... &amp;nbsp;will treat all four strings as ... appearing ... equal&lt;br&gt;&lt;br&gt;Actually, reading your post with IE6 on WinXP SP2, #1 and #3 look a little bit different from #2 and #4. In the former, the ogonek is correctly attached to the letter 'a'. 
In the latter, the ogonek is shifted one pixel to the right.</description></item><item><title>re: Harder intermediate forms of characters</title><link>http://blogs.msdn.com/b/michkap/archive/2006/05/14/597198.aspx#598143</link><pubDate>Mon, 15 May 2006 20:16:31 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:598143</guid><dc:creator>Dan Manchester</dc:creator><description>Michael,&lt;br&gt;&lt;br&gt;Thanks for the great blog. Your mention of intermediate forms reminded me of something I saw recently when working with Vietnamese text on Windows.&lt;br&gt;&lt;br&gt;By way of background, I sometimes need to encode text via a legacy codepage. Word 2003's ability to do the needed conversion generally works out very well for me.&lt;br&gt;&lt;br&gt;However, on the occasion in question, Word was unwilling to encode many of the accented characters--for example, an &amp;quot;e&amp;quot; with a circumflex and an acute accent--found in my Vietnamese text. I figured that these characters had to somehow be supported by codepage #1258, so I investigated further.&lt;br&gt;&lt;br&gt;It turned out that the characters that Word wouldn't encode were generally precomposed characters. However, after I manually decomposed them--for the aforementioned example, I swapped in an &amp;quot;e&amp;quot; with a circumflex and added a combining acute accent--Word produced a usable encoded version.&lt;br&gt;&lt;br&gt;It seems like Word could do this decomposition itself without too much trouble. Is that a feature that has simply never been added? Or are there complexities here that I'm not considering?</description></item></channel></rss>