<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx</link><description>(apologies to George Orwell, of course!) Val asks: Michael, I've been reading your "Striping Diacritics" post, and it's been a great help. I've also been comparing it with another version I've seen. This other version is similar to yours, except that</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#2635201</link><pubDate>Tue, 15 May 2007 01:50:25 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2635201</guid><dc:creator>Dean Harding</dc:creator><description>&lt;p&gt;The ultimate solution would be to build up your own table of &amp;quot;non-non-spacing marks.&amp;quot; Bonus points if you make it locale-specific (so &amp;#229; would be in the English one but not the Swedish). Of course, this may be a case of overengineering ;)&lt;/p&gt;
</description></item><item><title>re: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#3156826</link><pubDate>Fri, 08 Jun 2007 10:21:49 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3156826</guid><dc:creator>Peter Karlsson</dc:creator><description>&lt;p&gt;Of course, in the Swedish case, a “proper” RemoveDiacritics(&amp;quot;w&amp;#252;&amp;#233;&amp;#229;&amp;#228;&amp;#246;&amp;#230;&amp;#248;&amp;#224;&amp;quot;) should produce “vy&amp;#233;&amp;#229;&amp;#228;&amp;#246;&amp;#228;&amp;#246;a”. But I guess implementing that could be a bit more difficult…&lt;/p&gt;</description></item><item><title>año del ano, a.k.a. This sentence has several non-skarklish Spanish flutzpahs....</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#3162504</link><pubDate>Fri, 08 Jun 2007 16:42:44 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3162504</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;The primary title of this post is meant to be a warning. One that you probably won't get unless you know&lt;/p&gt;
</description></item><item><title>re: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#3163717</link><pubDate>Fri, 08 Jun 2007 17:37:19 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3163717</guid><dc:creator>Michael S. Kaplan</dc:creator><description>&lt;p&gt;Hey Peter -- very good point. I'll have to talk about this soon....&lt;/p&gt;
</description></item><item><title>Normalize Wide Shut</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#4427194</link><pubDate>Fri, 17 Aug 2007 10:46:38 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4427194</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;(Apologies to Stanley Kubrick, of course!) It was almost the very first blog post I ever wrote, back&lt;/p&gt;</description></item><item><title>I am not a nudist, but I do support stripping when it is appropriate, part 1</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#4736732</link><pubDate>Tue, 04 Sep 2007 10:31:06 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4736732</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;The title is correct: I am not a nudist. I think people who are nudists are just fine and I enjoyed the&lt;/p&gt;
</description></item><item><title>re: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#8519155</link><pubDate>Mon, 19 May 2008 18:34:59 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8519155</guid><dc:creator>Steven Sudit</dc:creator><description>&lt;P&gt;For another approach to normalization, take a look at: &lt;A href="http://www.codeproject.com/KB/cs/UnicodeNormalization.aspx" target=_new rel=nofollow&gt;http://www.codeproject.com/KB/cs/UnicodeNormalization.aspx&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I'm not sure if it's faster or slower than the method mentioned in this blog, but it is smart enough to remove diacritics only when they're attached to Latin characters, which leaves such things as Han radicals alone.&lt;/P&gt;</description></item><item><title>re: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#9021275</link><pubDate>Wed, 29 Oct 2008 03:38:27 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9021275</guid><dc:creator>Jan Kučera</dc:creator><description>&lt;p&gt;Any way for the .NET Compact Framework people out there? (as the normalization does not seem to be available for them...)&lt;/p&gt;</description></item><item><title>Wonder why something is not in the Compact Framework? The answer is in the question!</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#9021746</link><pubDate>Wed, 29 Oct 2008 10:04:29 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9021746</guid><dc:creator>Sorting it all Out</dc:creator><description>&lt;p&gt;Regular reader Jan Kučera, in response to Stripping is an interesting job (aka On the meaning of meaningless,&lt;/p&gt;
</description></item><item><title>re: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#9026286</link><pubDate>Fri, 31 Oct 2008 12:38:05 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9026286</guid><dc:creator>mkukik</dc:creator><description>&lt;p&gt;In case of stripping diacritics from Latin characters:&lt;/p&gt;
&lt;p&gt;Combining the above method with a conversion from Unicode to a Western code page strips even more diacritics, such as the Polish Łł.&lt;/p&gt;
&lt;p&gt;However, it is still not 100%.&lt;/p&gt;
&lt;p&gt;string unicodeStringOrig = &amp;quot;SE:&amp;#197;&amp;#229;&amp;#196;&amp;#228;&amp;#214;&amp;#246;; PL:ĄąĆćĘęŁłŃń&amp;#211;&amp;#243;ŚśŹźŻż; SK:ľščťž&amp;#253;&amp;#225;&amp;#237;&amp;#233;&amp;#250;&amp;#228;&amp;#244;ňďĽŠČŤŽ&amp;#221;&amp;#193;&amp;#205;&amp;#201;&amp;#218;&amp;#196;&amp;#212;ŇĎ; HU:&amp;#235;ő&amp;#252;űŐ&amp;#220;Ű; ES:&amp;#209;&amp;#241;&amp;#191;; CA:&amp;#224;&amp;#232;&amp;#242;&amp;#231;&amp;#239;&amp;quot;;&lt;/p&gt;
&lt;p&gt;string unicodeString = RemoveDiacritics(unicodeStringOrig);&lt;/p&gt;
&lt;p&gt;Encoding nonunicode = Encoding.GetEncoding(850);&lt;/p&gt;
&lt;p&gt;Encoding unicode = Encoding.Unicode;&lt;/p&gt;
&lt;p&gt;byte[] unicodeBytes = unicode.GetBytes(unicodeString);&lt;/p&gt;
&lt;p&gt;byte[] nonunicodeBytes = Encoding.Convert(unicode, nonunicode, unicodeBytes);&lt;/p&gt;
&lt;p&gt;char[] nonunicodeChars = new char[nonunicode.GetCharCount(nonunicodeBytes, 0, nonunicodeBytes.Length)];&lt;/p&gt;
&lt;p&gt;nonunicode.GetChars(nonunicodeBytes, 0, nonunicodeBytes.Length, nonunicodeChars, 0);&lt;/p&gt;
&lt;p&gt;string nonunicodeString = new string(nonunicodeChars);&lt;/p&gt;</description></item><item><title>re: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)</title><link>http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx#9938091</link><pubDate>Thu, 17 Dec 2009 09:54:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9938091</guid><dc:creator>Amino</dc:creator><description>&lt;p&gt;Is it possible to work around it, meaning adding diacritics?&lt;/p&gt;</description></item></channel></rss>