<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>I'm not a Klingon (&lt;span style="font-family:pIqaD,code2000"&gt; &lt;/span&gt;) : sorting</title><link>http://blogs.msdn.com/shawnste/archive/tags/sorting/default.aspx</link><description>Tags: sorting</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>What is Title Case?</title><link>http://blogs.msdn.com/shawnste/archive/2009/08/18/what-is-title-case.aspx</link><pubDate>Tue, 18 Aug 2009 20:30:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9874351</guid><dc:creator>shawnste</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/9874351.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=9874351</wfw:commentRss><description>&lt;P&gt;Disclaimer: I'm not an English teacher (that's my mom), so I'm sure my description of title casing in English probably has exceptions/variations.&lt;/P&gt;
&lt;P&gt;Title casing has an interesting history in computer programming.&amp;nbsp; Programmers like to use CamelCase to make variable names more readable, and, particularly amongst developers native to some languages, there's an idea that title casing is a useful operation, which is why it shows up in TextInfo.ToTitleCase() and, in Windows 7, LCMapString(LCMAP_TITLECASE).&amp;nbsp; Most title casing algorithms are linguistically bad, even in English.&amp;nbsp; For other languages it's worse.&lt;/P&gt;
&lt;P&gt;ToTitleCase() takes a very simple approach to title casing.&amp;nbsp; Maybe in the future it'll be smarter, but for now it just uppercases the first letter in a group of letters, and tries to pay attention to non-letters and word breaks.&amp;nbsp; It also tries to keep acronyms all upper-case.&lt;/P&gt;
&lt;P&gt;Even in English this is a simplistic approach.&amp;nbsp; The title of this post is "What is Title Case?"&amp;nbsp; The word "is" is supposed to be lower case, but ToTitleCase() would mess it up.&amp;nbsp; Additionally, unexpected word breaks or punctuation could trick the algorithm.&amp;nbsp; Even the acronym test isn't complete, since it just expects all-upper case&amp;nbsp;and sometimes acronyms keep the lower case of the full title.&amp;nbsp; Also it messes up names like DiSilva or McConnell.&amp;nbsp; Contractions can also be messed up.&lt;/P&gt;
&lt;P&gt;Outside of English, ToTitleCase() rapidly gets silly.&amp;nbsp; In English we&amp;nbsp;capitalize everything except articles, short prepositions and some other short words.&amp;nbsp;&amp;nbsp;In German a title is cased just like a normal sentence, with only nouns getting capitalized, so the slightly over-eager English capitalization behavior becomes very over-eager.&amp;nbsp; Other languages can also have letters before the main word, e.g. l'État, so the ToTitleCase rules can mess&amp;nbsp;up those words as well.&lt;/P&gt;
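&lt;P&gt;As a rough illustration of these English cases, here is a minimal sketch that calls TextInfo.ToTitleCase() on a few inputs.&amp;nbsp; It assumes the en-US culture and C# 9 top-level statements, and the sample strings are made up for illustration.&lt;/P&gt;
&lt;CODE&gt;using System;&lt;BR&gt;
using System.Globalization;&lt;BR&gt;
&lt;BR&gt;
// Assumption: en-US culture, ignoring any user overrides.&lt;BR&gt;
TextInfo ti = new CultureInfo("en-US", false).TextInfo;&lt;BR&gt;
&lt;BR&gt;
// Every word gets an initial capital, including "is".&lt;BR&gt;
Console.WriteLine(ti.ToTitleCase("what is title case?")); // What Is Title Case?&lt;BR&gt;
// The inner capital of a name is not restored.&lt;BR&gt;
Console.WriteLine(ti.ToTitleCase("mcconnell")); // Mcconnell&lt;BR&gt;
// An all-upper-case word is treated as an acronym and left alone.&lt;BR&gt;
Console.WriteLine(ti.ToTitleCase("NASA report")); // NASA Report&lt;/CODE&gt;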
&lt;P&gt;And then there're scripts/languages that don't even have an upper/lower case distinction, so&amp;nbsp;ToTitleCase gets pointless.&lt;/P&gt;
&lt;P&gt;Anyway, use care when using ToTitleCase().&amp;nbsp; It might work&amp;nbsp;in some cases, but don't expect it to be linguistically correct, particularly&amp;nbsp;across the globe and in non-English cases.&amp;nbsp; Also maybe we'll get smarter and figure out a more correct way to do it in the future.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;-Shawn&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9874351" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/System.Text/default.aspx">System.Text</category><category domain="http://blogs.msdn.com/shawnste/archive/tags/Custom+Cultures+_2F00_+Locales+_2F00_+CultureInfo/default.aspx">Custom Cultures / Locales / CultureInfo</category><category domain="http://blogs.msdn.com/shawnste/archive/tags/sorting/default.aspx">sorting</category></item><item><title>How come Substring(0, xxx) matches something, but StartsWith returns false?</title><link>http://blogs.msdn.com/shawnste/archive/2008/09/23/how-come-substring-0-xxx-matches-something-but-startswith-returns-false.aspx</link><pubDate>Tue, 23 Sep 2008 21:06:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8962659</guid><dc:creator>shawnste</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/shawnste/comments/8962659.aspx</comments><wfw:commentRss>http://blogs.msdn.com/shawnste/commentrss.aspx?PostID=8962659</wfw:commentRss><description>&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;I was asked how a string can match a substring of another string, yet StartsWith can return false?&amp;nbsp; For example:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;&lt;CODE&gt;string str = "Mu\x0308nchen";&lt;BR&gt;string find = "Mu";&lt;BR&gt;Console.WriteLine("Substring: " + (str.Substring(0,2) == find));&lt;BR&gt;Console.WriteLine("StartsWith:" + str.StartsWith(find));&lt;BR&gt;Console.WriteLine("IndexOf:&amp;nbsp;&amp;nbsp; "&amp;nbsp;+ str.IndexOf(find));&lt;/CODE&gt; 
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;returns this:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;&lt;CODE&gt;Substring: True&lt;BR&gt;StartsWith:False&lt;BR&gt;IndexOf:&amp;nbsp;&amp;nbsp; -1&lt;/CODE&gt; 
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;So if you test the first 2 characters against the search string, you'll see that they match, yet StartsWith() returns false, and IndexOf can't find it.&amp;nbsp; This is because the 0308 diacritic is considered part of the&amp;nbsp;u that it is modifying, so a search for a plain "u" won't match it.&amp;nbsp; In many languages diacritics like this are really different letters.&amp;nbsp; Since you don't expect a == z, then you wouldn't expect u == ü.&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
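&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;For contrast, an ordinal comparison looks only at the UTF-16 code units and ignores linguistic weights, so it agrees with the Substring test.&amp;nbsp; This is just a sketch of the difference, assuming C# 9 top-level statements; ordinal matching is often the wrong choice for user-facing text.&lt;/SPAN&gt;&lt;/P&gt;
&lt;CODE&gt;using System;&lt;BR&gt;
&lt;BR&gt;
string str = "Mu\u0308nchen";&lt;BR&gt;
string find = "Mu";&lt;BR&gt;
&lt;BR&gt;
// Ordinal: compares code units only, so the combining U+0308 is just the next char.&lt;BR&gt;
Console.WriteLine(str.StartsWith(find, StringComparison.Ordinal)); // True&lt;BR&gt;
Console.WriteLine(str.IndexOf(find, StringComparison.Ordinal)); // 0&lt;/CODE&gt;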
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;Doing the substring effectively "breaks" the character, changing its meaning.&amp;nbsp; Substring can even create illegal Unicode if it chops off part of a surrogate pair (e.g. U+D800, U+DC00).&lt;/SPAN&gt;&lt;/P&gt;
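&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;One way to avoid breaking a character is to cut on text-element boundaries instead of char indexes.&amp;nbsp; The sketch below uses StringInfo for that and also shows what a naive Substring leaves behind when it splits a surrogate pair.&amp;nbsp; It assumes C# 9 top-level statements, and the variable names are only illustrative.&lt;/SPAN&gt;&lt;/P&gt;
&lt;CODE&gt;using System;&lt;BR&gt;
using System.Globalization;&lt;BR&gt;
&lt;BR&gt;
string str = "Mu\u0308nchen";&lt;BR&gt;
&lt;BR&gt;
// Take two text elements: "M" and "u" plus its combining diaeresis.&lt;BR&gt;
string firstTwo = new StringInfo(str).SubstringByTextElements(0, 2);&lt;BR&gt;
Console.WriteLine(firstTwo.Length); // 3 (M, u, U+0308)&lt;BR&gt;
&lt;BR&gt;
// A naive Substring cuts the surrogate pair and leaves a lone high surrogate.&lt;BR&gt;
string pair = "\uD800\uDC00";&lt;BR&gt;
string broken = pair.Substring(0, 1);&lt;BR&gt;
Console.WriteLine(char.IsHighSurrogate(broken[0])); // True&lt;/CODE&gt;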
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;A similar oddity would be characters with no weight like U+FFFD.&amp;nbsp; So if I have str = "A\xFFFD\xFFFD\xFFFD", then str.Substring(0,1), str.Substring(0,2), str.Substring(0,3) and str.Substring(0,4) all compare as equal to "A" in a linguistic (culture-sensitive) comparison.&amp;nbsp; And in this case str.StartsWith("A") would be true.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;Another perhaps unexpected behavior would be unweighted characters (or characters ignored by a flag) at the beginning of the string.&amp;nbsp; So if str="\xFFFD" + "A", then str.IndexOf("A") can return 1, yet str.StartsWith("A") will return true (even though IndexOf didn't return 0).&lt;/SPAN&gt;&lt;/P&gt;
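&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;To try the unweighted-character cases yourself, something like the following sketch could be used.&amp;nbsp; It assumes the invariant culture and C# 9 top-level statements.&amp;nbsp; Whether U+FFFD is actually treated as unweighted depends on the collation data your runtime uses, so no expected output is shown.&lt;/SPAN&gt;&lt;/P&gt;
&lt;CODE&gt;using System;&lt;BR&gt;
using System.Globalization;&lt;BR&gt;
&lt;BR&gt;
// Assumption: invariant culture; results depend on the runtime's collation data.&lt;BR&gt;
CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;&lt;BR&gt;
&lt;BR&gt;
// Linguistic comparison: 0 means the two strings are considered equal.&lt;BR&gt;
Console.WriteLine(ci.Compare("A\uFFFD\uFFFD\uFFFD", "A"));&lt;BR&gt;
&lt;BR&gt;
// Unweighted character at the start of the string.&lt;BR&gt;
string str = "\uFFFD" + "A";&lt;BR&gt;
Console.WriteLine(ci.IndexOf(str, "A"));&lt;BR&gt;
// IsPrefix is the CompareInfo counterpart of a culture-sensitive StartsWith.&lt;BR&gt;
Console.WriteLine(ci.IsPrefix(str, "A"));&lt;/CODE&gt;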
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;SPAN style="COLOR: #1f497d"&gt;Similar behaviors can be seen with LastIndexOf() and EndsWith(), and with the native Vista API FindNlsString and its variations.&amp;nbsp; In addition, with the FindNlsString() API the found substrings may be unexpected.&lt;/SPAN&gt;&lt;/P&gt;
&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8962659" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/shawnste/archive/tags/sorting/default.aspx">sorting</category></item></channel></rss>