Jonathan Payne asked if I had an international thought about the palindrome pseudo interview question at this site:
http://channel9.msdn.com/ShowPost.aspx?PostID=19171
I did. :-)
Using the new StringInfo stuff in Whidbey Beta 2:
bool IsPalindrome(string st) {
    StringInfo si = new StringInfo(st);
    int count = si.LengthInTextElements;

    if (count == 0) return false;

    for (int i = 0; i < (count / 2); i++) {
        string st1 = si.SubstringByTextElements(i, 1);
        string st2 = si.SubstringByTextElements(count - i - 1, 1);

        if (CultureInfo.CurrentCulture.CompareInfo.Compare(st1, st2) != 0) {
            return(false);
        }
    }

    return (true);
}
Quickest way to handle all those cool issues like cultural sensitivity and combining characters and supplementary characters and such!
(Apologies to those who are offended by the South Park movie scene that inspired the title of this post!)
About a month ago, Daniel J. Smith asked me something that prompted me to say "Dere are qvestions? In zat case..."
Then last week, Martin Müller asked in the microsoft.public.dotnet.internationalization:
Recently I've stumbled across the fact that the CompareInfo for my default culture de-DE as well as for InvariantCulture considers "ss" and german "ß" (szlig) equivalent, which is not correct!
For example, calling "lassen".IndexOf("ß") yields 2 instead of 0.
CultureInfo.InvariantCulture.CompareInfo.Compare("lassen", "laßen") returns 0, which is wrong, too.
Using CompareInfo.IndexOf() without special CompareOptions gives the same incorrect results. When I use CompareOptions.Ordinal, however, IndexOf correctly returns -1 and Compare returns inequality. But CompareOptions.Ordinal cannot be combined with any other flag, so a case insensitive comparison isn't possible this way.
This bug occurs with IndexOf and Compare of both String and CompareInfo.
Any comment on this or info when this will be fixed?
Well, I have a comment, but things are working as designed so nothing is going to be "fixed". I will explain....
In the German language, the Sharp S ("ß" or U+00df) is a lowercase letter, and it capitalizes to the letters "SS". Now Microsoft's casing tables only support simple Unicode casing, which does not include any rules that would change the size of the string such as this one. So doing a "ß".ToUpper() call will not return "SS".
(For more info on those casing rules, see SpecialCasing.txt and CaseFolding.txt in the Unicode Character Database.)
But in any case, collation can be a bit more flexible. Since the Sharp S is very much a German letter and not one widely used outside of German, its rules are included in the default table used by all locales (German is kept in the default table, and its rules apply to every locale that does not conflict with them).
But obviously on most locales, "ss" is what uppercases to "SS". Even on German, "ss" would uppercase to "SS".
So it is only logical to assume in such a case that if
"ss".ToUpper() == "ß".ToUpper() == "SS"
then
"ss" ≅ "ß"
at least for the technical purpose of facilitating the ability to treat these other cases properly. This is why on almost all locales (including the invariant locale), "ß" looks so much like "ss".
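The uppercase argument above can be tried out in a few lines. This is only a sketch in Python rather than .NET: Python's str.upper() applies the full Unicode case mappings, so unlike the simple casing tables described above it does turn "ß" into "SS". The helper names same_uppercase and ordinal_equal are made up for illustration.

```python
SHARP_S = "\u00df"  # LATIN SMALL LETTER SHARP S


def same_uppercase(a: str, b: str) -> bool:
    """True when both strings share the same full-Unicode uppercase form."""
    return a.upper() == b.upper()


def ordinal_equal(a: str, b: str) -> bool:
    """Plain code point comparison, the analogue of an ordinal compare."""
    return a == b


# Full case mapping changes the length of the string: one letter becomes two.
print(SHARP_S.upper(), len(SHARP_S), len(SHARP_S.upper()))

# "ss" and the sharp s meet once both are uppercased ...
print(same_uppercase("ss", SHARP_S))

# ... while an ordinal comparison keeps "lassen" and "laßen" apart.
print(ordinal_equal("lassen", "la" + SHARP_S + "en"))
```

This says nothing about how the collation tables are built; it only shows the "same uppercase form" relationship that the reasoning leans on.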
This post is brought to you by "ß" (U+00df, a.k.a. LATIN SMALL LETTER SHARP S)
And really, who else would it be? :-)
Like I mentioned yesterday, I have talked a bunch of times about the way that different forms of strings that are canonically equivalent according to Unicode and which actually look identical visually exist in the world.
Yesterday, I mentioned it while I was talking about a few of the gotchas of WideCharToMultiByte. Today I thought I would talk about the other direction, the MultiByteToWideChar API.
First of all, almost all code pages are in Normalization Form C (a.k.a. precomposed) at all times (I will talk about the exceptions in a second). Of course Unicode (by which I mean UTF-16 Little Endian, which Microsoft always calls Unicode) can be either Form C (a.k.a. precomposed) or Form D (a.k.a. composite).
If you would like to choose, then you get that option; you can pass either the MB_PRECOMPOSED or the MB_COMPOSITE flag. For the sake of having data that is consistent with the rest of the platform, I would recommend the MB_PRECOMPOSED flag, but either one is legal (just not both).
There is also an MB_USEGLYPHCHARS flag. Now I already beat that particular horse to death when I answered the question what the &%#$ does MB_USEGLYPHCHARS do? So if you want to know more you can look there. You probably do not, at least I hope you do not....
Finally, there is the MB_ERR_INVALID_CHARS flag. The documentation says it all on this flag:
If the function encounters an invalid input character, it fails and GetLastError returns ERROR_NO_UNICODE_TRANSLATION.
Now after the MultiByteToWideChar topic covers these four flags, it gets confusing. It says:
For the code pages in the following table, dwFlags must be zero, otherwise the function fails with ERROR_INVALID_FLAGS.
50220
50221
50222
50225
50227
50229
52936
54936
57002 through 57011
65000 (UTF7)
65001 (UTF8)
42 (Symbol)
Windows XP and later: MB_ERR_INVALID_CHARS is the only dwFlags value supported by Code page 65001 (UTF-8).
Call me crazy, but there probably was not a need to have the sentence before the table and the table conflict with the sentence after the table. It is kind of understandable, but as topics go it has the flavor of a WTF sentence, if you ask me!
It does end on a better note by defining what an invalid character is:
The function fails if MB_ERR_INVALID_CHARS is set and encounters an invalid character in the source string. An invalid character is either, a) a character that is not the default character in the source string but translates to the default character when MB_ERR_INVALID_CHARS is not set, or b) for DBCS strings, a character which has a lead byte but no valid trailing byte. When an invalid character is found, and MB_ERR_INVALID_CHARS is set, the function returns 0 and sets GetLastError with the error ERROR_NO_UNICODE_TRANSLATION.
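Case (b) in that definition, a lead byte with no valid trailing byte, is easy to see in miniature. The sketch below is an analogy in Python, not a call to MultiByteToWideChar: errors="strict" plays the role of MB_ERR_INVALID_CHARS, errors="replace" the role of leaving the flag off, and the helper names are made up for illustration.

```python
from typing import Optional


def decode_strict(raw: bytes, codec: str) -> Optional[str]:
    """Decode, or return None when the input has an invalid character.

    Roughly analogous to passing MB_ERR_INVALID_CHARS and getting 0 back.
    """
    try:
        return raw.decode(codec, errors="strict")
    except UnicodeDecodeError:
        return None


def decode_lenient(raw: bytes, codec: str) -> str:
    """Decode, substituting U+FFFD for anything that cannot be converted."""
    return raw.decode(codec, errors="replace")


# 0x81 is a DBCS lead byte in code page 932; here it has no trailing byte.
truncated = b"abc\x81"
# 0x82 0xA0 is a complete lead/trail pair (HIRAGANA LETTER A).
complete = b"\x82\xa0"

print(decode_strict(complete, "cp932"))
print(decode_strict(truncated, "cp932"))
print(decode_lenient(truncated, "cp932"))
```

The strict path gives the caller a clear failure to act on, while the lenient path quietly produces a replacement character that is easy to miss later.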
Oh, and before that it talks about some security considerations (more on these another day).
I am forgetting something now. What was it?
Oh yeah, I was going to talk about the code pages that are not Normalization Form C.
Obviously there is UTF-7 (65000), UTF-8 (65001), and GB-18030 (54936). Since each of these code pages covers the entire Unicode repertoire, each can have characters in Unicode normalization Form C, Form D, or any combination thereof. Some of the other code pages in the table above also fall into this category, but in the case of these three and all the rest, the MB_PRECOMPOSED and MB_COMPOSITE flags are both at best ignored and at worst will cause an ERROR_INVALID_FLAGS to be returned. So you will want to not pass either flag with any of them.
But there is one code page that can have data in either composite or precomposed form -- it is the Vietnamese ACP, code page 1258. It has all of the following entries:
CC = U+0300 : COMBINING GRAVE ACCENT
D2 = U+0309 : COMBINING HOOK ABOVE
DE = U+0303 : COMBINING TILDE
EC = U+0301 : COMBINING ACUTE ACCENT
F2 = U+0323 : COMBINING DOT BELOW
The reason for doing this is that there was really not enough room in the code page, otherwise. Unfortunately, there are also some precomposed characters with these accents:
C0 = U+00C0 : LATIN CAPITAL LETTER A WITH GRAVE
C1 = U+00C1 : LATIN CAPITAL LETTER A WITH ACUTE
C8 = U+00C8 : LATIN CAPITAL LETTER E WITH GRAVE
C9 = U+00C9 : LATIN CAPITAL LETTER E WITH ACUTE
CD = U+00CD : LATIN CAPITAL LETTER I WITH ACUTE
D1 = U+00D1 : LATIN CAPITAL LETTER N WITH TILDE
D3 = U+00D3 : LATIN CAPITAL LETTER O WITH ACUTE
D9 = U+00D9 : LATIN CAPITAL LETTER U WITH GRAVE
DA = U+00DA : LATIN CAPITAL LETTER U WITH ACUTE
E0 = U+00E0 : LATIN SMALL LETTER A WITH GRAVE
E1 = U+00E1 : LATIN SMALL LETTER A WITH ACUTE
E8 = U+00E8 : LATIN SMALL LETTER E WITH GRAVE
E9 = U+00E9 : LATIN SMALL LETTER E WITH ACUTE
ED = U+00ED : LATIN SMALL LETTER I WITH ACUTE
F1 = U+00F1 : LATIN SMALL LETTER N WITH TILDE
F3 = U+00F3 : LATIN SMALL LETTER O WITH ACUTE
F9 = U+00F9 : LATIN SMALL LETTER U WITH GRAVE
FA = U+00FA : LATIN SMALL LETTER U WITH ACUTE
So it looks like maybe you could have mixed "Form C" and "Form D" code page 1258 text, doesn't it?
Unfortunately, it's not that perfect. There are two error patterns among the results below:
0xc0 with MultiByteToWideChar/MB_PRECOMPOSED --> U+00c0
0xc0 with MultiByteToWideChar/MB_COMPOSITE --> U+0041 U+0300
0x41 0xcc with MultiByteToWideChar/MB_PRECOMPOSED --> U+0041 U+0300
0x41 0xcc with MultiByteToWideChar/MB_COMPOSITE --> U+0041 U+0300
and going the other way:
U+00c0 with WideCharToMultiByte/WC_COMPOSITECHECK --> 0xc0
U+00c0 with WideCharToMultiByte --> 0x41 0xcc
U+0041 U+0300 with WideCharToMultiByte/WC_COMPOSITECHECK --> 0xc0
U+0041 U+0300 with WideCharToMultiByte --> 0xc0
The pattern is clear, right? MultiByteToWideChar is not quite smart enough to precompose in Unicode what is composite in cp1258, and WideCharToMultiByte is not quite smart enough to keep composite what is composite in Unicode.
Ah well, nothing is perfect -- the Vietnamese code page is missing some characters used in Vietnamese, anyway.
But the real reason for these combining characters is to handle the many letters used in Vietnamese that have double diacritics on them -- the cases of dual representations are somewhat accidental, all things considered, in the face of the need to support letters like "ẳằẵắặầẩẫấậ" and so forth....
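The dual representation in code page 1258 is easy to poke at outside of Win32. The sketch below uses Python's own cp1258 codec, which is a plain one-to-one table lookup and therefore does not reproduce the MB_PRECOMPOSED / WC_COMPOSITECHECK behavior listed above. It only shows that both spellings exist, that Unicode normalization connects them, and why the combining marks are needed for letters with two diacritics. The helper name code_points is made up for illustration.

```python
import unicodedata


def code_points(text: str) -> str:
    """Render a string as a list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The same letter, spelled two ways in code page 1258.
precomposed = b"\xc0".decode("cp1258")      # one byte, one code point
composite = b"\x41\xcc".decode("cp1258")    # 'A' plus the combining grave

print(code_points(precomposed))
print(code_points(composite))

# The two strings differ ordinally, but normalization maps between them.
print(precomposed == composite)
print(unicodedata.normalize("NFC", composite) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == composite)

# A letter with two diacritics has no single byte in cp1258, so it has to be
# written as a precomposed base (a with breve, 0xE3) plus a combining grave.
a_breve_grave = "\u1eb1"
partially_decomposed = "\u0103\u0300"
print(partially_decomposed.encode("cp1258"))
print(unicodedata.normalize("NFC", partially_decomposed) == a_breve_grave)
```

Note that the form which fits in the code page for that last letter is neither Form C nor Form D, which is part of why round-tripping this code page is awkward.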
This post brought to you by "À" (U+00c0, a.k.a. LATIN CAPITAL LETTER A WITH GRAVE)
The Virama is a fascinating sign. It has a simple job -- it suppresses the inherent vowel that the preceding Indic letter contains.
I was very pleased once I understood this concept (I was dealing with Tamil at the time). And the collation rules also seemed quite intuitive to me -- a letter with its inherent vowel suppressed comes before that same letter that still has the vowel. It seemed intuitive because if the vowel was suppressed then it would "weigh less" than if it was not, right?
And I went out in the world with an understanding that I thought would spread to a dozen other scripts that had Viramas in them.
If you know the actual truth you probably have some insight into why I consider my notions of having linguistic aptitude to be delusions....
Like I said, in the Tamil script, it is U+0bcd, and it is known as the Pulli.
And க் (U+0b95 U+0bcd, Tamil Ka + Pulli) sorts before க (U+0b95, Tamil Ka) alone, in the Tamil language.
But on the other hand, in the Devanagari script, it is U+094d, and it is known as the Halant.
And क् (U+0915 U+094d, Devanagari Ka + Virama) sorts after क (U+0915, Devanagari Ka) alone, in the Hindi language.
Ah, but in the Bengali script my insight worked again! It is U+09cd, and it is known as the Hasant.
And ক্ (U+0995 U+09cd, Bengali Ka + Virama) sorts before ক (U+0995, Bengali Ka) alone, in both the Bengali and Assamese languages.
But my hopes are dashed in the Malayalam script, where it is U+0d4d, and it is known as the Chandrakkala.
And ക് (U+0d15 U+0d4d, Malayalam Ka + Chandrakkala) sorts after ക (U+0d15, Malayalam Ka) alone, in the Malayalam language.
And so on.
Any time I have talked to a native speaker of one of these languages, they have told me that the way that the language sorts simply feels natural to them. And I realize that the real problem was seeing what I thought was a technical reason for a set of principles that often do not have a logical reason that is so easily found.
It reminds me of a section of that Douglas Adams book, Mostly Harmless:
"I know that astrology isn't a science," said Gail. "Of course it isn't. It's just an arbitrary set of rules like chess or tennis or -- what's that strange thing you British play?" "Er, cricket? Self-loathing?" "Parliamentary democracy. The rules just kind of got there. They don't make any kind of sense except in terms of themselves. But when you start to exercise those rules, all sorts of processes start to happen and you start to find out all sorts of stuff about people. In astrology the rules happen to be about stars and planets, but they could be about ducks and drakes for all the difference it would make. It's just a way of thinking about a problem that lets the shape of the problem begin to emerge. The more rules, the tinier the rules, the more arbitrary they are, the better. It's like throwing a handful of fine graphite dust on a piece of paper to see where the indentations are. It lets you see the words that were written on the piece of paper above it that's now been taken away and hidden. The graphite's not important. It's just the means of revealing their indentations. So you see, astrology's nothing to do with astronomy. It's just to do with people thinking about people."
I think my efforts to find patterns in the chaos were an immature attempt to keep myself from feeling foolish for being fascinated by a subject that is no more based on scientific principles than astrology is. But it is an interesting "in" to learning about some aspects of language. Of which I have learned many.
This site isn't about science. It's just to do with a wanna-be linguist thinking about language.
And sorting it all out....
This post brought to you by U+0a4d, a.k.a. GURMUKHI SIGN VIRAMA
Yesterday and today, we had some customers who were visiting, to meet with people and talk about various international issues and features in Microsoft products.
One of the groups they wanted to talk to was some of the smart folks over on the Office team. I went along because you never know when I will get to hear about stuff that I might not have heard otherwise. Plus Chris Pratley was going to be there and I had not seen him in a long time. I figured it would be good to go and make sure he was still alive (and not virtual at this point like the Rachel Roberts character in S1m0ne). And I am happy to report that he was there. :-)
We also talked a bit about a few of the interesting features that exist in Word. Maybe you knew some of them. They are:
Now each of these issues can cause problems since these stealth changes do not always have consequences that are intuitive for users.
They will wonder why the language tags or the fonts or the keyboard layout is changing when they have not asked explicitly for anything to be changed.
Or why the keyboard will not do in Word what it will do everywhere else (especially if they made the keyboard themselves in MSKLC!).
Sometimes, in fact, some of these rules can conflict with each other. Which can be even less intuitive!
The answer to such complaints is that the road to intuitive application behavior is paved with options that some people do not consider to be all that intuitive. Not everyone understands every feature in Word, after all....
Though I often do my best to turn them off. I prefer behavior that is a bit more deterministic for the case of custom dates with competing settings using the switches. If you know what I mean....
This post brought to you by "ʆ" (U+0286, a.k.a. LATIN SMALL LETTER ESH WITH CURL)
I used it in a very confusing and obfuscated way in Normalization as obfuscation in C#. And then yesterday I used it again in my internationally savvy palindrome checker, in a slightly more intuitive manner.
It is the all new StringInfo class in Whidbey.
Now the old StringInfo class had only static methods -- in other words it was a walking FxCop violation.
And the main method it had was StringInfo.ParseCombiningCharacters, which was a static method that would take a string and return an array of int values, each one of which would be an index into that string that showed where a new text element started. A text element could be a single letter, a letter and a diacritic, a letter and a bunch of diacritics, a high and low surrogate making up a surrogate pair, etc.
ParseCombiningCharacters is an incredibly useful method, but it is not very intuitive to use, and certainly not to use effectively. The same goes for the other methods for dealing with text elements (GetTextElementEnumerator and GetNextTextElement) -- people were just getting confused.
But people have no problem understanding the need to be able to count entities based on what a typical user might think a character is. Once one explains what a text element is, they immediately understand the need for ways to make use of them.
So we had some meetings to talk about how to make the ways to work with text elements more intuitive, at least as intuitive as the concept of a text element itself. In the last of those meetings, someone pointed out that people usually had no problem understanding the semantic of the Substring method or the Length property of System.String. Maybe we could learn a lesson from that?
And voilà, the SubstringByTextElements method and the LengthInTextElements property were born!
Each behaves just like its cousin, the Substring method or the Length property, but rather than being based on UTF-16 code units, they are based on text elements, or what the user might reasonably point to and call a character. The same thing that the Win32 CharNext and CharPrev functions do (at least, when we have not accidentally broken them!).
Now the method and property are useless if there is not some object that they can hang off of which has the string. People were leery about adding them directly to System.String since they really want to try to keep that object as lightweight as they can (and some would even say they are not trying hard enough on that). That's when somebody remembered this class you could instantiate yet had no instance methods, this FxCop violation with a hat. And we added a constructor that takes a string and a StringInfo.String property to retrieve the string later if you wanted or change it without having to tear down the object.
Now we were rolling....
Internally, it just uses that incredibly useful but not-so-intuitive StringInfo.ParseCombiningCharacters and stores that System.Int32 array. That makes StringInfo.LengthInTextElements a simple call to Length on the array, and StringInfo.SubstringByTextElements is a simple tip-toe through the array, using the very start and length parameters that the method takes in order to know where and how far to go. So we get to be intuitive and pretty fast at the same time. And we get to get rid of that FxCop issue, to boot. Everybody wins!
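The shape of that design (an index array computed once, then a length and a substring built on top of it) can be sketched in a few lines. This is a rough Python approximation, not the actual .NET implementation: the class and function names are made up, a text element is assumed to be a base character plus any following characters in the Unicode mark categories Mn, Mc and Me, and surrogate pairs need no handling because Python strings are indexed by code point.

```python
import unicodedata

# Assumption: characters in these general categories attach to the base
# character before them to form a single text element.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def parse_combining_characters(text: str) -> list:
    """Return the index at which each text element starts."""
    starts = []
    for index, ch in enumerate(text):
        if not starts or unicodedata.category(ch) not in MARK_CATEGORIES:
            starts.append(index)
    return starts


class TextElementString:
    """Hypothetical stand-in for a StringInfo-like wrapper around a string."""

    def __init__(self, text: str):
        self.text = text
        self._starts = parse_combining_characters(text)

    @property
    def length_in_text_elements(self) -> int:
        # Just the length of the stored index array.
        return len(self._starts)

    def substring_by_text_elements(self, start: int, length: int) -> str:
        count = len(self._starts)
        if start < 0 or length < 0 or start + length > count:
            raise IndexError("text element range is out of bounds")
        if length == 0:
            return ""
        begin = self._starts[start]
        stop = start + length
        # The end is the start of the next element, or the end of the string.
        end = self._starts[stop] if stop < count else len(self.text)
        return self.text[begin:end]


sample = TextElementString("a\u030abc")  # a + combining ring, then b, then c
print(len(sample.text), sample.length_in_text_elements)
print(sample.substring_by_text_elements(1, 2))
```

Paying for the scan once in the constructor is what keeps the two members cheap afterwards, at the cost of holding one integer per text element.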
This post brought to you by "¾" (U+00be, a.k.a. VULGAR FRACTION THREE QUARTERS)
It is all about perspective.
I am sure there are people who look at this DLL as being the answer to their prayers in terms of providing a helpful interface to AVI capabilities.
But from my point of view, it kind of sucks. :-(
The other day someone with the handle PRR posted the following to the microsoft.public.platformsdk.mslayerforunicode newsgroup:
Problem description: If unicows.lib is included in the project, floating point control word may become invalid during program startup on Windows 98/95 machines (not tested on ME).
Compiler platform: MS VS.NET 2003 Pro, Platform SDK Feb 2003, Unicows.dll 1.1.3790, Win XP Pro, P4@2.4G, 1G RAM,
Steps to repro:
/nod:kernel32.lib /nod:advapi32.lib /nod:user32.lib /nod:gdi32.lib /nod:shell32.lib /nod:comdlg32.lib /nod:version.lib /nod:mpr.lib /nod:rasapi32.lib /nod:winmm.lib /nod:winspool.lib /nod:vfw32.lib /nod:secur32.lib /nod:oleacc.lib /nod:oledlg.lib /nod:sensapi.lib unicows.lib kernel32.lib advapi32.lib user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib
#include "stdafx.h"
#include <stdio.h>
#include <float.h>

int _tmain(int argc, _TCHAR* argv[])
{
    printf("%x\n", _control87(0, 0));
    return 0;
}
The problem was reproduced on machine, which has a clean install of Win95OSR2 + clean upgrade to Win98SE. It does not happen all the time. If program should return 9001f, reboot Win98 and try again.
Note: It does not matter, whether project uses Multi-byte or Unicode charset.
Note: As soon as unicows.lib is removed from project, program starts acting as expected.
Now for the record, it is not MSLU that is doing this. In order to have maximum compatibility with every version of Win9x (including Windows 95), there is no dependency whatsoever on the C Runtime. So it is definitely not setting the floating point stuff. It is actually harder to do this than I realized, but Phil Lucido helped me shed the dependency while still using stuff like structured exception handling....
Now I knew this issue had come up before but honestly could not remember what it was. Luckily, Ted (who unlike me remembered this issue) came to the rescue with the answer for PRR, and a workaround:
The thing that actually destroys the floating point is avicap32.dll which unicows.dll is dependent on. Several searches will come up with information about this.
The solution is to create your own AVICAP32.DLL stub DLL that sits in the same folder as unicows.dll (if you don't rely on functionality in that DLL).
The problem is that unicows.lib is statically linked to many of the system DLLs that it has to call for the functions it wraps, and AVICAP32.DLL does indeed change these settings in the process. Whether you wanted it to or not.
Now this DLL only has two APIs that MSLU wraps: capCreateCaptureWindow and capGetDriverDescription. For all of the trouble that they cause with this floating point crap, I wish no one had noticed these two APIs that were missed for so long (they were added to MSLU on March 24, 2001 and I doubt anyone has actually used the wrappers since then, beyond the meager tests I wrote!).
Back then, I had toyed with the idea of delay loading all of the DLLs and functions being called, but somewhere in the 15+ DLLs and 550+ APIs it just seemed like an excessive amount of work. And it never ended up happening. It probably should have happened for this one DLL/two functions to work around the floating point problem, but it is not really worth rev'ing the DLL for that one change. I'll put it on the list of things to triage, if and when....
This post brought to you by "ಊ" (U+0c8a, a.k.a. KANNADA LETTER UU)
A letter that is seldom seen on Win9x but well represented in the Tunga font that ships with Windows XP and Server 2003!
Ok, here is the updated code for that internationally savvy palindrome checker. It supports that interesting situation with ligatures like ﬂ (U+fb02, a.k.a. LATIN SMALL LIGATURE FL) vs. an lf on the far side, originally suggested by our old friend Maurits (with comments):
////////////////////////////////////////////
// IsPalindrome
//
// in : a string
// out: true if the string passed in is a palindrome.
//
// NOTES: This function handles both canonical/compatibility
//        equivalences and grapheme clusters (a.k.a. text
//        elements) as defined by the Unicode Standard.
////////////////////////////////////////////
bool IsPalindrome(string st) {
    // A null string or a ZLS is not a palindrome
    if ((null == st) || (0 == st.Length)) return false;

    // Convert to NFKC and set up the text element detection object
    StringInfo si = new StringInfo(st.Normalize(NormalizationForm.FormKC));
    int count = si.LengthInTextElements;

    for (int i = 0; i < (count / 2); i++) {
        // get the text elements for comparison
        string st1 = si.SubstringByTextElements(i, 1);
        string st2 = si.SubstringByTextElements(count - i - 1, 1);

        // see if the text elements on each side are linguistically equivalent
        if (CultureInfo.CurrentCulture.CompareInfo.Compare(st1, st2) != 0) {
            // they are not, so it is not a palindrome.
            return(false);
        }
    }

    // both ends appear to be equivalent; it is a palindrome.
    return (true);
}
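For anyone who wants to play with the same idea without Whidbey installed, here is a rough Python rendition. It is a sketch under stated assumptions rather than a port: text elements are approximated as a base character plus any following characters in the mark categories Mn, Mc and Me, and the culture-sensitive CompareInfo.Compare call is swapped for plain equality after NFKC normalization, so collation-level equivalences are not covered. The function names are made up for illustration.

```python
import unicodedata

MARK_CATEGORIES = ("Mn", "Mc", "Me")


def text_elements(text: str) -> list:
    """Group each base character with the combining marks that follow it."""
    elements = []
    for ch in text:
        if elements and unicodedata.category(ch) in MARK_CATEGORIES:
            elements[-1] += ch
        else:
            elements.append(ch)
    return elements


def is_palindrome(text: str) -> bool:
    """True when the NFKC form reads the same in both directions, element by element."""
    # A null string or a zero-length string is not a palindrome.
    if not text:
        return False
    elements = text_elements(unicodedata.normalize("NFKC", text))
    return elements == elements[::-1]


# The fl ligature on one end expands to "fl", mirroring the "lf" on the other.
print(is_palindrome("lfa\ufb02"))

# q + combining acute has no precomposed form, so text elements do the work.
print(is_palindrome("q\u0301aq\u0301"))

# a-ring on one end, a + combining ring on the other: normalization unifies them.
print(is_palindrome("\u00e5b" + "a\u030a"))
```

Comparing whole lists against their reverse does a little more work than the half-length loop in the C# version, but it keeps the sketch short.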
Now, Maurits went on in his Channel 9 posting to discuss sort elements, or cases when two or more characters are to be given a single sort weight (kind of the opposite of an expansion like these ligatures). However, in my opinion these are not really suitable for a palindrome detection algorithm, as I don't think they are usually treated as letters except in the case where they are also treated as unique text elements (the case covered by the StringInfo code).
Any native speakers of languages with such constructs as the Spanish ch and the Hungarian dzs who think they should or should not be treated as a unit in trying to detect palindromosity should feel free to leave a comment to that effect. Also, if any of my colleagues in the GIFT group agree or disagree here (and they are reading this!) they are invited to do the same (or stop me in the hall and accost me with this information!).
If I am mistaken on this point, then a very interesting problem develops since there is really not an easy method for detecting such cases given the current collation function set (although I can imagine a few avenues of attack and we obviously have the underlying data if we had to support an IsPalindrome function in Win32 NLS!). This might even make an interesting interview question one day for a very talented candidate.... :-)
Now, aside from all that, it is important to note that normalization makes some uses of text elements in this context completely unnecessary -- after all, either technology will treat U+0061 U+030a (a + combining ring) as identical to U+00e5 (a ring), one by conversion and the other by giving identical sort weights. Therefore, there is some overlap between the two technologies. However, there are some differences:
So it is fair to say that both technologies as provided in the Whidbey release are potentially useful in (of all things) the detection of palindromes, today and tomorrow. I am sure the people who spec'ed, developed, and tested these features are very proud of the technological advances!
This post brought to you by "แซ์" (U+0e41 U+0e0b U+0e4c, a.k.a. THAI CHARACTERS SARA AE + SO SO + THANTHAKHAT)
The last two codepoints of which make up a text element, but all three of which make up a unique sort element!
I took the test after seeing adamu's attempt.
93%.
Damn, I need to get a life.
This post brought to you by "𝔵" (U+1d535, a.k.a. MATHEMATICAL FRAKTUR SMALL X)
The character that is saying "Welcome home, Michael"
Back after Windows 2000 shipped, everyone thought the word LOCALE was simply used too much, in fact that it was OVERused.
It forced Dr. International to ask "Will the Real Locale Please Stand?" and answer it with a mondo big table of Configurable Language and Cultural Settings. And it forced Windows XP to rename the various settings to other terms in the user interface.
I wonder if we ought to be doing the same with the word "DEFAULT" at some point.
At its lowest level, the meaning for the word is the same for all of them: "an option that is selected automatically unless an alternative is specified." This is good as far as it goes.
But the Default User Locale (returned by the GetUserDefaultLCID API) makes no sense since the user can only have one "user locale" at a time. If they specify an alternative by calling a Win32 API like GetDateFormat with a different LCID, then it is no longer a user locale being specified. Maybe it should just be called the "user locale."
Same deal for the Default System Locale (returned by the GetSystemDefaultLCID API) which if I try to change today forces a reboot. There is only one "system locale" at a time. If I specify an alternative by calling a Win32 API like GetLocaleInfo with a different LCID to get a different ACP than the CP_ACP via the LOCALE_IDEFAULTANSICODEPAGE flag, it is certainly not the default. Maybe it should just be called the "system locale."
Which only underscores that the Default ANSI Codepage that CP_ACP represents is not really a default either -- there is only one, until and unless you change the system locale. So default is not the right word to contrast with when you call WideCharToMultiByte with some other code page. Maybe it should be called the "system ANSI code page."
It is not just us, mind you. Moving into the USER subsystem area, they have the notion of a Default Input Language, which can be retrieved by the SPI_GETDEFAULTINPUTLANG flag and set by the SPI_SETDEFAULTINPUTLANG flag of the SystemParametersInfo API (which is incidentally classed as a "BASE" API in the Platform SDK even though it is exported by user32.dll; usually the term BASE refers to the stuff in kernel32.dll). But this is simply the initial HKL of any new thread that is started for a given user. Note that it can never be "overridden" and will always be the first input language; it can just be changed later. So it too is not really a default. Maybe they should call it the "initial input language" instead.
And then moving off into the area of user accounts and profiles, Windows has a Default User account which you never directly use for the purposes of login. You can get its directory with the GetDefaultUserProfileDirectory API. But that account's directory's contents are essentially used as a template for new user accounts. It is half of what we do when you click on that "Default user account settings" checkbox on the Advanced tab of Regional and Language Options. But it is not really a default -- maybe they should call it the "template account."
And then there is the .DEFAULT user section of the registry. It is the part of the registry under HKEY_USERS\.DEFAULT and it has in it the information used prior to logging in to the machine. You ever wonder what registry settings control the keyboard list in the logon dialog or the user interface language of the logon dialog or desktop theme of the logon dialog or the user locale of services running under the SYSTEM account or anything else initialized in that early stage of the OS? It's in that section of the registry, and it is the other half of what that "Default user account settings" checkbox does. But it is not really a default -- maybe they should call it the "system account" especially since it is the system account.
Looking at the world of MUI (Multilingual User Interface, another subteam inside of the GIFT org of which I am a member), they have their GetSystemDefaultUILanguage and GetUserDefaultUILanguage APIs, which suffer from the same problems as (respectively) the GetSystemDefaultLCID and GetUserDefaultLCID APIs. They are not really defaults. Maybe they should simply be called the "system UI language" and "user UI language".
But then we can't really change the names of these various APIs, for obvious reasons. So our eventual answer is to just leave it like it is, even if some clever person in marketing reads this entry and decides to change all of the names of everything in the user interface like we did with the word "locale" in XP. :-)
Not the best initial plan, but backcompat is still king here!
This post brought to you by "ಡ" (U+0ca1, KANNADA LETTER DDA)
I am not going to claim our UI in Windows is so intuitive that we can trust that anything that is set is what the user really wants. In fact, I have stated many times that Regional Options is not intuitive.
But when a developer tells me that the reason they do not use the override information is that it may not be valid, in my opinion that reasoning is a bit thin. If you know what I mean.
After all, if the user never launches Regional Options then the overrides are identical to the original Windows data. If there are any differences, then somebody went into Regional Options and changed something. And they have a good faith basis for believing applications will pick those settings up. Not picking them up is kind of irresponsible in a client machine scenario....
Now my buddy Mike definitely points out a use of the NLS SetLocaleInfo function that is downright irresponsible, no question about it. After all, if ignoring the user's preferences is disrespectful, then supplanting their preferences with your own is downright obnoxious! What is up with some people?
Another pet peeve of mine related to all this was one that Dean pointed out recently in his post Disabling ClearType in Reading Layout View. Now I agree that the ClearType settings are pretty hidden, but is the Word replacement any better? If you look at the poor documentation and the way it is buried in the registry in ways that are hard to find, the argument that the Desktop Control Panel settings are obscure is pretty specious. I would be a tremendous fan of anyone who ripped this code out, root and branch, and used the SystemParametersInfo function with the SPI_GETFONTSMOOTHING, SPI_GETFONTSMOOTHINGCONTRAST, and SPI_GETFONTSMOOTHINGTYPE flags. Consider this post a standing offer of a dinner somewhere nice that I will give to any Office developer who accomplishes that. :-)
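To show how little code "following the user preference" takes, here is a rough sketch in Python using ctypes. The helper name read_font_smoothing and the shape of the returned dictionary are my own inventions for illustration; the constants are the winuser.h values as I understand them, so double-check them against the Platform SDK headers before relying on them. Off Windows the helper just returns None.

```python
import ctypes
import sys

# Values as defined in winuser.h (assumed here; verify against your SDK).
SPI_GETFONTSMOOTHING = 0x004A
SPI_GETFONTSMOOTHINGTYPE = 0x200A
SPI_GETFONTSMOOTHINGCONTRAST = 0x200C
FE_FONTSMOOTHINGCLEARTYPE = 0x0002


def read_font_smoothing():
    """Ask the system what the user chose, instead of a private registry key.

    Hypothetical helper for illustration only; returns None off Windows.
    """
    if sys.platform != "win32":
        return None
    user32 = ctypes.windll.user32

    def query(action):
        # Each of these SPI_GET* actions writes a 32-bit value via pvParam.
        value = ctypes.c_uint(0)
        if not user32.SystemParametersInfoW(action, 0, ctypes.byref(value), 0):
            raise ctypes.WinError()
        return value.value

    return {
        "enabled": bool(query(SPI_GETFONTSMOOTHING)),
        "cleartype": query(SPI_GETFONTSMOOTHINGTYPE) == FE_FONTSMOOTHINGCLEARTYPE,
        "contrast": query(SPI_GETFONTSMOOTHINGCONTRAST),
    }


print(read_font_smoothing())
```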
The principle is simple -- follow the user preferences. If they did not feel strongly enough to change them, so that the settings are untouched, then that too may be a preference. And a good developer does not ignore messages the user is sending to them....
This post brought to you by "®" (U+00ae, a.k.a. REGISTERED SIGN)
(No technical content in this post)
I love Typhoon, a Thai restaurant that has multiple locations, the closest one to me being in Redmond, about 4 miles away. They have a dish there called General's Noodles with the following description on the dinner menu:
Egg noodles with chicken, shrimp, fried wonton, sprouts, peanuts, sugar and lime.
I just love it. I have been in the habit lately, when I eat there, of ordering another one to go and eating it later or the next day. It is awesome food. I don't think it is a genuine Thai dish since I have never seen it in any other Thai restaurant. And believe me, I have looked. I asked them if they had a cookbook, but no such luck.
On the whole I would say it is my second favorite food in all the world (right after stuffed grape leaves, but that is a story for another day!).
Anyway, I was home yesterday, feeling a bit peckish around 5:00pm. And it was sunny out in Redmond. I had just left a foot of April snow in Cleveland. And like I said it is just 4 miles from here. And there is a cool bike trail next to SR 520 that is a straight shot for most of the way. I looked at my fully charged Pride Mobility 3-Wheel Victory Scooter and considered the fact that its 20-25 mile range was almost certainly on flat ground, not on hills. Would I be able to scoot there?
I decided I would.
(People are probably shuddering when they think about where this story may be going, especially in the context of the title of the post. Think of it as FORESHADOWING, a sign of quality literature!)
Now Pride Mobility scooters have a battery gauge on them with colored circles on it -- red for empty, moving into yellow for near empty and then up through to green for fully charged. There is one red, one yellow, and then four greens, each one a little darker. The meter is most effective while you are scooting on level ground -- it is how you know if you really are full or not. Going uphill drops the meter down, and downhill makes it look fully charged when it is not.
But I jammed quickly, and I made it there in about 40-45 minutes, no problems at all. They messed up my order a little (one of the only flaws of an otherwise wonderful restaurant is the fact that they are not so good at taking to-go orders over the phone). But I was in no hurry. A few minutes later I was off, heading home with my bounty. As I left, even under load I was in the second highest green circle. I figured I was all set.
Of course that was to change.
Suddenly, just before I made it to the bike trail (say about 5 miles into this ~8 mile journey), something happened. The power dropped down to between the last and second last green circle.
Damn.
I realized there might still be enough juice to make it home, though I remembered most of the trip home being uphill a little. I was a little nervous, but I figured I did not have too much choice so I should just go for it.
As I continued, the meter was poking dangerously in towards the yellow circle, and then suddenly it hit the red and just stopped.
I got up, took a look at the freewheel lever, pulled it up, and started pushing the scooter. It weighs about 150 pounds, but I figured if I was holding on to it I would probably not fall. After about 50 yards I noticed there was a circuit breaker reset button, and pressing it gave me back the juice. So I gratefully stopped pushing, turned off the freewheel, and started scooting. I carefully avoided making the scooter go fast enough for the meter to head down to the red circle as that seemed like a surefire way to lose power again, and I slowly plodded home (probably closer to 2mph than the scooter's 5mph maximum, at this point). It looked like I would make it back.
Or not. Just before 51st St., it decided any hill was too much, and I had to push it all the way back from then on. Mike tells me that it is 1.3 miles that I pushed it, plus a little bit of the time on 40th St. that I was pushing.
I had a lot of time to think as I was pushing the 150 pound scooter, mostly uphill.
Mostly wondering if I would ever be stupid enough to let this happen again. And then realizing I probably am that stupid. So, as a mitigation strategy, planning how I should get some extra batteries for it to carry with me for next time, like I can with the smaller scooter. And wondering how best to have someone take a look at the scooter and let me know what is wrong with it now (is one of the two batteries in trouble? Or is there another problem?).
At one point a biker stopped and asked if there was anything he could do. But I was almost to 40th St. and it seemed like there was no practical way he could help without abandoning his bike. I told him thanks very warmly, but that I was almost there and I would make it. I actually felt really good that someone just saw me having trouble and asked if they could help -- even if there was no practical way to do so, his motivations were purely along the lines of "someone is in trouble, how can I assist?" and that is a great thing about people sometimes.
Anyway, I made it home finally, though I was barely able to walk or even stand for most of the night. I am still a little shaky now but I am mostly recovered. It was probably the most exercise I have had in a while, even more than the snow shovelling exercise from last week. I am not anxious to repeat it (and may not make a Typhoon jaunt in the scooter again), but like a time many years ago when extreme circumstances caused me to run on a beach in New Jersey when I was usually having trouble walking, it is good to know that I still have reserves that can be tapped, when needed.
And I still have those General's Noodles to eat! :-)
This post brought to you by "♿" (U+267f, a.k.a. WHEELCHAIR SYMBOL)
The Hijri calendar is not really subject to an easy algorithm. As Dr. International pointed out back in August of 2000:
Perhaps Dr. International should provide some background to help explain why SQL Server refers to this as an Arabic style date that uses the Kuwaiti algorithm. The Hijri calendar is a very old and complex calendar, which has an issue when it comes to automating conversion between Gregorian and Hijri: there are specific days that the conversion can potentially be off by a day or two in either direction. The exact reason for this has to do with the proclamation of the new moon by religious authorities based on visibility of lunar crescent. Therefore, the natural temptation of programmers to want to automate everything must be resisted in this case. The Hijri calendar is very important to Saudi Arabia and other countries such as Kuwait, and thus this seemingly unsolvable problem must be solved.
In an effort to solve this challenging problem, several years ago some of the top developers in Microsoft's Middle East Products Division (MEPD) did extensive research into it. They had the longest timeline of information on the Hijri calendar as is used in Kuwait, and they took this information and did statistical analysis on it, finally arriving at the most accurate algorithm they could devise. This algorithm is used in many Microsoft products, including all operating systems that support Arabic locales, Microsoft Office, COM, Visual Basic, VBA, and SQL Server 2000. Whether you refer to this as the Hijri date, the Arabic style, or the Kuwaiti algorithm, you should understand that it is technically none of these things; it is simply the most accurate algorithm that Microsoft was able to derive using a large number of known Hijri dates. The actual determination of the new moon by religious authorities does not bow to a computer algorithm (nor should it, obviously!).
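To give a feel for what "an algorithm" means here, below is a minimal sketch of the generic tabular (purely arithmetic) Islamic calendar in Python. To be clear about the assumptions: this is NOT the statistically derived Microsoft algorithm described in the quote; the 30-year leap pattern (years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26, 29) is just one common convention; and the function names and the adjustment_days parameter are hypothetical, there only to illustrate the "off by a day or two" problem.

```python
from datetime import date, timedelta

# Proleptic Gregorian ordinal of 1 Muharram 1 AH in the civil reckoning
# (19 July 622 in the proleptic Gregorian calendar).
ISLAMIC_EPOCH_ORDINAL = 227015


def is_leap_year(hijri_year):
    """True for 11 years out of every 30 (assumed leap pattern, see above)."""
    return (14 + 11 * hijri_year) % 30 < 11


def hijri_to_gregorian(year, month, day, adjustment_days=0):
    """Convert a tabular Hijri date to a Gregorian date.

    adjustment_days is a hypothetical knob standing in for the fact that an
    actual sighting of the crescent can move the real date a day or two.
    """
    ordinal = (
        ISLAMIC_EPOCH_ORDINAL - 1
        + (year - 1) * 354               # common years have 354 days
        + (3 + 11 * year) // 30          # leap days in the preceding years
        + 29 * (month - 1) + month // 2  # months alternate 30 and 29 days
        + day
    )
    return date.fromordinal(ordinal) + timedelta(days=adjustment_days)


print(hijri_to_gregorian(1426, 1, 1))  # 2005-02-10
for shift in (-1, 1):
    print(shift, hijri_to_gregorian(1426, 1, 1, adjustment_days=shift))
```

The arithmetic will always give you an answer. Whether that answer matches what was actually proclaimed is a different matter, which is exactly the point being made in the quote.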
Now, I am not even going to imply that what I am about to say was due to direct help from Microsoft, and I have no knowledge that suggests otherwise.
But earlier today, colleague Shawn Steele pointed me (and others) at an article entitled Satellite will help set Islamic dates which describes a fairly cool development, in my opinion:
The Organization of the Islamic Conference, the world's largest Muslim body, said Sunday it plans to launch an $8 million satellite within two years to take pictures of the moon to find lunar calendar dates.
The 57-nation group said religious scholars would have access to accurate pictures of the shape of the moon instead of having to rely on naked-eye sightings, which have in the past created discrepancies between Muslim countries or led to mistakes.
"Hopefully the satellite will stop the problems associated with lunar sightings," spokesman Ahmed Imigene said.
It is ironic that the sort of problem that I would struggle with for the technical reason of wanting a purely algorithmic solution is one that bothers some religious authorities as well (many of whom, for obvious reasons, do not want to see even unintentional, innocent mistakes made). I think it is amazingly cool that there are people who are interested in leveraging technology to better aid the intent of the rules used by the religion.
There are understandably some who are unhappy with the plan, as the article goes on to state:
It was not immediately clear how many countries will use the technology to determine religious dates. There is already some criticism from religious officials in Saudi Arabia, which uses the lunar calendar.
"The shape of the moon has to be seen from the ground," said Osama al-Bar, dean of the Custodian of the Two Holy Mosques Institute for Haj Research in Saudi Arabia.
Now I realize that this hope has all of the problems that the issue of instant replay versus umpire/referee calls in sports has had, with the additional burden of being a LOT more meaningful, if you know what I mean. Knowing how bitter the battles got about the instant replay issue, I can only imagine how many problems this may cause for people who truly believe there is something wrong with the plan.
The real problem (in my opinion) is that the original intent is not completely known. Even if the motivation for the rules was known and the rules were made since they were the best at the time, then at this point there is still no way to know if those who made the rules would accept such an innovation or not. Thus it could be easily considered either pious or heretical, depending on how you look at it. And one would be hard pressed to argue the point either way, since it is a legitimate religious question.
My hope for this development, to help break the stalemate, is that eventually a careful combination of the techniques is used, to help assist the religious authorities. Every effort would be made to try and spot the shape of the moon, but the appropriate authorities would ideally have access to the data from the satellite as an additional data point to assist them.
The issue actually reminds me of an issue in Judaism, interestingly enough. It has to do with the laws about Kashrut (כשרות). Kashrut (which means "keeping kosher") refers to the Jewish dietary laws. Food that is allowed to be eaten is kosher (כשר), and food that is not is treif (טרפה). The rules I am referring to are the ones related to the method by which animals must be slaughtered for them to be able to be considered kosher. Described in this article:
Kosher Slaughter and Preparation
Jewish law states that kosher mammals and birds must be slaughtered according to a strict set of guidelines, the slaughter (shechita) (שחיטה) being designed to minimize the pain inflicted. This necessarily eliminates the practice of hunting wild game for food, unless it can be captured alive and ritually slaughtered.
A professional slaughterer, or shochet (שוחט), using a large razor-sharp knife with absolutely no irregularities, nicks or dents, makes a single cut across the throat to a precise depth, severing both carotid arteries, both jugular veins, both Vagus nerves, the trachea and the esophagus, no higher than the epiglottis and no lower than where cilia begin inside the trachea, causing instantaneous loss of blood flow to the brain and death in a few seconds. Any variation from this exact procedure could cause unnecessary suffering; therefore, if the knife catches even for a split second or is found afterward to have developed any irregularities, or the depth of cut is too deep or shallow, the carcass is not kosher (nevelah) and is sold as regular meat to the general public. The shochet must be not only rigorously trained in this procedure, but also a pious Jew of good character who observes the Sabbath, and who remains cognizant that these are God's creatures who are sacrificing their lives for the good of himself and his community and should not be allowed to suffer. In smaller communities, the shochet is often the town rabbi or the rabbi of one of the local synagogues; large factories which produce Kosher meat have professional full time shochets on staff.
Once killed, the animal is opened to determine whether there are any of seventy different irregularities or growths on its internal organs, which would render the animal non-kosher. The term "Glatt" kosher, although it is often used colloquially to mean "strictly kosher", properly refers to meat where the glatt (גלת) (lungs) are carefully examined for adhesions (i.e. scars from previous inflammation).
Large blood vessels must be removed, and all blood must be removed from the meat, as Jewish law prohibits the consumption of the blood of any animal. This is most commonly done by soaking and salting, but also can be done by broiling. An interesting fact, little-known outside of Jewish communities, is that the hindquarters of a mammal are not kosher unless the sciatic nerve and the fat surrounding it are removed (Genesis 32:33). This is a very time-consuming process demanding a great deal of special training, and is rarely done outside Israel, where there is a greater demand for kosher meat, since all meat sold in Jewish towns is required to be kosher by law. When it is not done, the hindquarters of the animal are sold for non-kosher meat.
Now I will be the first to admit that at the time these rules were codified, they were state of the art in the most humane method of slaughter that was really possible. However, I sincerely doubt that it is the most humane possible method today, given all of the technologies that exist. But there is no way to know if the original rules were only to do with picking humane methods (the first time I read about this explanation was a book by Samuel Dresner, a rabbi who freely admitted that he was speculating -- though he did have an awful lot of evidence in his speculation). So the real question is whether technological changes in the shochet's techniques should be allowed?
I am sure that if such a change were made, some Orthodox Jews would refuse to accept it. The whole system of kashruth would change, as some would not accept the "Kosher" marks that others would (a minor issue today that would become much more significant).
So how to decide when technology should be used to help further tradition, and when it should just butt out? The intents of both sides of these kinds of debates are mostly just trying to help. And often they are all very pious people trying to do the best thing. But how can one know when one is doing the best thing?
This post brought to you by "؍" and "✡" (U+060d a.k.a. ARABIC DATE SEPARATOR, and U+2721 a.k.a. STAR OF DAVID)
There have been several interesting emails that I have gotten the last few days, relating to the Microsoft stance on the anti-discrimination bill.
Many people have seen the letter Steve Ballmer sent out (either because they are employees, or because they read Scoble's posting of it, or through their own nefarious sources), and the various comments that people have put into their blogs about the issue. I don't really have a "me too" post to add, but I will give an extra thought or two about it that perhaps will cause someone to see it all in a different light, or maybe a slightly different shade of the same light....
I read Vic Gundotra's thoughts about it, and found myself unconvinced about the attempt to move the issue from being a human rights issue to a moral issue.
I myself am not gay. That is a personal choice of mine. I have never known anyone to not respect that. For others it may be a moral choice, but I simply do not see it that way myself. I will respect the right of someone to believe it is immoral if that is their belief; it is certainly no worse and no better of a reason to make such a choice than just not being interested in giving it a try or being afraid to do so or whatever. It is a choice. And I respect the right of anyone to make that choice.
I am vigorously opposed to the notion that such a choice should ever either help or harm my career in any way whatsoever. And certainly if I were gay I would feel the same way, perhaps even more vigorously since someone would essentially be discriminating against me. Either way, the notion that the way I live my life when I am doing nothing illegal should ever impact my career due to discrimination sickens me. If I am the CEO of a company with a strong policy against such discrimination, then how can I say that I personally feel that way and my company policies are shaped that way, but such a policy as law would not be appropriate for me to support?
I think about my own situation, as a different kind of protected class, being handicapped.
I love that I work for a company that either meets or exceeds all of the legal requirements related to my handicap, and I also love the support I am being given by a management team that wants me to be able to be a happy and productive member of the team. But I live in the real world where not all companies or management teams would give that level of support. And knowing that they are required to make at least some effort makes me feel safer as a person who is handicapped.
If Microsoft were to do the same thing for a bill related to my situation, I would probably be shouting my displeasure from the rooftops.
The moral issue is irrelevant to the issue at hand, because the bill does not legislate morality. It basically requires that you cannot discriminate against someone, even if those are your moral beliefs. If you do it at Microsoft, then you may well be fired. All people who are put in such a situation deserve such protections, even if they do not work for Microsoft.
Taking a step back for a moment, I will admit that I actually do have certain prejudices.
In the other direction.
I tend to assume that someone of a different race, gender, creed, or sexual orientation may actually have a better chance at being good at their job than someone who is not. This is because of the wry fact that they are fighting a harder daily battle and it is much easier for them to either give up or to be drummed out by someone with prejudices against them. The fact that they are still there has some small positive effect (that they have managed to avoid being forced out by those who tend to discriminate).
This prejudice of mine is deep seated and has probably been around since a good friend fought and won the battle to be a neurosurgeon against a department chief who was inclined to feel that she had no place there, on the basis of her sex. I knew she was an excellent neurosurgeon, and when she asked if I would be comfortable if she scrubbed in when I was having surgery myself I told her I would be honored. And I was. The fact is that had she not been, that department head would have had her drummed out of the program. And I cannot say that all neurosurgeons are held to such high standards. Unfortunately.
Here in software, it's not quite the same life or death kind of situation, obviously. And I have worked to make sure I would never allow my "prejudice" in favor of a candidate who has overcome a system that generally seems disinclined to help deserving people to get a fair shake to change a decision or cause me to prefer one candidate for a job or a project over another. Because even a "good" prejudice is wrong and it is crucial that I make decisions based on the facts and not any of my preconceived notions. In the end I must have real reasons to support my choices, not just to defend myself from getting in trouble but so I can live with myself.
I think the decision to not support the bill as a company that clearly does support the tenets that cause the bill to exist is wrong. It is a decision that shows that as a company we may have certain convictions, but that we lack the courage of some of those convictions.
Yesterday, someone named Ben posted the following comment to my post Invariant and Ordinal Redux:
I appreciate your enthusiasm for picking out common programming errors like this, but as a professional programmer, I find a lot of these internationalization parameters confusing. How do I know if I need to pass the NORM_IGNOREKANATYPE flag to CompareString? How do I know if I want LOCALE_USER_DEFAULT or LOCALE_SYSTEM_DEFAULT, or some other locale? I simply don't know. Unless I learn Japanese, or know someone who knows Japanese, I'll never know the answer. The trouble is that the APIs feel like they were written by linguists. Me? I just want to compare filenames, or compare entries in a hash table, or compare usernames, etc. I don't want to even have the choice of ignoring kana types. I just want the CompareStrings API do the *right thing* out of the box. If that is too hard for a single function, then let's write some API sets that are easy to use for common cases. I think this would be a more useful endeavor than to write articles about the nuances between CT_CTYPE3 and CT_CTYPE2. Sometimes less choice is better. Please please finish that list of do's and don'ts. Please please make a list of "If you want to sort like a dictionary, do this... If you want to put filenames into a hash table, do this..."
My initial reaction was to point out that the APIs were not written by linguists -- but the developers had expert advice from linguists when the functionality was exposed.
My second reaction was a technical one, thinking of which ones I had already covered (like What is my locale? Well, which locale do you mean? answering some of that locale question) and which ones might make good future posts (like the care and feeding of NORM_IGNOREKANATYPE) and so on.
My third reaction was to slow down this "developer" in me trying to solve the technical problem and look to what was really being suggested. Unfortunately, Ben's supposition is correct -- the APIs are complicated, and there is too much functionality to try to distill into simple usage without having detailed articles about the nuances. Articles that could be read by the kind of devs who try to solve the problem you indicated.
In a very real and almost biblical sense, one can talk about "CompareString which begat lstrcmp and lstrcmpi in the USER kingdom, and was fruitful and multiplied in the SHELL kingdom and begat StrCmp, StrCmpI, IntlStrEqN, IntlStrEqNI, StrCmpN, StrCmpNI, StrIsIntlEqual, some of whom later begat StrCmpLogicalW. And in that kingdom functions which were not begat from CompareString also flourished like those that used the C rules -- StrCmpC, StrCmpIC, StrCmpNC, and StrCmpNIC. And in the kingdom of .NET the managed brother CompareInfo was also fruitful and begat the five overloads of String.Compare and in Whidbey begat the StringComparer class and the StringComparison enumeration. And CompareInfo.IsPrefix and its overloads begat String.StartsWith. And CompareInfo.IsSuffix and its overloads begat String.EndsWith. And..."
Of course what the SHELL folks and the BCL folks did showed that in attempting to simplify individual functionalities into single APIs, you cause an explosion of simple APIs, and it becomes just as tough to unravel which one to use.
Topically modifying what Hal Holbrook said on The West Wing (playing the cantankerous Albie Duncan) in the episode Game On:
It's not simple. It's incredibly complicated. I've been doing NLS work for over 10 years and there is no right answer to these questions and software development needs all the words it can get its hands on...
I could tell you when it is ok to use lstrcmp and lstrcmpi and StrCmpLogicalW. I could not even try to tell you how to navigate the rest of that stuff in the Shell or a lot of the stuff in .NET, even though a lot of it calls right into us. To me it is just a decision of whether one wants one's complexities to be horizontal or vertical. The bonus of the vertical complexity (the NLS kind) is that all of the functionality is there. The horizontal kind gives you the individual McNugget that the developer was trying to surface in the simplified method, which will always be missing one or more of the functionalities that are possible, despite seeming to me to be a lot more complex....
So while I will give practical advice from time to time (like "use the new OrdinalIgnoreCase type comparisons when trying to imitate the OS, because the OS does not know CompareString from Cholesterol"), the bulk of what I say will be exploring that vertical space of the NLS managed and unmanaged APIs and how best to use them to get the results you want.
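For anyone who wants that OrdinalIgnoreCase advice in concrete form, here is a minimal sketch. The class and method names (OrdinalSketch, SameIdentifier, NewTable) are made up for illustration and are not part of any API; the only real pieces are String.Equals with a StringComparison argument and the StringComparer.OrdinalIgnoreCase comparer.

```csharp
using System;
using System.Collections.Generic;

static class OrdinalSketch
{
    // Hypothetical helper: compares two identifiers (file names, user names,
    // registry-style keys) without consulting any culture's linguistic rules.
    public static bool SameIdentifier(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // Hypothetical helper: a lookup table whose keys are matched the same way,
    // so "Config.INI" and "config.ini" land on the same entry.
    public static Dictionary<string, int> NewTable()
    {
        return new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    }
}
```

The point of the design is that nothing here depends on the user's locale settings, which is usually what you want for things the OS itself will later compare. It is not a substitute for CompareInfo when the strings are meant to be sorted or matched for a human reader.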
Because the problem I have personally with the horizontal space is that when you have to change behavior because the call did not do what you thought it did, the change is more than just passing a new flag; it is often calling a whole new function in a whole new way. Just take the String.StartsWith method as an example: if you want to do some operations you have to move to CompareInfo.IsPrefix, which has entirely different calling semantics (one is a method on a CompareInfo object that takes two strings, the other is an instance method on a string). Or if I want to change the STRINGSORT/WORDSORT behavior of StrCmp, I have to go figure out all the parameters of CompareString now, and if I had done that in the first place I would not have been trapped in the Sargasso of SHLWAPI.
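The StartsWith versus IsPrefix jump can be sketched roughly as follows. The wrapper names (PrefixSketch, SimplePrefix, FlaggedPrefix) are hypothetical, and the choice of the invariant culture's CompareInfo is an assumption made only to keep the example short; real code would pick whichever culture fits the data.

```csharp
using System;
using System.Globalization;

static class PrefixSketch
{
    // Horizontal form: an instance method on the string itself.
    // This overload uses the current culture's default comparison.
    public static bool SimplePrefix(string source, string prefix)
    {
        return source.StartsWith(prefix);
    }

    // Vertical form: ask a CompareInfo object, hand it both strings,
    // and select the behavior with CompareOptions flags.
    public static bool FlaggedPrefix(string source, string prefix, CompareOptions options)
    {
        // Assumption for this sketch: the invariant culture is good enough.
        CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;
        return ci.IsPrefix(source, prefix, options);
    }
}
```

Once the call is in the second shape, changing behavior really is just a matter of passing a different flag, for example CompareOptions.IgnoreCase instead of CompareOptions.None. Getting from the first shape to the second is the part that means rewriting the call site.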
Hopefully this fits with the model people are expecting here. If not then maybe the Shell or BCL folks will step up and work to provide the uber-conversion charts to know when to call which of the 30 methods that are all designed to simplify the five methods that NLS provides (or in the unmanaged world the 30 functions designed to simplify the one function).
Simplification is just too complex for me. :-)
This post brought to you by "A" (U+0041, LATIN CAPITAL LETTER A)
After Happy Days went off the air and everybody realized the Fonz was short, the letter behind "Aaaaay" had its reputation injured a bit and is looking to expand into new markets, like this blog!