Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Igor Levicki asked in the Suggestion Box:
Any chance to enlighten as whether Windows Explorer in Windows Vista will learn to sort numeric filenames like us humans do? A.k.a strcmp() for humans Regards, Igor
It's like deja vu all over again!
No need to wait for Vista, since it has been in Windows Explorer's 'sort by name' functionality since Windows XP!
(more info in What is up with number sorting and What is up with number sorting, redux)
Now there actually were some interesting modifications to the algorithm for Vista that I will post about soon, but in the meantime, StrCmp for humans is a done deal. :-)
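For anyone who wants to see the general idea behind "strcmp for humans", here is a minimal Python sketch of a natural sort key. It is only an illustration of treating digit runs as numbers; it is not the algorithm Windows Explorer or StrCmpLogicalW actually uses, and the name natural_key is made up for this example.

```python
import re

def natural_key(name):
    """Split a name into digit runs and text runs so digit runs compare as numbers."""
    parts = [p for p in re.split(r"(\d+)", name) if p != ""]
    # Tag each part so a number is never compared directly against a string.
    return [(0, int(p), "") if p.isdecimal() else (1, 0, p.lower()) for p in parts]

files = ["file10.txt", "file2.txt", "file1.txt"]

# Plain string order puts "file10" before "file2".
print(sorted(files))

# Natural order compares the digit runs 1, 2 and 10 as numbers.
print(sorted(files, key=natural_key))
```

The tagging of each part is a design choice for the sketch: it keeps Python from raising an error when one name has a number where another has text.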
This post brought to you by . (U+002e, a.k.a. FULL STOP)
Back in the early days of the planning for the .NET Framework, everyone was convinced that the CultureInfo class really needed to base itself off of the settings on the underlying version of Windows. People were simply convinced that it would be too confusing if they changed the date format or whatever in Windows, and the CurrentCulture did not pick up the change there.
Maybe they were right, but to be honest I have always been skeptical as to whether this dependency would actually lead to other more confusing behavior in different situations.
With that anticipatory bit of foreshadowing in place, I will proceed with the post. :-)
Mauli's question was straightforward enough:
Hi all, I found this issue while using Vista RC1 as a client, and Windows 2003 as the server using Arabic-Saudi Arabia culture. Both have Whidbey SP1 (SP1.50727.360). On Vista, the calendar used for the ar-SA culture is different from previous versions. On the client, I pass in the ar-SA short date string of Gregorian date 2004-12-27 to the server. On the server side, this string gets converted from its Arabic form into a Gregorian date. However, since the Arabic calendar on the server is different from the client, I get back the wrong Gregorian date. It seems that we should fix the same version of .net framework should always return the same calendar for ar-SA, no matter what the OS. Is there something that I'm missing? Why wouldn't the same version of the .net framework always return back the same calendar? Using .NET, find the Arabic date range to use - I used this code snippet:
DateTime d = new DateTime(2004, 12, 27);
Console.WriteLine(d.ToShortDateString());
DateTime d2 = new DateTime(2005, 1, 5);
Console.WriteLine(d2.ToShortDateString());
Gregorian: 2004-12-27 to 2005-1-2005
Arabic DateTime.ToShortDateString()
Vista: "15/11/25" to "24/11/25"
Windows 2003: "16/11/25" to "25/11/25"
Uh oh, the worry here is that version 2.0 of the .NET Framework returns inconsistent results when it is run on a Windows 2003 machine with an Arabic (Saudi Arabia) user locale versus a Vista machine with an Arabic (Saudi Arabia) user locale.
Suddenly people aren't worried about the .NET Framework being consistent with Windows any more, are they?
Well, that is exactly what changed here. In Windows 2003 there is only the Hijri calendar, but in Vista the Um Al Qura (secular Hijri) calendar I have mentioned previously has not only been added, but it is also the default. And the calendar setting choice is one of those properties you can override, whether you are looking at XP/Server 2003:
where the default calendar is "التقويم الهجري" (Hijri Era) or in Vista:
where the default calendar is "تقويم ام القرى" (UmAlQura). One could always override the Vista choice if one wanted to, going back to that existing Hijri calendar.
Now of course the .NET Framework also added this new calendar, thus you can get either the UmAlQuraCalendar Class or the HijriCalendar class. And the new default in the 2.0 version of the .NET Framework is the UmAlQura calendar, but this too can be overridden, whether explicitly in code or implicitly by it being the default user locale with a different calendar setting.
So, if you are running code in the 2.0 version of the .NET Framework that uses the Arabic (Saudi Arabia) culture then you will get consistent results except in the case where Windows instructs the .NET Framework to use a different calendar (which it must do on Server 2003 and earlier, since there was not yet an UmAlQura calendar to use).
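The shape of the round-trip problem can be sketched in a few lines of Python. This is a toy model only: the two "calendars" below are simple day counts that differ by one day, standing in for the two defaults, and they are not real Hijri or Um Al Qura conversions. The names format_with, parse_with, CLIENT_CALENDAR and SERVER_CALENDAR are all hypothetical.

```python
from datetime import date, timedelta

# Arbitrary starting point for the toy day count (an assumption for the sketch).
EPOCH = date(2004, 1, 1)

def format_with(offset_days, d):
    """Format a Gregorian date as a day number in a toy calendar."""
    return str((d - EPOCH).days + offset_days)

def parse_with(offset_days, text):
    """Parse a toy-calendar day number back into a Gregorian date."""
    return EPOCH + timedelta(days=int(text) - offset_days)

CLIENT_CALENDAR = 0  # hypothetical: plays the role of the client's default calendar
SERVER_CALENDAR = 1  # hypothetical: plays the role of the server's default calendar

original = date(2004, 12, 27)

# The client formats with its calendar, the server parses with a different one.
wire_text = format_with(CLIENT_CALENDAR, original)
received = parse_with(SERVER_CALENDAR, wire_text)
print(original, received)  # the two dates differ by one day

# A calendar-independent wire format survives the trip unchanged.
iso_text = original.isoformat()
print(date.fromisoformat(iso_text) == original)
```

The second half is only one possible way around the problem, shown under the assumption that both sides can agree on a culture-independent format for data that crosses machines.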
Now I will be the first to admit that the few people who customize their date formats may have been confused to not see the changes in the .NET Framework. But in my opinion their problem is a lot easier to explain to people than this sort of calendar issue, because it is easier to say that Windows and the .NET Framework are separate but the .NET Framework results will be consistent within the same version.
Which is all water under the bridge now, except for every once in a while when someone hits this kind of issue again. :-)
This post brought to you by ت (U+062a, a.k.a. ARABIC LETTER TEH)
I can say again and again and again and again that 64-bit keyboards are important to Microsoft and to the GIFT team, and that there will be an update at some point, even though it has not happened just yet....
The truth is that sometimes, some people just don't want to wait.
I just got email from such a person yesterday, in fact! :-)
Igor Levicki is that person, and in his post (How to build keyboard layouts for Windows x64?) he describes his spelunking efforts to find the 64-bit support that was partially completed but not yet finished in the Microsoft Keyboard Layout Creator (and as he noticed we did not ship the cross-compilers since we weren't supporting the platforms yet anyway!).
Now of course the real support plan will include the real setup, and will also include the WOW64 dll as well (Mr Levicki's does not, which will cause some headaches for 32-bit applications running on 64-bit). Also the version of the tools we would have shipped would have worked okay with the original settings, meaning that Igor figured out the diffs needed for the later tools, as well.
Note to anyone from Microsoft who is reading this: someone may want to try and find out if Mr. Levicki might want to look into interviewing with Microsoft.
And note to Igor Levicki -- if we ever find ourselves in the same (or a similar) place, let me know. I'd love to buy you a drink, and maybe we can talk about some of the cooler aspects of your efforts! :-)
This post brought to you by ♔ (U+2654, a.k.a. WHITE CHESS KING)
From time to time, stores like Fred Meyer will re-arrange where the various things are located within the store. My local store did this just recently, in fact. Now this is an event that I am of two minds on.
On the one hand, it is as annoying as crap, even when the old layout of things was annoying or crowded, because I hate not knowing where everything is. Especially when I used to.
On the other hand, it is actually kind of nice because I get the chance to see stuff I may not have even realized was there, and if I ever really can't find something or don't have the time to look, I can ask someone who works there.
Enter Windows Vista. :-)
Back in the old days I was able to choose whether I wanted to show or hide accelerators before the ALT key was pressed. To change the setting, it used to be that you could just right click on the desktop to click on Properties:
And slide over to the fourth tab of that mondo dialog:
And then click on that Effects... button to get to the setting we want:
the wording of which is "Hide underlined letters for keyboard navigation until I press the Alt key".
This is fairly buried and I am not sure that you could call how one got here to be entirely intuitive, but once you learn it you can get back to it easily. Just like things at the store and knowing how to find them the next time you need 'em.
In Vista, the big mondo dialog has been split up a lot. Let's try to find that setting again.
Let's try right clicking on the desktop again:
Uh oh, change!
Ok, that is a reasonable change, since personalizing it does have a more intuitive feel. Let's suspend judgment and see what happens when we choose it:
Uh oh, now I'm lost.
Wait, it used to be the Appearance tab, and the first item on the list is Window Color and Appearance so let's try that one. Not so bad yet, right? Just click on the hyperlink....
Aha, found that Effects... button. Cool, we are on our way, right? Just click and
Never mind, they moved it. Damn. Ok, we'll go back to that big Personalization switchboard and look for where they may have moved it to:
That Ease of Access hyperlink in the lower left hand corner seems promising, since those accelerator underlines are often considered to be accessibility aids, and Ease of Access has a nice accessibility type of sound.
Plus nothing else seems to fit. :-)
Ok, one click, and
Ok, third from the end is the Make the keyboard easier to use hyperlink.
Now technically accelerators actually make it easier to use the keyboard rather than making the keyboard easier to use, a semantic distinction that I will discuss further another day. :-)
For now let's try it out.
Aha, we found it. And the wording has now changed to "Underline keyboard shortcuts and access keys."
Another fascinating semantic distinction -- and another topic to post about soon! :-)
Now I abbreviated my search a bit by pretending I was finding things more quickly than I actually did. In truth, I did not guess this quickly and I did click on a few different options first (mainly because it did not occur to me to look at that link in the corner right away; my eyes were staying near the place I had previously almost found the answer).
But in the end I found a lot of other stuff that will probably come in handy in the future. The functional shuffle helped encourage me to look around a bit. :-)
And if I didn't have time to look around? Well, I can go to the Windows Help and Support center and search for the original string that used to be in the UI and quickly get pointed in the right direction!
(I will follow up on several linguistic issues that were raised here, plus the issues about changing this setting programmatically, in future posts, soon!)
This post brought to you by ቜ (U+125c, a.k.a. ETHIOPIC SYLLABLE QHWEE)
Tony's question was straightforward enough:
I have a CMD file that attempts to read and parse the contents of a text file but fails due to the file being Unicode. Is there any way to get For to process the file as Unicode or do I need to copy the file to ANSI?
I’ve attached Unicode and Ansi text files and if you execute the following “For” lines you’ll see that the Unicode version does not output any results.
C:\>For /F %a In (readmeBuildAnsi.txt) Do echo %a
C:\>echo Update
Update
C:\>For /F %a In (readmeBuildUnicode.txt) Do echo %a
C:\>
What Tony has run across is one of those pieces of the console that is unapologetically non-Unicode, even when you run CMD with the /U flag that "Causes the output of internal commands to a pipe or file to be Unicode".
It is hardly alone here, as there are many pieces of the console that do not support Unicode.
Though Stephen Malcolm suggested a good workaround for this particular issue:
Did you try?
for /f %a in ('type readmebuildunicode.txt') do @echo %a
It appears that the for command doesn’t recognize Unicode files natively but type does.
This is quite true -- the type command actually goes through special effort to support Unicode parsing of the files it processes, and it can be used to do the heavy lifting in many of these cases.
Stephen also pointed out another example of this kind of use of the type command, this time using the non-Unicode findstr.exe:
Another pain with Unicode files is trying to search them with findstr. This command doesn’t understand Unicode files but you can still use it by doing the following:
type <file>|findstr <search_string>
This works for the same reason, because type does recognize Unicode files.
The key here (and the reason that this is such an effective workaround, generically) is that what type does is put the text into one of the standard handles, and once it is there then any tool or command can have better luck processing it (because it will be converted out of Unicode and into the console's code page, which all of the non-Unicode command line tools and console commands can handle).
Of course the downside is that these non-Unicode tools will not be able to do meaningful processing on Unicode data outside of that code page; the workaround is simply making it easier for the non-Unicode tools to work as they always would (I would actually love a Unicode findstr!).
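To make the conversion described above a bit more concrete, here is a small Python sketch. It is not what type does internally; it just shows what the bytes of a UTF-16 file look like to a byte-oriented tool, and what is lost when text is squeezed into a single code page. The code page cp437 and the function name to_code_page are assumptions picked for the example.

```python
# Build the bytes of a small "Unicode" (UTF-16 little endian with BOM) file in memory.
text = "Update\r\n"
utf16_bytes = b"\xff\xfe" + text.encode("utf-16-le")

# Every ASCII character is followed by a NUL byte, which is what trips up
# tools that expect one byte per character.
print(utf16_bytes[:8])

def to_code_page(raw, code_page="cp437"):
    """Decode BOM-marked UTF-16 bytes and re-encode them into a single code page."""
    decoded = raw.decode("utf-16")
    # Characters the code page cannot represent are replaced with "?".
    return decoded.encode(code_page, errors="replace")

# Plain ASCII survives the trip.
print(to_code_page(utf16_bytes))

# ARABIC LETTER TEH is not in cp437, so it does not survive.
teh = b"\xff\xfe" + "\u062a".encode("utf-16-le")
print(to_code_page(teh))
```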
But I guess we'll have to wait for Monad....er, PowerShell, for a better Unicode story here in the bulk of the console's processing....
This post brought to you by U (U+0055, a.k.a. LATIN CAPITAL LETTER U)
It was over a year ago that I posted about how LOCALE_SABBREVLANGNAME is so not an ISO-639 code.
But perhaps the title of this post covers the situation a little more accurately, or at least a little more clearly....
The rules are simple enough (though perhaps more complex than I laid out originally), so I will just lay them out, here and now. These are actually descriptive, not prescriptive, which is to say that I am describing how a bunch of decisions ended up being made. I am not describing some mystical set of rules in a data handbook or anything. :-)
RULE #1: If you take every single one of these three letter codes, then each language within the full set of locales must have the first two letters of the code uniquely represent the language. Thus EN must be English, AR must be Arabic, and so on. This is true for every locale that uses one of these individual languages, so that the Language Bar can have a two letter code to use.
RULE #2: If a language has multiple locales (e.g. the aforementioned EN and AR), then usually the ISO-639 TWO LETTER CODE will be used for the first two letters of LOCALE_SABBREVLANGNAME, with the third letter chosen to uniquely identify the locale. The exception to this is when the uniqueness rule in #1 is not met, in which case a change will be made to make them unique.
RULE #3: If a language has only one locale, such as Japanese or Korean, then usually the ISO-639 THREE LETTER CODE will be used for LOCALE_SABBREVLANGNAME. The exception to this is once again when the unique two letter rule in #1 is not met, in which case a change will be made to make it unique.
Thus, to give the example of a new locale in Vista, Uighur (PRC).... Uighur's two letter ISO 639 code is ug, and its three letter ISO 639 code is uig. Since the Uighur language is not used for any other locales, and further since UIG and its first two letters UI do not conflict with any other language or locale, the three letter ISO 639 code is used here.
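The uniqueness check in RULE #1 is easy to sketch in Python. The table below is a small illustrative sample rather than the real list of locales, the function name prefix_conflicts is made up, and the "UIX" entry is a purely hypothetical code added to show what a conflict would look like.

```python
from collections import defaultdict

# A small illustrative sample of (three letter code, language) pairs.
SAMPLE = [
    ("ENU", "English"),
    ("ENG", "English"),
    ("ARA", "Arabic"),
    ("JPN", "Japanese"),
    ("KOR", "Korean"),
    ("UIG", "Uighur"),
]

def prefix_conflicts(codes):
    """Return the two letter prefixes that are shared by more than one language."""
    seen = defaultdict(set)
    for code, language in codes:
        seen[code[:2]].add(language)
    return {prefix: langs for prefix, langs in seen.items() if len(langs) > 1}

# No conflicts in the sample: each two letter prefix maps to one language.
print(prefix_conflicts(SAMPLE))

# A hypothetical new code whose first two letters collide with Uighur's.
print(prefix_conflicts(SAMPLE + [("UIX", "Some other language")]))
```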
Now of course this approach is going to upset anyone who prefers Uyghur when the whole Uighur or Uyghur question is raised, but as a choice it is not designed to choose sides, it is simply using the ISO 639 three letter code, which happens in this case to not have its first two letters match the ISO 639 two letter code.
The end result is a code that is uniquely qualified to upset people who feel that their language or locale is being misrepresented in the Language Bar....
This post brought to you by ئ (U+0626, a.k.a. ARABIC LETTER YEH WITH HAMZA ABOVE)
It was last week in the post Variation on that theme of wanting more keys covered by MSKLC that I talked about Olivier's non-specific question about adding additional keys to the keyboard with MSKLC.
And then yesterday, Olivier gave the specifics in a comment:
In my specific case, I was talking about an Apple Japanese keyboard that has 3 keys not present in default layout:
- Yen key (left to delete key)
- 2 keys, one to left and one to right of space bar (used to switch input method)
Under Windows, these keys currently don't work. Hope this helps.
Olivier
So now that he gave some specifics, I thought I would dig in a bit. :-)
For this post, I am going to focus on that Yen key that Olivier mentioned (the same key he refers to on the Japanese Apple keyboard exists on the Windows 106 key Japanese keyboard). Here is a picture of the one I have, just so we know what we are talking about:
Now you can compare this to the keyboard you find up on GlobalDev:
Hmmm.... that key does not really seem to be very well represented, does it? It definitely isn't that other key under the backspace, either.
And even though MSKLC has a Japanese keyboard on the list of existing keyboards, that one doesn't have this key either. Which is why MSKLC doesn't have this key, by the way -- because no existing keyboard ever maps the scan code. EVER.
So let's see if we can do something here about this, okay? :-)
We'll take that little app I wrote in Handling [Unicode] input in the console and use it with that Japanese keyboard to see what this key outputs:
E:\test\READ\Debug>read
ReadConsoleInput test
Ctrl-D to quit.
#  UC      u/d   VK    SC    State
0: U+0000  down  00ff  007d  0000
1: U+0000  up    00ff  007d  0000
Ok, the scan code is 0x7d. Now let's see what that 0xFF virtual key value is in winuser.h:
#define VK_ATTN           0xF6
#define VK_CRSEL          0xF7
#define VK_EXSEL          0xF8
#define VK_EREOF          0xF9
#define VK_PLAY           0xFA
#define VK_ZOOM           0xFB
#define VK_NONAME         0xFC
#define VK_PA1            0xFD
#define VK_OEM_CLEAR      0xFE
/*
 * 0xFF : reserved
 */
#endif /* !NOVIRTUALKEYCODES */
It looks like this is simply not a VK value, which explains why the code did not find a character!
Of course it is the keyboard layout DLLs that are created by MSKLC that have the job of mapping scan codes to virtual keys. So, now we know what to do, right? We just have to add the right entry to this table!
If you go into MSKLC and choose to save the US layout as a .KLC file (uscustom.klc), you can look at that file in Notepad.
Now just look at the end of a bunch of entries in the LAYOUT table, which look like this:
35 OEM_2 0 002f 003f -1 // SOLIDUS, QUESTION MARK, <none>
39 SPACE 0 0020 0020 0020 // SPACE, SPACE, SPACE
56 OEM_102 0 005c 007c -1 // REVERSE SOLIDUS, VERTICAL LINE, <none>
53 DECIMAL 0 002e 002e -1 // FULL STOP, FULL STOP,
The first column is the scan code, the second is the virtual key minus the VK_, the third is the impact of the caps lock key, and the columns after that are the characters that show up.
So let's add one entry just before the "53 DECIMAL" entry (which must be the last entry in the table). Since I am doing this with the US keyboard, I will use VK_OEM_8, a VK value that is not yet used (you can't duplicate scan codes or virtual keys):
7d OEM_8 0 00a5 00a6 -1 // YEN SIGN, BROKEN BAR, <none>
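If you want to sanity check an edited LAYOUT table before compiling it, a few lines of Python will do. This is a rough sketch that only understands the simple entries shown in this post (hexadecimal scan code, VK name, caps lock value, then four digit hexadecimal code points or -1); real .KLC files have other forms that it does not handle, and the helper names parse_layout_line and duplicates are made up for the example.

```python
from collections import Counter

# The sample entries from this post, with the new 7d entry added before "53 DECIMAL".
LAYOUT_LINES = [
    "35 OEM_2 0 002f 003f -1 // SOLIDUS, QUESTION MARK, <none>",
    "39 SPACE 0 0020 0020 0020 // SPACE, SPACE, SPACE",
    "56 OEM_102 0 005c 007c -1 // REVERSE SOLIDUS, VERTICAL LINE, <none>",
    "7d OEM_8 0 00a5 00a6 -1 // YEN SIGN, BROKEN BAR, <none>",
    "53 DECIMAL 0 002e 002e -1 // FULL STOP, FULL STOP,",
]

def parse_layout_line(line):
    """Split one simple LAYOUT entry into scan code, virtual key, caps value and characters."""
    fields = line.split("//", 1)[0].split()
    scan_code, vk, caps, *chars = fields
    return {
        "scan_code": int(scan_code, 16),
        "vk": "VK_" + vk,
        "caps": caps,
        # -1 means no character for that shift state.
        "chars": [None if c == "-1" else chr(int(c, 16)) for c in chars],
    }

def duplicates(entries, field):
    """Return the values of a field that appear in more than one entry."""
    counts = Counter(entry[field] for entry in entries)
    return sorted(value for value, n in counts.items() if n > 1)

entries = [parse_layout_line(line) for line in LAYOUT_LINES]
print(entries[3])
print(duplicates(entries, "scan_code"), duplicates(entries, "vk"))
```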
and now we'll compile this keyboard file (change the directory if you installed MSKLC anywhere special):
"C:\Program Files\Microsoft Keyboard Layout Creator\bin\i386\kbdutool.exe" -u -v -w -x uscustom.klc
It will create a uscustom.dll file which you can substitute in for the one MSKLC creates and puts in the uscustom\i386 directory. And you now have a keyboard that will make use of that key on the Japanese 106-key keyboard....
If you play around with the .KLC file later in MSKLC, the entry should remain intact; the only real limitation here is that you can't see the entry in the UI since there is no key there. Which is only the case because not even the Japanese keyboard handles it (this particular key's use is only mediated by the Japanese IME, currently!). Though I at one point suggested adding it to the US and other keyboards, I didn't get much support for it (sometimes I think I wouldn't have been able to get the VK_OEM_102 key added without a ton of proof that it was defined everywhere!).
Now things are a bit more complicated with keys that do not directly involve input, which is why I focused on this particular keyboard (and key). So try not to think of this post as opening up everything, okay?
In fact, there are many special issues with the numeric keypad, the cursor keys, and the additional shift state keys that make this more complicated (so I will talk about them another day, probably).
But you can think of yourself as being on the road now for pimping your keyboard beyond what MSKLC does....
Enjoy!
This post brought to you by ¥ (U+00a5, a.k.a. YEN SIGN)
Shell developer Ben Karas has been posting about the property system in the Shell in his new blog The Great Flying Tortoise, and one post in particular caught my eye, the one entitled PROPVARIANT Helpers #6 - PropVariantCompare[Ex].
The function actually has one of those possible situations that drives people who are building indexes crazy, that case where A < B and also B < A. Though thankfully this has more to do with the fact that variants are being compared and thus they are always compared in the context of the first item -- meaning that comparing A to B will use A's type, while comparing B to A will use B's type. And thus what in the world of CompareString and LCMapString is a bug that someone on the SQL Server team is justifiably unhappy about, can become an understandable issue that is not really going to be fatal....
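A toy Python sketch shows how comparing "in the context of the first item" can produce A < B and B < A at the same time. This is only an illustration of the general effect with integers and strings; it makes no claim about the actual conversion rules inside PropVariantCompareEx, and compare_as_first is a made-up name.

```python
def compare_as_first(a, b):
    """Compare a and b after converting b to a's type. Returns -1, 0 or 1."""
    b_converted = type(a)(b)
    return (a > b_converted) - (a < b_converted)

a = 9      # a number
b = "10"   # a string

# Using a's type: 9 versus 10 as numbers, so a < b.
print(compare_as_first(a, b))

# Using b's type: "10" versus "9" as strings, so b < a.
print(compare_as_first(b, a))
```

Both calls return -1, which is exactly the kind of result that makes anyone building a sorted index unhappy.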
As Ben mentioned, the RTM version of PropVariantCompareEx, while not containing an LCID (or locale name) parameter, will at least contain some flag values to control whether string comparisons are done with StrCmpLogicalW (the default), or the other Shell comparison functions like StrCmp, StrCmpC, StrCmpI, or StrCmpIC. Looking at the flag values (which have not been published yet) they currently appear to literally be flag values using different bits (when obviously you would never really be able to reasonably combine them).
I'll wait for PropVariantCompareEx2 that takes an LCID, or even better takes a locale name. :-)
In the meantime, Ben's blog is definitely one worth keeping an eye on, so I have added it to the blogs I read....
This post brought to you by ꑂ (U+a442, a.k.a. YI SYLLABLE NJURX)
You know that list of languages, the one that keeps getting bigger all of the time? You know, the one with اردو, മലയാളം, Qhichwa Simi, فارسی, isiZulu, ಕನ್ನಡ, नेपाली, Afrikaans, कोंकणी, Setswana, বাংলা, తెలుగు, ਪੰਜਾਬੀ, Lëtzebuergisch, and татарча on it?
Well, let's add Inuktitut to the list, because Microsoft has just released a Language Interface Pack for Inuktitut!
Some background info on Inuktitut (courtesy of Soren!):
Number of speakers: 30,000
Name in the language itself: Inuktitut (a.k.a. ᐃᓄᒃᑎᑐᑦ)
Inuktitut is, along with English and French, the official language of Nunavut, the largest of the territories of Canada which was created in 1999. Inuktitut is spoken by about 80% of the population there as well as all other areas in Canada north of the tree line, like the Northwest Territories where it is an official language, too. In Nunavik, a semi-autonomous portion of Quebec, it has legal recognition and enjoys official support. While for a long time sharing the fate of most indigenous languages in the Americas, namely getting closer and closer to extinction, for Inuktitut the last census data indicate that the number of speakers has stopped declining and might even be increasing in Nunavut.
Because of the huge area in which Inuktitut is spoken (see below), it has a big dialectal diversity. Some scholars even count Greenlandic as a variant, though it is more commonly considered a language of its own.
Inuktitut is an agglutinative language in which a succession of different morphemes are added to root words to express what other languages need several words or sentences for.
Fun facts:
- Inuktitut is spoken in one of the least densely populated areas of the world: While the area of Nunavut has the size of Western Europe its population is 30,000. Even Greenland has double the density.
- Inuktitut knows only three vowels (a, i, u), which can be pronounced short or long.
- How many words are there for snow in Inuktitut? Well, the whole "The Inuit have thousands of words for snow" story is more of an urban legend (and probably based on misunderstandings). But I spare you the linguistic details…
Classification:
Inuktitut belongs to the eastern group of Inuit, one of the two branches of the Inuit-Aleut (Eskimo-Aleut) language family.
Script:
Inuktitut is written either in the Latin alphabet (which was introduced to the region by Moravian missionaries) or the Inuktitut syllabary which is based on the Cree syllabary created by the missionary James Evans. This syllabary got its present form in the 1970s when it was adopted by the Inuit Cultural Institute in Canada.
Isn't it fun to see one of those descriptions for the original issues behind snowclones? :-)
This post brought to you by ᐃ (U+1403, a.k.a. CANADIAN SYLLABICS I)
When I recently talked about Inaccurate localization can make you bust out laughing, I found myself thinking about one of the very early Metamagical Themas columns in Scientific American written by Douglas R. Hofstadter, where (in a later postscript in his book of the same name covering the column) he discussed an idea that is rather central to answering the question What is Localization?:
I wonder what literalists like John Case would suggest as the proper translation of the title of the book All the President's Men (a book about the downfall of President Nixon, a downfall that none of the people around him could prevent). Would they say that Tous les hommes du Président fills the bill admirably? Back-translated rather literally, it means "All the men of the President". It completely lacks the allusion -- the reference by similarity of form -- to the nursery rhyme "Humpty Dumpty". Is that dispensable? In my opinion, hardly. To me, the essence of the title resides in that allusion. To lose that allusion is to deflate the title totally. Of course, what do I mean by "that allusion"? Do I wish the French title to contain, somehow, an allusion to an English nursery rhyme? That would be rather pointless. Well, then, do I want the French title to allude to the French version of "Humpty Dumpty"? It all depends on how well known it is. But given that Humpty Dumpty is practically an unknown figure to French-speaking people, it seems that something else is wanted. Any old French nursery rhyme? Obviously not. The critical allusion is to the lines:
All the King's horses
And all the King's men
Couldn't put Humpty together again.
Are there -- anywhere in French literature -- lines with a similar import? If not, how about in French popular songs? In French proverbs? Fairy Tales?
One might well ask why French-speaking people would ever care about reading a book about Watergate in the first place. And even if they did want to read it, shouldn't it be completely translated, so that it happens in a French-speaking city? Come to think of it, didn't Ioranto once remark that the French for Washington is Montréal?
Clearly, this is carrying things to an extreme. There must be some middle ground of reasonableness. These are matters of subtle judgment, and they are where being human and flexible makes all the difference.
Rigid rules about translation may lead you to a kind of mechanical consistency, but at the expense of all depth and charm. The problem of self-referential sentences is just the tip of the iceberg, as far as translation is concerned. It is just that these issues show up very early when direct self-reference is concerned. When self-reference (or reference in general, for that matter) is indirect, mediated by form, then fluidity is required. The understanding of such sentences involves a mixture of deriving the content and yet retaining the form in mind, letting qualities of the form conjure up flavors and enhance the meaning with a halo of not-quite-conscious pseudo-meanings, connotations, flavors, that flicker in the mind, not quite in reach, not quite out of reach. Self-reference is a good starting point for investigation of this kind of issue, because it is so much on the surface there. You can't sweep the problems under the rug, even though some would like to do so.
Now the actual column was titled "On Self-Referential Sentences" which is of course why much of this excerpt from the postscript talks about them, but many of the principles that are raised in this text actually relate to the real differences between translation (especially machine translation) and localization.
These are the concepts that good localization -- and good localizers -- can capture. It requires not only a good understanding of the item that is to be localized; it also requires a good understanding of both the source market and the target market.
The localization of software often has it slightly easier, at least in that the formal style of the item might contain less in the way of allusions and such. But to be honest one never knows what one is going to get, which is why a good localizer is needed to pick up the slack....
By the way, the actual column is highly recommended and has many interesting examples, such as asking how one might take the sentence in the title and translate it into English.
This post brought to you by ট (U+099f, a.k.a. BENGALI LETTER TTA)
George asked via the Contacting Me... link:
I tried to use the Unicode method of creating half forms in Devanagari on Windows. It worked, but then once I did it the sorting seemed to not work correctly for the half form. What am I doing wrong?
George, you did nothing wrong, this one is all us.
First, I should explain for everyone else what we are talking about, what you meant when you mentioned 'the Unicode method of creating half forms in Devanagari'.
It starts with U+200c and U+200d, the ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER that I have discussed previously, and the effect that these characters can have in Indic scripts.
The effect is best described in the Unicode FAQ on Indic Scripts and Languages and its question #17 (I cannot find on Unicode charts the "half forms" of Devanagari letters (or any other Indic script). These characters are needed to form words such as "patni".)
The three forms, which you will be able to see if you have a conformant browser are:
त्न U+0924 U+094d U+0928 -- Devanagari tna using the tna ligature
त्न U+0924 U+094d U+200d U+0928 -- Devanagari tna with a half ta and a full na
त्न U+0924 U+094d U+200c U+0928 -- Devanagari tna with a full ta, a visible virama, and a full na
(if you don't see three different forms then you can look at that Unicode FAQ link)
So that part is easy enough.
And one part of the collation story on Windows -- the fact that ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER are both characters that intentionally have no weight -- is also there, as one would expect.
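To make the three sequences and the "no weight" idea concrete, here is a minimal Python sketch. It is only an illustration, not how Windows collation is implemented: the helper names `strip_joiners` and `describe` are made up for this example, and simply deleting the joiners is a stand-in for treating them as weightless during comparison.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# The three spellings of Devanagari "tna" listed above.
TNA_LIGATURE = "\u0924\u094d\u0928"
TNA_HALF_TA = "\u0924\u094d\u200d\u0928"
TNA_VISIBLE_VIRAMA = "\u0924\u094d\u200c\u0928"


def strip_joiners(text):
    """Drop ZWJ/ZWNJ, a rough stand-in for giving them no sort weight."""
    return "".join(ch for ch in text if ch not in (ZWNJ, ZWJ))


def describe(text):
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for form in (TNA_LIGATURE, TNA_HALF_TA, TNA_VISIBLE_VIRAMA):
        print(describe(form))
    # With the joiners ignored, all three forms compare as the same string.
    print(strip_joiners(TNA_HALF_TA) == strip_joiners(TNA_VISIBLE_VIRAMA) == TNA_LIGATURE)
```

Both joiners are format characters (general category Cf), which is why ignoring them in a comparison is a reasonable default; the display difference between the three forms lives entirely in rendering.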
The place where all is not perfectly well is in the compression part -- and there are many defined compressions for languages like Hindi when consonants and independent vowels combine with Candrabindu, Anusvara, Visarga, and Nukta. George must have been trying to get a half form with one of these compression cases like with a nukta, which will work, although it will make it sort in a slightly incorrect way. :-(
As luck would have it, this is not going to be a very common problem since it will sort very close to where it should be and you would only notice the difference if you were doing a test for equality or had so many entries all together (like in a dictionary) that such differences are more easily noticed....
Good catch, George! Of course, since the fix for this would require changing the results for strings that are valid according to the IsNLSDefinedString function, it would mean a major version change (even if only for specific locales like the Indic ones), which is the kind of change that can usually only be done in a major release. Really too late for Vista, but this is definitely something to look at for the next version....
(In my mind this counts as the sort of bug that I am sorry to see us ship, though I understand why it happened; it solidified in my mind the importance of our close engagement with the Unicode Standard!)
This post brought to you by U+200c and U+200d (a.k.a. ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER)
A recent question I received via email from a colleague who preferred to remain anonymous on the blog:
Hope everything is going well with you first of all... May I ask for your help on an NTFS technical question? I'm currently involved in some CIFS/NTFS compatibility related issue discussion and wondering what would be the first Windows release that supported UTF-16 and characters of beyond the BMP area?
Based on the http://en.wikipedia.org/wiki/NTFS, it is Windows 2000 but I'm not quite sure if that's official or really correct.
Would you please let me know if you have the info handy or point me to one of the public documents available at Microsoft web sites? (I was trying to do web search but I wasn't really able to find the info from www.microsoft.com...)
Thanks very much in advance for your help and hope this isn't a trade secret that I'm asking for...
Well, since as far as I know I don't know any trade secrets about NTFS, we are probably safe on that count, at least! Just to make sure, I'll stick to stuff that anyone can verify themselves if they want.... :-)
Of course there is the info I just put up in this blog post for starters, and I'll go a step further and make it clear that you can use high surrogate and low surrogate code units in NT even before they were actually defined (since none of the current or past incarnations of NT disallow unassigned code points).
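For anyone who wants to see what "high surrogate and low surrogate code units" means at the bit level, here is a small Python sketch of the standard UTF-16 arithmetic. The function name `to_surrogates` is just for this example; nothing here is NTFS-specific, it only shows that a supplementary character turns into two ordinary 16-bit values that a store of unvalidated 16-bit code units would accept like any others.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    (high, low) pair of UTF-16 surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20 bits of payload
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"{high:04X} {low:04X}")
    # Cross-check against Python's own UTF-16 encoder.
    print("\U0001F600".encode("utf-16-be").hex())
```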
The Wikipedia article is really quite misleading on this score with its text:
File names are stored in Unicode (encoded as UTF-16, although limited to the Basic Multilingual Plane in early versions before Windows 2000).
Well, I'll point out that whoever wrote this bit either confused NTFS with Active Directory (which is actually limited on this point until Windows XP/Server 2003, which is when surrogate code units first received weight) or they simply don't understand NTFS and did not test creating such files on NT 4.0 or earlier.
In my ideal world, a future version of NTFS would actually (optionally) take into account both characters defined in Unicode and also Unicode normalization, but as far as I know there isn't anyone planning such a thing yet.
So if I absolutely had to describe NTFS in terms of a Unicode version, I'd say it uses a very early version of Unicode and allows anything it believes to be an unassigned code point, for forward compatibility. :-)
This post brought to you by / (U+002f, a.k.a. SOLIDUS)
You wouldn't think that being involved in software development would mean that one was involved in politics.
But then of course it turns out one would often be wrong, in that case. :-)
So let me see, situation #1 has to do with the way Windows Activation works if you have no Internet connectivity. You are given the opportunity to dial a call center that will walk you through the activation. By default the choices you are given will be for nearby call centers based on information you may have given about where you are.
Looks good so far, right?
Ok, now think about whether there are any areas in the world where a call center, even if nearby, may not be where you really wish to call. Like, whether or not you are okay with a place being there, you just wouldn't want to have to call such a place.
I'll give you a hint: if one is in Jerusalem, one may not necessarily want to call into the West Bank, or vice versa. Or at least not see it as the first choice.
Correcting such a situation seems very reasonable (in practice it can sometimes be more complicated than other times); even as people work to make such changes in this area, hopefully if any minor issues manage to pop up here over time, people will recognize that they don't have to call such a number, and forgive it. And this would be true in both directions, I would hope. There do not need to be limits on reasonable behavior on either side. If you know what I mean.
Of course one may also be able to imagine people not liking particular entries in the location list of the user interface like this one:
or this one:
or any of the other various entries, any one of which may contain one or more large or small groups of people who would like to see it removed from the list.
But to be honest I am a bit less sympathetic to this point of view, for really any country, region, location, or other on this list. I mean given Windows is a worldwide product, as long as someone, somewhere may be interested in choosing one of the 261 entries in the list that has its own GEOID in Vista, then it is allowed to be there.
With that said, Microsoft did learn the lesson about borders on the time zone map. But we had to draw the line at generic lists like this one....
This post brought to you by Ы (U+042b, a.k.a. CYRILLIC CAPITAL LETTER YERU)
I was thinking about Bill Poser's Unintended Implicatures over on Language Log from earlier this month and it occurred to me how often this happens in Microsoft products.
I mean, can't almost the entire product lifetime of Microsoft Access and Jet database with people implicitly believing that they are unstable and easy to corrupt be based on the simple fact that both Access and Jet (in documentation, UI, and error messages) refer to:
What were they thinking? What are they still thinking? I mean, they still have not rushed to fix this, despite nearly a decade and a half of feedback on the topic. Just think, the whole industry might have complained about how over-cautious Access is, or even how paranoid. But the combination of error messages not complaining of corruption and documentation calling the process verification rather than repair could have made a huge difference.
When I have mentioned it to people on the Access team as high up in the food chain as Craig Unger (who was a PUM for Access at the time) or Richard McAniff (VP for Access and Excel), they did not disagree and said that people had noticed this in the past, but it is hard to change the momentum of what people think now. I definitely never got the sense that anyone was going to change anything....
Certainly Bill's thoughts a few days later on how Microsoft Redefines "Genuine" are yet another example of this phenomenon (though Bill did not call it such at the time). When I install Vista and before I have activated it, the UI claims it is not genuine. The same thing happens if I go to the download site and try to download something that requires a "Genuine" copy of Windows.
Who decided to call Windows with an unknown status NOT GENUINE and Windows that has been verified as valid GENUINE? Isn't that naming scheme implying guilty until proven innocent rather than the other way around? And as amusing as this might be for an internal version of Vista that I just built myself, it is probably not nearly as likely to be funny for someone who will buy the product once it is available externally.
I remember seeing a presentation about WGA in the early days (MSKLC was slated as requiring WGA to download while MSLU was not, and I wanted to see what they were talking about). I saw my old friend from the early days (when I was a contract Access dev and he was in developer marketing) David Lazar (Director of Genuine Windows), and I talked to him after the presentation and pointed out the problem here in the language. He didn't disagree with me, and even pointed out others had suggested as much in the past, but I got the feeling that nothing was going to change (and nothing has, of course).
Now in these cases and others like them in Microsoft, I can hardly claim to be the sole voice of reason -- I have even been told point blank that others had noticed the unintended negative consequences of chosen language. And of course no one can call it unintended at this point since if you know about something then at some point it really can't be unintentional, even if it was unintentional at first.
I suppose it is hard to change the messaging and terminology later, though folks have no problem doing it when it is strategic (like NGWS to .NET, for example).
I could be cynical about it and assume it may just be that marketing does not like to change the messaging if it is not on their own terms and in their own time, and that product teams like to avoid changing terminology on existing features.
Or I could put on an anti-Microsoft hat and assume that even internally people at Microsoft think Jet does corrupt and users are pirates until proven genuine.
But I am just naive enough to think that it has got to be due to something deeper than that....
I just don't know what it is yet. :-)
This post brought to you by ᠭ (U+182d, a.k.a. MONGOLIAN LETTER GA)
A while back (well, in March of this year) I was talking about Traditional versus modern sorts, and I mentioned there that I wasn't going to talk about Traditional Spanish then, but that you should stay tuned for a future blog post.
Hopefully you were not holding your breath or anything, but let me be the first to welcome you to the future blog post!
The Traditional Spanish sort (0x0000040a) has a long and interesting history, including but not limited to the fact that Mexican Spanish (0x0000080a) actually went right along with it even after the rest of the world including Spain had moved on. In fact, this lasted until Windows 2000, when the sorting for Mexican Spanish was silently moved over to match the rest of the locales in the Spanish speaking world, a fact that most people never seemed to notice (people in Mexico might have!).
When the actual change took place is more complicated to figure out than one might think, due to the fascinatingly complex machinations of the Windows development process. And it was made even more complicated by the fact that the Windows 2000 beta data was at one point snapshot'ed by Jet 4.0, though the way they and later SQL Server flattened non-unique collations hides the fact that they inherited a Traditional Spanish Mexican sort.
Lucky them, lucky us. :-)
Now the original change to add an LCID for a new collation, made before LCIDs contained the notion of a SORTID, is one that was able to embarrass me in the past, and has left Windows with an interesting wart where 0x0000040a and 0x00000c0a are both enumerated as regular LCIDs even though all of their data other than the name and the sort are basically identical. A whole generation of applications trying to show lists of locales have to know to strip that 'Traditional' one out.
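To see the wart in the numbers themselves, here is a rough Python sketch that pulls an LCID apart the way the SORTIDFROMLCID, LANGIDFROMLCID, PRIMARYLANGID and SUBLANGID macros in the Windows headers do. The function name `split_lcid` and the dictionary layout are assumptions made for this illustration only.

```python
def split_lcid(lcid):
    """Break an LCID into sort ID, language ID, and the language ID's
    primary language and sublanguage parts."""
    lang_id = lcid & 0xFFFF
    return {
        "sort_id": (lcid >> 16) & 0xF,   # bits 16-19
        "lang_id": lang_id,              # low 16 bits
        "primary": lang_id & 0x3FF,      # low 10 bits of the language ID
        "sub": lang_id >> 10,            # high 6 bits of the language ID
    }


if __name__ == "__main__":
    for lcid in (0x0000040A, 0x0000080A, 0x00000C0A, 0x00010427):
        parts = split_lcid(lcid)
        print(f"0x{lcid:08x}", {k: hex(v) for k, v in parts.items()})
```

Both Spanish (Spain) values come out with a sort ID of 0 and the same primary language (0x0a); they differ only in the sublanguage (1 versus 3). In other words, the traditional sort was expressed as a separate "language" rather than as an alternate sort ID, unlike a value such as 0x00010427, where the sort ID field is actually used.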
And when you consider how we silently slid Mexico over, we likely should have done the same with Spain, too (did anyone notice how we palmed the Mexican card?).
In Vista, the addition of EnumSystemLocalesEx with its LOCALE_ALTERNATE_SORTS that returns locale names alongside the venerable EnumSystemLocales with its LCID_ALTERNATE_SORTS led to an interesting question. I mean, as Julie pointed out to me, the original concept of alternate sorts came out of the postmortem of the Traditional/Modern Spanish issue.
Perhaps Vista, with its new major version for collation, was the best time to move 0x0000040a over into the alternate sorts column, since that is what it actually is, what alternate sorts were really designed for?
And this is the way we went.
The documentation for EnumSystemLocalesEx points it out rather directly:
If dwFlags specifies LOCALE_ALTERNATE_SORTS, the callback function will be called for every locale that represents an alternate sort order. For example, Spanish (Spain) defaults to international sort order, but traditional sort order is available as an alternate sort. German (Germany) defaults to dictionary sort order, but there is an alternate phone book sort order available.
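If you want to see which locales come back for that flag on your own machine, here is a hedged Python sketch that calls EnumSystemLocalesEx through ctypes. The wrapper name `alternate_sort_locales` is invented for this example, the flag value is the one from winnls.h, and the function deliberately returns an empty list on anything that is not Windows (Vista or later is assumed for the API to exist). The output depends on the Windows version, so none is shown.

```python
import sys

LOCALE_ALTERNATE_SORTS = 0x00000004  # flag value from winnls.h


def alternate_sort_locales():
    """Return the locale names enumerated for LOCALE_ALTERNATE_SORTS.

    Returns an empty list when not running on Windows."""
    if sys.platform != "win32":
        return []

    import ctypes
    from ctypes import wintypes

    names = []

    # BOOL CALLBACK EnumLocalesProcEx(LPWSTR name, DWORD flags, LPARAM lParam)
    callback_type = ctypes.WINFUNCTYPE(
        wintypes.BOOL, wintypes.LPWSTR, wintypes.DWORD, wintypes.LPARAM
    )

    def on_locale(name, flags, lparam):
        names.append(name)
        return True  # keep enumerating

    callback = callback_type(on_locale)

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.EnumSystemLocalesEx.argtypes = [
        callback_type, wintypes.DWORD, wintypes.LPARAM, wintypes.LPVOID
    ]
    kernel32.EnumSystemLocalesEx.restype = wintypes.BOOL

    if not kernel32.EnumSystemLocalesEx(callback, LOCALE_ALTERNATE_SORTS, 0, None):
        raise ctypes.WinError(ctypes.get_last_error())
    return names


if __name__ == "__main__":
    for name in alternate_sort_locales():
        print(name)
```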
The question of what to do with EnumSystemLocales was hotly debated between those who thought that the functions returning consistent results was most important, those who thought that backcompat was most important, and those who thought both of these points were equally important, requiring both functions to return the old result.
One could argue that backcompat has never been too terribly important here for sorting updates and alt. sort updates. Whether one remembers (0x00010427) Classical Lithuanian or (0x0000080a) Mexican Spanish that sorts like Traditional Spanish or the (0x00010412) Korean and (0x00010411) Japanese Unicode sort orders in Windows that I talked about here and here, it is obvious that we have had no problem just making the change in a new version when it's the right thing to do.
Why we stood on ceremony for Modern Spanish and even gave it another LCID is beyond me, though in the end the plan is to have both enumeration functions in Vista treat Traditional Spanish as an alternate sort and not include it in the regular locales.
Which (by the way) is also the reason for how long it took to do this post. You see, I put this decision on the list of decisions that whether right or wrong I could see some major bug being put in that would require the plan to change, so I essentially waited to see if the decision would withstand the big bug (which it did, yesterday).
In total fairness to the bug, I coded the 'fix' to revert the old function's behavior and it was tested, while the whole conversation about what to do was discussed and triaged and rediscussed and retriaged. I exerted minimal influence on any of the conversations, though I agreed with every correct and intelligent point that was brought up (there were many) and said I would go with whatever decision people wanted to go with (as an ultimate test of the "sanity will prevail" theory).
And you want to know what? I think that in the end, it did.
So, it is now official -- as of Vista, Traditional Spanish is seen as an alternate sort in Windows, and next week when I am talking to Julie before Vienna Teng opens for Madeleine Peyroux at the Moore I can tell her that we finally cleaned that little issue up. :-)
This post brought to you by ñ (U+00f1, a.k.a. LATIN SMALL LETTER N WITH TILDE)