Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Where were you at 10:54am on February 28, 2001?
It was a Wednesday, if that helps you remember.
Or, if you are more event oriented, it was the date and time that the "clocked at 6.8" Nisqually intraplate earthquake happened.
I was at work, in Building 9 at Microsoft.
I had come in early that day because I was excited about finding the fix for a bug in the Microsoft Layer for Unicode on Win9x Systems, which was going to be announced soon. In fact Cathy and I got the approval to not turn in our slides for our stodgy-sounding Unicode on Downlevel Windows talk at the 18th Internationalization and Unicode Conference in Kowloon Bay (Hong Kong). We broadly hinted that it was something we couldn't talk about yet but that they wouldn't regret sticking with us (our slides were technically late at that point, but since both of us were on the committee for the conference we had some ability to influence and plead to not be replaced by a backup talk).
But alas, I digress.
Anyway, I was working on this bug that I had found a fix for, and wanted to make sure it wouldn't cause performance problems.
Going on with me personally, the Multiple Sclerosis was (really just in the few months prior) starting to have a more marked effect on my balance, as I had been moving from an "occasionally falls down" place to a more "depends on the cane to not fall down all the time" place. And I had moved from mere disequilibrium (where I wouldn't feel unsteady but the ground would suddenly come up at me) to a more overt feeling of unsteadiness that I could no longer ignore but was doing my best to not pay attention to.
People in the hallway were making noise all of a sudden. So I grabbed my cane and got up to investigate.
Everyone described what was happening, with all of us pretty much poking our heads out of our office doors, basically standing in our door frames because of some vague notion that this would be safer.
We were a bunch of n00bs when it came to earthquakes, and all of my previous seismic experience (time spent in Japan and in Southern California) usually involved my being somewhat intoxicated and/or romantically entangled, so it wasn't like I had much to add anyway.
I didn't feel anything different, though.
My world had been going topsy turvy all the time now. Though I did have one comment on the matter that I made to the people looking out into the hallway:
Now you know, now you know what it's like to live in my brain.
We had no injuries; the epicenter was far away from us.
And we did all get back to work after that, and much later when an "emergency procedure" manual showed up in all our offices, even the page explaining what to do in case of volcanic eruption (call reception, don't leave the building or touch lava) had a slight edge to it beyond the obvious humor, since if we could feel an earthquake did lava seem so very out of the question?
Of course those manuals are gone, so we have no idea what to do In Case of Lava....
But alas, I digress again.
Anyway, there you go -- my experience of the Nisqually intraplate earthquake of 2001. It was good it happened late enough that people were around, or I may not have even noticed the ~46 seconds in which everyone in the Pacific Northwest got a little bit of what it was like to be me....
You're welcome, of course.
Ever since I wrote About the Y1C problem, which really isn't too much of a problem (except maybe in North Korea)...when I said I'd probably say a word about the Japanese calendar, I knew I'd eventually be saying something about the Japanese calendar.
The fact is that the Microsoft implementation of the Japanese Imperial calendar has always bothered me. For many different reasons.
Thus this new series....
Now my dislike is not because of the principal issue I mentioned in Long live the Emperor, a not-as-uncommon-as-it-should-be complaint about the calendar that I hear from non-Japanese developers who, despite being the type of nerds who would never truly take violent rhetoric to the level of suggesting Microsoft predict the date of the death and/or end of rule of the Emperor, nevertheless unknowingly advocate for the Emperor to leave office.
The fact that the problem continues to crop up is a reflection on the people making the complaint, not of the calendar itself.
And also not because of the principal issue I mentioned in Y oh Y does YYYY sometimes mean YY, you ask?, because although it can sometimes be confusing, the hypothetical need to support eras of over 100 years is proven wrong rather readily.
In both cases, perceived inconsistencies that in fact mirror actual usage of the calendar do not bother me; they only bother people too caught up in the technical issues to understand or even notice those usage issues.
And yet the problems I do have with this calendar are in their own way kind of brought up in both articles.
For a hint, I'll suggest thinking about the frustrations I have with the Umm al-Qura calendar of Saudi Arabia that I discuss in Long term planning is not always done.
If one recognizes some of the limitations of using a religious calendar like the Hijri one to deal with civil matters to such a degree that you create a whole new calendar like Saudi Arabia did here, it seems almost irresponsible to have the period it supports be so short that common scenarios like machine certificates and such will fail when you try to use such a calendar.
In the case of the Japanese calendar, my concern is not really on Japan though. It is on Microsoft.
It just seems ridiculous that when we have millennia of attested data (and even more unattested, legendary data) on the Japanese monarchs (e.g. see the List of Japanese monarchs in Wikipedia) that we limit ourselves to the last four eras (平成, 昭和, 大正, and 明治).
Why wouldn't we stretch this out a little further than that?
Like at least to where the historical Gregorian years are reasonable to use....
I mean, it is more than just ironic that you can use the GregorianCalendar to refer to dates from before that calendar existed, yet you cannot use the JapaneseCalendar to refer to dates that were well within its range.
The design of the calendar, however, which considers the current era to be #4, leaves no architectural room for the up to 121 previous eras for the up to 117 previous rulers.
I suppose they could work around this by taking advantage of the fact that a System.Int32 is being used for the era-related methods and use negative numbers for the ones prior to "era #1" though this is less than ideal for a bunch of reasons. If I ever chose to fix this bug I'd create a new class and do it right from scratch.
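To make the architectural point a little more concrete, here is a minimal sketch (in Python, not the actual .NET implementation) of an era lookup driven by a table of start dates. The names `ERAS` and `to_era` are made up for illustration, and the Meiji start date in particular is an assumption, since sources differ on how to map it to a Gregorian date.

```python
from datetime import date

# Era start dates (Gregorian) for the four eras named above, newest first.
# The Meiji start date is an assumption made for this sketch.
ERAS = [
    ("平成", date(1989, 1, 8)),
    ("昭和", date(1926, 12, 25)),
    ("大正", date(1912, 7, 30)),
    ("明治", date(1868, 9, 8)),
]

def to_era(d):
    """Return (era name, year within era) for a Gregorian date."""
    for name, start in ERAS:
        if d >= start:
            # The first (partial) calendar year of an era counts as year 1.
            return name, d.year - start.year + 1
    # Mirrors the limitation discussed above: nothing before the table starts.
    raise ValueError(f"{d} is before the earliest era in the table")
```

Because this sketch identifies eras by their position in a table rather than by a fixed number, supporting earlier eras would only mean adding rows. That is exactly the kind of room the "current era is #4" design does not leave.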
But there are other reasons I dislike our implementation of the JapaneseCalendar class, which I will continue to talk about in part 2....
Over in the Suggestion Box, Geoffrey Coram asked:
I'm the "lead developer" for the e-mail application nPOPuk. I recently added some code to help a Russian user on a Windows CE machine: apparently, the KOI8-RU codepage is not installed on WinCE, so messages with charset="koi8-ru" were thoroughly corrupted.
So now I'm thinking about my app, which comes in Unicode and ANSI versions, and wondering:
1) Is there an easy way for the ANSI version to tell the user, "hey, you typed a Unicode character in the message body"? I typed some Cyrillic in the window, and when the app sent WM_GETTEXT, it got back a bunch of question marks.
2) The user can select the charset (UTF-8, ISO-8859-1, KOI8-R, etc.) for sending the message; is there an easy way to tell the user, "hey, there are characters in your message that aren't available in the charset you selected"?
I suppose I could (a) stop compiling the ANSI version and (b) force all messages to be UTF-8, but that seems draconian.
This is one to take one piece at a time.
At least in the body of email messages, the ability to have text that uses different random encodings that the mail client will support has been a long-standing principle.
Not every message coming to an email client is limited to a single encoding that it understands.
Of course whether one extracts the content via RTF functions or HTML functions or some other means will largely depend on the client, though generally HTML seems to be the one that all mail clients support to some extent.
Although in theory you could support an email using any encoding using HTML, in fact there is a device-based limit on mobile devices, since a given device may not be able to convert/parse/display every encoding.
If you are using Platform Builder to build an image for a device, you may have more flexibility here but even that only goes so far. Messages using KOI8-RU and other such code pages can suffer here, and there really isn't a good answer in the platform (though if it is a limited number of code pages one could just ship the tables for a few others)....
As to that second question, in order to read it in an app, you can just try to convert it to Unicode one way or another. If you are unable to do that conversion, you can definitely warn the user that you are unable to parse the text....
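Both checks can be sketched in a few lines. This is a minimal illustration in Python rather than the Win32 code an app like nPOPuk would actually use, and the function names `unencodable_chars` and `try_decode` are hypothetical. The idea is the same either way: attempt the conversion and treat a failure as the signal to warn the user.

```python
def unencodable_chars(text, charset):
    """Return the distinct characters of text that charset cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

def try_decode(raw, charset):
    """Return the decoded text, or None if the charset is unknown
    or the bytes are not valid in it."""
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return None
```

For example, `unencodable_chars("Привет", "iso-8859-1")` lists every Cyrillic letter, while the same call with `"koi8-r"` returns an empty list, so the app would only need to warn in the first case.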
THE MONGOLIAN (CYRILLIC) LANGUAGE INTERFACE PACK FOR WINDOWS 7 IS LIVE!
You can download the file for the 32-bit version or the 64-bit version.
Like the Turkmen LIP, it does not currently have a download page for reasons that are not the fault of any of the people I respect and which I don't feel like getting into....
The Mongolian Windows 7 LIP is produced as part of the Local Language Program sponsored by Public Sector.
A LITTLE BACKGROUND INFORMATION ON MONGOLIAN:
Number of Speakers:
Cyrillic script: ~2,500,000
Mongolian script: ~3,000,000 (difficult to obtain reliable figures on it)
Name in the language itself:
Монгол хэл
Khalkha Mongolian is the national language of Mongolia and is known to 90% of the population. It is now written using the Cyrillic alphabet, although in the past it was written using the Mongolian script. An official reintroduction of the old script was planned for 1994, but this has not yet taken place as older generations encountered difficulties (the lack of consistent and widespread support on computers has also had some impact here).
Fun Fact:
There is a tradition of giving names with unpleasant qualities to children born to a couple whose previous children have died, in the belief that the unpleasant name will mislead evil spirits seeking to steal the child. Muunokhoi 'Vicious Dog' may seem a strange name, but Mongolians have traditionally been given such taboo names to avoid misfortune and confuse evil spirits. Other examples include Nekhii 'Sheepskin', Nergüi 'No Name', Medekhgüi, 'I Don't Know', Khünbish 'Not A Human Being', Khenbish 'Nobody', Ogtbish 'Not At All', Enebish 'Not This One', Terbish 'Not That One'. This tradition is one that is also familiar in other cultures such as in ethnic Judaism.
Click here for more information on the Mongolian Language.
Classification:
Mongolian belongs to the Mongolic languages. The delimitation of the Mongolian language within Mongolic is a much disputed theoretical problem, one whose resolution would probably require a set of comparable linguistic criteria for all major varieties.
Click here for more information on the Mongolian classification.
Script:
As previously stated, in Mongolia (a.k.a. Outer Mongolia) the Cyrillic script is used, while in China (a.k.a. Inner Mongolia) the traditional Mongolian script is used. The literacy rates have been much higher in Mongolia since the switch to use the Cyrillic script, due to a large push to increase literacy. The fundamental change (from a "top to bottom, right to left" script to a "left to right, top to bottom" script) and the fact that the latter is so much easier to support in technology may be leading to the same problems that other "formerly vertical scripts" have seen, though sufficient formal study is currently lacking.
Click here for more information on the use of the Cyrillic script with Mongolian, and click here for more information on the use of the Mongolian script.
Microsoft-specific:
Mongolian is yet another locale for which we made the wrong decision in its LOCALE_SNAME/CultureInfo.Name value. By choosing mn-MN rather than mn-Cyrl-MN, we were left, when adding another Mongolian locale, with deciding whether to name it with a consistent yet different-script mn-CN or an inconsistent but more accurate mn-Mong-CN. We went with the latter, but in retrospect we should have added the script to the first one too, since more than one script is used for the language. Lesson learned. :-)
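To show why the missing script subtag matters to code that consumes these names, here is a small Python sketch that splits a locale name into its parts. It is deliberately simplified (it is not a full BCP 47 parser, and `split_locale_name` is a hypothetical helper), but it makes the inconsistency visible: one Mongolian locale reports a script and the other does not.

```python
def split_locale_name(name):
    """Split 'language[-Script]-REGION' into (language, script, region).

    Simplified on purpose: a 4-letter part is taken to be a script
    and a 2-letter part after the language is taken to be a region.
    """
    parts = name.split("-")
    language = parts[0]
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif len(part) == 2 and part.isalpha():
            region = part.upper()
    return language, script, region
```

With this, `split_locale_name("mn-Mong-CN")` gives `("mn", "Mong", "CN")` while `split_locale_name("mn-MN")` gives `("mn", None, "MN")`, so any caller that wants to know the script of mn-MN has to know out of band that it means Cyrillic.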
I spoke about some of the technological challenges previously in Looking at life a bit more vertically, for a moment.... As with other scripts that have large technical challenges in user interface usage (e.g. Tibetan), I find myself uncomfortable with the impact that technology may well be having on the long term directions of growth. In the case of Mongolian, the fact that (for example) every version of Microsoft Access from the last ten years, with the Mongolian Baiti font that now ships in Windows, can do things like the following for both display and input controls:
may start helping the future of Mongolian where it is applicable, even if the UI issues in Windows itself are not addressed.
Enjoy!
THE WINDOWS 7 TURKMEN LANGUAGE INTERFACE PACK IS LIVE!
You can download the 32-bit version right over here and the 64-bit version right over here.
It does not currently have a download page for reasons that are not the fault of any of the people I respect and which I don't feel like getting into....
It can be installed on Windows 7 SP1 (you must have SP1 installed!) with either Russian or English resources, and either 32-bit or 64-bit (just pick the right download, of course!).
The Turkmen Windows 7 LIP is produced as part of the Local Language Program sponsored by Public Sector.
A LITTLE BACKGROUND INFORMATION ON TURKMEN:
Number of Speakers:
4 million
Name in the language itself:
türkmençe
Turkmen is the national language of Turkmenistan. It is spoken by approximately 3 million people in Turkmenistan, and by approximately 380,000 in northwestern Afghanistan and 500,000 in northeastern Iran.
Like other Turkic languages, Turkmen is characterized by vowel harmony. In general, words of native origin consist either entirely of front vowels (inçe çekimli sesler) or entirely of back vowels (ýogyn çekimli sesler). Prefixes and suffixes reflect this harmony, taking different forms depending on the word to which they are attached.
Click here for more information about the Turkmen language.
Turkmen belongs to the group of South Turkic languages in the Turkic branch of the Altaic language family. It shares this group with Turkish and Azerbaijani.
Click here for more information about Turkmen classification.
Turkmen only started to appear in writing at the beginning of the 20th century, when it was written with the Arabic script. Between 1928 and 1940 it was written with the Latin alphabet, and from 1940 it was written with the Cyrillic alphabet. Since Turkmenistan declared independence in 1991, Turkmen has been written with a version of the Latin alphabet based on Turkish.
Click here for more information about the Turkmen script.
It is unclear whether it was intentional or not, but despite being based on the Turkish alphabet, the Turkmen locale on Windows does not do Turkic casing. If this is wrong, someone should tell us so we can fix it some day.
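For anyone unsure what "Turkic casing" means here, this is a minimal Python sketch of the Turkish-style rule, where dotted i pairs with İ (U+0130) and dotless ı (U+0131) pairs with I. The helper names are hypothetical, and whether this rule is actually appropriate for Turkmen is exactly the open question above.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı

def upper_turkic(text):
    """Uppercase using the Turkish-style i/İ pairing."""
    # Map i -> İ first; the default rules would turn it into plain I.
    return text.replace("i", DOTTED_CAPITAL_I).upper()

def lower_turkic(text):
    """Lowercase using the Turkish-style I/ı pairing."""
    # Map I -> ı and İ -> i first; the default rules would get both wrong.
    text = text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    return text.lower()
```

Under the default rules `"inçe".upper()` produces `"INÇE"`, whereas `upper_turkic("inçe")` produces `"İNÇE"`. A locale that "does not do Turkic casing" behaves like the former.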
Sometimes locale data bugs can go a long time without anyone reporting the bug to Microsoft.
The many potential reasons for this observable fact may be an interesting topic to blog about someday.
Interesting by my definition, that is. By which I mean I could actually enjoy writing it!
But not today.
Today, I'm going to talk about an interesting subgroup of bugs in locale data.
First, we're going to travel to scenic Afghanistan. Where Dari is the most spoken language, by a long shot.
We have the Dari locale prs-AF, which we created a Language Interface Pack for in Windows 7 (I mentioned this in I Dari you! Heck, I Double Dari you!).
Now the Dari status as "the last Windows 7 LIP" is quite interesting, actually.
Because a predecessor LIP, also aimed at Afghanistan, was Pashto (ps-AF) -- which had a similar "LIP" status, as I mentioned in an earlier blog (The last XP LIP? We'll head it off at the Pas[hto]).
Now we didn't end up doing a Pashto LIP for Vista or Windows 7, so in some ways one could think of the Windows 7 Dari LIP as an XP Pashto LIP replacement.
However, if you found yourself nostalgic about Pashto going away, don't worry; a little bit of Pashto lives on in Dari.
The Dari locale has the Pashto month names!
As you can see here by looking at them both in the Locale Builder:
(you can click on the images to see bigger versions of them)
Now this is something I have had reported to me from a couple of different sources, including someone who responded to me by email after my Dari LIP blog.
But this problem is not unique to Afghanistan.
Or to month names.
For an example, you can head to Nigeria.
We just shipped a few Windows 7 LIPs for Nigeria, as I mentioned in In Nigeria? With these three LIPs out, maybe Windows 7 was your idea!.
Well, if you look at the ig-NG (Igbo) and yo-NG (Yoruba) locales, you may notice something.
They share more than just a country.
They share day names!
They are the Yoruba day names, showing up in both locales. I only say that for the sake of completeness since I'm sure you all spotted the fact that these aren't Igbo names already and just hadn't sent the mail yet reporting the bug. Right?
(once again, you can click on the images to see bigger versions of them)
Now on the bright side, for both of these cases, in the community of people who are using computers, knowledge of both of the languages in question (at least enough to know the day names or the month names) is not as uncommon as it might be in some places.
And that might be part of the reason for the mistake here.
It is an interesting one, either way. Why would it be missed?
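One way this class of bug could be caught mechanically is to look for locales whose name lists are identical. Here is a minimal Python sketch of that check. The function name is hypothetical, the data is made of placeholder strings rather than real day names, and `xx-XX` is an invented locale used only as a contrast.

```python
from itertools import combinations

def shared_name_lists(locales):
    """Return sorted pairs of locale names whose name lists are identical."""
    return [
        (a, b)
        for a, b in combinations(sorted(locales), 2)
        if locales[a] == locales[b]
    ]

# Placeholder strings, not real day names; "xx-XX" is not a real locale.
sample = {
    "ig-NG": ["day-1", "day-2", "day-3"],
    "yo-NG": ["day-1", "day-2", "day-3"],
    "xx-XX": ["other-1", "other-2", "other-3"],
}
```

Here `shared_name_lists(sample)` returns `[("ig-NG", "yo-NG")]`. A hit is only a candidate for review, of course, since some languages legitimately share names and someone who knows the language still has to make the call.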
So you may have noticed that back in October I blogged Unicode 6.0.0 is [virtually] released!.
Hopefully no one was holding their breath waiting for everything to get finished and written up. Because at the end of last week, the full release happened. :-)
From the release notification:
Mountain View, CA, February 17, 2011 - The Unicode® Consortium is pleased to announce the publication of the final text of the core specification for Unicode 6.0. The Unicode 6.0 core specification includes information on scripts newly encoded in Unicode 6.0, as well as many updates and clarifications to other sections of the text. The release of the core specification completes the definitive documentation of the Unicode Standard, Version 6.0.
In Version 6.0, the standard grew by 2,088 characters. Over 1,000 of these characters are symbols used for text exchange on mobile phones. The Unicode Standard now also includes the recently created official symbol for the Indian rupee. After computers and mobile phones update to Version 6.0, the rupee sign will be available for use like the $ or € now.
In addition, this version adds many CJK Unified Ideographs in common use in China, Taiwan, and Japan, as well as characters for African language support, including extensions to the Tifinagh, Ethiopic, and Bamum scripts. Three scripts are supported for the first time: Mandaic, Batak, and Brahmi.
In October of 2010, the other portions of Unicode 6.0 were released: the Unicode Standard Annexes, code charts, and the Unicode Character Database. This allowed vendors to update their implementations of Unicode 6.0 as quickly as possible.
For more information on all of The Unicode Standard, Version 6.0, see http://www.unicode.org/versions/Unicode6.0.0/
You can think of this as the "close parenthesis" to end the phrase....
For the record, I'm not claiming Ski Johnson is behind the scam in question; thus far he can mainly be seen to be guilty of lying to some promoters. He is the face of these things, though.
You know, I had a feeling that tonight was gonna be a good night, that tonight was gonna be a good good night.
I was going to be attending a Black Tie Gala Fundraiser for Cancer @ the W Hotel in Seattle.
Jazz For Life Foundation and the American Cancer Society were going to bring us an evening of entertainment and charity. Special appearances by many celebrities, like Michael Douglas. And James Earl Jones. And the Seattle Seahawks. And The Housewives of DC. And The Housewives of Beverly Hills. And a host of other celebrities, including Grammy nominated Ski Johnson -- one of the mainstays of Jazz for Life.
This guy:
There would also be a live auction there, with Items from the Vampire Diaries, a Signed book by the Kardashians, a scooter from Ducati Seattle, a Diamond Necklace by Diamonds by Maria, and much much more. With an open bar, LIVE auction, dancing, appetizers, celebrities from around the nation and Seattle’s top socialites, public figures, and more, this was the kind of event I hadn't ever really been to in Seattle before.
In LA? Sure. But not in Seattle.
My friend Doug and his wife were going to be going, with VIP tickets. But it turned out they would be out of town so he offered the tickets to me. I snapped them up quickly. I with Ellen, Pam with Evan. We were gonna have a lot of fun. I figured I'd buy something at the auction, at least -- so I'd be helping out the ACS myself, too.
The story kept growing outward, in concentric circles....
Free tickets were going to be given to people with really inspirational cancer stories.
But then things started happening.
Like they changed the venue just weeks before the event (to the Grand Hyatt). Hardly unprecedented, but certainly out of the ordinary.
And the announcements of new celebrities and new items up for auction were getting more and more outrageous.
Then the wheels started to come off the wagon.
As KOMO reported on Wednesday, it turned out that the celebrities whose presence had been promised were not going to be coming.
Most of the sponsors have bailed, as well as the donors. The American Cancer Society was indeed to be a recipient, but for a wide variety of reasons (including percentage of the money going to charity?), they were not a sponsor, or a host.
And when asked to comment about all of it, Ski had apparently not been entirely truthful about the celebrities, even if he was telling the truth now. Something that no one really believed at this point anyway.
How is it that they describe cancer -- immature cells with sporadic behavior? Exactly.
And there were lots of other shenanigans, like the need the promoters had to have expensive items to be auctioned sent to them on the East Coast, just before the event. Even from local Seattle sponsors.
And I have been hearing worse things from others, including some of what the "W" folks have said off the record. They are mum now on paper, but also relieved to have gotten out from under this travesty.
Most of the sites that previously talked about the event have now marked it as canceled. And the note has been removed from the Jazz For Life site, too (the hotel still says there will be an event, though reportedly Paypal has frozen the money in the JazzForLife account so that the refunds can happen and there is no longer a way to buy tickets -- as if anyone would at this point!).
Though Jazz For Life did take down their info on this event, they do list similar events from the past on their site, and I can't help wondering if those were also fraudulent along similar lines.
Their own version of the event on Facebook (here) still lists the event as being at the W. Too bad there is no way to buy tickets, huh? :-)
Somehow I doubt Ski will even show up. It is almost tempting to pop by and see if anything at all happens now that the rest of the sponsors and donors have pulled out.
Early on I almost wrote a blog about the whole phrase "Black Tie Gala Fundraiser for Cancer" being silly since of course no one would fight for cancer.
But I decided that would be kind of immature, and let it go at a few comments to colleagues at work about the language issue -- like isn't it great that someone is finally taking up cancer's cause?
Silly, I know.
Ironic that Ski Johnson, Jazz For Life, and others may have done a lot for cancer, as people put in their claims to PayPal to get their money back and don't end up donating anything to the American Cancer Society.
Doug put his claim in for his refund. He and his wife will donate the refunded money directly to ACS and I hope more people do that.
As for me and Ellen and Pam and Evan? We're still gonna do something tonight.
And I gotta feeling that tonight’s gonna be a good night. That tonight’s gonna be a good good night.
Now that Ski isn't scamming us anymore, at least.
Blogs like When the roof got raised, and why and Number format and currency format are not always the same and Why does the percent stuff have so many restrictions? (the former two talking about the growing pains involved in extending locale support as new languages brought new requirements years ago, and the latter talking about a limitation documented here that is architecturally fixed in Windows 7 and may one day get its data fixed if we are lucky) point out that NLS is a reactive business.
We have something out there, it turns out to not be enough, and so things are changed. Enhanced. Stretched. Modified.
Other times, it is silly to touch things at all. There are times that a language has a similar concept that is different enough that trying to make it work within existing support by "fixing" it just makes no sense.
Like for one thing, consider LOCALE_S1159 and LOCALE_S2359, the per-locale AM and PM indicators.
In a language like Bangla (ref: Even in India, the language is actually known as Bangla (not Bengali)), we have the following set in the locale:
LOCALE_S1159 পুর্বাহ্ন
LOCALE_S2359 অপরাহ্ন
If you know Bangla you might see the problem here.
Let's look at these two words in the larger context in which they exist:
This is a multi-part problem, of course.
Now in general terms, for someone in Bengal or a Bangla-speaking part of Assam or Bangladesh, a term from that table along with a time is the kind of thing one would want in a time format.
One would not generally do so much with AM or PM after the time in these places.
I emailed with friend Omi Azad about it for a bit and he confirmed that the use of these terms would simply be more intuitive; forcing everyone into the 12 hour clock we use with these two less than perfect terms is far from ideal.
The folks in India and Bangladesh are not alone here, either -- Malay has a similar issue (they would use pg for the morning, tgh for 12 to 4pm, ptg for 4-7pm, and mlm for after 7pm) which has the same problem when it comes to adding it to our time format notions.
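To give a feel for what people end up writing themselves, here is a minimal Python sketch that maps an hour to the Malay periods just listed. The function names are hypothetical, and several details are assumptions made for the sketch: everything before noon is treated as pg, the 4pm and 7pm boundaries fall into the later period, and the marker is placed after a 12-hour time.

```python
def malay_day_period(hour):
    """Map an hour (0-23) to the abbreviated Malay day period."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if hour < 12:
        return "pg"    # morning (assumed to cover everything before noon)
    if hour < 16:
        return "tgh"   # 12 to 4pm
    if hour < 19:
        return "ptg"   # 4 to 7pm
    return "mlm"       # after 7pm

def format_time(hour, minute):
    """Format as a 12-hour time followed by the day period."""
    hour12 = hour % 12 or 12
    return f"{hour12}:{minute:02d} {malay_day_period(hour)}"
```

So `format_time(16, 30)` gives `"4:30 ptg"`. The thing to notice is that the two-slot AM/PM model has nowhere to store four markers, which is why this lives in application code rather than in the locale data.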
By its very nature this would be a much bigger change, making the architectural investments to support:
Here in the US we have such terms though I can't say I'd expect them in a formatted time string.
Even after confirming with Ben and Shihab and Omi and Goldie that some or all of these terms are used, it is still not entirely clear to me whether they would be expected in a long time format, or whether instead this conceptual jump is due to Bangla people moving to the nearest conceptual analogue that they have to our AM/PM and identifying it, since AM/PM wouldn't naturally occur to them if it isn't exactly how they would look at the world.
But since a similar construct is used in the US and other places, this new architecture would make sense, as would going out and trying to get all the data for it across all those locales.
Though obviously this would be pretty unlikely at this point.
Bengalis who wanted such a mechanism for time formatting are probably going to have to keep writing their own code, alongside a 24-hour clock.
Or go back in time 10-15 years and make the case then, of course.
Okay, let's assume that change is not going to be heading our way.
There is another problem with those AM/PM strings, one that appears to exist in our Bangla fonts. I was running into it while researching this issue in my elementary "learning Bengali" books, and when I started describing my troubles Omi pointed it out. In his words:
হ ্ ন is currently হ্ন but has to be হ্ণ
হ ্ ণ is currently হ্ণ but has to be হ্ন
So when the font is fixed they will look like পুর্বাহ্ণ & অপরাহ্ণ
So the idea is that the HNA and HNNA conjuncts in the Bengali fonts are perhaps reversed?
If he's right that would explain the trouble I was having.
I was going to check with Goldie too, but she is in Mexico and asking her to be typing in Bengali script seems like a little much. I'll wait til she gets back to ask her....
In the meantime, I'm wondering how many people might be typing words the wrong way to get the right appearance, and how much that might muck around with search in the meantime.
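The search worry is easy to demonstrate, because the two conjuncts are different code point sequences no matter how a font draws them. Here is a small Python sketch that spells the sequences out; the variable and function names are just for illustration.

```python
import unicodedata

HA, VIRAMA, NA, NNA = "\u09b9", "\u09cd", "\u09a8", "\u09a3"

def describe(text):
    """Return the Unicode character names for each code point in text."""
    return [unicodedata.name(ch) for ch in text]

ha_na = HA + VIRAMA + NA     # the HNA conjunct
ha_nna = HA + VIRAMA + NNA   # the HNNA conjunct

# Normalization does not unify the two, so a plain search will not either.
same_after_nfc = (
    unicodedata.normalize("NFC", ha_na) == unicodedata.normalize("NFC", ha_nna)
)
```

Here `describe(ha_na)` ends in `BENGALI LETTER NA` and `describe(ha_nna)` ends in `BENGALI LETTER NNA`, and `same_after_nfc` is `False`. So if someone types the "wrong" letter to get the right appearance, text that looks identical on screen will not match a search for the correctly spelled word.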
This had me thinking about an extensive discussion I had six years ago with someone from Ethiopia about the fact that they did not have time zones but had a different notion that they used to describe time, one that amounted to something with many of the same effects and was related to how they thought of time compared to when the sun was up (given that Ethiopia is reportedly the hottest place in the world year round I can easily imagine they would have such a mechanism!).
Maybe I'll ask Scott Hanselman if he has any thoughts about that issue.
And now I am wondering how much of the data in our locales is trying to map what people want onto an architecture imperfect for representing what people use -- causing our locales to kind of "speak with an accent" the way a person might speak with an accent because he is using the phonemes he grew up with while speaking a language with different phonemes....
It's an old joke, but perhaps a few of you haven't heard it before....
An English professor wrote the words:
A woman without her man is nothing.
on the chalkboard and asked his students to punctuate it correctly.
All of the males in the class wrote:
A woman, without her man, is nothing.
All of the females in the class wrote:
A woman: without her, man is nothing.
One thing is perfectly clear, though:
Punctuation is powerful.
Perhaps it is a bit hasty to pass that NORM_IGNORESYMBOLS flag?
Though on the other hand if I pass NORM_IGNORESYMBOLS | NORM_IGNORECASE it has the benefit of allowing me to tell all of the kids in the class that they are wrong, and flunking them.
Just don't forget that the space is also a symbol.
awomanwithouthermanisnothing
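For the curious, that string can be reproduced with a rough Python approximation of what ignoring symbols and case does. This is not how Windows implements NORM_IGNORESYMBOLS | NORM_IGNORECASE. The sketch assumes that dropping every character in a Unicode punctuation, symbol, or separator category and then case-folding is close enough for the joke, and `loose_key` is a hypothetical name.

```python
import unicodedata

def loose_key(text):
    """Drop punctuation (P*), symbols (S*) and separators (Z*), then fold case."""
    kept = [ch for ch in text if unicodedata.category(ch)[0] not in "PSZ"]
    return "".join(kept).casefold()

males = "A woman, without her man, is nothing."
females = "A woman: without her, man is nothing."
```

Both `loose_key(males)` and `loose_key(females)` come out as `awomanwithouthermanisnothing`, so under this comparison the two answers are equal and everyone gets the same grade.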
Welcome to Thailand, everyone! We didn't need the spaces anyway....
You get my point. Everyone is always so quick to ignore stuff that may be interesting or important. We could all do with a little bit less of that.
It's one of the reasons the support for collation in Windows is so unsuited for search: its only choices are either to see all of the distinctions or to ignore them so completely that a user is almost punished when they are specifically looking for those distinctions. What search really needs is both -- to be willing to ignore distinctions of all sorts but to never forget they are there, and to prefer them when you see them....
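Here is a minimal Python sketch of that "both" behavior: match loosely, but rank candidates that preserve the user's exact distinctions first. The names are hypothetical and the loose key (dropping punctuation, symbols and separators, then case-folding) is only a stand-in for real collation.

```python
import unicodedata

def loose_key(text):
    """Drop punctuation, symbols and separators, then fold case."""
    kept = [ch for ch in text if unicodedata.category(ch)[0] not in "PSZ"]
    return "".join(kept).casefold()

def search(query, candidates):
    """Return every loose match, with exact matches ranked first."""
    wanted = loose_key(query)
    hits = [c for c in candidates if loose_key(c) == wanted]
    # False sorts before True, and sorted() is stable, so exact matches
    # move to the front while the rest keep their original order.
    return sorted(hits, key=lambda c: c != query)
```

Given the two punctuations of the sentence above as candidates, searching for the colon version returns both, with the colon version first. The distinctions are ignored for matching but still remembered for ranking.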
I had someone ask me
I don’t have a big picture on the whole process of enabling a new language, even though I know a little here or there. For my learning purpose, do you have something that I can read to teach myself?
Now there really isn't an explicit single place that I know of where such info is kept, so I took a similar item on my "blog request list" and promoted it to right now so you can read the response right here. :-)
Now these steps are going to be described in a narrative since that is how my blogs often work, but the actual process is done by different people and the order often reflects the built-in multitasking that any multi-person project can bring to the mix. So don't think of this blog as providing an ordered recipe or directions.
Here we go!
STEP ONE: The Reading
The most basic level of enablement is the display of text, which means a font that has the glyphs for the language's letters in it.
I wouldn't really claim that the language was fully enabled or anything, but if I can read documents in it when I explicitly choose that font then it is a good first step.
And this step enables the reading, which is great. But the next step is another crucial one on the way to full enablement:
STEP TWO: The Writing
Put simply, there needs to be a keyboard or an IME.
Other methods exist like handwriting recognition and speech recognition, but those tend to show up much later in the lifetime of a language's support in computers. So for present purposes we can assume a keyboard or an IME.
The quality of an input method is a judgment I would usually make on a more global basis (since if it is a part of Windows it is available for use to a frightening number of people), but for the purposes of this blog on language enablement, I'll say that the perceived quality of an input method to an individual customer is directly proportional to how easy they find it to use.
I'll get into that issue further another day, I just mention it here so people can keep in mind how much the fundamental process of language enablement is sabotaged at its root if any of these first three steps is messed up.
Of course it is also worth noting that these first two steps can be done by anyone, without even getting real help from Windows. Microsoft and many third parties have provided tools to help with both fonts and keyboards.
But in the context of Microsoft being the one doing the enabling, we should start talking about the things Microsoft can do for enabling a language that go beyond these things.
STEP THREE: Underlying Rendering Support
Even that basic display needs a lot more behind it to do seemingly simple scenarios automatically like
So the proper rendering support via Uniscribe and DWrite and in some cases GDI font linking is important. The only thing cooler for language display than having a good font is not having to choose that font explicitly, so skipping this third step is ill advised.
Obviously there are other little items in this step like adding the Unicode character names to Character Map that don't affect rendering per se but definitely make working with fonts and characters easier.
Additionally, once the next two steps are done, additional rendering support can be expanded to handle features like digit substitution or font linking based on system locale or writing system differences implicit in different locales, and so on. All of the various pieces have to be in place for these last few fancy items to work properly, no matter how much of the work is done earlier in preparation for when the step appears to people using the system.
STEP FOUR: Underlying Script Support in NLS
This step may already have been done ages ago if the script was already supported and all the requisite characters have their properties in the OS tables, but often times new languages that Windows has never supported might require whole scripts or specific individual characters that have just been added to Unicode to be added to the system as well.
It is easy, but important, to have this support.
STEP FIVE: Underlying locale support in NLS
I have a colleague who is grimacing at the way I have added "locales" to a discussion of "language enablement" but the next part of the enablement process involves sorting and date formats and calendars and language names, and so on. And all of these items are stored on a per locale basis. So that person is likely just going to have to suck it up and get over it.
STEP SIX: The Localization
Now this step has many substeps within it, but for the moment we'll treat it as one big chunk.
Once all of this support is there, people can start seeing the user interface itself making use of the enabled language!
Ok, so there we go.
Now looking at the steps I gave:
At which point would you declare a language to be enabled?
There are several teams that consider enablement to happen once their work is done, especially when other steps aren't planned.
In most cases Microsoft in general won't claim they have enabled a language unless a supportable chunk of steps 1-5 are present.
But how much support is relative: not every language is intended to go through every step.
There are even languages that were originally intended to go through the whole series of steps that ran into problems along the way; at that point all of the support can be yanked out but in many cases the partial support will be left in and shored up so that proper support is what will be seen.
If you look at Windows you can probably find languages essentially stopped at each of these steps.
In fact, anyone who can name one language that only goes so far as each step will win the prize today!
Allow me to intentionally misquote a West Wing episode I enjoyed:
Every once in a while, every once in a while, there's a bug report with an absolute right and an absolute wrong, but those reports almost always include blue screens of death. Other than that, there aren't very many un-nuanced bug reports in writing a blog that's way too big for ten words. I'm the author of Sorting it all Out, not the author of the readers who agree with me.
That was fun. :-)
It is true that I often start blogs with "simple questions" which turn out to have complicated answers.
And when the answers are simple they usually are simple in a bad way: like the word NO.
This is one of those blogs....
The question, as you have probably gathered, was simple:
Customer wants to automatically use UTF-8 when saving files with Notepad instead of ANSI by default.
The answer is indeed that no, it isn't possible. This default is hard-coded into Notepad.
They made the decision in 1993 when Notepad was added to NT 3.1, and have stuck to their guns -- even after UTF-8 support was added in 1998-1999.
Sorry.
Now as a workaround, you could try the following:
But in the end, there is no way to keep Chloe on the Wolves' Highway. That might be why she was shot and killed by ranchers.
And why users will seldom follow the directions here, either....
International domain names -- one of those times that we really are all in this together, a time that "I don't have time to fix this" really isn't a good answer.
I figured I should talk about that for a bit....
So anyway, the question I got from a rather anxious developer via email the other day was:
I have a lot of code that depends on functions like getaddrinfo, getnameinfo, gethostbyname, and gethostbyaddr. How do I get them to support internationalized domain names?
The answer is both simple and complicated.
Complicated because the answer could (in theory) be very different depending on whether the server is on the intranet (where one would use UTF-8) or the Internet (where one would use Punycode).
And complicated because there isn't a whole lot of infrastructure to have the system figure out which is which and which to use in native code (the managed story is a little better here but it has its own pitfalls; I will cover those another day).
For now I'll just talk about the intranet story (the Internet story will be for another another day).
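To make the UTF-8 versus Punycode distinction concrete, here is a small sketch of what the same name looks like in each form. The assumptions: it uses Python's built-in "idna" codec (which follows the older IDNA 2003 rules) rather than anything in Windows, and bücher.example is just a made-up host name for illustration.

```python
host = "bücher.example"  # hypothetical internationalized host name

# Intranet-style: the name travels as UTF-8 bytes.
utf8_name = host.encode("utf-8")

# Internet-style: each non-ASCII label becomes ASCII-compatible Punycode.
punycode_name = host.encode("idna")

print(utf8_name)      # b'b\xc3\xbccher.example'
print(punycode_name)  # b'xn--bcher-kva.example'

# The Punycode form decodes back to the original Unicode name.
print(punycode_name.decode("idna") == host)  # True
```

Same name, two very different sets of bytes on the wire, which is why it matters so much to know which kind of server is on the other end.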
The most important step, one that is pretty much universally a good design practice for many reasons but especially here, is to move off the non-Unicode functions like the ones our anxious developer named. If one has anything outside of ANSI (or even ASCII in some cases), the Unicode (or UTF-8) versions are required here, as the following table points out:
Function you should be using instead
DnsQuery_W (or DnsQuery_UTF8)
DnsValidateName_A -> DnsValidateName_W (or DnsValidateName_UTF8)
DnsNameCompare_A -> DnsNameCompare_W (or DnsNameCompare_UTF8)
DnsHostnameToComputerNameA -> DnsHostnameToComputerNameW
GetAddrInfoA
getnameinfo -> GetNameInfoW
GetNameInfoA
GetAddrInfoExA -> GetAddrInfoExW
gethostbyname -> GetAddrInfoW
gethostbyaddr
WSAAsyncGetHostByName
WSAAsyncGetHostByAddr
WSALookupServiceBeginA
Now as luck would have it, deciding whether to use the "W" version of the function or the UTF-8 version (for the functions that support both) is pretty simple -- just use whichever format you have the text in already.
And as further luck would have it, for just about all of the functions on this list, the replacement is easy and straightforward for the call itself. Of course you may need to move the code to use Unicode, and it's important to not just convert it from the CP_ACP or whatnot (otherwise you haven't really fixed anything!), but that's not too bad.
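A quick sketch of why converting from the ANSI code page at the last minute doesn't really fix anything. The assumptions here: Python's cp1252 codec stands in for CP_ACP on a Western European system, and the host name is made up for illustration.

```python
name = "例え.example"  # hypothetical host name with characters outside cp1252

# Squeeze the name through an ANSI code page (cp1252 standing in for CP_ACP),
# then convert it "back" to Unicode just before the call.
ansi_bytes = name.encode("cp1252", errors="replace")
round_tripped = ansi_bytes.decode("cp1252")

print(round_tripped)          # ??.example
print(round_tripped == name)  # False -- the damage was done before the call

# Keeping the text in Unicode end to end loses nothing.
utf16_bytes = name.encode("utf-16-le")
print(utf16_bytes.decode("utf-16-le") == name)  # True
```

Once the characters have been turned into question marks, calling the "W" function with them is just a more modern way of looking up the wrong name.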
You can think of this first step as the most obvious part of all of the work. I'll get into some of the more complicated aspects in the future, with maybe some additional fun details related to Active Directory to make things really interesting (that will be on yet another another day -- or with a topic like AD more than one other day!).
Now once you start getting into the EAI side (i.e. the email side) it gets both insanely simple and insanely complicated too. But eventually, on some other another other day (once again multiple other days, most likely), I'll hit that topic too.
The truism that explaining a joke allows you to get it, though it doesn't give you the funny, has a lot of truth to it.
In this case, if you didn't watch M*A*S*H you may have trouble discerning what is behind the riddle of the title of this blog.
It involved an episode where Margaret "Hot-Lips" Houlihan, while drunk, reveals:
I probably shouldn't be telling you this, but Frank Burns is a lipless wonder.
And as with all jokes that must be explained, the people who now get it don't think it is very funny.
You can think of the issue as an extension of ELKs aren't roaming where the servers are from over five years ago.
Because although it is true that ELKs -- the infrastructure data added to XP to support the creation of LIPs -- was [mostly] added to Windows Server 2003 (I gave the full list in Why Bengali keyboards can't be found on XP 64 bit), the truth is that all of this ELK insertion work into what essentially represented the server code base, cool as it was, did represent a specific type of ELK, one not found in nature.
A LIP-less ELK.
We don't release LIPs for Windows server products.
We get questions on this all the time, questions like this one:
Good morning all,
My customer has the following question about a language pack for Windows server 2008 R2 that does not appear on the list of supported language pack in the technet link below. Is this list authoritative or is there another way I can find this language pack for them. Any assistance would be greatly appreciated.
I am having no luck finding an o.s. language pack for Windows Server 2008 R2 Hindi language. Am I overlooking something? I also looked out here and it doesn’t look like there is one for Window Server 2008 R2 operating system: http://technet.microsoft.com/en-us/library/dd744369(WS.10).aspx
Regards
This person got a quick answer, even if it was not the answer being sought:
WS08R2 didn’t release a LP in Hindi.
Here are the languages we did release LPs for: http://www.microsoft.com/downloads/details.aspx?FamilyId=03831393-eef7-48a5-a69f-0ce72b883df2&displaylang=en
English, German, Japanese, French, Spanish, Chinese Simplified, Chinese Traditional, Korean, Portuguese (Brazil), Russian, Portuguese (Portugal), Dutch, Swedish, Polish, Turkish, Czech, Hungarian, Arabic, Danish, Norwegian, Finnish, Hebrew, Greek, Thai, Ukrainian, Romanian, Slovakian, Slovenian, Croatian, Serbian Latin, Bulgarian, Lithuanian, Latvian, Estonian
Now the answer was slightly off target since it was about Language Packs, not LIPs. But as we know everyone makes that mistake (for more info on that issue, see When terminology affects satisfaction and Out of touch? No, just out of scope...), so I won't dwell on that part, so much. But the truth is that for those LP languages, server resources are translated.
Of course, the truth is that from a customer standpoint, this separate treatment of client and server is one that is seldom fathomed. It appears to be arbitrary and pointless, especially in a world where there are people who tend to use the server product as their client.
At one time, the standard developer desktop for Office developers was indeed Windows Server 2003, a fact that at one point got in the way of the ability of Office to work with ELKs and try out LIPs. I have no idea if an updated policy like this still exists over in Office, but either way it serves to underscore the fact that this is a real phenomenon, even inside Microsoft. And one that can affect job productivity, no less!
Now there are many problems with the notion of a LIP on a server.
First of all, a principle behind LIPs of "translating the most visible UI" fails in interesting ways since many visible UI pieces that are server-specific aren't translated.
And then there is the scenario.
If you don't accept the "server as client" scenario's premise, then it is quite easy to consider the principle of making computers more available to people who don't speak one of the major languages we handle fully (like English or German or Japanese) to be way out of scope for the server family of products.
But maybe it is an incorrect assessment to deny the premise.
I mean, I often run the server as a client because I don't tend to go for the fancy client features like Aero and glass and UI transitions, and those features are off by default in server. So it takes less time to get the machine in the state I want it if I start from server.
I don't know for sure but that probably biases me on the issue. Does anyone else have an opinion about the "running the server as a client" scenario?
In many cases it isn't the question that is complicated; it is the impact of surrounding features that make the answers so complicated!
A while back, the question that was asked was:
Hello,
I’m trying to get the file version for mshtml.dll at runtime, when I call GetFileVersionInfo (from shell\osshell\version\filever.cpp) with either the full path (C:\windows\system32\mshtml.dll) or just the binary name, it always gives me: 8.00.7600.16385 (win7_rtm.090713-1255) – the IE8 RTM version, regardless of which TP (test pass, think of it as a service pack for IE) is on the machine.
According to the filever tool, the version I actually have on my machine is:
>filever /v C:\Windows\System32\mshtml.dll
<snip>
FileVersion 8.00.7600.16625 (win7_gdr.100629-1617)
How can I get GetFileVersionInfo give me this version?
Thanks!
Okay, the way the question was asked was complicated this time, too. :-)
But at its simplest level the question was just "how do I get the version number?" since the wrong answer was (apparently) being returned.
Anyone want to take a guess as to what might be going on, what might cause a[n apparent] lie to be told here?
Hint: Ask yourself why I might care about the answer here as a way to figure out what the answer might be....