Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
It is hard to believe it was a year ago today that I asked What's up with handicapped parking in WA state?
(One misunderstanding I want to clear up about that post -- my hope that the folks parking in those spaces were contractors was not due to any bias against contractors; it was my hope that people who park like jerks won't be here long. Lots of contract resources at Microsoft are actually quite awesome and I am very hopeful it was a very small minority of them who are insensitive jerks!)
Anyway, today I was reminded about this post when I was pointed to the following article: Citizen crusader against scofflaws.
She is definitely a couple of levels above me in that she is actually doing more about the problem than just pointing it out -- she is getting these people ticketed.
The interesting part of the article for me:
For her efforts, she's been cursed and spat upon. Rowan University banned her from the campus. The public schools have forbidden her to step foot on their property during school hours.
Let me get this straight -- a university which has the responsibility to provide these spaces for people who have a medical need for them and which is clearly having trouble keeping up with the folks flouting the law is addressing the problem by banning from campus the person doing their job?
Hmmm.... you won't ever see me at Rowan University.
And school systems can't handle getting people to legally park?
I used to end up going to pick up my ex-fiance's daughter at school, and I have to tell you there is no way I could have managed it without that handicapped space. If someone were illegally parked there they'd be lucky if I just left them a note -- if I were in a bad mood I'd have called the cops, and if I were in a worse mood I'd just key their freaking car (luckily no one was ever parked there illegally!).
Anyway, do me a favor -- don't park in a handicapped space if you do not have the state-provided credentials.
And if you can't do that and feel the need to be dishonest here, then show the people, or the person, trying to keep you honest the appropriate amount of respect.
Or, in short, if you are being a jerk you should cut it out.
Mary Ann Cotrell is fighting the good fight.
Yes, according to the Slut-O-Meter, 'collation' has a 57.08% sluttiness rating.
And that is before THIS post makes it into the index; it may be worse soon....
Sorting out promiscuity, I guess.
Thanks (I think) to Mark Liberman for letting me know about this one? :-)
Earlier today, in response to my post [Localized] Date/Time format tokens, regular reader Serge Wautier commented:
Oddly enough, according to your screenshots, the Turkish translation seems to use the same letter (s) for hours and seconds!
Let's take a closer look:
Geez, he's right.
It gets worse if you look at all of the choices one has for date formats:
Why that is downright confusable, isn't it?
Hmmm... makes me wonder what happens if we look at the Turkish locale in Turkish (the above is looking at English (US) in Turkish):
Ok, that is a bit less confusable, isn't it? :-)
It also explains why a localizer may not notice the difference when they are reviewing what their change looks like -- how likely would they be to check other locales? And the wide range of them?
This kind of problem is a side effect of the wide range of possibilities that is hard to completely test. For another example, here is the Arabic user locale list -- note the appropriate use of parentheses:
Ok, now let's move over to the UI language list:
How do you say Oops! in Arabic again? :-)
Now the original issue with Turkish can certainly complicate string parsing, and if you try to change the order of the time parameters you will likely run into problems (luckily this is very seldom done!).
Perhaps it is a good thing that this is only a UI feature and not a programmatic one?
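To make the parsing worry concrete, here is a minimal sketch of why a shared letter is a problem if anyone ever tries to map a localized picture back to the real tokens. The table named HYPOTHETICAL_TR is an assumption made up for illustration: only the shared "s" for hours and seconds comes from the screenshots discussed above, and the letter used for minutes is a guess. Nothing here reads the actual strings from Windows.

```python
from collections import defaultdict

def find_collisions(localized):
    """Return the localized letters that stand for more than one real token.

    `localized` maps a real token letter (as understood by GetTimeFormat)
    to the letter shown to the user in the localized UI.
    """
    by_letter = defaultdict(list)
    for real_token, shown_letter in localized.items():
        by_letter[shown_letter].append(real_token)
    return {
        letter: sorted(tokens)
        for letter, tokens in by_letter.items()
        if len(tokens) > 1
    }

# English UI: every token keeps its own letter, so nothing collides.
ENGLISH = {"h": "h", "m": "m", "s": "s"}

# Hypothetical table: hours and seconds share "s" (as in the screenshots);
# the "d" for minutes is an assumption, not taken from Windows.
HYPOTHETICAL_TR = {"h": "s", "m": "d", "s": "s"}

print(find_collisions(ENGLISH))
print(find_collisions(HYPOTHETICAL_TR))
```

Any letter reported by find_collisions cannot be translated back to a single real token without extra context, which is exactly what makes reordering the time parameters risky.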
This post brought to you by "s" (U+0073, a.k.a. LATIN SMALL LETTER S)
When functions like GetDateFormat and GetTimeFormat work with format picture strings, there is a very limited set of tokens that is considered legal:
Date pictures:
Picture    Meaning
d          Day of month as digits with no leading zero for single-digit days.
dd         Day of month as digits with leading zero for single-digit days.
ddd        Day of week as a three-letter abbreviation. The function uses the LOCALE_SABBREVDAYNAME value associated with the specified locale.
dddd       Day of week as its full name. The function uses the LOCALE_SDAYNAME value associated with the specified locale.
M          Month as digits with no leading zero for single-digit months.
MM         Month as digits with leading zero for single-digit months.
MMM        Month as a three-letter abbreviation. The function uses the LOCALE_SABBREVMONTHNAME value associated with the specified locale.
MMMM       Month as its full name. The function uses the LOCALE_SMONTHNAME value associated with the specified locale.
y          Year as last two digits, but with no leading zero for years less than 10.
yy         Year as last two digits, but with leading zero for years less than 10.
yyyy       Year represented by full four or five digits, depending on the calendar used. Thai Buddhist and Korean calendars both have five digit years. The "yyyy" pattern will show five digits for these two calendars, and four digits for all other supported calendars.
yyyyy      Behaves identically to "yyyy".
gg         Period/era string. The function uses the CAL_SERASTRING value associated with the specified locale. This element is ignored if the date to be formatted does not have an associated era or period string.

Time pictures:
Picture    Meaning
h          Hours with no leading zero for single-digit hours; 12-hour clock.
hh         Hours with leading zero for single-digit hours; 12-hour clock.
H          Hours with no leading zero for single-digit hours; 24-hour clock.
HH         Hours with leading zero for single-digit hours; 24-hour clock.
m          Minutes with no leading zero for single-digit minutes.
mm         Minutes with leading zero for single-digit minutes.
s          Seconds with no leading zero for single-digit seconds.
ss         Seconds with leading zero for single-digit seconds.
t          One character time-marker string, such as A or P.
tt         Multicharacter time-marker string, such as AM or PM.
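For anyone who wants to see how those tokens behave without a Windows machine handy, here is a rough sketch of a picture expander. It is not how GetDateFormat or GetTimeFormat are implemented; it is a simplified stand-in that assumes English names, the Gregorian calendar, and AM/PM markers, ignores the gg token, and accepts date and time tokens in a single string even though Windows keeps them in separate functions. The names format_picture, DAYS, MONTHS and TOKEN are made up for the example.

```python
import re
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Quoted literals first, then each letter's longest run first, so that
# "dddd" is matched before "dd" and "d".
TOKEN = re.compile(
    r"'[^']*'|dddd|ddd|dd|d|MMMM|MMM|MM|M|yyyyy|yyyy|yy|y|"
    r"HH|H|hh|h|mm|m|ss|s|tt|t"
)

def format_picture(picture, when):
    """Expand a date/time picture string for an English, Gregorian date."""
    day_name = DAYS[when.weekday()]
    month_name = MONTHS[when.month - 1]
    hour12 = when.hour % 12 or 12
    marker = "AM" if when.hour < 12 else "PM"
    values = {
        "d": str(when.day), "dd": f"{when.day:02d}",
        "ddd": day_name[:3], "dddd": day_name,
        "M": str(when.month), "MM": f"{when.month:02d}",
        "MMM": month_name[:3], "MMMM": month_name,
        "y": str(when.year % 100), "yy": f"{when.year % 100:02d}",
        "yyyy": f"{when.year:04d}", "yyyyy": f"{when.year:04d}",
        "h": str(hour12), "hh": f"{hour12:02d}",
        "H": str(when.hour), "HH": f"{when.hour:02d}",
        "m": str(when.minute), "mm": f"{when.minute:02d}",
        "s": str(when.second), "ss": f"{when.second:02d}",
        "t": marker[0], "tt": marker,
    }

    def expand(match):
        text = match.group(0)
        if text.startswith("'"):
            return text[1:-1]  # quoted text is copied through without the quotes
        return values[text]

    return TOKEN.sub(expand, picture)

sample = datetime(2006, 3, 5, 14, 7, 9)
print(format_picture("dd/MM/yyyy", sample))    # 05/03/2006
print(format_picture("h:mm:ss tt", sample))    # 2:07:09 PM
```

The ordering inside the regular expression is the one design choice worth noticing: trying the longest run of each letter first is what keeps "MMMM" from being read as four separate "M" tokens.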
And it is easy enough to see them in Regional and Language Options:
Of course, if you have ever worked with a language version of Windows other than English, you may know that is only part of the story.
Many other language versions of Windows will have different letters defined for the format strings according to Regional and Language Options (for examples you can look here).
But note that this is not locale data, and there is no way to query GetLocaleInfo for what tags are being used in Regional and Language Options.
Now trial and error with the LoadString function being called on the Regional Options binary (intl.cpl) finds the tags rather easily:
Although this has not changed since the functionality was added to Windows, there is of course no guarantee that this will be true of future versions. Just in case you wanted to start using it, be sure to keep that fact in mind. :-)
Now it is hard to imagine the functionality going away since it would break all of the users who have been seeing those localized format strings for so long.
And it is perhaps even harder to imagine the format strings returned by GetLocaleInfo or consumed by GetDateFormat and GetTimeFormat changing either, since this would break all of the existing applications.
The "market" for using these localized format pictures is also pretty limited -- basically people writing Regional Options replacements. And how many people are really doing that, anyway?
The one time such a feature might be interesting is in a custom UI language, but at the point where such a thing as that is supported, these strings in Regional Options will of course be available to localizers just as they are today (hopefully with some instructions on how to translate these particular strings!).
Of course, not all languages would translate these particular token strings, though it is completely understandable why the English strings may be too confusing for some languages.
But this is one of those interesting features that is well known to many people who use the localized versions of Windows that no one else really knows about....
This post brought to you by "d" (U+0064, a.k.a. LATIN SMALL LETTER D)
Yesterday, someone named Mike asked me, via the Contacting Michael... link:
Hey there, I have a problem. I have a program that uses EM_SETCUEBANNER to add grey hint text to an edit control, something like "Enter search here".
After installing east asian language support however, this has stopped working. I'm doing a presentation in a week where I will need to specifically point out the cue banner, and show japanese text working...
Any help?
Thanks
Now as I point out in that contact link, I really am not Product Support, and I don't want people to think that they are going to get timely answers to support questions by using that link. If I posted stats on the number of them that I don't respond to you might think me heartless (unless you read them, in which case you would understand why I am not really going to be able to help people fix their Win98 installs or their wireless support).
In this particular case, the fact that it is an issue that affects internationalization and may well be of general interest to a whole bunch of people has kind of pushed me to make an exception....
I built a small test application to confirm that the issue does indeed exist. Then, being the impatient sort who realized that the complete loss of the EM_SETCUEBANNER message would kind of suck in Vista, where international support is always enabled, I took my small test application and ran it on Vista, a little bit scared....
Luckily, the problem has been fixed in Vista.
Unfortunately, there really is no workaround at the moment, in the shipping versions of Windows. The choice has to be explicitly made between language support and cue banners.
Which kind of explains the other reason that I made an exception in this case -- that communal sense of guilt that Microsoft employees sometimes feel when well-intentioned features conflict and cause a quite unintentional bug. Even when it was caused, discovered, found, and fixed before it was even known to such an employee....
Sorry about that, Mike. Even if something can be done here eventually, there is no way it can happen by next week. :-(
This post brought to you by "ጯ" (U+132f, ETHIOPIC SYLLABLE CHWA)
It was over a year ago that I pointed out in the post Keyboards: hardware vs. software how disconnected our team (which owns most of the keyboard layouts) and the hardware team (which owns most of the actual keyboard hardware) were.
And how impressive it was that we managed to be in sync so often, given that disconnect.
But it is possible I may live the rest of my life without being able to understand why almost every keyboard layout has a key which, when typed, will produce | (U+007c, a.k.a. VERTICAL LINE) yet printed upon the face of the key is ¦ (U+00a6, a.k.a. BROKEN BAR).
What's up with that?
It turns out that every single-byte Windows code page other than 874 supports U+00a6, and every single Windows code page bar none (pardon the pun) supports U+007c.
And just about every font that has one has the other.
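If you want to check the code page claim yourself, here is a small sketch that does it from Python. One big assumption: it consults the code page tables that ship with Python (cp874 and cp1250 through cp1258), not Windows itself, so treat it as a sanity check rather than proof. The helper names supports and missing_from are made up for the example.

```python
# The single-byte Windows code pages, by the codec names Python uses for them.
WINDOWS_SINGLE_BYTE = ["cp874"] + [f"cp{n}" for n in range(1250, 1259)]

def supports(codepage, char):
    """True if `char` can be encoded in `codepage` (per Python's tables)."""
    try:
        char.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

def missing_from(char):
    """List the code pages in WINDOWS_SINGLE_BYTE that cannot encode `char`."""
    return [cp for cp in WINDOWS_SINGLE_BYTE if not supports(cp, char)]

print(missing_from("\u007c"))  # VERTICAL LINE
print(missing_from("\u00a6"))  # BROKEN BAR
```

The Thai code page uses the byte that would otherwise hold the broken bar for a Thai letter, which is why it is the odd one out.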
Yet (to get back to keyboards) almost every keyboard prints one on the face of a key while the matching layout inputs the other.
So why this disconnect?
And more importantly, why does it persist?
And most important of all, why don't people complain? In either direction?
I suspect it is because no one really cares.
Or maybe it is just that two guys can walk into a bar. Even if it looks like it is broken. Since it turns out they may still be serving drinks....
This post brought to you by "|" (U+007c, a.k.a. VERTICAL LINE)
Do you know what UNICIDE is?
A) A typo, with the actual word that was supposed to be there being Unicode.
B) Clubbing someone to death with a unicycle.
C) A new cleaning product that will change your life.
D) Stabbing someone to death with the sharp edge of a Unicode character.
E) None of the above.
This post brought to you by "†" (U+2020, a.k.a. DAGGER)
Earlier today in the post On an upgrade, we maintain, I talked about how we have a very strong desire to retain a user's locale preferences on upgrade.
It may just be that extra three inches from the ground that would make people feel a little uncomfortable (see the post for more on that reference!), but we really want to keep the results as consistent as possible.
But at the close of the post I said there was an exception to this rule.
That exception?
People made many guesses and one even came close, but no one guessed it.
So I thought we could try and reason it out. And I will use The Poor Man Institute's approach to using SCIENCE to do so, described as follows:
SCIENCE is nothing to be afraid of - it is merely a method of inquiry which makes use of empirical data about the world and fits it into an abstract, predictive model. For example, suppose you ask me the question: “what is the volume of an average human being?” This is a very stupid and pointless question, exactly the sort of question I would expect someone like you would ask. Why do you care? If I refuse to answer your question, you may become violent, so I will attempt to do so, quickly, by making a few simplifying approximations. First, in order to make the math simpler, I will assume that the average person is a uniform sphere, 3 feet in diameter. Why, when I look at the problem that way, it turns out that I’m really quite extraordinarily tall and svelte! Indeed, I’m far too attractive a physical specimen to have to answer your damn fool questions, so I roll you out the door like a beachball full of cottage cheese and have the chicks from “Coyote Ugly” over for a week-long orgy. All thanks to SCIENCE!
So, by using these powerful analytic tools, let's see what we can find out....
Think for a moment about what are the types of problems that have really galvanized people's willingness to ignore their usual tendencies. I can think of three:
1) SECURITY: Obviously if there were some sort of security concern, there might be a reason to make a specific targeted change. There were, however, none of these related to locale settings. So that would not be it.
2) TERRORISM: Obviously the terrible events of 9/11 have been the source of many changes in the way people live their lives. However, the implied timeline of change would really predate the exception. So that would not be it, either.
3) Y2K: The average person probably thinks of this whole thing as the biggest wet firecracker in history, though even from my relatively uninvolved position I was indirectly exposed to tests which, had they not been run in order to find problems, would have resulted in disastrous situations. Just as Jonah was hated for his false prophecy (even though it was his warning that caused the people to clean up their act and thus be able to defeat the prophecy!), many consultants were not given the thanks they deserved.
(Of course many others bilked their clients and used scare tactics to make heavy profits, so it all evens out, I guess!)
In any case, it was indeed the Y2K issue that led to the exception being made. The short date format was automatically updated to use a four digit year on upgrade (using the special /U switch to intl.cpl's unattend format), as follows:
rundll32.exe shell32,Control_RunDLL intl.cpl,,/U
The general feeling of unease simply seemed to make such a change feel worthwhile as the year 2000 loomed near and a version of Windows was about to ship....
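To picture what "updated to use a four digit year" means for a short date picture, here is a minimal sketch of the transformation. I am not claiming this is what intl.cpl actually did during the upgrade (the post above does not say how it was implemented); widen_year is a made-up helper that simply widens any y, yy, or yyy run to yyyy and leaves quoted literal text alone.

```python
import re

# Either a quoted literal ('...') or a run of year letters.
PIECE = re.compile(r"'[^']*'|y+")

def widen_year(picture):
    """Return `picture` with short year tokens widened to four digits."""
    def fix(match):
        text = match.group(0)
        if text.startswith("'"):
            return text  # quoted literal text is not a token; keep it as-is
        return "yyyy" if len(text) < 4 else text
    return PIECE.sub(fix, picture)

print(widen_year("M/d/yy"))      # M/d/yyyy
print(widen_year("d MMM yyyy"))  # d MMM yyyy
```

A picture that already has a four digit year comes back unchanged, which fits the spirit of touching the user's preference as little as possible.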
So there you have it -- the exception that proves the rule, worked out with the help of SCIENCE! :-)
This post brought to you by "U" (U+ff35, FULLWIDTH LATIN CAPITAL LETTER U)
It may seem to you like deja vu all over again, but I am going to once again quote a bit of Mostly Harmless by Douglas Adams, just as I did in this post:
Ford Prefect hit the ground running. The ground was about three inches farther from the ventilation shaft than he remembered it, so he misjudged the point at which he would hit the ground, started running too soon, stumbled awkwardly and twisted his ankle. Damn! He ran off down the corridor anyway, hobbling slightly.
All over the building, alarms were erupting into their usual frenzy of excitement. He dove for cover behind the usual storage cabinets, glanced around to check that he was unseen and started rapidly to fish around inside his satchel for the usual things he needed.
His ankle, unusually, was hurting like hell.
The ground was not only three inches farther from the ventilation shaft than he remembered it, it was also on a different planet than he remembered, but it was the three inches that caught him by surprise. The offices of the Hitchhiker's Guide to the Galaxy were quite often shifted at very short notice to another planet, for reasons of local climate, local hostility, power bills or taxes, but they were always reconstructed exactly the same way, almost to the very molecule. For many of the company's executives, the layout of their offices represented the only constant they knew in a severely distorted personal universe.
Something, though, was odd.
This was not in itself surprising, thought Ford as he pulled out his lightweight throwing towel. Virtually everything in his life was, to a greater or lesser extent, odd. It was just that this was odd in a slightly different way than he was used to things being odd, which was, well, strange. He couldn't quite get it into focus immediately.
Upgrading Windows is a big deal.
I mean, an operating system is not the sort of thing that most people pay a lot of attention to; it is a foundation piece.
Buying a new copy of Windows and then installing it on a machine you already have can be a traumatic experience.
So one decision that was made a long time ago in Windows is that large parts of what is stored in the HKEY_CURRENT_USER section of the registry (which represents the user's settings) would survive the otherwise traumatic process of the upgrade.
Of course the user locale settings make up a small part of that preserved section.
Perhaps it is something that some people would not notice, especially in a situation where the whole world is changing that way. But in a way I guess we'd like to feel that if we did fail to preserve those settings, it could be a bit like Ford Prefect's situation with the unexpected location of the ground....
If you know what I mean? :-)
(Ok, now for a little Windows trivia!)
Of course there is one exception to this principle, one time that we did make a change to existing settings. Does anyone know what that one time is?
This post brought to you by "∂" (U+2202, a.k.a. PARTIAL DIFFERENTIAL)
Well, at the very least keep MSLU apps out of the VMWare shared folders? :-)
The other day, Brian asked in the microsoft.public.platformsdk.mslayerforunicode newsgroup:
I have a strange situation, seemingly involving MSLU on VMWare.
First, let me say that I have a fully unicode enabled program, using MSLU to run on 95, 98, and NT. It works well on all platforms, including Japanese, Chinese, and Korean 98, ME, and all versions of 2k and XP. I can display unicode characters in titles, menus, dialogs, and everywhere else needed, so I believe my application is built and linked correctly.
Now, with that said, here is my problem...
I am running a Windows XP VM as a guest OS in VMWare 5.5, with XP also as the host OS. My app runs fine if I run it from the desktop or a local folder in VMWare, but if I run my app from a shared folder in VMWare, I get garbled strings. It took me a while, but I've traced this down to MSLU. It seems that when running under a shared folder, my LoadUnicowsProc() function is called, the unicows.dll is loaded, and from then on, MSLU is trying to translate strings, with disasterous results. Symptoms range from garbage in strings to truncated window titles similer to one reported in this post:
http://groups.google.com/group/microsoft.public.platformsdk.mslayerforunicode/browse_thread/thread/97e1f5b8fe16617d/246a7f1e2e2cc65a?q=vmware&rnum=2#246a7f1e2e2cc65a
My understanding is that Unicows DLL should never have to be loaded on XP. I wonder what the difference is here? I use VMWare all the time, with excellent results even on Win95, 98, ME. Without the source to the MSLU Loader, it's a bit difficult to figure out why it would be trying to load the DLL in this situation.
Does anyone (MichKa) have even Pseudo-code for what the loader is looking for before loading the DLL? I imagine it's at least calling GetVersion() or GetVersionEx(), both of which seem to be returning the proper values.
The problem goes away when I do the following:
Copy the exe from the shared folder to the local machine.
Simply remove Unicows.lib from the link list and recompile
I've even replicated the problem in a scaled-down program containing only a WinMain. Let me know if you're interested in seeing it.
Any suggestions would be appreciated.
My response was of course grounded in Tester's Axiom #1, and said:
VMWare compatibility testing is not a scenario we ever covered, so of course there could be bugs (and it appears there are?).
The workaround -- keep apps out of the shared folder....
Brian did then respond back with more information, including info from the VMWare folks:
That would be fine, as my program is normally installed under "program files", except that the same problem occurs when a document is opened (by double-clicking in explorer) from the shared folder. The application, because of it's association with the document extension, starts up but its working directory is the shared folder and the problem still exists.
So, now the workaround is: don't put programs or documents in the shared folder. So, what's the shared folder for anyway!!
I brought this up here because it is unclear to me which piece is broken. There's obviously an undesirable interaction between VMWare and MSLU, and considering the popularity of both, it seems someone would be interested in fixing this. I just don't know who. I'm not sure how interested Microsoft would be to work with VMWare on a resolution. Looking at the VMWare forums, there are loads of similar issues, with symptoms in other apps that are identical to my own problems (running programs, opening documents). However, nowhere did I see anyone else track it down to an interaction with MSLU.
I opened the same issue with VMWare and got this response:
----------------------------------------------------------------
Dear Brian,
We are currently tracking this issue in Vmware Bug # 60617. I have added your comments to the bug and we will be in touch with you if we need any additional information. Unfortunately there is no current workaround or fix for this issue except to avoid using the shared folder as a working directory. You will be notified if there are any updates or workarounds for this issue.
----------------------------------------------------------------
Anyway, I just thought I'd bring it up. I know MSLU does some tricky stuff behind the scenes. Obviously, something it is doing is giving VMWare fits.
Now we actually went through a few iterations on methods to do this detection, which happens in both unicows.lib (the MSLU loader, used by C/C++ applications) and unicows.dll (MSLU, used directly by applications like VB that cannot use the loader).
(In this case, Brian is using an MSLU loader override, which means he is using unicows.lib. Getting the latest version of the .LIB may be helpful here, but of course knowing which version of the .LIB is being used is also a good idea)
There is no compatibility issue where a mismatch between .LIB version and .DLL version could cause problems, other than if one of the bugs that caused us to change methods came up, but that is kind of expected, I think. There is a benefit to using the latest version, as usual.
We also had the benefit of being produced by the Windows team within the Windows source tree, which gave us a lot of chances to see all of the good and bad ways that people tried to do this very thing in their own programs, both inside the Windows source and from many reports of applications inside and outside of MS....
Anyway, some more detail about MSLU....
A long time ago, MSLU used GetVersion to do its version checking. However, the ability added to XP that allows you to run an application as if it were a different version was able to raise a pretty huge set of problems.
So it became important to look for a way to detect the version that was not quite so subject to being changed by the whim of someone intentionally asking the OS to lie about its own version.
After all, only MSLU should be allowed to lie for MSLU apps, right? :-)
At this point, the Microsoft Layer for Unicode is itself in maintenance mode. So there is not a high chance that the .DLL will be able to be modified to fix such a problem. There is a bit more flexibility on the .LIB file, but more info would be needed from VMWare to be able to consider proceeding in such a direction....
This post brought to you by "±" (U+00b1, a.k.a. PLUS-MINUS SIGN)
I have definitely talked about digit substitution many times since I started with this blog.
And then I posted about my disillusioned realization that Uniscribe was simply not doing as much as it could in the post Digits -- there is no substitute.
Unfortunately, it gets worse.
The ScriptRecordDigitSubstitution function, while seeming perfectly innocent and useful, has an interesting note in its Platform SDK topic:
Note that context digit substitution is supported only in Arabic and Persian locales. In other locales, context digit substitution is mapped to no substitution.
You may think when you see this text that it was euphemistically talking about all Arabic script locales.
I could make fun of your idealistic misperception, but it's what I thought too, so that would not be in my best interests. Instead I'll just commiserate with you for a bit. :-)
That is right, the ScriptRecordDigitSubstitution function is using ConvertDefaultLocale (that 'internal' function I have talked about before) and then getting the PRIMARYLANGID from the result.
And then it uses whether that little dance returns LANG_ARABIC or LANG_FARSI to know whether to support 'context' style digit substitution (it also puts that value into SCRIPT_DIGITSUBSTITUTE.TraditionalDigitLanguage for your viewing pleasure).
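To see why this is narrower than "all Arabic script locales", here is a small sketch of the check as described above. It models only the PRIMARYLANGID step (the low 10 bits of the language identifier) in plain Python; it does not call Uniscribe or ConvertDefaultLocale, and the function names are hypothetical. The LANG_* values and sample LCIDs are the usual winnt.h and locale identifier values.

```python
# Primary language identifiers as defined in winnt.h.
LANG_ARABIC = 0x01
LANG_THAI = 0x1E
LANG_URDU = 0x20
LANG_FARSI = 0x29


def primary_lang_id(lcid):
    """Equivalent of the PRIMARYLANGID macro: keep the low 10 bits."""
    return lcid & 0x3FF


def supports_context_substitution(lcid):
    """Hypothetical model of the Arabic/Farsi-only test described above."""
    return primary_lang_id(lcid) in (LANG_ARABIC, LANG_FARSI)


# Sample locale identifiers: ar-SA, fa-IR, ur-PK, th-TH.
SAMPLES = {"ar-SA": 0x0401, "fa-IR": 0x0429, "ur-PK": 0x0420, "th-TH": 0x041E}

if __name__ == "__main__":
    for name, lcid in SAMPLES.items():
        print(name, supports_context_substitution(lcid))
```

Urdu is written in the Arabic script, but its primary language is LANG_URDU, so a test like this one leaves it out along with Thai.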
This is really kind of disappointing; just as with that post that ruined my idealistic view of digit substitution initially, Uniscribe is potentially ignoring a user preference -- like if I wanted Thai digits and context style substitution, shouldn't I be able to have that?
To look at it from a more "glass is half full" point of view for a moment, there is definitely a lot of room for improvement here in the future!
In the meantime, you can muck about a bit with the SCRIPT_DIGITSUBSTITUTE structure that ScriptRecordDigitSubstitution returns and modify it before calling ScriptApplyDigitSubstitution. You do not have complete freedom to fix everything, but you have the opportunity to help make up for some of the shortcomings of ScriptRecordDigitSubstitution I talk about here, at least.
This will not help with all of the problems, but at the very least it will work around this Arabic/Farsi only thing, and help meet the linguistic expectations of users, since we took the trouble to ask them to detail those expectations in Regional Options.
Which by the way might explain why I am so torqued about this problem -- we are the ones exposing the settings, which means it is the GIFT team that is sitting in the role of 'used car salesmen' who make promises that the car itself has no plan to deliver.
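To make the goal concrete, here is a toy model of what 'context' substitution with a non-Arabic digit set could look like: ASCII digits take the traditional form only when the most recent letter belongs to the chosen script. This is a sketch in plain Python and not the Uniscribe implementation; the function name, the script test based on character names, and the parameters are all assumptions made for illustration.

```python
import unicodedata


def context_digits(text, script_prefix, traditional_zero):
    """Replace 0-9 with traditional digits when the prior letter is in the script.

    script_prefix is matched against the Unicode character name (for example
    "THAI" or "ARABIC"); traditional_zero is the digit zero of the target set.
    """
    out = []
    in_script = False
    for ch in text:
        if ch.isalpha():
            # The latest letter decides the context for any digits that follow.
            in_script = unicodedata.name(ch, "").startswith(script_prefix)
            out.append(ch)
        elif "0" <= ch <= "9" and in_script:
            out.append(chr(ord(traditional_zero) + ord(ch) - ord("0")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Thai letter KO KAI, digits, a Latin letter, digits again.
    print(context_digits("\u0e01 12 a 12", "THAI", "\u0e50"))
```

The digits after the Thai letter come out as Thai digits and the digits after the Latin letter stay as they were, which is the behavior a user asking for Thai digits plus context substitution would presumably expect.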
But to put a positive spin on that, it is an even better motivator for change, at some point.
To put a slightly more positive spin on the whole situation, Avalon (a.k.a. Windows Presentation Foundation) does a much better job, in part due to the hard work of some of the same people who wrote the original Uniscribe 'logic' (to use a term loosely!). Which is I think good proof that we are getting better here! :-)
This post brought to you by "୧" (U+0b67, a.k.a. ORIYA DIGIT ONE)
The commutative property of addition clearly does not apply to combining marks in Unicode.
Or at least it is not supposed to.
I mean, A + B is not the same as B + A, in any situation where that order is meant to enforce how they are placed in relation to each other.
Anyway, regular reader Mike Dunn asked in the Suggestion Box about an exception to this:
After reading your post about putting lots of diacritics on a letter, I wondered what determines the order that they appear in.
I looked at the sequences 0065 0302 0303 and 0065 0303 0302 using Tahoma on XPSP2 in Notepad and Word 2000, and in both cases the diacritics appear in the same order (tilde above the circumflex). This is the right order for Vietnamese, but if I were writing IPA, I would want the circumflex on top. Can the order be changed with control characters?
That, my dear Mike, is an excellent question. One whose answer (now that you asked it) I was very curious about. Why do these two sequences (U+0065 U+0302 U+0303 and U+0065 U+0303 U+0302):
look the same, anyway?
Both U+0302 (COMBINING CIRCUMFLEX ACCENT) and U+0303 (COMBINING TILDE) have the same canonical combining class value -- 230, which means 'Above'. So there is no valid Unicode-type reason for them to re-order.
Now it is true that one sequence is encoded as a precomposed character in Unicode and one is not, but still!
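If you want to check the combining class claim yourself, Python's unicodedata module exposes the values. The sketch below is only a verification aid (the helper name is made up); it shows that both marks are class 230, that canonical ordering leaves both sequences alone, and, for contrast, that a mark with a different class (U+0323 COMBINING DOT BELOW, class 220) does get reordered.

```python
import unicodedata

CIRCUMFLEX = "\u0302"   # COMBINING CIRCUMFLEX ACCENT
TILDE = "\u0303"        # COMBINING TILDE
DOT_BELOW = "\u0323"    # COMBINING DOT BELOW, used here only for contrast


def order_survives_nfd(sequence):
    """True when canonical decomposition leaves the mark order untouched."""
    return unicodedata.normalize("NFD", sequence) == sequence


if __name__ == "__main__":
    print(unicodedata.combining(CIRCUMFLEX), unicodedata.combining(TILDE))
    print(order_survives_nfd("e" + CIRCUMFLEX + TILDE))
    print(order_survives_nfd("e" + TILDE + CIRCUMFLEX))
    print(order_survives_nfd("e" + CIRCUMFLEX + DOT_BELOW))
```

Marks with the same class are never swapped by normalization, so the order you typed is the order a renderer is supposed to honor.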
I was determined to find out what was going on.
Luckily, down the hall is the best freaking font team in the world, so all I had to do was head down the hall to ask somebody.
Hmmmm.... seems like a lot of people are out right now. I made it all the way down to Nick's office, where he was talking to Mushegh. Aha, maybe they would be able to help.
I started by apologizing to them, since although I do not consider them to be "the dregs" in any kind of quality sense, they ended up being treated as the dregs due to the distance between my office and theirs. They smiled, which I took as a good sign. And then I asked them about the above....
This is actually a known issue. It is a side effect of a bug in the way that the code was looking for precomposed forms (on the assumption that a precomposed version is more likely to look correct if it exists). The bug was causing precomposed characters with the wrong order for combining sequences to sometimes be found....
The good news is that Nick himself had checked in the fix for this bug in Vista, which now does things correctly:
It has not been backported to the prior versions of Windows, though that is the sort of thing which can of course be considered and triaged appropriately....
As for the other part of the question -- how to force the right behavior on the downlevel platforms -- there were not too many ideas forthcoming.
Obviously if you are building the font you decide what precomposed characters will exist in it -- you can even have none exist and rely on the attachment points and such to build up the right character.
If you are not doing the font building yourself, you would have to find a way to break up the sequences without changing the display, which can be a real challenge (no one thought of anything offhand).
One way that I did find was putting together U+1ebd U+0302 (LATIN SMALL LETTER E WITH TILDE and COMBINING CIRCUMFLEX ACCENT), although I found it would work in some fonts (such as Segoe UI) and not so well in others (such as Tahoma). See below if you have these fonts both installed:
ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂ẽ̂
If you do not have Segoe UI installed then it will not look good, so don't bother reporting that as a bug!
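As a side note, the U+1ebd U+0302 spelling is not a different string in any Unicode sense; it is what canonical composition produces for the tilde-first order, while the circumflex-first order composes all the way to a single character. The sketch below just prints the code points after NFC so you can see the difference; the helper name is made up for illustration.

```python
import unicodedata


def composed_code_points(sequence):
    """Return the NFC form of a sequence as a list of U+XXXX labels."""
    composed = unicodedata.normalize("NFC", sequence)
    return ["U+%04X" % ord(ch) for ch in composed]


CIRCUMFLEX_FIRST = "e\u0302\u0303"
TILDE_FIRST = "e\u0303\u0302"

if __name__ == "__main__":
    print(composed_code_points(CIRCUMFLEX_FIRST))
    print(composed_code_points(TILDE_FIRST))
```

Only the circumflex-then-tilde order has a fully precomposed character, which fits the description of a lookup for precomposed forms going wrong when the marks arrive in the other order.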
So anyway, I headed back to my office and decided to perhaps not just rely on office locations to decide where I visit first -- because sometimes the best people to talk to would otherwise be dismissed as the dregs, and neither Mushegh nor Nick qualify as the dregs in my book. :-)
This post brought to you by "ễ" (U+1ec5, a.k.a. LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE)
Regular reader Ivan Petrov asked in the Suggestion Box:
Hi Michael ;-)
Would you tell us something more about Microsoft Custom Locale Builder!?
Thank you in advance.
Regards, Ivan.
Now this is the sort of thing that people like Cathy and me (not to mention others) have been talking about for quite some time without giving specifics.
The very first time it was implied in even a vague sense in a public presentation, it was in a "far off future" category without naming it or anything else.
I can say that we have made progress since then. :-)
But beyond that, there are not currently any other details to share other than the ones that are already known -- e.g. based on custom cultures in .NET but can be used as custom locales in Vista.
I can also state that the name will not be the one that Ivan asked about, since the scenarios it is meant to cover cannot have such a self-limiting name. :-)
But I promise that when there is something that I am allowed to say, I will say it.
So stay tuned, and look to an example Shawn has given of what I think is going to be the more common sort of use for this feature....
Or even better, Shawn did go ahead and post a screenshot of the new tool in this post. So you can consider it something official now -- you can even see the name in the title bar of the screenshot.
This post brought to you by "Ḟ" (U+1e1e, a.k.a. LATIN CAPITAL LETTER F WITH DOT ABOVE)
The other day, Shou-Ching Schilling (LAM) asked me via email:
Hi, you have helped me with many keyboards questions before, so I thought you might know the answer to this one or know who else to contact.
I am doing some testing on Arabic keyboards. Sometimes the layout for ( ) and < > (or anything that have left and right version) are in the same order as in English in the label and sometimes they are not. Are there some rules or a different set of resources I can refer to?
It all starts because there is the idea of mirroring explained in UAX #9 (The Bidirectional Algorithm). Section 6, entitled Mirroring, goes as follows:
The mirrored property is important to ensure that the correct character codes are used for the desired semantic. This is of particular importance where the name of a character does not indicate the intended semantic, such as with U+0028 "(" LEFT PARENTHESIS. While the name indicates that it is a left parenthesis, the character really expresses an open parenthesis — the leading character in a parenthetical phrase, not the trailing one.
Note that in some contexts, some of the characters that have the mirrored property are sometimes not rendered with mirrored glyphs. A higher level protocol can limit mirroring action (rule L4) to a subset of those with the mirroring property. See also Section 4.3 Higher-Level Protocols. Except in such cases, mirroring must be done by an application of rule L4, to ensure that the correct character code is used to express the intended semantic of the character.
Implementing rule L4 calls for mirrored glyphs. These glyphs may not be exact graphical mirror images. For example, clearly an italic parenthesis is not an exact mirror image of another: "(" vs ")". Instead, mirror glyphs are those acceptable as mirrors within the normal parameters of the font in which they are represented.
In implementation, sometimes pairs of characters are acceptable mirrors for one another: for example, U+0028 "(" LEFT PARENTHESIS and U+0029 ")" RIGHT PARENTHESIS or U+22E0 "⋠" DOES NOT PRECEDE OR EQUAL and U+22E1 "⋡" DOES NOT SUCCEED OR EQUAL. Other characters such as U+2231 "∱" CLOCKWISE INTEGRAL do not have corresponding characters that can be used for acceptable mirrors. The informative Bidi Mirroring data file [Data], lists the paired characters with acceptable mirror glyphs. A comment in the file indicates where the pairs are "best fit": they should be acceptable in rendering, although ideally the mirrored glyphs may have somewhat different shapes.
Ok, so basically what it means is the glyphs will be expected to "flip" under some circumstances.
And they even give you a BidiMirroring.txt file in the Unicode Character Database, a simple data file you can use for the mirrorings.
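Reading that file really is the easy part. Here is a minimal sketch of a parser for its "code point; mirrored code point # comment" layout, run against a few sample lines typed in by hand rather than the real file (the sample text and function name are assumptions for illustration). The mirrored property itself is also available from Python's unicodedata module.

```python
import unicodedata

# A few lines in the layout used by BidiMirroring.txt (hand-typed sample).
SAMPLE = """\
# code point; mirrored code point # name
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
005B; 005D # LEFT SQUARE BRACKET
005D; 005B # RIGHT SQUARE BRACKET
"""


def parse_mirroring(data):
    """Build a character-to-mirror mapping from BidiMirroring-style text."""
    pairs = {}
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        source, target = (field.strip() for field in line.split(";"))
        pairs[chr(int(source, 16))] = chr(int(target, 16))
    return pairs


if __name__ == "__main__":
    mirrors = parse_mirroring(SAMPLE)
    print(mirrors["("], mirrors["["])
    print(unicodedata.mirrored("("), unicodedata.mirrored("a"))
```

Note that the table only tells you which character to swap in; it says nothing about when the swap should happen.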
Seems easy enough, right?
Well, if you just said 'yes' then you probably have not thought too much about the consequences of characters that will simply flip depending on the context of what is around them.
As one learns growing up, peer pressure of that sort is seldom easy.
But okay, I'll take you at your word.
Now let's add keyboards to the mix.
(many of the examples I give below use the Hebrew keyboard, but the same basic issues come up with the Arabic, Persian (a.k.a. Farsi), and Urdu keyboards, ignoring the multilingual nature of the Hebrew layout with its uppercase English)
If you go up to the Windows Keyboard Layouts site to look at the Hebrew keyboard, you will get a small dynamic layout that can be used to display the following five available "shift" states:
Note first how the square brackets [ and ] (U+005b and U+005d) actually seem to flip on the keyboard layout depending on whether you are in the English or the Hebrew "mode" of the layout?
Well, remember that what is displayed will entirely depend on the context of what is around it, and then try to type a word like שלום surrounded by parentheses.
Incidentally, that is the word for hello, goodbye, and peace in Hebrew. By the time you play with this for a bit you will not know what is coming or going on this keyboard and you will want to be left alone in peace. Which makes it a great word for our current purposes.
If you are using a US keyboard you would type A K U O to get the word (just to save some of the experimentation). So now, armed with all of this knowledge, try to type the following in Notepad:
שלום (שלום) שלום
Then for giggles, flip the reading order and see what it looks like. Then start over with the other reading order and try to type it again.
And this is an easy word since you get a completely Hebrew context around the parentheses when you are done. Imagine what would happen if you had to type something that ended with a parenthesis....
It is amazing how little what is painted on the faces of a Hebrew keyboard layout has to do with what appears to be typed. While this may be something that a native typist in a language can understand, it is clearly learned behavior as there is no way on earth to consider any of this to be intuitive.
Especially considering the fact that the parentheses ( and ) (U+0028 and U+0029) are only on the "English" shift states of this keyboard so you don't even get the behavior approaching intuitive that some might argue the brackets have.
Imagine if that Ultimate Keyboard were more than fictional -- if it is hard typing certain types of punctuation with a static keyboard, imagine how much harder it would be to handle one based on a constantly changing one -- especially at the end of text. I hope that it would stay stable, for that reason....
This of course indirectly answers some related non-fictional questions around the OSK (On-Screen Keyboard) and the Tablet PC Soft Keyboard. You probably would not want to try to make them change whether the mirrored or unmirrored glyph should appear, based on what would be about to be typed.
Perhaps you disagree. Hey, no worries, people often disagree with me.
So how do you determine which glyph to show, since you are so sure I am mistaken? :-)
You will find that you are basically re-implementing the Unicode Bidirectional Algorithm so you can tell what level you are currently at. For any text, any time a cursor is inserted somewhere, any time you are using a Hebrew, Arabic, Persian, or Urdu keyboard.
Oh wait -- what about when you are typing parentheses from the French keyboard in the middle of Arabic text? I guess you had better make this happen at all times, for all keyboard layouts.
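To give a taste of what that involves, here is a heavily simplified sketch of just one piece: deciding whether a mirrored character at a given position would be shown flipped. It only models the "take the direction of the surrounding strong text, else the paragraph direction" idea for neutrals and the mirroring step; it treats digits as neutrals and ignores explicit embeddings, overrides, and the bracket pair handling, so it is nowhere near a conforming implementation. All of the function names are made up for illustration.

```python
import unicodedata


def strong_direction(ch):
    """Return 'L' or 'R' for strong characters, None for everything else."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):
        return "R"
    return None


def resolved_direction(text, index, base):
    """Simplified neutral resolution; base is the paragraph direction."""
    own = strong_direction(text[index])
    if own:
        return own
    # Nearest strong character on each side; text edges use the base direction.
    before = next((d for d in map(strong_direction, reversed(text[:index])) if d), base)
    after = next((d for d in map(strong_direction, text[index + 1:]) if d), base)
    return before if before == after else base


def shows_mirrored(text, index, base):
    """True when a mirrored character ends up in a right-to-left run."""
    is_mirrored_char = bool(unicodedata.mirrored(text[index]))
    return is_mirrored_char and resolved_direction(text, index, base) == "R"


SHALOM = "\u05e9\u05dc\u05d5\u05dd"
MIXED = "abc (" + SHALOM + ")"

if __name__ == "__main__":
    for base in ("L", "R"):
        print(base, shows_mirrored(MIXED, 4, base), shows_mirrored(MIXED, 9, base))
```

Even in this toy version the same parentheses flip or stay put depending on the reading order and on text that has not been typed yet, and a keyboard display would have to recompute that on every keystroke and cursor move.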
How comfortable would you be trying to create such a system?
And then ask yourself how comfortable people would be trying to type in such a situation, again?
I am going to go lie down for a bit, my head is hurting and I need some שלום.
This post brought to you by "﴾" (U+fd3e, a.k.a. ORNATE LEFT PARENTHESIS) (As you may have guessed, the ornate parentheses are not mirrored. If you ever meet me in person feel free to ask me why!)
The other day, Tracey asked me:
hi, i was searching the internet for how to buy bulk /cases of trader joe products when i came across your site. im so totally addicted to their fruit laces and organic fruit leathers. at 27 cents each they are a real bargain compared to other stores. but i hate driving 40mins to gedt them each week or two and having to feel like a fool while they count up 30 of each flavor. they always laugh and say i must have a real sweet tooth. i embarrass easily i guess but anhow it would be so much easier to just pay at once for a whole case. so my question is how do u go about ordering from them? through the company or direct to the store location?
Cool, another Limonata-related post!
I actually call ahead and ask them to set aside the number of cases I want, and it works quite well. I don't even have the excuse of a 40-minute drive since they are just a few minutes down the road, and they never give me a hard time.
But they do their orders every day and they'll set aside what you ask them to. In the absolute worst case, if you call after they have made the daily order, you will have to wait two days instead of one.
In my experience in both Washington state and California they are always quite helpful. :-)