December, 2011

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    We call 'em ShortestDayNames for a reason; they're as short as they're EVER gonna be!

    Once upon a time, Windows had two kinds of day names.

    There were the actual day names, which could be queried via the LOCALE_SDAYNAME* constants, and eventually the DateTimeFormatInfo.DayNames Property.

    And there were the abbreviated day names, which could be queried via the LOCALE_SABBREVDAYNAME* constants, and eventually the DateTimeFormatInfo.AbbreviatedDayNames Property.

    The second group was in theory more suited to calendars and such, where space could be quite constrained.

    Now when I say abbreviated day names, I should have put it differently.

    And called them "abbreviated" day names, instead.

    Or perhaps <air quotes>abbreviated<air quotes>, if you know what I mean!

    Because for many locales, the informants giving us data just shrugged and responded simply:

    We don't abbreviate these!

    Not such a great story for mythical calendars, mind you, but their principal argument drowned out the theoretical scenario quite handily.

    After all, we weren't in the localizable calendar business back then!

    We did have a MonthCal common control. And a DTPicker common control. But their built-in calendar "grid" pieces were not localized (and incidentally only supported Gregorian) -- a long drawn out point of contention, as I mentioned in WinForms DateTimePicker and MonthCalendar do not support culture settings.

    It made abbreviated names pretty freaking theoretical!

    Until Vista, that is.

    That is when we saw the updates described in my blog New in Windows Vista: updates to clock and calendar.

    Underneath those changes, we realized that the LOCALE_SABBREVDAYNAME* constants simply would not do. And there was no algorithmic way to reliably create shorter ones.

    Thus were born the LOCALE_SSHORTESTDAYNAME* Constants, and of course the DateTimeFormatInfo.ShortestDayNames Property.

    But before finally settling on those names, it is important to point out the original name.

    From an evidentiary standpoint, the defense lawyer might object by claiming this would be circumstantial but A.D.A. Ben Stone would say "goes to state of mind".

    The judge would agree.

    Way too much Law & Order in my past, obviously -- but the idea of Ben Stone trying my Law & Order: World-Readiness Unit is too seductive to ignore!

    We called them:

    LOCALE_SONELETTERDAYNAME*

    initially, and

    LOCALE_STWOLETTERDAYNAME*

    when data started coming in that was often two letters and people started complaining on technical and usability grounds about the name. We even entertained

    LOCALE_SUPERSHORTDAYNAME*

    at one point!

    And when the request for data went out to all of our subsidiary program managers, we gave the name prominent placement so our intent would be obvious.

    But sometimes they would simply copy and paste the full day names, saying

    We don't abbreviate these!

    And our answer was unambiguous and unchanging, in the manner of "tough love":

    Look, we've wanted the MonthCal to be localizable/localized for years now, and we are doing that. But in order for it to succeed, we need very very short day names.

    You are welcome to ignore the directive, but in that case we are going to truncate the names for you.

    Consider this data request to be your chance to decide how things will be truncated!

    Some locales realized how dangerous that would be. For Hebrew, with day names like יום ראשון and יום שני and so on, where truncation would make them look the same, they realized the risk of trying to call our bluff!
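
    To make the truncation risk concrete, here is a minimal sketch in Python. It is not anything Windows itself does: the function name and the sample lists are purely illustrative, and the two Hebrew names are just the ones mentioned above. It cuts each name to a fixed number of characters and reports which names end up looking identical.

```python
from collections import defaultdict

def truncation_collisions(names, width):
    """Group names that become identical when cut to `width` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:width]].append(name)
    # Keep only the truncated forms shared by more than one name.
    return {short: full for short, full in groups.items() if len(full) > 1}

# Illustrative data only, not pulled from any locale database.
hebrew = ["יום ראשון", "יום שני"]
english = ["Sunday", "Monday", "Tuesday", "Wednesday",
           "Thursday", "Friday", "Saturday"]

print(truncation_collisions(hebrew, 3))    # both names share one prefix
print(truncation_collisions(english, 1))   # S and T are each shared by two days
print(truncation_collisions(english, 2))   # no collisions at two letters
```

    Slicing here counts code points, which is good enough for these samples but would need more care for scripts that rely on combining marks.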

    Some of them continued to push back, insisting that they needed two letters or maybe even three in select cases, e.g. for Hungarian:

    H K Sze Cs P Szo V

    and we shrugged and said fine, though on occasion they may find themselves truncated. But we knew that in calendars even the truncations would be okay in such cases, since no user would ever expect calendars to re-order their day names!

    When you get down to it, in a calendar grid, no English speaking user will find a top row of:

    S M T W T F S / M T W T F S S

    vs.

    Su Mo Tu We Th Fr Sa  /  Mo Tu We Th Fr Sa Su

    to be completely incomprehensible, unless they don't know what a calendar even is!
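
    The two header rows above can be produced by a very small routine. The following Python sketch is only an illustration (the function, its parameters, and the hard-coded list are assumptions, not an actual MonthCal implementation): it rotates a list of day names to the chosen first day of the week and clips each name to the available width.

```python
def header_row(names, first_day=0, width=2):
    """Rotate to the first day of the week, then clip each name to `width`."""
    rotated = names[first_day:] + names[:first_day]
    return " ".join(name[:width] for name in rotated)

# Hypothetical data: index 0 is Sunday, matching the examples above.
shortest = ["Su", "Mo", "Tu", "We", "Th", "Fr", "Sa"]

print(header_row(shortest))                        # Su Mo Tu We Th Fr Sa
print(header_row(shortest, first_day=1))           # Mo Tu We Th Fr Sa Su
print(header_row(shortest, first_day=1, width=1))  # M T W T F S S
```

    The order never changes apart from the rotation, which is why even a clipped row stays readable in a grid.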

    A lot of this has been previously discussed, in blogs like {recycled joke here} It could have been called LOCALE_SSINGLESERVINGDAYNAME*, but the focus was on how we were trying to "tame" the language experts and informants, all of whom we needed to provide data. And also how we lost a little bit of the mission and the scenario ourselves until we realized some of the requests we were getting just were not reasonable....

    However, since that time in the days of Longhorn when we wrote the code or Vista when we shipped it, people have been using and enjoying the LOCALE_SSHORTESTDAYNAME* Constants, and of course the DateTimeFormatInfo.ShortestDayNames Property, and a new phenomenon has cropped up many times.

    Someone will insist that these LOCALE_SSHORTESTDAYNAME*/ShortestDayNames values must never be more than two characters, or occasionally even one character (even though almost none of them are one character by now!).

    We have to adopt a "talk to the hand" attitude here -- and tell them that the LOCALE_SSHORTESTDAYNAME*/ShortestDayNames are as short as they can reasonably be across every language.

    And that they are just going to have to make sure that they design their feature with that reality in mind.

    We don't call them the AlmostButNotQuiteShortestDayNames for a very very good reason -- because they are not!

    Luckily, as we point out just as we did with the informants/experts, though on occasion the names may find themselves truncated, we know that in calendars even the truncations would be okay in such cases, since no user would ever expect calendars to re-order their day names!

     

    There are some [as of yet unconfirmed] reports of a few ShortestDayNames that are actually incorrect -- if there are such then those would be definitely considered bugs. But that issue would be completely unrelated....

  • Sorting it all Out

    What do Samoa and Seattle have in common? The sun won't shine on either on Dec 30th!

    Tomorrow.

     No sun in Seattle is fairly common this time of year.

    I've lived here since late '96 and I can count the number of sunny December 30ths on one hand, and have fingers left over.

    It is harder to engineer in Samoa, though.

    However, for this year, they have achieved the goal!

    And the sun won't come out tomorrow, December 30th, if you are in Samoa.

    As I previously mentioned back in May (in What do Ian Anderson and Samoa have in common? They're both tired of Living in the Past!), Samoa is going from the end of the time zone list to the beginning.

    The only way to do that? Just skip the day....

    M3 Sweatt talks about the several important issues in the update, in Advisory: Windows Cumulative Update for Samoa, as they skip Friday and change their time zone, including lots of important details:

    • the official name in Windows of the new time zone;
    • other changes in the semi-annual update, listed in KB633952;
    • some important side effects to consider about when to install the update if the region is important to you;
    • other important resources....

    I thought I'd mention one other interesting issue here.

    As M3 mentions:

    Technical Changes

    The change will be a move from UTC -11:00 to UTC+13:00, and a change in the display name for UTC +13:00 time zone (Nuku’alofa, Samoa). So, on the next clock tick after Dec 29, 2011 at 23:59:59, Samoa’s UTC offset becomes UTC +13:00. And the next clock tick will be Dec 31, 2011 00:00:00. Cartographers will have some challenges dealing with all the updates to maps, moving the International Date Line to 171 degrees longitude west of Greenwich.
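
    The arithmetic behind the skipped day is easy to check. This is a rough Python sketch using fixed offsets taken straight from the text quoted above; it deliberately ignores daylight saving and real time zone data, and the variable names are mine. It takes the last second under the old offset, steps one second forward, and shows the same instant under both offsets.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets from the advisory text; real systems would use time zone data.
OLD_OFFSET = timezone(timedelta(hours=-11))
NEW_OFFSET = timezone(timedelta(hours=13))

last_tick_old = datetime(2011, 12, 29, 23, 59, 59, tzinfo=OLD_OFFSET)
next_instant = last_tick_old + timedelta(seconds=1)

# The same instant, viewed under the old and the new offset.
as_old = next_instant.astimezone(OLD_OFFSET)
as_new = next_instant.astimezone(NEW_OFFSET)

print(as_old.isoformat())  # 2011-12-30T00:00:00-11:00
print(as_new.isoformat())  # 2011-12-31T00:00:00+13:00
print(as_new.utcoffset() - as_old.utcoffset())  # 1 day, 0:00:00
```

    The two offsets differ by exactly 24 hours, so the wall clock keeps its time of day and only the date jumps.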

    Those string changes have consequences in up to 35 different languages (I don't think those strings are localized for LIP languages).

    And of course all those products that duplicate the effort as I happened to talk about in You don't waste my time if you 'reinvent the wheel' but you often waste yours! may wish they had read that blog.

    For good measure and to be safe, they may wish they had read it a good five to six months before I wrote it!

    My biggest worry (from where I sit) will be for any component not properly updated, either in all strings or in just the non-en-US strings....

    Feel free to say something if you run into problems with either any other language or any other product after installing this update (and pretty much any time zone update, especially for other products).

    And if your birthday is December 30th and you live in Samoa, then look on the bright side -- in some sense you don't get any older this year.... :-)

  • Sorting it all Out

    I won't Dari or double Dari you to knock our Persian calendar support...

    It was not too long ago (by some scales of time and duration, at least!) that the following question found its way into my inbox:

    Why all famous calendars in the world are included in Microsoft Windows in the Date section, except Hejri Shamsi (Iranian Calendar)?! Does it have any political reason? Or it is only because you are not aware of the importance, accuracy and history of this calendar? Hejri Calendar has two branches, one is Hejri Ghamari (Lunar Hejri) and the other is Hejri Shamsi (Solar Hejri) and just the first one is included in Windows while the second one is more accurate.

    This is a very complicated question.

    And when I say complicated, I don't just mean complicated.

    I mean complicated!

    If you know what I mean.

    Hell, even if you don't know what I mean!

    Okay, we'll start from here. From now.

    .Net supports the PersianCalendar class, which the documentation describes thusly:

    The Persian calendar is used in most countries where Persian is spoken, although some regions use different month names. The Persian calendar is the official calendar of Iran and Afghanistan, and it is one of the alternative calendars in regions such as Kazakhstan and Tajikistan.
     
    Dates in the Persian calendar start from the year of the Hijra, which corresponds to 622 C.E. and is the year when Muhammad migrated from Mecca to Medina. For example, the date March 21, 2002 C.E. corresponds to the first day of the month of Farvardeen in the year 1381 Anno Persico.
     
    The Persian calendar is based on a solar year and is approximately 365 days long. A year cycles through four seasons, and a new year begins when the sun appears to cross the equator from the southern hemisphere to the northern hemisphere as viewed from the center of the Earth. The new year marks the first day of the month of Farvardeen, which is the first day of spring in the northern hemisphere.
     
    Each of the first six months in the Persian calendar has 31 days, each of the next five months has 30 days, and the last month has 29 days in a common year and 30 days in a leap year. A leap year is a year that, when divided by 33, has a remainder of 1, 5, 9, 13, 17, 22, 26, or 30. For example, the year 1370 is a leap year because dividing it by 33 yields a remainder of 17. There are approximately eight leap years in every 33-year cycle.
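
    The leap-year rule and month lengths in that excerpt translate almost directly into code. Here is a small Python sketch of the rule exactly as quoted; I am not claiming it matches what the PersianCalendar class does internally, and the function names are made up for the example.

```python
# Remainders quoted in the documentation excerpt above.
LEAP_REMAINDERS = {1, 5, 9, 13, 17, 22, 26, 30}

def is_leap_year(year):
    """Leap-year test as described in the quoted text (33-year cycle)."""
    return year % 33 in LEAP_REMAINDERS

def month_lengths(year):
    """Six months of 31 days, five of 30, then 29 or 30 for the last month."""
    last = 30 if is_leap_year(year) else 29
    return [31] * 6 + [30] * 5 + [last]

print(1370 % 33, is_leap_year(1370))   # 17 True
print(sum(month_lengths(1370)))        # 366
print(sum(month_lengths(1371)))        # 365
```

    Counting the years that pass the test in any run of 33 consecutive years gives eight, which is where the "eight leap years in every 33-year cycle" figure comes from.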

    Mind you, this already complicated description glosses over things a skosh, as befits anyone trying to compress a calendar system as old as this calendar - at times lunar, at times lunisolar, at times solar, at times starting in the "true Spring", at times starting at the birth of Cyrus the Great, and at times of the emigration of Muhammed, at times using Farsi/Persian, at times using Dari (which is mostly Farsi for month name purposes though I am really oversimplifying here), at times with Pashto names, and...

    I give up. See the Iranian calendars article for more detail on the variations over the millennia. Note that the lunar calendar has been out of favor since 1925, which has two interesting consequences for us:

    • Using the lunar calendar for dates after 1925 would involve extrapolating from theoretical dates that depend on observations to determine months -- not unlike the Hijri calendar, but without the data in modern times to get the right dates;
    • using the solar calendar for dates before 1925 could involve converting from dates usually recorded using the Lunar calendar, unless one is looking at data that does the conversion already.

    Both problems could have been addressed -- but it would be complicated to do so -- perhaps more so than anyone would want to tackle here, and it may have just been considered out of scope.

    Random factoid: during the .NET 2.0 beta, the PersianCalendar class was named the JalaliCalendar class, but this was fixed prior to RTM. The Jalali term is more accurately used to refer to the Persian lunar calendar, so this "breaking" change between Beta and RTM was a good one, in the sense that it was a correct one.

    I kind of wish the analogous bug in the ThaiBuddhistCalendar class (described in The 'Thai Buddhist' calendar isn't) had been found during the .NET 1.0 beta and saved us all a much more serious conversation about a really complicated "correctness vs. backcompat" bug that may have to be tackled some day!

    To date, we don't even have a KB article or doc bug on this, AFAIK.

    This is most interesting for Microsoft in general and Windows/.Net in particular since:

    • we do not implement the lunar calendar at all, and
    • we do not implement the solar calendar on Windows (which also leads to problems like those discussed in Behold the [non-fa-IR default] PersianCalendar class), and
    • the problems with the [non-Arabic, not-Divehi] Hijri calendar I mentioned in The example was wrong, but the point of the example was spot on! would also apply to many non-Persian users of the PersianCalendar, and
    • there are some mildly inconsistent reports of problems with the PersianCalendar in .Net, currently unverified but which may in some cases have to do with a few of the many differences that Wikipedia article I mentioned discusses.

    Note that the third problem can't be solved until/unless the second problem is -- we can't pick the "right" language if we don't know what culture to use.

    So in the end, we are really underwhelming in our support for this small sheaf of calendars associated with Persian -- with both the solvable issues and the [at this point] largely unsolvable ones....

  • Sorting it all Out

    You don't waste my time if you 'reinvent the wheel' but you often waste yours!

    The other day, a colleague asked me a question that disturbed me:

    Hi Michael,

    I want our product to get out of the business of maintaining timezones and leave this up to the OS. However, I realized that the system only provides one language at a time, so if the administrator language is English and the end user’s language is French the time zone names would show up as English. Err. Do you know how other teams have solved this problem (without maintaining their own list of timezones)?

    Thanks

    Okay, there are a couple of problems here.

    The huge push to bring time zones to the world of the Windows Multilingual User Interface, something I mentioned in passing in Keyboards and time zones have something in common, produced a huge and expansive architecture that gives you the ability to do quite a bit.

    Now by default, a service running as LOCALSYSTEM is limited to one language.

    But simple use of the EnumUILanguages function will give you a list of every single language for which resources are available. From there, you just have to set the process and/or thread UI language and then you can load up whatever you like -- thus a server component communicating with clients can use SetProcessPreferredUILanguages or SetThreadPreferredUILanguages (as I discussed in New for Windows 7: The PROCESS to keep MUI from being THREADbare...).

    And you can thereby get the appropriate strings for any language available -- mostly via the documented Win32 APIs for MUI and time zones that will give you almost everything you need (time zone enumeration itself still requires the registry which is nevertheless fully documented, but everything else has API coverage).

    For earlier versions like Server 2003 I'm comfortable telling them to upgrade if they want the feature, and you should too! (a lot of engineering went into this, after all!)

    Perhaps it feels burdensome to require UI languages be installed to get this feature, but this hardly seems like an undue burden to get the feature....

    When I think about the original question, the part that worries me the most is that the act of "getting out of the business" implies that right now they are in the business, with one of the single most expensive items to maintain -- time zone data. The sheer weight of standards tracking and potential geopolitical issues with names and zone definitions? Maintaining them is staggering -- why would anyone want to duplicate that effort?

    And how many features have had to be postponed version after version that those resources could have been applied to?

    I'm being facetious there, actually -- that effort currently is duplicated a bunch, and those resources are in fact wasted, many times over.

    Though my hope is that soon after I send this link, there may be one less case of duplication on the horizon.... :-)

  • Sorting it all Out

    Thanks for making it work; what happens now?

    So a question got asked late last year:

    Hi,

    My customer reported an issue associated with XmlTextReader.ReadChars. The call to XmlTextReader.ReadChars enters an infinite loop:
     
           While iRetour = BUFFER_SIZE
               'Infinite Loop After this comment...
               iRetour = myXmlTextReader.ReadChars(buffer, 0, BUFFER_SIZE)
           End While

    The problem happens in .NET 2.0.  The same code works fine in .NET 4.0.

    Is it a bug? Any input is greatly appreciated.

    Regards,
    Allen

    This was indeed a bug in .NET 2.0, one that was found some time ago.

    Unfortunately, the whole red bits/green bits thing came up -- and some of the most important compatibility rules that the BCL (Base Class Libraries) faced were:

    • You cannot break binary compatibility by causing recompilation to fail, and
    • You cannot change behavior in a way that breaks the documented semantics of the code.

    In this case the basic determination went like this:

    1) The general code pattern was not recommended;

    2) Changing the semantics of ReadChars in the existing red bits was a bad idea.

    Of course this leads to a different issue -- now that the code works as of .NET 4.0, the question is what the code actually does now (in the case that used to fail but which now works).

    Do you think this was a good change?

    Extra credit for anyone who understands how to repro the failure case in 2.0, and who knows how the change was implemented....

  • Sorting it all Out

    What I'd do with my 'Microsoft 20% time'

    I was asked a question by a former colleague of mine.

    She and I both used to be collaborating consultants, me as a programmer and her as a designer -- so me without her would have been functional web sites that no one would use and her without me would have been beautiful web sites that no one could use.

    Anyway, we hadn't done any work together in years, but we kept in touch, now and again.

    She reads this Blog, and admits (somewhat guiltily) that we are probably in touch less often because she feels she can tell what's going on with me.

    Anyway, about two weeks ago, she asked me a question:

    I know you don't work for Google, so this could be an apples vs. oranges thing, but you seem like you've been an idea man all these years. If Microsoft had a "20% time" policy like Google's, what would you be doing with it?

    To say that this one caught me unawares is probably an understatement!

    In the weird time of open-ended vacation, I found myself taking it as a serious inquiry, and a chance to really think about the question.

    First, to get past the obvious -- the surface problems with 20% time are easily summed up:

    and yes - all it takes is the realization that the average Google employee is working a ton of hours....

    For a more serious take on the issue, I find Scott Berkun's Thoughts on Google’s 20% time, including the pages to which he links, to be somewhat required reading.

    The quick answer is that for large sections of my time at Microsoft, my "20% time" has been a lot more than 20%.

    Projects like:

    • The Partial Replica Wizard in Access
    • The Wizard Build Tool project in Access
    • The Office Add-Ins Framework in Office
    • The Access/SQL Server Replication Conflict Resolver
    • The VBA for VB Wizard
    • Microsoft Layer for Unicode
    • Microsoft Keyboard Layout Creator

    and so on, all essentially proposed and largely architected and mostly designed and principally developed (and occasionally tested!) by me.

    Not to mention the thousands and thousands of blogs within this Blog, at least 70% of which were done unofficially and outside of what could nominally be thought of as work hours, even when they were technically quite useful for work I would later do.

    So, my 20% time at Microsoft? It's been more like 40% time, at least!

    Okay, this kind of avoids the issue a little though.

    Okay, let's put all of the above aside for a moment.

    I mean, in a job where virtually everything I'm doing is planned and sanctioned and reported on to my superiors, it's easy to imagine a fantasy world where Google apples become Microsoft oranges.

    I mean, if I were really going to devote "1/5 of my time to work on projects of my own choosing", what would I do?

    Okay, here is what I would love to be doing with 20% of my time at Microsoft:

    1) Release MSKLC 1.5, as I described here.

    2) Architect a plan to expand calendar support (with full parsing and formatting) in South Asia, Africa, South America, East Asia, and basically the largely ignored world.

    3) Become actively involved in the Giving Campaign at Microsoft --  in particular to convince the Executive Leadership to raise the $12,000 annual matching maximum so that at a minimum it keeps up with inflation (since this blog in September I've personally spoken with six different Microsoft VPs and I've talked to and heard from several partners and principals, all of whom agree with this need -- and one of the VPs has gone to SteveB directly already to make the case).

    4) Work with HR and Benefits to better rationalize the long term future of health care to solve more of the actual problems with fraud and mismanagement and errors than the current plans will be able to do.

    5) Figure out how to get everyone thinking about accessibility as a quality of work and quality of life issue for our customers -- and not as line items in an ADA compliance spreadsheet -- a problem that a friend and colleague who is a Director now considers one of her full-time commitments, but I want to do whatever I can to support this.

    6) Directly work on a few of the important technical efforts that go beyond the scope of Windows, to make sure they can be made a bigger priority for those other business units I am not in, as well¹.

    Okay, just six things, right?

    I imagine that even if I was a Technical Fellow with the authority and resources and budget to accomplish all of them, they could easily take up 120% of my time.

    Yet as I sit here and see them listed out here, I feel just as strongly as ever about the need for them to happen as anything currently in my commitments at work.

    And I want to do my part in whatever I can to make each and every one of them happen....





    1 - I'd give more detail here (and the original draft did!), but it may not be prudent if I want to keep my current job, so the extra details were ultimately omitted.

  • Sorting it all Out

    The evolving Story of Locale Support, part 14: Tifinagh, Tamazight, and Berber? Oh my!

    Previous blogs from this series:

    Today's entry started when I got a message (via the Contact link) from Paul Anderson:

    Hello. I've been working for some years on comprehensive keyboard support in Latin and Tifinagh script for a range of Berber languages, with national Berber institutions, associations and universities in North and West Africa.

    While researching, I've often been led to your blog by obscure MSKLC and Unicode issues. I hope you can enlighten me again here!

    I've been trying to take my keyboard layouts to the next stage, and adapt them for inclusion in or packaging as standard extensions to Windows. If the layouts pass muster with the national institutions, the institutions will then work with Microsoft's local offices.

    Material I have read seems to indicate that future keyboards are unlikely to be accepted into Windows if they have deadkeys. I also read that Microsoft will not pay much attention to ISO9995.

    I'd like to ask: - If the keyboard needs to be closely key-compatible with French AZERTY to be accepted by users (since Berber is a minority and non-official language and needs to fit with existing physical keyboards and user habits), could it keep deadkeys? - Three of the deadkeys, both "nice to have" but not absolutely essential, do not simply add diacritics, so the Unicode style of typing doesn't make sense. One deadkey yields superscript modifier letters. Another yields obsolete (but similar) forms of letters (forms likely to linger because some users still prefer them). The third rotates Tifinagh letters (similar situation to rotated letters in the Cree syllabary).

    Would deadkeys still be appropriate in these situations? - What mechanism is likely for access to occasional letters and punctuation?

    I notice that the Canadian international keyboard's Windows implementation uses Right Control + Letter or Right Control + Shift + Letter, not latching, with no 3rd level in group 2. Is that a technical limitation? Berber languages are used in countries where other languages are dominant, and typing of proper names in other languages is common, so an extension layer would be very useful.

    Or would I continue to provide my own, by deadkey?

    Thanks a lot!
    Paul Anderson

    Wow that's a lot of info in one message!

    I'll start with the Tifinagh question - we've had it in fonts for a few versions now. And in Windows 8 you will see two Tifinagh layouts based on NM 17.6.000: Technologies de l’information – Prescriptions des claviers conçus pour la saisie des caractères tifinaghes.

    Clearly we are more Tamazight focused (Windows 7 even added a Latin script locale):

    But the Windows 8 extended Tifinagh layout should be able to support Berber written in Tifinagh, and for Arabic script Berber there are several existing keyboard layouts to choose from.

    Of course if you install the beta version of Windows 8 when it comes out (and no, I don't know the date!), you will get the better version of these keyboards (there were a few bugs that weren't fixed until after the Developer Preview).

    But by the next version made available you'll see two different layouts, designed to comply with NM 17.6.000, and the two keyboard layouts it describes!

    Now I don't know much about the other suggested work to get other keyboards into Windows, but it might make sense to look into NM 17.6.000, since another keyboard for Tifinagh is unlikely for Windows 8. Perhaps it is good enough for Berber?

    Microsoft isn't anti ISO9995, we just use MSKLC sources for our builds, so no matter how it is described in a standard, it must have a reference implementation built with MSKLC for us to make use of!

    For the other issues related to dead keys, such uses are reasonable and might make sense for users who are working with dead keys now, though I know they are not in the standard we were working from.

    Neither Windows nor Microsoft is truly anti-deadkey when they make sense, and I wouldn't want any of my rhetoric meant to discourage people from using them when they don't make sense to stop people from using them when they do....

    Here are recent versions of the basic and extended keyboard layouts for Tifinagh, in the BASE, SHIFT, and ALTGR states.

    BASIC:

         

    EXTENDED: 

         

    In fact, you can install MSKLC on Windows 8, load up those two keyboards, and you can then create setups to install them downlevel if you like.... :-)

    The point Paul Anderson raises about wanting a bilingual keyboard is not one that the standard embraces, though the idea has some merit -- perhaps in a future version?

  • Sorting it all Out

    The evolving Story of Locale Support, part 13: Divvying up locales, yet again!

    Previous blogs from this series:

    Now in the past, I've written The Locales of Windows 7, all divvied up, which included:

    • Table 1: the locales representing languages into which Windows 7 localizes
    • Table 2: the locales representing languages for which Windows creates Language Interface Packs, aka LIPs
    • Table 3: locales whose identifiers are not directly associated with any localizations of Windows, even if a related identifier might make for one representing a suitable localization

    I've also written the sequel, The Locales of Windows 7, divvied up further, which included the slightly more niche:

    • Table 4: the locales into which Windows Server 2008 R2 is localized
    • Table 5: the locales into which PowerShell is localized, by Microsoft
    • Table 6: the locales into which Visual Studio is localized, by Microsoft

    And then there was The evolving Story of Locale Support, part 5 (...until the decision was made to not refuse to add it), which listed a bunch of locales added to Windows 8 that at some point might make nice entries in an updated version of Table 3.

    I didn't comment about how many of them might be added to a nice updated version of Table 2, because that list is still confidential info. Though if history is a guide then some of the new languages and some of the existing ones might fit there.

    Also, in The evolving Story of Locale Support, part 2 (raising the roof on keyboards), I listed a bunch of new keyboards added to Windows 8, some of which have no LCIDs, as I pointed out in that very blog (I described some of my implementation concerns on this matter in The evolving Story of Locale Support, part 11: What language is that keyboard for?).

    But there is yet another list -- a missing list.

    You see, Table 3, rather than being called

    Table 3: locales whose identifiers are not directly associated with any localizations of Windows, even if a related identifier might make for one representing a suitable localization

    should have instead been more accurately called

    Table 3: locales supported by Windows 7 whose identifiers are not directly associated with any localizations of Windows, even if a related identifier might make for one representing a suitable localization

    Because just yesterday the question was asked:

    I am wondering if en-HK is a supported culture in Windows 7? From this document, it says it’s supported in Windows XP & Server 2003, but yet I get a CultureNotFoundException trying to instantiate the culture with “en-HK” or its LCID 15369.

    Any suggestions? If not, please forward to a more appropriate alias, thanks.

    One of the people who got that mail (Alexander) forwarded it to me, and he is on the small list of people whose mail I have Outlook set to let through to me even though I'm on vacation (as I mentioned in I plan to go somewhere that starts with a "T").

    And it is an interesting question, so I thought I'd take it up now!

    The problems with that Locale IDs Assigned by Microsoft web page are numerous, and I'll get into that some other day.

    But for now I'll give a first crack at yet another table:

    Table 7: Locales whose LCID values are reserved but which are not really supported in Windows 8 or earlier

    Language Name                        Reserved LCID
    Burmese                              0455
    Edo                                  0466
    English - Hong Kong SAR              3c09
    English - Malaysia                   4409
    English - Singapore                  4809
    French - Cameroon                    2c0c
    French - Democratic Rep. of Congo    240c
    French - Cote d'Ivoire               300c
    French - Haiti                       3c0c
    French - Mali                        340c
    French - Morocco                     380c
    French - North Africa                e40c
    French - Reunion                     200c
    French - Senegal                     280c
    French - West Indies                 1c0c
    Fulfulde - Nigeria                   0467
    Guarani - Paraguay                   0474
    Ibibio - Nigeria                     0469
    Kanuri - Nigeria                     0471
    Kashmiri                             0860
    Kashmiri (Arabic)                    0460
    Latin                                0476
    Manipuri                             0458
    Nepali - India                       0861
    Oromo                                0472
    Papiamentu                           0479
    Rhaeto-Romanic                       0417
    Romanian - Moldava                   0818
    Russian - Moldava                    0819
    Sepedi                               046c
    Sindhi - India                       0459
    Sindhi - Pakistan                    0859
    Sinhalese - Sri Lanka                045b
    Slovak                               041b
    Slovenian                            0424
    Somali                               0477
    Sutu                                 0430
    Tamazight (Arabic)                   045f
    Tibetan - Bhutan                     0851
    Tsonga                               0431
    Urdu - India                         0820
    Venda                                0433
    Yiddish                              043d
    HID (Human Interface Device)         04ff

     Note that en-HK is one of the many locales on this list.
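    For anyone wondering how the 15369 in the question relates to the 3c09 in the table: they are the same number, one in decimal and one in hex. Here is a minimal sketch (the function name split_lcid is mine, purely for illustration) that also pulls the value apart into its primary language, sublanguage, and sort pieces, assuming the usual LCID layout of a 10-bit primary language, a 6-bit sublanguage, and a 4-bit sort identifier:

```python
def split_lcid(lcid):
    """Split an LCID into (primary language, sublanguage, sort id).

    Assumes the conventional layout: bits 0-9 primary language,
    bits 10-15 sublanguage, bits 16-19 sort identifier.
    """
    lang_id = lcid & 0xFFFF        # low 16 bits: the language identifier
    sort_id = (lcid >> 16) & 0xF   # next 4 bits: the sort identifier
    primary = lang_id & 0x3FF      # low 10 bits of the language identifier
    sub = lang_id >> 10            # high 6 bits of the language identifier
    return primary, sub, sort_id


reserved_en_hk = 0x3C09            # "English - Hong Kong SAR" from Table 7
print(reserved_en_hk)              # the decimal form quoted in the question
print([hex(part) for part in split_lcid(reserved_en_hk)])
```

    The primary language piece comes out as 0x09 (English), which is why a reserved value like this one can look so plausible even when nothing backs it up.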

    Some may also recognize Urdu - India, which I discussed in Where's the other Urdu?, a blog where I also talked about the Locale IDs Assigned by Microsoft page and some of its problems.

    I'd love to comment about some of the reasons why a locale would end up here on this page, reserved, but in many cases the reasons would be mere suppositions on my part.

    I mean, I know about my fruitless campaign for an Urdu - India locale that was championed by and ultimately tied to a Microsoft VP who left unhappy at the lack of direction of many efforts he championed in the unused potential of Microsoft India -- technologies that ultimately were marginalized and now have no owners (an issue I also was unable to influence in the end, since no one had the resources to take the work on). But that's a heroic tale of throwing an elbow that didn't connect, a little too much inside baseball, and something that ultimately blew my stack on sports metaphors that only some of my readers will get.

    Alternately, I know about the campaign of harassment by a professor that led to the Yiddish locale being added to this list, and I could regale my readers with tales like that. But although it came to me from a reputable source, it is still hearsay -- and I'd like to keep blogs admissible. :-)

    Kind of hints at future Table 8: Locale lists that are broken in one or more ways. :-)

    But for now, that Table 7 list should do. And I got a question of a customer (albeit an internal customer) answered, which is good since I still do serve at the pleasure of the customer....

  • Sorting it all Out

    On limitations in your design that you may have failed to take into account

    • 2 Comments

    New ways to support input are really all the rage these days - from IMEs to new IMEs to the soft keyboards of Windows 8 to the Swypes and Swipe Its and Sliders and Touch pals and Shape Writers and so on.

    Technically I am not connected to any of these efforts, even as I continue to crank out new keyboard layouts for Windows (e.g. What language is that keyboard for?, Behind the Cherokee Phonetic layout in Windows 8, and others).

    Though I end up at least indirectly connected to most of them, since no matter how little they resemble hardware keyboards they all tend to sit atop keyboards.

    And thus they need to interact with this venerable way to get text entered in by users.

    My inbox regularly sees questions pop up about some of the new things that don't work properly because of unanticipated design limitations in the support underneath!

    In theory such issues could be considered bugs. But since in general the design works on all these platforms:

    • Windows NT 3.1
    • Windows 95
    • Windows NT 3.5
    • Windows NT 3.51
    • Windows 95 OSR2
    • Windows NT 4.0
    • Windows 98
    • Windows 98 SE
    • Windows Me
    • Windows 2000
    • Windows XP
    • Windows Server 2003
    • Windows XP SP2
    • Windows Server 2003 64
    • Windows XP 64
    • Vista
    • Windows Server 2008
    • Windows 7
    • Windows Server 2008 R2

    (and that's not even a complete list!), then I really have no problem looking at the reported bug that someone's "exciting new world changing input method" runs into, and calling it

    BY DESIGN

    BY DESIGN

    BY DESIGN

    BY DESIGN

    BY DESIGN

    without even bothering to feel a little bit embarrassed.

    That code has been around for a long time, and a lot of people depend on its behavior. It cannot be changed lightly. Unlike your new code, that no one has ever depended on before.

    Frankly, no one has seriously entertained working outside the existing input stack. Which means they've implicitly agreed to work within its rules and limitations.

    Perhaps I'll even talk about some of these limitations in the future, if I can sufficiently extract them from their original reports -- I'm not trying to use this blog as a way to scold these input innovators, except generally like I do in today's blog. :-)

  • Sorting it all Out

    So...a three pack of black socks from Nordstrom, huh?

    • 2 Comments

    I promised a friend that I'd tell a story.

    Right here. And right now.

    And I like to keep promises, even silly ones.

    So, without further ado, the story....

    Christmas is an unusual time of year.

    One of the more unusual parts of it has to do with when friends ask you what you want for Christmas.

    Conversations can go (and on occasion have gone) something like this:

    Friend: So, I was putting together my list and thought I'd ask what you might like for Christmas.
    Me: Are you doubling as an elf for Santa this year?
    Friend: No, this is just my [husband|wife|boyfriend|girlfriend] and I, just doing our Christmas shopping.
    Me: You do realize I was raised Jewish, right?
    Friend: Okay, for Hanukkah. Whatever.
    Me: I'm more of an atheist M.O.T. these days. No need to give presents. The upside being neither Santa nor Hanukkah Charlie leaves me coal.
    Friend: Michael, could you just pretend you were normal for 30 seconds and give us a chance to do something nice for you during this special time of year?
    Me: Okay, okay. I guess I could use...uh...never mind. It's silly.
    Friend: How silly could it be? Tell me! And let friends of yours who aren't sure what you'd like give you a gift!
    Me: Okay, but you asked for it....

    I would then proceed to the two things I could really use right now.

    First of all, I could really use some socks.

    Yes, socks.

    For the 4-5 months of the year that during the day I'm not barefoot and not wearing sandals, and the twelve months of the year where I'm going out at night, some nice black socks would be a goodness.

    Ideally, socks like these from Nordstrom or some similar store (click on the picture for the link):

    Selecting those specific socks would greatly enhance the chances of me wearing them, but it's nominally a gift so I can't get too choosy about it.

    Of course no one seems very enthusiastic about buying socks as a present. Despite assurances that they would be used. And appreciated.

    They then focus on the fact that I said two things.

    In the hopes that the other thing might seem more like a gift to them.

    Because apparently someone wanting to give me a gift is really all about them.

    Sigh.

    The other thing I could really use is a drying rack.

    Because now that I'm finally buying clothes that are the right size, I don't want them to shrink.

    And thus, now that I've given up the multiple-decades logic I used to run under (assuming everything not marked dry clean only would be dryer bound), I could use a more effective means of drying things than hangers in my bathroom.

    I would then at that point say that what I really needed was one of those drying racks - they are too awkward to get home myself without a vehicle, and would truly be useful.

    They would then pause for a second, consider the issues, and realize that I really had no other gift in mind.

    Then each time I would get the same response:

    "So...a three pack of black socks from Nordstrom, huh?"

  • Sorting it all Out

    I Adar you to guess how they make it work!

    • 0 Comments

    The other day, when I blogged These aren't the MONTHS you're looking for (aka You'll never get to the 13th month *that* way), I oversimplified some of the described results in order to concentrate on proving that GetLocaleInfo was never returning the CAL_HEBREW (Hebrew calendar) months, without distracting the issue with other points.

    The comments went slightly afield on other points, e.g. when Shachar Shemesh suggested:

    Surely, the 12th month in the Hebrew calendar is אלול, not כסלו. Now, if you claim that the sixth month is אדר א׳ and the seventh אדר ב׳, then the 12th month would still be אב.‎ כסלו is the third month, either count.

    Or is that what you meant by "but you probably know what I mean", in which case, I know what you mean, but not that you meant it.

    Shachar

    or when Alex Cohn commented:

    @Shahar, some would argue that Kislev is number 9, because the first month is Nissan! But anyway, I would choose Adar bet to occupy the 13th slot, to emphasize its conditional existence.

    Obviously the NLS API has to make a decision somehow.

    The Hebrew calendar works a little differently anyway, as I'll describe a little bit now.

    If you use GetCalendarInfo or GetCalendarInfoEx with CAL_HEBREW, here are the names you get:

    Constant Month Name
    CAL_SMONTHNAME1 תשרי
    CAL_SMONTHNAME2 חשון
    CAL_SMONTHNAME3 כסלו
    CAL_SMONTHNAME4 טבת
    CAL_SMONTHNAME5 שבט
    CAL_SMONTHNAME6 אדר
    CAL_SMONTHNAME7 אדר ב
    CAL_SMONTHNAME8 ניסן
    CAL_SMONTHNAME9 אייר
    CAL_SMONTHNAME10 סיון
    CAL_SMONTHNAME11 תמוז
    CAL_SMONTHNAME12 אב
    CAL_SMONTHNAME13 אלול

    Note that neither function takes an actual date value, so you have no way from just these functions to determine whether the 7th month entry will be skipped or not for a given year.

    Only when you format a date that uses this calendar can you determine whether that seventh entry is to be used.

    And of course note that unlike the LOCALE_SMONTHNAME# constants, you cannot really equate the numbers in the CAL_SMONTHNAME# values -- the calendar itself does everything and doesn't give you easy methods to get additional information....
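    To make the "skipped seventh entry" idea a little more concrete, here is a minimal sketch. It is not how NLS does it internally (I'm not showing any Windows code here); it just applies the standard 19-year leap cycle of the Hebrew calendar to the slot order from the table above. The names HEBREW_MONTH_SLOTS, is_hebrew_leap_year, and month_slots_in_use are made up for illustration:

```python
# Month names in the same order as CAL_SMONTHNAME1..CAL_SMONTHNAME13 above.
HEBREW_MONTH_SLOTS = [
    "תשרי", "חשון", "כסלו", "טבת", "שבט", "אדר", "אדר ב",
    "ניסן", "אייר", "סיון", "תמוז", "אב", "אלול",
]


def is_hebrew_leap_year(year):
    """Standard 19-year cycle: years 3, 6, 8, 11, 14, 17 and 19 are leap years."""
    return (7 * year + 1) % 19 < 7


def month_slots_in_use(year):
    """Return the 1-based slot numbers that a given Hebrew year actually uses."""
    slots = list(range(1, len(HEBREW_MONTH_SLOTS) + 1))
    if not is_hebrew_leap_year(year):
        slots.remove(7)  # no second Adar in a common year
    return slots


for year in (5771, 5772):
    names = [HEBREW_MONTH_SLOTS[slot - 1] for slot in month_slots_in_use(year)]
    print(year, len(names), names)
```

    The point being that the slot number and the "month number" only line up for the first six entries in a common year, which is exactly why you can't treat the constant's number as the month's number.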

     Bonus thought:

    Keeping in mind the differences I mentioned in I Adar you! Hell, I Double Adar you!, where are the other .Net names hidden now that Windows and .Net share data?

  • Sorting it all Out

    Every character has a story #35: ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE (U+0682)

    • 0 Comments

    So it started the other day.

    On the Unicode List.

    Andreas Priliop asked:

    Arabic letter U+0682 shows two dots above. It has the cryptic remark "not used in modern Pashto". But was it ever used?

    The new 2011 edition of German standard DIN 31635  "Romanization of the Arabic Alphabet"  http://www.beuth.de/en/standard/din-31635/140593750 shows the real archaic Pashto letter on page 22.

    It has one dot above and one dot below, corresponding to Pashto U+0696 and U+069A.

    This is also the form shown in "Lehrbuch des Pashto (Afghanisch)" von Manfred Lorenz.


    /* The current form for [dz] is U+0681. */

    What he didn't realize was that this was not the first version of an annotation for this letter.

    We then had some people weigh in, like Ken Whistler:

    To understand where the "cryptic" remark came from, you need to know more about the history of the character in the standard.

    U+0682 was encoded in Unicode 1.0. I don't have the material in hand right at the moment to track down its original source, but for these kinds of extensions to Arabic dating back to Unicode 1.0, it most likely in some poorly resolved handwritten or photocopied source labelled "Pashto" but without much analysis.

    However researching the exact details for that turns out, in Unicode 1.0 the character was published with a note "Pashto".

    On February 13, 2003, Roozbeh Pournader sent a note around with a number of comments of Arabic character extensions and annotations. Among those notes was the statement:

    C6. For 0682: The comment is wrong. This is not used in modern Pashto (just rechecked with my Pashto dictionaries). I am back from Kabul doing a study of computer requirements of Pashto and didn't see this anywhere. I guess we should send a public email and ask if anybody knows what this is. [Just an alert. Don't do anything for now.]

    Then on March 19, 2003, Roozbeh followed up with another note:

     > 3. Comment for 0682: Remove 'Pashto'. This is not used in
    > modern Pashto.
    > Never. And not in loanwords. (May possibly be old Pashto.)

    Based on that note, and with no further clarification provided by anyone on the issue, I and the other editors modified the annotation in the Unicode *4.0* names list, so that it read "not used in modern Pashto".

    It has remained that way in the names list since that date.

    If Andreas (or anyone else) has better information, that can certainly be submitted, and the editors can then work to further clarify any annotation for the character.

    My own suspicion is that the original form from Unicode 1.0 may have been a hard-to-interpret glyph alternative for 0681. Note another note on the unicode email list from 2001, from Vladimir Ivanov. This note doesn't address 0682 specifically, but does raise questions about the exact nature and shape of the diacritic above the hah for dze in Pashto usage:

    ==============================================================

    Date:     Fri, 8 Jun 2001 07:27:11 +0400

    My Pashto informants call it "dI paxto alifbe", saying it has 10 extra letters.
    Letter "dze" is represented in Unicode by U+0681 "Arabic letter heh with hamza above", though the sign above heh is not exactly hamza. It is a zigzag-like sign of the same height as hamza, but they are well distinguished. My informants could not recall any special name for it.
    If you use "heh with hamza above", people usually accept it as a substitute, saying that "computer is not able to build a real Pashto letter" (?!).
    I could not find such a letter in Unicode. I would be glad to hear some comments on  it.

    Sicerely,
    Vladimir Ivanov

    ==============================================================

    Some others weighed in, in small ways.

    They all meant well, but with no new info to impart.

    Finally, Ron weighed in with what looked like the answer:

    I think I have an answer to a possible source of U+0682:

    Grammar of the Pasto or Language of the Afghans, Compared with the Iranian and North-Indian Idioms. By Dr. Ernest Trumpp. London and  Tuebingen, 1873. (Available from Google Books)

    Page 1 (Page 24 of the PDF download from Google Books):

    "Only one consonant has been left indistinct, the media [U+0685] d (=  dz), which is not distinguished from its tenuis [U+0685] t (= ts) by  separate diacritical marks. We have endeavoured to supply this want by  placing two dots above [U+062D], viz. [U+0682], as for a foreigner at 
    any rate the non-distinction of the two sounds must prove very  troublesome."

    Indeed, some other 19th century grammars refer to Pashto [ts] and [dz] as distinct letters but typeset them identically with three dots above (that is, like U+0685). Here are two such examples:

    A Grammar of the Pukkhto or Pukshto Language on a New and Improved System, by Henry Walter Bellew, London 1867 (see alphabet table on page 3, that is page 20 of the PDF download from Google Books).

    A Grammar of the Pukhto, Pushto, or Language of the Afghans, by Lieutenant H. G. Raverty, Calcutta 1855 (see alphabet table on pages 3-4, that is pages 77-78 of the PDF download from Google Books).

    So it appears that the character "Hah with two dots vertical above" was a 19th-century attempt to distinguish Pashto [ts] and [dz] for didactic purposes. The convention of writing [dz] using Hah with hamza above (U+0681) appears to have emerged later. There are still some unanswered questions.

    - Why did a character from a 19th-century book get coded in Unicode? Did it ever receive wider use beyond Trumpp's book?

    - Is the present hamza convention a development of the two vertical dots proposal, or are they unrelated? About a year ago I worked with several Afghan expatriates living in Southern California, and in handwriting they would typically join two diacritical dots as a squiggle rather than a line (which is more common in Arabic). One could see how two vertical dots might develop into a vertical squiggle and later into a hamza, especially given the note by Vladimir Ivanov cited below. But this is only a conjecture at this point.

    Anyway, I hope to have contributed a few pieces towards solving the puzzle :-)

    -Ron.
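    With that many code points flying around in the thread, it may help to see them side by side. This is just a small sketch using Python's standard unicodedata module to print the formal names of the letters mentioned above; the list itself is my own selection from the mails, nothing more:

```python
import unicodedata

# Code points mentioned in the thread above, in the order they come up.
MENTIONED = [0x0682, 0x0696, 0x069A, 0x0681, 0x0685, 0x062D]


def describe(code_point):
    """Return a 'U+XXXX NAME' string for one code point."""
    return "U+%04X %s" % (code_point, unicodedata.name(chr(code_point)))


for cp in MENTIONED:
    print(describe(cp))
```

    Seeing U+0681, U+0682, and U+0685 together makes it clear that they are all the same base letter (HAH, U+062D) with something different placed above it.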

    Now under ordinary circumstances, this would probably not have been encoded without a lot more effort.

    But in those early days, a lot of stuff slipped in.

    Including this letter, which may indeed have been an early, experimental linguistic innovation never ultimately picked up in the language.

    Now in most fonts that cover Arabic:

    And as we say, "every character has a story."

    Some just need deeper digging than others!

  • Sorting it all Out

    These aren't the MONTHS you're looking for (aka You'll never get to the 13th month *that* way)

    • 9 Comments

    The other day I had somebody call me out about a blog I wrote a little over a year ago. I'll excerpt the mail to avoid some of the more graphic language:

    Dude, maybe you should read a little before you freaking write! If you query for month names of the current user default when you Hebrew is the default and the Hebrew calendar is selected, then the only way to *avoid* getting Hebrew months is passing the flag to ignore user overrides. Kind of a huge hole in the "logic" of that blog you wrote!

    For the record, I replaced the word he used with freaking there. He used 𝓕𝓾𝓬𝓴𝓲𝓷𝓰 only he didn't do something clever with Unicode to avoid the publish filters.

    I was not offended; sometimes mail kind of goes that way.

    But there are a few things you should probably know about me:

    First of all, sometimes I can be wrong. There are two types of situations where that can happen:

    • When it's something I don't care about. In that case, I'm simply not interested and you have an uphill battle trying to convince me it matters;
    • When it's something I do care about. In that case, I'll work to correct the problem, whatever it may be.

    Now the fact that I don't care about the first category means the people who point those things out might not like me, but from my point of view I find them to be a waste of time, so that saves me a lot of time. And I can use that free time to minimize the second category when possible!

    Second of all, before you go down the road, you might want to try it yourself, and not trust what you might infer from the documentation.

    I love our docs but I point out their real flaws all the time!

    In this case, it is quite easy to change the Standards and Formats (aka default user locale) to Hebrew:

    and then click the Additional settings... button and choose the Date tab to get to where you can change the calendar:

    Once you do this, you'll see that we are now using the Hebrew calendar:

    Now if, while you are in this situation, you call

    GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONTHNAME13)

    you will get a zero length string.

    And if you call

    GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SMONTHNAME12)

    you will get

    דצמבר

    a.k.a. December, not

    כסלו

    a.k.a. Kislev.

    Well, not really Kislev, but you probably know what I mean.

    You see, to the random person who sent me the mail, I knew of this particular limitation. And before I wrote and published Note to the NLS API: It ain't ever gonna by 13 o'clock, either., I tested it out explicitly to make sure the issue was still there.

    Because when it's stuff I find interesting, I know it makes a 𝓕𝓾𝓬𝓴𝓲𝓷𝓰 difference to get things right....

    In the end, although GetLocaleInfo/GetLocaleInfoEx have documented behavior related to the LOCALE_NOUSEROVERRIDE flag, the fact is that the different month names inherent in many alternate calendars have never been returned, even when the LOCALE_ICALENDARTYPE has been changed and the functions can detect this fact.

    Only the user overrides you can directly change are impacted, and the "secondary" data like month names aren't available through GetLocaleInfo[Ex].

    Is it a bug?

    Sure, kinda.

    But it is a bug that has existed for as long as alternate calendars have. In almost every copy of Windows ever sold or stolen.

    This fact is easily verified experimentally by anyone who takes the time to do so.

    Ideally before cursing me in my own inbox, though if not I can always write a blog about it!

  • Sorting it all Out

    SharePoint and CJK Extensions A, B, C, D, and even E?

    • 2 Comments

    So, the question I got the other day was:

    We are setting up SharePoint and want to know what collation to use. What support does SQL Server have for CJK Extensions A/B/C/D?

    Now that's an interesting question.

    If you think of SQL Server 2000 as the first version to support the current architecture of collation in SQL Server, it is fair to say that SQL Server 2000 did not support any of those four CJK extension ranges.

    Similarly, Windows 2000 didn't support any of them either.

    Then, starting in XP and continuing in Windows Server 2003, something interesting happened.

    Basically, support was added for CJK Extension A that placed all of Extension A at the end of the list in the default table.

    And also support was added for all of the high and low surrogates in planes 1, 2, 15, and 16.

    This was done using the same info I added to The basics of supplementary for those four planes, by assigning weights to:

    • All of the low surrogates, U+dc00 to U+dfff;
    • U+d800 - U+d83f (Plane 1, Supplementary Multilingual Plane);
    • U+d840 - U+d87f (Plane 2, Supplementary Ideographic Plane);
    • U+db80 - U+dbbf (Plane 15, Supplementary Private Use Area A);
    • U+dbc0 - U+dbff (Plane 16, Supplementary Private Use Area B).

    Two interesting side effects here -- first, the noncharacter sentinels in each plane were given weight, and second, every code point in Planes 1 and 2, whether it had a character assigned yet or not, was given some weight.

    Now note that Extension B, Extension C, and Extension D are all located in Plane 2 -- which means that every single ideograph in CJK Extension B that was assigned at the time, the CJK Extension C and CJK Extension D that were assigned later, and all of the not yet assigned space including the part roadmap'ed as being CJK Extension E were all given weight.

    Code point order, of course. But some order is better than giving them no weight, right? :-)
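    If you want to convince yourself that weighting the U+d840 - U+d87f high surrogates really does sweep up Extensions B, C, and D, the arithmetic is easy to sketch. This is just the standard UTF-16 math, not anything lifted from the sorting tables; the function names are mine, and the starting code points used (U+20000, U+2A700, U+2B740) are the first code points of the three blocks:

```python
def high_surrogate(code_point):
    """Return the UTF-16 high surrogate for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    return 0xD800 + ((code_point - 0x10000) >> 10)


def plane_of_high_surrogate(unit):
    """Return the plane (1 through 16) that a high surrogate leads into."""
    if not 0xD800 <= unit <= 0xDBFF:
        raise ValueError("not a high surrogate")
    # Each plane is covered by a run of 0x40 consecutive high surrogates.
    return ((unit - 0xD800) >> 6) + 1


BLOCK_STARTS = [
    ("CJK Extension B", 0x20000),
    ("CJK Extension C", 0x2A700),
    ("CJK Extension D", 0x2B740),
]

for name, first in BLOCK_STARTS:
    hs = high_surrogate(first)
    print(name, hex(hs), "plane", plane_of_high_surrogate(hs))
```

    All three land inside the U+d840 - U+d87f run from the list above, so they picked up a weight whether anyone had planned for them or not.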

    SQL Server 2005 basically picked up these additions, but only for a few of the newly added collations.

    They thus introduced the notion of having code points that have weight in some collations but not others.

    But again just code point order within the ranges (and Extension A after Plane 2).

    Now enter Vista and Windows Server 2008 and Windows 7 and Windows Server 2008 R2 and SQL Server 2008 and SQL Server 2008 R2, which all added sorts with linguistic relevance to (depending on the collation) some or all of the ideographs in CJK Extensions A and B.

    And every ideograph not included there keeps those same default weights that stick them at the end (though at least we put Extension A before its later counterparts!).

    Note that no linguistically relevant info is used for CJK Extensions C and D....

    Anyway, that answers the question about SharePoint, I think. :-)

  • Sorting it all Out

    The evolving Story of Locale Support, part 12: Logic dictates that we keep a sense of proportion about the RATIO

    • 24 Comments

    Previous blogs from this series:

    As I've mentioned in the past, I used to work across the hall from Shawn.

    In his office, he used to have a large cardboard cutout of Mr. Spock:

    I have no idea if it still is there, but it used to inspire me.

    To be logical.

    To have a sense of proportion.

    Even now I once again find myself with Douglas Adams on my mind, in this case from a bit of The Restaurant at the End of the Universe:

    The Total Perspective Vortex derives its picture of the whole Universe on the principle of extrapolated matter analysis.

    To explain--since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation--every sun, every planet, their orbits, their composition, and their economic and social history from, say, one small piece of fairy cake.

    The man who invented the Total Perspective Vortex did so basically in order to annoy his wife.

    Trin Tragula--for that was his name--was a dreamer, a thinker, a speculative philosopher or, as his wife would have it, an idiot.

    And she would nag him incessantly about the utterly inordinate amount of time he spent staring out into space, or mulling over the mechanics of safety pins, or doing spectrographic analysis of pieces of fairy cake.

    "Have some sense of proportion!" she would say, sometimes as often as thirty-eight times in a single day.

    And so he built the Total Perspective Vortex--just to show her.

    And into one end, he plugged the whole of reality as extrapolated from a piece of fairy cake, and into the other, he plugged his wife: so that when he turned it on she saw in one instant the whole infinity of creation and herself in relation to it.

    To Trin Tragula's horror, the shock completely annihilated her brain, but to his satisfaction he realized that he had proved conclusively that if life is going to exist in a Universe of this size, then one thing it cannot afford to have is a sense of proportion.

    And what does this have to do with anything, you might be wondering?

    Well, I was thinking about it when it was pointed out to me that when describing a time value, the COLON (U+003a) was not really what people generally used.

    Instead, the RATIO (U+2236), available since at least Unicode 1.1, was preferred, due to the bottom dot being spaced slightly higher up.

    You can look at pictures of digital clocks to prove it to yourself, here.

    Sounds reasonable enough.

    In fact, in case you are running on Windows, there is just one problem here: the fact that not all fonts have a RATIO in them.

    Looking at four fonts (Segoe UI, Verdana, Meiryo, and Meiryo UI) on Windows 7:

    You will see two problems:

    • Not all fonts have RATIO in them, though they all have colons;
    • Font linking and/or fallback seems to be grabbing the character from other fonts in some cases, like Simsun.

    Now the situation is slightly better in Windows 8, looking at that same document:

    We do seem to have added a bunch of RATIOs there.

    And with all of the many interesting places that times pop up, that preferred view makes sense to me. Looking at (for example) Segoe UI:

    Okay, I'm prepared to say that the RATIO in that font looks a little better than the COLON -- or, if nothing else, it matches the way that many of those different clocks you can see here look.

    Anyway, they are now working through fonts to get the character in.

    I imagine that given the many fonts that might be used in the user interface, and the various font linking/fallback issues that will sometimes insert a glyph with full width spacing from a font like Simsun, it is a good thing that they are doing that -- you can look in earlier versions of Windows to see the cost of getting the right glyph in the wrong font!

    Now from a collation standpoint, they are two different characters, and always have been.

    I had somebody ask me about that the other day, and also whether anyone wanted to change the underlying locale data, but that answer is more complicated.

    I mean, after all, the RATIO is not directly on any code page, and it is only "best fit mapped" to a COLON on some but not all of them.

    And no matter whether you prefer the RATIO to the COLON, probably either one is better than the question mark.

    We can't change the code pages to add best fit mappings to 874 and 932 and 1255 and such.
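    To get a feel for the question mark problem without a Windows box handy, here is a rough sketch using Python's own codecs for three of those code pages. Big caveat: Python's codecs do not implement the Windows best fit mappings at all, so this only shows the "no mapping, so you get a question mark" side of the story, not what the NLS conversion functions would return. The helper name encode_time_separator is made up for illustration:

```python
RATIO = "\u2236"
COLON = "\u003a"


def encode_time_separator(ch, code_page):
    """Encode one character; errors='replace' substitutes '?' when unmappable."""
    return ch.encode(code_page, errors="replace")


for code_page in ("cp874", "cp932", "cp1255"):
    print(
        code_page,
        encode_time_separator(COLON, code_page),
        encode_time_separator(RATIO, code_page),
    )
```

    The COLON survives the trip on every one of them, which is a big part of why it has been the safe choice in the locale data for so long.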

    Hacking GetLocaleInfoA and GetTimeFormatA and EnumTimeFormatsA would also be pretty risky and expensive, not to mention it would add a performance hit to every call to these functions.

    Better to do work that costs us some effort now than cost customers time or potential bugs later.

    To do this, we have to engineer it right, because logic dictates that we keep a sense of proportion about the RATIO!
