August, 2007

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Every character has a story #28: U+1e9e (CAPITAL SHARP S)

    • 7 Comments

    That night I saw in the pipeline fair
    A character that wasn't there
    Non-existence won't stop the encoding; it's true
    So it's coming soon to a Unicode near you!

    It all started with Every character has a story #15: CAPITAL SHARP S (not encoded), and then continued in Every character has a story #26: CAPITAL SHARP S (might be encoded?).

    And you can read the title of this latest blog post and know what is happening now without any hints from me....

    Though I must admit the trip has been both long and strange.

    It was decided within both ISO 10646 and Unicode that this interesting character was indeed going to be encoded (as per the pipeline, it was officially accepted on May 18th of this year and as of April 27th is in Stage 5 of the ISO process).

    And I have probably learned more about the nature of letters within typography than any experience before or since!

    Immediately after this process started, there was a whole bunch of discussion on the Unicode List about a very important topic:

    WHAT DOES A CAPITAL SHARP S LOOK LIKE?!?

    There were a whole bunch of proposals here, and much of the conversation then took a southward turn.

    Like people suggesting that DIN should be dissolved by law for supporting the proposal.

    And others pointing out that the proposal specified an enlarged version of ß, nothing more and nothing less.

    But I have told you about the Unicode List: the next 100 messages oscillated between people discussing typographic innovations that would make sense if the letter did indeed exist (based on different theories of its etymology) and people who remained unconvinced by the proposal even after it had been accepted, since in their view it isn't a freaking letter in the first place.

    Plus lots of SZ vs. SS arguments.

    The Germans I informally surveyed all seemed to fall squarely in the camp of the insanity of DIN, though many of them considered the opinion to be redundant....

    And then with a few people talking about the consequences for Unicode properties, just to add the vague scent of relevance to the discussion. :-)

    John Hudson had in my opinion the most amusing observation:

    The irony of the recent exchanges is not lost on me:

    On the one hand, we have Marnen Laibow-Koser, who thinks that this character should not exist, but that it does, and therefore needs to be encoded.

    On the other hand, we have me, who thinks that this character should exist, but that it does not, and therefore does not need to be encoded.

    Just so.

    For Microsoft, it raises some interesting questions for both collation and case for the next version of Windows.

    I mean, think about the issues I have already talked about in posts like What the %#$* is wrong with German sorting? where we make ss equal to ß so that the uppercase version "SS" will sort near the ß in a sort ignoring case -- where we do things that make less linguistic sense in order to give regular results that are intuitive.

    So who would expect that if U+00df is equal to ss that U+1e9e wouldn't be made equal to SS? Meaning that in the collation tables, U+00df and U+1e9e would simply be case variants, with no real choice in the matter.

    And as to casing....

    Now just because we make the relationship in casing does not mean we make it in collation. After all, as I have pointed out several times before, collation != case.

    But on the other hand, the case table is used in order to enforce the case insensitivity in the NT object namespace and the file system. And one clear issue is that there is no good reason to allow one to put filenames differing only by the presence of U+00df and U+1e9e in the same directory. Users would either never try it or they would never expect it to work. So it is quite possible that in the next version of Windows (which only does simple casing) it may make the most sense to make the two characters case variants of each other -- to enforce reasonable use of both letters!
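To make the filename scenario concrete, here is a minimal Python sketch of a one-to-one (simple) case table in which U+00df and U+1e9e are treated as case variants. This is only an illustration of the idea, not how Windows implements its case table: the override dictionary and both function names are made up for the example, and since Python's own `str.upper()` does full case mapping, the sketch throws away any result longer than one character to imitate simple casing.

```python
# Hypothetical override: pair U+00DF with U+1E9E in a one-to-one (simple) case table.
SIMPLE_UPPER_OVERRIDES = {"\u00df": "\u1e9e"}


def simple_upper(ch: str) -> str:
    """Uppercase a single character without ever changing the string length."""
    if ch in SIMPLE_UPPER_OVERRIDES:
        return SIMPLE_UPPER_OVERRIDES[ch]
    upper = ch.upper()  # full mapping; may expand, e.g. U+00DF becomes "SS"
    return upper if len(upper) == 1 else ch


def names_collide(a: str, b: str) -> bool:
    """True if two names are equal after per-character simple upper-casing."""
    return len(a) == len(b) and all(
        simple_upper(x) == simple_upper(y) for x, y in zip(a, b)
    )
```

With the override in place, a name containing U+00df and the same name containing U+1e9e would collide in a directory, while the "ss" spelling stays a different name, which is the collation != case point again.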

    There is still lots of time to decide, though at present I am leaning this way since it will give the most intuitive behavior for end users (even at the expense of giving slightly unintuitive results for developers).

     

    This post brought to you by ß and ẞ (U+00df and U+1e9e, LATIN SMALL LETTER SHARP S and LATIN CAPITAL LETTER SHARP S)

  • Sorting it all Out

    Getting the language (and more!) of an LCID-less keyboard

    • 10 Comments

    So back in May when I was talking about Getting the language of an LCID-less keyboard, I promised to do a bit more explaining about how support for custom locales was integrated into MSKLC 1.4, so that people could write code to work with specific keyboards.

    I've been busy so I did not get to it right away, but I am going to talk about it a bit more now. :-)

    It was actually way back in 2004 in Some Keyboarding Terms that the story begins; in that post I discussed the Keyboard Layout Identifier (KLID) and the fact that a KLID value with an "A" prefix meant an MSKLC-created layout. And I have talked about KLID values a few times since then, including that time in May when I talked about the custom ones.

    I don't think I have ever explained what the install package for the keyboard provided by MSKLC does, though, so I'll start there.

    Let's say we have installed that Valley Girl custom locale I asked Shawn to make for our fearless leader that I talk about in You wonder if like Vista supports custom locales? Fer shur!. It will show up in the Vista and Server 2008 user interface:

    The fact that it is installed on Windows will mean that MSKLC picks it up in its own language list from Project|Properties, whether you select it as I have or not:

    When the install package is built, it will include a LANGID value in it (it will end up in the MSI Property table, in string form in the LCIDValue property and in numeric form in the ProductLanguage property). For custom locales, the values of 0c00 and 3072 are used.

    At install time of that package, a small process occurs:

    1. That LANGID is used as the LOWORD of the KLID of the keyboard -- thus for our custom locale we have 0x00000c00;
    2. An "A" prefix is put in the front, to signify an MSKLC layout -- thus for our custom locale we have 0xA0000c00;
    3. The registry key with the layouts (HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts) is checked to see if that subkey exists; if it does not, then it is created and the process skips to step 5;
    4. 0x00010000 is added to the KLID candidate (thus for our custom locale we will have 0xA0010c00 the first time we hit this step) and then step #3 is repeated;
    5. The following relevant registry values are added underneath the new subkey:
      1. Custom Language Display Name - an SHLoadIndirectString style string pointing to a resource in the layout DLL that contains the name of the custom language -- for our custom locale, the resource it points to is Valley Girl (California);
      2. Custom Language Name - A plain text version of the name of the custom language -- for our custom locale, Valley Girl (California). 
      3. Layout Display Name - an SHLoadIndirectString style string pointing to a resource in the layout DLL that contains the name of the custom keyboard layout -- for our custom keyboard, the resource it points to is Like Totally For Shure.
      4. Layout Text - A plain text version of the name of the custom keyboard layout -- for our custom keyboard, Like Totally For Shure.
      5. Layout Locale Name - Only present for custom languages, the name of the locale for use in name-based NLS API calls, e.g. en-US for English (United States) -- for our custom keyboard, valley-GIRL.
      6. Layout Product Code - a GUID representing the ProductCode in the MSI's Property table -- for our package, {B883CD61-769F-4488-8070-FBD07C0147E7}.
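Steps 1 through 4 above boil down to a small loop, which can be sketched in Python. This is only a model of the selection logic, not the installer's actual code: `pick_klid` is a made-up name, `existing_subkeys` stands in for whatever enumerating the Keyboard Layouts registry key would return, and formatting the subkey name as eight lowercase hex digits is an assumption for the example.

```python
def pick_klid(langid: int, existing_subkeys) -> str:
    """Return the first free MSKLC-style KLID for a LANGID as 8 hex digits."""
    # Compare case-insensitively, since the inputs may be in either case.
    taken = {name.lower() for name in existing_subkeys}
    # Steps 1 and 2: LANGID in the LOWORD, "A" prefix in the top nibble.
    candidate = 0xA0000000 | (langid & 0xFFFF)
    while candidate <= 0xAFFFFFFF:
        name = f"{candidate:08x}"
        if name not in taken:        # step 3: subkey does not exist yet
            return name
        candidate += 0x00010000      # step 4: bump the HIWORD and retry
    raise RuntimeError("no free KLID left for this LANGID")


# Hypothetical usage: one custom-locale layout is already installed.
print(pick_klid(0x0C00, ["a0000c00"]))
```

The second custom-locale keyboard on a machine therefore ends up one HIWORD increment above the first, which is why a KLID alone cannot identify a specific package.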

    The view of the Property Table in the MSI file will look something like this (almost completely identical between all three MSI files):

    And after the install, the registry will look something like this:

    Without further ado, the "cool for the ISV" design consequences:

    1) The Layout Product Code is identical between the i386, amd64, and ia64 versions of each package, which is a theoretical violation that may cause the likes of fellow traveler Heath Stewart to cringe but ends up being a sound decision since each of the three different packages cannot ever be run on any architecture other than the one for which it is intended.

    2) That Layout Product Code provides an excellent way for an application that installs an MSKLC-generated keyboard layout to find that specific layout, even though the KLID value may be different if other custom keyboards that use the same LANGID value are already installed.

    3) The KLID value's LOWORD provides either an excellent LANGID to use for getting locale information if it is needed OR when the LOWORD is 0c00 a signal to look at the Layout Locale Name registry value.

    4) The Layout Locale Name provides the name of the custom locale when one was used for the layout's creation.
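Consequences 2 through 4 can be sketched as two small lookups in Python. To keep the example runnable anywhere, the registry is replaced by a plain dictionary: the `layouts` snapshot, the function names, and the tuple return shape are all assumptions made for illustration, while the value names and the sample data come from the description above.

```python
CUSTOM_LANGID = 0x0C00  # LOWORD used for custom locales, per the post


def find_klid(layouts: dict, product_code: str):
    """Return the KLID whose 'Layout Product Code' matches, or None."""
    wanted = product_code.strip("{}").lower()
    for klid, values in layouts.items():
        code = values.get("Layout Product Code", "")
        if code.strip("{}").lower() == wanted:
            return klid
    return None


def locale_hint(klid: str, values: dict):
    """Return ('name', locale name) for custom locales, else ('langid', LANGID)."""
    langid = int(klid, 16) & 0xFFFF
    if langid == CUSTOM_LANGID:
        return ("name", values.get("Layout Locale Name"))
    return ("langid", langid)


# Hypothetical in-memory snapshot of the Keyboard Layouts key.
layouts = {
    "a0010c00": {
        "Layout Locale Name": "valley-GIRL",
        "Layout Product Code": "{B883CD61-769F-4488-8070-FBD07C0147E7}",
    },
    "00000409": {"Layout Text": "US"},
}

klid = find_klid(layouts, "{B883CD61-769F-4488-8070-FBD07C0147E7}")
print(klid, locale_hint(klid, layouts[klid]))
```

Searching by product code rather than by KLID is the point: the KLID may have been bumped at install time, but the product code stays the same.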

    Now I will cover some other interesting and/or important design consequences in the next post in the series, but this should be plenty to get people started....

     

    This post brought to you by ᕓ (U+1553, a.k.a. CANADIAN SYLLABICS FE)

  • Sorting it all Out

    İ şéè đêäđ ķéÿš

    • 24 Comments

    (Apologies to M. Night Shyamalan!)

    So, another great shirt showed up at CafePress.com:

    There is not a single bit of this shirt that I don't think rocks! :-)

    Click on the shirt to buy it, etc.

    Now of course people liked it, but when Simon Daniels saw it you know what he said, being a member of the Microsoft Typography team, right?

    He wondered what font that was!

    Here is a close-up of the text:

    Or for those who speak code points:

    U+0130 U+0020 U+015f U+00e9 U+00e8 U+0020 U+0111 U+00ea U+00e4 U+0111 U+0020 U+0137 U+00e9 U+00ff U+0161 
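For those who speak code points but would rather let a script do the reading, here is a small Python sketch that turns the list above back into text and prints each character's name from the standard `unicodedata` module. The `decode` helper and the `SHIRT` constant are just names picked for this example.

```python
import unicodedata

# The code point list from the post, copied as-is.
SHIRT = ("U+0130 U+0020 U+015f U+00e9 U+00e8 U+0020 U+0111 U+00ea "
         "U+00e4 U+0111 U+0020 U+0137 U+00e9 U+00ff U+0161")


def decode(code_points: str) -> str:
    """Turn a space-separated 'U+XXXX' list into the string it spells."""
    return "".join(chr(int(token[2:], 16)) for token in code_points.split())


text = decode(SHIRT)
for ch in text:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The per-character names are handy when comparing fonts, since they spell out which letters carry a cedilla, a stroke, or a dot above.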

    Well, he didn't think it was Verdana, given that the y above has an extra tail and that Verdana's uppercase I with dot has serifs.

    But it couldn't be Tahoma because neither the comma below the k nor the cedilla under the s matched.

    It couldn't be Segoe because of serifs on the I that it has, and it looked a bit like Segoe UI except the various dots are round there, not square like on the shirt.

    It couldn't be Humanist with its bigger downward curve in the lowercase a and the fact that it has no k with comma below.

    And it couldn't be Frutiger given the different look of the comma below the k and the cedilla under the s.

    I swear I thought it might be Calibri, but Carolyn set me straight -- the cedilla is different and the y has no extra tail.

    Cambria, Candara, and Century also had their own differences -- the whole thing felt like one of those puzzles where you have to find what is different about each picture!

    So here is the big attempt:

    And even now no one has identified the font yet....

    Does anyone out there have any thoughts on what font it might be? :-)

    In any case, I love the shirt, it is very cool.

     

    This post brought to you by ķ (U+0137, a.k.a. LATIN SMALL LETTER K WITH CEDILLA)

  • Sorting it all Out

    In Case there is a bug....

    • 1 Comments

    (Today's title has two possible meanings, thanks to the verbal Sargasso of unclarity that is English)

    Katy King is one of the testers over in the managed world (and sometimes contributor to the BCL Team Blog) who I manage to run across from time to time.

    The main reason is that she periodically finds results that seem unexpected or inconsistent, and she wants to ask if she is missing something.

    Now her track record is actually pretty good since pretty much all the issues she has raised are either messy things that are known and by design (but still messy) or actual bugs.

    So if she wanted to skip the step of sending mail to ask, she wouldn't be out of line. I mean I hope she still sends the mail, since having people interested is always nice and sometimes when they are known issues the conversations get fascinating; I'd hate to lose that. :-)

    Anyway, she found a doozy of an issue this time....

    The bug she found was that some of the case pairs seemed to be reversed!

    Here are the UnicodeData.txt entries for the eight characters:

    1FC3;GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI;Ll;0;L;03B7 0345;;;;N;;;1FCC;;1FCC
    1FCC;GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI;Lt;0;L;0397 0345;;;;N;;;;1FC3;

    1FF3;GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI;Ll;0;L;03C9 0345;;;;N;;;1FFC;;1FFC
    1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;

    2C65;LATIN SMALL LETTER A WITH STROKE;Ll;0;L;;;;;N;;;023A;;023A
    023A;LATIN CAPITAL LETTER A WITH STROKE;Lu;0;L;;;;;N;;;;2C65;

    2C66;LATIN SMALL LETTER T WITH DIAGONAL STROKE;Ll;0;L;;;;;N;;;023E;;023E
    023E;LATIN CAPITAL LETTER T WITH DIAGONAL STROKE;Lu;0;L;;;;;N;;;;2C66;

    The Vista uppercase table:

    0x1fcc 0x1fc3 ; GREEK LETTER ETA WITH PROSGEGRAMMENI
    0x1ffc 0x1ff3 ; GREEK LETTER OMEGA WITH PROSGEGRAMMENI

    0x023a 0x2c65 ; LATIN LETTER A WITH STROKE
    0x023e 0x2c66 ; LATIN LETTER T WITH DIAGONAL STROKE

    And the Vista Lowercase table:

    0x1fc3 0x1fcc ; GREEK LETTER ETA WITH PROSGEGRAMMENI
    0x1ff3 0x1ffc ; GREEK LETTER OMEGA WITH PROSGEGRAMMENI

    0x2c65 0x023a ; LATIN LETTER A WITH STROKE
    0x2c66 0x023e ; LATIN LETTER T WITH DIAGONAL STROKE

    So indeed these four pairs have uppercase characters where one would expect lowercase, and vice versa.
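The kind of check that catches this can be sketched in Python by deriving the expected simple uppercase mappings from the UnicodeData.txt entries above (field 12, counting from zero) and comparing them with the shipped table. This is a sketch under assumptions, not the actual test Katy ran: the function names are invented, and the Vista rows are assumed to read "source, then result".

```python
UNICODE_DATA = """\
1FC3;GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI;Ll;0;L;03B7 0345;;;;N;;;1FCC;;1FCC
1FCC;GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI;Lt;0;L;0397 0345;;;;N;;;;1FC3;
1FF3;GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI;Ll;0;L;03C9 0345;;;;N;;;1FFC;;1FFC
1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;
2C65;LATIN SMALL LETTER A WITH STROKE;Ll;0;L;;;;;N;;;023A;;023A
023A;LATIN CAPITAL LETTER A WITH STROKE;Lu;0;L;;;;;N;;;;2C65;
2C66;LATIN SMALL LETTER T WITH DIAGONAL STROKE;Ll;0;L;;;;;N;;;023E;;023E
023E;LATIN CAPITAL LETTER T WITH DIAGONAL STROKE;Lu;0;L;;;;;N;;;;2C66;
"""

# The Vista uppercase table from the post, assumed to read "source result".
VISTA_UPPER = {0x1FCC: 0x1FC3, 0x1FFC: 0x1FF3, 0x023A: 0x2C65, 0x023E: 0x2C66}


def simple_upper_table(unicode_data: str) -> dict:
    """Map code point -> simple uppercase (field 12 of each UnicodeData line)."""
    table = {}
    for line in unicode_data.splitlines():
        fields = line.split(";")
        if len(fields) > 12 and fields[12]:
            table[int(fields[0], 16)] = int(fields[12], 16)
    return table


def reversed_pairs(shipped: dict, expected: dict) -> list:
    """Sources in `shipped` whose mapping runs opposite to `expected`."""
    return sorted(src for src, dst in shipped.items() if expected.get(dst) == src)


expected_upper = simple_upper_table(UNICODE_DATA)
print([f"{cp:04x}" for cp in reversed_pairs(VISTA_UPPER, expected_upper)])
```

All four entries of the shipped uppercase table get flagged, because each one maps a capital letter to the small letter that UnicodeData.txt says should map to it.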

    Yick!

    The interesting question is whether to fix (and when, if the decision is made to fix).

    Now the default behavior of the filesystem and the NT object namespace, which is case preserving as I talked about in this post and this other one, would actually not be affected, since the characters are still treated as equal in comparisons. And since none of the characters are in any code pages, there is no non-Unicode behavior to worry about.

    But the problem comes in for the people who actually do conversions and then use the results for things like case insensitive hashes -- which really happens. So the fix would really require something along the lines of an "opt-in" flag for these four case pairs.

    And how did this happen?

    (People really get in to Root Cause Analysis around here, or maybe that is just me!)

    Funny story, I suppose, but this is really a bug in two parts!

    The four Greek script characters have been around since Unicode 1.1 but they were never included in the case table for some reason. When they were added, it was through an automated process that was erroneously making the assumption that uppercase comes before lowercase, which in this case it did not.

    And the four Latin script characters had the uppercase letters added in Unicode 4.1 and the lowercase letters added in Unicode 5.0, and it just looks like the distance between the upper and lower case letters confused things a bit. Not even an automated excuse, just plain old human error....

    I guess I can just console myself thinking about the fact that on the bright side, like over 230 other case pairs were added that weren't wrong. And people had really already been thinking about whether it made sense in the long run to add the notion of versioning to the case table, so now the issue looks like it may be forced appropriately!

    So all is not entirely lost.

    But clearly Katy deserves a raise, in any case. Keep 'em coming, Katy!

     

    This post brought to you by ῃ, ῌ, ῳ, ῼ, ⱥ, Ⱥ, ⱦ, and Ⱦ (U+1fc3, U+1fcc, U+1ff3, U+1ffc, U+2c65, U+023a, U+2c66, and U+023e, a.k.a. GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI, GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI, GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI, GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI, LATIN SMALL LETTER A WITH STROKE, LATIN CAPITAL LETTER A WITH STROKE, LATIN SMALL LETTER T WITH DIAGONAL STROKE, and LATIN CAPITAL LETTER T WITH DIAGONAL STROKE)

  • Sorting it all Out

    It doesn't always happen on the hour....

    • 6 Comments

    (Intentionally posted at the half hour mark. Subtle, huh?) 

    Jason's question of the group of people:

    Hello All,

    I tried using the Following Code which uses new Date ( ) in JScript: 

    <%
        var d = new Date();
        Response.Write(d);

        /*  d prints the local time as Tue Aug 14 21:55:46 UTC+5 2007 The machine
            is in  IST TimeZone which is UTC+0530 ..But UTC+5  is returned */

        var e = new Date(d);  //  I try to use new Date () again and pass the value d whic was used earlier

        Response.Write("<BR>");
        Response.Write(e);

        /* because UTC+5 is the Time Offset , e  prints Tue Aug 14 22:25:58 UTC+5 2007 (30 minutes more
           than the system's local time Pls see the pink text highlighted where 30 minutes difference is seen)*/
    %>

    When I try using New Date () for the first Time, I get the Time with UTC+5. The local Time of my machine is set to IST, which is 5.30 hours ahead of UTC. So I should get UTC+530 and not UTC+5. When I try to use new Date again, I get 30 minutes difference compared to my Local Time.

    When I try to change my Local Time setting to UTC+9 or UTC+10 or UTC+1 ...... or any natural number (i.e. UTC+1, UTC+2....UTC+12...), I don’t get this error since the value appended is already a whole number. When I try to set Time Zones involving Fractions like( UTC+530 ,UTC+430,UTC+330 ......), I get the UTC offset truncated to (UTC+5,UTC+4,UTC+3...). This undesired output format of Date gives lots of Errors in Storing Time.

    Kindly let me know any pointers on this Issue. Any links that can help will be highly appreciated. Thanks in advance.

    It turns out that this is actually a bug:

    This is a bug in the Jscript .NET implementation, which assumes the timezone offset to be in multiples of hours.

    The Jscript engine used by IE (and other hosts in Windows) does not have this bug.

    Keep in mind that all of the following time zones have offsets that are not on the hour (this is the latest XP list):

    • Afghanistan Standard Time (GMT +04:30)
    • AUS Central Standard Time (GMT +09:30)
    • Cen. Australia Standard Time (GMT +09:30)
    • India Standard Time (GMT +05:30)
    • Iran Standard Time (GMT +03:30)
    • Myanmar Standard Time (GMT +06:30)
    • Nepal Standard Time (GMT +05:45)
    • Newfoundland Standard Time (GMT -03:30)
    • Sri Lanka Standard Time (GMT +05:30)

    A rounding problem? Storing the value in an int? Who can say for sure? It does make one miss good old JScript just a little bit....

    Yet another example of how easy it is to have bugs when the test cases don't cover all of the different time zones. :-(
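The symptom is easy to reproduce in a few lines of Python. This sketch makes no claim about what the JScript .NET implementation actually does internally; it only shows that keeping whole hours and dropping the remainder yields exactly the 30 minute drift Jason reported, and what formatting the offset from minutes looks like instead. Both function names are made up for the example.

```python
def format_offset(total_minutes: int) -> str:
    """Format a UTC offset given in minutes, keeping the minutes part."""
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"UTC{sign}{hours:02d}:{minutes:02d}"


def hours_only(total_minutes: int) -> int:
    """The offset in minutes after keeping only whole hours (toward zero)."""
    return int(total_minutes / 60) * 60


ist = 5 * 60 + 30  # India Standard Time, in minutes
print(format_offset(ist))               # offset with its minutes intact
print(format_offset(hours_only(ist)))   # what a whole-hours assumption keeps
print(ist - hours_only(ist), "minutes lost")
```

Running the same two functions over the offsets in the list above (including Nepal's 45 minutes and Newfoundland's negative half hour) makes a cheap test case for any code that stores or prints time zone offsets.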

     

    This post brought to you by ༃ (U+0f03, a.k.a. TIBETAN MAL GTER YIG MGO -UM GTER TSHEG MA)

  • Sorting it all Out

    It is true that your LCID sucks, but your LANGID sucks more

    • 7 Comments

    Sometimes when working on software projects we spend so much time thinking a certain way that thinking any other way just does not occur to us.

    Like the other day when someone sent me some email asking:

    Hi Michael,

    I saw your name on MSDN regarding GetLocaleInfo() and I hope you could help me with my question.

    ...I need to convert an ISO 639 code (extracted from a stream) to a LANGID in Windows. I know GetLocaleInfoEx could provide me with LANGID or ISO 639 code with LOCALE_ILANGUAGE and LOCALE_SISO639LANGNAME2 if  I give it a locale name. Since I need to convert an ISO 639 code to LANGID, do you have any suggestion as to how to get a locale name from an ISO 639 code or is there any other way to do it?

    Thanks,

    On MSDN? Moi?

    Oh yeah, it's that Search and ye shall find, SIAO style! thing. Kind of raises a ruckus with that whole blog as an official source of info thing that was mentioned before, huh? :-)

    Okay, let's move past that issue for now. I may have some thoughts to share another day....

    Onward.

    Let's take a moment and think about the request here (getting LANGID values from ISO-639 codes), and you may know where I am going with this.....

    It starts with the whole Your LCID sucks issue. You don't need a mathematical proof to cover the fact that if LCIDs suck, then the subsets of LCIDs known as LANGIDs suck worse. :-)

    One could obviously think of all kinds of clever ways to take ISO 639 names and convert them to locale names which are in part made up of ISO 639 names anyway.

    But is that the right thing to do?

    Office built a huge architecture off of locales for its proofing tools, and Windows did for its input methods. They did it with LCIDs, sure. And in true Your LCID sucks style they have to take steps to fix that (like input methods have started to do in Vista).

    But is the answer to just always move to locale names, still staying with locales in every case?

    Locales have their own data to perform certain tasks, sure. Does this mean that any project that is taking an ISO-639 language name (which as of the latest update can be over 6000 different items!) should try to shoehorn it into a list of locales that is limited to a mere 208 choices?

    The 300 pound man who jumps off a trapeze pole into a cup of water can be a fun way to draw a cartoon, but is it a way to write software? :-)

    Now in situations where you need data from a locale then sure, limiting yourself to that list can be important. But if you don't specifically need the data, then why let locales hold you back?

    I can't speak for locales, but I am pretty sure that if they could speak, those little LCID values would be proud to see that you have grown past them....

    The key thing to ask yourself is whether you need the locale. When you think about it, this whole issue is really just a generalization of the problems with CurrentUICulture I talked about in two parts. It is simply bad engineering to require the support of the whole huge model of the locale support in Windows or the .NET Framework when you really need nothing more than an identifier or at most a name string. It's time to start thinking about how to lighten the load, people!
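One way to lighten the load can be sketched in Python: carry the ISO 639 code itself as the identifier and treat the LANGID as optional data that only some languages happen to have. Everything here is an assumption for illustration: the `Language` type, the function name, and the four-entry table (which also quietly picks one default locale per language) are not a Windows API.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative subset; each entry picks one default locale per language
# (for example en -> en-US), which is itself a judgment call.
LANGID_BY_ISO639 = {"en": 0x0409, "de": 0x0407, "fr": 0x040C, "ja": 0x0411}


@dataclass(frozen=True)
class Language:
    """A language identified by its ISO 639 code, with a LANGID only if one exists."""
    iso639: str
    langid: Optional[int] = None


def language_from_iso639(code: str) -> Language:
    """Keep the code as the identity; attach a LANGID only when the table has one."""
    code = code.strip().lower()
    return Language(code, LANGID_BY_ISO639.get(code))


print(language_from_iso639("de"))
print(language_from_iso639("tlh"))  # valid ISO 639 code, no LANGID to map to
```

Code that only needs to label or route a stream by language never has to touch the `langid` field, so a code without a locale is not an error case.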

     

    This post brought to you by ၏ (U+104f, a.k.a. MYANMAR SYMBOL GENITIVE)

  • Sorting it all Out

    Not the most sensible post to riff on, but we do deal with GD here at SIAO

    • 0 Comments

    It may not be the most sensible post for me to riff on, but it is the one that got me thinking about the issue, so into the breach I go.... 

    I did almost lose a keyboard since I had just taken a big sip of Limonata before looking over at the monitor when out of nowhere Chris Pirillo's Viagra vs. Cialis post popped up in FeedDemon.

    And it succeeded in catching my attention, if nothing else. Which was his stated goal, so well done there!

    My first thought was that the spammers had perfected the "spam post" technology and put a whole post up full of links -- since he uses Google for ads, in a way they did. :-)

    But I thought I'd try to look a bit deeper here....

    Because whether the conversation is comfortable or not, we are talking about GD (globalization dysfunction) here in this blog.

    There are many "cures" for it, but finding the right one for the circumstances can be a challenge.

    When dealing with two technologies that are mostly the same but have some sometimes subtle differences that really can drive a software developer toward one or the other, it may seem like trying to choose between two drugs that everybody knows something about (even if not quite so many of us know the differences between them).

    And I often get those questions when people have to choose between Win32 and MLang, between CultureInfo or GetLocaleInfo, between LCIDs and locale names, and so on.

    It's easy to be a cheerleader and say rah! rah! use the new stuff! and tell them what to choose. But that isn't helpful.

    I mean usually I might suggest Win32 over MLang, but what if we're talking about code page detection?

    I might usually suggest GetLocaleInfo (since I am on the Windows team) but what if we're talking about a managed code project?

    And I might usually suggest locale names, but what if you're just using LOCALE_USER_DEFAULT, or what if you are running downlevel?

    The real answer to the question "which technology should I use?" is really complicated -- pushing it into a sound bite Hillel style (where he told the heathen who wanted to learn the whole Torah while standing on one foot "What is hateful to you, do not do to your neighbor: that is the whole Torah while the rest is commentary; go and learn it.") might have some visceral appeal, but that usually just isn't going to work.

    And not just because I am no Hillel.

    (In fact, after talking to like three or four folks in Office and SQL Server over the last few weeks who had never known me before but were told to contact me for some help or answers by their colleagues, it is very clear that I have developed something of a reputation at Microsoft, and if one had to choose the rabbi I would be closest to in description, I am clearly closer in reputation to a Shammai than a Hillel, though in fairness I would never chase someone off with my builder's cubit, er, cane or one iron for asking me a question!)

    But like I said, it is not just because of that.

    I really need to know what the actual requirement is -- what one wants the code to do.

    Blindly saying "use managed code" or "use names, not LCIDs" without any consideration for needs may not make one a complete and utter moron, but it does not make one a genius, by any means.

    And without knowing what you were doing, there is no way to answer the question. If there were, then I would have only needed like maybe 20 posts in this blog instead of the tangled mass it has become!

    So, just as you should probably go to the doctor if you needed an answer to the whole erectile dysfunction question and whether to use Viagra or Cialis, coming somewhere like here for questions related to globalization dysfunction just makes sense.

    Especially with Dr. International in such a non-responsive state (in a coma since February of 2006!). :-)

     

    This post brought to you by ∆ (U+2206, a.k.a. INCREMENT)

  • Sorting it all Out

    Additional personal speculation on the Vista MUI SKU story

    • 10 Comments

    I was watching Oliver Stone's Wall Street last night and there was a bit that popped out at me. It was the scene where Bud Fox (Charlie Sheen) was explaining his plan to bring the airline to profitability:

    Thank you, Gordon.
    First, I want you all to know that my door will always be open...
    ...because I know from my dad it's you guys that keep Bluestar flying.
    What I've come up with here is a basic three-point plan.
    One: we modernize. Our computer software is dog shit. We update it.
    We squeeze every dollar out of each mile flown.
    Don't sell a seat to a guy for 79 bucks when he's willing to pay 379.
    Effective inventory management will increase our load factor by 5-20%.
    That translates to approximately $50-200 million in revenues.
    The point being, we can beat the majors at a price war.

    That bit in red is the sort of thing that everyone can understand, it is just the way businesses work. It is a simple principle that can be expanded out into any business that is either successful, or that could have been [more] successful had they paid more attention to the principle.

    Anyway, please keep this principle in mind, as you read this blog post.

    At the risk of repeating myself, I feel that I must point out that this post is inspired by my personal opinions. Anyone who quotes this post with an "According to Microsoft..." (who I cannot speak for) is a complete and utter moronic wingnut who is not paying close attention to what is going on around them.

    I am going to talk about MUI (Multilingual User Interface).

    Like a "Google 20% time" project on steroids, the success of MUI was not really foreseen by the highest levels of management that approved it -- the biggest push was I believe for some of the cost saving aspects that the technology would in the long term bring to the overall localization process.

    Perhaps people had hopes that some of the huge scenarios like large multinational companies buying into it would turn into such big areas, but I honestly doubt it really hit any kind of radar as a possibility until those huge customers started clamoring for it.

    And I know there were huge scenarios that were missed like the Terminal Server scenario with one server having all the languages installed where people would log in with their preferred language choice -- I know this because the language counts in server were always lower than client until a huge clamoring to get those client resources on Terminal Servers came up. People at MS were just caught unawares, the full potential of the idea had just not occurred to everyone beforehand.

    They say that the best thing that can ever happen to your feature is for the executive staff to notice it, and MUI was no exception here -- as a lot of the hacks that had to be put in to make it work in Windows 2000 were allowed to be replaced over the next few releases with a more solid implementation in the resource loader, and the division wide push for a language neutral Vista was able to be made, as well as the updates to the entire model for localization in Windows.

    Of course they also say that the worst thing that can ever happen to your feature is for the executive staff to notice it, and MUI was no exception here, either.

    Because almost from the very beginning, pretty much as soon as the potential was realized based on the clamoring of those big customers, the licensing model for MUI has always been constructed with the assumption that it is a "big version" feature.

    And this was true long before The Vista MUI SKU story spelled it out so baldly.

    I have dozens of messages from the Contact link along the lines of these examples:

    Can you please tell me where can I download a Vista English LIP? (No, not for Vista Ultimate). I have a Japanese Vista Home and was looking everywhere for weeks to find an English LIP so I can operate the new laptop. Bought it in Japan so it's in Japanese. Bummer...

    questions for you.
    I need to test something on localized versions of vista home(German, French, Japanese, Korean).
    do I have to buy four different versions of Vista or can I just buy one and use that for all these languages.
    I remember hearing that Vista is built with language neutral so you can just change your locale settings to turn your English Vista to some localized(fully) version of Vista. Is that correct?

    Michael, how can I get MUI for Vista Home Basic? I am not looking for a handout and I'll pay for the extra languages I need. But I can't believe there is no way to get such a thing for two-language households. I can't believe we really have to buy the version that enables every language.

    How do I get English on Spanish Vista Home?

    How do I get Spanish added to my English Vista Home Basic for my grandfather? His English is not so good.

    I hate to comment bile, but I just have to say that it's really lame that you have to buy the "Ultimate" version of windows to get a feature that is more than 80% complete. Especially when we are talking not about some value-added entertainment option, but rather just to get language packs for an international product. Isn't that just saying "Here's the 80% finished version of Windows. If you want the really finished version, you've gotta pay extra."
    Since 2000, I think that Microsoft has made some great strides in improving their image, and they did it the honest way -- by opening lines of comunication, genuinely listening to their customers, and acting on what they learned. That said, I think that their marketing of the Vista SKUs was a setback, and it's a real shame because Vista is a great product.

    There was an ad for HP with an English and Spanish machine low end, but I can't find it now. I wonder how hispanics are supposed to afford a bilingual machine for grandma or whatever. It should be $10 for Spanish, not $200.

    Hey...what's the word from the inside on how folks with no money (developing countries) can get a copy of Vista that IS NOT ultimate but can still get language packs? Seems stupid to limit packs to Ultimate, no?
    Otherwise Windows is dead outside of "countries with White Folks."

    Wasn't the original word that all languages would be available on all SKUs in Vista? What's the point in depriving Slovakia their own freaking localized version...they can't afford Vista. (I assume you agree with all I'm saying)...what's the business justification and what are the chance it will open up?

    And so on -- like I said, I have a ton of them. And others in support and other customer connection situations have lots more. It is clear that there is a scenario here.

    I am sure that all of the people asking would love to get something here for free, but most everyone expects that if they get the feature they would have to pay something. It is just that no one wants or expects to have to buy the most expensive consumer version for a more gated/scoped scenario of just a couple of languages. If there was a feature up on http://shop.microsoft.com to add a language for some amount that was smaller than the $200 difference for the full version and the $159 difference for the upgrade, I think these language packs would sell like crazy, because most people aren't willing to buy the fuller version if they don't need all the functionality behind it.

    However, it is clear that the SKUs are not set up that way -- they are set up assuming that everyone will go with Ultimate.

    But then, they are also (as I said before) really thinking most about other scenarios that are not this one.

    It does not make sense to complain too much, as the situation is better: prior to Vista, you had to set up a SELECT agreement which had a minimum requirement of six licenses for Windows. And now you just have to buy the one more expensive version.

    But with all of the people clamoring, it is really hard to believe that we (Microsoft) as a company are not missing an additional group of scenarios here.

    Of course the places where these decisions are made are far outside of where I work, or really anyone in Windows International. In this case the connections we have here are that

    • some of the people in the group worked hard to enable and extend the functionality;
    • lots more have spoken with affected customers
    • many more have spoken with potentially affected customers

    So I am interested -- and we are interested. The trick is that the case has to be made to the people who make the decisions, and the case has to make sense in terms of the additional scenario(s) that the current plan did not anticipate, since it isn't up to the people who are hearing so much about the missing/undersupported cases....

     

    This post brought to you by $ (U+0024, a.k.a. DOLLAR SIGN)

  • Sorting it all Out

    Finding lost friends and (for something completely different) sites that suck

    • 1 Comments

    I found an old friend the other day. Not in person or anything, I found her updated blog.

    You see, she used to be at http://www.laurahcomputing.com/ with a feed at http://www.laurahcomputing.com/atom.xml and then she moved to http://www.shutuplaura.com/ ages ago.

    No worries, I found it just by looking at the old site since it redirected. But the old RSS link redirects to http://www.shutuplaura.com/atom.xml rather than the right link at http://www.shutuplaura.com/journal/atom.xml (I just mistakenly assumed she stopped blogging!).

    I meant to look into it, but truly I've been busy. Sorry Laura!

    Of course she hasn't written anything in a while so this might be an old blog, too....

    Anyway, I was catching up on posts and then I saw her most recent post from February of this year entitled Drowning in DST, which talked about the note in KB 930879 that said:

    "Note Do not confuse the Outlook tool installer package that is named TZMove.exe with the actual Outlook tool executable file that is also named TZMove.exe."

    After seeing her rant on the topic, I almost felt bad that the install package for Microsoft Keyboard Layout Creator is entitled MSKLC.EXE, as is the main executable....

    Luckily in my case, it is not a tool you would ever move around in ways that you might be likely to confuse the two. We have never had to put together a KB article on the topic, despite all those downloads. :-)


    On a separate note and by the way, if you are reading this post from any of the unauthorized mirrors like Noticias externas (http://geeks.ms/blogs/) or .NET Blogs (http://vsqa.net/blogs/bloggerads) or MSDN Blog Postings (http://msdnrss.thecoderblogs.com) then be aware that these sites SUCK. They all strip out the identity of the authors of the actual blog postings (replacing it with the name of the communal feed) and just leave a link to the original post that does not describe the source and sends a ping/trackback notification for it, something the communal feed for the server at http://blogs.msdn.com/ does not.

    A trackback to them serves no useful purpose other than to artificially drive traffic from the original site. I do not begrudge them the "stolen" content (this is the Internet, after all), but they should TURN OFF THE FREAKING TRACKBACKS! The fact that they publish them is just lame.

    The other sites that annoy me are the ones that track mentions of random words like L   A   O   S or T   I   M   O   R   -   L   E   S   T   E (obfuscated to avoid being pointed to again) and add my post to their feed, sending repeat trackback spam for every new post they find until mine scrolls.

    I find all of this to be really annoying and I have to apologize to the people I link to who may end up with link spam if their filters do not know to remove this sludge from the trackbacks. As this site does sometimes.

     (I am trying to get them to consider blocking them always and maybe do more than that but I won't hold my breath on that)

    I was considering whether it makes sense to add a small warning akin to this one to every post I do until/unless they cut this crap out. But I doubt people read them anyway, and they probably aren't listening, so the warning would just clutter my posts, and even this rant is most likely just a waste of time (after reading Scott Hanselman's post I'm convinced it may well be a complete waste of time). Maybe I have to just turn off trackbacks (which solves the problem for me, but not the people I link to). Sigh....

    All of these pure republisher sites with trackbacks turned on suck, completely and utterly. If they make money off of ads too, then they both suck and are evil, as well.


     

    This post brought to you by a (U+0061, a.k.a. LATIN SMALL LETTER A)

  • Sorting it all Out

    Who are the heirs of Bernard R. Miller? (aka U+2323 when you say that!)

    • 8 Comments

    I am embarrassed by the Heinlein allusion in the title but then I am embarrassed by the whole post so consider this me trying to work outside my comfort zone....

    As more and more of the genuine work of Unicode (trying to encode all the scripts of the world, past and present) is accomplished, a great deal of the work to do in the way of new scripts starts to slow down.

    Then, as I pointed out in Fictional could make things less functional, there are entities that start to feel that "Unicode may not be done, but it is done enough for us." Those are the companies that were in it to get their particular needs met within a character encoding standard, and it is a perfectly valid approach.

    Other entities actually care about the remaining scripts and want to see that work done. Organizations like the Script Encoding Initiative diligently work to try to close that gap, though the pace is not going to set any speed records. This is hard work, and it is work for specialists, at least for most of the work to get proposals together.

    I often wondered, in looking at many of the doctoral dissertations of linguists, who are in such a wonderful position to help provide this information for new scripts, why none of them are engaged directly. I mean, take this bit of Kieran's post about how she ended up at Microsoft:

    A really great dissertation is read by maybe 100, 200 if we're being generous, people. Maybe it is of use to 10 or 15% of them. And that's a really, really great one.

    (I know she was making a slightly different point about a different kind of linguistics dissertation but I am pretty sure the numbers are on the same order of magnitude here, too.)

    Why don't more of them take that work to the next step and get that language into Unicode, taking genuine steps to do work that will be seen by a lot more people and will be hugely appreciated in this world where the proposals aren't getting produced fast enough because the people qualified to do it are not a huge group?

    But that's an aside, I'll talk more about that another day (though if there are any linguists working on their dissertation about a language written in an as-yet-unencoded script, they should feel free to contact me and maybe we can work on a proposal to get that script in Unicode -- a true cherry on top of a Ph.D. sundae knowing that the work will be of tremendous benefit to a lot more than the small numbers quoted above! :-)

    A third thing that some people involved in Unicode will find themselves doing is trying to perfect what is currently encoded. This can have bad consequences on existing implementations, so it is probably a good thing that after the Bidi mirroring snafu the pendulum will tend to swing the other way for a bit. The burned child fears the fire, after all!

    This group can still do lots of productive work, writing Unicode Technical Notes and generally working to improve their implementations. They are not a thumb twiddling sort and even if they aren't going to be able to change as many properties, they have plenty of work they can do.

    A fourth thing that some people will do is move their focus into other areas like the CLDR, which works to try and provide locale data. Obviously working on a whole new standard within the standard is a way to occupy one's time if one is staying in standards.

    There is also a fifth thing that is actually much like the fourth thing (basically people taking up new causes, new work items, new ways to keep busy), what I believe is the answer to the question raised in the title of this post -- the fact that there is now a Symbols Subcommittee, which according to this page:

    Discusses and makes recommendations about the encoding of symbols, such as wingdings, train schedule symbols, mobile phone symbols, etc.

    This may sound scary to you. It does to me. And not just because every single message sent to their mailing list is copied to the list for Unicode members, making me wonder why they even bother to have a mailing list as I receive two of each message (the mail system at unicode.org is not smart enough to avoid the duplicate sends!).

    And as several people have pointed out to me and on the list recently looking at the introduction to the Unicode Standard, 5.0 text:

    Note, however, that the Unicode Standard does not encode idiosyncratic, personal, novel, or private-use characters, nor does it encode logos or graphics. Graphologies unrelated to text, such as dance notations, are likewise outside the scope of the Unicode Standard. Font variants are explicitly not encoded. The Unicode Standard reserves 6,400 code points in the BMP for private use, which may be used to assign codes to characters not included in the repertoire of the Unicode Standard. Another 131,068 private-use code points are available outside the BMP, should 6,400 prove insufficient for particular applications.
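    For anyone who wants to check where those two private-use numbers come from, here is a small sketch. The three ranges are the private use areas as the standard defines them; the helper names (range_size, is_private_use) are just mine for illustration, and the category check relies on whatever Unicode version your Python's unicodedata module ships with.

```python
import unicodedata

# Private-use ranges defined by the Unicode Standard (inclusive bounds).
# The last two code points of planes 15 and 16 are noncharacters, which is
# why each supplementary area holds 65,534 code points rather than 65,536.
PRIVATE_USE_RANGES = {
    "BMP Private Use Area": (0xE000, 0xF8FF),
    "Supplementary Private Use Area-A (plane 15)": (0xF0000, 0xFFFFD),
    "Supplementary Private Use Area-B (plane 16)": (0x100000, 0x10FFFD),
}


def range_size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1


def is_private_use(code_point):
    """True when the General_Category of the code point is Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


bmp_count = range_size(*PRIVATE_USE_RANGES["BMP Private Use Area"])
supplementary_count = sum(
    range_size(first, last)
    for name, (first, last) in PRIVATE_USE_RANGES.items()
    if not name.startswith("BMP")
)

print(bmp_count)            # 6400
print(supplementary_count)  # 131068
```

    So the quoted figures are 6,400 in the BMP plus 2 x 65,534 outside it.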

    Keeping that text in mind, consider this: one of the big items that the Symbols Subcommittee is talking about right now is whether to encode the Emoji (絵文字), the symbols that are so popular in the Japanese wireless market. Suddenly, lots of characters previously rejected may be okay to encode now if some of these people get their way, and the worry about the obviously faddish nature of things like Emoji will come full circle when wireless operators claim they need Emoji in text streams so that they can document the Emoji they support.

    Thinking back to the innocent days of the contributions of William Overington and Bernard R. Miller, I cannot avoid a sense of deja vu in all this.

    We could call it the next Comet Circumflex system, or the new golden ligatures, or the courtyard codes used for numbers and chess pieces, or we could notice that there are specific symbols in Bytext (a link that it pains me to give, truly, but it is topical -- read this question from his FAQ if you doubt me) that look a lot like the symbols I am talking about and how hypocritical it feels to have told this person that Unicode is not built for what these people wanted to do only to later form a committee to talk about the same thing those people wanted to do.

    I thought all of those things were a bad idea then, and I still think so now, by the way.

    Ignoring all that, I have a hard time seeing myself either

    • spending real time reviewing proposals for random symbols that (according to at least this one survey) see their biggest use for faces and which people want to be able to do more with emotion symbols in the future (and the happy ones are most happy at how cute the pictures are, while the unhappy ones want more kinds of Emoji);
    • trusting that any group of people who are interested in producing such proposals are really going to not include a lot of crap like the Prince symbol, which I could easily see being the kind of symbol that would show up on some phone and thus need to be seriously considered (for the plain text representation of teenage girls in Japan talking about The Love Album).

    And there is also a proposal to encode the Japanese TV symbols used by ARIB, as well. The proposal was even written by a Microsoft employee.

    So I guess we're getting into symbols too, though weren't we anyway with our [probably also to be encoded] WingDings and WebDings fonts (which have their own problems in Microsoft products and have for years because they aren't encoded and the silly silly features in WordMail AutoCorrect!).

    It is a slippery slope that we head down here, and clearly I am not speaking for either Microsoft or Unicode when I say that I think it is a really bad road to be heading down.

     

    This post brought to you by ☹ (U+2639, a.k.a. WHITE FROWNING FACE)

  • Sorting it all Out

    You look so familiar; I think you're my type....

    • 5 Comments

    If you have Windows Vista or have the beta of Windows Server 2008, you know that font folder is pretty huge.

    Hell, if you are a regular reader of this blog then you might remember The fonts directory is freaking huge in Vista from October of last year and since you are a regular reader you might just trust me on this one, realizing I am not a Stephen Glass type who made up the size of the folder. :-)

    Now in that post I generally defended the humongoid font folder on the basis of the whole What isn't in the default install for NLS issue. I still stand by most of that.

    But for the record, I've been staring at a few fonts that seem to be 100% duplicates of each other.

    As far as I can tell:

    Angsana New ≈ Angsana UPC

    Browallia New ≈ BrowalliaUPC

    Cordia New ≈ Cordia UPC

    The only difference I can see between the UPC and the New versions is the name.

    These fonts were licensed from Unity Progress Co., Ltd., and according to this thread over on Typophile they aren't in the fonts biz any more. They just happen to be some quite identical looking faces....

    This is really a tempest in a teacup, I know (I mean, if 302,695,113 bytes in 388 files is your problem, then you may not believe that half of 2,357,260 bytes in 24 files is your problem, right? :-)

    Now of course there are these multiple names there, and it is not as simple as just taking three of them out since that would break any application or document using the names unlucky enough to not be picked.

    I wouldn't recommend deleting a bunch of files here!

    So what can be done in the long run?

    Well, we could use one of the many features like the previously mentioned font substitution. Or we could make three TTCs, each of which shared pretty much all of their contents but contained two fonts apiece. I like the .TTC idea myself, but then I'd like us to also stick the bold, italic, and bold italic versions in there, thus cutting down the number of actual files to 3, while still keeping 100% fidelity with the names. That just seems kind of cool to me. :-)
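    To make the bookkeeping concrete, here is a rough sketch of how the 24 files would fold into 3 collections. It only plans names; it does not touch any font files. The family names are the ones listed above, while the four-style split and the .ttc file names are assumptions on my part (3 families x 2 names x 4 styles happens to give the 24 files mentioned earlier).

```python
from collections import defaultdict

# Family names as listed in this post. The style list is an assumption:
# the usual regular/bold/italic/bold italic set, so 3 x 2 x 4 = 24 faces.
FAMILY_PAIRS = [
    ("Angsana New", "Angsana UPC"),
    ("Browallia New", "BrowalliaUPC"),
    ("Cordia New", "Cordia UPC"),
]
STYLES = ["Regular", "Bold", "Italic", "Bold Italic"]


def plan_collections(family_pairs, styles):
    """Map each hypothetical .ttc file name to every face name it would carry."""
    plan = defaultdict(list)
    for new_name, upc_name in family_pairs:
        # Hypothetical file name: first word of the family, lower-cased.
        ttc_name = new_name.split()[0].lower() + ".ttc"
        for style in styles:
            plan[ttc_name].append(f"{new_name} {style}")
            plan[ttc_name].append(f"{upc_name} {style}")
    return dict(plan)


plan = plan_collections(FAMILY_PAIRS, STYLES)
for ttc_name, faces in plan.items():
    print(ttc_name, len(faces))
```

    Every one of the original names survives as a face inside a collection, which is the whole point: nothing that asks for a font by name would notice the difference.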

    Anyway, here is a bit of side by side for those who want to compare for themselves:

     I don't think this problem exists for the rest of the fonts (putting aside Arial vs. Helvetica stuff!).

    And those square blocks are a bunch of Thai vowels and tone marks stacked on top of each other. :-)

     

    This post brought to you by (U+0b5d, a.k.a. ORIYA LETTER RHA)

  • Sorting it all Out

    Some blog boggles and a few disclaimers

    • 7 Comments

    NOTHING technical, just self-indulgent meta-blogging crap, so feel free to skip if you aren't interested in the way my mind works; I know I'm not....

    I've been thinking about the whole blogging thing and how it fits into the universe, or, to be more precise, my universe.

    Which is not to say I own the universe or anything; I don't. But I have some of the same worries that Raymond has about people taking this blog to mean more than it does. I'm nowhere near as popular as he is, but it worries me anyway....

    I mean, there is my usual disclaimer text I have over in the margin (the first paragraph of which is clearly about 6000 times more serious than the second paragraph, which is spiced up from time to time through the judicious theft of fun ideas from other people's blogs):

    Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers would at best shake their heads and sigh, at worst severely repudiate the content, should it ever manage to appear on their radar.

    Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Not labeled for individual sale. Objects in mirror are closer than they appear. Do not bend, fold, spindle or mutilate. Not for use on unexplained calf pain.

    And then sometimes I take it a step further and add a specific additional disclaimer, such as this one I used recently:

    Please also note that these are my personal opinions. Anyone who quotes this post with an "According to Microsoft..." (who I cannot speak for) or even worse "According to Adobe..." (who I definitely cannot speak for) is a complete and utter moronic wingnut.

    Now please note that I am not calling anyone who reads this blog (whether casually, regularly, or religiously) either a wingnut or a moron.

    The name calling was reserved for the people who might want to take something that is clearly not a statement of official policy and treat it like one.

    In the end, I decided these were my only choices: to not write the way I do (in which case I probably wouldn't have a blog), or just occasionally do some of this explicit disclaimer work so that in the end, if everything goes to shit in some terrible scandal that my opinions are the center of, I can claim a bit of a moral victory in the fact that the persons who quoted me did in fact misquote me (and were in fact showing either first class wankery or grade "A" wingnuttery, using The Wanker-Wingnut Continuum definitions).

    You know, it's funny; here is the OOO mail I used when I was out of the office due to TypeCon and UTC:

    At TypeCon2007 July 31-Aug-5, and at UTC Aug 6-10 (though the former is in Seattle and the latter is in Redmond  so I am not far away, and I will be doing the COSD Globalization training).

    I will be on email as often as I can manage, though delays should obviously be expected and not taken personally, unless perhaps you think such an approach would be deserved.

    It really has the same kind of approach as I want to take here -- light-hearted, not mean, but (recalling the disclaimer!) with the intent of a self-policing readership who can decide for themselves if they wish to be considered worthy or unworthy of pleasant thoughts.

    The goal here is to be me, but moreso. But not so much moreso that people would think less of me, if you know what I mean....

     

    Characters in Unicode don't really have time to sponsor this kind of post; the not-encoded ones were interested, but there was no practical way to represent them!

  • Sorting it all Out

    Normalize Wide Shut

    • 2 Comments

    (Apologies to Stanley Kubrick, of course!) 

    It was almost the very first blog post I ever wrote, back at the end of November 2004, entitled Normalization and Microsoft -- whats the story?

    In it I mentioned that during my time at Microsoft I really had heard of four different uses of the word normalization.

    Well, the day before yesterday I heard of another.

    It seems that the ManagementAgent class has a NormalizeString method which (according to MSDN):

    The NormalizeString(String) method enables the user to normalize case and accent in a string according to the setting for that particular MA. By calling this function, the user can normalize the string according to the connected directory format during provisioning. As a result, when the management agent string is imported back to the connector space from the target directory, the string value imported will be the same as what was written to connector space at provisioning time, allowing confirmation of the export.

    As long as I am being all quote happy I'll include the remarks:

    Certain directories, such as RACF, TopSecret, or ACF2 change text strings that are imported into the directory to remove accents from text characters or to make the text all upper case. When data from the management agent is imported back into the connector space, the string in connector space is not the same as what was staged for export, since the directory will have modified the value. When you use this method, you can set case rules on the string, which makes the string all upper case, or accent rules, which removes accent characters from the string. Since normalization is carried out only in outbound synchronization, setting the initial value of an attribute in CS where the configuration specifies normalizing both case and accents would yield the following:

    • MV attribute value is café.
    • On initial attribute setting or during EAF, use of NormalizeString transforms the value (assuming both case and accents are configured in the UI) to CAFE.
    • CS attribute is export to RACF as CAFE.
    • At a later stage, the CAFE is imported, and the unconfirmed change is cleared.

    MV | Data flow | CS (uses NormalizeString) | Data flow | Connected directory
    café | --> Initial attribute value (provisioning) or EAF | String is normalized: café <--> CAFE | --> Export | CAFE (RACF directory)
    café | --> | Exported data is re-imported: CAFE <--> CAFE | <-- Stage | CAFE (RACF directory)

    This method is used on the Extensible Management Agent and the XML Management Agent. The management agents can be call-based or file-based.

    Once you have created the management agent that contains this method, you must use the Identity Manager UI to set the options for how the string should be normalized. In Identity Manager, you will need to create a new management agent. In the Create Management Agent UI, on the Configure Connection Information page, the management agents can be set as call-based or file-based. You must select the Import and Export radio button as the step type. On the Configure Attributes page, there are two check boxes: Upper Case and Remove Accents. Select one or both options.

    It is actually a little known fact, but if you call LCMapString or LCMapStringEx with nothing but NORM_IGNORENONSPACE, you can actually see it do a bit of diacritic stripping (it will not do as complete a job as the approaches my prior posts on this subject get into across all versions of the .NET Framework and Windows, but it was a step long before my blog existed. Hell, before .NET even existed as a concept). It never occurred to me to call it that way....
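    If you want to see the café to CAFE round trip for yourself without a management agent handy, here is a minimal sketch. To be clear about the assumptions: this is not the NormalizeString implementation and it is not what LCMapString does internally; it is just the generic "decompose, then drop nonspacing marks" approach, and the function names are made up for the example.

```python
import unicodedata


def strip_accents(text):
    """Decompose to NFD, then drop nonspacing marks (General_Category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_for_directory(text, upper_case=True, remove_accents=True):
    """Hypothetical stand-in for the Upper Case and Remove Accents check boxes."""
    if remove_accents:
        text = strip_accents(text)
    if upper_case:
        text = text.upper()
    return text


# What would be staged for export when both options are selected.
exported = normalize_for_directory("café")
print(exported)  # CAFE

# The value coming back from the directory normalizes to itself,
# which is what lets the export be confirmed.
print(normalize_for_directory(exported) == exported)  # True
```

    The second print is the interesting one: once the value has been normalized before export, the re-imported value compares equal and the pending change can be cleared.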

    Anyway, there is also the Aux.NormalizeString method that is part of the Team Foundation Server SDK. Though with documentation that does not explain what it does and even goes so far as to directly say "This method supports the .NET Framework infrastructure and is not intended to be used directly from your code.", who knows what to expect!

    So feel free to think of Aux.NormalizeString method as the "mystery meat" of the normalization world, and don't spend too much time looking directly at it. :-)

    But anyway, this fifth Microsoft meaning to "normalization" that I honestly didn't think I got until I read all of the text in the topic and even after that was not 100% sure I got it, plus the sixth method that is a real mystery, should be added to the list. If nothing else it will add to the challenge of searching for that NormalizeString function on MSDN. :-)

     

    This post brought to you by ƕ (U+0195, a.k.a. LATIN SMALL LETTER HV)

  • Sorting it all Out

    Ever look in the mirror[ing] and not like what you see?

    • 1 Comments

    From the TV show Angel, the episode entitled Epiphany:

    Angel: I'm still not sure what happened...
    Lorne: What's not to understand? You think you're the first guy who ever rolled over, saw what was lying next to him, and went GYAAAAH!?

    Okay, my challenge now is to appear to tie THAT quote in while veering back into the appropriate. Hopefully I do well.

    Here goes....

    For a long time now, there are implementers of Unicode who really feel the need to stop tweaking existing property values in the Unicode Standard since the changes can have a huge impact on their implementations and on their customers, changes that are not always well understood until it is too late.

    Opposing those people are the implementers of Unicode who notice discrepancies or inconsistencies and in the interests of trying to make values more understandable and consistent try from time to time (and UTC meeting to UTC meeting) to tweak those properties.

    You can see the conflict here:

    The second group says "It is messy how it is now."

    And the first group says "Life can be messy, let's leave it alone."

    So the second group says "This is a bug we should fix. What's the harm?"

    And the first group says "I don't have what is bad about it right in front of me but when we do this sort of thing stuff breaks."

    So the second group says "Well, let's just make it a Public Review Issue and then we'll know if anyone feels like we have broken them, and we can reconsider."

    Then that first group, with no convincing thing to say against that argument, reluctantly concurs.

    Then, a few months later, we are back in the UTC and there is little feedback against it and minimal feedback at all. With no reason to avoid the change, no one really has much to argue against. So it gets into the standard.

    And then, half a year later, someone stands up, points out that the Emperor was wearing no clothes that day, and we have a corrigendum.

    This time around there were two Public Review Issues: #80, which talks about (among other things) changing the Bidi_Mirrored property on some characters including QUOTE characters, and #91, which is the update to UAX #9 itself. Yada yada yada, and the change made it into Unicode 5.0.

    Luckily Jonathan Kew produced a document that was discussed in this recent UTC meeting which pointed out all of the things this broke. A corrigendum has been issued, which you can read about in Corrigendum #6: Bidi Mirroring. The data changes listed there are:

    Changes to Bidi Mirroring

    When this corrigendum is applied to Unicode 5.0.0, the Bidi_Mirrored property of the characters 2018..201F and 301D..301F is changed to "false" and their Bidi_Mirroring_Glyph is adjusted accordingly. Make the following changes to data files:

    1. Change the 11 lines in UnicodeData.txt which define properties for these characters to have the following contents:

        2018;LEFT SINGLE QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE TURNED COMMA QUOTATION MARK;;;;
        2019;RIGHT SINGLE QUOTATION MARK;Pf;0;ON;;;;;N;SINGLE COMMA QUOTATION MARK;;;;
        201A;SINGLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW SINGLE COMMA QUOTATION MARK;;;;
        201B;SINGLE HIGH-REVERSED-9 QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE REVERSED COMMA QUOTATION MARK;;;;
        201C;LEFT DOUBLE QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE TURNED COMMA QUOTATION MARK;;;;
        201D;RIGHT DOUBLE QUOTATION MARK;Pf;0;ON;;;;;N;DOUBLE COMMA QUOTATION MARK;;;;
        201E;DOUBLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW DOUBLE COMMA QUOTATION MARK;;;;
        201F;DOUBLE HIGH-REVERSED-9 QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE REVERSED COMMA QUOTATION MARK;;;;
        301D;REVERSED DOUBLE PRIME QUOTATION MARK;Ps;0;ON;;;;;N;;;;;
        301E;DOUBLE PRIME QUOTATION MARK;Pe;0;ON;;;;;N;;;;;
        301F;LOW DOUBLE PRIME QUOTATION MARK;Pe;0;ON;;;;;N;;;;;

    2. Remove the following 9 lines from DerivedBinaryProperties.txt, and change the count at the bottom of the file accordingly from 537 to 526:

        2018 ; Bidi_Mirrored # Pi LEFT SINGLE QUOTATION MARK
        2019 ; Bidi_Mirrored # Pf RIGHT SINGLE QUOTATION MARK
        201A ; Bidi_Mirrored # Ps SINGLE LOW-9 QUOTATION MARK
        201B..201C ; Bidi_Mirrored # Pi [2] SINGLE HIGH-REVERSED-9 QUOTATION MARK..LEFT DOUBLE QUOTATION MARK
        201D ; Bidi_Mirrored # Pf RIGHT DOUBLE QUOTATION MARK
        201E ; Bidi_Mirrored # Ps DOUBLE LOW-9 QUOTATION MARK
        201F ; Bidi_Mirrored # Pi DOUBLE HIGH-REVERSED-9 QUOTATION MARK
    
        301D ; Bidi_Mirrored # Ps REVERSED DOUBLE PRIME QUOTATION MARK
        301E..301F ; Bidi_Mirrored # Pe [2] DOUBLE PRIME QUOTATION MARK..LOW DOUBLE PRIME QUOTATION MARK
    
        # Total code points: 526

    3. Remove the following 11 lines from BidiMirroring.txt:

        2018; 2019 # [BEST FIT] LEFT SINGLE QUOTATION MARK
        2019; 2018 # [BEST FIT] RIGHT SINGLE QUOTATION MARK
    
        # 201A; SINGLE LOW-9 QUOTATION MARK
        # 201B; SINGLE HIGH-REVERSED-9 QUOTATION MARK
    
        201C; 201D # [BEST FIT] LEFT DOUBLE QUOTATION MARK
        201D; 201C # [BEST FIT] RIGHT DOUBLE QUOTATION MARK
    
        # 201E; DOUBLE LOW-9 QUOTATION MARK
        # 201F; DOUBLE HIGH-REVERSED-9 QUOTATION MARK
    
        301D; 301E # REVERSED DOUBLE PRIME QUOTATION MARK
        301E; 301D # DOUBLE PRIME QUOTATION MARK
    
        # 301F; LOW DOUBLE PRIME QUOTATION MARK

    For more info (and soon for the link to updated data files for those who need them), see the link to the Corrigendum. 
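    As a sanity check on the data above, here is a short sketch that parses the eleven UnicodeData.txt lines exactly as quoted, confirms the Bidi_Mirrored field (the tenth semicolon-separated field) is N for each, and expands the nine removed DerivedBinaryProperties.txt entries to show why the count drops from 537 to 526. The helper names are mine; the final check against Python's own unicodedata module assumes your Python ships character data newer than the corrigendum.

```python
import unicodedata

# The eleven corrected UnicodeData.txt lines, copied from the corrigendum text.
UNICODE_DATA_LINES = """\
2018;LEFT SINGLE QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE TURNED COMMA QUOTATION MARK;;;;
2019;RIGHT SINGLE QUOTATION MARK;Pf;0;ON;;;;;N;SINGLE COMMA QUOTATION MARK;;;;
201A;SINGLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW SINGLE COMMA QUOTATION MARK;;;;
201B;SINGLE HIGH-REVERSED-9 QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE REVERSED COMMA QUOTATION MARK;;;;
201C;LEFT DOUBLE QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE TURNED COMMA QUOTATION MARK;;;;
201D;RIGHT DOUBLE QUOTATION MARK;Pf;0;ON;;;;;N;DOUBLE COMMA QUOTATION MARK;;;;
201E;DOUBLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW DOUBLE COMMA QUOTATION MARK;;;;
201F;DOUBLE HIGH-REVERSED-9 QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE REVERSED COMMA QUOTATION MARK;;;;
301D;REVERSED DOUBLE PRIME QUOTATION MARK;Ps;0;ON;;;;;N;;;;;
301E;DOUBLE PRIME QUOTATION MARK;Pe;0;ON;;;;;N;;;;;
301F;LOW DOUBLE PRIME QUOTATION MARK;Pe;0;ON;;;;;N;;;;;
""".splitlines()

# Code point entries of the nine lines removed from DerivedBinaryProperties.txt.
REMOVED_ENTRIES = [
    "2018", "2019", "201A", "201B..201C", "201D", "201E", "201F",
    "301D", "301E..301F",
]

BIDI_MIRRORED_FIELD = 9  # zero-based index of the Bidi_Mirrored field


def mirrored_flags(lines):
    """Map each code point to its Bidi_Mirrored value (Y or N)."""
    flags = {}
    for line in lines:
        fields = line.split(";")
        flags[int(fields[0], 16)] = fields[BIDI_MIRRORED_FIELD]
    return flags


def expand(entry):
    """Turn 'XXXX' or 'XXXX..YYYY' into a list of code points."""
    first, _, last = entry.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    return list(range(start, end + 1))


flags = mirrored_flags(UNICODE_DATA_LINES)
removed = [cp for entry in REMOVED_ENTRIES for cp in expand(entry)]

print(len(flags), set(flags.values()))   # 11 {'N'}
print(len(removed), 537 - len(removed))  # 11 526

# Cross-check against the character data built into Python.
print(any(unicodedata.mirrored(chr(cp)) for cp in removed))  # False
```

    Nine lines but eleven code points, since two of the entries are ranges, which is why the total moves by 11.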

    The wider issue here goes back to the original group #1 and group #2 I first talked about, and the dynamics which will likely need to start changing there in order for big companies (like Microsoft and others) to be able/willing to pick up changes more quickly.

    Obviously staying multiple versions behind is the safest thing to do in products, but it is not the most helpful for the standard or, frankly, for customers, and a better balance has to be reached when issues like this come up. The dynamic in the UTC needs to shift toward fewer changes, with the common default being no change unless there are significant problems from implementers affecting customers, significant discussion, and then significant feedback when the PRI is issued....

     

    This post brought to you by ‘ (U+2018, a.k.a. LEFT SINGLE QUOTATION MARK)

  • Sorting it all Out

    Alive and well, but more grown up

    • 0 Comments

    So I was watching that video about the demolition of building 100 at Microsoft. You may have seen it if you work for Microsoft....

    Anyway, they talked about all of the new buildings that were coming, and they showed all of the graffiti and damage that employees were inflicting on the building since it was going to be destroyed.

    Good clean family fun.

    There was this one part with Lisa Brummel, and her quote was something like this (in part):

    When groups can come and do crazy, zany things to celebrate a building coming down, you know we're still alive and well.

    It struck me funny. Attend me for a moment....

    Gentle reader,

    I'm not going to get all minimsft on you, and I personally don't think MS culture is dead at all, and I like some of the stuff that Lisa is doing.

    But with that said....

    I have a hard time seeing this incident as proof of that lack of thanatosis.

    Maybe it is just me, but sanctioned vandalism of an about-to-be-demolished building seems kind of an imitation of an older MS culture.

    Does it hearken back to nameless (well, I am not going to name them. You probably know who they are!) employees who punched or kicked holes in walls when they were very angry, damage that I assumed had to be absorbed by the facilities budget?

    Sure, it does.

    And imitation is the sincerest form of flattery, sure.

    But no one is punching holes in walls today, at least not that I have heard of.

    We still have culture at Microsoft. But I am reasonably certain that if I punched a hole in the wall or intentionally drove my scooter through something to punctuate a particular issue that I'd be paying for it (one way or another), and I might lose my job, too.

    (Ignore the time I almost ran over Julie; the GM loophole has been plugged, and that won't be happening again. Plus, that wasn't on purpose.)

    Now if someone who was actually important did a stunt like kicking a hole in some drywall, I'm pretty sure there would still be some consequences, even if they stayed....

    So rather than proof that the type of culture (or lack thereof) that led to that kind of behavior is still alive and well, the sanctioned vandalism of building 100 done under the eyes of the senior VP in charge of HR, this "approved shooting of a corpse" that it was, seems like a fitting memorial to that old "culture."

    Like maybe we have grown up a little bit.

    Now, had they let the vandalism happen months before the building was to be torn down, and had people still been working in the building for a few months afterward, maybe it would have been a different kind of symbol. Maybe it would have looked wilder and zanier and crazier.

    But messing up the crap you're throwing away? That has to be the most tame kind of cutting loose there is.

    Maybe other people see this differently.

    Speaking personally, I am proud of the fact that it has been almost two and a half decades since I have punched a hole in anything (it was a door).

    I feel like I have grown up since; I certainly think I have a bit more culture than I did back then.

    Maybe Microsoft can say the same here. On both counts. :-)

     

    This post brought to you by 𝍓 and 𝍏 (U+1d353 and U+1d34f, a.k.a. TETRAGRAM FOR ON THE VERGE and TETRAGRAM FOR CLOSURE)
