
August, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    The fault is ~60% functionality, ~40% documentation

    • 0 Comments

    The question in the email was:

    Hi,

    I am trying to debug an issue... and was wondering if you could explain or elaborate on the behavior I am seeing from

        GetLcidFromRfc1766() with the string “ja-JP”

    it is returning S_FALSE, indicating that only the primary language tag was matched. The LCID that is returned is 1041. However, according to the Windows language code identifier reference this LCID corresponds to ja-JP, which has both a primary tag and a subtag. Is it just a coincidence that the correct LCID is being returned?

    Ah, where to begin? :-)

    I suppose we can start with GetLcidFromRfc1766 and its unusual return value semantic:

    Returns one of the following values.

    S_OK Success.
    S_FALSE The returned LCID matches only the primary language of the RFC1766-conforming name.
    E_FAIL The method cannot get the information from the database or the system.
    E_INVALIDARG One or more of the arguments are invalid.

    I think I have made my feelings about "creative" uses of return values clear in blogs like API Consistency and Developer Comfort and When the documentation is confusing, it is often because the functionality is, too (especially the latter one which talks about the creative return value semantic of NormalizeString).

    Now of course in addition to overloading an HRESULT such that S_OK and S_FALSE are both meaningful, we have an additional problem, this one in the documentation.

    After all, doesn't the text about "only the primary language of the RFC1766-conforming name" kind of imply it is talking about PRIMARYLANGID stuff?

    This is what the person asking the original question was kind of thinking it meant. He was thinking that the S_FALSE implied a return of something like 0x11 (LANG_JAPANESE) when one passes "ja".

    Which is why ja-jp getting S_FALSE does seem a little off!

    Now this is an entirely reasonable assumption one could make given the normal and reasonable expectation of consistent terminology in documentation.

    However, that is not what it means here.

    If you saw The weird, weird world of the SUBLANGID from way back when, you'll have the hint -- the documentation is thinking of a LANGID/LCID value constructable with SUBLANG_DEFAULT as being the "primary" language tag.

    Though if you read The weird, weird world of the SUBLANGID you'll see why that is a bad idea. I mean, consider the lengths the definitions go to in claiming that when there are multiple SUBLANGID values defined for a given PRIMARYLANGID, the first one is arbitrary and no more privileged than the later ones. You know, that Microsoft is making no claim of

    MAKELANGID(LANG_PORTUGUESE, SUBLANG_PORTUGUESE_BRAZILIAN)

    being more special or more important than

    MAKELANGID(LANG_PORTUGUESE, SUBLANG_PORTUGUESE)

    just because it is the first one.
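
    To make the bit layout concrete, here is a minimal C# sketch of the arithmetic behind the MAKELANGID, PRIMARYLANGID and SUBLANGID macros. The class and method names are made up for illustration; the constants are the values from the Windows SDK headers. It only takes the numbers apart and does not call GetLcidFromRfc1766 itself.

```csharp
using System;

static class LangIdSketch {
    // Values as defined in the Windows SDK header winnt.h.
    public const int LANG_JAPANESE = 0x11;
    public const int LANG_PORTUGUESE = 0x16;
    public const int SUBLANG_DEFAULT = 0x01;
    public const int SUBLANG_PORTUGUESE_BRAZILIAN = 0x01;
    public const int SUBLANG_PORTUGUESE = 0x02;

    // Same layout as the macros: low 10 bits are the primary language,
    // the next 6 bits are the sublanguage.
    public static int MakeLangId(int primary, int sub) { return (sub << 10) | primary; }
    public static int PrimaryLangId(int langId) { return langId & 0x3ff; }
    public static int SubLangId(int langId) { return (langId >> 10) & 0x3f; }

    static void Main() {
        int lcid = 1041; // the value returned for "ja-JP" in the question
        Console.WriteLine("LCID 0x{0:x4}: primary 0x{1:x2}, sublang 0x{2:x2}",
            lcid, PrimaryLangId(lcid), SubLangId(lcid));
        // prints: LCID 0x0411: primary 0x11, sublang 0x01

        // 1041 is exactly what SUBLANG_DEFAULT produces for Japanese.
        Console.WriteLine(lcid == MakeLangId(LANG_JAPANESE, SUBLANG_DEFAULT)); // True

        // The two Portuguese values differ only in the sublanguage bits.
        Console.WriteLine("0x{0:x4}", MakeLangId(LANG_PORTUGUESE, SUBLANG_PORTUGUESE_BRAZILIAN)); // 0x0416
        Console.WriteLine("0x{0:x4}", MakeLangId(LANG_PORTUGUESE, SUBLANG_PORTUGUESE));           // 0x0816
    }
}
```

    On this reading, 1041 carries a sublanguage of 0x01, the same value as SUBLANG_DEFAULT, and that appears to be what the S_FALSE is reporting.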

    Though if you think about it, as bad as the documentation is here, they are just documenting the [flawed] functionality.

    So it is still mostly not the fault of the documentation. :-)

    This blog brought to you by ၌ (U+104c, aka MYANMAR SYMBOL LOCATIVE)

  • Sorting it all Out

    What's in a name?

    • 13 Comments

    One of the core tenets of globalization and localizability of software is that making assumptions in formatting information will lead to bugs and limitations that will keep people in other cultures from properly using the software.

    There are two sides to this.

    On the globalization side, there is (for example) the formatting of numbers, dates, and times. There is the sorting of lists, and so on.

    On the localizability side, there is (for example) assumptions about word order in inserts that would violate the grammar of the target language (leading in many cases to grammatically poor sentences in the target language in order to accommodate the badly placed inserts).

    Then there are examples that actually span both globalization and localizability, like the names of people.

    I can't imagine what people do when they have to enter their name in an online form that insists on a name that is made of a single word first name, possibly a middle initial, and a single word last name -- none containing any punctuation.

    Right in Windows International we have many examples of names that violate such simplistic rules (rules which, though easing the complexity of software development and database storage, blithely ignore the reality of names throughout the world).

    Take for example Group Manager Jan Roelof Falkena.

    His last name is Falkena.

    Now in Jan Roelof's own words, "The use of double names (without hyphens) is fairly common back home."

    Thus his first name is not merely Jan, any more than Captain Jean-Luc Picard's first name is Jean, and putting Jan R. Falkena in such a form would be ridiculous, and not at all how his parents or he would have wanted his name expressed.

    Or take Test Lead Gerardo Villarreal Guzman.

    His first name is Gerardo.

    His last name is derived half from his father's name (Villarreal) and half from his mother's (Guzman). The hyphen is not used between these two halves, and the name itself becomes an interesting symbol of what singer/songwriter Gavin DeGraw referred to as "the birth of two souls in one". Which in my opinion is actually kind of a nice thing, culturally speaking.

    Now coming to the USA and knowing how inflexible so many processes are about names, he might easily have been willing to simply go by Gerardo Villarreal and save himself the grief (that is, for example, his name on Facebook), though the fact that Gerardo Villarreal Guzman is the name on his passport made that much more problematic for the company address book and other such places.

    To extend this a little bit, Gerardo Villarreal Guzman is married to Hortensia Ortiz Roffe.

    Their children are:

    • David Villarreal Ortiz
    • Paola Villarreal Ortiz

    Now the dropping of the maternally derived surnames from both parents' names is common and, if you think about it, is one of the only ways to really scale names across many generations, as I am sure neither David Villarreal Guzman Ortiz Roffe nor Paola Villarreal Guzman Ortiz Roffe would be terribly happy having to fill out forms with their names in them! :-)

    Though interestingly, when the names are more well-known due to political or economic or cultural influences the full name sometimes is retained, and in that case hyphenated -- thus if Gerardo were famous his children might be David and Paola Villarreal-Guzman Ortiz, or alternately if Hortensia were famous might have led to their names being David and Paola Villarreal Ortiz-Roffe.

    Though one could take such a practice with a cynical eye and look at it as a form of snobbery, I'd rather give such a practice a more culturally kind eye and look at it as just remembering identities that could have unique significance to others in the future.

    Even the other names mentioned above have their problems, from the fictional hyphenated French name Jean-Luc Picard (who would have to deal with the indignity of the Risa planetary computer system not allowing the hyphen) to the singer/songwriter Gavin DeGraw (who might sometimes be forced to go by Degraw due to a system not remembering the case of letters in the name -- which sucks -- or worse titlecasing -- which also sucks).

    And then there are readers of this blog like Gé van Gasteren and Jeroen Ruigrok van der Werven, both having names that would confound these systems.

    Or the way Japanese names are usually given in the form <family name> <given name>, well other than the imperial family.

    Or the different practices used in North and South India (the latter often not including a surname).

    The list could go on for hours -- I could have even included more specific examples like I did with Gerardo and Jan Roelof if I had more time to ask people for permission to "use" their names for more extended analysis.

    The fact is, the simplified structure of names "used in the United States" is kind of a lie anyway since many of these people live in the US.

    And thus, while falling under the theoretical heading of a localizability issue, this is probably better thought of as an issue that is important independent of the need to prepare for localization, since this flexibility is required even in products that are not being localized, or in non-localized versions of products.

    Though it is also important in localization, so that localizers can reposition controls to meet the most common expectations for a target language.
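
    As a rough illustration of the kind of rule described above, here is a C# sketch of a validator that insists on a single-word first name, an optional middle initial, and a single-word last name with no punctuation. The pattern and the class name are hypothetical (not taken from any real form), and the sketch is here only to show which of the names in this post such a rule turns away.

```csharp
using System;
using System.Text.RegularExpressions;

static class NaiveNameForm {
    // Hypothetical rule: one ASCII word, an optional middle initial, one ASCII word.
    const string Pattern = @"^[A-Za-z]+( [A-Z]\.?)? [A-Za-z]+$";

    public static bool Accepts(string name) {
        return Regex.IsMatch(name, Pattern);
    }

    static void Main() {
        string[] names = {
            "Jan R. Falkena",
            "Jan Roelof Falkena",
            "Jean-Luc Picard",
            "Gerardo Villarreal Guzman",
            "Gé van Gasteren",
            "Jeroen Ruigrok van der Werven",
            "Gavin DeGraw"
        };
        foreach (string name in names) {
            // Report whether the naive rule lets each name through.
            Console.WriteLine("{0}: {1}", name, Accepts(name) ? "accepted" : "rejected");
        }
    }
}
```

    Of these, the rule only lets through the abbreviated Jan R. Falkena and Gavin DeGraw, and the latter only survives intact as long as nothing downstream changes the casing.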

    Which I guess gets back to answering the question What's in a name?

    Respect, or the lack thereof....


    This blog brought to you by ㍻ (U+337b, aka SQUARE ERA NAME HEISEI)

  • Sorting it all Out

    Let go of your Type1 and Use the Force, Luke!

    • 2 Comments

    Christopher's question to me was not my favorite kind of inquiry:

    Hi Michael. A simple question, I hope: It appears to me that the MSKLC (v1.4) doesn't see any installed Type 1/CFF-flavor fonts. Is this true? Makes it difficult to test keyboard layouts created for such fonts... :(

    Thanks,
    Christopher

    The reason that this isn't my favorite kind of question?

    Well, mainly since the answer is so unfortunate....

    Microsoft Keyboard Layout Creator is a managed WinForms application, heavily dependent on GDI+.

    Because of that, the lack of support for Type 1 fonts in GDI+ is a pretty blocking issue for the display of text within the MSKLC user interface.

    The one good thing here about all of this is that you can still build the keyboard, using Unicode code points and such, and then you can build and install the keyboard to test it out. Not ideal, but at least it is possible....

    This "using the force" method to develop a keyboard layout was how I was able to build a test layout for Deseret when I was first testing out MSKLC's support of supplementary characters, when I did not have a Deseret font to test with.
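
    For anyone working from code points alone, it helps to remember that a supplementary character such as U+10411 occupies two UTF-16 code units. The following C# sketch shows the standard surrogate arithmetic; the class and method names are made up for illustration, and nothing in it is specific to MSKLC.

```csharp
using System;

static class SurrogateSketch {
    // Splits a supplementary code point (U+10000..U+10FFFF) into its
    // UTF-16 high and low surrogates.
    public static char[] ToSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException("codePoint");
        int offset = codePoint - 0x10000;
        char high = (char)(0xD800 + (offset >> 10));   // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    static void Main() {
        char[] pair = ToSurrogates(0x10411); // DESERET CAPITAL LETTER PEE
        Console.WriteLine("{0:X4} {1:X4}", (int)pair[0], (int)pair[1]); // D801 DC11

        // The framework's own conversion gives the same two code units.
        Console.WriteLine(char.ConvertFromUtf32(0x10411) == new string(pair)); // True
    }
}
```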


    This blog brought to you by 𐐑 (U+10411, aka DESERET CAPITAL LETTER PEE)

  • Sorting it all Out

    To some, the name might be the WRONG SINGLE QUOTATION MARK

    • 6 Comments

    In the last item in the Suggestion Box as of the time I wrote this blog, Gé van Gasteren asked in comments to A more usable Dutch keyboard that works properly?, over here and here:

    Thanks, Michael, for giving me the full treatment! Interestingly, the great job you did would look great in the MSKLC documentation, but 99% of it was wasted on me, because I had gone through all that, whereas 1% (one little remark) suggested a possible solution -- or as close to a solution as practically possible.

    But first re. the problem: What you describe certainly looks like it should work, and it does in Test Keyboard Layout. But when I generate the installer and actually install the layout, I get the problem I mentioned in my post: The quote key stops working as a dead key and produces two curly quotes with each keystroke.

    This does not happen when I don't assign U+2019 to it but the spacing acute U+00B4, possibly because that one is in the ASCII range (as I mentioned in a later post, added as a comment to the first suggestion).

    So if you have really installed the layout you created in your post and it worked correctly for you, there is something wrong with my XP setup, or the thing only works properly in Vista, or whatever.

    Now for the brilliant 1%:

    First you talk about switching off that 'brilliant quotes' feature, and right after that about calling product support. I guess that latter bit would be a long shot, a tall order, and what not.

    But that gave me this idea, much easier to implement and to get consent for:

    Microsoft should ship all Dutch-language software packages with the default for the smart quotes feature set to "disabled". Tadaa!

    This simple measure would make all non-typographs produce straight quotes ' and " when typing. Not beautiful, but correct. Those interested in typography would switch the feature ON, and would usually (hopefully...!) be interested enough to use the proper curly quotes in the special cases I mentioned.

    The only wish after that would be to have U+2019 more easily available, e.g. on Alt-Gr-quote. But I think you wrote somewhere that existing layouts are never (never!) changed, so I'll learn to live with that.

    So how to make such a suggestion to product support and make it stick?

    ----------

    Reading the series about Table Driven Text Service, there may be a better way (now or soon) to implement smart quotes:

    If it is possible for applications to switch such tables on and off, there could be a setting in an application called "Convert quotes", with options "On", "Let me choose", and "Off".

    A Dutch-language application could have "Let me choose" as the default, and at typing a quote, a choice box with ‘  ’ and ' could pop up.

    I'm not sure I understood it rightly (after reading all ten installments in one session, my brain is a bit frazzled by the Chinglish) that several tables can be active simultaneously and on top of each other like CSSes (e.g. one for auto-correcting, one for quotes, one user-defined, etc.) but that seems necessary to make this kind of switching practical.

    And, apart from making the "smart quotes" smarter, this approach has the advantage that it is customizable!

    Okay, first I'll start by pointing out that in the only version of the keyboard layout I can still find on my machine that I was playing with, I did not have U+2019 defined on the key itself.

    Though I did have it as the spacing version of the character at the bottom of the dead key table, as you can see here:

    http://www.trigeminal.com/images/dutchest.png

    So it may be that when I thought I was saying there was no bug here, there actually is one -- I cannot keep from getting two ’ (U+2019) characters if I define the dead key on U+2019.

    Anyway, as I typed I was getting the right character showing up, so I was pretty much paying the most attention to that.

    Which points to the definite workaround -- in fact, though I did not change the "name" on the dead key, MSKLC supports changing it -- even to RIGHT SINGLE QUOTATION MARK, if you like -- so you can make it anything you like and essentially never even see the character, except for the people who are looking at WM_DEADCHAR messages, or calling functions like ToUnicode/ToUnicodeEx.

    For the most part, this means nobody. :-)

    The bug leads to the title of this blog, and the pseudo-rename of U+2019 to WRONG SINGLE QUOTATION MARK makes for a very nice linguistic back-formation, something that I don't think I have seen before....

    I'll make sure that other bug gets reported. It isn't as simple as being an ASCII only issue but there is a problem here so it needs to be figured out.

    Now there is a lot of other content here in terms of suggestions or thoughts, and I especially liked the really solid attempt to solve some of the problems related to smart quotes that have come up over time. Though I think they are interesting ideas, moving to use a text based TSF text profile would be a huge change for a lot of users, so there would need to be a really large number of people who needed this kind of functionality.

    I don't think we are really quite there yet in the Netherlands, even enough to ship an updated keyboard like the one suggested above, let alone a new TIP.

    Plus there is the lack of support for additional shift states which would definitely need to be addressed before anyone would even be willing to take a look at it!

    Then finally there were the suggestions about contacting product support.

    I do know one thing, for sure.

    If more of the customers who contacted me complaining about the "smarter quotes" feature in Word, or the other "feature" with CTRL+ALT shortcuts in Word that stomp on ALTGR characters (the feature that Marc Durdin dissected in an article I mentioned in The key to key messages is a key contribution), contacted product support instead, then perhaps the folks on the Word team would have the impression that they should make it easier to alter these features than the current user interface allows. :-)

    I am going to play around with the TIP idea a bit, in any case. There are some really interesting possibilities that would be allowed here....

     

    This post brought to you by ’ (U+2019, aka RIGHT SINGLE QUOTATION MARK)

  • Sorting it all Out

    Quite a shock, being hit by 1.3 VOLT like that

    • 2 Comments

    My ever-cool colleague Sergey had a great announcement a few days ago in the Microsoft VOLT users community:

    Hello everybody,

    Today we are releasing new version of Microsoft VOLT 1.3. There are lots of new features and improvements in this version, here are just a few:

        * New font explorer allows font designer to quickly search and navigate through the project.
        * Usability improvements, including lookup comments, new UI options and detailed error messages.
        * Huge performance improvements on complex font projects.
        * Uniscribe and sample files are updated to match ones shipped with Windows Vista SP1.

    You can find MS VOLT 1.3 download link and complete release notes on version history page.

    Thanks,
    Sergey

    This is very cool for the people who use VOLT (the Visual OpenType Layout Tool), including those who are much more knowledgeable than the not quite professional folks like me. :-)

    You can look here for more information from a font developer point of view....


    This blog brought to you by Ʋ (U+01b2, aka LATIN CAPITAL LETTER V WITH HOOK)

  • Sorting it all Out

    How bad does data have to be before it is wrong? And how long does it have to be wrong before it is right?

    • 2 Comments

    There are not very many times that a feature within NLS can make a person psychotic.

    Though of course by making such a claim one implies that there are in fact such cases, no matter how rare they may be.

    This post will be about one of them....

    It is about the  TransliteratedFrench and TransliteratedEnglish calendars in Windows.

    In order to properly tear them apart, first we'll write some code to enumerate the information in them.

    Note that the tortuous method of getting to the data is not my idea. :-)

    Here is the code:

    namespace PsychoticCalendars {
        using System;
        using System.Globalization;
        class PsychoticCalendars {
            [STAThread]
            static void Main(string[] args) {
                CultureInfo[] rgci = {new CultureInfo("en-US"), new CultureInfo("fr-FR"), new CultureInfo("ar-IQ")};
                foreach(CultureInfo ci in rgci) {
                    // The per-calendar names are only reachable by swapping each optional
                    // calendar into the culture's DateTimeFormat, one at a time.
                    foreach(Calendar cal in ci.OptionalCalendars) {
                        // Only the Gregorian variants are of interest here.
                        if(cal is GregorianCalendar) {
                            Console.WriteLine("{0}\t{1} ({2})", ci.Name, cal, ((GregorianCalendar)cal).CalendarType);
                            ci.DateTimeFormat.Calendar = cal;
                            // Full month names
                            Console.Write('\t');
                            for(int i = 1; i <= 12; i++) {
                                Console.Write(ci.DateTimeFormat.GetMonthName(i) + "  ");
                            }
                            Console.WriteLine();
                            // Abbreviated month names
                            Console.Write('\t');
                            for(int i = 1; i <= 12; i++) {
                                Console.Write(ci.DateTimeFormat.GetAbbreviatedMonthName(i) + "  ");
                            }
                            Console.WriteLine();
                            // Full day names, Sunday through Saturday
                            Console.Write('\t');
                            for(DayOfWeek d = DayOfWeek.Sunday; d <= DayOfWeek.Saturday; d++) {
                                Console.Write(ci.DateTimeFormat.GetDayName(d) + "  ");
                            }
                            Console.WriteLine();
                            // Abbreviated day names
                            Console.Write('\t');
                            for(DayOfWeek d = DayOfWeek.Sunday; d <= DayOfWeek.Saturday; d++) {
                                Console.Write(ci.DateTimeFormat.GetAbbreviatedDayName(d) + "  ");
                            }
                            Console.WriteLine();
                            // Shortest day names
                            Console.Write('\t');
                            for(DayOfWeek d = DayOfWeek.Sunday; d <= DayOfWeek.Saturday; d++) {
                                Console.Write(ci.DateTimeFormat.GetShortestDayName(d) + "  ");
                            }
                            Console.WriteLine("\r\n");
                        }
                    }
                }
            }
        }
    }

    First, before you run this code, you will want to run chcp 1256 or chcp 65001 since those are two of the only code pages that will be able to contain the French and Arabic letters that will be needed here.

    Okay, now here is the output....

    en-US   System.Globalization.GregorianCalendar (Localized)
            January  February  March  April  May  June  July  August  September  October  November  December
            Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
            Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
            Sun  Mon  Tue  Wed  Thu  Fri  Sat
            Su  Mo  Tu  We  Th  Fr  Sa

    en-US   System.Globalization.GregorianCalendar (USEnglish)
            January  February  March  April  May  June  July  August  September  October  November  December
            Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
            Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
            Sun  Mon  Tue  Wed  Thu  Fri  Sat
            Su  Mo  Tu  We  Th  Fr  Sa

    fr-FR   System.Globalization.GregorianCalendar (Localized)
            janvier  février  mars  avril  mai  juin  juillet  août  septembre  octobre  novembre  décembre
            janv.  févr.  mars  avr.  mai  juin  juil.  août  sept.  oct.  nov.  déc.
            dimanche  lundi  mardi  mercredi  jeudi  vendredi  samedi
            dim.  lun.  mar.  mer.  jeu.  ven.  sam.
            di  lu  ma  me  je  ve  sa

    ar-IQ   System.Globalization.GregorianCalendar (Localized)
            كانون الثاني  شباط  آذار  نيسان  أيار  حزيران  تموز  آب  أيلول  تشرين الأول  تشرين الثاني  كانون الأول
            كانون الثاني  شباط  آذار  نيسان  أيار  حزيران  تموز  آب  أيلول  تشرين الأول  تشرين الثاني  كانون الأول
            الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
            الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
            أ  ا  ث  أ  خ  ج  س

    ar-IQ   System.Globalization.GregorianCalendar (USEnglish)
            January  February  March  April  May  June  July  August  September  October  November  December
            Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
            Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
            Sun  Mon  Tue  Wed  Thu  Fri  Sat
            Su  Mo  Tu  We  Th  Fr  Sa

    ar-IQ   System.Globalization.GregorianCalendar (MiddleEastFrench)
            janvier  février  mars  avril  mai  juin  juillet  août  septembre  octobre  novembre  décembre
            janv.  févr.  mars  avr.  mai  juin  juil.  août  sept.  oct.  nov.  déc.
            dimanche  lundi  mardi  mercredi  jeudi  vendredi  samedi
            dim.  lun.  mar.  mer.  jeu.  ven.  sam.
            أ  ا  ث  أ  خ  ج  س

    ar-IQ   System.Globalization.GregorianCalendar (TransliteratedEnglish)
            يناير  فبراير  مارس  ابريل  مايو  يونيو  يوليو  اغسطس  سبتمبر  اكتوبر  نوفمبر  ديسمبر
            يناير  فبراير  مارس  ابريل  مايو  يونيو  يوليو  اغسطس  سبتمبر  اكتوبر  نوفمبر  ديسمبر
            الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
            الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
            أ  ا  ث  أ  خ  ج  س

    ar-IQ   System.Globalization.GregorianCalendar (TransliteratedFrench)
            جانفييه  فيفرييه  مارس  أفريل  مي  جوان  جوييه  أوت  سبتمبر  اكتوبر  نوفمبر  ديسمبر
            جانفييه  فيفرييه  مارس  أفريل  مي  جوان  جوييه  أوت  سبتمبر  اكتوبر  نوفمبر  ديسمبر
            الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
            الاحد  الاثنين  الثلاثاء  الاربعاء  الخميس  الجمعة  السبت
            أ  ا  ث  أ  خ  ج  س

    Okay, so here we go.

    Let's pick them apart, after getting some advice from the NLS "Calendar Girl" Shelby who first pointed out one of the problems here (and though we both turned out to be mistaken as to the cause, that is only because both of us attributed more smarts to the actual process!).

    First of all, there is the fact that the shortest day names  for the MiddleEastFrench calendar, rather than matching the French Gregorian localized calendar like all of the rest of the data does, matches the Arabic Gregorian localized calendar.

    Thus instead of 

    di  lu  ma  me  je  ve  sa

    we have

     أ  ا  ث  أ  خ  ج  س

    which for those who don't know Arabic, is

    ALEF WITH HAMZA ABOVE, ALEF, THEH, ALEF WITH HAMZA ABOVE, KHAH, JEEM, SEEN

    Okay.

    Now there is also the fact that the shortest day names for the TransliteratedEnglish and TransliteratedFrench calendars are also identical to these. Note from the above that they are in no way transliterations for either the English or French Gregorian calendars.

    That seems like kind of a problem too.

    But don't worry too much -- it turns out that the day names and abbreviated day names for the TransliteratedEnglish and TransliteratedFrench calendars are also identical to the Arabic Gregorian localized calendar.

    And are also in no way transliterations.

    In case you don't believe me I'll take one and prove it. Wednesday is:

    الاربعاء

    which is

    ALEF, LAM, ALEF, REH, BEH, AIN, ALEF, HAMZA

    which is obviously not a transliteration for either Wednesday or mercredi.

    Month names fare a bit better, though -- they do look like transliterations. Thus

    سبتمبر

    is

    SEEN BEH TEH MEEM BEH REH

    which is a fair transliteration for September, just as

    فيفرييه

    is

    FEH YEH FEH REH YEH YEH HEH

    which is kind of a transliteration for février.

    Though of course in both the TransliteratedEnglish and TransliteratedFrench calendars, the abbreviated month names, rather than being transliterations of the  English and French calendars, are identical to their non-abbreviated cousins.

    At this point, it is fair to say that of the data in these three Gregorian calendars:

    • MiddleEastFrench
    • TransliteratedEnglish
    • TransliteratedFrench

     60% of it is just wrong, wrong, wrong in any conventional sense of how a reasonable person would expect them to work.

    If you ignore the shortest day name stuff (which was added fairly recently) then only 50% of it is wrong.
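
    One way to arrive at these figures, assuming each of the five name lists per calendar counts as one unit, is the tally sketched below in C#. The flags are my reading of the output and discussion above (the transliterated month names are given the benefit of the doubt), and the class and method names are made up for illustration.

```csharp
using System;

static class CalendarTally {
    // One flag per name list, in this order: month names, abbreviated month
    // names, day names, abbreviated day names, shortest day names.
    // true means the list does not match what the calendar's name promises,
    // going by the output above.
    public static readonly bool[] MiddleEastFrench      = { false, false, false, false, true };
    public static readonly bool[] TransliteratedEnglish = { false, true, true, true, true };
    public static readonly bool[] TransliteratedFrench  = { false, true, true, true, true };

    // Percentage of flagged lists, looking only at the first `lists`
    // entries of each calendar (pass 4 to leave out the shortest day names).
    public static int PercentWrong(bool[][] calendars, int lists) {
        int wrong = 0, total = 0;
        foreach (bool[] calendar in calendars) {
            for (int i = 0; i < lists; i++) {
                total++;
                if (calendar[i]) wrong++;
            }
        }
        return 100 * wrong / total;
    }

    static void Main() {
        bool[][] all = { MiddleEastFrench, TransliteratedEnglish, TransliteratedFrench };
        bool[][] transliterated = { TransliteratedEnglish, TransliteratedFrench };
        Console.WriteLine(PercentWrong(all, 5));            // 60 (9 of 15)
        Console.WriteLine(PercentWrong(all, 4));            // 50 (6 of 12)
        Console.WriteLine(PercentWrong(transliterated, 5)); // 80 (8 of 10)
        Console.WriteLine(PercentWrong(transliterated, 4)); // 75 (6 of 8)
    }
}
```

    The last two lines reproduce the 75-80% range quoted further down for the transliterated calendars alone.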

    But this data is not newly wrong -- it has been wrong for as long as these calendars have existed -- in Windows 95, I think?

    It would be easy to claim that this is really a fallback system kind of thing -- you know, data was not there so it is falling back to data elsewhere.

    I could make such a claim right now credibly based on the situation.

    There would be one problem with this claim, though.

    The fact that I would be full of crap if I made it. :-)

    This data is stored as is in the data and has been for as long as the data has been there.

    These calendars are just wrong and weird and odd and strange and they are mostly not transliterations in any sense.

    From a quality of data standpoint, in fact, I would tentatively suggest that we are currently hip deep in the low-point of NLS right now.

    Transliterationally speaking, that is.

    So I'll put forward the two questions again:

    How bad does data have to be before it is wrong?

    Will 50-60% do it? How about the 75-80% of the transliterated calendars only?

    And how long does it have to be wrong before it is right?

    Is over a decade long enough that this is not just okay? Or does fixing the worst of it make sense at some point in the future?


    This blog brought to you by ج (U+062c, aka ARABIC LETTER JEEM)

  • Sorting it all Out

    Providing an answer when one can't clearly enunciate what it is?

    • 1 Comments

    You may have seen the movie Class Action a few years back. It had Gene Hackman and Mary Elizabeth Mastrantonio in it.

    The movie isn't very important to this blog, other than to note the whole concept of "dumping" a ton of documents during the discovery period of a trial when one subpoenas a single bit of information.

    Obviously this is a great way to hide information that one may not want the plaintiff to get, while still not violating the law by not handing over the requested information.

    This makes for great drama, as you can probably imagine.

    In many other cases, there are no nefarious purposes; there is just no easy way to determine how to get the information, so the "dump" is done to meet the letter of the law since the spirit is really hard to respect.

    The latter kind of happened to me not too long ago. :-)

    It all started when regular reader Jeroen Ruigrok van der Werven asked over in the Suggestion Box:

    Also Michael,

    does Office 2007 introduce any new fonts, aside from the Vista C* fonts, with any language pack?

    So I asked my Office colleagues if they had this information.

    They had something.

    What I got in my inbox was a ~1mb Excel spreadsheet listing 500 different font file names (separate file name when there were separate files for bold, etc.) in the columns and 771 different language/product/SKU combinations in the rows. And then any time that font file name shipped with that particular language/product/SKU combination, there would be an X at the intersection of that row and column.

    There were 65,583 of those X's.

    When the whole spreadsheet was converted to a PDF file, that PDF file was 1,258 pages and about 4.59mb.

    The spreadsheet could only be viewed in Excel 2007 since the 256 column limitation exists in prior versions....

    It might have been faster to admit that they did not have the specific requested information. :-)

    In theory the information could be retrieved from this, assuming the supplementary information of the fonts in Vista was also available so one could remove those items from the list.
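
    The "remove those items from the list" step is just a set difference. Here is a minimal C# sketch of it; the method name is made up, the file names in Main are placeholders rather than data from the spreadsheet, and the case-insensitive comparison is my assumption (on the grounds that Windows file names are not case-sensitive).

```csharp
using System;
using System.Collections.Generic;

static class FontListSketch {
    // Returns the file names present in officeFonts but absent from vistaFonts,
    // compared without regard to case and sorted for readability.
    public static List<string> NewInOffice(IEnumerable<string> officeFonts, IEnumerable<string> vistaFonts) {
        HashSet<string> remaining = new HashSet<string>(officeFonts, StringComparer.OrdinalIgnoreCase);
        remaining.ExceptWith(vistaFonts);
        List<string> result = new List<string>(remaining);
        result.Sort(StringComparer.OrdinalIgnoreCase);
        return result;
    }

    static void Main() {
        // Placeholder file names, not real data.
        string[] office = { "a.ttf", "b.ttf", "c.ttf" };
        string[] vista = { "B.TTF" };
        foreach (string font in NewInOffice(office, vista)) {
            Console.WriteLine(font);
        }
    }
}
```

    The same call could be repeated against the font lists of earlier Office versions, if those were available, to narrow the result down to fonts that are genuinely new.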

    Or someone in Microsoft Typography could probably look at the list and with some time and some lookups could probably put the information together, too.

    But all I can really say about this huge mound of information that I can't really publish to the blog or provide as an Excel spreadsheet or PDF for download is that the answer to the question is "maybe", though in all likelihood the only fonts that would meet the requested criteria would be fonts that have also shipped in prior versions and/or language versions of Office (this supposition based on the instincts of one of the Office people!).

    But if you need definitive factual information then we're fresh out here, sorry!

     

    This blog brought to you by ৲ (U+09f2, aka BENGALI RUPEE MARK)

  • Sorting it all Out

    If you can find an unsigned copy, it's worth an absolute fortune

    • 2 Comments

    I was pondering the other day.

    You see, Ed Ye is leaving our team, and heading back to family and homeland in China.

    We have worked together many times over the last many years so I'll definitely miss him being in an office not far from mine (he has always been on the other side of the building but the distance didn't scare me off!).

    Anyway, as a part of going back to China he is giving away a lot of books that he really can't ship back with him.

    So, like many others I am scrounging through the piles of books outside his office.

    I grabbed a copy of Helen Custer's Inside Windows NT (mine looks a lot more beat up than his, despite the fact that his is older and I haven't opened mine in years)

    And a copy of Jeffrey Richter's Advanced Windows (his is the same age as mine but definitely in better shape, mine was falling apart!).

    Then suddenly I stopped.

    The copy of my book that he had me sign was there.

    The scene that came to my mind unbidden was from Notting Hill, one of my favorite movies for perhaps not entirely unfathomable reasons:

    {William returns to his desk. In the monitor we just glimpse, as does William, the book coming out of the trousers and put back on the shelves. The thief drifts out towards the door. Anna, who has observed all this, is looking at a blue book on the counter.}

    WILLIAM: Sorry about that...
    ANNA: No, that's fine.  I was going to steal one myself but now I've changed my mind.  Signed by the author, I see.
    WILLIAM: Yes, we couldn't stop him.  If you can find an unsigned copy, it's worth an absolute fortune.

    {Anna smiles}

    I look at the words I wrote and I wonder what I will do with the book that has a "To Edward" inscription in it. Maybe I should have just left it for someone else to pick up but I just felt compelled to figure out something to do with it.

    Now in theory I could just sell it somewhere so that some worthy soul who had been looking for it can have a copy.

    Obviously for a discount.

    Though it's in good shape and still has the CD (I think about Ed and wonder how does that man not kick the crap out of his books as I have done so often in the past!), it just seems weird to sell it at its market value (whatever that is).

    Like Hugh Grant said, if you can find an unsigned copy, it's worth an absolute fortune -- but the signed ones (especially when they contain a note to someone else) seem worth a bit less to everyone other than the person they were intended for.

    You will definitely be missed, Ed. You drove a lot of work that I value to completion and taught me more than you may know about software globalization and other topics over the years....

     

    This blog brought to you by < (U+003c, aka LESS-THAN SIGN)

  • Sorting it all Out

    On describing poorly (and on not listening)

    • 3 Comments

    You may recall blogs like How do I feel about lstrcmpi? I think it blows.... where I pointed out a case where an entire division full of developers have essentially been doing something wrong for over a decade.

    I have even put out the straw man argument that if everyone gets it wrong so consistently then perhaps changing the function to meet those expectations might be easier than changing the expectations of so many developers.

    Now there are other times that we can run across this kind of problem.

    Like the one I ran across today as I was reviewing a ton of code.

    It is the problem I discuss in The user locale of the system account is not the system locale.

    There is an important (if subtle) message in that blog.

    That message, quite simply put, is

    The default user locale of the SYSTEM account is not the default system locale.

    This is a point that I am physically unable to make any clearer.

    They aren't the same.

    The former, on the one hand, is set by changing the default user locale on the first tab in Regional and Language Options (RLO) and then applying that change to the system accounts at the bottom of the last tab of RLO. No reboot is needed, but programs already running under the system account may need to be cycled to find out about the change. And its value is not returned by GetSystemDefaultLCID.

    The latter, on the other hand, is settable by changing the Language for non-Unicode programs on the last tab of RLO, and a reboot is absolutely required even if you cycle services running under the system account.

    These two settings are not connected functionally and are only connected in people's minds because of their respective older names both having the word SYSTEM in them.

    There is a third setting that is also unrelated but bolsters the notion of the separate identity of the SYSTEM account -- the one I describe in UI language of the LocalSystem account (which almost never shows UI) -- which is not returned by GetSystemDefaultUILanguage though it sounds like it maybe ought to -- but we'll set that aside for the moment. :-)
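
    To see the difference from code, here is a minimal sketch in Python using ctypes (the helper names read_default_locales and format_lcid are made up for illustration). It calls GetUserDefaultLCID and GetSystemDefaultLCID side by side. Run from a service under the SYSTEM account, the first reports that account's user locale and the second reports the system locale, and nothing forces the two to agree:

```python
import ctypes
import sys


def format_lcid(lcid):
    """Render an LCID the way it is usually written, e.g. 0x0411 for ja-JP."""
    return f"0x{lcid:04x}"


def read_default_locales():
    """Return both LCIDs for the calling account, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32")
    kernel32.GetUserDefaultLCID.restype = ctypes.c_ulong
    kernel32.GetSystemDefaultLCID.restype = ctypes.c_ulong
    return {
        # User locale of whichever account this process runs under.
        "user": kernel32.GetUserDefaultLCID(),
        # "Language for non-Unicode programs"; only changes after a reboot.
        "system": kernel32.GetSystemDefaultLCID(),
    }


if __name__ == "__main__":
    locales = read_default_locales()
    if locales is None:
        print("Windows only")
    else:
        for name, lcid in locales.items():
            print(name, format_lcid(lcid))
```

    A service that wants the settings applied through "apply to system accounts" would read the "user" value here, not the "system" one.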

    Anyway, people keep making this same mistake, which leads to real problems when services are not running with the settings that they expect even if customers are informed about how to apply changes to default/system accounts and they properly follow the steps to do this.

    All I can do is just yell at the top of my lungs here so that future Google searches can pick up on it, using the words:

    The default user locale of the SYSTEM account is not the default system locale.
    The default user locale of the SYSTEM account is not the default system locale.
    The default user locale of the SYSTEM account is not the default system locale.
    The default user locale of the SYSTEM account is not the default system locale.

    And just hope that people are listening? :-)

    Plus maybe whisper loudly in some ears at work....


    This post brought to you by δ (U+03b4, a.k.a. GREEK SMALL LETTER DELTA)

  • Sorting it all Out

    Tap your heels together three times and the problem is solved?

    • 1 Comments

    People who are regular followers of my words may recall my recent blogs Somehow I just get a Visual of the Logical Song (as sung by Supertramp) and So logical that even Mr. Spock (and my fiancée?) would approve.

    Those blogs were actually in response to some efforts of folks to get a handle on the lack of intuitiveness of the user interface there.

    Obviously the current behavior has been around for a long time, and it is solidly grounded in the way that Microsoft implements UAX#9: Unicode Bidirectional Algorithm -- which is important here not just because it helps to determine how text is displayed in these controls in Regional and Language Options, but it is also behind how these controls will behave when text is typed into them and when the cursor moves around in them as well.

    The bottom line is that text which starts off and ends up with entirely LTR text or entirely RTL text will have specific behavior when you hit the HOME key and the END key.

    If this behavior were mysteriously changed in just these controls, people would have serious trouble using them!

    Thus, looking at another tab in the Regional and Language Options "Customize" dialog, in a control starting with strong LTR or strong RTL text, hitting the HOME key will place the cursor in a very clear place:

         

    just as will hitting the END key:

       

    Once and for all this bit of reality can settle the current behavior.

    However, and here is where I am not going to gloat just because it just so happens that I was right, there is still a problem here.

    The generic question of whether this behavior is correct or not still exists.

    The fact remains that it can be confusing.

    Because an unflawed design should ideally be able to produce intuitive results -- the fact that there is something non-intuitive here suggests that it might be important to take a step back and try to determine if it is the design that has a conceptual flaw, or just the way in which it is implemented.

    So let's take that step back.

    The current behavior is the expected design of the LTR text in the control. And that text is made up of tokens that are hard-coded for software developers calling GetDateFormat, GetDateFormatEx, GetTimeFormat, and GetTimeFormatEx. These tokens can't change their directionality or this would break the way that developers use them.

    Or can they?

    Maybe you have seen the blog I wrote back in February 2006 Some localized Regional and Language Options tags, where I showed how many different localized versions of Windows actually do localize those format tags in the user interface, and the UI properly converts the localized tags to the hard-coded ones that the function expects.
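
    That conversion step can be pictured with a small sketch in Python. The tag table below is hypothetical (German-style letters picked purely for illustration, not taken from any actual Windows resources), and to_canonical is an invented helper rather than anything the UI really calls. The only idea it shows is that localized tags are mapped back to the hard-coded ones before a format string reaches a function like GetDateFormat, while text inside single-quoted literals is left alone:

```python
# Hypothetical localized-to-canonical tag table (illustration only):
# T (Tag) -> d, J (Jahr) -> y; M happens to be the same in both.
LOCALIZED_TAGS = {"T": "d", "J": "y"}


def to_canonical(picture, table):
    """Map localized format tags to the hard-coded ones, skipping quoted text."""
    out = []
    in_literal = False
    for ch in picture:
        if ch == "'":
            # Single quotes delimit literal text in a format picture.
            in_literal = not in_literal
            out.append(ch)
        elif in_literal:
            out.append(ch)
        else:
            out.append(table.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print(to_canonical("TT.MM.JJJJ", LOCALIZED_TAGS))
```

    The same table, with RTL letters on the left-hand side, is all that a localizer would conceptually need to supply.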

    Remember that scene near the end of the Wizard of Oz when the balloon has taken off and Dorothy is so sad because she has no idea how she'll get home?

    You know, the bit where we get the whole educational spiel:

    Tin Man: What have you learned, Dorothy?
    Dorothy: Well, I - I think that it - that it wasn't enough just to want to see Uncle Henry and Auntie Em. And that it's that - if I ever go looking for my heart's desire again, I won't look any further than my own backyard, because if it isn't there, I never really lost it to begin with. Is that right?

    The localizers have had the power to "fix" this issue all along!

    Now of course looking at Hebrew, Arabic, Urdu, Farsi, or Pashto, the RTL languages into which Windows has been fully or partially localized to date, none of the tags have ever been localized.

    For localization, the issue is more than just clicking one's heels together three times and saying "there's no place like my native language". There are, for example, issues to consider in the localization -- like if you use

    י

    (U+05d9, aka HEBREW LETTER YOD) for one of the tags in Hebrew, you run into problems if you have to double it up since both

    ײ

    (U+05f2, aka HEBREW LIGATURE YIDDISH DOUBLE YOD) or just two plain old YODs will look like something that might offend people to use in such a context since in Hebrew it is one of the names of God that are considered holy enough that they should only be used in a holy context such as a prayerbook. While it is arguably the most mundane of those holy names, it is nevertheless one of them, and thus would be problematic.

    Anyway, such issues would have to be figured out and decided upon, and people would have to test things out and see if the new behavior confuses fewer people than the old behavior has (with the explanatory key defined right there in Regional and Language Options, I am definitely more in favor of the intuitive behavior than people being wed to d/m/y/t and such, unless they did find it to be confusing).

    Perhaps me suggesting all this is my own way of being gracious in victory (to the extent that being right is a victory; many spouses (spice?) will tell you that when one is right it is important to apologize immediately!).

    But to be honest, that is all just stuff that occurs to me in retrospect. Mentally, I took it all through the process I described above, a process that is no less than one would hopefully desire (if not expect) of a good specification that is describing the intended behavior. :-)

     

    This blog brought to you by ײ (U+05f2, aka HEBREW LIGATURE YIDDISH DOUBLE YOD)

  • Sorting it all Out

    How to keep it up (or at least how to get it back up after it has gone down)

    • 0 Comments

    The question is actually one that has come from a few different people, most recently from a developer named Jeff:

    How can I keep the Language Bar up for my C# application?

    Or if that isn't possible, how can I make it come back up after it has been minimized to the tray?

    Like I said, it turns out to be a common question. :-)

    There are two solutions here, one for each of the subquestions in Jeff's inquiry....

    There is a thread on the WinForms forum entitled How do I keep Language Bar showing to the user in remote desktop application? that explains how to disallow minimizing -- if you deny modify permissions to HKCU\software\microsoft\TSF then you can't minimize the Language Bar at all.

    Though to be honest this is a bit heavy handed, in my opinion. If they want to minimize it at a given moment then I hate taking away the option to do so.

    But I would like to make it easy to have it pop back up!

    So let's move past that solution and think about the second one, shall we? :-)

    To make the Language Bar do your bidding, what you really want is an object that implements the ITfLangBarMgr interface. Because you can use the ITfLangBarMgr::ShowFloating method to control all of the various aspects of the language bar from how it shows up to how transparent it is to how many icons it shows to whether the text labels are there and so on.

    The C# part will make this a bit harder, of course....

    Now there is a TF_CreateLangBarMgr function exported by msctf.dll that makes it easy to get the ITfLangBarMgr interface pointer. And obviously it is easy enough to call Marshal.ReleaseComObject on that object after you are done and keep the documentation in that function call which is so worried about refcounts from having a big "I told you so" moment on your code.

    But the middle part, the one that calls ITfLangBarMgr::ShowFloating with the TF_SFT_SHOWNORMAL flag, that requires you to actually have the interface defined if you really want to call the method from C#. Not impossible, but definitely some extra work made much harder by the fact that this one is in COM.
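
    A rough sketch of those three steps follows, written in Python via ctypes rather than C# so that no interface declaration is needed. Treat the details as assumptions: the vtable slots are taken from the declaration order in ctfutb.h (IUnknown's three methods first, which would put ShowFloating at slot 10), TF_SFT_SHOWNORMAL is taken to be 0x1, COM is initialized on the calling thread as a precaution, and show_language_bar and _method are made-up names:

```python
import ctypes
import sys

TF_SFT_SHOWNORMAL = 0x00000001  # value as defined in ctfutb.h
RELEASE_SLOT = 2                # IUnknown::Release
SHOW_FLOATING_SLOT = 10         # assumed slot of ITfLangBarMgr::ShowFloating


def _method(obj, slot, restype, *argtypes):
    """Fetch a COM method from the object's vtable (Windows only)."""
    vtable = ctypes.cast(obj, ctypes.POINTER(ctypes.c_void_p))[0]
    entries = ctypes.cast(ctypes.c_void_p(vtable), ctypes.POINTER(ctypes.c_void_p))
    return ctypes.WINFUNCTYPE(restype, ctypes.c_void_p, *argtypes)(entries[slot])


def show_language_bar(flags=TF_SFT_SHOWNORMAL):
    """Return the HRESULT from ShowFloating, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    ole32 = ctypes.WinDLL("ole32")
    msctf = ctypes.WinDLL("msctf")
    init_hr = ole32.CoInitialize(None)
    try:
        mgr = ctypes.c_void_p()
        msctf.TF_CreateLangBarMgr.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
        msctf.TF_CreateLangBarMgr.restype = ctypes.c_long
        hr = msctf.TF_CreateLangBarMgr(ctypes.byref(mgr))
        if hr < 0 or not mgr:
            return hr
        try:
            show = _method(mgr, SHOW_FLOATING_SLOT, ctypes.c_long, ctypes.c_ulong)
            return show(mgr, flags)
        finally:
            # Balance the reference handed out by TF_CreateLangBarMgr.
            _method(mgr, RELEASE_SLOT, ctypes.c_ulong)(mgr)
    finally:
        if init_hr >= 0:
            ole32.CoUninitialize()


if __name__ == "__main__":
    print(show_language_bar())
```

    A C# version would follow the same shape: declare the interface with its methods in the same order, P/Invoke TF_CreateLangBarMgr, call ShowFloating, then release the object.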

    Personally, I'm pining for an awesome sample to do all this coming out of the TSF Aware Blog. This is, after all, a nice, easily contained sample that would be tremendously useful to a lot of people.

    This is my subtle hint to Eric Brown to encourage him to combine his extensive TSF knowledge with the wonderful science of COM Interop to show off one of the more feasible uses of C# and the Text Services Framework that he has suggested in the past. :-)

    I suppose I could cobble together a sample too, eventually. Everything to do the work is documented and understood, and even COM Interop is not so hard if you don't try to make it so. But I'm hoping it gets done before I get to it....

    Anyway, that is the answer to this question that seems to come up quite a bit for some reason. Perhaps also a message to the Text Services Framework team to make the "un-minimizing" support easier, too? :-)

     

    This blog brought to you by ଵ (U+0b35, aka ORIYA LETTER VA)

  • Sorting it all Out

    Perhaps the commonest (if not the leastest) of the least common denominators

    • 1 Comments

    We were at a meeting yesterday.

    A 4:00 meeting.

    Yes, a 4pm on a Friday meeting.

    When the sun was out.

    I probably would have left work early that day and tried to enjoy the sun, were this meeting not on the schedule.

    The organizer of the meeting, self-consciously aware of this, said as much in the invite:

    Sorry for the late time on Friday but we have varying OOFs coming up.

    Obviously it wasn't really late. Hell, after I had dinner I probably worked for several hours. But folks in the Pacific Northwest as a tribe seem to be quite jealous of people who would deny us our sun time.

    Anyway, it was an hour meeting.

    Well, technically it was two 30-minute meetings, back-to-back.

    I was only invited to the first one, but I suspected that unless I insisted they'd be happy if I stayed the full hour.

    Turns out I was right. :-)

    It was a few minutes after five when I finally aimed the scooter at the door.

    Everyone but Peter and I had already left.

    Essentially everyone but Peter and I were smart enough to leave....

    Peter asked me a question when the scooter was literally sticking out the door like something unmentionable sticks out that you have to tuck back in.

    Obnoxiously enough, it was an interesting question.

    Peter was confused about something.

    He was wondering why all of these books, like the Kano book1, start talking about internationalization from the NLS side. It seemed to him like such an unnatural place to begin if you are trying to introduce the subject to people who had no background in the area. Wouldn't it make more sense to start from somewhere like text? Everyone has to deal with input or display, after all.

    On the one hand, he has a point. It does seem like an odd place to start.

    On the other hand, however, he is actually a font person and linguist, asking an NLS person with notions of linguistic aptitude why books like the quintessential tome for the area start from what the latter is most familiar with when what the former is most familiar with seems like so much more natural of a place.

    When it doesn't, for the former. :-)

    I don't recall exactly what my answer was -- it focused on the viewpoint thing, and the fact that there were more applications that had to start from a simple place where their UI was a series of printf statements than people writing code that built huge word processing applications. So that it was attempting to start simple from what was perhaps the commonest (if not the leastest) of the least common denominators.

    I do know that I did not think of that line at the time; what I said was less poetic. But I was honestly thinking about the last bit of sunlight for the day and not focusing on remembering the exact words this time. Plus this seems more clever, with the proper amount of self-deprecation for me to feel comfortable with a situation where I was explaining why in a world of competing viewpoints, the one closer to mine happened to be the winner this time....

    The truth is perhaps a bit more complicated, though.

    After all, the same content could have been organized very differently and it would still be great content. Since it is the kind of book where almost every time it is opened it is to a specific place found from the TOC or the index or memory, and since few will read it from end-to-end even if you leave out the appendices, how it is organized is not, strictly speaking, as important as my answer implied.

    HOWEVER, many of the people doing the writing here came from that development background.

    Not the typographic one.

    And if someone is putting together a white paper or a book, one will only most comfortably write from the point of view of one's own experience; it's only natural.

    How would you have done it, if it were up to you?

    Would you have started with text display (the typography side) or locales (the NLS side)?

    And is it for any reason beyond your background that you can identify? :-)

     

    1 - Developing International Software for Windows 95 and Windows NT (first edition). Nadine Kano's name was on the cover but everyone (including her) knew that it was a collective effort from many people throughout Microsoft. We still call it the Kano book most of the time, or occasionally DISv.1 if we are contrasting....

     

    This blog brought to you by ਁ (U+0a01, aka GURMUKHI SIGN ADAK BINDI)

  • Sorting it all Out

    A serious lack of overlap with 64

    • 10 Comments

    So it started over on the Windows Experience Blog's blog Watch NBC’s coverage of the Beijing Olympics in Windows Media Center.

    Some eager folks who downloaded the client were dismayed to see it was 32-bit only. Brandon (the blog's author) mentioned in comments that there may be news on this in the future, so perhaps there is eventual hope.

    Garrett McGowan summed up my feelings nicely:

    Awesome! Except...after downloading the 10MB client, I'm informed that 'Sorry, 32-bit Windows Vista only.'

    Foot, meet bullet.

    I'm serious about 64-bit. I've been running 64-bit on my primary home machine since February 2007. I put up with the early, shaky device drivers. I put up with the late availability of Windows Live software and the incredibly late Windows Home Server Client. Actually no, I gave up on that one, along with Creative's X-Fi full-feature drivers (they finally dropped this week). But now this.

    When is the 'ecosystem' going to take 64-bit seriously?

    I agree 100%.

    I had one of my own situations like that recently. I had finally decided to dump the 50gb of space I had set aside for the 32-bit version of Vista on my MacBook Pro and just go full 64-bit and use the space for other purposes.

    Soon after, I had installed many language packs, and subsequently made a sad discovery.

    Language Interface Packs were not being made available for 64-bit.

    "How could that be?" I wondered. I mean, the build process for them is the same as for the Language Packs that are available for 64-bit. The only difference is that they end up building a bit faster since by definition they have fewer language resources in them. I knew they could easily exist; I must be mistaken in my investigation that seemed to indicate they weren't available.

    But it was no mistake -- someone had deemed that the overlap of people running on 64-bit hardware and the people who need Language Interface Packs because they need Windows in their native language was apparently not high enough to really merit the resources to release the 64-bit LIPs.

    So there was no reason for the official build machines to build them since they weren't being released like the 32-bit ones were.

    And there was no specific reason to test them since they weren't being released like the 32-bit ones were.

    Lame.

    Though I suppose this is a trend that can be easily reversed when/if the determination is made that the overlap of hardware and language makes it more sensible, strategically.

    This is hardly isolated, mind you. There are way too many examples where people just don't think 64-bit is needed. At least not yet.

    But then like many I wonder how all of the predictions about the end of 32-bit on the horizon can be true when there are so many examples like this happening.

    In the meantime, we won't be watching the Olympics on our 64-bit machines while we are in India, which is maybe okay because the Olympics viewing is currently supported in the US only anyway.

    But that is a rant for another day....

     

    This blog brought to you by 𝅘𝅥𝅱𝅁 (U+1d163 U+1d141, aka MUSICAL SYMBOL SIXTY-FOURTH NOTE and MUSICAL SYMBOL SIXTY-FOURTH REST)

  • Sorting it all Out

    How do you key your num pad without going blind?

    • 0 Comments

    Presumed regular reader and extraregular (ref: ordinary vs. extraordinary to understand extraregular here!) human Doug Ewell asks:

    Sorry, couldn't find this anywhere in SiaO, though I may not have looked hard enough.

    Is there any way to define the keys on the numeric keypad (optionally with Shift and/or AltGr) using MSKLC?

    He might have been using the built-in search. I didn't have much luck with that either.

    Perhaps unlike most of the company I work for, I have a simple daily algorithm I use for all of my searching through the ~2650+ blogs in this blog, in the case where I know the content is there and remember some words in the blog:

    1. The first search of the day I use Live Search
    2. If the results are good, then I use Live Search for the rest of the day
    3. If the results are not good, then I use Google for the rest of the day

    Some zealots will feel I am betraying the company somehow, but if the stats of Google usage in Redmond are even 5% accurate I think I am being more than fair here.

    Some others may object to the strategy since it seems really arbitrary. That is kind of the point for me, though....

    Anyway, after my searching (I'll let you guess which search engine I used -- it was my first search of the day!), I found the following relevant blogs which should help give Doug the good and bad answer he may or may not have been hoping for:

    Which one is most likely to have the answer is left as an exercise for the reader. :-)

    There were others, but the gist is pretty much gettable from these....

     

    This blog brought to you by 𐂆 (U+10086, aka LINEAR B IDEOGRAM B106F EWE)

  • Sorting it all Out

    My Gut[tman] instinct is...

    • 2 Comments

    Within the last few weeks, Amy asked (via the Contact link):

    hello

    i'm searching for a downloadable (preferred) hebrew font for an invitation and saw your website came up in my search: specifically the link to these fonts:
    http://www.trigeminal.com/images/hebrew.png

    i'm very interested in getting a hold of the Guttman Yad font- can you tell me where i could locate it online?

    thank you!
    Amy

    Amy found the link to this image in You got your latins in my hebrew! No, you got your hebrew in my latins!.

    To be honest, I (like I expect most people) have no real idea where they get most of the fonts on their machine from. I don't really tend to download fonts though, so almost all of them come from some random software product from Microsoft or some other company.

    Maybe some random language version of Office? I really don't have any idea. :-(

    I thought I'd put this question out there in case anyone wanted to suggest the best (legal) way to find the font in question and help Amy out....

    Anyone?

     

    This post brought to you by ס (U+05e1, a.k.a. HEBREW LETTER SAMEKH)
