December, 2010

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    I think MaxLength needs protection to assure safer text

    The presented scenario is simple (even simpler as I will present it here than it was originally!):

    1. A WinForms TextBox sits on a Form, empty. It has a MaxLength set to 20.
    2. The user types into the TextBox, or maybe pastes text into it.
    3. No matter what you type or paste into the TextBox, you are limited to 20, though it will sympathetically beep at text beyond the 20 (YMMV here; I changed my sound scheme to give me that effect!).
    4. The small packet of text is then sent somewhere else, to start an exciting adventure.

    Now this is an easy scenario, and anyone can write it up in their spare time. I just wrote it up myself in multiple programming languages using WinForms, because I was bored and had never tried it before. And with text in multiple actual languages, because I am wired that way and have more keyboard layouts than possibly anyone in the entire freaking universe.

    I even named the form Magic Carpet Ride, to help ameliorate the boredom.

    This did not work, for what it's worth.

    So instead, I entered the following 20 characters into my Magic Carpet Ride form:

    0123401234012340123𠀀

    Uh oh.

    That last character is U+20000, the first Extension B ideograph of Unicode (aka U+D840 U+DC00 to its close friends, in front of whom it is not ashamed to be disrobed, as it were)....

    And now we have a ball game.

    Because when TextBox.MaxLength talks about

    Gets or sets the maximum number of characters that can be manually entered into the text box.

    what it really means is

    Gets or sets the maximum number of UTF-16 LE code units that can be manually entered into the text box and will mercilessly truncate the living crap out of any string that tries to play cutesy games with the linguistic character notion that only someone as obsessed as that Kaplan fellow will find offensive (geez he needs to get out more!).
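    To make the code-unit arithmetic concrete, here is a small C sketch. It assumes the text is held as an array of UTF-16 code units in uint16_t, and the helper names are hypothetical. It encodes U+20000 as its surrogate pair and counts code units versus code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Encode a supplementary code point (U+10000..U+10FFFF) as a UTF-16 surrogate pair. */
void encode_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                            /* 20 bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
}

/* Count code points in a UTF-16 buffer: a well-formed surrogate pair
   counts once, and a lone surrogate counts as one. */
size_t count_code_points(const uint16_t *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++, n++) {
        if (is_high_surrogate(s[i]) && i + 1 < len && is_low_surrogate(s[i + 1]))
            i++;                              /* skip the low half of the pair */
    }
    return n;
}
```

    Nineteen digits followed by the pair for U+20000 come to 21 code units but only 20 code points. That one-unit gap is exactly what a limit of 20 code units cuts into.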

    I'll try and see about getting the documentation updated....

    Regular readers who remember my UCS-2 to UTF-16 series will note my unhappiness with the simplistic notion behind TextBox.MaxLength, and my belief that it should at a minimum handle this case, where its draconian behavior creates an illegal sequence that can make other parts of the .Net Framework throw a

    System.Text.EncoderFallbackException: Unable to translate Unicode character \uD850 at index 0 to specified code page.

    exception when you pass the string along (as my colleague Dan Thompson was doing).

    Now okay, perhaps the full UCS-2 to UTF-16 series is out of the reach of many.

    But isn't it reasonable to expect that TextBox.Text will not produce a System.String that causes another piece of the .Net Framework to throw?

    I mean, it isn't as if the control gives you a chance, in the form of some event announcing the upcoming truncation, to easily add the smarter validation -- validation that the control itself does not bother doing.
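    The smarter validation does not need much code. Here is a minimal C sketch of a limit check that counts UTF-16 code units, as MaxLength does, but refuses to end on the first half of a surrogate pair. The function name is hypothetical and is not a WinForms API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* How many UTF-16 code units of text[0..len) to keep under a limit of
   max_units. Backs off by one unit rather than leaving a lone high
   surrogate at the end. */
size_t safe_truncate_length(const uint16_t *text, size_t len, size_t max_units)
{
    assert(text != NULL || len == 0);
    if (len <= max_units)
        return len;                           /* nothing to cut */
    if (max_units > 0 && is_high_surrogate(text[max_units - 1]))
        return max_units - 1;                 /* the cut would split a pair */
    return max_units;
}
```

    With the 21-unit string above and a limit of 20, this keeps 19 units (the digits) instead of 19 digits plus a dangling U+D840. It only protects surrogate pairs. Keeping base characters and their combining marks together would need a grapheme-aware check on top of it.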

    I would go so far as to say that this punk control is breaking a safety contract that could even lead to security problems if you can class causing unexpected exceptions to terminate an application as a crude sort of denial of service.

    Why should any WinForms process or method or algorithm or technique produce invalid results?

  • Sorting it all Out

    Short-sighted text processing #1: Uniscribe filters nothing

    It was just the other day that regular reader Random832 commented on my I think MaxLength needs protection to assure safer text:

    Who decided that cutting off your string and beeping at you was good UI, anyway? Surely it'd be better to just not let the user submit the form until they delete enough stuff to fit [and provide interactive feedback about the limit, a la twitter]

    Well, I'm only half-serious - i'm sure this was easier with the limitations programmers had to work with in 1983 - but surely no new programs should be using this "feature".

    Now there's a question.

    It is in fact an excellent point, though.

    I mean, for all the bold talk, one of the possible indications of "complex script support" is indeed "Filters out illegal character combinations" -- and this is called out for Thai specifically. GoGlobal goes a bit further in its Complex Scripts FAQ:

    Why do we need to filter illegal character combinations?
    Since Thai syllables consist of a consonant optionally followed by one vowel and/or one tone mark, some character combinations (e.g., two vowel marks in succession) are nonsensical. Thus, one of the tasks of complex script enabling is to filter out or disallow illegal character combinations.

    Interestingly, it makes me think of the prototypical example of this behavior:

    1. Open Notepad
    2. Switch to the Thai Kedmanee keyboard
    3. Type the "J" key, which tries to type U+0e48, aka THAI CHARACTER MAI EK

    Every time you hit the key, the computer will beep and insert nothing.
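    The check behind that beep can be sketched in a few lines of C. This is a deliberately simplified model of the quoted rule (a consonant optionally followed by one vowel and/or one tone mark), not the actual Windows logic, and the function names are hypothetical. It only decides whether a tone mark such as U+0E48 may be typed after the existing text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_thai_consonant(uint16_t c) { return c >= 0x0E01 && c <= 0x0E2E; }

/* MAI HAN-AKAT, the above vowels SARA I..SARA UEE, and the below vowels SARA U, SARA UU. */
static bool is_above_below_vowel(uint16_t c)
{
    return c == 0x0E31 || (c >= 0x0E34 && c <= 0x0E39);
}

/* MAI EK, MAI THO, MAI TRI, MAI CHATTAWA. */
static bool is_tone_mark(uint16_t c) { return c >= 0x0E48 && c <= 0x0E4B; }

/* Would typing ch after text[0..len) keep the simplified syllable shape
   consonant [above/below vowel] [tone mark]? Only tone marks are policed. */
bool may_insert(const uint16_t *text, size_t len, uint16_t ch)
{
    assert(text != NULL || len == 0);
    if (!is_tone_mark(ch))
        return true;
    if (len == 0)
        return false;                   /* no base character to sit on */
    uint16_t prev = text[len - 1];
    if (is_thai_consonant(prev))
        return true;
    return is_above_below_vowel(prev) && len >= 2 && is_thai_consonant(text[len - 2]);
}
```

    A caller playing the role of the edit control would beep and drop the keystroke whenever may_insert returns false. Text that is pasted or loaded never goes through a check like this one.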

    But can I tell you a secret?

    You have to promise that you won't tell anyone. It is kind of embarrassing.

    Uniscribe isn't doing that.

    Seriously.

    I can put a bunch of those characters in a row just fine, in text, in an application other than Notepad. And Uniscribe will display them.

    I can even paste lines of them into Notepad:

    or here:

    ่่่่่่่่่่่่่่่่่่่่่่่่
    ่่่่่่่่่่่่่่่่่่่่่่่่
    ่่่่่่่่่่่่่่่่่่่่่่่่
    ่่่่่่่่่่่่่่่่่่่่่่่่

    and guess what? There's no problem with doing it.

    The code that "filters" these characters sits in code called by the EDIT control that checks for two things:

    If both are, while you are typing, this code that is not in Uniscribe itself will fail the attempt to insert the text, and it will beep.

    Obviously that doesn't work so well for text that is already present (how do you scold someone for illegal text already typed?), so in that case Uniscribe will just do as it is told. And it will of course include the "empty circle" that implies a missing base character.

    Now there are several problems inherent in this direction for the text processing engine to go, and I am going to get into that more tomorrow.

    But I wanted to start by saying that it is a limited number of dumb controls, screwing with the input stream while you type, that are doing the work here -- Uniscribe filters nothing.

    There are some folks who will like upcoming parts of this series, so I hope that (for example) Andrew West and Martin Hosken are around. Because both of them, and a few folks like them, are gonna like this one....

  • Sorting it all Out

    PC LOAD LETTER? What the f**k does that mean [in Chinese]?

    So, I'm still deciding how I feel about this one:

    Foreign words to be standardized

    Chinese media organizations and publishers are banned from randomly mixing foreign languages with Chinese in publications. When it is necessary to use foreign phrases or words, they should be accompanied by a translation or explanation in Chinese, according to a new regulation.

    The regulation was issued on Monday by the General Administration of Press and Publication (GAPP) with the aim of standardizing the use of language in newspapers and other publications, particularly when foreign languages are employed.

    The administration said the increasingly random appearance of foreign words and abbreviations, especially English, in publications is damaging the Chinese language.

    Under the regulation, abbreviations such as GDP (gross domestic product), CEO (chief executive officer) and CPI (consumer price index), which regularly feature in publications, should either be translated into Chinese or followed by explanatory notes in Chinese.

    This includes requiring the use of English place names, people and companies to be translated into Chinese.

    And so on.

    Victor Mair talks about the issue over on Language Log, under the title English Banned in Chinese Writing.

    He dissects the arguments much more effectively than I likely could.

    But I wondered about the next phase. I mean, since in China there is often a long view. And a next phase.

    I wonder if they will decide they feel the same way about localized software products.

    At the most extreme, there are cases like the one I discuss in My name is Michael Kaplan and I work for Корпорация Майкрософт and Майкрософт vs. Microsoft, aka On choosing the most reasonable inconsistency, where a language wants and requires everything to be localized.

    But for Chinese, there are many things kept in English by convention -- like keyboard shortcuts. And menu accelerators. And system account names. And passwords (the latter is because an IME would cause the password to be visible, a potential security issue).

    If it suddenly became a GB18030 or equivalent requirement to do this in localized Windows and not have these bits of English, the product itself would break backwards compatibility, usability, and security expectations for the entire market!

    I am still getting my head around this one, in case it becomes an issue.

    The title's joking reference to the non-assclown version of Michael Bolton's reaction to an old-school hardware error message hints at another problem: the ENGLISH is often nonsensical crap even to very knowledgeable English users, so it is unlikely that many such "forced translations/transliterations to Chinese" would do anything beyond worsening the situation.

    One could even make the argument that the purity of Chinese would be injured more by such a move -- it may well be better to leave these acronyms and confusing terms in English to keep this crap segregated from a language that someone has aspirations of keeping out of the gutter!

    OK, more seriously....

    On the plus side, hundreds of "we won't fix so leave us the hell alone" localizability bugs become "shut the hell up because you have to fix 'em" geopolitical bugs.

    But on the minus side, the product wouldn't serve people in China very effectively.

    We'll have to wait and see....

  • Sorting it all Out

    I swear the Romanian bug is fixed; it was fixed 4.5 years ago!

    It was a bit like that Latvian bug.

    The one I described in I swear the Latvian bug is fixed; it was fixed 4.5 years ago!.

    Though I admit I get slightly less email about this other issue.

    Its latest incarnation came in the mails that arrived the other day, first this one:

    Dear Michael Kaplan,

    I am writing to this email address because I've seen it mentioned in a previous blog entry <http://blogs.msdn.com/b/michkap/archive/2010/11/30/10098113.aspx>.

    My question is mentioned in the subject of this email: *How to do accent insensitive comparison under Windows*?

    My test case would be:
    voila == voilá
    garcon == garçon
    kase == käse, or better kaese == käse
    arsita == arșiță (Romanian s and t with comma bellow, and a with breve)
    arsita == arşiţă (Romanian s and t with cedilla, and a with breve)
    arşiţă == arșiță

    I've hacked a MSDN C++ sample <http://pastebin.com/fG9APFDt>, but it doesn't quite work as expected.

    After reading the documentation of Boost.Locale <http://cppcms.sourceforge.net/boost_locale/html/tutorial.html> I was left with the impression that C++ could work as expected (at least "garcon" should be equal with "garçon")

    If it is possible in C++ how about the Romanian s and t with comma bellow characters? The C++ locale for Romanian on Windows is equivalent with ISO-8859-2.

    I've been informed that Microsoft doesn't want to add support for ISO-8859-16 in their products, UTF-8 and Unicode being the future, but how can I use C++ to have "arsita" and "arșiță" as equal?

    Thank you in advance.

    Sincerely,
    Cristian Adam

    and then this other one:

    Dear Michael Kaplan,

    I have found the CompareString Win32 function and I've made a test application <http://pastebin.com/6mvkpzxx>.

    The results on Windows XP are:

    ---------------------------
    Comparison
    ---------------------------
    voila == voilá
    garcon == garçon
    kase == käse
    kase > kaese
    arsita == arşiţă
    arsita > arșiță
    arşiţă > arșiță

    ---------------------------
    OK
    ---------------------------

    CompareString on Windows XP doesn't recognize the s and t comma bellow for Romanian language!!!

    Should I file a bug report through Microsoft Connect?

    Sincerely,
    Cristian Adam.

    Aha, now we know what the problem is.

    First of all, you do not need to use that weird way to contact me; use the Contact link on THIS page in THIS blog!

    And then, second of all....

    XP is not going to be updated to support the Romanian S and T with comma below letters.

    EVER.

    There are font updates to display them, and you can use MSKLC to create keyboards for them.

    But if you want collation and casing to work, you have to either:

    • settle for the cedilla-below letters, or
    • upgrade to Vista or Server 2008 or Windows 7 or Server 2008 R2 (or later, when later comes out)

    No XP solution to this problem is going to materialize that will give the right answer for accent insensitive comparisons.
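    To see why the XP results look the way they do, here is a toy model of accent-insensitive comparison in C. It is an assumption-laden sketch, not how CompareString works internally. Each code point is folded to a base letter through a small hand-made table. The "legacy" view of that table simply lacks entries for U+0219 and U+021B, the comma-below letters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t from, to; } Fold;

/* Illustrative fold table, NOT Windows' collation data. The last two
   entries (comma-below letters) are the ones the "legacy" view lacks. */
static const Fold folds[] = {
    {0x00E1, 'a'}, {0x00E2, 'a'}, {0x00E4, 'a'}, {0x0103, 'a'},  /* á â ä ă */
    {0x00E7, 'c'}, {0x00EE, 'i'},                                 /* ç î */
    {0x015F, 's'}, {0x0163, 't'},                                 /* s, t with cedilla */
    {0x0219, 's'}, {0x021B, 't'},                                 /* s, t with comma below */
};
#define FOLDS_ALL    (sizeof folds / sizeof folds[0])
#define FOLDS_LEGACY (FOLDS_ALL - 2)

/* Map c through the first n table entries; unknown code points pass through. */
static uint32_t fold(uint32_t c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (folds[i].from == c) return folds[i].to;
    return c;
}

/* Compare two 0-terminated code point strings after folding with the
   first n table entries. Returns <0, 0 or >0. */
int fold_compare(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        uint32_t x = fold(*a, n), y = fold(*b, n);
        if (x != y) return x < y ? -1 : 1;
        if (x == 0) return 0;           /* both strings ended together */
    }
}
```

    With the legacy count, "arsita" matches the cedilla spelling but not the comma-below one, as in the XP results. With the full table, both spellings match. A one-to-one fold like this can never make "kaese" equal "käse", because that needs ä to expand into two letters.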

    You can search on this site for the Romanian issue with these letters because I have talked about it often in the past.

    No, it is not worth the time of putting in a Connect bug -- because it can't be fixed downlevel. The fix already exists, and has existed for over half a decade....

    But with all that said, the real problem here is not the one I have been talking about, either here or in that other blog about Latvian.

    The "problem" is that there are lots of people who still love XP, and that every day those people notice a problem that they want fixed.

    Hell, I have one machine (a Dell Latitude D820) that I use for many things. It runs like crap on Vista, and Dell refuses to support Windows 7 on it -- no drivers, occasional blue screens, etc. Though Vista is supported by Dell on it. So believe me, that one machine will be running XP for years to come.

    But if your language is Latvian or Romanian, you have an even better reason to upgrade:

    Because you want your @#%&*! language to work right.

    That one machine of mine that @#%&*! Dell won't support on Windows 7? It is doing specific tasks at home that involve neither language. And I have lower expectations because I know it is running something that was mostly developed a decade ago.

    By the way, that XP machine is running IE8 (I am not part of the 45% of China that is using @#%&*! IE6).

    But as I said earlier in this post, and many times elsewhere in this Blog, XP is not going to get updated collation, or casing....

  • Sorting it all Out

    Falling back shouldn't mean falling over (though perhaps it does, a bit)

    So it was just the other day, in It's a LIP that won't cost you an arm[enia] and a leg!, that regular reader Random832 commented:

    Why is U+0580 ARMENIAN SMALL LETTER REH in Verdana and the rest of the characters are in Sylfaen? It seems that Verdana contains exactly two Armenian letters [the other being U+0564 DA]; that's very odd.

    It looks like the currency symbol for Armenian (Armenia) is U+0564 U+0580 U+002E, which may explain it, but it still seems like a bizarre decision to support only part of a script.

    And indeed, things are exactly that way:

       

    And they are just two lone Armenian characters.

    They are in there for no other purpose than to cover the currency symbol!

    Of course over in Sylfaen, the font that has all the Armenian characters, both of these letters can be found:

         

    Weird, huh?

    Of course, when something is done for Armenian, people wonder whether it is done for Georgian, too.

    But we are covered there:

    That Latin script text (Lari) is in Verdana, obviously.

    Though to be honest the currency support for Georgian seems weird to me anyway, given the exchange rate of Lari, but that can be a topic for another day, perhaps. Today is for Armenian....

    Now at first I felt the same as Random832 did -- just a few random letters, to support the currency symbol? really?

    I mean, when you think about SQL Server's support for currency symbols (described in blogs of mine like Show me the [small]money!), it only takes some of the explicit currency signs that exist in Unicode -- it does not detect the text strings used in locales like Armenian's as currency signs.

    And yet, fonts like Verdana do quite a job of supporting the currency signs and symbols of many different locales, like a whole bunch in the Currency Symbols block:

    But does this really help anything in the long run?

    I mean, do we really expect there to be a common case where one is on a machine without Sylfaen where they will be looking at Armenian (and many other) currency values but no Armenian text and information?

    It just seems like a really contrived scenario.

    Though I suppose on a site with lots of currency transactions it would save one from having to load up lots of extra fonts.

    Maybe it helps someone's performance just a little bit....

    Okay, I can almost talk myself into it, see?

    Then, I try one more thing: I look at the letters in Verdana on the left vs. in Sylfaen on the right, side by side, at various sizes:

    Crap.

    These letters in Verdana are much more like the style of the other characters in Verdana -- and not like the letters in Sylfaen!

    So on a web page that requests text be in Verdana, these two letters will come from Verdana and then when other letters are requested, they will be grabbed from Sylfaen via Uniscribe.
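    A naive per-character model of that fallback shows how the mismatch arises. This sketch rests on loudly simplified assumptions and is not Uniscribe's actual algorithm. The coverage functions are hypothetical stand-ins: Verdana is treated as covering low Latin ranges plus only U+0564 and U+0580, and Sylfaen as covering the Armenian block. Each character goes to the first font in the list that covers it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FONT_VERDANA, FONT_SYLFAEN, FONT_NONE } Font;

/* Hypothetical coverage: low Latin ranges plus only the two Armenian
   letters used by the currency symbol. Not Verdana's real cmap. */
static bool verdana_covers(uint32_t cp)
{
    return cp < 0x0250 || cp == 0x0564 || cp == 0x0580;
}

/* Hypothetical coverage: low Latin ranges plus the whole Armenian block. */
static bool sylfaen_covers(uint32_t cp)
{
    return cp < 0x0250 || (cp >= 0x0530 && cp <= 0x058F);
}

/* The requested font wins whenever it has the glyph; otherwise fall back. */
static Font pick_font(uint32_t cp)
{
    if (verdana_covers(cp)) return FONT_VERDANA;
    if (sylfaen_covers(cp)) return FONT_SYLFAEN;
    return FONT_NONE;
}

/* Split text into runs of consecutive characters drawn from the same font.
   Returns the number of runs written (at most max_runs). */
size_t segment_runs(const uint32_t *text, size_t len,
                    Font *run_font, size_t *run_len, size_t max_runs)
{
    size_t n = 0;
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        Font f = pick_font(text[i]);
        if (n > 0 && run_font[n - 1] == f) {
            run_len[n - 1]++;           /* extend the current run */
        } else if (n < max_runs) {
            run_font[n] = f;            /* start a new run */
            run_len[n] = 1;
            n++;
        } else {
            break;                      /* out of room for runs */
        }
    }
    return n;
}
```

    Take the word դրամ (U+0564 U+0580 U+0561 U+0574). The model gives a two-letter Verdana run followed by a two-letter Sylfaen run, which is the mixed-metrics situation described here. The currency symbol դր. on its own stays entirely in Verdana.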

    And you will have one of those weird font mismatch situations.

    For two letters. Two Letters. But no others. The metrics and the style and the look, all are different....

    Okay, I am back where I started -- these letters in Verdana are not such a great idea.

    In fact, suddenly many of these currency symbols in fonts like Verdana seem like not such a great idea, given the massively different styles of some of the language specific fonts....

    It is the same type of problem I discussed in The utility of a feature like font fallback in Uniscribe can often be somewhat obviated by its design flaw, given a whole new dimension.

    And suddenly fonts are not falling back so much as falling over. Again....

  • Sorting it all Out

    Anti-Microsoft conspiracy theories are fun #5 (aka Microsoft is not supporting the terrorists, dammit!)

    Okay, so when I wrote in Anti-Microsoft conspiracy theories are fun #3 (aka Why the hell can't they just update Uniscribe?) that this was unlikely to be a series, I might have lied.

    But conspiracy theories are rife with lies, aren't they?

    Just ask a grassy knoll....

    The numbering scheme even suggests a series, and a conspiracy-minded one (e.g. #4 was redacted, perhaps?).

    Though this may be it. Maybe.

    Now today's blog is based on an entirely true story.

    Not historical fiction, mind you.

    This is stuff that actually happened.

    And it all started with an email I was sent.

    It went on for a while, but the critical piece was in the closing:

    I used to love your blog, but if you can't denounce Microsoft's support of Yemini terrorists, I will have to boycot you just like I am boycoting Microsoft.

    He also had already chosen to boycott the spell-checker? :-)

    The mail itself included screenshots of Vista and Windows 7 and Word 2010, all of which showed stuff like this:

        

    There were like a dozen similar screenshots across many versions and products and dialogs.

    Also, there were many links to various news stories that talked about the connection between some of the recent failed terrorist attacks against the USA and the country of Yemen.

    He sent it to me before the Wikileaks links that gave further pointers involving Yemen, or he probably would have included some of those links too.

    Anyway, the argument being made was a simple logical exercise:

    • Yemen supports Terrorism, and
    • Microsoft lists Arabic (Yemen) in its locale lists and Yemen in its country lists, so
    • Microsoft supports Terrorism.

    Now obviously there are flaws in this argument, but I think anyone can point those out.

    I could point out that the locale has been around since Windows 95 and NT 4.0, but again that's easy stuff to find out.

    Kind of boring.

    And people who like conspiracy theories really want a bit more excitement than dull old logic can provide.

    So why not let's have a little fun, kay? :-)


    First, my disclaimers:

    I don't support terrorism.

    and

    As far as I know, Microsoft doesn't support terrorism, either.

    The weasel words there are just because I am obviously not informed about every single project at Microsoft. Though since support of terroristic acts would seem to violate the corporate values, I think this is a reasonable belief.

    Okay, now with that said and out of the way....


    Thus far, it seems like the bulk of the attacks coming from Yemen are the ones that aren't succeeding.

    This suggests several possibilities that I won't enumerate just yet -- I'll come back to it in a moment.

    So, take a look at the Wikipedia article on Yemeni Arabic.

    It discusses the extensive differences in phonology, morphology, vocabulary, and syntax for the various dialects of Arabic used in Yemen.

    The single localization that Microsoft does into Arabic really centers around the Arabic used in Saudi Arabia. The differences between Arabic in Saudi Arabia and in Yemen are significant enough that sometimes there are "translation" efforts needed so people can understand it -- which suggests a certain degree of lack of mutual intelligibility. You know?

    And of course Microsoft has been supporting Arabic for some time now -- with localized versions for way over a decade in various versions of Windows and Office.

    Now, stitching together these small islands of factlets that are all true into my own little conspiracy theory:

    For the last decade, Microsoft has insinuated its version of Arabic, which is not completely mutually intelligible with Yemeni Arabic. Those small differences led directly to the problems that these recent failed terrorism attempts ran into, because problems trying to use Microsoft software that the terrorists didn't fully understand kept them from setting things up properly.

    I'll admit that all seems pretty unlikely.

    I mean who honestly believes that any group in Microsoft can keep a single coherent vision that spans multiple versions like that?!? :-)

    I mean, really. No one who knows anything about Microsoft would believe that!

  • Sorting it all Out

    On disliking Emoji, disrespecting code pages, and not looking past dogma

    It will come as no surprise to people who know my "World Readiness" persona that I am not so fond of the Emoji (like those added to Unicode 6.0).

    Regular readers of this blog will know it has come up here before.

    I don't like them, because they feel to me like such a betrayal of some of the core principles that I used to espouse.

    And I still believe that.

    But I like enough of what Unicode does that I can see past that.

    I mean, here at Microsoft, for every Kin there's a Windows Phone 7, for every Vista there's a Windows 7, for every Bob there's also a Kinect, and so on. Overall I like the stuff that comes out of Microsoft, even though the working for them part feels more ordinary these days.

    And I still believe that (both of those thats).

    Now I think about the time I was on the NLS team (back when they were actually still called the NLS team) where everyone pushed so hard to get people off of legacy code pages and into using Unicode.

    And everyone has been quite consistently on message for that.

    I still believe that message is a correct one, and I believe there is no place for new code pages for languages.

    But when I think of the Emoji, when I know that some people will probably want to use and support them (like the phone and instant messaging and email and so on), and when I think of files like Unicode's EmojiSources.txt from the Unicode Character Database, I wonder if maybe my belief system, and the belief system of all of those who have been espousing the above, might be missing out on something obvious.

    From that file's header:

    # EmojiSources-6.0.0.txt
    # Date: 2010-04-24, 00:00:00 GMT [MS]
    #
    # Unicode Character Database
    # Copyright (c) 1991-2010 Unicode, Inc.
    # For terms of use, see http://www.unicode.org/terms_of_use.html
    # For documentation, see http://www.unicode.org/reports/tr44/
    #
    # This file provides mappings between Unicode code points and sequences on one hand
    # and Shift-JIS codes for cell phone carrier symbols on the other hand.
    # Each mapping is symmetric ("round trip"), for equivalent Unicode and carrier
    # symbols or sequences. This file does not include best-fit ("fallback")
    # mappings to similar but not equivalent symbols in either mapping direction.
    #
    # Note: It is possible that future versions of this file will include
    # additional data columns providing mappings for additional vendors.
    #
    # Format: Semicolon-delimited file with a fixed number of fields.
    # The number of fields may increase in the future.
    #
    # Fields:
    # 0: Unicode code point or sequence
    # 1: DoCoMo Shift-JIS code
    # 2: KDDI Shift-JIS code
    # 3: SoftBank Shift-JIS code
    #
    # Each field 1..3 contains a code if and only if the vendor character set
    # has a symbol which is equivalent to the Unicode character or sequence.

    If the Japanese telcos, or those products I mentioned, or whoever else needs to map their various proprietary character sets to and from Unicode, then that is essentially what code pages are all about.
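    In code page terms, each data line of that file is one row of a mapping table. Here is a minimal C parser for a single line that follows the header's field description. It assumes that a sequence in field 0 is written as space-separated hex code points. The struct and function names are hypothetical, and the sample values in the test are placeholders rather than real entries from the file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SEQ 4

typedef struct {
    unsigned long unicode[MAX_SEQ]; /* field 0: code point or sequence */
    int unicode_len;
    unsigned long vendor[3];        /* DoCoMo, KDDI, SoftBank; 0 = no equivalent */
} EmojiMapping;

/* Parse one data line. Returns 0 on success, -1 for comments, blank or
   malformed lines. */
int parse_emoji_line(const char *line, EmojiMapping *out)
{
    char buf[256];
    char *fields[4];
    int n = 0;
    char *p;

    assert(line != NULL && out != NULL);
    if (line[0] == '#' || strlen(line) >= sizeof buf) return -1;
    strcpy(buf, line);
    buf[strcspn(buf, "\r\n")] = '\0';          /* drop the line ending */

    /* Split on ';' by hand so that empty vendor fields are preserved
       (strtok would silently skip them). */
    p = buf;
    fields[n++] = p;
    while ((p = strchr(p, ';')) != NULL) {
        if (n == 4) return -1;                 /* too many fields */
        *p++ = '\0';
        fields[n++] = p;
    }
    if (n != 4) return -1;

    /* Field 0: one or more hex code points separated by spaces. */
    out->unicode_len = 0;
    p = fields[0];
    while (*p != '\0') {
        char *end;
        unsigned long cp = strtoul(p, &end, 16);
        if (end == p || out->unicode_len == MAX_SEQ) return -1;
        out->unicode[out->unicode_len++] = cp;
        for (p = end; *p == ' '; p++) {}
    }
    if (out->unicode_len == 0) return -1;

    /* Fields 1..3: a Shift-JIS code in hex, or empty if the vendor has no equivalent. */
    for (int i = 0; i < 3; i++)
        out->vendor[i] = fields[i + 1][0] != '\0' ? strtoul(fields[i + 1], NULL, 16) : 0;
    return 0;
}
```

    Loading every line this way gives a round-trip table in each direction, which is all a vendor code page really is.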

    Perhaps being dogmatically against code page support is really not such a good idea.

    Perhaps the focus should have been on language support (which really requires Unicode), acknowledging that Microsoft has to support things like GB 18030 anyway, rather than being so against the concept of code pages for mappings that can still make sense.

    Like the vendor mappings between Emoji and Unicode, whatever they may be.

    I mean, every time one of my friends with an iPhone (and there are a lot of them) sends a tweet via Twitter and their smiley face emoticons are private use area characters, I know that Microsoft is not alone -- Apple is making the exact same stupid mistake that Microsoft is, albeit in a different way.

    In my opinion, there should be symbol mappings added to a brand new code page (or if necessary multiple code pages) to support this key scenario.

    Claiming that Emoji are crucial (which many people do claim) and not providing the proper mappings between them and the random crap that people are using throughout the world because of a frenzied dogmatic belief that code pages are evil and so no new code pages should be supported is a really bad product decision coming out of a really bad belief system.

    With that said, I have minimal say in what these product groups do. Many of them read here and listen to what I say, shortly before they ignore it and do what they had already decided to do.

    In that way I am like a not-as-well-paid version of Ray Ozzie, whose thoughts on issues such as privacy and career stage profiles and the Cloud are brilliant and deserve better than to be discounted by the huge percentage of MS discounting them....

    So I would be truly surprised to see things supported this way when the time comes to do the work to support Emoji. No one wants to admit they took a belief too far, since that would mean implicitly admitting one was wrong about something (and who wants to do that when the next review is in their minds?).

    I'm no better, mind you; I doubt I'd be writing this particular blog if I was still on the NLS team.

    Perhaps it is time to move on to a good idea, instead. That would be much better than forcing everyone to roll their own.....

  • Sorting it all Out

    People just want what they want, whether they have permission or not

    The question that came onto the distribution list was:

    I have a customer who has removed administrative rights for the users in his domain and since then those users are not able to see the administrative tab in the Regional and Language Options on their respective workstations. The workstations are running XP SP2.

    I searched on this and found that this is something by default and hard coded in the OS and thus difficult to work around. Can somebody confirm this please?

    Thanks for your help.

    Of course what these non-administrative users wanted to do in the Advanced tab of Regional and Language Options in XP is less clear:

    Every single item there requires administrative permissions, since they all involve adding/deleting files in the SYSTEM32 directory and/or changing registry information under HKEY_LOCAL_MACHINE.

    It was in fact Raymond who set the record straight on that internal alias about whether one could find formal documentation on working around this issue; summary version: one can't.

    Maybe they just wanted an easy way to view what the default system locale was, or which code pages have been installed.

    Now there are nuances here -- the generic check for administrative rights would not handle special cases where registry permissions were modified in unusual ways to try to make all of the operations on this tab work (and we don't really document what that method is, or even what changes get made, in any real way).

    But of course, the fact that the issue is addressed in Vista and later makes modifying XP to support something that changes no real functionality pretty unlikely.

    If one wanted to query the information, writing a simple application that does just that bit is a lot more likely, and easy to do.

    But there is no way to win here, completely, since:

    • if you hide something they complain it's not there;
    • if you have it there but disable it, they complain they can't change it;
    • if you enable it and let them know via an error message that they cannot make the change in question, they ask how to work around it (and complain that the change was allowed in UI if it wasn't going to work).

    Summary: people just want what they want, whether they have permission or not.....

  • Sorting it all Out

    Oceania is at war with obcaseinsensitive; Oceania has always been at war with obcaseinsensitive

    I remember when I first read 1984.

    It was a few years after the original publication, as I was not born until over two decades later and I was quite lazy in my pre-zygote reading phase....

    Anyway, the thing about 1984 that struck me at the time, that distracted me to no end, was how stupidly it portrayed people.

    I mean the notion that people could yell at the top of their lungs about their longtime enemy Eastasia and their longtime ally Eurasia, only to reverse the roles minutes after a news announcement?

    I remember asking the teacher whether that was like when my mom would say vegetables were good for me when I knew she was dead wrong but to keep the peace I let her false statement stand.

    No, she explained, in 1984 these people truly believed what they were saying.

    "Well then," I concluded, "they were stupid!"

    The teacher pulled a quote off a bumper sticker, suggesting that I never underestimate the power of human stupidity.

    Good advice (probably 50% of the interesting things this teacher ever said to me!). But I digress.....

    Anyway, it turns out I was wrong back then.

    It wasn't stupidity that was at issue here; it was the [possibly subconscious] recognition that the situation had changed, and the realigning of themselves with the new situation. Instantly. Since the memory has no benefit in this situation, hanging onto it would be doubleplusungood.

    Which brings me to today's blog.

    And obcaseinsensitive.

    Regular readers might remember my Blitzkrieging the landscape.NET (aka in this case, two) (aka When is obcaseinsensitive not ObCaseInsensitive?) from early 2008.

    It talked about a registry key. A registry key that was always a particular setting unless people changed it - which was rare.

    It explained how, for the sake of ASP.Net, the Developer Division's CLR took that registry key and decided to always change it. Even though that broke some people. And essentially made the value entirely useless, since it reset the value for the people who had changed it and there was no way back.

    I probably wasn't much of a fan of ASP.Net before, but I freaking hated them after that, most of the time....

    Then this mail came the other day to one of those DLs I belong to:

    Hi All,

    I am using the code below to do a case-sensitive file search. The statement
    hFind = FindFirstFileEx(argv[1], FindExInfoStandard, &FindFileData, FindExSearchNameMatch, NULL, FIND_FIRST_EX_CASE_SENSITIVE);
    does not seem to work as documented in FindFirstFileEx Function (Windows).

    File on disk: "D:\TestFindFirstFile\Debug\Test.txt"
    Input : "D:\TestFindFirstFile\Debug\test.txt"

    Expected Result : FindFirstFileEx fails, giving an error indicating that the file differs in case.
    Actual Result : FindFirstFileEx does not fail, nor does it indicate that the file differs in case.

    Please clarify whether this behavior is by design, whether it is an issue, or whether I am using the API incorrectly.

    Thanking you in advance.


    #include <windows.h>
    #include <tchar.h>
    #include <stdio.h>

    int _tmain(int argc, TCHAR *argv[]) {
       WIN32_FIND_DATA FindFileData;
       HANDLE hFind;

       if( argc != 2 ) {
          _tprintf(TEXT("Usage: %s [target_file]\n"), argv[0]);
          return 1;
       }

       _tprintf (TEXT("Target file is %s\n"), argv[1]);
       // Request an exact-case match; whether it is honored depends on
       // the system-wide obcaseinsensitive registry value.
       hFind = FindFirstFileEx(argv[1], FindExInfoStandard, &FindFileData, FindExSearchNameMatch, NULL, FIND_FIRST_EX_CASE_SENSITIVE);
       if (hFind == INVALID_HANDLE_VALUE) {
          // GetLastError returns a DWORD, so print it as unsigned long.
          _tprintf (TEXT("FindFirstFileEx failed (%lu)\n"), GetLastError());
          return 1;
       }

       _tprintf (TEXT("The first file found is %s\n"), FindFileData.cFileName);
       FindClose(hFind);
       return 0;
    }

    And then came the reason:

    Is HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\obcaseinsensitive set to 1?

    If you want this API to work as documented, you'll need to set this regkey to 0.  Setting it to zero has the effect of enabling case sensitivity across the system.  Although this is notionally opt-in (Win32 apps tend to opt-in) at the NT layer it's really opt-out (you get case sensitive unless you ask for insensitive.)  The practical effect of this is changing the registry key "lights up" latent bugs in NT applications and drivers that are not expecting case sensitive behavior.  So if you change this registry key, things might work - but they also might not, because they're probably not tested.
     
    You might then assume that seeing this key is 1 means you can ignore case sensitivity.  That's not really a valid assumption either.  After all, files could have been put on a disk differing only by case when the key was 0, and now it's 1, but those files are still there.  Or, you could be talking to a network share where your client promises to not create files differing only by case, but not all clients make the same promise.  Etc, etc.

    At this point, since pretty much every machine has this registry value set to 1, the FIND_FIRST_EX_CASE_SENSITIVE flag, which only works if the value is set to 0, is a fairly useless bit of documentation in FindFirstFileEx.

    I mean, is there any point in claiming otherwise? It is just a waste of time to pretend anyone would ever expect it to be 0. A developer certainly shouldn't write code against it assuming a 0 will be there....

    Especially since Windows sets obcaseinsensitive to 1, and Windows has always set obcaseinsensitive to 1.

    Well, as far as any of us know.

    And at last, I have won the victory over myself. At last I love obcaseinsensitive.

    Um, hang on, that analogy broke down in the end.

    I'll work on it....

  • Sorting it all Out

    It's a LIP that won't cost you an arm[enia] and a leg!

    • 5 Comments

    THE WINDOWS 7 ARMENIAN LANGUAGE INTERFACE PACK IS LIVE!

    Click here to download the Armenian Windows 7 LIP via the Microsoft.com Download Center.

    Please note that the Armenian Windows 7 LIP can only be installed on a system that runs an English client version of Windows 7. It is available for both 32-bit and 64-bit systems on the Download Center.

    The Armenian Windows 7 LIP is produced as part of the Local Language Program sponsored by Public Sector.

    A LITTLE BACKGROUND INFORMATION ON ARMENIAN

    NUMBER OF SPEAKERS

    7 million speakers

    NAME IN THE LANGUAGE ITSELF:

    Հայերեն

    Armenian is spoken by approximately six million speakers worldwide. While 3.5 million speakers live in the language's historic homeland, Armenia, nearly as many speakers live in the so-called Armenian Diaspora. There are two major dialects. Eastern Armenian is the official language of the Republic of Armenia but is also spoken by the many Armenians in Iran (370,000 speakers) and by Armenian speakers in Georgia and Azerbaijan. Western Armenian was originally spoken in the large Armenian communities in Anatolia before the massive atrocities in the last years of the Ottoman Empire in which nine tenths of the Armenian population were killed and hundreds of thousands uprooted. Western Armenian is now spoken especially in Syria (300,000) and Lebanon (235,000), but also in the United States (175,000) and many other countries. The two standards evolved when one part of Armenia was under Russian, the other one under Ottoman rule in the 19th century. During the Soviet rule of Armenia the two standards diverged further so that today they are not readily mutually intelligible.

    Old Armenian (also called Grabar), for which documents from the fifth century exist, was used as a literary language until the 19th century and is still used by the Armenian church.

    FUN FACTS:

    • The sound system of the Armenian language has some distinctive features: It is rich in combinations of consonants, especially in affricate sounds such as j, ch and ts. It also has ejective sounds. These are made by using the closed vocal cords, instead of the lungs, to push out air, and were probably borrowed from surrounding Caucasus languages.

    • Armenian has a rich case system for nouns (7 cases) but no grammatical gender. Most old synthetic verb forms have been replaced by analytical constructions (i.e. forms that utilize an auxiliary verb).

    Click here for more information about the Armenian language.
     
    CLASSIFICATION:

    Armenian is an Indo-European language, forming its own independent branch in this vast family. Because of the huge percentage of loan words from Iranian languages, it was mistakenly classified as an Iranian language until the end of the 19th century.

    Click here for more information about Armenian classification

    SCRIPT

    Armenian is written in the Armenian alphabet, which was created by Mesrop Mashtots in the 5th century AD and consists of 39 letters (originally 36).


    Click here for more information about the Armenian script

    Enjoy!

  • Sorting it all Out

    Part #1 of "You think this is better. Really?"

    • 5 Comments

    This is a story of a site. Well, half a story about a site. Part 2 is already written, and will be published tomorrow.

    Regular reader Yuhoing Bao told me via the Contact link:

    Go to GlobalDev and find one of the DBCS codepages. On these pages there will be hyperlinks on DBCS lead bytes. Click these links and see what you get.

    Okay, I am going to disobey the suggested directive here, as I need to start somewhere else in order to give the narrative: the history of how we got here.

    Gather round!

    Once upon a time, Microsoft put up the Global Software Development website at http://www.microsoft.com/globaldev for everyone to enjoy. Here is what was on the home page:

    Welcome to Microsoft's revamped /globaldev site! Although it seems to have taken us forever to update these pages, we've finally managed to get around to it!

    We've added, and will continue to add, content aimed at helping you develop software that meets the language and locale needs of all your users. Check out our FAQ pages, dealing with subjects such as locale support and the Multilanguage Version of Windows 2000. Look up your favorite Windows Codepages. Or relax with our checklist of recommendations for software globalization.

    If you're a creature of habit, you'll be glad to find that our ARCHIVES contain all of our previously available material, some of which is sure to remain useful and relevant to your work.

    Of course, to make this site truly useful we need your feedback. Please email us to let us know what you think about these pages, and what kind of information you'd like to see posted here in the future. We'll do our best to oblige.

    That was from February of 1999.

    The site was a wonderful resource for many years.

    I briefly had a stake in it myself, when I was Dr. International (as I mentioned here).

    Now I am not going to say I was the biggest, most important part; I wasn't. And neither were many of the random articles.

    As with the first edition of Developing International Software, the most important piece for many people was the data -- the code pages, the keyboards, the languages, the locales: the concrete things people could look at and count and reference.

    Anyway, the problems started in, I guess, late 2007 or perhaps early 2008. But I think it was late 2007.

    Everybody was talking about how the big problem with International was that nobody knew about it. And that the GlobalDev site was hard to discover.

    Even then I disagreed -- the problem with their premise was that most people out there really didn't care. And for the people who did, there were only two real problems:

    • Content not existing, and
    • Search (inside or outside the site) not finding content that does exist.

    But I only mentioned these two points occasionally, because people were often bitching (er, complaining) about my attitude toward the project.

    Anyway, everyone was convinced that a new location would make a huge difference. Some people even expanded on that and gave reasons like "IT Pros don't want to go to a site called GlobalDev -- because they are not devs."

    Seriously?

    Or to quote a Windows Phone 7 commercial I like a lot,

    Really?

    Like I said, people simply need search to properly index the content. Assuming the content is provided, that is all anyone needs!

    As far as I was concerned, if you are an IT Pro who is looking for an answer but refuses to go to a site called GlobalDev because it says GlobalDev, then you are a whiny little biatch who deserves to get fired (er, a person who needs to seriously rethink your career choice). Remember that disastrous year when they split TechEd in two, dev and IT Pro, even though the people who self-identified with one group often had interests in the other?

    The IT Pros I know never mind things being hard to find once they have found them, because now they are inherently more valuable than some random person who doesn't know. Well, not that much more valuable. I mean, c'mon!

    But the move itself scared me beyond that, in practical terms. I was worried about links going missing and content not getting found. I was worried about the way MSDN didn't use regular, predictable URLs and tended to move things around from time to time -- GlobalDev had been in the same place that whole time. I was also worried whether the International Fundamentals team could own such an effort, lacking the long-term ability to provide all of the content. I know I wasn't interested in more writing than I did here, in a more formal style where I couldn't say what I really thought. And I was worried that the core teams, who weren't exactly sitting on their hands doing nothing, would not be eager to do lots of writing -- so where would the content come from?

    I was assured by many people that I was being overly pessimistic. That those problems could be solved.

    You might have heard about the new site. I mentioned its soft opening in March of 2008 in Yesterday was GlobalDev; Tomorrow is GoGlobal! and its hard opening in July of 2008 in GoGlobal NOW!.

    The site was okay.

    I often got complaints from people who couldn't find stuff by looking at the site. I would help them by figuring out where the content was and giving them the new link. I didn't scream loudly about the problems on the site at this time, because the core stuff was still findable, at least.

    None of the teams that own the technologies on that site spend much time as owners of content there -- updating it, fixing it, providing new stuff. There was the one huge push to get a bunch of new MUI content up and that was nice. But as if desiring to prove me right, they didn't do anything after that, and no other team jumped in to that degree.

    Try going to the only two things that most people care about -- the code pages and the keyboards. Those links aren't on GoGlobal's home page, and there is no easy way to divine how to get to them. Search works okay here at least, but search also finds KB articles that point to the old site. Oops.

    The most active thing on the Go Global Development Center used to be the links on the home page to my blog (the page was subscribed to the feed), but those were taken out recently. The blog links there right now are other people's blogs from 2009! No other interesting new content, and the stuff you do need is hard to find -- you have to use the search, which is what they should have relied on in the first place, like I said when all this started....


    This blog will be continued in Part 2, found here, tomorrow. Unless you are me, the link will not be active until January 1, 2011 at 7:01am....

  • Sorting it all Out

    "Comic Sans Fixed", that rather sansless dream-o-mine

    • 5 Comments

    In honor of some principle or another that I find honorable, this entire blog is wrapped in a Comic Sans span!

    I like Comic Sans MS.

    There, I said it.

    Prior to that I was one of those "There are only two fonts: Arial, and Arial 12pt" types, but then I started enjoying Comic Sans.

    I have enjoyed it ever since Athena (that terrible MS chat client that got me off the "unpaid MS product support" bandwagon I had been on since the CompuServe MSAccess forum days, like it was a straight shot of an ebola and hantavirus cocktail).

    But I can't hate Athena completely, because she brought Comic Sans MS.

    Hell, even in a Q&A with Gary Hustwit (the director of the film Helvetica), when he was asked what the sequel would be, he said that a sequel was unlikely, except maybe Comic Sans. He smiled as we all did, but I think everyone realized that public "Comic Sans spotting" would be as much fun as "Helvetica spotting", and maybe even more so for the Comic Sans haters....

    I even do my own Comic Sans spotting, like in The Company Meeting, the interesting science of Forensic Typography, and what happened after.

    Now I realize you may not like it. You may be a disciple of Ban Comic Sans, or an acolyte of Comic Sans Criminal.

    If so, I feel sorry for you.

    I mean, Comic Sans is a hero, like the "This video wasn't long enough, so we made it double-spaced" Font Conference College Humor video suggests:

    Even if he sometimes shows up a little too late to actually, in fact, save the day, like this other College Humor video entitled Font Fight covers (which as a bonus finally settles the Helvetica vs. Arial issue, in a stunningly satisfying way):

    But beyond that, I just like the font.

    Now in blogs like

     in addition to talking about the serious issue of trying to bring the notion of fixed width fonts to complex scripts and trying to get Consolas into the Console, I did bring up my fantasy font -- a fixed width version of Comic Sans:

    Comic Sans Fixed

    In fact in another blog (Look out for Font Rage) I shared my primitive notion of this font:

    But in the end, as a going away present from my friend Carolyn Parsons, I finally have it.

    You can see it here in Character Map:

    though to be honest it looks the same here as regular Comic Sans does, mostly.

    So here is a view that kind of shows the effect of it compared to its ancestor:

    And maybe you can think about it in a code window.

    Or look at (for example) this one, from the Frosting project of ten years ago, about encodings that no one ever used:

    This is thoroughly amazing!

    Now maybe this font isn't for you.

    Actually, I am sure it isn't for you, since I have no permission to redistribute it! Though Simon did get a copy so it could conceivably ship one day if there were some Microsoft product or scenario designed to cater to people as batshit crazy weird as I am.

    But I am pretty happy now in regard to my font situation.

    In fact, the one thing that can make me happier (and which Sometimes everyone is happier when the game is Fixed (aka Consoling Consolas lovers) explains how to provide) is Comic Sans Fixed in the Console!

    How senseless (er, sansless) is that? :-)

  • Sorting it all Out

    I agree with you 100%. But we're both wrong (according to the spec)

    • 5 Comments

    It was last week, in response to You can't ignore crap and hope it won't cause problems... that Cheong commented:

    Yet in the question, we'd expect tar.IndexOf(r) to return -1 because the content of r does not exist in tar. I can imagine having it return 0 will cause some infinite loop problems in certain data stream processing functions if they're lazy enough to use string manipulation functions to process data.

    This ends up being an interesting design decision!

    I'll explain how things ended up where they did.

    First comes the easy question: What do you return for "meow".IndexOf("") exactly?

    The explicit decision that was made was that every string both implicitly starts and ends with the empty string.

    Thus returning 0 here "makes sense" by that design.

    Ignore the "".IndexOf("") inconsistency here, of course!

    I believe Java does the same thing, though it has been a while since I have done much with Java. Perhaps someone else can confirm.

    Now I think the design is kind of stupid, for what it is worth. For the same reason Cheong was thinking -- the possibility of infinite loops.

    In fact, the very first version of FindNLSString that I checked in had behavior I believed to be more intuitive, but my manager at the time came to me shortly thereafter and mentioned that I was not being consistent with .Net. And since that was the whole reason FindNLSString was being added, this was a blocking issue.

    Now while grumbling and doing the research to get the behavior consistent (I was doing both at the same time since consistency with what I thought of as incorrect design is a worse sin than being half right), I found several inconsistencies in .Net as well. That manager found these inconsistencies very frustrating (though in truth he isn't the one who caused the problem; the parts he wrote were consistent), and he jumped in to fix the managed code to be consistent while I fixed the [new] native code to have the same behavior that he was busy making sure would be consistent.

    Anyway, where was I?

    Oh yeah, with "hiss".IndexOf("") returning 0.

    Now when you have strings with no weight, they compare as linguistically equal to the empty string.

    Thus "\uFFFD".Compare("") is expected to return 0.

    Now there are some standards bodies in parts of the world I am not going to name at this moment that would take statements like:

        "hiss".IndexOf("") == 0
        "\uFFFD".Compare("") == 0

    and then make the claim that

        "\uFFFD".IndexOf("") != 0

    but for the sake of a fragile attempt at consistency, this route was not taken -- and thus the zero length string is indeed assumed to adorn the front of that string.

    Native code and managed code still look at things that way, and huge chunks of the checkin suite verify this behavior is not broken by well-meaning developers who might try and "fix bugs" without realizing that they aren't considered bugs....

    So, to summarize the point to Cheong, I agree with you 100%. But we're both wrong according to the spec.

    Perhaps the spec was wrong, but I'm pretty sure taking that route with my changes would have created an uncomfortable working environment for me back then, and I doubt I would have won the argument in the long run anyway.... :-)

  • Sorting it all Out

    Anti-Microsoft conspiracy theories are fun #6 (aka If you use .Net, you may have a stupid .Parent)

    • 4 Comments

    Regarding the main title of today's blog, this still is not a @#%&*! series.

    For today's blog's alternate title, I am relying on the fact that the vast majority of technical content on this blog refers to something with internationalization or globalization or localizability overtones.

    I mean, I am not saying that your parents are stupid. I don't even know most of your parents, after all.

    This is not an anti-Microsoft conspiracy theory claiming that if you use .Net, this indicates your parents screwed something up.

    And I am not saying that Anders Hejlsberg, the man who for all intents and purposes is the father of C#, is stupid. Because he isn't.

    And I am really not saying that most of the .Net Parent properties are stupid, because (a) I don't know every one of them, and (b) most of the ones I do know aren't stupid. Statistically speaking, some may be, but none of them I know.

    Well, with one exception.

    CultureInfo.Parent is stupid. Really stupid.

    Perhaps this would be an appropriate time (after the provocative eye-catching title and cutesy introduction with art and bold central statement - my basic formula) to explain the basis for the claims of the aforementioned statement.

    I'll start with another topic, one that talks about internal usage patterns of the property rather than the property itself.

    Like PropertyInfo.SetValue Method (..., CultureInfo)'s description of its CultureInfo parameter:

    The CultureInfo object that represents the culture for which the resource is to be localized. Note that if the resource is not localized for this culture, the CultureInfo.Parent method will be called successively in search of a match. If this value is null, the CultureInfo is obtained from the CultureInfo.CurrentUICulture property.

    Or ResourceManager.GetString Method (..., CultureInfo)'s description of its CultureInfo parameter:

    The CultureInfo object that represents the culture for which the resource is localized. Note that if the resource is not localized for this culture, the lookup will fall back using the current thread's Parent property, stopping after looking in the neutral culture.

    Not for nothing, but the fact that these two descriptions and others like them are different is also a bug. As are some of the descriptions themselves.

    Anyway, you get the idea -- the usage of this property is for resource fallback.

    So that if you look for resources of one culture but cannot find them then it knows to fall back to another culture, and so on.

    This is not stupid.

    Well, it is not too stupid. It is a little stupid, since in most cases this can also be done by simply looking at the string and chopping pieces off successively. But this potential bit of stupidity is obviated by the fact that there are exceptions to that principle, like

    zh-TW --> zh-CHT --> {Invariant}

     and such. Thus having it in a property is sensible.

    Okay, now let's look at the property's own documentation and its description of this CultureInfo.Parent property whose usage is widely, if inconsistently, described:

    The cultures have a hierarchy in which the parent of a specific culture is a neutral culture, the parent of a neutral culture is the InvariantCulture, and the parent of the InvariantCulture is the invariant culture itself. The parent culture encompasses only the set of information that is common among its children.

    If the resources for the specific culture are not available in the system, the resources for the neutral culture are used. If the resources for the neutral culture are not available, the resources embedded in the main assembly are used. For more information on the resource fallback process, see Packaging and Deploying Resources.

    Um, where do I start?

    Well, there is the whole Packaging and Deploying Resources topic that often has different rules, but we'll set that aside. Yes, it makes the rules more complicated, but first let's focus on the simple design without dragging in signatures.

    Okay, so we have established that Microsoft means for this property to support something more interesting than I can do myself parsing the name for dash delimiters.

    But all of the interesting work that happens in LIPs to fall back like ca to es? Not in there.

    The fact that most languages (like Arabic and English and French) ship only one version despite having 5-10 or more locales, yet the fallback does not fall back to the one version that would help them out? That's not in there either.

    Office's LCID-based fallback model? Also not covered well -- they have to do their own thing.

    And the early adopter of Windows language support via things like LOCALE_IDEFAULTLANGUAGE? That isn't supported, either.

    Don't even get me started on claims in docs like

    The parent culture encompasses only the set of information that is common among its children.

    that fail the smell test for cases like sr-Latn-CS and sr-Cyrl-CS, which have different scripts and code pages yet both fell back to sr for years. az-Cyrl-AZ and az-Latn-AZ have the same story, as do uz-Cyrl-UZ and uz-Latn-UZ.

    Then let's talk about how everything falls back to Invariant -- which has a name of the empty string and thus cannot ever have a directory with resources in it since a directory with no name is illegal on Windows.

    Okay. So we have a model that isn't going to match the majority of incoming or outgoing traffic for Windows, Office, or anyone following Windows or Office (the largest and, for most large-scale purposes, the only significant adopters of MUI on the platform), including .Net's own defaults (which start from those of Windows). And one that has failed to match BCP-47 in key interesting areas for many years (as China and others might note).

    This is just stupid. A terrible design bolted atop a platform that has a design that for better or worse supports more than 60% of Microsoft's revenue.

    Now one could blame Windows/DEVDIV rivalries for these dumb incongruities, or a slow-burning dud of a Silverlight sorta fiasco later recovered, but one would be wrong in this case, since CultureInfo and CultureInfo.Parent were designed and implemented by Windows -- and they own the data as well, though not the technology that does the loading.

    "But we were just following orders!" the resource loading code would claim at the "Code Crimes" trial.

    Perhaps it is yet another conspiracy theory -- Microsoft screwing over the Developer Division by providing them with an unusable language resource loading model fallback plan that is not compatible with Windows even when managed components try to run on Windows?

    Nah.

    I'm overthinking it again like I did in Anti-Microsoft conspiracy theories are fun #5 (aka Microsoft is not supporting the terrorists, dammit!) and Anti-Microsoft conspiracy theories are fun #3 (aka Why the hell can't they just update Uniscribe?), ascribing to brilliance or malice what is almost certainly neither.

    It's just stupid....

  • Sorting it all Out

    Anti-Microsoft conspiracy theories are fun #3 (aka Why the hell can't they just update Uniscribe?)

    • 4 Comments

    Okay, this is unlikely to be a series, so pretend the title doesn't have a #3 in it.

    The inmates are still running the Unicode List asylum, so the situation is normal....

    A recent thread entitled Samogitian E with dot above and macron proved the point once again....

    After some of the typical bitter diatribes about evil Microsoft which I'll skip since you can just go to alt.i.hate.microsoft to see that kind of thing and don't need it here, the tail end of the thread (as of last night) was in this message:

    Peter Constable wrote:

    > Updates to any Windows component in a given version of Windows will
    > rarely add new functionality. This is the case for reasons of
    > compatibility and stability: for too many users, any change in
    > functionality entails unwanted costs either in the form of re-training
    > or in the form of incompatibility risks.

    I know this has been said before, but: I do wonder how many users depend
    on new, incremental functionality *not* being available, as a matter of
    stability or backward compatibility.

    "Dang it!  Microsoft added support for Bamum and Brahmi and Mandaic to
    my XP system!  Everything is broken now!"

    (Yes, I know there are a few people who would say just that.)

    If the goal is to get people to update to 7, it might be better PR to be
    honest and just say so.

    --
    Doug Ewell

    To be honest (which I bring up since Doug did, and not to contradict Peter!), the biggest factor involved with even the potential backport is the huge test matrix one would have to plan out, which is only partially an AppCompat issue.

    This test matrix is complete with:

    • multiple, random versions of Windows with
    • multiple, random versions of fonts with
    • multiple, random versions of Uniscribe.

    If you are a tester you might already be thinking about what that breakout might look like.

    Oh yeah, you also may have to have someone test Office (which builds on Uniscribe and often has trouble with tweaks between versions, which is why some parts of it use their own), and SQL Server (whose Reporting Services also makes use of Uniscribe, or GDI+ in earlier versions).

    If you don't also update fonts (to try to reduce the test matrix), then the support may not really be there, and font fallback schemes might fail.

    Wait, many new characters aren't in the sorting tables. How the hell can we say we support something if it compares equal to "" (an empty string)? Crap, we'll have to test that too, if we add the support.

    Or in the keyboards. How the hell can we say we support something if it can't be typed? Ditto on the Crap, and on that testing, too.

    But if you do update fonts then you also have to make sure you didn't break GDI+.

    Or WPF.

    Or Silverlight.

    Like Windows really needs another conspiracy theory rumor about Windows/DEVDIV adversity, which Windows breaking any of those three components would inspire, especially after the recent Silverlight ruckus that made me realize we probably should pay Bob Muglia and Gu more than we currently do. And maybe some other execs a scosh less. :-)

    Then add all the third parties whose support could be broken, just the same. If Microsoft keeps Office from breaking but breaks something Adobe or whoever is doing, then that would really suck. So there had better be a bunch of AppCompat testing, too.

    Of course each thing that fails in all of the above will be entirely Microsoft's fault for missing, and since many of those affected didn't care about new script support, they will only want to know how the hell to uninstall.

    Which is why all that testing would have to happen.

    At that point, when you have made the business decision to delay future work so you can test all of the above so that an OS that shipped way back in 2001 can support scripts not even added to Unicode until 2-8 years later, someone will almost surely tap you to wake you up from the nightmare....

    TRIVIA: Did you know that installing Beta XP SP2 could cause a managed application written by a customer to crash, due to the added support and fonts for Bengali and Malayalam? It's true. The bug had always been in GDI+, triggered if you had a valid Bengali or Malayalam OpenType font, which we had never shipped until then. So once we wanted to ship one in Windows in a Service Pack, this non-Windows component broke. If the testing had not been done and that bug shipped, who would have been blamed?

    TRIVIA #2: I was kicked off one of three contracts I was doing for Microsoft (I was a vendor at the time) for my annoying and loud insistence that the GDI+ bug be fixed here. The conversations surrounding that, combined with the fact that the tester who originally found it was testing my component at the time, gave me a unique view of the anger and venom of the management and executives in other divisions when random teams break their code and put the burden of unscheduled work upon them!

    Now I am not anti-conspiracy theory -- just wait until I write up my Ampyra pharmaceutical conspiracy theories, and you've probably already read up on my health insurance changes conspiracy theories. And those theories will actually be hitting my bottom line a lot more directly than the "Unicode 6.0 on XP" issues are hitting armchair legalphiles on the Unicode List like Philippe.

    But with that said, I suppose, since one of the things that happens when a new version of Windows ships is pretty much all of the above testing (with every group owning some of the "test your stuff on Windows" work item), you could make the claim that "Microsoft should just be honest" and say that the restriction is just to get people to upgrade. In a way, since it is considered required to do that work to make the claim that new things are supported, it is technically a push to upgrade. No one agrees to do that work out of band, or randomly for random versions of components and fonts.

    Just like I can put BREATHER on my income tax form for my profession, since I spend my entire time at work breathing and could claim that my employer does not support me stopping such activities during business hours.

    Though the actual truth is just a little more complicated....
