November, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    "We" don't tell you how to spell *our* language in *yours*, so...

    Now if you look at all of the following blogs:

    The real issue we are talking about (once everyone stops complaining, which can take a while!) is the problem I explain in Who owns English, exactly?.

    Of course if we "owned" English (assuming "we" could define who "we" are in this case!), then wouldn't we take all of the following and more:

    Angielski
    anglais
    Anglè
    Anglès
    angleščina
    Anglické
    Angličtina
    Anglis
    anglisli
    anglizča
    Anglu
    anglų
    Angol
    Engels
    Engelsk
    engelska
    englanti
    Engleski
    Englezã
    Englisch
    English
    English Hol
    Ingelesa
    Inggeris
    Inggris
    İngilis dili
    ingilizçä
    İngilizce
    ingles
    Inglés
    Inglês
    Inglese
    inglise keel
    TiếngAnh
    Αγγλικά
    англiйская
    англизча
    Английски
    Английский
    Англис
    англисū
    Англиски
    Англия
    Англійська
    Енглески
    Инҝилис дили
    Անգլերեն
    אנגלית
    ענגליש
    الإنكليزية
    انگليسي آمريكايي
    अंग्रेज़ी
    ஆங்கிலம்
    ภาษาอังกฤษ
    ინგლისური
    영어
    英語
    英语

    and have opinions about the way English is spelled in other languages?

    Perhaps since no one in Great Britain or Australia or Canada or the USA is dictating how the items in this list are to look, people should not spend so much time trying to tell people how their language is to be spelled in English? :-)

    You might be living in Iran, and bothered by the English word Farsi.

    Or perhaps you are living in the Xinjiang Uyghur Autonomous Region of China and bothered by the English word Uighur.

    And so on.

    But it might be a good idea to take a deep breath and relax....

     

    This blog brought to you by E (U+0045, aka LATIN CAPITAL LETTER E)

  • Sorting it all Out

    Trying to ignore the small stuff is harder, if you're Arabic

    Via the Contact link, Alain asked:

    Hello Michael,

    I ask you about a problem I searched on the net all morning and get no response.

    We work à UNESCO (Paris/France) on a multi-lingual database (SQL Server 2005). We actually add Arabic to a English/French/Spanish/Russian thesaurus.

    We have Arab people at Alexandria test our application and they complained about not getting response when searching in Arabic with letters having not the same diacritics (e.g. Alif with and without Hamzah).

    We use SQL_Latin1_General_CP1_CI_AI, but I tried with Latin1_General_CI_AI and Arabic_CI_AI and got the same result.

    My questions : is there a way to add my own collation to a SLQ 2005 server. Or is there a collation just ignoring *all* diacritics for every UNICODE character ? And why does Arabic_CI_AI  not ignore Hamzah on Alif ?

    I wonder if I am the only guy around the world searching Arabic text on a SQL Server database. I am not an Arabic reader not speaker, but it seems the the requirment is very basic for Arabic...

    Thank you and, please, forgive my poor English

    Very good bunch of issues in there that all deserve some coverage! :-)

    Starting with the easiest part: SQL collations are terrible and essentially useless in most cases. My words in SQL Server: compatibility collations vs. Window collations are probably the best answer here to explain why not to use them. Given how awful the SQL compatibility collations are for text outside of English, they are pretty much only the default in the US (elsewhere they are a bit too retro, because Unicode and SQL collations have nothing to do with each other).

    So less than ideal results there are kind of par for the course....

    Then there is the fact that Latin1_General_CI_AI and Arabic_CI_AI return the same results. This is actually also expected since both collations use the default table and the only difference between them in SQL Server is how they have different code pages attached to them for non-Unicode columns (1252 for the one, 1256 for the other).

    Therefore, this too is expected.
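    To make the code page difference a bit more concrete, here is a small sketch. It assumes that Python's "cp1252" and "cp1256" codecs are a fair stand-in for the code pages attached to non-Unicode columns, and the helper name fits_code_page is made up for the illustration. It says nothing about how SQL Server itself stores or compares the data.

```python
# Illustrative only: Python's "cp1252" and "cp1256" codecs stand in for the
# code pages attached to non-Unicode columns under the two collations.
ARABIC_BEH = "\u0628"  # ARABIC LETTER BEH


def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text can be encoded in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


print(fits_code_page(ARABIC_BEH, "cp1252"))  # the Latin1_General code page
print(fits_code_page(ARABIC_BEH, "cp1256"))  # the Arabic code page
```

    For Unicode columns the code page never comes into play, which is why the two collations behave the same there.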

    Ok, enough stalling -- let's get to the actual issue -- the incorrect results!

    This is a longstanding bug that I have previously described in Is it punctuation, symbol, or diacritic?, which explains the nature of the problem and describes how in some cases NORM_IGNORESYMBOLS will help here when one is dealing with Windows 2000, XP, or Server 2003.

    Unfortunately there is no way to set this flag in SQL Server, so in the end there is no collation setting to work around the bug in SQL Server 7.0, 2000, or 2005.

    However, Is it punctuation, symbol, or diacritic? explains how Vista and Server 2008 actually fix this longstanding issue, and the cost of fixing eight separate problems with Arabic script collations was just one bug, in Persian (ref: Hello Madda, Hello Father (Iranian style)).

    And how does SQL Server get this fix?

    Ah, for that you can find the answer in On changing the world, or at least the way people order things in it, which explains that SQL Server 2008 has the absolute latest version of the tables to date when SQL Server shipped, and thus has the fix for this bug in it.

    There is, however, no downlevel fix for this problem that has really been around in Windows for as long as Arabic support has been in the product and in SQL Server for as long as Arabic support has been in that product.

    Custom collations or any way to modify collations? That is a feature that does not exist in either Windows or SQL Server....
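    Since no collation setting helps downlevel, one application-side workaround is to fold the text before it is stored or compared, for example into a separate search-key column. What follows is only a sketch under that assumption: the function name fold_diacritics is hypothetical, it is not a SQL Server feature, and it is linguistically crude because it drops every nonspacing mark in every script rather than just Hamzah on Alif.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose the text, then drop every nonspacing mark (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


ALEF_WITH_HAMZA_ABOVE = "\u0623"  # ARABIC LETTER ALEF WITH HAMZA ABOVE
PLAIN_ALEF = "\u0627"             # ARABIC LETTER ALEF

# Both spellings fold to the same search key.
print(fold_diacritics(ALEF_WITH_HAMZA_ABOVE) == fold_diacritics(PLAIN_ALEF))
```

    The same folding would have to be applied to the search term as well as to the stored key, or the two sides will never meet.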


    This blog brought to you by ب (U+0628, aka ARABIC LETTER BEH)

  • Sorting it all Out

    On blowing a font cache, and overwhelming a Fonts folder with the raw power of typography

     In response to About the Fonts folder in Windows, Part 3 (aka What changes in Vista?), Shaun asked in a comment:

    I unzipped a large number of font folders into my Windows/Fonts folder and now the unzipped folders are not showing up… my Fonts folder is only showing about 4,500 fonts and there are 65,000 fonts in there somewhere but they can’t be viewed and they’re not installed, just sucking up space and invisible. My “show hidden folders” option is enabled in Folder Options, and I can see the folders when I go into “Install Fonts”, but I can’t delete them!

    Any ideas on how to access these folders that are obviously there, but unaccesible?

    This comment took me back.

    Way back, in fact.

    To the Spring and Summer of 2000.

    I was in the final stages of a book.

    This book:

    Internationalization with Visual Basic

    Now most of the production was done in Microsoft Word for Windows, and the machines they were running it on were almost all running Microsoft Windows NT 4.0.

    And the folks doing the production work were having problems.

    It seemed like every chapter would have some characters missing. They would exit programs, log off, and reboot. The symptoms changed each time as the exact characters missing would vary, but invariably something would go wrong.

    They were desperate.

    Folks were getting more frantic in Indiana (where Sams was located), and the stress was being transferred to Redmond (where I was).

    So we had a nice long conversation where we went through the issue with the fragile font cache in NT 4.0 and how easy it was to blow by having too many fonts. And the large number of fonts that the book needed were more than enough to blow the font cache. And blow it huge.

    In Windows 2000, a huge push to fix these problems was very successful, but switching the production machines was just not an option -- the IT staff was just not set up to do this. Even if they were, the results in Word were not the same between NT 4.0 and Windows 2000, and I was working on NT 4.0 for the book. Moving to a new platform would mean major reformatting work for the whole book. So if it seemed like there was a competition between them and me to decide who would drag their heels the most on the idea of a Win2000 upgrade, then the appearance probably wasn't far from reality.

    So my suggestion was to strip down the fonts to the absolute bare minimum, then add just the necessary fonts for each chapter and take them off the machine when done. And reboot in between each chapter, just in case.

    It mostly worked just fine (I say mostly because there were some problems in Chapter 3 that were not caught prior to print), and I swore to move all of my machines to Windows 2000 as soon as the book was completely done.

    Now like I said the problem was largely fixed in Windows 2000.

    But just because things don't blow up as easily as they did in NT 4.0 doesn't mean that GDI and the Fonts folder are prepared to scale beyond 65,000 -- or even 4,500 -- fonts! :-(

    In the words of the folks from In Living Color, Homey don't play that.

    But the deletion of the extra fonts is easy enough via an elevated CMD prompt, which should allow the deletion to happen for all of the extraneous font files.

    Obviously the situation with my book was what I at the time thought of as quite an extreme case of a machine being overwhelmed by the raw power of typography, but all things considered I am pretty sure that 65,000 fonts would probably top that on any version of Windows (as the folder itself obviously has its own problems scaling to that quantity, beyond whatever problems the underlying infrastructure hits!).

    What are the scenarios in which one would really need 65,000 fonts installed?

    How many fonts do you have on your machine, and in what version of Windows?

    When I think about Windows 7 and Long Zheng's Improvements to fonts in Windows 7 over in I Started Something, I can't help wondering if the new Fonts folder in Windows 7 will scale up to 65,000 fonts. I realized three things:

    • The mere fact that I ask the question here does not really make it a meaningful one, and
    • The mere fact that I ask the question here probably does mean someone will have to try it out if they aren't doing so already -- to have a description of the behavior in the future KB article, if nothing else, and
    • The behavior between now in the Windows 7 PDC build and in the final release of Windows 7 might be improved if this scenario wasn't already being tested....
    There is a class of bug that if someone finds it you kind of have to do something about it; this scenario might qualify. So I apologize to whoever is tasked to look further into things. :-)

     

    This blog brought to you by T (U+0054, aka LATIN CAPITAL LETTER T)

  • Sorting it all Out

    Grease is the word; ░░░░░░ not so much...

    The question from the other day was an interesting one. It was something like this:

    I’m trying to do a word-boundary check, and I noticed regex doesn’t handle boundaries correctly for some extended characters  (░╤╞╬═╣etc.).

    A simple example is “\b░” which should match “░” but doesn’t. Any normal character in front (“\bg░” : “g░”) will match correctly.

    If I manually check for boundaries (^$\W\s etc.) it works correctly.

    I haven’t found any of the regex options fix it.

    Is this a known issue?

    Does anyone have the equivalent pattern for \b so I recreate it myself?

     First let's look at those characters. They are:

    • ░, aka U+2591, aka LIGHT SHADE
    • ╤, aka U+2564, aka BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
    • ╞, aka U+255e, aka BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
    • ╬, aka U+256c, aka BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
    • ═, aka U+2550, aka BOX DRAWINGS DOUBLE HORIZONTAL
    • ╣, aka U+2563, aka BOX DRAWINGS DOUBLE VERTICAL AND LEFT

    Did you realize there was all this graphical crap in Unicode? :-)

    All of them have a Unicode General Category of So, also known as Symbol, Other. What the CharUnicodeInfo class I mentioned earlier would call UnicodeCategory.OtherSymbol.
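    For anyone who wants to check this without the .NET class handy, here is a small sketch that uses Python's standard unicodedata module (a stand-in for CharUnicodeInfo, not the same API) to print the code point, general category, and name of each character listed above.

```python
import unicodedata

# The six characters from the question, written as escapes.
SYMBOLS = "\u2591\u2564\u255e\u256c\u2550\u2563"

for ch in SYMBOLS:
    # Prints the code point, the general category, and the character name.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```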

    And then we'll look at how \b is defined when it comes to regular expressions, in topics like Atomic Zero-Width Assertions:

    Assertion: \b
    Description: Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries (that is, at the first or last characters in words separated by any nonalphanumeric characters). The match can also occur on a word boundary at the end of the string.

    Assertion: \B
    Description: Specifies that the match must not occur on a \b boundary.

    There we go -- the explanation!

    It would be unrealistic to assume that a regular expression engine even remotely Unicode aware would think that ░ or any other symbol would be a \w character -- because those symbols aren't words!

    When this was pointed out, the person asking the question definitely didn't expect anything different here; he said:

    That seems reasonable enough.

    If I need to support this scenario (probably don’t) I can create my own \w patterns that include those Unicode characters, like [^\p{L}\p{Nd}\p{Pc}…].

    which gives the workaround if anyone is looking for it (I suspect the actual need here to treat a symbol as a word would be pretty uncommon in text scenarios, as is the use of these symbols anyway).
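    As a rough sketch of that workaround, here is one way to build a replacement for \b out of lookarounds. It is written against Python's re module rather than the .NET regex engine from the question, so treat it as an illustration of the idea only; the helper name boundary is made up, and which extra characters count as "word" characters is entirely the caller's choice.

```python
import re


def boundary(extra_word_chars: str) -> str:
    """Build a zero-width pattern that acts like \\b, but also treats the
    given extra characters as word characters."""
    word = r"[\w" + re.escape(extra_word_chars) + "]"
    # Either a word character starts here, or one has just ended here.
    return rf"(?:(?<!{word})(?={word})|(?<={word})(?!{word}))"


LIGHT_SHADE = "\u2591"

# The stock \b finds no boundary in front of a lone symbol.
print(re.search(r"\b" + LIGHT_SHADE, LIGHT_SHADE))
# The custom boundary does, because the symbol now counts as a word character.
print(re.search(boundary(LIGHT_SHADE) + LIGHT_SHADE, LIGHT_SHADE))
```

    Note the side effect: once ░ counts as a word character, there is no longer a boundary between g and ░ in "g░", exactly as there is none between two ordinary letters.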

     

    This blog brought to you by the previously mentioned symbols, obviously!

  • Sorting it all Out

    From I SCOOT to IBOT, #4 of ?? (with some pictures!)

    Prior blogs in the series here and here and here.

    In response to I SCOOT TO IBOT, #2 of ??, Gwyn commented:

    Can you provide some pictures of the different modes? I'm not really sure what they all are exactly like.

    Very good idea!

    I had a few minutes and my camera so I decided to take some pictures.

    I had no one else here and I did not think of it earlier in the weekend when I was around some people, so this would have to be a solo operation. I'll probably try to get some more when there are people around, eventually....

    WARNING: Although I took these pictures of different modes while I was not in the chair, you should NEVER do that. EVER. The chair is calibrated with a me-sized person in it, and me not in it is not something that it knows what to do with!

    We'll start with standard mode.

    First from the front:

    IBOT standard (from the side)

    and then from the front:

    IBOT standard mode, from the side

    and then from the back:

    IBOT standard mode, from the back

    Special things to note in this mode (not including the I'm a PC sticker on the side, the Microsoft parking pass you can see in the front and the license plate you can see in the back, the latter two of which came from the Saab that is no more) -- this is the mode that is fastest -- up to 6.8 MPH.

    Note those extra caster wheels in the front -- they are only used in this mode.

    The control is fairly lousy so I really only spend time in this mode when I want to be parked or as low as possible or if I have to go somewhere in a hurry.

    Then there is the four-wheel mode.

    Here it is from the front:

    IBOT four-wheel mode, from the front

    and from the side:

    IBOT four wheel mode, from the side

    Now notice how those caster wheels are just kind of hanging there? They are not used in this mode.

    And this is the mode that can bring up the carpet squares if it is used indoors. So that is something to not drive around in, inside.

    But it is very rugged and can take on some really steep hills (both up and down).

    I find it to be the best all-around travel mode, and the one I use (for example) to go to work with unless it isn't raining and I do not mind the extra few minutes of the next mode.

    Finally, there is balance mode.

    First from the front in its shortest view:

    IBOT balance mode (shortest), from the front 

    and from the back in its shortest view:

    IBOT balance mode at its shortest, from the back

    and then penultimately (and more impressively) from the side, in the shortest setting:

    IBOT balance mode at its shortest, from the side

    and then finally, and most impressively, is balance mode in its tallest view:

    IBOT balance mode at its tallest, from the side

    This difference may be impossible to see from the pictures; I think it'll have to have people next to it while I'm in it to be meaningful.

    Though there was one important difference which I took some brief video footage of.

    Basically, in the shortest balance mode, the empty IBOT shuffled back and forth a little bit as you can see in the low-res video here (WMV, ~679 KB zipped).

    But in the tallest balance mode, the empty IBOT was moving a lot and was clearly missing me, or some reasonable me-sized weight in there as you can see in the low-res video here (WMV, ~1 MB zipped).

    When taking it out of balance mode the fact that I confused it was even more apparent, as its struggle to balance a me-sized weight that wasn't there made it go forward at least five feet.

    I'll try to do this again when I have someone else there to do the video while I switch the modes. It really was quite a sight!

    Anyway, hope that will do well enough until I have time to get some more made, with people so that they will be more useful for all the reasons I mentioned! :-)

    On a side note, there is an IBOT over on eBay right now for 12000 which, while definitely better than the full price, might see more challenges getting an insurance company to cover it....

     

    This post brought to you by ♿ (U+267f, a.k.a. WHEELCHAIR SYMBOL)

  • Sorting it all Out

    'It's Not Easy' saying WTF to an 'Ant in Alaska'

    This blog title is not a reference to Kermit's It's Not Easy Being Green, as any diehard Liz Phair fan would recognize...

    I'm going to dig a little into one of the random questions that came out of this last April's I'm aware of that: an Andreaesque segue and intervention, of sorts.

    Andrea: I don't think people understand your relationship with your old team. Especially since you are still writing about a lot of the same things you were before. Do they read it? Do they agree with you or disagree? Do they still talk to you?
    {deep breath from Andrea}
    Andrea: And I don't just mean other people for this one. I don't get this one either. What's your connection to them, now?
    Michael: Oh wow, that one is a bit harder.
    Andrea: I'm aware of that. But I sincerely doubt that I am the only one who is curious.
    Michael: Okay, I'll think about that one, too. Maybe that'd be worth a post or two, at least for the "me" half of it. I wouldn't try to speak for the other half....
    Andrea: It might help the confused among us

    My old team.

    NLS.

    National Language Support.

    Not Localization, Stupid!

    Globalization Services.

    They have a lot of names.

    The song I'm playing now might tell you something about it.

    This is the one singing it:

    Liz Phair, lying there 

    It is Liz Phair, from the way back in the time of the girlysound days (the song, not the picture), the title is I Know It's Not Easy and it was re-recorded for the Exile in Guyville re-release under the name Ant In Alaska. They took out a line or two, I think, but you'll barely miss 'em if you don't have the original version.

    It has a lot of the same raw feeling, and I know some have argued whether it was re-recorded at all but I think most people agree that it was.

    The re-recording is most notable for the prefixed 58.5 seconds of silence, which for me symbolizes something too. Maybe I'll talk about that some other time, or maybe not.

    For now that is something between Me and Liz.

    And Liz, actually.

    The lyrics for the song go something like this:

    Call me when you think the coast is clear
    I've been hiding out almost a year
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy
    You said if I waited it'd pay off
    But my eyes are growing wild and my body's gone soft
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy

    You said I should let go of your hand
    But I don't even know if I can
    You're the only one, you are the very sun to me
    And you know it's not easy

    You'd tell me, wouldn't you, if we needed to talk?
    And you'd tell me, wouldn't you, if I'd pissed you off?
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy

    Well, I look at the stars and I know you're under them
    I look at the cars and I know you insure them
    I look at the books and things people are reading
    I know that you've written them, too
    You've got so many little things to do
    But then I look at my life and I know you've forgotten
    The promise you made to me, I think that's rotten
    I'm hopelessly lost and there's hardly a sound anymore
    Coming through that can show me around
    'Cause I'm endlessly endlessly searching the crowd
    Looking for something from you
    Just one fucking measly clue
    Any shitty little tipoff would do
    But I'm just an ant in Alaska to you

    Then I look at the stars and I know you're under them
    I look at the cars and I know you insure them
    I look at the books and the things people are reading
    I know that you've written them, too
    You've got so many little things to do
    But then I look at my life and I know you've forgotten
    The promise you made to me, I think that's rotten
    I'm hopelessly lost and there's hardly a sound anymore
    Coming through that could show me around
    'Cause I'm endlessly endlessly searching the crowd
    Looking for something from you
    Just one fucking measly clue
    Any shitty little tip-off would do
    But I'm just an ant in Alaska to you
    I'm just an ant in Alaska
    An ant in Alaska
    An ant in Alaska to you 

    Now most of the themes of this song are not what I am saying my relationship with my old group is like.

    Seriously.

    Our "break-up" (such as it was) was nothing like this, at all.

    But that last line....

    This world in which I now live is in almost the southwesternmost place in the building, on the opposite side from the group's East side abode, and I haven't been enlisted in their branch for way over a year, since Track change (a.k.a. A new job that has a few things in common with the old one) happened. I think that buried in that line is what I think of as the connection I have with my old group.

    At least symbolically, I'm an Ant In Alaska.

    Sometimes I meet with them and they ask me questions about stuff as they work on new features.

    Sometimes they tell me about their plans (since in theory a lot of the other groups I help out might find it helpful if I know about future plans, though in practice not so many are directly impacted).

    But not much (or at times any) of my feedback actually ends up in the new features, and the final plans are often wildly different from what I was originally told.

    I probably have more influence and impact on their clients and on customers in other parts of Microsoft (in part due to this unofficial blog, in part due to past relationships) and even on external customers (again via this very Blog!) than I do on them -- to them, I think I really am an Ant In Alaska, even if they do read here (several don't, and it isn't like they have to, but some still do).

    To be honest, I don't know that I'm particularly bitter about that though. I suspect I'd be a lot less happy about the work I do if I knew more about what was going on there, due to the natural desire to be unhappy with things that change, especially if it is not changing the way I would have done it, given the chance. Not knowing gives me a better sense of distance.

    Other people in the building do read the Blog, I know -- they ask me stuff all the time and some of them even feed me ideas that end up becoming blogs here (and others are on the list to be done like the one of the double L, you know who you are!).

    But on the whole, I do feel closer to customers now, which was really the whole point of the Track change thing anyway. Which means I'm happier, a lot more often than I'm not.

    I mean, I won that cool award:

    Bulldog

    and the only people who really knew about it were the folks who came by my office (not many of the folks from the old group) and the ones who read about it here. No mail was ever sent (amusing in and of itself since as I suspected it was never mentioned in any mail to the group), so people have just found it kind of randomly if they happened to be coming by for something else.

    We don't get a lot of visitors from the rest of the building, though.

    I was talking to a teammate from the NLS days the other day about an issue that had come up and it had the same cordial feel of a conversation I had with a former manager from nine years prior. No bad feelings, a lot of mutual respect and interest, and very little real idea of what the other person was up to, which kind of made the small talk much more purposeful as we both tried to "catch up" on things. Not self-consciously since there was no expectation that the other one would know things, but a collegial kind of "good to chat from time to time" sort of thing.

    Know what I mean? Just like the manager from nine years prior.

    So we are in the same building, but not the same team....

    And then a different other day, a colleague in another group entirely who managed to embarrass me with his praise a bit asked me:

    Out of curiosity… you have been with int’l for a while, yes? Do you ever feel pressured by yourself or others to “move on and do something else”?

    And I guess the answers are yes and yes some times and yes some other times.

    I've even had a tempting offer or two.

    But there are still things I can contribute here, so I have not been giving in to the occasional temptation yet.

    And I'm really not saying that It's Not Easy (the old song title) being an Ant in Alaska (the new song title). Because for the most part, it is -- and I find that I actually enjoy being an ant in Alaska. Kind of collegial isolation technique!

    You know, I debated not writing this blog. And then after I wrote it I debated not posting it. But sometimes you've just got to say WTF....


    This blog brought to you by ⾍ (U+2f8d, KANGXI RADICAL INSECT)
  • Sorting it all Out

    Helping partners -- small steps can be a big deal

    So it was yesterday from over in Hong Kong that friend/colleague/Microsoft MVP Martin Poon pointed out an article in Chinese entitled Microsoft 帶領合作夥伴協助中小企克服經濟逆境 (Microsoft partner will help lead SMEs to overcome difficult economic conditions).

    This is a really great idea and I hope that this is the kind of program that Microsoft will get behind in other parts of the world as well.

    I mean, with current economic conditions as they are, it is easy to see small-to-medium sized businesses feeling a very uncomfortable squeezing sensation, and I think it is great for companies large enough to weather the problems to step up and help others in the community who are making long-term investments in their technology.

    From the article:

    中小企支援計劃,協助本港小型及中型企業克服因全球經濟低迷而帶來的挑戰。Microsoft 將提供全面的資訊科技套裝,包括免費伺服器軟件、免費寬頻連線、伺服器折扣優惠及全面安裝支援,務求協助中小企提升營運效率及減低成本。

    雖然本港中小企慣於面對激烈競爭帶來的挑戰,但2008年金融海嘯所引發的經濟衰退和不明朗前景,仍然令他們進退失據,難以對短期及長期的業務營運問題作出決策。中小企支援計劃旨在協助本港中小企充分發揮其現有資源以適應現時經濟不明朗因素,並突顯方便易用的科技能夠協助企業提升生產力的優勢。

    Microsoft Hong Kong 總經理林向陽表示:「中小企是本港經濟命脈,佔本港企業超過98%,而且僱員數目約佔本港私人界別勞動人口的五成。面對經濟不景,IT提供強大的工具,能夠協助企業節省成本及改善營運。企業投資於適當的資訊科技,及了解如何充分運用現有資訊科技資源,有助開創新意念及營造有利環境,以便在經濟反彈時迅速增長。」

    Or, for those who may not be really up on their Mandarin, a fairly accurate translation:

    Microsoft Hong Kong has announced the launch of the "Innovation of SMEs bad days" plan to support SMEs (small and medium-sized enterprises), to help local SMEs overcome the global economic downturn and the challenges it brings about. Microsoft will provide a comprehensive suite of information technology, including free-of-charge server software, free broadband connection, server discounts, and full installation support to help SMEs enhance their operation and reduce costs.

    Although SMEs in Hong Kong are accustomed to facing the challenges of intense competition, the economic downturn and uncertain outlook triggered by the 2008 financial tsunami still leave them caught in a dilemma, making decisions about short-term and long-term business operation difficult. The SME support plan aims to help SMEs in Hong Kong give full play to their existing resources to adapt to the current economic uncertainty, and highlights how easy-to-use technology can help enterprises improve their productivity.

    Lin, general manager of Microsoft Hong Kong, said: "SMEs are the lifeblood of our economy, accounting for more than 98% of enterprises in Hong Kong and employing about half of Hong Kong's private-sector labor force. As the economy slows down, IT will provide powerful tools to help companies reduce costs and improve their operation. Investing in the appropriate information technology, and learning how to make full use of existing information technology resources, helps enterprises create new ideas and a favorable environment for growing quickly when the economy rebounds."

    Like I said, this is a really good thing to be doing, something that really underscores a desire to help people in the community out (in this case communities that Microsoft helped enhance and even create!).

    Hopefully this is one of many such programs -- there are SMEs all over the world who could benefit from this kind of thing...

    Another reason to like the company I work for, and that I like the company I work for. :-)

     

    This blog brought to you by ! (U+0021, aka EXCLAMATION MARK)

  • Sorting it all Out

    ELS is not ESL, or SLE, or SEL, or LES

    Extended Linguistic Services.

    It is something I first mentioned last week, in From ____ to ____ to MUI to ELS -- World Ready @ the PDC!.

    You should also note that Kieran is talking about it in her blog (ref: What's new for you in Windows 7: Extended Linguistic Services).

    That blog of hers points to a place on the Go Global site that talks about ELS with a nice huge paper, entitled Writing World-Ready Applications in Windows: Extended Linguistic Services in Windows 7. That paper has it all, from the whys to the features to the design to the API definition to some code samples.

    A very good start for people who want substantive information about ELS and what's coming.

    If you have the PDC build of Windows 7 you can even start playing with it (otherwise you'll have to wait for the beta or whatever).

    I'll be talking more about ELS eventually (mainly when it is more widely available to people). In the meantime, Kieran's blog and this paper should keep occupied those who are interested, for a bit. :-)

    The blog also points to a live ELS demo, being used by the Live team....

     

    This blog brought to you by ⳺ (U+2cfa, aka COPTIC OLD NUBIAN DIRECT QUESTION MARK)

  • Sorting it all Out

    Being smart, by not trying to be clever

     

    There are times that Microsoft Word is too smart for its own good.

    The message I received via the Contact link was:

    Dear Michael,

    First, I would like to thank you for your great BLOG!  Well done!  I've spent many hours reading and studying various articles and have found all of them helpful AND entertaining as well.  And it's hard to do both of those things at once!

    Although I've worked in the Java world for years, I'm new to Windows programming.  So please forgive my ignorance!

    The app that I'm building involves using 2 new keyboard layouts.  So I built them using MSKLC.  Everything was fine except they did not quite work in Microsoft Word 2007 running on XP.  Most of the keyboard worked just fine.  In particular the Devanagari layout didn't quite work in Word although it works fine in Notepad, Wordpad, and Excel 2007!  It turns out that when I use a dead key to obtain some of the cerebral consonants in a glyph they don't combine!  And it only happens when the dead key letter is in second position!

    Let me explain more clearly with an example.

    Here is what is supposed to happen (see the Devanagari Unicode table):

    Let + be short for the words "followed by" in the example below

    0938 + 094d + 0925

    This sort of thing works just fine in MOST cases.  The 2 letters are combined into a glyph & are properly rendered on-screen

    However, if the last code point in the example is created by using a dead key (dead key + someKeystroke), the 2 letters do NOT combine into a new glyph as they do all the other times.  If you then move the cursor backwards on the screen to select the letters of the word, then the glyph fixes itself and properly renders!

    So I've found a way around this by simply not using dead keys but it makes for a less convenient keyboard mapping.

    What's happening here?  I saw an article where one of your readers said something about Rich Edit causing something like this.  It was a small and somewhat vague remark.  Could this be the case here?  I don't mean for you to spend a lot of time on this.  If you could just point me in the right direction to understand what is happening, I'd be grateful.

    Thanks again Michael.  Tell your boss that I said you're doing a great job!

    Dharma

    Well that was quite a nice note to get!

    I thank you Dharma, for all the kind words. I won't go so far as to say it's why I'm here, though I will say it makes the time here nicer!

    I'll forward the note on to my manager as per that last sentence, though of course since this is a personal blog it is unknown how much it will impress, since this isn't my actual job that you're talking about. But it can't hurt, at least. :-)

    Now to the actual technical question....

    In this example, we are taking a consonant

    U+0938, aka DEVANAGARI LETTER SA

    and combining it with another consonant

    U+0925, aka DEVANAGARI LETTER THA

    and making a conjunct with a bit of Unicode-esque Virama glue, so it will look like this:

    स्थ U+0938 U+094d U+0925

    Or maybe if you wanted to see a visible VIRAMA in there you'd add the ZWNJ, like so:

    स्‌थ U+0938 U+094d U+200c U+0925

    Anyway, you get the idea.
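
    For anyone who wants to poke at these sequences outside of Word, here is a minimal Python sketch (my own illustration, not anything from MSKLC or Word) that builds both strings and lists their code points. The helper name describe is hypothetical; the output is kept to ASCII so it behaves in any console.

```python
import unicodedata

SA = "\u0938"      # DEVANAGARI LETTER SA
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA
THA = "\u0925"     # DEVANAGARI LETTER THA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

# SA + VIRAMA + THA: a shaping engine is expected to render this as a conjunct.
conjunct = SA + VIRAMA + THA
# Adding ZWNJ after the VIRAMA asks for a visible VIRAMA instead of the conjunct.
visible_virama = SA + VIRAMA + ZWNJ + THA


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, text in (("conjunct", conjunct), ("visible virama", visible_virama)):
    print(label)
    for line in describe(text):
        print("   ", line)
```

    Note that the stored text is the same no matter how it was typed, which is why anything that forces a re-render ends up showing the conjunct properly.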

    Now at the very beginning I talked about how there are times that Word is too smart for its own good.

    This would be one of those times.

    You see, Word tries its best to be helpful, something that folks have at times even made fun of, e.g.

    but even the most diehard of Microsoft critics will admit that Word really is trying to be helpful -- this is why Word's competitors try to do the same kind of thing!

    But there are times when its attempts to pay attention to what you are doing, in this case how you are entering the text, actually seem to do more harm than good.

    You could perhaps try the advice in You're not the one out of sequence, and that's the Word (turn off the sequence checking) to see if that helps, though I don't think it will here. This is actually Word interrupting its own Uniscribe layer since "there aren't any keyboards that would ever do this kind of thing" and it is saving some processing time trying to shape things that need no shaping.

    Thus it isn't changing what you've input like the sequence checking can do in its artificial-intelligence-style-autocorrect attempts (you mentioned yourself that moving the cursor back over the letters and selecting them will "fix them up"). And actually saving, closing, and re-opening it seems to fix the text up, as does scrolling the text way off the screen and then scrolling back to it.

    In this case, Word's being so smart is mostly for its own benefit, not yours.

    Which makes it all the more unfortunate that there isn't a way to give Word some hints here on what is going on.

    Though the fact that it is not hurting the text being stored in the document gives you two potential workarounds, one of which seemed to work for me all the time and the other of which sometimes worked:

    1. You can ignore the problem and it will go away in the future when Word has to render the text when it is not watching you type it, or
    2. You can try to build in AutoCorrect sequences that replace the string with itself

    The third workaround (don't do the dead key on the "shaping" character) is one implied in the original description, but since that changes how the keyboard itself is trying to do its work, it's not one I would suggest.

    Perhaps if Word was not trying so hard to be clever, it could be smarter here?

    Though obviously the Word folks would triage the importance of this bug by how common it is -- and of course this kind of keyboard would be pretty uncommon (sorry, Dharma!). But I'll forward it on, either way....


    This blog brought to you by स (U+0938, aka DEVANAGARI LETTER SA)

  • Sorting it all Out

    I think we're taking the wrong approach, mostly

    • 5 Comments

    In the past, I've done a lot of presentations on globalization and localizability issues.

    In different companies where I was brought in to do this, they were very well received, because generally a company is being asked to do the work to support another language and the people being trained found themselves quite hungry for the info.

    But when it came to conferences, most of the positive feedback went along the lines of "very interesting presentation" but in the checkbox for whether it was useful for their immediate work, often they'd say no. Because if a company is spending thousands on a conference, they don't usually have such a focused requirement. If the person even signed up for the talk, it was either out of curiosity, or they thought they'd see me cause trouble, or maybe they'd heard of me or whatever.

    There are exceptions to this, like the Internationalization and Unicode Conference. But this only proves my point -- you could probably fit over 30 IUCs in a TechEd or a PDC. So you end up with the very small number of people being sent to a specialized conference, often with a generic requirement of "we need to support GB 18030" or "we have to do Japanese" or whatever.

    When it comes to Unicode support, NT shipped well over 10 years ago and quite a few applications out there still don't support it. Slides like

    Language Matters! 

    are only interesting for shock value -- they aren't going to convince anyone who isn't already convinced, and looking for their own presentation to justify it to upper management.

    Because how many companies are thinking of shipping their software overseas and not just shipping it as is?

    MUI is a cool technology, but it is not of general interest to anyone other than people trying to build in-box drivers in Windows who are told they must support it by contract. People are given an assignment to ship the product in Japan; they don't wake up and say "we should support 10 languages in a switchable fashion for the hell of it, and then the sales people can figure out what to do with it."

    The flaw is that by trying to get people interested based on some nebulous notion of "best practices" it is hard to get people interested.

    Best practices for globalization? Manuel Garcia O'Kelley Davis would say null program.

    But when asked to do presentations on security issues with string comparisons or the consequences of  user settings breaking applications, I often get a lot more interest.

    People care a lot more about consequences than they do about nebulous features (since selling software in another country is a lot more complicated than just these issues -- there are legal issues that by themselves would block most people from ever even considering it).

    I mean, a lot of the PC game industry and a lot of the driver industry "supports" Unicode. 

    I put supports in air quotes because they may not be and in many cases probably aren't doing anything outside of the ASCII range.

    But they support Unicode because the OS underneath them does and they want to avoid the extra OS conversions.

    I guess what I'm saying is that we have to stop trying to appeal to "best practices" or the miracle of dynamically supporting UI in 40+ languages. We need to focus on:

    • the bugs that break their own code as it is;
    • security issues caused by their code as it is;
    • performance issues in their code as it is.

    People care about those kinds of issues a lot more than they care about good globalization or localizability.

    Unless they are already in those markets, or want to be acquired by a company that is and that pays attention to whether this work is already done, of course!

    Now I have a couple of regular readers here.

    But most of the traffic comes from people searching for information on bugs or issues hitting them now.

    All of this is why I think we're mostly taking the wrong approach....

     

    This blog brought to you by ? (U+003f, aka QUESTION MARK)

  • Sorting it all Out

    Strange control over CTRL and control characters

    • 0 Comments

    Via the Contact link, Rich asks:

    Hi Michael, I've been reading your (very detailed and useful) series on keyboard layouts. There's one thing that's puzzling me with respect to the post about the Caps Lock state (Getting all you can out of a keyboard layout, Part #8).

    When you are processing a standard key that's not part of a dead key sequence, you have some special-case processing for control chars that tests whether the VK value minus 0x40 is the same as the control character.

    if((rc == 1) &&
    (ss == ShiftState.Ctrl || ss == ShiftState.ShftCtrl) && (rgKey[iKey].VK == ((uint)sbBuffer[0] + 0x40))) { ... }

    However, there is a subtle difference in the equivalent code that processes a dead key combining character:

    if((((ss == ShiftState.Ctrl) || (ss == ShiftState.ShftCtrl)) &&
    (char.IsControl(basechar))) ||
    (basechar.Equals(combchar)))
    { ... }


    In this case, you use char.IsControl(basechar). What is the difference between this and the test for VK - 0x40?

    Many thanks
    Rich

    A very reasonable question, one that I don't really cover in the blog proper.

    It is actually to avoid a situation that kind of happens in keyboards.

    The very first (pre-alpha) version of the tool that eventually became known as MSKLC, but at the time was just something we knew as "the keyboard tool", had a feature that I have talked about previously in Accessibility, Internationalization, and Keyboards (#3: MSKLC's UI) and Do your utmost to be conventional (and then pimp, q.d. or p.r.n.).

    The ability to load an existing keyboard layout.

    The very first version of that code did not include the CTRL and CTRL+SHFT shift states, but the second version I demonstrated a week later did. And while I was working on that code I ran into a problem - the fact that the various keyboarding API calls I was using would treat almost every single keystroke in the particular shift states as if it had a control character in it -- unless a letter or something else was assigned.

    We realized that the code was just always saying these were here unless something else was added.

    There is a certain symmetry in CTRL putting in "<Control>" characters, and you can see why if you ever spend time in the DOS prompt and you actually hit some of these keystrokes -- because this is how they are sent.

    After a conversation about this, where Simon pointed out that just because Windows is going to simulate them being there is no reason to add them explicitly in the keyboard tool, we decided to strip these automatically added characters, since they really were not part of the layout.

    For MSKLC 1.3, we decided to just always strip anything that made char.IsControl return true.

    Just 'cause, you know?

    Anyway, it turned out that broke some stuff, namely some applications that work at lower levels and don't get the benefit of the automatic additions of these special characters. This is a bug I have mentioned previously in blogs like Usage (customer intent) vs. Design (developer intent) and when we broke TELNET as I mentioned in Michael's Keyboard Laws for Developers, Part 3. With a special blessing from the test team (the original triage team said the bug did not meet the bar) the fix for this bug was snuck into MSKLC 1.4.

    To limit the impact while maximizing the chance of fixing all of the known problems, the original char.IsControl code was removed and replaced with a test limiting the "ignoring the character" behavior to when it was the exact control character under the letter in question or a small list of specific control characters that applications depended on.

    But neither the tester nor I could see any reasonable purpose to doing the same with dead keys -- who would involve a control character with a dead key? That would make no sense!

    So the dead key check did not include the extra filtering.
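
    To make the difference between the two checks concrete, here is a rough Python sketch (not the actual MSKLC code, and not the C# from the series). It assumes that char.IsControl corresponds to Unicode general category Cc, and the function names is_control and is_implied_ctrl_char are made up for illustration. It also leaves out the rc == 1 and shift state conditions from the original test. The Ctrl+[ sample assumes a US layout, where that key has virtual key code 0xDB and the keystroke produces ESC (U+001B).

```python
import unicodedata


def is_control(ch):
    """Broad check: stand-in for char.IsControl (assumed: general category Cc)."""
    return unicodedata.category(ch) == "Cc"


def is_implied_ctrl_char(vk, ch):
    """Narrow check: ch is exactly the control character 0x40 below the virtual key."""
    return ord(ch) + 0x40 == vk


VK_A = 0x41      # virtual key code for the A key
VK_OEM_4 = 0xDB  # virtual key code for the '[' key (US layout assumption)

samples = [
    ("Ctrl+A", VK_A, "\x01"),      # U+0001 is 0x41 - 0x40, so it sits "under" the letter
    ("Ctrl+[", VK_OEM_4, "\x1b"),  # ESC is a control character, but not VK - 0x40
]

for label, vk, ch in samples:
    print(f"{label}: broad={is_control(ch)} narrow={is_implied_ctrl_char(vk, ch)}")
```

    Both samples are control characters as far as the broad check is concerned, but only the first one matches the narrow check, so only the first would be treated as something Windows added on its own.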

    And for this series, I essentially went through the same process as I had the first time the code was written, except I remembered what had been done -- so I just put the same checks in, to get comparable results.

    It seemed obscure enough that the explanation would never be needed; a comment would only call attention to it.

    Well, that was my thinking at the time. Clearly Rich proved to me that there are people who can be as detail-oriented as I can be. :-)

    Anyway, that is why the two different checks exist. Because every reasonable keyboard should be able to be loaded properly with it, including a bunch of unusual cases....

     

    This blog brought to you by (U+001d, aka <CONTROL>, aka INFORMATION SEPARATOR THREE)

  • Sorting it all Out

    You think herding cats is hard? Try herding CATS. Hell, try herding KATS!

    • 3 Comments

    It reminds me of a joke that I actually experienced a while back.

    My cat of blessed memory (Tamara Penelope Kaplan, or Tammy for short) had done something that required stitches.

    The veterinarian was quite helpful and she (the veterinarian) gave her (the cat) the stitches, and to keep her (the cat, not the veterinarian) from chewing on them or whatever, she (the veterinarian, not the cat) put a collar around her (the cat, not the veterinarian) that she (the veterinarian, not the cat) would take off her (the cat, not the veterinarian) in a few days.

    I'll never forget the recommendation she (the veterinarian, not the cat) left me with.

    "Try to have her take it easy for the next few days," she told me.

    At the time, I thought but did not say that I wasn't sure how much more easy she could take it without being in a coma. The cat got up when she felt like it but also was quite happy to sleep most of the day away more often than not. What, she was supposed to sleep 21 hours a day instead of 18? :-)

    Later on, I realized that I had more influence over the President's foreign policy agenda than I do over telling a cat what to do, which made the whole recommendation raise the question of whether the veterinarian had ever in fact had a cat living with her!

    You can't make a cat do much that the cat doesn't want to do.

    Now the expression herding cats has been around for a while, and is generally meant to refer to (as suggested in the Urban Dictionary here):

    Any difficult activity of uncertain outcome involving numerous and often competing factions that is of dubious value. 

    If you have ever owned or possibly even observed a cat for any length of time, you will probably understand what the phrase means at a pretty basic level.

    Perhaps you do not have a cat living in your house, in which case the preceding text from this blog is all kind of lost on you.

    Just take it as read that cats do not lend themselves to being herded.

    Then of course there is CATS.

    Believed by some to be an organization rather than a person, this main antagonist in the whole All your base are belong to us thing (which you can read about here) is definitely not the type who responds well to being driven or directed to do much of anything that he doesn't want to do.

    Herd CATS?

    Whether it is an organization or a person, it is not bloody likely that will be happening. Any more so than one might be able to herd cats.

    Though to be honest, it might be easier in the end to herd either cats or CATS than it would be to herd KATS.

    KATS, you see, is the Korean Agency for Technology and Standards, and we know how stubborn people involved in National standards can be.

    Of course, KATS is pushing the agenda we have all known and loved related to a real dislike for

    • the use of the conjoining Jamo in Unicode to construct modern Hangul syllables, and
    • the way that Unicode normalization equates the modern Hangul syllables with their decomposed conjoining Jamo.

    It does not matter that this is how the script works, putting together these smaller pieces to form the bigger pieces.

    And it doesn't matter that for the normal Korean user who types in the Jamo to get the Hangul that this relationship is so firmly embedded in their understanding that many have trouble understanding what the government in South Korea is talking about when they express such negative feelings for normalization, anyway.
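
    For those who have not seen the relationship in action, here is a small Python sketch using the standard unicodedata module. It decomposes one modern Hangul syllable into its conjoining Jamo, recomposes it, and also rebuilds it with the arithmetic that the Unicode Standard defines for Hangul syllable composition. The compose function and the constant names are my own labels for illustration.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose(lead, vowel, trail=None):
    """Compose conjoining jamo into one precomposed syllable (hypothetical helper)."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


syllable = "\ud55c"  # HANGUL SYLLABLE HAN

# NFD splits the syllable into choseong, jungseong, and jongseong jamo.
jamo = unicodedata.normalize("NFD", syllable)
print([f"U+{ord(ch):04X}" for ch in jamo])

# NFC puts the pieces back together, as does the arithmetic.
print(unicodedata.normalize("NFC", jamo) == syllable)
print(compose(*jamo) == syllable)
```

    The two forms are canonically equivalent, which is exactly the equivalence that normalization is built to respect.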

    One could say that

    The opinion of KATS in relation to Unicode Normalization is not exactly normal.

    And one could also say that

    Trying to get them to change their mind in response to linguistic, technical, or even alphabetical concerns is an object lesson in how hard it is to herd KATS!

    I think it was William Shatner who said that "irony can be pretty ironic, sometimes." Did we really expect KATS to prove the cliché about herding cats?

    Isn't that ironic, in both the genuine and Alanis senses? :-)

     

    This blog brought to you by ᅙ (U+1159, aka HANGUL CHOSEONG YEORINHIEUH)

  • Sorting it all Out

    Inspiration, and a code chart

    • 4 Comments

    Way back in September after I did that presentation at the Internationalization and Unicode conference that I mentioned and provided the slides of in Behind the Proposed Change to Tamil in Unicode (five different ways), Scott sent me the following via the contact list:

    Michael,

    After your talk today, I was inspired to put up a Unicode syllabary chart for Tamil on the Tamil Script page in the English Wikipedia, complete with  the new Tamil named sequences from Unicode 5.1, in the hopes of building support for the current Unicode encoding model.  Anyway, you can check it out if you're curious:

    http://en.wikipedia.org/wiki/Tamil_script#Tamil_in_Unicode

    If you find anything horribly wrong, I'd be happy to fix if you let me know about it, so you won't have to violate your policy of Wikipedia non-editorship.

    I just hope this doesn't earn me death threats!  ;)

    -Scott

     I think that what Scott did here was excellent, and I did not note anything horribly wrong at all....

    And it humbles me to think that I helped inspire it.

    Because even though that is the secret hope I have for some of my talks (especially including this one), it is really awesome to see it spelled out in such a way.

    The chart he provided was similar to but not the same as the ones I provided in Learn Tamil in 30 Days (or something like that), and helps people look at Tamil in Unicode the way that they might learn Tamil, something the simple code allocation chart would never be able to do -- in its own way something Unicode cannot do without providing this same crucial bit of information in a familiar form.
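
    To give a feel for the difference between a code allocation chart and a syllabary chart, here is a tiny Python sketch that builds a few rows of such a grid from the encoded pieces. It is only an illustration under my own assumptions: it covers three consonants and three vowel signs rather than the full chart, the helper names are made up, and the cells are printed as code points to keep the output ASCII.

```python
import unicodedata

VIRAMA = "\u0bcd"                             # TAMIL SIGN VIRAMA (the pulli)
CONSONANTS = ["\u0b95", "\u0b99", "\u0b9a"]   # TAMIL LETTER KA, NGA, CA
VOWEL_SIGNS = ["\u0bbe", "\u0bbf", "\u0bc0"]  # TAMIL VOWEL SIGN AA, I, II


def syllabary_row(consonant):
    """Return [pure consonant, consonant with inherent a, consonant + each vowel sign]."""
    return [consonant + VIRAMA, consonant] + [consonant + sign for sign in VOWEL_SIGNS]


def code_points(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


for consonant in CONSONANTS:
    name = unicodedata.name(consonant)
    cells = [code_points(cell) for cell in syllabary_row(consonant)]
    print(f"{name}: " + " | ".join(cells))
```

    Each row is what a learner would think of as one line of the syllabary, even though most of its cells are sequences rather than single encoded characters.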

    Thanks, Scott -- both for this and for supporting my non-interference policies WRT Wikipedia! :-)

    Which reminds me that I promised to talk more about some of the issues I didn't have time to cover in the talk. I'll be sure to get on that....


    This blog brought to you by ஹ (U+0bb9, aka TAMIL LETTER HA)

  • Sorting it all Out

    The sort order of the Language Bar (and Michael is in heaven on this one, other than...)

    • 8 Comments

    At this point, I am convinced that people are afraid of the Suggestion Box.

    Because even if the question would completely make sense to be there, people still send it to me via the Contact link and thus become part of a big file of potential questions that I may or may not get to at some point that no one else ever sees.

    I think I am going to have to reword my text meant to dissuade people from doing that or something....

    Anyway, one of the sent-via-the-Contact-link-but-shoulda-been-put-in-the-Suggestion-Box questions I got was:

    Is there a way to (manually) rearrange the order of input languages in the language bar (and also possibly the different keyboard layouts under each of those languages)?

    Excellent question!

    Well, actually I'll say excellent questions since there are two in there.

    These days it does seem like the two subject areas I do the most advisory work in are still keyboards and collation, despite the fact that my actual job is so very different now. So it is interesting to get a question involving changing the order of lists of keyboards, the same way it presumably would be interesting to get a question on how one would type in a sort. :-)

    Unfortunately, the answer to the first question is no; there is no documented or supported way to change the order of the languages in the Language Bar.

    Also, just as unfortunately, the answer to the second question is also no; the keyboard layouts underneath each language also have no documented/supported way to be re-ordered.

    For both questions, the intent was to ask if there was a way to do them manually, but there is also no programmatic way, either.

    As features go, this is one that in my opinion would make some sense to provide, if not programmatically then at least manually. Because if you have two or more layouts then being able to control not just the default for each new input queue but also the ordering of the various lists when lists exist would make some real sense, for the sake of usability.

    You know, handled in a way similar to the whole "lock the taskbar" type functionality where if you turn off the checkbox you are free to drag stuff around a bit. I could get behind that kind of a feature. Especially in my case where I am constantly adding and removing random keyboards for various reasons, but even users who have a more static list might want a simple re-ordering they could do once and then not have to worry about again....

    Though surprising as it may be, Windows features are not always triaged according to what I personally think might make sense. :-)


    This blog brought to you by ᥡ (U+1961, aka TAI LE LETTER TSHA)

  • Sorting it all Out

    This is not yet my take on DirectWrite

    • 2 Comments

    More news out of the PDC. :-)

    After they saw the talk I pointed to in From ____ to ____ to MUI to ELS -- World Ready @ the PDC! (the one called Windows 7: Writing World-Ready Applications, whose content I liked but whose title I didn't care for, and which I thought could have used more material), a few people pointed out another very interesting talk.

    One called Windows 7: Introducing Direct2D and DirectWrite, a presentation you can see right here.

    Now people who would talk to me aren't gonna go ga-ga about Direct2D in front of me, since they know that it is not the sort of thing that impresses me.

    On a good day I'll nod and smile and then move on to the next thing.

    On a bad day I'll snarl out something deprecating, like they sell that crap at the airport before moving on to something I find more interesting.

    A man who lives so much of his life in source code and DOS (well, CMD) prompts is just not gonna always get excited every time you have a new faster way to do cool graphics.

    Nothing personal, but I am not their target customer, and they kind of do sell that crap at the airport, from my point of view. :-)

    The people who pointed out the talk to me, they were talking about DirectWrite.

    They don't sell that crap at the airport....

    This is something I do plan to talk about more at some point, though most likely not so much right away since no one has really had a chance to look at it yet.

    For now, I'm going to be cautiously optimistic and withhold any other judgment.

    I have been burned getting too enthused about this kind of thing before. In retrospect, I found reasons to be less excited, as I have pointed out in stuff on text stacks in Khmer, and I'll tell you about all the text stacks, and the recent Silverlight as Esau: selling its implementation for a pot of interface.

    The Khmer thing coming up in both of them is coincidence; I'm talking about deeper issues.

    And when I do get around to assessing DirectWrite, it will be on the basis of my core issues that you have heard me blog about in the past:

    • CJK text handling
    • vertical text
    • script coverage -- both breadth and depth
    • complex script support
    • ease of use
    • interoperability issues
    • parity or the lack thereof
    • the generic question about yet another entry in the collection of text stacks
    • what's missing -- what scripts, what languages, what scenarios
    • the downlevel story
    • the customers
    • the standards/Unicode/community story - de facto and de jure conformance and leadership
    • the managed/native split

    All of these issues are huge in my mind, and not very many of them make sense to beat up a pre-beta for, and not that many would be top of my list in a beta, either.

    But I'll talk about them, and how issues I have talked about before stack up against the new kid on the block.


    This blog has no Unicode sponsor yet, but lots of characters are jockeying for position for the upcoming ones....
