July, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    On installing and removing fonts, Part 3: I'll love you forever, or at least for this session.

    • 7 Comments

    Dedicated to an Easter Egg poem, and its author...

    Previous blogs in this series:

    This blog in the series is a slight detour....

    And I am going to go slightly out of order now.

    The off-course jaunt started in my mind last time in a response to Part 2, regular reader Andrew West commented:

    Surely the font installer does not need to know who exactly may or may not be using the font; it just broadcasts the WM_FONTCHANGE message and well-behaved applications will handle it and update their font information accordingly.

    Now Andrew is quite right -- WM_FONTCHANGE is just the message that a responsible installer/uninstaller would send here.

    Unfortunately there are many problems with the architecture -- some the fault of installers/uninstallers that are indirectly the fault of the difficulty of the task, and others the fault of the complex process that nevertheless has failed to keep up with some of the complexities of the underlying platform:

    • Lots of installers out there tend to be about as responsible as Nathan Scott's mom when she's bombed;
    • Even more uninstallers out there tend to be worse;
    • Since install and uninstall have detectable consequences, forcing this extra step is a little unfair;
    • Starting in Vista, the work to mitigate shatter attacks keeps a process from communicating with elevated processes -- meaning the more important an application is, the less likely it is to get the message;
    • The message does not go outside the session, meaning any other session remains unaware of the change -- woe to those who make these changes on a Terminal Server or the world of Fast User Switching!

    You get the idea.

    The design here is quite sensible for a bygone era, but to be honest something better might really be considered.

    But I digress.

    The point is that even a "well-behaved application" that sends the message and a "well-behaved application" that receives it might not ever be able to communicate well enough for the receiver to be told that something is up.

    Despite the fact that the contents of a directory and the contents of a registry key were altered.

    So in the end, the rules (if you work within the framework above) end up being quite simple:

    1. If the font is there and the new one is determined to be the same font but of a lower or equal version, then you do not have to do anything;
    2. If the font was not there, then you can add it, add the registry information, and send the message; everything is as good as it can be, but other sessions and elevated processes might have to wait until next time;
    3. If the font is there and the new one is determined to be the same font but of a greater version, then you have to do all of the install work but it is much safer to require the reboot and do the actual file replacement at boot time.
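
    As a rough illustration, the three rules above can be sketched as a small decision function. This is only a sketch in Python under stated assumptions: the names (FontAction, decide_font_action) are hypothetical, versions are modeled as plain tuples, and the hard part -- deciding that two files really are "the same font" -- is assumed to have been done by the caller.

```python
from enum import Enum
from typing import Optional, Tuple

Version = Tuple[int, ...]  # assumption: versions compare as tuples, e.g. (5, 1)


class FontAction(Enum):
    """Hypothetical labels for the three outcomes described above."""
    NOTHING = "nothing to do"
    INSTALL_AND_BROADCAST = "add file and registry info, send WM_FONTCHANGE"
    INSTALL_AND_REBOOT = "do the install work, replace the file at boot time"


def decide_font_action(installed: Optional[Version], new: Version) -> FontAction:
    """Pick an action given the installed version (None if absent) and the new one."""
    if installed is None:
        # Rule 2: font was not there; other sessions and elevated
        # processes may still not notice until next time.
        return FontAction.INSTALL_AND_BROADCAST
    if new <= installed:
        # Rule 1: same font, lower or equal version.
        return FontAction.NOTHING
    # Rule 3: same font, greater version; safer to replace at boot.
    return FontAction.INSTALL_AND_REBOOT
```

    For example, decide_font_action((5, 0), (5, 1)) lands on the reboot case, while decide_font_action((5, 1), (5, 1)) is the "nothing to do" case.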

    So pretty much the only time that you never have to do anything is when the font in question is already installed -- which is when you didn't have to do anything anyway.

    Yeesh. Talk about over-engineered! :-)

    In the next part of the series we'll talk about the mechanics of the install and uninstall that must happen....

    And in a later blog I will talk about my opinion of regular reader John Cowan's suggested "solution", here. :-)

     

    This blog brought to you by ƒ (U+0192, aka LATIN SMALL LETTER F WITH HOOK)

  • Sorting it all Out

    What kind of English were you looking for? We only seem to have one in stock....

    • 13 Comments

    A question came up the other day that some regular readers might find vaguely familiar:

    We have been using the CultureInfo.CurrentUICulture to get the default UI language of the client OS. It works for locales like en-US (which is the default) and for other language locales like de-DE, es-ES, fr-FR, it-IT, ja-JP, ko-KR, pt-BR, ru-RU, zh-CN etc., but NOT getting the locales which are other flavours of English like en-AU etc., instead returning en-US in those cases.

    The language locales are picked based on client operating system for the language or if not from the regional settings languages etc.,.

    For example, if we set the locale en-AU everywhere in regional settings and in also in MS Office, its still picking up only en-US.

    We are trying to get the locale from our Add-in in MS Office using the .NET 1.1.

    Please throw some light on how we can get the locales which are other flavours of English.

    Ah, the hazard of being UICulture-based running on Microsoft products!

    So maybe you remember one of the following blogs from the past:

    and more.

    The problem is the same -- despite the commonly known and understood fact that these various dialects are not all 100% mutually intelligible, companies like Microsoft, in an effort to save money, try to enforce a single language version of many different products.

    The exceptions to this are few and far between, e.g.:

    • Chinese
    • Portuguese
    • Norwegian (where we actually skipped a Nynorsk LIP, as I described here)

    And despite all the lip service (pun intended) that people pay to the need to support "local experiences", despite complaints from former MSFTies like Mike Williams or not-yet-quite-former MSFTies like me, despite the work of cartoons like Darby Conley's Get Fuzzy with the multitude of cats who visit from non-US English speaking places like Manchester that many can't understand and even more random references, no one thinks the problem is bad enough to bother with.

    No one wants to "get" the problem here though.

    When I think about the nightmares associated with time zones and all the brave efforts to fix longstanding problems that only were able to get traction when they directly impacted people at the executive level in Redmond, I wish some similar solution were possible here -- like localize all of Microsoft's products into UK English or even better Australian English and have these versions on the computers of every executive, technical fellow, and Distinguished Engineer at Microsoft.

    How many days would they have to use products while they struggle to understand the words before it would become a mandate to care about local experiences in all of the other places that English, Spanish, French, German, Italian, Chinese, etc. are spoken besides the few places we localize to....

    But to be honest I don't see how it could be accomplished. And since no one gets made an executive by finding ways to have stuff cost more money, the problem perpetuates itself.

     

    This blog brought to you by z (U+007a, aka LATIN SMALL LETTER Z)

  • Sorting it all Out

    The situation was quite grave when I realized how "tepid" those hot keys were

    • 5 Comments

    Yesterday in The keys are so hot, they're smoking!, I did something that you won't see me doing very often.

    I waxed on at length about a feature as a really exciting thing for people to use.

    I'll now admit that when I did that, it was mostly a setup.

    Because today I am going to prove that:

    • I wouldn't make the worst tester in the world
    • Even Vista has a bug or two in it
    • The "Hot keys for input languages" feature has weird bugs in it

    So we will start by adding three keyboards under three different languages to my usual setup -- in this case Latvian, Lithuanian, and Luxembourgish (The "L" Word languages in Europe):

    This will give us the following view in that Advanced Key Settings tab:

    Because this is Vista we will get the three shift states and twelve character choices:

      

    The two characters at the end are ~ (U+007e, aka TILDE) and ` (U+0060, aka GRAVE ACCENT).

    We're gonna need a few more keyboards here for what I want to do. Let's go back and add three more, under those same languages (just for fun):

    Okay, this will give us the list to die for:

    Six entries plus the English I started with.

    Okay now, to quote Antonio Banderas from Desperado right before he started shooting up the bar, Let's Play.

    Among these six keyboards, let's use that Change Key Sequence... dialog, to add hot keys to each of them:

    Okay, do you see what we have here?

    • Ctrl + ~
    • Ctrl + `
    • Ctrl + Shift + ~
    • Ctrl + Shift + `
    • Left Alt + Shift + ~
    • Left Alt + Shift + `

    Now if you have a US keyboard you might notice the problem here.

    Yes, that is right -- the tilde is the shift state to the grave's base state.

    In other words, there are not six unique states here at all -- the user interface just thinks there are. All of the TILDE entries without a SHIFT in them have an implicit one built in, and all of the GRAVE entries with a SHIFT do not exist since the SHIFT changes the characters.
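
    To make the collision concrete, here is a minimal Python sketch that reduces each of the six hot keys to the physical chord a US keyboard would need for it. Everything in it is an assumption for illustration: the names (US_SHIFTED_TO_BASE, physical_chord, HOT_KEYS) are hypothetical, and it says nothing about how Windows actually stores or matches these hot keys.

```python
# Assumption: a US keyboard, where "~" is produced by Shift + the "`" key.
US_SHIFTED_TO_BASE = {"~": "`"}


def physical_chord(modifiers, char):
    """Reduce (modifiers, character) to (modifier set, base key) on a US layout."""
    mods = set(modifiers)
    if char in US_SHIFTED_TO_BASE:
        # Typing the shifted character already implies holding Shift.
        mods.add("Shift")
        char = US_SHIFTED_TO_BASE[char]
    return (frozenset(mods), char)


# The six hot keys assigned in the dialog.
HOT_KEYS = [
    (("Ctrl",), "~"),
    (("Ctrl",), "`"),
    (("Ctrl", "Shift"), "~"),
    (("Ctrl", "Shift"), "`"),
    (("Left Alt", "Shift"), "~"),
    (("Left Alt", "Shift"), "`"),
]

# Collapse the list to the chords that are physically distinguishable.
distinct_chords = {physical_chord(mods, char) for mods, char in HOT_KEYS}
```

    Under these assumptions the six entries collapse to three distinct chords, since Ctrl + ~ and Ctrl + Shift + ` (for example) are the same key press.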

    Oops!

    Let's hit the Apply button now and see what happens:

    Uh oh! Somebody silently removed our CTRL + GRAVE ACCENT that was on the Latvian keyboard.

    But the rest were left.

    Let's fix it and try again:

    Okay, that one stayed fixed but now two others are gone.

    Third time, will it be a charm? Let's see:

    Um, not exactly -- not only is one missing, but another one is now duplicated on the list but without the hot key. And we can't add it back either as it is no longer on the list:

    Weird.

    And kind of broken, given that we have now broken the UI and half of the ones that are there are not going to work the way we want them to anyway.

    If you keep doing this, eventually you hit a non-steady steady state of something like this:

    Ick.

    But let's pretend that this is not a problem for a moment.

    And we'll look at some keyboard layouts:

      

     

    All of this kind of underscores the point from Punctuation keys can make lousy shortcuts, because even if we do not know whether these hot keys are CHARACTER based or VIRTUAL KEY based, we know that neither the characters nor the virtual keys can be relied on for being in all keyboards or in the same positions even when they are present.

    Sometimes they are there but the VK values are different, sometimes they are on other keys, sometimes they are actually dead keys (and believe me the interaction when they are dead keys almost deserves its own bug report!).

    So basically between half and all of these entries are going to be kind of broken, depending on the layout you choose.

    And of course there are those duplicated entries to help us keep it all in perspective:

    Then we can try Estonian instead since both the character and the VK are different there:

    As this will give us more somewhat broken entries for hot keys, unless you do them this way:

    Though even if you do them right for the English keyboard (to switch to Estonian), you may not have them right for the Estonian keyboard (to switch to English) which does not have them both there.

    Not much room for gray area here -- there are just a whole heap of bugs here that have as far as I know never been reported and some of which have existed for many versions.

    Ick.

    Kind of provides a possible answer to Centaur's wondering:

    I wonder what is the target audience for these hot keys.

    Fictional users? :-)

    I'll talk about the next problem tomorrow....

    (Keep in mind we are keeping away from the IME topic, for now!)

     

    This blog brought to you by ` and ~ (U+0060 and U+007e, aka GRAVE ACCENT and TILDE)

  • Sorting it all Out

    Show me the money? How?

    • 3 Comments

    Regular reader Jan Kučera's latest attempt to thwart my fantasy of emptying the Suggestion Box went something like this:

    Hello again,

    I'm trying to display prices, in a currency the user chooses. I've abolised keeping my own formatting info for each currency, and decided to assign a culture to the currency, and let the .NET to format the price using its data (I hope this is a good decision :)), so the list of available currencies is like USD -> en-US, GBP -> en-GB etc.

    Now the question, what culture should I choose to format the price in euros? Are the formatting rules the same for whole union and I can just randomly choose one country?

    Or is there any IFormatProvider for whole EU?

    Thanks!
    Jan

    PS. Well by checking all cultures which has € as a currency symbol I see that every country formats the currency in a different way, so this is not going to work. So is the suggested way to use the USD/GBP strings and format the price as a number using user's culture?

    It is funny that the question of formatting is so prominent here -- isn't the big problem the actual conversion? The formatting seems like the easiest issue....

    As I mentioned in Show me the [small]money!, I am quite fond of Cloanto Currency Server for the actual conversion.

    For the formatting, I would take the NumberFormatInfo of the user's culture, and change the currency symbol, but leave everything else exactly as is. Because a user's expectations of the way numbers work does not change just because they convert, and although some would understand the differences in formatting, others wouldn't. And the ones who wouldn't could easily end up misunderstanding the data they are seeing....
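
    A minimal sketch of that idea follows, in Python rather than .NET. The NumberFormat class is a hypothetical stand-in for the handful of NumberFormatInfo fields that matter here, grouping in threes is assumed, and negative-number patterns are reduced to a leading minus sign; the German-style values in the usage note are illustrative assumptions, not data pulled from any real culture table.

```python
from dataclasses import dataclass, replace
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class NumberFormat:
    """Hypothetical stand-in for the relevant parts of a culture's number format."""
    currency_symbol: str
    decimal_separator: str
    group_separator: str
    decimal_digits: int
    symbol_first: bool


def format_currency(amount: Decimal, fmt: NumberFormat) -> str:
    """Format amount using fmt, grouping the integer part in threes (assumption)."""
    quantum = Decimal(1).scaleb(-fmt.decimal_digits)
    rounded = amount.quantize(quantum, rounding=ROUND_HALF_UP)
    sign = "-" if rounded < 0 else ""
    whole, _, frac = f"{abs(rounded):f}".partition(".")
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])
        whole = whole[:-3]
    groups.insert(0, whole)
    number = fmt.group_separator.join(groups)
    if frac:
        number += fmt.decimal_separator + frac
    if fmt.symbol_first:
        return f"{sign}{fmt.currency_symbol}{number}"
    return f"{sign}{number} {fmt.currency_symbol}"


def with_currency(user_fmt: NumberFormat, symbol: str) -> NumberFormat:
    """Keep the user's formatting conventions; swap only the currency symbol."""
    return replace(user_fmt, currency_symbol=symbol)
```

    With a user format of NumberFormat("€", ",", ".", 2, False), calling format_currency(Decimal("1234.5"), with_currency(user_fmt, "$")) keeps the separators and symbol position the user is used to and changes only the symbol.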

    But taking a step back, it is really not such a big deal here.

    At least for most situations.

    You might be wondering why I say that if I talk about the potential for confusion and such?

    Well, it is true that I might misunderstand the currency value if things like decimal places change.

    But think about the scenario for a moment:

    I'm trying to display prices, in a currency the user chooses.

    This kind of implies the typical online commerce scenario, when most commonly one is converting from some unknown currency into their own. And in this case, for the desired target of everything being in a user's own currency with their own formatting, they will be getting the same final results either way.

    In short, the difference between "formatting with user preferences" and "formatting with the currency's preferences" disappears when dealing with the user's own currency.

    And of course as I mentioned in Show me the [small]money!, a method to get the culture from the currency doesn't exist in the .NET Framework or Windows, so unless you deal with a third party product or do all of the work yourself, the whole "formatting with the currency's preferences" is hardly going to be all that easy to do anyway.

    This may be why Jan was doing it himself, in fact!

    So now I will take yet another step back and suggest finding some component that does all of the work here. Money gets one into all kinds of regulatory rules and laws about commerce. It is much better to find a component that can help indemnify you if they screw it up, rather than being ruined if one makes a mistake.

    Strange advice coming from someone who works for a company with such a huge "let's develop everything here" bent on things, but (a) Microsoft has the time and money to spend on such a plan as well as money to pay for their mistakes when they make them, and (b) there are several cases where I think they are wrong in this approach....

     

    This blog brought to you by $ (U+0024, aka DOLLAR SIGN)

  • Sorting it all Out

    The murder-suicide? It wasn't me. But my heart goes out to her and her family and her friend (if not him)

    • 3 Comments

    Nothing technical in this post, sorry!

    "It had absolutely nothing to do with me, and I had absolutely nothing to do with it."

    I had to tell many people the above line throughout the day.

    It had to do with the headlines all over the Internet:

    It happened at the Archstone Redmond Campus apartments, which is where I live.

    The shots were fired a little after nine, when I think I was already scooting over to Microsoft.

    There were sirens while I was heading down 156th at the time, but it was a fire engine and an ambulance so I don't think they were related.

    The five articles I mention above were all links that people sent me (via email and over IM) throughout the course of the day, people who were concerned and wanted to make sure I was not somehow involved with the incident of the estranged husband with the .357 in his waistband who shot his wife with a 9-mm while she was heading to work (at Microsoft) from the apartment she was staying at.

    I wasn't.

    I don't know who she was, or who he was, or who the person she was staying with is. From the picture in the King5 article, it looks like it was near outside the office on Building Z and I am in Building E -- though like I said I wasn't even home when it happened, even if it had been right outside.

    All day, people kept asking me about it -- eight in all, with the five articles.

    One even asked if I was the person the woman who was killed had been staying with.

    It's just that you've had friends stay over before when they needed a place to live, and your facebook Relationship Status says "It's complicated." This guy was out to lunch enough to shoot his wife. So I kind of put two and two together?

    2 and 2? They got 22, this time. My life isn't *that* complicated, by any means.

    One of the articles even stated the friend was female. Which I'm not. Of course each person didn't see each story -- that was just people like me.

    It made the day kind of surreal, to be honest.

    The most recent one was a comment in Facebook after I mentioned it in my status:

    Sadly, I have to admit you were the first person who popped in my head when I heard the news story. Josh & I used to live there, too.

    That is when I decided to write this actually.

    Back to the various news pieces... a neighbor's quoted statements I found quite frankly perplexing, things like:

    If it was a random act of violence I would be concerned... But this is an isolated incident. It could happen anywhere.

    I'm hoping this was just a misquote. I certainly hit a parse error on that one.

    The police PIO was not much more reassuring, from the PNWLocaleNews.com coverage:

    "It all happened in a matter of seconds," said Bove, who added that the man had a .357 magnum in the waistband of his pants during the shooting.

    This makes it better, the other gun? Well I guess it means there might be two fewer guns out there.

    Who was well-served by this coverage? I know I wasn't, and none of my friends who contacted me really were. No one was. So is this what the press is reduced to now? The people's right to know things that people really didn't have any actual need to know?

    And I wondered about this guy I don't know who shot himself and the woman he shot and the friend of hers he didn't, and I realized that no matter how weird my day was, it had nothing on theirs. There is something really horrifying about the whole incident, and the series of tiny articles, and the people who would email me.

    What about the people who emailed the friend? All of the people who emailed me just wanted to make sure everything was okay, but what if I was the one in the situation and a friend had just been shot?

    I wasn't angry at the people who contacted me, but most of them were worded weirdly enough that I probably would have been, if I were involved.

    No one really trains folks on how to do those "just wanted to make sure everything was okay" calls. And as far as I know there is no Miss Manners column about it, either.

    I actually watched the news tonight on several channels, which is kind of a departure for me (I am mostly a Stewart/Colbert man for news these days), curious about what the coverage would be. I suppose I should be grateful that when it was mentioned it at least came before the weather and local stock info and especially before the Cheeto a woman found that was shaped a bit like Christ on the cross (the woman dubbed it Cheesus, of course -- didn't Heinlein have one of those in I Will Fear No Evil?) -- Cheesus was a MYQ2 exclusive.

    The news reports were pretty much a rehash of the earlier stories, and all they added was a little of the art of the police moving around on the property. Which is why I am a Stewart/Colbert person now -- they at least add something to their coverage (though the downside is that they don't cover this kind of story at all since it really isn't funny).

    Wrapping all this up finally, I will say a prayer tonight, for both of the two people involved who didn't shoot at anyone, and wish them whatever support they can get. If you want to take a moment to send some positive thoughts out into the universe on this then I doubt the time would be wasted.

    Because the odd messages are the least of their problems, and there is really no way to make this better....

  • Sorting it all Out

    Who and how shipped which IME when and where?

    • 4 Comments

    The question seemed almost deceptively easy:

    Please help me with the location from where I can download Microsoft IME Standard 2003 for Windows. I have located the Microsoft Office IME 2003 [here], but could not find the IME 2003 for Windows.

    Actually, the Office and Windows IMEs are very closely related, and there are not going to always be downloads for them.

    Like in this case.

    Here is the big list that a colleague over in support was able to provide:

    • Office IME 2007
      IME Version: 12.x
      Included with: 2007 Microsoft Office System
    • Microsoft IME
      IME Version: 10.x
      Included with: Windows Vista
    • IME 2003
      IME Version: 9.x
      Included with: Office 2003
    • IME 2002
      IME Version: 8.1.7xxx
      Included with: Windows Server 2003, Windows XP x64
    • IME 2002
      IME Version: 8.1.4xxx
      Included with: Windows XP
    • IME 2002
      IME Version: 8.0
      Included with: Office XP
    • IME 2000
      IME Version: 7.1
      Included with: Windows Me
    • IME 2000
      IME Version: 7.0
      Included with: Windows 2000, Office 2000

    This seemed like a very handy list to have, so I figured it would make sense to put it in a blog -- that way next time the question comes up I'll remember it. :-)

     

    This blog brought to you by ഐ (U+0d10, aka MALAYALAM LETTER AI)

  • Sorting it all Out

    The Limonata foil "condoms" are in no way ironic

    • 4 Comments

    It is no secret that I am a huge fan of San Pellegrino Limonata.

    And it is not much of a secret that the "European" can size of 330ml leads to a slightly smaller can than the typical 12oz one usually seen on the US side of the puddle.

    The new cans (which were discussed previously here) do have one "feature" I did not mention previously -- the foil "cap" on their tops.

    Now I am just old enough that movies that use cans with pull tabs as a plot device (e.g. WarGames, when Matthew Broderick dials a number on a pay phone without change using one) are ones I remember watching back when it seemed normal, since all cans had them -- even though now I am astounded at a design that led to so much littering of pull tabs throughout the world.

    But this foil top is intended mainly to deal with one of the long-standing complaints of the design most companies use now -- the fact that a can might become a landing pad for dust/dirt either during transit or while sitting in the store or at home waiting to be opened and used.

    I suppose it is good at that.

    Though to be honest the six-packs come completely wrapped in clear plastic which means the tops of the cans were already quite protected.

    Now the foil "condoms" are protected by the plastic so they won't get dirty either.

    This is not ironic, really -- just short-sighted.

    What the hell purpose do these foil condoms serve other than to be a huge step backward in the world of new opportunities to have trash that people will either not recycle or will just use to litter?

    I am hoping this distinctive oddity does not catch on with other canned drinks....

    I'm not going to litter with any of mine, for what it's worth. :-)

    Note to Microsoft internal folk: RED stuff is for the benefit of people, GREEN stuff is for the benefit of the planet.

    Kind of interesting (though also not ironic unless you misuse the word in more of a Britney or Katie sense than an Alanis one) that the San Pellegrino people changed the can color from being green when they made this particular earth-unfriendly change! And it is good to know that the color scheme has applicability outside of the world of Microsoft, given their (also not ironic) refusal to stock the beverage.

     

    This blog brought to you by ޠ (U+07a0, aka THAANA LETTER TO)

  • Sorting it all Out

    Behind facebook status like: "...somewhere between 'Addictive' by Faithless and 'Addicted' by Juliana Hatfield."

    • 3 Comments

    I've been in the habit of using a musical metaphor in my facebook status information lately.

    And even more occasionally in my Windows Live Messenger status¹.

    And then most occasionally in my Office Communicator status².

    And a few of my friends have asked me about it -- like what they meant, why I was doing it.

    So I figured I should blog about it to explain.... :-)

    The underlying problem I am trying to solve is the one I first defined in a blog from a month ago, the one entitled Doing a thorough brushing of the gift horse's teeth.

    Stated simply, the problem I defined there was:

    I can't reasonably express an appropriate [musical] subset that will fit into 4 GB.

    But then I started thinking about it.

    I pick the music I listen to based on my mood, for the most part. That is true.

    But it is hardly always true.

    I mean, the reason that songs like Joe Satriani's One Big Rush are such great songs to just put on at particular times is that they are great songs to just psych yourself up for some interesting challenge or issue or situation³.

    You don't choose to put on such music because you are in the mood to get psyched up for something. A situation requires it, and so you put on the music to help get you in the frame of mind to be able to do it.

    For lack of a better way of putting it, you alter your mood based on the music you play.

    Okay, fair enough.

    But then how best to choose what to listen to?

    I decided to try a simple technique:

    • Figure out where I am at the moment (maybe emotionally, maybe socially, maybe professionally, maybe not even me but instead a friend -- just some interesting theme);
    • Figure out a song that kind of captures that theme;
    • Look at this slice of life as more of a spectrum than an endpoint;
    • Think about where that spectrum might take me;
    • Decide, if I were just listening to music while traveling on it, where it might take me.

    From there, I have two end points, each represented by a song. They share a particular thematic connection -- it could be a comparison, or a contrast, or a point-of-view thing, or any other kind of commonality. But it becomes quite easy to pick songs to go on that 4 GB Zune that fit within that spectrum.

    Mentally this might be the very way I (subconsciously) build playlists, or choose what records or songs to listen to.

    Now I found the above technique to be way too easy -- there is simply so much music on so many subjects, it was like a kid in a candy store asked to pick two sweets.

    So I added a last condition.

    Make sure the two songs also share a more mundane connection as well as the deeper one above.

    It now becomes like a crossword puzzle style problem -- there are rules and structure surrounding it, but there is enough room for creativity to provide reasonable variation, and entertainment.

    (The analyses that follow are *very* over-simplified, but they make the point for purposes here!⁴)

    Thus I end up with a facebook status something like:

    Michael is (thematically) somewhere between Pink Floyd's "Stay" and Lisa Loeb's "Stay".

    Now these two songs share a title, obviously.

    The Pink Floyd song (from Obscured By Clouds) refers to a one night stand, with the next morning and the guy saying

    I rise, looking through my morning eyes,
    Surprised to find you by my side.
    Rack my brain to try to remember your name
    To find the words to tell you good-bye.

    And the Lisa Loeb song (from Tails) refers instead to a woman who ended a relationship for all kinds of really good reasons, and then suddenly realized (since she missed him) that

    You said that I was naive and I thought that I was strong
    I thought, "hey, I can leave, I can leave."
    Oh, but now I know that I was wrong, 'cause I missed you
    Yeah, I miss you

    Either way we hit relationships that didn't really have the staying power to make it, so the ultimate result is the same, but the "Loeb" relationship is certainly more one to be proud of, one at least involving no "walk of shame" like the "Floyd" relationship does.

    Ultimately showing how relationships -- as they become more mature, more substantial, and yes more adult -- are for the most part still going to end, no matter what.

    But we still get more out of the more adult ones. :-)

    Anyway, other "theme" statuses have included:

    • ...somewhere between "Addictive" by Faithless and "Addicted" by Juliana Hatfield.
    • ...somewhere between Dr John's "Rock" and Begonia's "A Hard Place".
    • ...somewhere between "Childhood's End" by Iron Maiden and "Childhood's End" by David Gilmour.

    And probably another dozen or so I have used and several hundred that have occurred to me.

    But each one has [at least] three meanings:

    • the mundane one that anyone can see even if they know nothing of either/both songs or even either/both artists;
    • the deeper one involving some aspect of part of the underlying meanings/themes of the two songs;
    • the deepest one involving the exact occurrence or memory or story or experience that I either know of or heard of that inspired me to add it.

    But even if you only have one of these it can still be fun... I may even blog about one of them again some day!

     

    1 - Very few of my friends are also Windows Live Messenger contacts -- so most people don't see this.
    2 - I never launch Office Communicator other than to turn my work phone forwarding on or off, so I never remember to update this status.
    3 - Younger readers can mentally replace this with a song like Le Tigre's TKO.
    4 - I could probably blather on about any of these pairings for at least 62.4 minutes longer than any normal human could stand to listen to me do it!

     

    This blog brought to you by ݉ (U+0749, aka SYRIAC MUSIC)

  • Sorting it all Out

    Let's save some time and call them all IRregular expression engines

    • 4 Comments

    Way back in December of 2007, aaron asked in the Suggestion Box:

    Your recent In SQL Server, A-Z [...] might not mean the same thing:

    It got me thinking, a whole post dedicated to the problems of mixing regular expressions and i18n would be very interesting.  Some questions i've always woried about but never tested:

    • '\b' word boundaries, do they incorrectly show up when surrogate pairs or combining characters are involved?
    • '\b' word boundaries, are there / should there be characters that form word boundaries only sometimes.  It's plausible in some interpretations that "hy-phen" has only two word boundaries, at the begining and end, but in reality is has 4, as '-' is not a '\w' character.  But do other unicode characters have some sort of weird identity.
    • If i have an accented character as two code points (combining), does / should '.' (or '?' in Win32 regex) match the character and the accent, or just the base character?
    • how wide is the definition of '\w' word character?  Does it / should it ever change based on the current user locale/language?

    Most importantly

    • how likely is your average regular expression going to be i18n unsafe?  what are the common pitfalls to avoid?

    Note: for 'should / does', i'm asking all of (a) what do you (Michael Kaplan) think it _should_ do, and (b) what do some common implementations do (for instance, the .Net System.Text.RegularExpressions.Regex class, or the new TR1 regex in Visual Studio 2008, or Win32 with FindFirstFile and friends)

    (Oh, and your blog is awesome!)

    #aaron

    Hopefully the long delay before I got to responding did not change his opinion of the blog. :-)

    I'll start off with a quote from Jamie Zawinski:

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    This is to help set expectations realistically. :-)

    I'll start by saying that most modern regular expression implementations do have certain features that are particularly good for internationalization, such as Unicode storage semantics. Such features are pretty much essential in most cases. Some of them build in Unicode property type information and most of those keep up with recent versions of Unicode.

    With that said, most of the ones that I have worked with are otherwise very primitive, not properly handling Unicode normalization/canonical equivalence, known in .NET as the "text element" semantic. This leads to some problems with Aaron's first and third points above with sequences of characters that should be treated as equivalent to some other character or sequence.
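
    A minimal sketch of that gap, using Python's built-in re module purely as an illustrative stand-in (it is not one of the engines aaron listed, and other engines may behave differently). The sample word and variable names are made up for the example:

```python
import re
import unicodedata

composed = "caf\u00e9"      # 'é' stored as one code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The two spellings are canonically equivalent...
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# ...but the engine compares code points, so a composed pattern misses decomposed text.
raw_hit = re.search(composed, decomposed) is not None

# '.' consumes a single code point: the base letter, without its accent.
last_dot = re.search(r"caf(.)", decomposed).group(1)

# \w does not cover the combining mark, so \b lands between 'e' and the accent.
boundaries = [m.start() for m in re.finditer(r"\b", decomposed)]
```

    In CPython 3 the composed pattern does not find the decomposed text, the dot captures a bare "e", and a word boundary is reported at offset 4, in the middle of what a user would call one letter. Normalizing both pattern and input to the same form first is the usual workaround, though it only papers over the text element problem.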

    Most also have no notion of exceptional cases such as the one in Aaron's second point above -- to support these things you have to build up complex expressions to try to handle the exceptional cases. If one is lucky they are included in samples, but usually only the very simplest ones tend to be built in.
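
    As a small illustration of "building up" such an expression, again in Python's re module as a stand-in, here is the "hy-phen" case. The pattern is only one possible convention (a single hyphen allowed between runs of word characters), not a definition any engine ships with:

```python
import re

text = "hy-phen"

# Plain \b: '-' is not a \w character, so it splits the word.
plain_boundaries = [m.start() for m in re.finditer(r"\b", text)]

# Hand-built exception: allow a single hyphen between runs of word characters.
HYPHENATED_WORD = re.compile(r"\w+(?:-\w+)*")
words = HYPHENATED_WORD.findall("a well-known hy-phen - test")
```

    The plain version reports the four boundaries aaron mentions (offsets 0, 2, 3 and 7), while the hand-built pattern keeps "well-known" and "hy-phen" whole and ignores the free-standing hyphen. Apostrophes, non-breaking hyphens and the like would each need their own additions, which is rather the point.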

    And none that I have ever worked with properly handle locale specific differences, the thing that I have referred to here in the blog as "sort elements" -- what users in a particular language think of as a single character, the kind of thing that Aaron hints at in the fourth point above.

    For more on sort elements and text elements, blogs of mine like Sort element vs. text element are a good place to start.

    The theory of all of that good property support often runs into the kinds of problems described in No Regex in the Unicode room! (and no sex in the champagne room, either!) and 4400 (*not* 'The 4400') and 'The 44' (*not* 'The 4400'), where what the engine does manages to fall short of what one might expect from an implementer of information coming out of the Unicode Character Database....

    And I'm not going to pick on Microsoft's implementation, which is probably about average here. Most suffer from the complicated nature of the data in the UCD when their comparatively simplistic implementation tries to use the data.

    Which then leads to the last question, the one about how common the "i18n unsafe" expression problem might be expected to come up. On the whole, I expect it is way more common than people realize, as the nature of the more complicated cases requires built-in expressions much more complicated than the definitions that are usually present....

    An ideal implementation plan for such an engine is covered in Unicode Technical Standard #18: Unicode Regular Expressions, whose own summary states: "This document describes guidelines for how to adapt regular expression engines to use Unicode." Many fall short of that ideal, though (the only reason I don't say all is that I have not tested every engine out there, but all the ones I have used and/or dabbled with and/or tested have issues).

    Now going back to that original series of blogs about SQL Server, it is clear that problems I point out in that series and in posts like Wild[card] thing, You make my CHAR sing and With SQL Server (and SQL itself) comes the illogic of 'trailing spaces' (and the myth of fixed width) are more than anything else to do with SQL Server choosing to draw that line between appropriate behavior and simple definitional consistency in a better place than regular expressions tend to. Which leads to inconsistencies in the documentation and limitations/flaws in the syntax (which was not made to handle things this complex either).

    I must admit that I find myself more comfortable with where SQL Server sits here, rather than where regular expressions do. :-)

     

    This blog brought to you by ঐ (U+0990, aka BENGALI LETTER AI)

  • Sorting it all Out

    Pay no attention to the man behind the curtain, or the language on the file

    • 9 Comments

    So thinking about the design of MUI, some interesting thoughts came up in a conversation the other day.

    Let's take NOTEPAD.EXE for a second.

    If we move to the WINDOWS directory and hit ALT+ENTER we get its Properties:

    Let's move over to that Details tab:

    Hmmm. I thought that we were all language-neutral now. Why is the file's version resource claiming that the file has a language of English (United States), anyway?

    Let's look over in the language-specific directory and see what the NOTEPAD.EXE.MUI file looks like there:

    Wow, no VERSION properties at all!

    Wait, maybe that is an issue with the filetype. Let's copy the file and remove the .MUI extension:

     and try again:

    Ah, there we go. So it has a VERSION resource and the language is tagged.

    Of course this proves nothing -- the original file makes the same claim, even though all of its resources are gone (as we learned in Random irreverent thoughts about the Ultimate Fallback).

    So let's look at some of the other language files, like Arabic:

     

    or maybe Hebrew:

     

    And maybe we could mix things up a bit.

    Like looking at the Arabic file under a Hebrew user interface language:

    or the Hebrew file under the Arabic user interface language:

    Okay, so the language files are being marked.

    The only weird thing left is that the language neutral file, the one that even Mark Russinovich discovered the hard way is language neutral (ref: Random irreverent thoughts about the Ultimate Fallback), is marked as having a language.

    Well, it turns out that the language splitting functionality lives in the Resource Compiler (RC.EXE and RCDLL.DLL).

    Now RC.EXE has several flags related to creating .MUI files:

    /fm mresname

    RC creates one language-neutral .RES file and one language-dependent (MUI) .RES file using script-file. This option must be used together with the /fo resname option. RC names the language-neutral .RES file resname.res and names the language-dependent (MUI) .RES file mresname.res.

    /fo resname

    RC creates a .RES file named resname using script-file.

    If the /fm mresname option is also set, RC creates one language-neutral .RES file and one language-dependent (MUI) .RES file.

    /g1

    If /g1, is set, RC generates a MUI file if the only localizable resource being included in the MUI file is a version resource. If /g1 is not set, RC will not generate a MUI file if the only localizable resource being included in the MUI file is a version resource.

    /j loctype

    Localizable resource types RC places into the language-dependent (MUI) .RES file. If the /q option is also set, this option is ignored, and the information in the RC Configuration file takes precedence.

    /k overtype

    Overlapping resource types that RC places into both the language-neutral .RES and the language-dependent (MUI).RES files. The resource types that are specified by the /k option must be a subset of those that are specified by the /j option. For example, –J2 –J3 –K3 specifies that RC places resource type 3 in both the language-neutral and language-dependent (MUI) files. If the /q option is also set, this option is ignored, and the information in the RC Configuration file takes precedence.

    /q Mui.RCConfig

    An RC configuration file that follows the RC Configuration File format. The RC Configuration File format enables components to self-describe resource information such as resource versioning, MUI file path, resource types and items. This file specifies which resources go into the language-neutral .RES file and which resources go into the language-dependent (MUI) .RES file. This option, and the information provided in the RC Configuration file, override the command line options /j and /k.

    Notice that in none of that talk of splitting resources does it ever make any claims about changing the language of the "neutral" .RES file it creates as part of the splitting.

    Not that it wouldn't make sense for it to do that (since it took the time to cause a de facto change to the language of the resource by removing all of the language-specific information), but that work item would have some interesting consequences, which I can talk about more some other time, perhaps....

     

    This post brought to you by ຝ and ພ (U+0e9d and U+0e9e, a.k.a. LAO LETTER FO TAM and LAO LETTER PHO TAM)

  • Sorting it all Out

    Why I couldn't ever really do something like Twitter

    • 5 Comments

    So I was at the Woodland Park Zoo tonight, where Aimee Mann was playing.

    She has missed Seattle her last couple of times coming through this part of the country, so it was nice to see her and hear some of the songs from the new album, live.

    Though as venues go, it isn't my favorite.

    There are just way too many people there who seem much more interested in going to the zoo with their kids and getting to take them to a show that they can get into free than people who are actually interested in listening to music, you know?

    I don't mind kids. I used to babysit, I have worked in after-school programs and day care centers. I was even a nanny for a year!

    But taking them to see an Aimee Mann show means that you have to be okay if the occasional curse word comes out of her mouth, since some of the words do....

    Plus it was weird but the place I was sitting had a huge bunch of Marc Cohn fans rather than Aimee Mann fans, which was also a little distracting....

    I ran into Kevin and Keith there and ended up sitting with them which was cool. Apparently we all like Aimee's music (it must drive Cathy nuts, now that I think about it; sorry, Cathy!).

    I also learned something very important tonight.

    You see I wanted to get the setlist for the show but I forgot a pen.

    So I decided to use the facebook status updates from my phone to record them.

    What I discovered was that I would hate Twitter -- I can't even stand updating the status line this often!

    But it ended up being a good show. I'm going to have to keep listening to the new album to see if it gets me; I really did like some of the songs live that I hadn't gotten into before....

    The setlist:

    • Stranger into Starman
    • Looking for Nothing
    • Freeway
    • Phoenix
    • Great Beyond
    • Save Me
    • Wise Up
    • <an Elton John song I have blocked out>
    • Goodbye Caroline
    • Medicine Wheel
    • Deathly
    • Thirty One Today
    • How am I Different?

    There is potential here -- maybe I should head down to the Portland show where the venue will be less distracting to me, and see the show there....

     

    This blog brought to you by no Unicode character in particular....

  • Sorting it all Out

    Is that character in the font or isn't it?

    • 4 Comments

    Regular reader Yaytay asks over in the Suggestion Box:

    How can I find out, from code, (reliably and completely:-) ) which fonts support a given character?

    I've tried using GetGlyphIndices, but there are still some fonts that return non-zero values for glyphs they don't have.

    My comment from your blog here:
    http://blogs.msdn.com/michkap/archive/2007/01/31/1563080.aspx

    The utility converts a regex into a list of unicode characters (based on their name) and then when you select one of those characters it displays that character in all installed fonts that support the character.

    It is based on the GetGlyphIndices function.

    Unfortunately some fonts return a non-zero value for a given character, but don't actually support it (displaying the default rectangle when used).

    Is there a more reliable way to determine support for a code-point in a font?

    When I've got this utility working more reliably I'll make it available if anyone wants it.

    Rather than using GetGlyphIndices, I always found myself using GetGlyphOutline, instead.

    The bias probably comes from the work I did in MSLU to support GetGlyphOutlineW on Win9x platforms, that I mentioned a few years back in Getting all of the localized names of a font, to date the blog that still gets the most U+fffd-filled spam comments....

    Though if GetGlyphIndices is mapping code points to the notdef glyph, then GetGlyphOutline might be as well. So this may or may not be a solution.

    Perhaps you could take the hint from Getting all of the localized names of a font and grab the CMAP directly.

    Or even better try by a different route -- via a ScriptGetCMap call? Though this could get expensive across all of Unicode, across all fonts.

    But as the PSDK topic describes all of the information about dealing with the default glyph (aka the NOTDEF glyph), at least the function has given things some thought:

    This function can be used to determine the characters in a run that are supported by the selected font. The application can scan the retrieved glyph buffer, looking for the default glyph to determine characters that are not available. The application should determine the default glyph index for the selected font by calling ScriptGetFontProperties.

    The return value for this function indicates the presence of any missing glyphs.

    Some code points can be rendered by a combination of glyphs, as well as by a single glyph, for example, 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if the font supports the capital E glyph and the acute glyph, but not a single glyph for 00C9, ScriptGetCMap shows that 00C9 is unsupported. To determine the font support for a string that contains these kinds of code points, the application can call ScriptShape. If the function returns S_OK, the application should check the output for missing glyphs.
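
    The "scan the glyph buffer for the default glyph" step described there can be sketched in a few lines. This is only the scanning logic in Python with made-up data; the glyph buffer would really come from ScriptGetCMap and the default glyph index from ScriptGetFontProperties, neither of which is called here. The function name and sample values are hypothetical, and the sketch assumes one glyph index per character (no surrogate pairs):

```python
def unsupported_characters(text, glyph_indices, default_glyph):
    """Return the characters whose glyph index equals the font's default glyph."""
    return [ch for ch, glyph in zip(text, glyph_indices) if glyph == default_glyph]

# Hypothetical data: pretend the font maps 'A' and 'B' but not U+0990.
sample_text = "A\u0990B"
sample_glyphs = [36, 0, 37]   # made-up glyph indices
sample_default = 0            # made-up default glyph index
missing = unsupported_characters(sample_text, sample_glyphs, sample_default)
```

    Note that this does nothing for the 00C9 case in the last quoted paragraph, where a character may be renderable from several glyphs even though the direct lookup reports it as missing; that still needs the ScriptShape route.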

    Kind of gives a roadmap to how to think about the problem, and inspires some confidence that they are on the right track. :-)

     

    This post brought to you by (U+fffd, a.k.a. REPLACEMENT CHARACTER)

  • Sorting it all Out

    On installing and removing fonts, Part 1: Do I know you, or some version of you at least?

    • 7 Comments

    Previous blogs in this series:

    0: A long journey begins with the zeroeth step

    One of the first things people do when they enter the room is make themselves known.

    If everyone knows everyone else they don't even need names, otherwise they give their name.

    If they are twins (and this has happened to me, within the last few years!) the twin who is there will identify which "version" they are.

    The version thing is pretty important, as you can see.

    Now version information is pretty straightforward in Windows....

    Hell, I can't even say that with a straight face.

    You can look at Raymond Chen's Blog, and blogs like this one and that one and this other one if you want a hint about how complicated it is.

    If you are internal to MS you might have seen the dozens of small utilities put up by developers and testers over the years, almost all of which properly update at a maximum just one of the two items to update (the string version of the resources and the binary one), and some did not even get one of them right.

    So version stuff on binaries is tough in Windows.

    But tough as it is, it is actually a lot easier than the version information in fonts, unless you are writing an application that walks through the font file data as its purpose in life.

    Because in order to get the font version information, you have to walk through font file data.

    Specifically, you are looking for the name table.

    This table, this name table, has an entry in it that specifies the version.
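
    For the curious, a rough sketch of that walk in Python. It assumes a single-font sfnt file (TrueType/OpenType, not a .ttc collection), relies on name ID 5 being the version string entry, and does no bounds checking; the function name and the Latin-1 fallback for non-Unicode platforms are simplifications made up for this example:

```python
import struct

VERSION_NAME_ID = 5  # name ID 5 is the "version string" entry in the name table

def read_version_strings(font_bytes):
    """Return every version string (name ID 5) found in an sfnt font's name table."""
    # Offset table: 4-byte sfnt version, then a 2-byte table count.
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    name_offset = None
    for i in range(num_tables):
        # Each 16-byte table record: tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i)
        if tag == b"name":
            name_offset = offset
            break
    if name_offset is None:
        return []

    _format, count, string_offset = struct.unpack_from(">HHH", font_bytes, name_offset)
    results = []
    for i in range(count):
        platform_id, _enc, _lang, name_id, length, offset = struct.unpack_from(
            ">HHHHHH", font_bytes, name_offset + 6 + 12 * i)
        if name_id != VERSION_NAME_ID:
            continue
        start = name_offset + string_offset + offset
        raw = font_bytes[start:start + length]
        # Windows (3) and Unicode (0) platform strings are UTF-16BE; treating
        # everything else as Latin-1 is a simplification for this sketch.
        encoding = "utf-16-be" if platform_id in (0, 3) else "latin-1"
        results.append(raw.decode(encoding, errors="replace"))
    return results
```

    A font can carry several such records (one per platform and language), so the function returns a list rather than a single answer.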

    The entry might just be 1.0. Or it might be 1.0.3. Or it might be 867.5309. Or even 3.1415926.

    Crap, it could be Version 3.14. Or it could use the word Versio, or Versión, or Versione. You get the point.

    Or really anything -- it is a string, after all.

    Maybe we can hope for valid characters....

    The exact format of the string can be anything, and a font vendor/foundry can even change the format they use between versions, if they want. If they do they might even have a good reason for wanting to do that. But even if they did not, they can still do it.

    Hell, two different fonts from two different places can have the same name and the first of many comparisons you might do is with the name and the version. Just dig right in, like I did in Getting all of the localized names of a font but perhaps much more efficiently. :-)

    If you are building something that has the job to install a font and you need to know if a font already there is a prior version or a later version, you have to be able to parse that information so you can compare the two (and decide which one is newer).
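
    One naive way to attempt that parse, sketched in Python. It grabs the first dotted run of digits and compares component by component; both of those are assumptions made for the example, not rules any font format guarantees, and the function names are made up:

```python
import re

_NUMERIC_RUN = re.compile(r"\d+(?:\.\d+)*")

def parse_version(version_string):
    """Extract the first dotted run of digits as a tuple of ints, or None."""
    match = _NUMERIC_RUN.search(version_string)
    if match is None:
        return None
    return tuple(int(part) for part in match.group().split("."))

def is_newer(candidate, installed):
    """True/False when both strings parse; None when the comparison is undecidable."""
    a, b = parse_version(candidate), parse_version(installed)
    if a is None or b is None:
        return None
    return a > b
```

    This copes with "Version 3.14", "Versión 1.0" and "867.5309", but it quietly decides that 1.10 is newer than 1.9, which is wrong for any foundry that means the number as a decimal fraction. And it gives up entirely (returning None) when the string has no digits at all.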

    I find myself grateful about the fact that the twins I referred to previously were in two different disciplines -- development and test -- and therefore I did not need fancy heuristics to tell them apart; merely listening to what their opinions were would often quickly identify the one present, if the meeting itself did not!

    No such boon can be assumed with fonts. Your generic awesome font installation code needs to assume the worst.

    Since the worst can indeed happen.

    Okay, this seems like a fair enough start, hinting at inner depths. Next time I'll dig into some of those depths a bit....

     

    This blog brought to you by ｆ (U+FF46, aka FULLWIDTH LATIN SMALL LETTER F)

  • Sorting it all Out

    Oceania is at war with WM_SETTINGCHANGE; Oceania has always been at war with WM_SETTINGCHANGE

    • 9 Comments

    The question that came to a managed code alias was easy enough to see:

    How can I either (1) detect that a system environment variable has changed, or (2) get the current system-wide setting of an environment variable (not necessarily the value of the variable as it was set when my app was launched)?

    Windows apps are notified via the WM_WININICHANGE, but I’m looking for a way to do it in a .NET app.

    I'll deal with the second paragraph first -- why is it that everyone wants to do stuff the ".NET way" even when asking a purely Windows question? Even when there is a 100% managed way, one has to realize that under the covers there will be numerous native things going on, whether pinvokes or other. Is this a naive code purity thing, a desire to eat the steak without thinking about the cow?

    Okay, we'll move past that. Someone ignored the managed-only piece and suggested they take a look at the WM_SETTINGCHANGE message, but there was a less than positive response to that idea:

    Thanks for the response.

    I don’t know how the... link will help---when I change an environment variable, the WM_WININICHANGE message is sent to all top-level windows (tested on Vista), and not the WM_SETTINGCHANGE message.

    Also, one question I forgot to ask is: if I only can capture some notification that the system environment changed, then that’s not enough; how do I get the contents of the new system environment?

    I laughed when I saw that one, I mean what with the following defined in winuser.h since the late 90's:

    #define WM_WININICHANGE                 0x001A
    #if(WINVER >= 0x0400)
    #define WM_SETTINGCHANGE                WM_WININICHANGE
    #endif /* WINVER >= 0x0400 */

    Classic!

    Like that old Bill Tush skit with the teenagers who argue because one loves the words but hates the lyrics, while the other insists that the lyrics are awesome but the words suck.

    Only not contrived!
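
    For what it is worth, both halves of the original question can be sketched in a few lines. This is Python rather than .NET, the helper names are made up, and how the message actually reaches the code (a window procedure of some kind) is left out. It assumes the documented convention that environment changes are broadcast with an lParam string of "Environment", and that machine-wide variables live under the Session Manager registry key:

```python
WM_WININICHANGE = 0x001A
WM_SETTINGCHANGE = WM_WININICHANGE  # one value, two names, as in winuser.h

ENVIRONMENT_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment"

def is_environment_change(message, lparam_text):
    """True when a broadcast says the environment block changed."""
    return message == WM_SETTINGCHANGE and lparam_text == "Environment"

def read_system_variable(name):
    """Read the current machine-wide value from the registry (Windows only)."""
    import winreg  # imported lazily so the module still loads elsewhere
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ENVIRONMENT_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, name)
    return value
```

    Checking for either message name is the same comparison, which is the whole joke. Values stored as REG_EXPAND_SZ come back unexpanded from this read, so a real version would need to expand them itself.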

    In fairness, the WM_WININICHANGE message and the WM_SETTINGCHANGE message don't go out of their way to help here -- they in fact seem designed to do the opposite, really.

    You know, like they put on their Eagles hats and sing "Relax, said the night man, we are programmed to deceive" without realizing they have the words wrong; they don't realize the topics are writing checks that the header files can't cash....

    Whether one looks at the amusing text in the WM_WININICHANGE message:

    An application sends the WM_WININICHANGE message to all top-level windows after making a change to the WIN.INI file. The SystemParametersInfo function sends this message after an application uses the function to change a setting in WIN.INI.

    Note  The WM_WININICHANGE message is provided only for compatibility with earlier versions of the system. Applications should use the WM_SETTINGCHANGE message.

    With a claim that the wParam is not used and just a small message about the lParam:

    A pointer to a string containing the name of the system parameter that was changed. For example, this string can be the name of a registry key or the name of a section in the Win.ini file. This parameter is not particularly useful in determining which system parameter changed. For example, when the string is a registry name, it typically indicates only the leaf node in the registry, not the whole path. In addition, some applications send this message with lParam set to NULL. In general, when you receive this message, you should check and reload any system parameter settings that are used by your application.

    All of which is most amusing when you look at the very different text in the WM_SETTINGCHANGE message that gives numerous details about the values of both parameters.

    Though ultimately the funniest part is the requirements listed for both messages:

    Client Requires Windows Vista, Windows XP, or Windows 2000 Professional.
    Server Requires Windows Server 2008, Windows Server 2003, or Windows 2000 Server.

    This is something that makes very little sense given just the header file alone, let alone the pragmatic knowledge between the two topics and the past knowledge of many a Windows developer. :-)

    The language of the documentation, it can be its own dialect of English at times....

     

    No Unicode character wanted to support this blog except some of the old ones that were removed from Unicode as a part of the 10646 merger

  • Sorting it all Out

    Developers are really not generally ones for spontaneous make-out sessions (aka Code jocks can talk themselves out of anything)

    • 11 Comments

    Just got home a little while ago, and I am definitely BWI (blogging while intoxicated). This is something that a friend of mine warned me about, but I think it'll be okay.

    The only real risk is that sometimes you can have an idea that seems like a really good one at the time (since you are drunk) but once you are sober you realize was not as good as you thought it was.

    Kind of the beer goggles approach to blogging for software developers in that weird place that I call:

    Not too drunk to write code, but way too drunk to be checking code in.

    Many developers (as well as some testers and program managers) who have similarly blurred work/life balances will know what I am talking about.

    Anyway, I was with some friends at a club and we ended up having a very random conversation.

    We were at the bar, so those things can happen.

    Someone else who was getting a drink was telling her friend that she had seen What Women Want the other day (that movie I talked about back in I am 20 out of 21 and flexible on the capital punishment issue) and she was asking her friend if she ever was sitting with someone who they suddenly found themselves making out with.

    They got their drinks and left, but the question lived on with us.

    I pointed out it has happened to me, though I don't usually remember initiating anything -- I am much more of an "almost initiate but back off at the last minute" kind of guy, which allows it to either happen eventually or never happen, depending on the preference of the other person.

    While still taking the first -- potentially embarrassing -- step of admitting interest.

    The other times that spontaneous make out sessions happened, it was either completely the other person, or maybe no one initiated and it just happened. Like spontaneous combustion or something.

    And no, this is not a "being drunk" kind of thing -- I learned years ago to keep it holstered when drinking; it is just better for everyone.

    For example, I didn't make out with anyone tonight. :-)

    It reminds me of an incident from nearly two decades ago -- I was at a party at Johanna's house, the last party where I ever seriously drank beer. I had way too much, and I was sitting with Johanna out on the stoop. Suddenly I realized something:

    Michael: Jo, can I ask you something?
    Johanna: Sure.
    Michael: I have to throw up now. Should I (a) go inside to the toilet, (b) go behind the bushes over there, or (c) do it right here on the sidewalk?
    <<pause while Johanna, who was also pretty drunk, thinks about this>>
    Johanna: If you do it out here then people will get it on their shoes later when they leave. But if you try to go inside you may not make it due to lines. I'd go with the bushes.
    Michael: Very sound reasoning. Will you excuse me for a moment?
    Johanna: Certainly.
    <<pause while I go off to throw up>>

    Now what was most odd about this was the way that a pressing need/want to do something came up, yet there was a surrealistic pause to analyze the issue and weigh options. Who the hell takes the time, or at least spends the time they have in that particular way?

    But have you ever found yourself in one of those situations where you are irrevocably committed to that kiss that you know will become a make-out session in a real Liz Phair Why Can't I? sort of way (ref: here and here), yet you take the time beforehand to analyze it with the other person?

    I am not sure what the hell this is -- I mean the only thing you can really accomplish here is to talk yourself or the other person out of it. Maybe it is intentional auto-sabotage? But it has happened, and it kind of makes me think that these rare spontaneous make-out sessions weren't my idea (since if they were I wouldn't be trying to talk anyone out of anything).

    Though it isn't like this is such a common occurrence that I have a real statistical universe from which to draw conclusions.

    But I was having lunch with a friend the other day, and I remember at a previous lunch she related something like this happening at a party. Though she hadn't mentioned any attempts to talk anyone out of anything so I suspect that maybe this is just me (or anyone who thinks themselves generally unworthy?).

    The decision of the group of people (my friends who were there and the people sitting around us who got into the conversation) was that for most people it only happens when one or both of the people involved have been drinking, and there are seldom huge conversations beforehand. For normal people the spontaneous make out sessions are rare but when they do happen they are truly spontaneous make out sessions.

    The secondary conclusion (based on my experiences and one other guy's -- a guy who was also a software developer) was that software developers are the only ones who would make the mistake of talking their way out of it, of snatching defeat from the jaws of victory and wearing it like a shawl. And that we should really try to work on that.

    I agreed to take it under advisement (though the situation is not all that common these days so it feels like a fairly theoretical point)....

    So how about regular readers -- any spontaneous kissing with people you haven't kissed before?

    And if so, is it really spontaneous or do you do your damnedest to talk your way out of it first?

    And finally, are you a software developer? :-)

     

    This blog brought to you by ? (U+003f, aka QUESTION MARK)
