
August, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    UCS-2 --> UTF-16, Part 0: The intro, sans content

    • 14 Comments

    Okay, this blog is going to serve as a warning that a whole bunch of blogs in this Blog are about to happen about a particular topic.

    The topic is one I have kind of talked about before.

    The difference in software between UCS-2 and UTF-16, and what is involved with migrating code that "covers" the former to code that covers the latter.
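
    To make the difference concrete, here is a minimal Python sketch (not from the original post; the helper names are made up for illustration). It shows that a character outside the Basic Multilingual Plane takes two UTF-16 code units, a surrogate pair, which code written with UCS-2 in mind would count and index as two separate "characters".

```python
def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of text as integers (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_surrogate(unit: int) -> bool:
    """True if a 16-bit code unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


bmp = "A"              # U+0041, fits in a single 16-bit unit
beyond = "\U0001D163"  # U+1D163, outside the BMP

print(len(utf16_code_units(bmp)))                  # 1
print(len(utf16_code_units(beyond)))               # 2
print([hex(u) for u in utf16_code_units(beyond)])  # ['0xd834', '0xdd63']
```

    Code that assumes "one 16-bit unit is one character" is fine for the first string and quietly wrong for the second, which is roughly the gap a migration has to close.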

    Now the reason that this difference is interesting is that just about everyone who is asking the questions (and there seem to be a lot of them, especially these last couple of weeks!) is handicapped by several issues, from incorrect assumptions about what works for them today, to an inaccurate picture of what they need to do to fix the problems, to inappropriate plans for the scope of the work.

    It's a mess, it really is. I'm actually even going to change some of the content of a training presentation that is coming up to cover this topic a bit more, too.

    Maybe I'll even mention this series! :-)

    Anyway, consider this the content-free introduction to this exciting series.

    If you are one of the people currently looking at this problem and are doing so with the unreserved joy one might feel for the removal of an impacted wisdom tooth, then this series is here especially for you! :-)

     

    All of the characters in Unicode are taking the long weekend off. I'll see if some of the non-characters stuck in town might want to sponsor....

  • Sorting it all Out

    What's in a name?

    • 13 Comments

    One of the core tenets of globalization and localizability of software is that making assumptions in formatting information will lead to bugs and limitations that will keep people in other cultures from properly using the software.

    There are two sides to this.

    On the globalization side, there is (for example) the formatting of numbers, dates, and times. There is the sorting of lists, and so on.

    On the localizability side, there are (for example) assumptions about word order in inserts that would violate the grammar of the target language (leading in many cases to grammatically poor sentences in the target language in order to accommodate the badly placed inserts).

    Then there are examples that actually span both globalization and localizability, like the names of people.

    I can't imagine what people do when they have to enter their name in an online form that insists on a name that is made of a single word first name, possibly a middle initial, and a single word last name -- none containing any punctuation.

    Right in Windows International we have many examples of names that violate such simplistic rules (rules which, though easing the complexity of software development and database storage, blithely ignore the reality of names throughout the world).

    Take for example Group Manager Jan Roelof Falkena.

    His last name is Falkena.

    Now in Jan Roelof's own words, "The use of double names (without hyphens) is fairly common back home."

    Thus his first name is not merely Jan, any more than Captain Jean-Luc Picard's first name is Jean. Putting Jan R. Falkena in such a form would be ridiculous, and not at all how his parents or he would have wanted his name expressed.

    Or take Test Lead Gerardo Villarreal Guzman.

    His first name is Gerardo.

    His last name is derived half from his father's name (Villarreal) and half from his mother's (Guzman). The hyphen is not used between these two halves, and the name itself becomes an interesting symbol of what singer/songwriter Gavin DeGraw referred to as "the birth of two souls in one". Which in my opinion is actually kind of a nice thing, culturally speaking.

    Now coming to the USA and knowing how inflexible so many processes are about names, he might easily have been willing to simply go by Gerardo Villarreal and saved himself the grief (that is, for example, his name on Facebook), though the fact that Gerardo Villarreal Guzman is the name on his passport made that much more problematic for the company address book and other such places.

    To extend this a little bit, Gerardo Villarreal Guzman is married to Hortensia Ortiz Roffe.

    Their children are:

    • David Villarreal Ortiz
    • Paola Villarreal Ortiz

    Now the dropping of the maternally derived surnames from both parents' names is common and, if you think about it, is one of the only ways to really scale names across many generations, as I am sure neither David Villarreal Guzman Ortiz Roffe nor Paola Villarreal Guzman Ortiz Roffe would be terribly happy having to fill out forms with their names in them! :-)

    Though interestingly, when the names are more well-known due to political or economic or cultural influences the full name sometimes is retained, and in that case hyphenated -- thus if Gerardo were famous his children might be David and Paola Villarreal-Guzman Ortiz, or alternately if Hortensia were famous might have led to their names being David and Paola Villarreal Ortiz-Roffe.

    Though one could take such a practice with a cynical eye and look at it as a form of snobbery, I'd rather give such a practice a more culturally kind eye and look at it as just remembering identities that could have unique significance to others in the future.

    Even the other names mentioned above cause trouble, from the fictional hyphenated French name Jean-Luc Picard (who would have to deal with the indignity of the Risa planetary computer system not allowing the hyphen) to the singer/songwriter Gavin DeGraw (who might sometimes be forced to go by Degraw due to a system not remembering the case of letters in the name -- which sucks -- or worse, titlecasing -- which also sucks).

    And then there are readers of this blog like Gé van Gasteren and Jeroen Ruigrok van der Werven, both having names that would confound these systems.
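
    As a rough illustration of how these systems go wrong, here is a small Python sketch. The validation pattern is an assumption invented for this example (no particular form or product is implied); it encodes the "single word first name, optional middle initial, single word last name, no punctuation" rule, and the loop also shows what blind titlecasing does to each name.

```python
import re

# Hypothetical "simple" rule: First [M.] Last, ASCII letters only, no punctuation.
NAIVE_NAME = re.compile(r"[A-Z][a-z]+( [A-Z]\.)? [A-Z][a-z]+")


def naive_accepts(name: str) -> bool:
    """True if the whole name fits the naive First [M.] Last pattern."""
    return NAIVE_NAME.fullmatch(name) is not None


names = [
    "John Q. Smith",
    "Jan Roelof Falkena",
    "Gerardo Villarreal Guzman",
    "Jean-Luc Picard",
    "Gavin DeGraw",
    "Gé van Gasteren",
    "Jeroen Ruigrok van der Werven",
]

for name in names:
    # str.title() uppercases the first letter of every word and lowercases the rest.
    print(f"{name!r}: accepted={naive_accepts(name)}, titlecased={name.title()!r}")
```

    Only the first name passes the check, and titlecasing turns DeGraw into Degraw and van der Werven into Van Der Werven. Storing the name exactly as entered, in fields that do not assume a shape, avoids both problems.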

    Or the way Japanese names are usually given in the form <family name> <given name>, well other than the imperial family.

    Or the different practices used in North and South India (the latter often not including a surname).

    The list could go on for hours -- I could have even included more specific examples like I did with Gerardo and Jan Roelof if I had more time to ask people for permission to "use" their names for more extended analysis.

    The fact is, the simplified structure of names "used in the United States" is kind of a lie anyway since many of these people live in the US.

    And thus this, while falling under the theoretical heading of a localizability issue, is probably better thought of as an issue that is important independent of the need to prepare for localization, since this flexibility is required even in products that are not being localized, or in non-localized versions of products.

    Though it is also important in localization, so that localizers can reposition controls to meet the most common expectations for a target language.

    Which I guess gets back to answering the question What's in a name?

    Respect, or the lack thereof....


    This blog brought to you by ㍻ (U+337b, aka SQUARE ERA NAME HEISEI)

  • Sorting it all Out

    A serious lack of overlap with 64

    • 10 Comments

    So it started over on the Windows Experience Blog's blog Watch NBC’s coverage of the Beijing Olympics in Windows Media Center.

    Some eager folks who downloaded the client were dismayed to see it was 32-bit only. Brandon (the blog's author) mentioned in comments that there may be news on this in the future, so perhaps there is eventual hope.

    Garrett McGowan summed up my feelings nicely:

    Awesome! Except...after downloading the 10MB client, I'm informed that 'Sorry, 32-bit Windows Vista only.'

    Foot, meet bullet.

    I'm serious about 64-bit. I've been running 64-bit on my primary home machine since February 2007. I put up with the early, shaky device drivers. I put up with the late availability of Windows Live software and the incredibly late Windows Home Server Client. Actually no, I gave up on that one, along with Creative's X-Fi full-feature drivers (they finally dropped this week). But now this.

    When is the 'ecosystem' going to take 64-bit seriously?

    I agree 100%.

    I had one of my own situations like that recently. I had finally decided to dump the 50 GB of space I had set aside for the 32-bit version of Vista on my MacBook Pro and just go full 64-bit and use the space for other purposes.

    Soon after, I had installed many language packs, and subsequently made a sad discovery.

    Language Interface Packs were not being made available for 64-bit.

    "How could that be?" I wondered. I mean, the build process for them is the same as for the Language Packs that are available for 64-bit. The only difference is that they end up building a bit faster since by definition they have fewer language resources in them. I knew they could easily exist; I must be mistaken in my investigation that seemed to indicate they weren't available.

    But it was no mistake -- someone had deemed that the overlap of people running on 64-bit hardware and the people who need Language Interface Packs because they need Windows in their native language was apparently not high enough to really merit the resources to release the 64-bit LIPs.

    So there was no reason for the official build machines to build them since they weren't being released like the 32-bit ones were.

    And there was no specific reason to test them since they weren't being released like the 32-bit ones were.

    Lame.

    Though I suppose this is a trend that can be easily reversed when/if the determination is made that the overlap of hardware and language makes it more sensible, strategically.

    This is hardly isolated, mind you. There are way too many examples where people just don't think 64-bit is needed. At least not yet.

    But then like many I wonder how all of the predictions about the end of 32-bit on the horizon can be true when there are so many examples like this happening.

    In the meantime, we won't be watching the Olympics on our 64-bit machines while we are in India, which is maybe okay because the Olympics viewing is currently supported in the US only anyway.

    But that is a rant for another day....

     

    This blog brought to you by 𝅘𝅥𝅱𝅁 (U+1d163 U+1d141, aka MUSICAL SYMBOL SIXTY-FOURTH NOTE and MUSICAL SYMBOL SIXTY-FOURTH REST)

  • Sorting it all Out

    On being water soluble

    • 8 Comments

    I probably stopped liking milk by itself around the time that I was 10, and stopped liking it even in cereal and Kraft Mac & Cheese by the time I was 20.

    It just never seemed cold enough -- so it tasted like it was going bad or something.

    Hard to explain it, really.

    Anyway, soon after that I started not liking water, either.

    At first I switched to bottled water or nothing, which was less common then so usually I just had something other than water.

    Eventually even the bottled water just wasn't cutting it.

    I stopped drinking water before I hit 25, though I drank very little after I was 22 anyway.

    You know at restaurants they always give you water, even if you don't want it.

    Just walking in, someone who doesn't even talk to me gives me a glass of water that I'm not gonna drink.

    I can refuse it, but no water glass just begs for some random server to get you a glass of water.

    If you leave the glass there empty people keep trying to fill it, assuming I'm just a huge water drinker.

    Even turning the glass upside down doesn't help; someone with a pitcher will pass by and turn it over to fill it.

    I finally just gave up; they fill the glass, and I just don't drink it.

    People would ask about it, and I would just tell them that I am water soluble.

    They would be confused.

    So I'd ask them if they ever saw The Wizard of Oz.

    Of course they would say yes, so I'd ask if they remembered the thing with the Wicked Witch of the West and the pail?

    That witch, I'd explain, was water soluble.

    This would lead to various reactions, usually amusement or bemusement.

    Occasionally people would be very clever and point out how much it must suck to live in Seattle.

    Other times I'd say it -- part of the shpiel, you know?

    I'd explain it is not entirely a blocking issue -- it just makes things a little messier....

    In recent times, I have found myself with people around me who consider this whole aspect of my personality to be kind of a health issue.

    They aren't all girlfriends or anything, but they do seem to have that "mother-of-all-the-living" complex and they are concerned that I do my best to make sure I drink nothing but Limonata.

    So they encourage me to drink water.

    I do now occasionally have water.

    And each time I do it I make sure to mention it.

    Because I hate the way it tastes, and if I have to suffer I at least want to get the credit for having done it....

    Now if I were a billionaire and I had these odd quirks then people would consider me eccentric. They definitely would after I added a post-mix fountain containing Limonata, which I would do if I were a billionaire.

    Hell, I'd do that if it were less than that even.

    But anyway since I'm not a billionaire, I think this just makes me weird.

    In case you didn't think so already....

     

    This blog brought to you by (U+4ddc, aka HEXAGRAM FOR THE ABYSMAL WATER)

  • Sorting it all Out

    The super-cool panel about Windows, .NET, and SQL Server -- now live!

    • 6 Comments

    Last night, at the .NET Developer's Association meeting, I got to see Kimberly Tripp and her husband do a pretty awesome talk about SQL Server index internals and fragmentation. In the conversation after I was reminded of a whole pile of SQL Server blogs that I have to jump in and do. I'll have to start getting on that....

    There was also a bit of news earlier in the day which was also very nice.

    About the super-cool panel about Windows, .NET, and SQL Server, which at long last is now available!

    This is a panel that I was involved with while I was over at TechEd earlier this summer.

    The title?

    Internationalization and SQL Server: Sorting out Collations between Windows and .NET

    The title might sound difficult to navigate, but it isn't -- just plug in your high quality parser and go for it! :-)

    I originally pitched the idea of this panel and though Steven Seim seemed enthusiastic, the people tapped to do it showed varying lesser degrees of enthusiasm, probably based on the respective levels of busy-ness they each had to deal with that week. Though everyone seemed up for it by the time we were ready to go and kind of had the roadmap for the kind of stuff we wanted to cover.

    The idea came about by accident, originally based on a phone conversation between Goldie and me probably six months or longer before all this, about some of the issues that SQL Server handles so much better than Windows -- like multiple version handling and real backcompat support. At the time I realized how much better of a place SQL Server was than Windows. It was almost like it was evolutionary steps beyond Windows because of those scenarios. And how it worked with managed code (aka the Hosted CLR) was yet another dimension to consider. I thought it would be useful to talk about it....

    The original list of participants for the panel:

    • Mary Chipman acting as the moderator
    • Goldie Chaudhuri from the SQL Server side
    • Michael Kaplan (yours truly!) from the Windows side

    This was kind of the original plan for the panel, and I think it would have been a perfectly fine and useful presentation of information.

    But then, enter Kent Alstad from Strangeloop, and The Real World! :-)

    Kent was our wildcard, our man of mystery, the one who took the panel and really helped us turn it up to eleven -- made it about some real customer-facing issues that developers are all too often hit with, and hit with hard.

    Like I said, the original plan was a good one.

    But what the four of us did? It put the original plan to shame and turned "good" into "great" faster than you can say morebetter!

    The description of the session:

    The collation support in SQL Server has been based on Windows since 7.0, even though the data has never been the same between them in any version of SQL Server or Windows. This panel will look at the technical and philosophical differences between them and the effect on results, as well as the impact of the .NET Framework on both of them—and the effect of all of these various differences on developers of any or all of them.

    And this is up to and including how to get both the best possible results, and the most consistent ones (hint: the number 2008 may come up three times, genuinely!).

    You can see the session on the big list of panels and talks with all the others here:

    And you can download it as podcast or video -- a WMV, an MP4, or an MP3, and see and/or hear the whole thing.

    This was a lot of fun to do, and hopefully some people here will find it fun to watch!

    For those who are interested, the other video I was involved with (a conversation between Mary and me, entitled Beyond FxCop-style Internationalization in .NET) is also available up on that same site in both video and podcast. Though not as cool as the panel, it also has some interesting stuff in it....

    Enjoy! :-)

     

    This blog brought to you by (U+a31a, aka YI SYLLABLE SOP)

  • Sorting it all Out

    To some, the name might be the WRONG SINGLE QUOTATION MARK

    • 6 Comments

    In the last item in the Suggestion Box as of the time I wrote this blog, Gé van Gasteren asked in comments to A more usable Dutch keyboard that works properly?, over here and here:

    Thanks, Michael, for giving me the full treatment! Interestingly, the great job you did would look great in the MSKLC documentation, but 99% of it was wasted on me, because I had gone through all that, whereas 1% (one little remark) suggested a possible solution -- or as close to a solution as practically possible.

    But first re. the problem: What you describe certainly looks like it should work, and it does in Test Keyboard Layout. But when I generate the installer and actually install the layout, I get the problem I mentioned in my post: The quote key stops working as a dead key and produces two curly quotes with each keystroke.

    This does not happen when I don't assign U+2019 to it but the spacing acute U+00B4, possibly because that one is in the ASCII range (as I mentioned in a later post, added as a comment to the first suggestion).

    So if you have really installed the layout you created in your post and it worked correctly for you, there is something wrong with my XP setup, or the thing only works properly in Vista, or whatever.

    Now for the brilliant 1%:

    First you talk about switching off that 'brilliant quotes' feature, and right after that about calling product support. I guess that latter bit would be a long shot, a tall order, and what not.

    But that gave me this idea, much easier to implement and to get consent for:

    Microsoft should ship all Dutch-language software packages with the default for the smart quotes feature set to "disabled". Tadaa!

    This simple measure would make all non-typographs produce straight quotes ' and " when typing. Not beautiful, but correct. Those interested in typography would switch the feature ON, and would usually (hopefully...!) be interested enough to use the proper curly quotes in the special cases I mentioned.

    The only wish after that would be to have U+2019 more easily available, e.g. on Alt-Gr-quote. But I think you wrote somewhere that existing layouts are never (never!) changed, so I'll learn to live with that.

    So how to make such a suggestion to product support and make it stick?

    ----------

    Reading the series about Table Driven Text Service, there may be a better way (now or soon) to implement smart quotes:

    If it is possible for applications to switch such tables on and off, there could be a setting in an application called "Convert quotes", with options "On", "Let me choose", and "Off".

    A Dutch-language application could have "Let me choose" as the default, and at typing a quote, a choice box with ‘  ’ and ' could pop up.

    I'm not sure I understood it rightly (after reading all ten installments in one session, my brain is a bit frazzled by the Chinglish) that several tables can be active simultaneously and on top of each other like CSSes (e.g. one for auto-correcting, one for quotes, one user-defined, etc.) but that seems necessary to make this kind of switching practical.

    And, apart from making the "smart quotes" smarter, this approach has the advantage that it is customizable!

    Okay, first I'll start by pointing out that in the only version of the keyboard layout I can still find on my machine that I was playing with, I did not have U+2019 defined on the key itself.

    Though I did have it as the spacing version of the character at the bottom of the dead key table, as you can see here:

    http://www.trigeminal.com/images/dutchest.png

    So it may be that when I thought I was saying that there was no bug here, there may in fact be one -- I cannot keep from getting two ’ (U+2019) characters if I define the dead key on U+2019.

    Anyway, as I typed I was getting the right character showing up, so I was pretty much paying the most attention to that.

    Which points to the definite workaround -- in fact, though I did not change the "name" on the dead key, MSKLC supports changing it -- even to RIGHT SINGLE QUOTATION MARK, if you like -- so you can make it anything you like and essentially never even see the character, except for the people who are looking at WM_DEADCHAR messages, or calling functions like ToUnicode/ToUnicodeEx.

    For the most part, this means nobody. :-)

    The bug leads to the title of this blog, and the pseudo-rename of U+2019 to WRONG SINGLE QUOTATION MARK makes for a very nice linguistic back-formation, something that I don't think I have seen before....

    I'll make sure that other bug gets reported. It isn't as simple as being an ASCII only issue but there is a problem here so it needs to be figured out.

    Now there is a lot of other content here in terms of suggestions or thoughts, and I especially liked the really solid attempt to solve some of the problems related to smart quotes that have come up over time. Though I think they are interesting ideas, moving to use a text based TSF text profile would be a huge change for a lot of users, so there would need to be a really large number of people who needed this kind of functionality.

    I don't think we are really quite there yet in the Netherlands, even enough to ship an updated keyboard like the one suggested above, let alone a new TIP.

    Plus there is the lack of support for additional shift states which would definitely need to be addressed before anyone would even be willing to take a look at it!

    Then finally there were the suggestions about contacting product support.

    I do know one thing, for sure.

    If more of the customers who contacted me complaining about the "smarter quotes" feature in Word, or the other "feature" with CTRL+ALT shortcuts in Word that stomp on ALTGR characters (the feature that Marc Durdin dissected in an article I mentioned in The key to key messages is a key contribution), were to contact product support as well, then perhaps the folks on the Word team would have the impression that they should make it easier to alter these features than the current user interface allows. :-)

    I am going to play around with the TIP idea a bit, in any case. There are some really interesting possibilities that would be allowed here....

     

    This post brought to you by ’ (U+2019, aka RIGHT SINGLE QUOTATION MARK)

  • Sorting it all Out

    '-yet someone is clearly doing their job horribly wrong...'

    • 5 Comments

    There is an uncomfortable amount of truth in a somewhat offensive metaphor in the recent XKCD comic:

    Voting Machines

    I won't claim the analogy shouldn't get panned (as the tooltip kind of hints it will be, and I might agree perhaps it ought to be).

    Though it does help underscore an all-too-common problem in logic.

    And rationalization.

    Perhaps we could all strive to have more defensible chains of logic in our thought processes, rather than relying on the shaky logic of the unrelated precedent.

    This blog is not aimed at anyone in particular; the people guilty of such lapses "know" they are doing The Right Thing; it is why the guy hanging out in the shadows of The X-Files in the first episode (CSM) was also the one Mulder and Scully visited in the end of the series as well -- such people are survivors. :-)

     

    This blog brought to you by (U+2fae, aka KANGXI RADICAL WRONG)

  • Sorting it all Out

    The Bidi Algorithm's own SEP Field

    • 4 Comments

     

    There are many nice things that I can (and sometimes do) say about Unicode Standard Annex #9 (Unicode Bidirectional Algorithm), which I will call for the rest of this blog the UBA in order to avoid the repetitive and tiresome nature of "Unicode Bidirectional Algorithm". I know that it is not a pronoun, but saying all those nouns over and over again really does wear you down so whatever shortcut works. :-)

    Anyway, what was I talking about?

    Oh yeah, about how there are so many nice things that I can (and sometimes do) say about UBA.

    This blog is not about any of them.

    Instead, this blog is going to focus on two particular limitations that in my opinion make the UBA less useful in software.

    I am thinking mainly about Windows, but after listening to people who work on the Mac and in Linux I think this is really a platform agnostic set of issues.

    Now I know some people think the issues are with input, but really they aren't. I mean I mentioned in blogs like Mirroring and Keyboards are complicated but that isn't what makes this really hard for application developers, most of the time. And it isn't why applications have bad or inconsistent behavior, by and large.

    In fact, it is not the input itself that is to blame but the rendering -- so cursor movement and all that are interesting but most of it is okay often enough that people would probably not notice problems if other things weren't going on.

    Plus, those other items are kind of subject to some variability based on platform and expectations, so while recommendations are nice these are not the blocking issues.

    I am therefore going to be looking elsewhere.

    The two issues I am focusing on here are:

    • The influence of and lack of guidance about "higher level protocols", and
    • The inability to handle multilingual text by default
    Now these items are ones I started really jumping into with other blogs like Mixing it up with bidirectional text and The Bug(s) Spotted, aka Design flaws are worse than bugs and The mythical nature of bidirectional support, and where the wheels come off the wagon.

    The simple problem is best stated as:

    The Unicode Bidirectional Algorithm cannot handle text from both left-to-right and right-to-left languages together in the same line of text.

    That is it, right there.

    Sure the UBA has all of that hand-wavey text about "higher level protocols" but all they really did was create their own SEP field.

    You know what an SEP is, right?

    It's a Douglas Adams thing, so I'll let him explain it:

    An SEP is something we can't see, or don't see, or our brain doesn't let us see, because we think that it's somebody else's problem.... The brain just edits it out, it's like a blind spot. If you look at it directly you won't see it unless you know precisely what it is. Your only hope is to catch it by surprise out of the corner of your eye.

    This basically also explains why Unicode hasn't dealt with the issue, since they rely "...on people's natural predisposition not to see anything they don't want to, weren't expecting, or can't explain..." and talk about higher level protocols as a way of saying that someone else has to deal with it.

    But I can look at things like this:

    and this:

    and I know that there are quite a few inadequate somebody elses out there.

    Even my Mac runs into those same problems. Even when the text is plain:

    http://www.trigeminal.com/images/TextEditBidi.png

    The section in the UBA about Higher Level Protocols shows how much clients are left on their own to figure stuff out:

    4.3 Higher-Level Protocols

    The following clauses are the only permissible ways for systems to apply higher-level protocols to the ordering of bidirectional text. Some of the clauses apply to segments of structured text. This refers to the situation where text is interpreted as being structured, whether with explicit markup such as XML or HTML, or internally structured such as in a word processor or spreadsheet. In such a case, a segment is span of text that is distinguished in some way by the structure. 

    HL1. Override P3, and set the paragraph embedding level explicitly.
    • A higher-level protocol may set the paragraph level explicitly and ignore P3. This can be done on the basis of the context, such as on a table cell, paragraph, document, or system level.
    HL2. Override W2, and set EN or AN explicitly.
    • A higher-level process may reset characters of type EN to AN, or vice versa, and ignore W2. For example, style sheet or markup information can be used within a span of text to override the setting of EN text to be always be AN, or vice versa.
    HL3. Emulate directional overrides or embedding codes.
    • A higher-level protocol can impose a directional override or embedding on a segment of structured text. The behavior must always be defined by reference to what would happen if the equivalent explicit codes as defined in the algorithm were inserted into the text. For example, a style sheet or markup can set the embedding level on a span of text.
    HL4. Apply the Bidirectional Algorithm to segments.
    • The Bidirectional Algorithm can be applied independently to one or more segments of structured text. For example, when displaying a document consisting of textual data and visible markup in an editor, a higher-level process can handle syntactic elements in the markup separately from the textual data.
    HL5. Provide artificial context.
    • Text can be processed by the Bidirectional Algorithm as if it were preceded by a character of a given type and/or followed by a character of a given type. This allows a piece of text that is extracted from a longer sequence of text to behave as it did in the larger context.
    HL6. Additional mirroring.
    • Characters with a resolved directionality of R that do not have the Bidi_Mirrored property can also be depicted by a mirrored glyph in specialized contexts. Such contexts include, but are not limited to, historic scripts and associated punctuation, private-use characters, and characters in mathematical expressions. (See Section 6, Mirroring.)

    Clauses HL1 and HL3 are not logically necessary; they are covered by applications of clauses HL4 and HL5. However, they are included for clarity because they are more common operations.

    As an example of the application of HL4, suppose an XML document contains the following fragment. (Note: This is a simplified example for illustration: element names, attribute names, and attribute values could all be involved.)

    ARABICenglishARABIC<e1 type='ab'>ARABICenglish<e2 type='cd'>english

    This can be analyzed as being five different segments:

    1. ARABICenglishARABIC
    2. <e1 type='ab'>
    3. ARABICenglish
    4. <e2 type='cd'>
    5. english

    To make the XML file readable as source text, the display in an editor could order these elements all in a uniform direction (for example, all left-to-right) and apply the Bidirectional Algorithm to each field separately. It could also choose to order the element names, attribute names, and attribute values uniformly in the same direction (for example, all left-to-right). For final display, the markup could be ignored, allowing all of the text (segments 1, 3, and 5) to be reordered together.
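
    To make the HL4 idea a bit more concrete, here is a minimal sketch of splitting that fragment into the five segments listed above. The names Segment and splitSegments are hypothetical, and the splitting rule (everything between '<' and '>' is markup) is an assumption that ignores comments, CDATA, entities, and '>' inside attribute values. It only does the segmentation; running the Bidirectional Algorithm on each text segment is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One segment of structured text: either markup (a tag) or character data.
struct Segment {
    bool isMarkup;
    std::string text;
};

// Hypothetical helper: splits input into alternating text and markup segments.
// Anything from '<' through the next '>' is treated as markup.
std::vector<Segment> splitSegments(const std::string& input) {
    std::vector<Segment> segments;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (input[pos] == '<') {
            std::size_t close = input.find('>', pos);
            // An unterminated tag runs to the end of the input.
            if (close == std::string::npos) close = input.size() - 1;
            segments.push_back({true, input.substr(pos, close - pos + 1)});
            pos = close + 1;
        } else {
            std::size_t open = input.find('<', pos);
            if (open == std::string::npos) open = input.size();
            segments.push_back({false, input.substr(pos, open - pos)});
            pos = open;
        }
    }
    return segments;
}
```

    An editor showing the source could then lay out the segments left to right and reorder each text segment on its own, while a final display could drop the markup segments and reorder the remaining text together.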

    When text using a higher-level protocol is to be converted to Unicode plain text, for consistent appearance formatting codes should be inserted to ensure that the order matches that of the higher-level protocol.
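
    As a rough illustration of that last paragraph, the sketch below wraps a span in explicit embedding codes when flattening it to plain text. The function name embedSpan and the Direction enum are hypothetical, and the idea that the direction comes from something like a markup attribute is an assumption; the three code points are the LRE, RLE, and PDF characters the algorithm defines.

```cpp
#include <string>

// Direction that the higher-level protocol assigned to a span (assumed input).
enum class Direction { LeftToRight, RightToLeft };

constexpr char32_t kLRE = 0x202A;  // LEFT-TO-RIGHT EMBEDDING
constexpr char32_t kRLE = 0x202B;  // RIGHT-TO-LEFT EMBEDDING
constexpr char32_t kPDF = 0x202C;  // POP DIRECTIONAL FORMATTING

// Hypothetical helper: returns the span surrounded by the explicit codes
// that are equivalent to an embedding imposed by markup.
std::u32string embedSpan(const std::u32string& span, Direction direction) {
    std::u32string out;
    out += (direction == Direction::RightToLeft) ? kRLE : kLRE;
    out += span;
    out += kPDF;  // close the embedding so following text is unaffected
    return out;
}
```

    This only covers the simplest case of one span with one direction. Deciding where spans begin and end in real plain text is exactly the part that has no standard answer.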

    This information is so helpful that implementers can't even have their text look wrong in a consistent way -- every implementation has its own mistakes.

    Even in plain text, when the whole higher level protocol is arguable.

    And yes, you can solve all such cases with RLM and LRM and RLE and LRE and PDF, sure. But there is no standard on how to apply these in plain text, or on how to make the standard itself pass my own "smart as an 8-year-old" test (something those eight-year-olds can do in cases like the above and in harder cases like the ones in The mythical nature of bidirectional support, and where the wheels come off the wagon).

    Certainly some cases are exceptional, but the default case of mixed-language text is broken now.

    More importantly, the "islands of text of one language in a sea of another language" case is also broken. For no good reason, really.

    Perhaps the organization that Microsoft and all of these other big companies pay ten times the price of an Optimus keyboard a year to needs to start doing a bit of higher level work here, rather than passing the buck to random protocols.

    Because it is clearly our problem (and everyone else's)....

    Which makes it theirs! :-)


    This blog brought to you by U+200e and U+200f (aka LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)

  • Sorting it all Out

    Keyboard Layouts, everywhere!

    • 4 Comments

    I have been pointing to the website that has Windows keyboard layouts:

    http://www.microsoft.com/globaldev/reference/keyboards.mspx

    on it for some time.

    Now this is a great site, you may have seen it before:

    http://www.trigeminal.com/images/KeyboardLayoutInfo01.png

    Though it is not perfect.

    One big problem it has is actually described right there on the page.

    Do you see it?

    Here, I'll emphasize it:

    http://www.trigeminal.com/images/KeyboardLayoutInfo02.png

    Now you can see I'm running in FireFox here. That is kind of all I do now.

    Starting after a bizarre incident with an IE plug-in that hung IE7 dozens of times a day.

    I decided to live with it and be a good little serf and only gave up after the first time a 75% done blog was basically lost.

    I uninstalled IE7 and put in FireFox right after.

    Mostly a great experience, though the experience with Windows keyboard layouts was really less than ideal.

    Here, I'll show you:

    http://www.trigeminal.com/images/KeyboardLayoutInfo03.png

    So would you class this as a usability problem or an accessibility problem? :-)

    We found a web developer to fix it up but that didn't work out, so finally we decided to just do it all ourselves.

    After a bit of work we migrated everything to the GoGlobal site I have mentioned a few times before today until we were all set, and now it is live at

    http://msdn.microsoft.com/goglobal/en-us/bb964651.aspx

    Exciting?

    You can even shrink it way down to:

    http://msdn.microsoft.com/bb964651.aspx

    if you want, and now you get to this exciting new site:

    http://www.trigeminal.com/images/KeyboardLayoutInfo04.png

    and that site works everywhere.

    In FireFox:

    http://www.trigeminal.com/images/KeyboardLayoutInfo05.png

    and in Safari:

    http://www.trigeminal.com/images/KeyboardLayoutInfo06.png

    and in Opera:

    http://www.trigeminal.com/images/KeyboardLayoutInfo07.png

    and of course in Internet Explorer:

    http://www.trigeminal.com/images/KeyboardLayoutInfo08.png

    All of the bugs that had been plaguing these various browsers previously, including:

    • Empty display devoid of any discernable keyboard layout
    • Non-functional shift keys
    • No ToolTips (which would contain character names and dead keys)

    are all fixed now on this exciting new page on GoGlobal!

    Of course now I can once again use the page in my default browser and get good results.

    There is still one more thing to do and we'll be taking care of that soon.

    But in the meantime, enjoy this new page on GoGlobal, today. :-)

     

    No sponsors needed for this blog; I did this one pro bono....

  • Sorting it all Out

    A technology is worth: $0; A sample showing how to use it: $0; A debug-able sample: Priceless!

    • 4 Comments

    Any time someone from Microsoft talks about some exciting technology that is easy to use, there is often a good faith basis for you, the customer, to assume they might be blowing smoke up your ass.

    In fact, in most cases, you have a built-in affirmative defense you can use to defend yourself if they call bullshit on your claim of shenanigans.

    That defense is based on the simple fact that they usually don't include actual samples!

    If the technology is so easy that no one has time to put together a good sample where people can see the technology and understand it well enough to apply it, then it is obviously not so easy, and the claims to the contrary are from people who are so busy talking about how easy it is to use that they probably have never actually used it.

    Examples of this phenomenon that I have mentioned previously can be seen in TSF and Uniscribe, two technologies that if you ask me are harder than brain surgery.

    And I can state that with some authority, because I have witnessed several brain surgeries in a previous career, and all of them but one (a transsphenoidal resection of a pituitary tumor) were less complicated than full implementations supporting the features of either the Text Services Framework or Uniscribe. :-)

    Now previously one could have put MUI in that same category, since although it had all of that cool documentation I mentioned, it didn't have a sample.

    But then they added one. :-)

    You can see it described extensively here under the article entitled MUI Application Sample.

    And you can find it in the samples you get with the Vista and Windows 2008 SDK.

    And it puts the files right on your machine....


    Though not all is perfect if you use the project and solution files to build them through Visual Studio.

    Well, if you want to run and debug the project.

    To fix the problems, just right-click on the EN-US project node:

    Choose the Properties node.

    Yes, that node should have a ... after it since it launches a dialog, but I am not the UI police!

    Look under the Debugging item under Configuration Properties, like so:

    then you just have to change the Command option to

    $(SolutionDir)%(OutDir)\$(TargetFileName)

    and the Working Directory option to

    $(SolutionDir)%(OutDir)\

    like so:

    And then the sample should be able to be debugged from within Visual Studio. :-)

    Samples go a long way to proving the ease of a technology.

    But debug-able samples? Priceless!

     

    This blog brought to you by 𐑧 (U+10467, aka SHAVIAN LETTER EGG)

  • Sorting it all Out

    It's the End[UpdateResource] of the world as we know it

    • 4 Comments

    It was late last week when Maksim asked a very interesting question via email to one of those large aliases at Microsoft:

    SUBJECT: EndUpdateResource failing after adding certain number of items with UpdateResource

    Hi,

    It appears that there is a bug (or undocumented behavior anyway) with BeginUpdate/Update/EndUpdateResource functions.

    When I am adding more than certain number of resources this way, EndUpdateResource returns with error ERROR_INVALID_DATA. The exact count of items is not always the same and varies depending on the length of resource names and resource types that I have.

    After running several experiments I have discovered that the problem occurs according to following formula:

    (Cumulative Resource Names Length) + (Resources Count) * 25 + (Cumulative Resource Types Length) + (Resource Types Count) * 13 > 2040

    Can someone please say if there is a bug and if my assumed formula is correct? Or maybe there is some other workaround apart from doing EndUpdateResource after adding each resource.

    My source code is below, the dll where I updated resources is a simple dll without any code:

    #include "stdafx.h"
    #include <windows.h>
    #include <tchar.h>
    #include <cstdlib>
    #include <string>
    #include <iostream>

    using namespace std;

    // Builds a name of exactly `length` characters: 'X' padding followed by a random number.
    wstring MakeLongName(size_t length) {
          int randomNumber = rand();
          wchar_t buffer[65];
          ZeroMemory(buffer, sizeof(buffer));   // size in bytes, not characters
          _itow_s(randomNumber, buffer, 65, 10);
          wstring randomPart = buffer;
          length -= randomPart.length();
          wstring result;
          result.append(length, L'X');
          result.append(randomPart);
          return result;
    }

    int _tmain(int argc, _TCHAR* argv[]) {
          // Work on a copy so the original DLL stays untouched.
          CopyFile(L".\\testdll.dll", L".\\testdll1.dll", FALSE);

          // TRUE = delete any existing resources in the copy.
          HANDLE hLibrary = BeginUpdateResource(L".\\testdll1.dll", TRUE);
          if(hLibrary==NULL) {
                cout << "Failed to BeginUpdateResource. Error: " << GetLastError() << endl;
                return 1;
          }

          // Queue ten 100-byte resources of type "Y" with 230-character names.
          for(long i = 0; i < 10; i++) {
                BYTE data[100];
                ZeroMemory(data, sizeof(data));
                wstring longName = MakeLongName(230);

                if(! UpdateResource(hLibrary, L"Y", longName.c_str(), MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL), data, sizeof(data))) {
                      cout << "Failed to UpdateResource. Error: " << GetLastError() << endl;
                      EndUpdateResource(hLibrary, TRUE);   // TRUE = discard the queued changes
                      return 1;
                }
          }

          // FALSE = write the queued changes; this is the call that fails.
          if(! EndUpdateResource(hLibrary, FALSE) ) {
                cout << "Failed to EndUpdateResource. Error: " << GetLastError() << endl;
                return 1;
          }
          return 0;
    }

    I had not seen this come up before, but this is a function I have found interesting since all the way back when we added the resource updating functions to MSLU (described here).

    The answer to this particular riddle came from developer Paul:

    EndUpdateResource fails if it cannot extend the .rsrc section of your DLL. I’ve seen this happen if the .rsrc section isn’t the last section in the image – and that’s frequently the case (a few experiments show that .reloc usually follows .rsrc using the Microsoft linker). Annoyingly, LINK.EXE always seems to insert a .reloc section, even if you have a resource-only DLL. (The formula you discovered is an approximation for “the .rsrc section cannot be extended”.)
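
    For anyone who wants to play with Maksim's numbers, here is a small sketch of his formula. Keep in mind that, per Paul, it is only an empirical approximation of "the .rsrc section cannot be extended" and not a documented limit. The names ResourceId, estimatedCost, and exceedsObservedLimit are hypothetical, and counting each distinct resource type once is my reading of "Resource Types Count".

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// (type, name) of one resource that would be passed to UpdateResource.
using ResourceId = std::pair<std::wstring, std::wstring>;

// Left-hand side of the empirical formula from the email.
// Assumption: each distinct type is counted (and measured) once.
std::size_t estimatedCost(const std::vector<ResourceId>& resources) {
    std::set<std::wstring> types;
    std::size_t nameLength = 0;
    for (const ResourceId& r : resources) {
        types.insert(r.first);
        nameLength += r.second.length();
    }
    std::size_t typeLength = 0;
    for (const std::wstring& t : types) {
        typeLength += t.length();
    }
    return nameLength + resources.size() * 25 + typeLength + types.size() * 13;
}

// True when the estimate exceeds the 2040 threshold observed in the email.
bool exceedsObservedLimit(const std::vector<ResourceId>& resources) {
    return estimatedCost(resources) > 2040;
}
```

    For the sample above (ten 230-character names under the single type "Y") the estimate works out to 2300 + 250 + 1 + 13 = 2564, which is over the threshold. With only seven such resources it is 1799, which is under it.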

    Now as to whether this is a bug or by design....

    It really is by design.

    Twice.

    Now I am not going to dig into the format of PE files, since for that you can look at:

    to get the lowdown here.

    So for the first by design we'll look to the linker.

    When the Microsoft Linker (LINK.EXE) does its work, it makes a lot of sense that it puts the .reloc section last rather than the .rsrc section. The latter is more or less gunk that has already been compiled by the Microsoft Resource Compiler (RC.EXE) and that the linker does not really need to modify (it just has to align it), while the former is the section where it arguably has to do some of its hardest work to produce all of the relocation entries.

    Matt also has a less cynical reason he mentions in that second article:

    Working backwards from the end of the executable, if there is a .debug section in the OBJs, it's placed last in the executable. In the absence of a .debug section, the linker tries to put the .reloc section last because, in most cases, the Win32 loader won't need to read the relocation information. Cutting down the amount of the executable that needs to be read decreases the load time.

    Then for the second by design we'll look to the EndUpdateResource function and its cousins (BeginUpdateResource and UpdateResource), though really that first function I mentioned is the real bad boy here.

    While it does a bunch of work inside the .rsrc section, it doesn't start mucking around a whole bunch with the rest of the PE file. Reordering sections just falls a bit outside of its current beat, if you know what I mean.

    Paul had some thoughts about workarounds:

    If you have control over how “testdll1.dll” is created, you might be able to figure out how to manipulate the PE sections so that .rsrc always goes last. In my code, I was able to start with a hand-crafted resource-only PE file which had only a .rsrc section.

    Matt's first article gives some info on removing the .reloc section:

    If you do decide to remove relocations, there are three ways to do it. The easiest is to specify the /FIXED switch on the linker command line. Alternatively, you can run the REBASE program with the -f option on your executable. REBASE comes with the Win32 SDK. The third way to remove relocations is the new RemoveRelocations function in the Windows NT 4.0 IMAGEHLP.DLL. My sample code below shows how to use RemoveRelocations.

    Though to be honest this is something I try to avoid, especially with /FIXED, because I have seen multiple sources that suggest this to be a bad idea for two reasons:

    • If the file has to be relocated then it simply won't load, even if it's a resource-only DLL, unless you load it via LoadLibraryEx with the LOAD_LIBRARY_AS_DATAFILE type flags;
    • On debug builds, it seems that sometimes the Microsoft Linker still adds a .reloc section, even if you pass /FIXED, something I have not seen documented.

    Though your mileage may vary.

    And of course someone could write a tool to simply do the reordering of these two sections in the binary; the principal thing to worry about (and the easiest bit to mess up) is not aligning things properly, but that isn't too hard, so it might be worth just grabbing the source from Matt's PEDUMP (used in the last two articles on the list above) and the code to remove the .reloc section from the second one to use as a start and then working to just write the whole file out with these two sections reordered.
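
    The alignment part is the bit that is easiest to show. Below is a minimal sketch, not a working reordering tool: it rounds sizes up to a file alignment and places raw section data back to back in whatever order you hand it. The struct RawPlacement and the function layOutRawData are hypothetical stand-ins for the raw-data fields of a section header, and the sketch assumes the alignment is a power of two. A real tool would also have to fix up virtual addresses, the data directories, and the headers.

```cpp
#include <cstdint>
#include <vector>

// Rounds value up to the next multiple of alignment.
// Assumes alignment is a nonzero power of two and that the sum does not overflow.
std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical, simplified stand-in for the raw-data fields of a section header.
struct RawPlacement {
    std::uint32_t pointerToRawData;
    std::uint32_t sizeOfRawData;
};

// Places section data back to back in the given order, starting at firstOffset,
// so that every offset and size is a multiple of fileAlignment.
std::vector<RawPlacement> layOutRawData(const std::vector<std::uint32_t>& dataSizes,
                                        std::uint32_t firstOffset,
                                        std::uint32_t fileAlignment) {
    std::vector<RawPlacement> placements;
    std::uint32_t offset = alignUp(firstOffset, fileAlignment);
    for (std::uint32_t size : dataSizes) {
        std::uint32_t padded = alignUp(size, fileAlignment);
        placements.push_back({offset, padded});
        offset += padded;
    }
    return placements;
}
```

    Swapping .rsrc and .reloc then amounts to passing their sizes in the other order, which is why getting the rounding right matters more than anything clever.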

    Now if someone were to decide to fix it -- to unmark the by design flag on it -- whose job would it be?

    On the whole I'd say the fix should be in the EndUpdateResource function, for several reasons:

    • If my conjecture about the linker's operations is true, there is no need to make its work more complicated here;
    • There are very good reasons to not formally document or tie down the rules of image layout produced by the linker -- something that fixing this issue would do;
    • The potential performance benefit to putting the .reloc which is often not needed at the end and the .rsrc which is usually needed not at the end just makes sense;
    • The only people who might care about the section order are the people who call the EndUpdateResource function, so changing the rules for how everything is built when only a small number of people would need it would be less than ideal;
    • The limitation itself is clearly in the EndUpdateResource function, and there are real benefits to having bugs fixed where they are instead of architecting around them.

    Of course now we get to the really unfortunate aspect of all of this.

    In Windows, there are some components with specific owners, and others that are really considered to be very shared, with no specific owner who would be responsible for doing major updates.

    Many times that "no owner" status comes in code that has not required changes in a long time.

    Code of that sort often finds new owners via the "Chess move" theory of development -- i.e. "you touched it, you own it", but the resource updating functions (BeginUpdateResource, UpdateResource, and EndUpdateResource) have proven quite resilient to this, with people who modify them managing to avoid becoming owners except within the scope of their own changes.

    So finding someone to volunteer to own this particular change could prove to be a challenge (especially since one can fall back on the whole by design thing!).

     

    This blog brought to you by ㊮ (U+32ae, aka CIRCLED IDEOGRAPH RESOURCE)

  • Sorting it all Out

    A blog on getting a Blog on the blogs.msdn.com Blogapalooza

    • 4 Comments

    Idan asks (via the Contact link):

    Hello Michael,
    I am reading your blog and wondering, can I start a blog on MSDN blogs too? Who is the address for this request?
    Thank you, Idan.

    I looked at the information about setting up blogs, and ran across the following information:

    Who can create and/or author a blog on blogs.msdn.com or blogs.technet.com?

    §  At this time, only full time employees of Microsoft may create a blog. 

     

    So it looks like the only way is to get a job working for Microsoft. :-)

    For that kind of thing I of course highly recommend Jobsblog!

    After you get hired then you can create a Blog here and blog away....

     

    This blog brought to you by ? (U+003f, aka QUESTION MARK)

  • Sorting it all Out

    SYSTEM_FONT hasn't been the system font since Reagan's first term in the White House

    • 3 Comments

    When you send the WM_GETFONT message to get the font from a control, the documentation says:

    The return value is a handle to the font used by the control, or NULL if the control is using the system font.

    And then you look at the documentation for the GetStockObject function, which mentions:

    SYSTEM_FONT -- System font. By default, the system uses the system font to draw menus, dialog box controls, and text.

    It seems like the most intuitive thing in the world to assume they are talking about the same thing.

    They aren't.

    In fact, as Raymond Chen pointed out in What are SYSTEM_FONT and DEFAULT_GUI_FONT? and I pointed out in DEFAULT_GUI_FONT really stinks, SYSTEM_FONT is not the system font.

    Or to be more accurate, once upon a time it was the system font.

    Many years ago.

    When internationalization support was a joke and language support was a dream.

    And Reagan was starting to think about his re-election campaign.

    So they really could rename the constant to SYSTEM_FONT_CIRCA_1992, deprecate the old one, and save everyone the confusion. :-)

     

    This blog brought to you by ! (U+0021, aka EXCLAMATION MARK, a character that was quite proud to be in the system font of even the earliest versions of Windows)

  • Sorting it all Out

    On describing poorly (and on not listening)

    • 3 Comments

    You may recall blogs like How do I feel about lstrcmpi? I think it blows.... where I pointed out a case where an entire division full of developers have essentially been doing something wrong for over a decade.

    I have even put out the straw man argument that if everyone gets it wrong so consistently, then perhaps changing the function to meet those expectations might be easier than changing the expectations of so many developers.

    Now there are other times that we can run across this kind of problem.

    Like the one I ran across today as I was reviewing a ton of code.

    It is the problem I discuss in The user locale of the system account is not the system locale.

    There is an important (if subtle) message in that blog.

    That message, quite simply put, is

    The default user locale of the SYSTEM account is not the default system locale.

    This is a point that I am physically unable to make any clearer.

    They aren't the same.

    The former, on the one hand, is set by changing the default user locale on the first tab in Regional and Language Options (RLO) and then applying the changes to the system accounts at the bottom of the last tab of RLO. No reboot is needed, but programs already running under the system account may need to be cycled to find out about the change. And its value is not returned by GetSystemDefaultLCID.

    The latter, on the other hand, is settable by changing the Language for non-Unicode programs on the last tab of RLO, and the reboot is absolutely required even if you cycle services running under the system account.

    These two settings are not connected functionally and are only connected in people's minds because of their respective older names both having the word SYSTEM in them.

    There is a third setting that is also unrelated but bolsters the notion of the separate identity of the SYSTEM account -- the one I describe in UI language of the LocalSystem account (which almost never shows UI) -- which is not returned by GetSystemDefaultUILanguage though it sounds like it maybe ought to -- but we'll set that aside for the moment. :-)

    Anyway, people keep making this same mistake, which leads to real problems when services are not running with the settings that they expect, even if customers are informed about how to apply changes to default/system accounts and they properly follow the steps to do this.

    All I can do is just yell at the top of my lungs here so that future Google searches can pick up on it, using the words:

    The default user locale of the SYSTEM account is not the default system locale.
    The default user locale of the SYSTEM account is not the default system locale.
    The default user locale of the SYSTEM account is not the default system locale.
    The default user locale of the SYSTEM account is not the default system locale.

    And just hope that people are listening? :-)

    Plus maybe whisper loudly in some ears at work....


    This post brought to you by δ (U+03b4, a.k.a. GREEK SMALL LETTER DELTA)

  • Sorting it all Out

    If you can find an unsigned copy, it's worth an absolute fortune

    • 2 Comments

    I was pondering the other day.

    You see, Ed Ye is leaving our team, and heading back to family and homeland in China.

    We have worked together many times over the last many years so I'll definitely miss him being in an office not far from mine (he has always been on the other side of the building but the distance didn't scare me off!).

    Anyway, as a part of going back to China he is giving away a lot of books that he really can't ship back with him.

    So, like many others I am scrounging through the piles of books outside his office.

    I grabbed a copy of Helen Custer's Inside Windows NT (mine looks a lot more beat up than his, despite the fact that his is older and I haven't opened mine in years)

    And a copy of Jeffrey Richter's Advanced Windows (his is the same age as mine but definitely in better shape, mine was falling apart!).

    Then suddenly I stopped.

    The copy of my book that he had me sign was there.

    The scene that came to my mind unbidden was from Notting Hill, one of my favorite movies for perhaps not entirely unfathomable reasons:

    {William returns to his desk. In the monitor we just glimpse, as does William, the book coming out of the trousers and put back on the shelves. The thief drifts out towards the door. Anna, who has observed all this, is looking at a blue book on the counter.}

    WILLIAM: Sorry about that...
    ANNA: No, that's fine.  I was going to steal one myself but now I've changed my mind.  Signed by the author, I see.
    WILLIAM: Yes, we couldn't stop him.  If you can find an unsigned copy, it's worth an absolute fortune.

    {Anna smiles}

    I look at the words I wrote and I wonder what I will do with the book that has a "To Edward" inscription in it. Maybe I should have just left it for someone else to pick up but I just felt compelled to figure out something to do with it.

    Now in theory I could just sell it somewhere so that some worthy soul who had been looking for it can have a copy.

    Obviously for a discount.

    Though it's in good shape and still has the CD (I think about Ed and wonder how does that man not kick the crap out of his books as I have done so often in the past!), it just seems weird to sell it at its market value (whatever that is).

    Like Hugh Grant said, if you can find an unsigned copy, it's worth an absolute fortune -- but the signed ones (especially when they contain a note to someone else) seem worth a bit less to everyone other than the person they were intended for.

    You will definitely be missed, Ed. You drove a lot of work that I value to completion and taught me more than you may know about software globalization and other topics over the years....

     

    This blog brought to you by < (U+003c, aka LESS-THAN SIGN)
