
August, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    The crackpots and those women

    • 1 Comments

    The title of this blog is an allusion to a first season episode of The West Wing.

    The crackpots reference is to the record amount of spam aimed at the Blog today, none of which was successfully posted.

    You're welcome! :-)

    So I was watching this episode of The West Wing yesterday. I'm not sure exactly why, I was just reorganizing some video and accidentally opened it. I decided I may as well take it as a sign of some sort and watch it, on high speed.

    Now all of this happened on Saturday.

    After lunches earlier in the week with friend and lead program manager/most likely to be identified as the "group C.J." Cathy Wissink, and friend and software developer/long path princess Kim Hamilton.

    Not to mention random emails from program manager and Access miracle worker Viki Selca and development lead and SQL Reporting Services miracle worker Nico Cristache.

    And especially after car shopping and lunch with friend and always smiling program manager turned product manager Maryam Gholami, and dinner with friend and kickass trainer/consultant authoring the ASP.NET certification exam Rachel Appel (at a dinner where among other things we talked about friend and amazing db upsizer/programming writer Mary Chipman, friend and consultant/SQL goddess Kimberly L. Tripp, and friend and inspiration/general manager Julie Bennett).

    Not to mention plans to have lunch this upcoming week with friend and marketing manager/HR sanity checker Gretchen Ledgard.

    One bit of the episode in particular struck me:

    Jed Bartlet: Look at this, will you?
    Josh Lyman: At what sir?
    Jed Bartlet: I don’t know why, but nothing makes me feel quite so good as the sight of colleagues, enjoying each other outside work.
    Josh Lyman: So, what were you guys talking about?
    Jed Bartlet: We were talking about these women.
    Josh Lyman: Yeah?
    Leo McGarry: We can’t get over these women.
    Jed Bartlet: Look at C.J. She’s like a fifties movie star, so capable, so loving and energetic.
    Leo McGarry: Look at Mandy over there. Going punch for punch with Toby in a world that tells women to sit down and shut up. Mandy’s already won her battle with the President. The game’s over, but she’s not done. She wants Toby.
    Jed Bartlet: Mrs. Landingham. Did you guys know she lost two sons in Vietnam? What would make her want to serve her country is beyond me, but in 14 years, she’s not missed a day’s work, not one. {pause} There’s Cathy, Donna, and Margaret.
    Josh Lyman: Mr. President, there’s something that’s been bothering me for most of the day, and while I know that this is an inappropriate time...
    Leo McGarry: No, what’s on your mind, Josh?
    Josh Lyman: I serve at the pleasure of the President, and it’s a great privilege that I will never forget.  I can’t keep this. I think it’s a white flag of surrender. I want to be a comfort to my friends in tragedy. And I want to be able to celebrate with them in triumph. And for all the times in between, I just want to be able to look them in the eye. Leo, it’s not for me. I want to be with my friends, my family, and these women.

    And then I thought about these ten women, all of whom work either for Microsoft or primarily with Microsoft technologies.

    Ten women with really very little else in common other than two X chromosomes.

    Yet I find myself in awe of each of them and the work they do, all of which is also very different.

    And all very inspiring, really (and not even a complete list!).

    I just can't get over those women....

    There are also women out there like McCain's VP choice to remind me that some women are actually quite singularly unimpressive, but that's a whole nuther story, and one much less inspiring!


    This blog brought to you by ♀ (U+2640, aka FEMALE SIGN)

  • Sorting it all Out

    UCS-2 --> UTF-16, Part 0: The intro, sans content

    • 14 Comments

    Okay, this blog is going to serve as a warning that a whole bunch of blogs in this Blog are about to happen about a particular topic.

    The topic is one I have kind of talked about before.

    The difference in software between UCS-2 and UTF-16, and what is involved with migrating code that "covers" the former to code that covers the latter.
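
    To make the difference concrete, here is a minimal Python sketch (the helper names are hypothetical, not from any Microsoft API). It shows that a supplementary character such as U+10467 takes two 16-bit code units in UTF-16, which is exactly the case that code written with UCS-2 assumptions (one code unit per character) tends to get wrong.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_surrogate(unit):
    """True if a 16-bit code unit falls in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF


# A character in the Basic Multilingual Plane: UCS-2 and UTF-16 agree.
bmp = utf16_code_units("A")

# A supplementary character: UTF-16 needs a surrogate pair.
supplementary = utf16_code_units("\U00010467")

print([hex(u) for u in bmp])
print([hex(u) for u in supplementary])
print(all(is_surrogate(u) for u in supplementary))
```

    The first list has one entry and the second has two, both in the surrogate range. Any code that counts, indexes, or truncates by code unit is treating that one character as two.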

    Now the reason that this difference is interesting is that just about everyone who is asking the questions (and there seem to be a lot of them, especially these last couple of weeks!) is handicapped by several issues, from incorrect assumptions about what works for them today, to an inaccurate picture of what they need to do to fix the problems, to inappropriate plans for the scope of the work.

    It's a mess, it really is. I'm actually even going to change some of the content of a training presentation that is coming up to cover this topic a bit more, too.

    Maybe I'll even mention this series! :-)

    Anyway, consider this the content-free introduction to this exciting series.

    If you are one of the people currently looking at this problem and are doing so with the unreserved joy one might feel for the removal of an impacted wisdom tooth, then this series is here especially for you! :-)

     

    All of the characters in Unicode are taking the long weekend off. I'll see if some of the non-characters stuck in town might want to sponsor....

  • Sorting it all Out

    The Unicode web site gets a facelift

    • 0 Comments

    Yes, that is right -- press release info along the lines of:

    The Unicode Consortium is redesigning its home page to be more accessible to both new and frequent visitors, and to make it easier to see recent events and public review items. Please try it out and provide any feedback on the new design on the public mailing list: unicode@unicode.org. All links previously on the home page are still accessible through a menu on the left, and the older home page remains accessible for those who prefer it.

    New home page:  http://www.unicode.org/

    Older home page:        http://www.unicode.org/index-classic.html

    A nice facelift for people who spend time on that site.

    I kind of like the new page:


    It definitely feels less cluttered to me than the old page:


    Which is nice.

    And knowing all of the same info is there and especially that the old page is not being dumped with no notice are two other great pieces of information....

    Note that there is no equivalent or analogous update here, I am way too set in my ways to be authoring that kind of change. :-)

    Enjoy!

     

    All of the characters in Unicode are on their way somewhere for a vacation; they will be either pleased or disturbed to find out that their home has been renovated, once they are back in town!

  • Sorting it all Out

    A blog on getting a Blog on the blogs.msdn.com Blogapalooza

    • 4 Comments

    Idan asks (via the Contact link):

    Hello Michael,
    I am reading your blog and wondering, can I start a blog on MSDN blogs too? Who is the address for this request?
    Thank you, Idan.

    I looked at the information about setting up blogs, and ran across the following information:

    Who can create and/or author a blog on blogs.msdn.com or blogs.technet.com?

    §  At this time, only full time employees of Microsoft may create a blog. 

     

    So it looks like the only way is to get a job working for Microsoft. :-)

    For that kind of thing I of course highly recommend Jobsblog!

    After you get hired then you can create a Blog here and blog away....

     

    This blog brought to you by ? (U+003f, aka QUESTION MARK)

  • Sorting it all Out

    Facebook says Happy Birthday, ±13.5 hours

    • 1 Comments

    I have the "Birthday Alert" application on Facebook.


    One of those small non-descript applications, it relies on the fact that most of your friends probably enter the month and day (if not the year) in their profile.

    I know of one friend who actually put the birthday about a week ahead of the actual day, which might be a way to tell one's friends from one's friends! :-)

    You may notice that there are 175,909 active users.

    It was not developed by Facebook.

    I point that out so that no one thinks I am picking on Facebook here, specifically.

    Now another interesting number is the number of fans -- 3,288. Clearly, not everyone who actively uses it wanted to call themselves a fan....

    Though the notion of fandom on Facebook is kind of tenuous; it gets back to that whole oversimplified world of Facebook.

    You know, where the only way to get certain kinds of information is to become a fan. Which leads to weird language constructs like

    and so forth.

    But I already talked about that; in this case there is not much they could add feature-wise to this one, so perhaps that is why so few people are fans.

    I myself am not a fan at the moment, though maybe I should be.

    Just in case the developer (Jeff Piper) fixes the issue I mention here today? :-)

    The problem came up in the most interesting of ways for me. Here was the text from a few minutes before I started writing:

    Now of these two, Remi is in the Netherlands and Omi is in Bangladesh. I realized that this application had a time zone issue, so this afternoon I ignored the warning that Omi's birthday was tomorrow, and mentioned to him:

    And Omi, who I have mentioned before if you recall, quickly agreed:

    So what is the flaw, exactly?

    Well, people have birthdays, and they know when those "anniversaries of the eviction from their moms" are.

    But Facebook (or more specifically this one application), which has people from all over the world, is treating those dates as absolutes, even though it delivers the news to people who might be looking at the day very differently.

    Perhaps the behavior in Jeff's application is correct.

    But I have to admit I somehow feel that my birthday wishes were better received when Omi was celebrating his birthday in Bangladesh than when he would have been celebrating it had he been visiting here in Redmond, WA, USA.

    It is a nice sentiment either day, to be sure.

    But wouldn't this application serve its users better if it did not have the potential to be so far off in its warnings?

    As a by-the-way, Remi is climbing Everest soon, so he probably isn't focusing on birthday stuff.

    Now of course a solution here could be very complicated.

    The idea of the current location of the user is really not currently tracked by Facebook -- the location specified is not meant to be changed when one travels, and additionally I have at least one friend who puts his location as a place he likes despite the fact that he was born somewhere else and spends most of his time half a world away. Now perhaps my friends are outliers who are weird, but that, combined with the fact that you are not allowed to change your location more than a certain number of times within an amount of time, suggests that frequent travelers are not expected to change location networks.

    Assuming Jeff would not want to add such a notion, sticking with the location is perhaps best.

    But with that limitation in mind, this application could be paying attention to the 12+ hour differences in the notions of date when comparing the location/time zone and date of birth of the one having the birthday and the timezone of the person reading the Birthday Alert.

    In fact, the best design of the application (in light of available information and the cost of asking for more or depending on other applications) might make a fascinating "PM/design/architecture" interview question, and the algorithm one might use to determine when to send the alert and how to describe it in text might make a fascinating dev/architecture interview question!
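
    As a starting point for that interview question, here is a rough Python sketch of one possible approach, which is not what Jeff's application does. It converts "the whole day of the birthday in the celebrant's time zone" into the reader's local time, so that an alert could say when the birthday actually starts and ends for the reader. The function name, the fixed UTC offsets (UTC+6 for Bangladesh, UTC-7 for Redmond in summer), and the sample date are all assumptions for illustration; a real application would need proper time zone data and daylight saving rules.

```python
from datetime import datetime, timedelta, timezone


def birthday_window(year, month, day, celebrant_tz, reader_tz):
    """Return (start, end) of the celebrant's birthday, in the reader's local time."""
    # Midnight at the start of the birthday, where the celebrant lives.
    start = datetime(year, month, day, tzinfo=celebrant_tz)
    end = start + timedelta(days=1)
    return start.astimezone(reader_tz), end.astimezone(reader_tz)


# Assumed fixed offsets, for illustration only (no daylight saving handling).
BANGLADESH = timezone(timedelta(hours=6))
REDMOND_SUMMER = timezone(timedelta(hours=-7))

# Hypothetical birthday date; not taken from anyone's profile.
start, end = birthday_window(2008, 8, 30, BANGLADESH, REDMOND_SUMMER)
print("Birthday starts for the reader at:", start.isoformat())
print("Birthday ends for the reader at:  ", end.isoformat())
```

    With these offsets the two places are 13 hours apart, so the birthday begins at 11:00 on the previous morning for the reader. A "birthday is tomorrow" warning shown that afternoon would already be late.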

    Perhaps I should become a fan of this application. Just in case Jeff decides to fix this particular limitation. :-)

    Then next the Events application (which I think does come from Facebook!) could solve the problem as well, since they have it -- leading to stuff like this:

    I don't make it to London nearly often enough, but being warned of this Saturday 3am event every week when it is Saturday 3am here in The Pacific Northwest is amusing enough that I keep myself on the "might attend" list every week just to get a smile when I think about time zone bugs in applications used all over the world.

    Add the idea of virtual events where all the attendees would be expected to "attend" at the same time, and this might become a great interview question!

    Anyone want to spec/develop a Virtual Birthday Party Facebook application for a take-home interview? :-)

    I am not interviewing anyone at the moment so please don't do this if you expect a definite job out of it. Though anyone who does it will definitely get their name forwarded on! :-)


    This blog brought to you by ± (U+00b1, aka PLUS-MINUS SIGN)

  • Sorting it all Out

    A technology is worth: $0; A sample showing how to use it: $0; A debug-able sample: Priceless!

    • 4 Comments

    Any time someone from Microsoft talks about some exciting technology that is easy to use, there is often a good faith basis for you, the customer, to assume they might be blowing smoke up your ass.

    In fact, in most cases, you have a built-in affirmative defense you can use to defend yourself if they call bullshit on your claim of shenanigans.

    That defense is based on the simple fact that they usually don't include actual samples!

    If the technology is so easy that no one has time to put together a good sample where people can see the technology and understand it well enough to apply it, then it is obviously not so easy, and the claims to the contrary are from people who are so busy talking about how easy it is to use that they probably have never actually used it.

    Examples of this phenomenon that I have mentioned previously can be seen in TSF and Uniscribe, two technologies that if you ask me are harder than brain surgery.

    And I can state that with some authority, because I have witnessed several brain surgeries in a previous career, and all of them but one (a transsphenoidal resection of a pituitary tumor) were less complicated than full implementations supporting the features of either the Text Services Framework or Uniscribe. :-)

    Now previously one could have put MUI in that same category, since although it had all of that cool documentation I mentioned, it didn't have a sample.

    But then they added one. :-)

    You can see it described extensively here under the article entitled MUI Application Sample.

    And you can find it in the samples you get with the Vista and Windows 2008 SDK.

    And it puts the files right on your machine....


    Though not all is perfect if you use the project and solution files to build them through Visual Studio.

    Well, if you want to run and debug the project.

    To fix the problems, just right-click on the EN-US project node:

    Choose the Properties node.

    Yes, that node should have a ... after it since it launches a dialog, but I am not the UI police!

    Look under the Debugging item under Configuration Properties, like so:

    then you just have to change the Command option to

    $(SolutionDir)%(OutDir)\$(TargetFileName)

    and the Working Directory option to

    $(SolutionDir)%(OutDir)\

    like so:

    And then the sample should be able to be debugged from within Visual Studio. :-)

    Samples go a long way to proving the ease of a technology.

    But debug-able samples? Priceless!

     

    This blog brought to you by 𐑧 (U+10467, aka SHAVIAN LETTER EGG)

  • Sorting it all Out

    Collation backstory?

    • 2 Comments

    Clearing out some of the mailed-to-me questions that have come in through the Contact link....

    Peter O. asks:

    Hi Michael,
    I read your stuff ... and learn lots.
    I have been wanting to learn background stuff on collations  as a general subject.

    Can you point me to a succinct but clear book?
    Thanks.
    P

    I don't actually know of any good books on the subject, in fact I know lots of good books that hardly mention it at all (ref: Some sort of order to collation) despite the fact that there are some linguistic elements involved (ref: Collation can actually be linguistic).

    Books don't tend to cover a lot of the stuff that I think is really cool here, linguistically -- like the stuff I mention in Traditional versus modern sorts. I have even spoken at conferences where examples like those, and like the Turkish Iİiı or the way Lithuanian sorts Y after I or what a North Korean sort would look like if there could be one on Windows, managed to fuel the imaginations of people attending the presentation.

    At one point I was going to co-write a paper about it; I even still have the notes from a meeting we had about the research plan for the paper. I even decided to get over an older incident involving a publishing situation where a paper of mine was rejected and decided it would probably be fun. But everyone got really busy and we're all doing other stuff so we'll probably never get back to it. And though my would-have-been cowriter has the academic credentials to possibly give such a paper credibility, I don't. So I doubt I'd try to take it on myself.

    But a whole book?

    I'd buy one if I knew of one; if I thought I had enough information for it and I was qualified and that anyone would buy it I'd probably write one.

    But I don't.

    And I don't.

    And I'm not.

    And they wouldn't.

    So I won't....

    Though like I said if there were such a book, I'd buy it. I have purchased numerous children's alphabet books from different languages (mainly during the time that article was being discussed but some before that even), though those probably don't count. :-)


    This blog brought to you by ἀ (U+1f00, aka GREEK SMALL LETTER ALPHA WITH PSILI)

  • Sorting it all Out

    Keyboard Layouts, everywhere!

    • 4 Comments

    I have been pointing to the website that has Windows keyboard layouts:

    http://www.microsoft.com/globaldev/reference/keyboards.mspx

    on it for some time.

    Now this is a great site, you may have seen it before:

    http://www.trigeminal.com/images/KeyboardLayoutInfo01.png

    Though it is not perfect.

    One big problem it has is actually described right there on the page.

    Do you see it?

    Here, I'll emphasize it:

    http://www.trigeminal.com/images/KeyboardLayoutInfo02.png

    Now you can see I'm running in FireFox here. That is kind of all I do now.

    Starting after a bizarre incident with an IE plug-in that hung IE7 dozens of times a day.

    I decided to live with it and be a good little serf and only gave up after the first time a 75% done blog was basically lost.

    I uninstalled IE7 and put in FireFox right after.

    Mostly a great experience, though the experience with Windows keyboard layouts was really less than ideal.

    Here, I'll show you:

    http://www.trigeminal.com/images/KeyboardLayoutInfo03.png

    So would you class this as an usability problem or an accessibility problem? :-)

    We found a web developer to fix it up but that didn't work out, so finally we decided to just do it all ourselves.

    After a bit of work we migrated everything to the GoGlobal site I have mentioned a few times before today until we were all set, and now it is live at

    http://msdn.microsoft.com/goglobal/en-us/bb964651.aspx

    Exciting?

    You can even shrink it way down to:

    http://msdn.microsoft.com/bb964651.aspx

    if you want, and now you get to this exciting new site:

    http://www.trigeminal.com/images/KeyboardLayoutInfo04.png

    and that site works everywhere.

    In FireFox:

    http://www.trigeminal.com/images/KeyboardLayoutInfo05.png

    and in Safari:

    http://www.trigeminal.com/images/KeyboardLayoutInfo06.png

    and in Opera:

    http://www.trigeminal.com/images/KeyboardLayoutInfo07.png

    and of course in Internet Explorer:

    http://www.trigeminal.com/images/KeyboardLayoutInfo08.png

    All of the bugs that had been plaguing these various browsers previously, including:

    • Empty display devoid of any discernable keyboard layout
    • Non-functional shift keys
    • No ToolTips (which would contain character names and dead keys)

    are all fixed now on this exciting new page on GoGlobal!

    Of course now I can once again use the page in my default browser and get good results.

    There is still one more thing to do and we'll be taking care of that soon.

    But in the meantime, enjoy this new page on GoGlobal, today. :-)

     

    No sponsors needed for this blog; I did this one pro bono....

  • Sorting it all Out

    Uniqueness of the Name and Description...

    • 0 Comments

    Jason Timms, in response to Not the coolest, in a comment, asked:

    I do have a question.

    How can I reuse a name / description of a keyboard layout that I have already built?  It seems that the name is copied in several places in the registry and won't allow me to reuse it.

    Can I safely delete one of these entries and rebuild it?

    Thanks again.

    Just so we all know what we are talking about, we'll look at the dialog that shows up when you select Project|Properties... off the main window and see this dialog pop up:

    The Name is actually the filename for the DLL, it will be used to create the file that MSKLC builds.

    The Description is the string that will be used as the layout name in the MSKLC-created setup and in the Text Services and Input Languages dialog in Windows.

    Now the check that is done here is looking to see if either the filename exists on the machine or the keyboard layout name exists in the user interface as a selectable or selected layout.

    Which points to the answer for how to build a setup that is based on a name and/or description that MSKLC complains already exists, of course.

    Uninstall the layout! :-)

    I remember why it was designed this way -- to avoid the confusion of trying to install the layout that was just built and being unable to, and especially avoiding the problem of multiple versions of a layout lying around and no clear easy sense of which is which. These problems came out as various bugs were reported over the time that the first version of MSKLC was being tested.

    Thinking about it now, there is I think a flaw or two to this particular bit of design, flaws that might be worth some thought in a future version:

    • The upgrade strategy could be embraced a bit -- allowing the build but informing the author of what was going on, perhaps even upgrading the existing installed layout?
    • The separation of what is installed vs. what is built on a machine could be considered -- since they are often two different things, and assuming otherwise can inconvenience an author.

    Just thinking aloud here, but it seems like MSKLC could do better here, maybe?


    This post is sponsored by every freaking character in Unicode 5.1

  • Sorting it all Out

    Making SQL Server operations slower (without explicitly trying)

    • 0 Comments

    So I was chatting with Kim and Paul after that .NET DA meeting I mentioned the other day. The one where Kim kind of laid out the way that SQL Server did its searches, whether unindexed, indexed via a clustered key, and indexed without one.

    Probably the best description of it I have ever heard, by the way. Kim really is a SQL goddess!

    After her talk about the internals followed by Paul's talk about index fragmentation, they had me thinking about an additional piece of information that might be of interest.

    I mentioned it in terms of how it can be a problem that can come up when you add multiple language-based indexes. Like you get when you use that technique I mentioned in Making SQL Server index usage a bit more deterministic.

    Basically you have the interesting case where some collations are basically identical.

    Like if you are dealing with Unicode columns and you use two collations that are only different due to the fact that they support different code pages. Like Latin1_General_CS_AS vs. Arabic_CS_AS vs. Greek_CS_AS vs. Hebrew_CS_AS vs. Russian_CS_AS, for example.

    Or one like Bosnian_Cyrillic_100_CI_AS vs. Bosnian_Latin_100_CI_AS vs. Croatian_100_CI_AS vs. Serbian_Cyrillic_100_CI_AS vs. Serbian_Latin_100_CI_AS (some of these are identical but split out for political reasons, proving that in SQL Server 2008 they have learned the lesson I mentioned previously, and others are the same because there was no good reason not to include the identical data in two different sorts where the users would reasonably expect the data to sort for them properly in either script, another lesson that they have "learned" well by picking up the new data).

    The list goes on.

    If you create multiple indexes on a column in order to assure better international user support, this is a good thing.

    But if you go too far and create indexes that literally duplicate the same information as previously created indexes then all you are doing is taking up space (the size of the extra index) and hurting performance (the cost of index maintenance is 100% parasitic if the extra index is extraneous).

    I asked someone on the SQL Server development team about this and he confirmed what I noticed through experimentation. Though he was unclear on how common the scenario was (he technically has a point since the technique is really only well documented here at SiaO, though I am philosophically opposed to ever relying on poor documentation as a justification for a bug!).

    Now in regard to size luckily indexes are not huge since SQL Server does not use NLS-type sort keys for their indexes -- they use B-trees created via CompareString-type results.

    But the size hit is greater than zero and if one has a multi-million row table keeping two B-trees around that return the same results is hardly in anyone's best interests.

    Plus the hit of adding entries when one has filled the level they are at in the B-tree and have to split the entry, but having to do so twice? A person is taking one of the really bad sides of index fragmentation and doubling it or worse, for no reason!

    Note that there is no warning for when this happens -- the new index is cheerfully created (to the extent that SQL Server is cheerful about such things!).

    And although one could through research work to find out which collations give identical results for Unicode columns, it is not easy to find, and the information is really not exposed in any way to query it (other than asking me or something!).

    The problem is worse for Exchange and the Jet Blue engine (which may beg the question of why I slant this blog so much toward SQL Server, but I think that these kinds of scenarios are much worse conceptually in SQL Server given how huge the optimization information is there both inside and outside of Microsoft -- people just seem to care a lot about database optimization in SQL Server).

    But getting back to ESENT for a moment, the fact is that a call to e.g. JetCreateIndex2 will cheerfully succeed for English, German, and Dutch, no matter how long each one takes -- and it will actually also take the huge space hit, far beyond the hit SQL Server does, since it has no optimization to partially combine identical sorts (and those three truly are identical, as I mentioned before).

    I suppose in the ideal world both database engines would have -- either through SQL DDL or function call -- two things:

    1. a way to determine if two indexes would be expected to return identical results, and
    2. a way to "modify" an index that would either be a) a drop and recreate or b) a change of the existing index metadata, depending on which was required.

    Since

    • none of these products do this work and
    • neither Windows nor .NET provides that kind of information

    there really is no good way for a developer to take advantage of this potentially huge size/performance optimization.

    Well, there is the whole "just ask Michael Kaplan" thing but that hardly scales. :-)
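
    Short of a real API, about the best a developer can do is test empirically. The Python sketch below is only that: it checks whether two sort-key functions order every pair in a sample the same way. The function names are hypothetical, and str.casefold, str.lower, and the identity function are stand-ins for real collations, which Python's standard library does not expose in the SQL Server or NLS sense. Agreement on a sample is evidence, not proof, that two collations are identical.

```python
from itertools import combinations


def compare(key, a, b):
    """Three-way comparison (-1, 0, 1) of a and b under a sort-key function."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


def orderings_agree(samples, key_a, key_b):
    """True if both key functions order every pair of samples the same way."""
    return all(
        compare(key_a, x, y) == compare(key_b, x, y)
        for x, y in combinations(samples, 2)
    )


# Stand-in data and stand-in "collations", for illustration only.
words = ["apple", "Banana", "cherry", "Zebra"]

# Two case-insensitive keys that agree on this sample.
print(orderings_agree(words, str.casefold, str.lower))

# A case-insensitive key versus a plain ordinal (code point) key.
print(orderings_agree(words, str.casefold, lambda s: s))
```

    The first check reports agreement and the second does not, because the ordinal key puts "Banana" ahead of "apple". If two candidate indexes agree on a large, representative sample of the column's data, the second index is at least a suspect for being redundant.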

    So, is this important?

    Well, that kind of depends.

    If one is taking the time to be interested in index architecture and index fragmentation in general, and one cares about multilingual applications, then I'd argue that yes, it can be quite important.

    Though I admit that YMMV (your mileage may vary).

    In the long run, adding the feature to query about or possibly even detect such cases is probably the best strategy for the various technologies and products. This would translate to features in NLS for the sake of clients like ESENT, and features in SQL Server.


    This blog brought to you by ṃ (U+1e43, aka LATIN SMALL LETTER M WITH DOT BELOW)

  • Sorting it all Out

    The sheer Shear of it all

    • 2 Comments

    Nothing much technical, or interesting, or useful -- just the blather, the whole blather, and nothing but the blather...

    It was a Twitter tweet, a Facebook status and also a Windows Live Messenger quote:

    I cannot be the only one in the world who was disappointed to find out that Shear Genius has nothing to do with Jules Shear.

    It seemed kind of innocuous to me, just a random thought that occurred to me.

    A memory from a while back when I first had the realization.

    When I tried to watch a TV program.

    Yet in the time after the "quote" went up, a surprising number of people contacted me -- people who had never heard of either reference!

    For the record....

    Shear Genius:

    Shear Genius is an American reality television series on the Bravo network that focuses on hair styling. Contestants engage in weekly elimination competitions until a winner is determined. The show is hosted by Jaclyn Smith.

    Shear Genius uses a combination of rules from Project Runway and Top Chef, two other Bravo network shows.

    There are two challenges in each episode. The first challenge, the Shortcut Challenge, is usually not for elimination, but in Season 1 Episode 3 one competitor was cut after the Shortcut Challenge. The Shortcut Challenge ranks the contestants based on a judging factor, usually technical skills. The winning contestant(s) may receive some benefit in the second challenge.

    The second challenge is the Elimination Challenge, which has each contestant style the hair on a real model or client given certain requirements or goals. After completing the hair, the model is dressed appropriately and a runway show, similar to Project Runway, is held for the four judges. After the show, the judges may ask questions of the contestants about their styling choices. The judges then confer among themselves and decide on the top and bottom styles. The top stylists are credited and a single winning stylist is selected, with the phrase "Your work is Shear Genius"; a picture of their style is displayed on the Allure Wall of Fame for the remainder of the competition. The bottom stylists are then identified, and the worst stylist is sent home with the show's tagline "This is your final cut."

    Both challenges were timed, and if additional materials were necessary, the contestants are given a limited budget for those supplies.

    Jules Shear:

    Jules Shear is an American singer and songwriter born in Pittsburgh in 1952. Although he has had only one minor hit as a performer ("Steady", which reached number 57 on the US charts in 1985), he has recorded almost twenty albums to date. He made his first appearance on vinyl with The Funky Kings; he also led the critically-acclaimed but commercially-unsuccessful pop group, Jules and the Polar Bears, along with later groups The Reckless Sleepers and Raisins in the Sun. He also conceived (and hosted the first 13 episodes of) the MTV series Unplugged.

    His songs have been more commercially successful in the hands of other artists, notably Cyndi Lauper, whose recording of "All Through the Night" reached number 5 on the Billboard Hot 100 in 1984, and The Bangles, whose recording of "If She Knew What She Wants" reached number 29 in 1986. Singer/songwriter Iain Matthews (still using the spelling "Ian" for his first name at the time) recorded an album of Shear's material, Walking A Changing Line: The Songs of Jules Shear, with synthesizer-dominated arrangements (and containing some previously unreleased songs by Shear), in 1988; Matthews had earlier recorded songs by Shear on other albums.

    Shear was the subject of a song by 'Til Tuesday, "J For Jules", after the end of his relationship with that band's singer, Aimee Mann. He also co-wrote the title track of that album, Everything's Different Now, with Matthew Sweet.

    Now I had never heard of the show, but I knew of Jules Shear from way back, and I always liked his songs -- the ones he sang, the ones he wrote, the ones he produced.

    So the first time I happened to notice a show titled Shear Genius my first thought was that if Lindsay Lohan and Gene Simmons and Denise Richards and Paris Hilton and Hulk Hogan and Kim Kardashian and all of these other folks could have a reality show then why not someone really talented who happened to be married to someone else really talented like Pal Shazar? :-)

    So I remember when I tuned in, and quickly -- very quickly given the format I discuss above -- found out what the show was really about.

    Note that I moved into the situation knowing the identity of the singer/songwriter but not knowing the TV show....

    Now the fact that so many people sent me messages asking what the hell the quote meant (with further questions making it clear that neither was known) means that both of these references were too obscure, and without the pragmatic knowledge of the identity of the man behind J For Jules the point of the line does kind of fall flat. The quote was a short line (short enough to fit in Twitter, which means it can't be that important since the best things in life happen after the 140th character) and thus there is no room to describe the backstory for the quote.

    That describing is something I honestly don't mind doing.

    All of this suggests that I am much more naturally a blogger than a facebooker or a twitterer or an IMer, principally because I blather a lot! :-)

    The other suggestion was that I find myself in many arguably obscure references -- and by arguably I mean obscure to most though not obscure to some of the people that I know.

    And I wonder how much of that is just a self-conscious fascination with finding the valuables that are slightly harder to find rather than the ones so obvious that everyone finds them. And then discussing them, like a subconscious desire to bury myself in the allusions of the things that interest me and that I admire. Even though I lack the talent of a Pound or an Eliot, wanting to try to echo what they do in a small way, and express many of the things I love in ways that allow those who appreciate those things to enjoy the depth even as others bask in the shallower waters.

    And that makes me wonder, as I self-consciously describe the backstory because enough people failed to get the joke to make me doubt that it was identifiable enough. Like a reference from last October:

    Michael Kaplan: He is trying to prove one of those three lessons that Roger Fenwick learned while he was at Oxford, the one about verbiage....

    Ken Whistler: Ok, MichKa -- now you've descended into terminally obscure allusions. ;-)

    I'll talk about what I think all of that means some other day....


    This blog brought to you by 𐇽 (U+101fd, aka PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE)

  • Sorting it all Out

    The Bidi Algorithm's own SEP Field

    • 4 Comments

     

    There are many nice things that I can (and sometimes do) say about Unicode Standard Annex #9 (Unicode Bidirectional Algorithm), which I will call for the rest of this blog the UBA in order to avoid the repetitive and tiresome nature of "Unicode Bidirectional Algorithm". I know that it is not a pronoun, but saying all those nouns over and over again really does wear you down so whatever shortcut works. :-)

    Anyway, what was I talking about?

    Oh yeah, about how there are so many nice things that I can (and sometimes do) say about UBA.

    This blog is not about any of them.

    Instead, this blog is going to focus on two particular limitations that in my opinion make the UBA less useful in software.

    I am thinking mainly about Windows, but after listening to people who work on the Mac and in Linux I think this is really a platform agnostic set of issues.

    Now I know some people think the issues are with input, but really they aren't. I mean I mentioned it in blogs like Mirroring and Keyboards are complicated but that isn't what makes this really hard for application developers, most of the time. And it isn't why applications have bad or inconsistent behavior, by and large.

    In fact, it is not the input itself that is to blame but the rendering -- so cursor movement and all that are interesting but most of it is okay often enough that people would probably not notice problems if other things weren't going on.

    Plus, those other items are kind of subject to some variability based on platform and expectations, so while recommendations are nice these are not the blocking issues.

    I am therefore going to be looking elsewhere.

    The two issues I am focusing on here are:

    • The influence of and lack of guidance about "higher level protocols", and
    • The inability to handle multilingual text by default
    Now these items are ones I started really jumping into with other blogs like Mixing it up with bidirectional text and The Bug(s) Spotted, aka Design flaws are worse than bugs and The mythical nature of bidirectional support, and where the wheels come off the wagon.

    The simple problem is best stated as:

    The Unicode Bidirectional Algorithm cannot handle text from both left-to-right and right-to-left languages together in the same line of text.

    That is it, right there.

    Sure the UBA has all of that hand-wavey text about "higher level protocols" but all they really did was create their own SEP field.

    You know what an SEP is, right?

    It's a Douglas Adams thing, so I'll let him explain it:

    An SEP is something we can't see, or don't see, or our brain doesn't let us see, because we think that it's somebody else's problem.... The brain just edits it out, it's like a blind spot. If you look at it directly you won't see it unless you know precisely what it is. Your only hope is to catch it by surprise out of the corner of your eye.

    This basically also explains why Unicode hasn't dealt with the issue, since they rely "...on people's natural predisposition not to see anything they don't want to, weren't expecting, or can't explain..." and talk about higher level protocols as a way of saying that someone else has to deal with it.

    But I can look at things like this:

    and this:

    and I know that there are quite a few inadequate somebody elses out there.

    Even my Mac runs into those same problems. Even when the text is plain:

    http://www.trigeminal.com/images/TextEditBidi.png

    The section in the UBA about Higher Level Protocols shows how much clients are left on their own to figure stuff out:

    4.3 Higher-Level Protocols

    The following clauses are the only permissible ways for systems to apply higher-level protocols to the ordering of bidirectional text. Some of the clauses apply to segments of structured text. This refers to the situation where text is interpreted as being structured, whether with explicit markup such as XML or HTML, or internally structured such as in a word processor or spreadsheet. In such a case, a segment is span of text that is distinguished in some way by the structure. 

    HL1. Override P3, and set the paragraph embedding level explicitly.

    • A higher-level protocol may set the paragraph level explicitly and ignore P3. This can be done on the basis of the context, such as on a table cell, paragraph, document, or system level.
    HL2. Override W2, and set EN or AN explicitly.
    • A higher-level process may reset characters of type EN to AN, or vice versa, and ignore W2. For example, style sheet or markup information can be used within a span of text to override the setting of EN text to be always be AN, or vice versa.
    HL3. Emulate directional overrides or embedding codes.
    • A higher-level protocol can impose a directional override or embedding on a segment of structured text. The behavior must always be defined by reference to what would happen if the equivalent explicit codes as defined in the algorithm were inserted into the text. For example, a style sheet or markup can set the embedding level on a span of text.
    HL4. Apply the Bidirectional Algorithm to segments.
    • The Bidirectional Algorithm can be applied independently to one or more segments of structured text. For example, when displaying a document consisting of textual data and visible markup in an editor, a higher-level process can handle syntactic elements in the markup separately from the textual data.
    HL5. Provide artificial context.
    • Text can be processed by the Bidirectional Algorithm as if it were preceded by a character of a given type and/or followed by a character of a given type. This allows a piece of text that is extracted from a longer sequence of text to behave as it did in the larger context.
    HL6. Additional mirroring.
    • Characters with a resolved directionality of R that do not have the Bidi_Mirrored property can also be depicted by a mirrored glyph in specialized contexts. Such contexts include, but are not limited to, historic scripts and associated punctuation, private-use characters, and characters in mathematical expressions. (See Section 6, Mirroring.)

    Clauses HL1 and HL3 are not logically necessary; they are covered by applications of clauses HL4 and HL5. However, they are included for clarity because they are more common operations.

    As an example of the application of HL4, suppose an XML document contains the following fragment. (Note: This is a simplified example for illustration: element names, attribute names, and attribute values could all be involved.)

    ARABICenglishARABIC<e1 type='ab'>ARABICenglish<e2 type='cd'>english

    This can be analyzed as being five different segments:

    a. ARABICenglishARABIC
    b. <e1 type='ab'>
    c. ARABICenglish
    d. <e2 type='cd'>
    e. english

    To make the XML file readable as source text, the display in an editor could order these elements all in a uniform direction (for example, all left-to-right) and apply the Bidirectional Algorithm to each field separately. It could also choose to order the element names, attribute names, and attribute values uniformly in the same direction (for example, all left-to-right). For final display, the markup could be ignored, allowing all of the text (segments a, c, and e) to be reordered together.

    When text using a higher-level protocol is to be converted to Unicode plain text, for consistent appearance formatting codes should be inserted to ensure that the order matches that of the higher-level protocol.

    This information is so helpful that implementers can't even have their text look wrong in a consistent way -- every implementation has its own mistakes.

    Even in plain text, when the whole higher level protocol is arguable.

    And yes you can solve all such cases with RLM and LRM and RLE and LRE and PDF, sure. But there is no standard on how to apply these in plain text, or on how to make the standard itself pass my own "smart as an 8-year-old" test (something those eight-year-olds can do in cases like the above and in harder cases like in The mythical nature of bidirectional support, and where the wheels come off the wagon).
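    To make "solve it with RLE and PDF" concrete, here is a minimal sketch of what an application ends up doing by hand: wrapping a run in an explicit embedding, or tacking a directional mark onto it. The code points are the real Unicode ones; the helper names (Embed, WithTrailingMark) are hypothetical, and the sketch assumes the text is already held as UTF-32. It says nothing about *when* to apply them, which is exactly the part nobody specifies.

```cpp
#include <string>

// Unicode bidi formatting characters (code points from the Unicode Standard).
constexpr char32_t LRM = 0x200E;  // LEFT-TO-RIGHT MARK
constexpr char32_t RLM = 0x200F;  // RIGHT-TO-LEFT MARK
constexpr char32_t LRE = 0x202A;  // LEFT-TO-RIGHT EMBEDDING
constexpr char32_t RLE = 0x202B;  // RIGHT-TO-LEFT EMBEDDING
constexpr char32_t PDF = 0x202C;  // POP DIRECTIONAL FORMATTING

// Hypothetical helper: open an embedding in the requested direction,
// emit the run, then pop back to the surrounding embedding level.
std::u32string Embed(const std::u32string& run, bool rightToLeft) {
    std::u32string out;
    out.reserve(run.size() + 2);
    out.push_back(rightToLeft ? RLE : LRE);
    out += run;
    out.push_back(PDF);
    return out;
}

// Hypothetical helper: append a directional mark after the run, the usual
// trick for nudging trailing punctuation or spaces toward one direction.
std::u32string WithTrailingMark(const std::u32string& run, bool rightToLeft) {
    std::u32string out = run;
    out.push_back(rightToLeft ? RLM : LRM);
    return out;
}
```

    Every "island" of text in the other direction would need a call like Embed(island, true) from whoever assembles the string, which is the work the UBA leaves to somebody else.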

    Certainly some cases are exceptional, but the default case of mixed language text is broken now.

    More importantly, the "islands of text of one language in a sea of another language" case is also broken. For no good reason, really.

    Perhaps the organization that Microsoft and all of these other big companies pay ten times the price of an Optimus keyboard a year to needs to start doing a bit of higher level work here, rather than passing the buck to random protocols.

    Because it is clearly our problem (and everyone else's)....

    Which makes it theirs! :-)


    This blog brought to you by U+200e and U+200f (aka LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)

  • Sorting it all Out

    In the name of God, St Michael, and St. George, I dub thee the SUBOptimus keyboard

    • 1 Comments

    A quick follow-up blog after Optimus: from science fiction to fiction to frustration to geek porn, in just 24 months...

    One of my regular readers, a colleague of mine, pointed me to someone at Microsoft who had one of these Optimus keyboards; he thought perhaps I'd be interested in taking a look and seeing one up close.

    Well, anything that excites Chris Pirillo as much as this piece of hardware did is probably worth a look, to say the least! :-)

    Turns out the guy who it was issued to was going to be working from home for a few days, but the computer was in his office. He'd talk to folks there and if I wanted to pop by and try it out, I was welcome to do so. :-)

    I scooted out to Red West and plugged in the exciting keyboard everyone was talking about....

    Well, it worked, as a keyboard.

    The first thing I did was switch layouts in Windows with the Language Bar.

    No effect on the keyboard.

    I decided to look at the info in the box it came with, and it pointed me to the Configurator, online.

    Once I installed that:

    http://www.trigeminal.com/images/optimusconfigurator01.png

    I found I could switch the layout.

    All kinds of ones I tried out that I had installed, from Georgian:

    http://www.trigeminal.com/images/OptimusGeorgian01.jpg

    to Georgian again:

    http://www.trigeminal.com/images/OptimusGeorgian02.jpg

    to Arabic:

    http://www.trigeminal.com/images/OptimusArabic.jpg

    to Hebrew:

    http://www.trigeminal.com/images/OptimusHebrew.jpg

    But then the bad news....

    None of the IMEs I tried worked.

    And none of the Table Driven TIPs did, either (both the ones I created and the ones that were already on the system).

    Plus it didn't switch based on my selection in the Language Bar -- the Configurator merely gave the option to import and export the files to show the different faces, which I could then stick on the device if I wanted to.

    Now clearly this is easier than putting stickers on all the keys, or buying a whole new keyboard for each language.

    But for $1800 I'd really expect a lot more.

    A whole lot more.

    I may play with the Configurator a bit more and blog again.

    Perhaps I'll do a little comparison/contrast with what the OSK and the Tablet soft keyboard do.

    Or what I think Surface might be capable of doing if the right people are working on it.

    But I am really happy that I didn't spend $1800 on the SubOptimus (as I now plan to call it).

    When I can get wireless keyboards from Apple or Microsoft for $60-$70, what am I paying for? The privilege of not needing to change keyboards, so that I can plug in this wired one with a power cord that isn't smart enough to change its faces when it is told to?

    Do you think it would be a moonlighting violation if I offered to write a service to do the lookup and switch work on the fly? At $1800 a pop and all the money they save by having such terrible customer service, they could probably afford to pay me pretty well to have the keyboard do the one thing it probably ought to....

    Never mind, if they pay as fast as they service customers, I'll have to leave the wages to someone in my will.

    This keyboard will henceforth be known to me as the SubOptimus. I dub it thusly.

     

    This blog brought to you by O (U+004f, aka LATIN CAPITAL LETTER O)

  • Sorting it all Out

    What good is irony if it can't provide symmetry?

    • 2 Comments

    Long time readers may recall blogs from the past like Is Excel CSV misusing NLS functionality? and Excel to Led Zeppelin -- No 'in through the out door'.

    Yet another CSV weirdness that I thought might be of interest to SiaO readers came up the other day:

    Hello,

    I am trying to write some data that are Unicode into a CSV file, but I am running into some encoding issue, Excel cannot open the file.

    I am not using any transform, I am just reading some Unicode value from an XML file and write to a file by applying the  comma delimited file format.

    I was wondering if somebody would have some experience and would know how to create a Unicode CSV file in Vbscript.

    So far I can only save my file as ANSI.

    Thank you very much.

    Regular SiaO hero Paul Dempsey pointed out one suggestion for a workaround:

    A quick read on Scripting.FileSystemObject tells me I can do this:

    ' CreateTextFile(filename, overwrite, unicode): the second True asks for a Unicode file
    Dim fso, MyFile
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set MyFile = fso.CreateTextFile("file.csv", True, True)
    MyFile.WriteLine "v1Ⓐ,v2Ⓑ,v3Ⓒ"
    MyFile.Close

    And one of my favorite support engineers Malcolm Stewart (going back at least a decade!) pointed out the main issue that led to the original problem:

    Excel has a bug where it won't parse UNICODE CSV files that are comma delimited, though they will parse TAB delimited files. We have filed a bug with them on this before, but they have rejected the fix. 
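    Going by Malcolm's description, one way around it is to write the same data tab-delimited. Here is a minimal sketch of that, in portable C++ rather than VBScript, assuming that "Unicode" here means UTF-16 little-endian with a byte order mark. The helper names (ToUtf16LeWithBom, MakeTabRow, WriteUnicodeTsv) are hypothetical, fields containing tabs or line breaks are not escaped, and whether any particular version of Excel opens the result is something to verify rather than assume.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Serialize UTF-16 code units as little-endian bytes, preceded by the
// byte order mark FF FE.
std::vector<unsigned char> ToUtf16LeWithBom(const std::u16string& text) {
    std::vector<unsigned char> bytes;
    bytes.reserve(2 + text.size() * 2);
    bytes.push_back(0xFF);
    bytes.push_back(0xFE);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}

// Join the fields of one row with TAB (not comma) and end the row with CRLF.
std::u16string MakeTabRow(const std::vector<std::u16string>& fields) {
    std::u16string row;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0) row += u'\t';
        row += fields[i];
    }
    row += u"\r\n";
    return row;
}

// Write all rows to a file in binary mode so no newline translation happens.
bool WriteUnicodeTsv(const std::string& path,
                     const std::vector<std::vector<std::u16string>>& rows) {
    std::u16string text;
    for (const auto& fields : rows) text += MakeTabRow(fields);
    const std::vector<unsigned char> bytes = ToUtf16LeWithBom(text);
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

    The only real difference from the comma version is the delimiter; the encoding work is the same either way.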

    Now on the one hand this bothers me, given the whole point of CSV is Comma Separated Values and all.

    But then again it seems like you can't swing a cat around here without running into problems with CSV from the point of view of applications that will respect user preferences.

    In such a case, the functionality limitation of a product that leads to a misuse of language such as "Unicode CSV values cannot be comma separated" is not only kind of amusing.

    Though it is obviously that.

    Perhaps even a bit ironic if one ignores past behavior -- certainly it would not be expected by most people to be the case.

    But in its own way, this kind of bug is being consistent with the existing pattern!

    After all, if we are

    • not going to respect a user's regional/language preferences (as described in Regional Options) and
    • not going to respect a user's language (which requires Unicode for many people)

    is there a compelling precedent to respect the use of language in general? :-)

     

    This blog brought to you by U (U+0055, aka LATIN CAPITAL LETTER U)

  • Sorting it all Out

    It's the End[UpdateResource] of the world as we know it

    • 4 Comments

    It was late last week when Maksim asked a very interesting question via email to one of those large aliases at Microsoft:

    SUBJECT: EndUpdateResource failing after adding certain number of items with UpdateResource

    Hi,

    It appears that there is a bug (or undocumented behavior anyway) with BeginUpdate/Update/EndUpdateResource functions.

    When I am adding more than certain number of resources this way, EndUpdateResource returns with error ERROR_INVALID_DATA. The exact count of items is not always the same and varies depending on the length of resource names and resource types that I have.

    After running several experiments I have discovered that the problem occurs according to the following formula:

    (Cumulative Resource Names Length) + (Resources Count) * 25 + (Cumulative Resource Types Length) + (Resource Types Count) * 13 > 2040

    Can someone please say if there is a bug and if my assumed formula is correct? Or may be there is some other workaround apart from doing EndUpdateResource after adding each resource.

    My source code is below, the dll where I updated resources is a simple dll without any code:

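    #include "stdafx.h"
    #include <windows.h>
    #include <tchar.h>
    #include <cstdlib>
    #include <string>
    #include <iostream>

    using namespace std;

    // Builds a resource name of the requested length: 'X' padding followed by a random number.
    // Assumes a UNICODE build, so that TCHAR is wchar_t.
    wstring MakeLongName(size_t length) {
          int randomNumber = rand();
          TCHAR buffer[65];
          ZeroMemory(buffer, sizeof(buffer));   // size in bytes, not in TCHARs
          _itot_s(randomNumber, buffer, 65, 10);
          wstring randomPart = buffer;
          length -= randomPart.length();
          wstring result;
          result.append(length, L'X');
          result.append(randomPart);
          return result;
    }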

    int _tmain(int argc, _TCHAR* argv[]) {
          CopyFile(L".\\testdll.dll", L".\\testdll1.dll", FALSE);

          HANDLE hLibrary = BeginUpdateResource(L".\\testdll1.dll", TRUE);
          if(hLibrary==NULL) {
                cout << "Failed to BeginUpdateResource. Error: " << GetLastError() << endl;
                return 1;
          }

          for(long i = 0; i < 10; i++) {
                BYTE data[100];
                ZeroMemory(data, 100);
                wstring longName = MakeLongName(230);

                if(! UpdateResource(hLibrary, L"Y", longName.c_str(), MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL), data, 100)) {
                      cout << "Failed to UpdateResource. Error: " << GetLastError() << endl;
                      EndUpdateResource(hLibrary, TRUE);
                      return 1;
                }
          }

          if(! EndUpdateResource(hLibrary, FALSE) ) {
                cout << "Failed to EndUpdateResource. Error: " << GetLastError() << endl;
                return 1;
          }
          return 0;
    }

    I had not seen this come up before, but this is a function I have found interesting since all the way back when we added the resource updating functions in MSLU (described here).

    The answer to this particular riddle came from developer Paul:

    EndUpdateResource fails if it cannot extend the .rsrc section of your DLL. I’ve seen this happen if the .rsrc section isn’t the last section in the image – and that’s frequently the case (a few experiments show that .reloc usually follows .rsrc using the Microsoft linker). Annoyingly, LINK.EXE always seems to insert a .reloc section, even if you have a resource-only DLL. (The formula you discovered is an approximation for “the .rsrc section cannot be extended”.)
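    For anyone who wants to play with Maksim's numbers, his formula can be sketched as a small function. This is only his empirical approximation, which Paul notes is a stand-in for "the .rsrc section cannot be extended"; the constants 25, 13 and 2040 are his observations, not documented limits. The names (ResourceId, EstimatedCost, LikelyToHitLimit) are hypothetical, and the sketch assumes lengths are counted in characters and that each distinct resource type is counted once.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One resource to be added: the type and name that would be passed to UpdateResource.
struct ResourceId {
    std::wstring type;
    std::wstring name;
};

// Left-hand side of the empirical formula from the email:
//   sum(name lengths) + 25 * (resource count)
//   + sum(type lengths) + 13 * (type count)
// Assumption: each distinct type contributes once.
std::size_t EstimatedCost(const std::vector<ResourceId>& resources) {
    std::size_t cost = 0;
    std::set<std::wstring> seenTypes;
    for (const ResourceId& r : resources) {
        cost += r.name.length() + 25;
        if (seenTypes.insert(r.type).second) {  // first time this type is seen
            cost += r.type.length() + 13;
        }
    }
    return cost;
}

// True when the estimate exceeds the threshold reported in the email.
bool LikelyToHitLimit(const std::vector<ResourceId>& resources) {
    return EstimatedCost(resources) > 2040;
}
```

    For the sample program above (ten names of 230 characters, one type "Y") the estimate works out to 2300 + 250 + 1 + 13 = 2564, comfortably past 2040, which is consistent with the failure he reports.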

    Now as to whether this is a bug or by design....

    It really is by design.

    Twice.

    Now I am not going to dig into the format of PE files, since for that you can look at:

    to get the lowdown here.

    So for the first by design we'll look to the linker.

    When the Microsoft Linker (LINK.EXE) does its work, it makes a lot of sense that it puts the .reloc section last rather than the .rsrc section. The latter is more or less gunk that is already compiled by the Microsoft Resource Compiler (RC.EXE) and which the linker does not really need to modify -- it just has to align it -- while the former is the section in which it arguably has to do some of its hardest work, to produce all of the relocation entries.

    Matt also has a less cynical reason he mentions in that second article:

    Working backwards from the end of the executable, if there is a .debug section in the OBJs, it's placed last in the executable. In the absence of a .debug section, the linker tries to put the .reloc section last because, in most cases, the Win32 loader won't need to read the relocation information. Cutting down the amount of the executable that needs to be read decreases the load time.

    Then for the second by design we'll look to the EndUpdateResource function and its cousins (BeginUpdateResource and UpdateResource), though really that first function I mentioned is the real bad boy here.

    While it does a bunch of work inside the .rsrc section, it doesn't start mucking around a whole bunch with the rest of the PE file. Reordering sections just falls a bit outside of its current beat, if you know what I mean.

    Paul had some thoughts about workarounds:

    If you have control over how “testdll1.dll” is created, you might be able to figure out how to manipulate the PE sections so that .rsrc always goes last. In my code, I was able to start with a hand-crafted resource-only PE file which had only a .rsrc section.

    Matt's first article gives some info on removing the .reloc section:

    If you do decide to remove relocations, there are three ways to do it. The easiest is to specify the /FIXED switch on the linker command line. Alternatively, you can run the REBASE program with the -f option on your executable. REBASE comes with the Win32 SDK. The third way to remove relocations is the new RemoveRelocations function in the Windows NT 4.0 IMAGEHLP.DLL. My sample code below shows how to use RemoveRelocations.

    Though to be honest this is something I try to avoid, especially with /FIXED, because I have seen multiple sources that suggest this to be a bad idea for two reasons:

    • If the file has to be relocated then it simply won't load, even if it's a resource-only DLL, unless you load it via LoadLibraryEx with the LOAD_LIBRARY_AS_DATAFILE type flags;
    • On debug builds, it seems that sometimes the Microsoft Linker still adds a .reloc section, even if you pass /FIXED, something I have not seen documented.

    Though your mileage may vary.

    And of course someone could write a tool to simply do the reordering of these two sections in the binary; the principal thing to worry about (and the easiest bit to mess up) is not aligning things properly, but that isn't too hard, so it might be worth just grabbing the source from Matt's PEDUMP (used in the last two articles on the list above) and the code to remove the .reloc section from the second one to use as a start and then working to just write the whole file out with these two sections reordered.
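    The alignment part of such a tool is the easy bit to sketch. What follows is a minimal illustration only, using a simplified stand-in for a section header rather than the real PE structures; the names (Section, AlignUp, LayOut, IsLast) are hypothetical. The same arithmetic applies to file offsets (with the file alignment) and to virtual addresses (with the section alignment). A real tool would also have to fix up the headers, the data directories, and the addresses stored inside the resource directory itself, none of which is attempted here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Round value up to the next multiple of alignment.
// The alignment is assumed to be a power of two.
std::uint32_t AlignUp(std::uint32_t value, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Simplified stand-in for the parts of a section header that matter here.
struct Section {
    std::string name;
    std::uint32_t offset;  // where the section starts
    std::uint32_t size;    // how many bytes it occupies
};

// Place the sections back to back in the order given, each starting on an
// aligned boundary. Returns the aligned offset just past the last section.
std::uint32_t LayOut(std::vector<Section>& sections,
                     std::uint32_t firstOffset, std::uint32_t alignment) {
    std::uint32_t offset = AlignUp(firstOffset, alignment);
    for (Section& s : sections) {
        s.offset = offset;
        offset = AlignUp(offset + s.size, alignment);
    }
    return offset;
}

// True if the named section is the one with the highest starting offset,
// i.e. nothing follows it that would block extending it.
bool IsLast(const std::vector<Section>& sections, const std::string& name) {
    const Section* last = nullptr;
    for (const Section& s : sections) {
        if (last == nullptr || s.offset > last->offset) last = &s;
    }
    return last != nullptr && last->name == name;
}
```

    Laying out .text, .reloc, .rsrc in that order instead of .text, .rsrc, .reloc is then just a matter of reordering the vector before calling LayOut; the hard part is everything else in the file that refers to the old addresses.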

    Now if someone were to decide to fix it -- to unmark the by design flag on it -- whose job would it be?

    On the whole I'd say the fix should be in the EndUpdateResource function, for several reasons:

    • If my conjecture about the linker's operations is true, there is no need to make its work more complicated here;
    • There are very good reasons to not formally document or tie down the rules of image layout produced by the linker -- something that fixing this issue would do;
    • The potential performance benefit to putting the .reloc which is often not needed at the end and the .rsrc which is usually needed not at the end just makes sense;
    • The only people who might care about the section order are the people who call the EndUpdateResource function, so changing the rules for how everything is built when only a small number of people would need it would be less than ideal;
    • The limitation itself is clearly in the EndUpdateResource function, and there are real benefits to having bugs fixed where they are instead of architecting around them.

    Of course now we get to the really unfortunate aspect of all of this.

    In Windows, there are some components with specific owners, and others that are really considered to be very shared, with no specific owner who would be responsible for doing major updates.

    Many times that "no owner" status comes in code that has not required changes in a long time.

    Code of that sort often finds new owners via the "Chess move" theory of development -- i.e. "you touched it, you own it" -- but the resource updating functions (BeginUpdateResource, UpdateResource, and EndUpdateResource) have proven quite resilient to this, with people who modify them managing to avoid becoming owners except within the scope of their own changes.

    So finding someone to volunteer to own this particular change could prove to be a challenge (especially since one can fall back on the whole by design thing!).

     

    This blog brought to you by ㊮ (U+32ae, aka CIRCLED IDEOGRAPH RESOURCE)
