
June, 2009

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

    UCS-2 to UTF-16, Part 11: Turning it up to Eleven!

    Previous blogs in this series of blogs on this Blog:

    Back when I was in school oh those many years ago, I remember learning all kinds of rules about writing essays and position papers and really anything meant to convince people of a point of view.

    You know, all that "Step 1: Tell 'em what you're gonna tell 'em, Step 2: Tell 'em, Step 3: Tell 'em what you told 'em" and so on. Basically narrowing the world to the specific point you want to make, making the point, and then expanding that point to make its connection to the world clear. You get the point.

    More recently, as I look at academic papers and books from people with advanced degrees, it is clear that they do something slightly different a lot of the time.

    Rather than ending on a strong note that reinforces prior themes, they end with the things that are not yet explored, the things not fully done yet.

    I originally looked at this as a sign of weakness -- why end with your weakest or least impressively thought out arguments, with the items that are not there yet?

    But over time I have reconsidered this view; there is a certain strength in making it clear that there is more out there. Making it obvious that grownup problems can't always be wrapped up and delivered with a bow on them.

    And that is where this last part of the whole UCS-2 to UTF-16 series will try to go.

    Over many successive parts I have discussed (or, dare I say it, proven) that the people who think that "UCS-2 to UTF-16" is a shorthand are 100% correct, and the people who think it is just about surrogate pairs are dead wrong.

    UCS-2 to UTF-16 is about moving from Unicode code units to what the user will think of as a CHARACTER.

    And about how to plan out software behavior in a way that lets ordinary users who wouldn't know Unicode from UNICEF see the behavior they expect based on what they know of their actual language, rather than on their understanding of the limitations of computers over the last several decades.
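    A minimal C sketch can make the gap between code units and characters concrete. It assumes a deliberately simplified model: a surrogate pair counts once, and a mark from the Combining Diacritical Marks block (U+0300 to U+036F) belongs to the character before it. The name count_user_chars is hypothetical. Unicode's full grapheme cluster rules in UAX #29 cover far more cases than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts user-visible "characters" in a UTF-16 buffer
   under a deliberately simplified model:
   - a high surrogate followed by a low surrogate is one character;
   - a combining diacritical mark (U+0300..U+036F only) belongs to the
     character before it.
   Full grapheme cluster rules (UAX #29) handle many more cases. */
size_t count_user_chars(const uint16_t *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t u = s[i];
        int prev_is_high = (i > 0 && s[i - 1] >= 0xD800 && s[i - 1] <= 0xDBFF);

        if (u >= 0xDC00 && u <= 0xDFFF && prev_is_high)
            continue;   /* second half of a surrogate pair */
        if (u >= 0x0300 && u <= 0x036F && i > 0)
            continue;   /* combining mark joins the previous character */
        count++;
    }
    return count;
}
```

    Under this model, "e" followed by U+0301 COMBINING ACUTE ACCENT is two code units but one character. The same is true of U+1D400 MATHEMATICAL BOLD CAPITAL A, which UTF-16 stores as the surrogate pair D835 DC00.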

    But as I went through the series, I probably spent as much time pointing out failures in software to support this notion as I did successes. In software all over the place.

    When you get down to it, we are still quite a long way away from this ideal that the average user would find most empowering; we still rely on people to conform to the limitations of our machines rather than causing those machines to conform to the understandings of the users.

    So in a way I have already been talking about the places that Microsoft and all of the other software companies are weak and unfinished and not fully implemented or sometimes even understood!

    So here, in Part 11 of the series, I will take this catalog of misunderstandings/bugs/failures and turn it up to eleven (to use the Spinal Tap expression) and go even further....

    Beyond characters there are of course words, and phrases, and clauses, and sentences, and paragraphs, and pages.

    And as I pointed out in blogs like The Bidi Algorithm's own SEP Field it is clear that once you get into issues more complicated than characters, we tend toward sucking just as badly.

    Or maybe worse -- in the case of characters it is a failure to live up to Unicode's definition; in the case of more complex operations like bidirectional text, a 100% conformant implementation will fall way short of typical native user expectations in even many of the simplest cases. We claim we are conformant, Unicode says it requires higher level protocols to support reality, and thus we prove ourselves unable to reach the lofty goals of higher level protocols.

    We're too busy being stuck in the muck, because we're following the standard and the standard considers it to be too much to handle.

    How do we get past this and break the stalemate, exactly?

    Unicode doesn't seem to be interested -- they regularly fiddle with UAX #9 to fix bizarre corner cases while never even attempting to tackle the easy cases like the ones I've been railing about all this time. The ones even a child can understand, like

    C:\NAME ‎(BIG)‎\שם ‏(גדול)‏\NAME ‎(BIG)‎\שם ‏(גדול)‏

    and such.
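    To show what a higher level protocol that places directional control characters might look like, here is a hypothetical C sketch. The function mark_parens and its rule are illustrative assumptions, not any shipping API. It wraps each parenthesized group in U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK, according to the last strong character before the opening parenthesis. It only recognizes ASCII letters and the Hebrew block as strong; a real implementation would use the full Bidi_Class property.

```c
#include <stddef.h>
#include <stdint.h>

#define LRM 0x200E       /* U+200E LEFT-TO-RIGHT MARK */
#define RLM 0x200F       /* U+200F RIGHT-TO-LEFT MARK */
#define MAX_NEST 16

/* Crude strong-direction test: ASCII letters are LTR, the Hebrew block
   (U+0590..U+05FF) is RTL, everything else is treated as neutral.
   A real protocol would consult the full Bidi_Class property. */
static int strong_dir(uint16_t ch)
{
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'))
        return 1;
    if (ch >= 0x0590 && ch <= 0x05FF)
        return -1;
    return 0;
}

/* Hypothetical higher level protocol: copy src to dst, wrapping each
   parenthesized group in LRM or RLM to match the last strong character
   seen before its '('. dst needs room for 2 * len code units.
   Returns the number of code units written. */
size_t mark_parens(const uint16_t *src, size_t len, uint16_t *dst)
{
    uint16_t pending[MAX_NEST];   /* mark owed to each open '(' (0 = none) */
    size_t depth = 0, out = 0;
    int last_dir = 0;

    for (size_t i = 0; i < len; i++) {
        uint16_t ch = src[i];

        if (ch == '(') {
            uint16_t mark = 0;
            if (depth < MAX_NEST) {
                mark = (last_dir > 0) ? LRM : (last_dir < 0) ? RLM : 0;
                pending[depth] = mark;
            }
            depth++;
            if (mark)
                dst[out++] = mark;
            dst[out++] = ch;
        } else if (ch == ')' && depth > 0) {
            depth--;
            dst[out++] = ch;
            if (depth < MAX_NEST && pending[depth])
                dst[out++] = pending[depth];
        } else {
            dst[out++] = ch;
        }

        int d = strong_dir(ch);
        if (d != 0)
            last_dir = d;   /* remember the most recent strong direction */
    }
    return out;
}
```

    For a path laid out like the one above, this puts LRM around (BIG) after the Latin NAME and RLM around (גדול) after the Hebrew שם, which is the placement shown. Whether a rule this simple holds up beyond such easy cases is exactly the open question.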

    As I mentioned in a prior blog:

    No one wants to do too much beyond Unicode even though plain Unicode alone (without making use of higher level protocols to place control characters) is insufficient for handling these cases....

    Note that this is also one of the reasons RTL IDN is so complicated and looks so broken most of the time.

    It all amounts to A place where everyone blows, equally.

    So maybe Microsoft and the companies that claim to care about the end-to-end user experience should just choose to rise above this, and be the higher level protocols.

    Because claiming that we are done with our core support and can now add more advanced features, when we can't even handle characters and sentences, is a little bit obnoxious of us, to say the least (especially if we aren't even trying to get better!).

    Now that this series is officially done, I may try in some future blogs to give some of my thoughts about what it might mean to be a higher level protocol....

    Where do you want to go today^H^H^Hmorrow with MUI?

    I have blogged about MUI (Multilingual User Interface) in Windows in the past.

    In particular (via blogs such as MUI is like your heartbeat; it's important and useful, whether you know about it or not) I have discussed one of the principal design limitations: that it is really architected for the sake of Windows and the user interface languages that it has installed:

    Now I myself have often railed about limitations in the MUI model on Windows that block people from working beyond what Windows does for itself, and the folks on the MUI team have gotten that feedback not just from unimportant people like me but from important people like ISVs and OEMs on behalf of IHVs and the IHVs themselves and from Office. And they are looking into how to address those concerns going forward.

    Now looking into such a thing requires more than just adding features; it requires a fuller understanding of what people are doing and trying to do out there, whether they are ISVs or OEMs or IHVs or whatever -- and from lots of those people, since individual requirements vary and every additional set of requirements improves the chance of fully supporting a scenario.

    You know, for anybody who is building applications where multilingual UI support is a desired feature.

    To that end, the MUI team has been meeting with and talking to a lot of people, both inside and outside of Microsoft.

    And now, to cast an even wider net, they have put up a survey that anyone interested in having their requirements help shape the architecture and design can fill out.

    You can find the survey here: Building International Software (Windows Live ID required).

    This is about understanding the many uses of the current design and the needs of future design, to not only validate what is there today but to shape what could be there in the future -- so if you are interested in helping here then take a look at the survey!

    Questions or comments about it you can ask here, of course. :-)

    EZ fwide[R]? It ain't all that. Roll it up and smoke it, you won't get very high....

    Conformance to standards seems to be a pretty big deal these days in many different parts of Microsoft.

    Not all, mind you -- that is something I know about through both law of averages (there are a lot of groups at Microsoft!) and also some specific knowledge (I've chatted with a few of them now and again).

    But by and large if there is a standard related to the work people are doing at the company, then there is some degree of effort to conform. A statistically significant trend, you might say.

    I was thinking about this the other day when people were having kind of a conversation about the C runtime fwide function.

    The documentation is something like this:

    Run-Time Library Reference

    fwide
    Unimplemented.
    int fwide(
       FILE *stream,
       int mode
    );

    Parameters
    stream
    Pointer to FILE structure (ignored).

    mode
    The new width of the stream: positive for wide character, negative for byte, zero to leave unchanged. (This value is ignored.)

    Return Value
    This function currently just returns mode.
    Remarks
    The current version of this function does not comply with the Standard. 

    Okay, it is being kind of upfront here on the whole conformance issue.

    The documentation hints that this function basically returns whatever it is passed in the mode parameter, so I'll be clear and say that is exactly what it is doing.

    Of course this makes the claim that the mode parameter is ignored kind of inaccurate -- it lives and dies by that parameter, but we'll let that slide.

    Let's look at the standard itself to see what conformance would mean, what it would look like. It is over in C99, a standard that Microsoft has taken around the dance floor but not yet gotten fully busy with:

    7.24.3.5 The fwide function

    int fwide(FILE *stream, int mode);

    The fwide function determines the orientation of the stream pointed to by stream. If mode is greater than zero, the function first attempts to make the stream wide oriented. If mode is less than zero, the function first attempts to make the stream byte oriented. Otherwise, mode is zero and the function does not alter the orientation of the stream.

    Returns
    The fwide function returns a value greater than zero if, after the call, the stream has wide orientation, a value less than zero if the stream has byte orientation, or zero if the stream has no orientation.

    This is rather vague -- either Microsoft's implementation is mostly right or the standard is saying this:

    • If mode > 0, try to make it wide (return 0 on failure);
    • If mode < 0, try to make it narrow (return 0 on failure);
    • If mode is 0, try to detect whether it is a wide stream or not and return the results (return 0 on failure).

    I could spend a little bit of time discussing how fundamentally useless this function would be even if this is the meaning and it were fully conformant; this seems like an important point. But we'll leave it alone for a moment and get back to this point in a bit.
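    For comparison, this small C sketch records what the three kinds of call return on a C library that does track stream orientation. It assumes glibc on Linux, and the helper name fwide_demo is hypothetical. It uses only the standard fwide and tmpfile functions.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: records what a C99-conforming fwide reports
   on a fresh stream.
   results[0]: query with mode 0 before any orientation is set,
   results[1]: request wide orientation (mode > 0),
   results[2]: then request byte orientation (mode < 0).
   Returns 0 on success, -1 if no temporary file could be opened. */
int fwide_demo(int results[3])
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    results[0] = fwide(fp, 0);    /* no orientation yet: expected 0 */
    results[1] = fwide(fp, 1);    /* stream becomes wide: expected > 0 */
    results[2] = fwide(fp, -1);   /* already wide, so it stays wide: expected > 0 */

    fclose(fp);
    return 0;
}
```

    On such a library the results are 0, then a positive value, then a positive value again, because a stream that is already wide oriented keeps that orientation. Per the documentation quoted above, the Microsoft CRT would simply return each mode argument unchanged: 0, 1, and -1.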

    Now the fundamental issue here is that Microsoft's implementation of the file stream does not store any kind of attribute on it indicating its wideness or lack thereof. This matters because it means any attempt to change this attribute on a stream is simply not going to happen. In the words of Yoda:

    "There is do, and not do; there is no Try."

    In this case, Microsoft improves on Yoda a bit -- there is no "not do" either. :-)

    Now if you passed <0 or >0 then you wanted the function to try to do something, and it is returning that it succeeded. This does meet the letter of the law with regard to conformance but obviously violates the spirit, since if you call the function once you might reasonably expect the call to impact what will happen later. And it won't.

    And if you passed 0, then you didn't want the function to change anything; you were just asking a question. The Microsoft implementation just answers the question with the same value that means "who knows?".
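    What conformance would require internally is mostly one remembered field per stream. The following C sketch is hypothetical: sketch_stream, sketch_fwide, and sketch_note_io are invented names, and this is not how any particular CRT lays out FILE. It does follow the C99 rules, though. An unoriented stream takes the first orientation that is either requested or implied by the first I/O call, and keeps it afterwards.

```c
/* Hypothetical stream type for illustration only; this is NOT the CRT's FILE. */
typedef struct {
    int orientation;   /* 0 = not yet oriented, 1 = wide, -1 = byte */
} sketch_stream;

/* Per C99 7.19.2, the first byte or wide I/O operation on an unoriented
   stream fixes its orientation. A real stream would call this from every
   byte (wide == 0) or wide (wide != 0) I/O function. */
void sketch_note_io(sketch_stream *s, int wide)
{
    if (s->orientation == 0)
        s->orientation = wide ? 1 : -1;
}

/* fwide semantics per C99 7.24.3.5: an unoriented stream takes the requested
   orientation, an oriented stream keeps the one it has, and mode 0 only asks. */
int sketch_fwide(sketch_stream *s, int mode)
{
    if (s->orientation == 0 && mode != 0)
        s->orientation = (mode > 0) ? 1 : -1;
    return s->orientation;
}
```

    Unlike the echo-the-argument behavior, a call here does affect what later calls see: after sketch_fwide(&s, -1), a follow-up sketch_fwide(&s, 1) still returns -1.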

    It is probably just as well that the docs claim the function is unimplemented since by any sort of reasonable man standard it is probably not implemented.

    Would it be nice if you could write to/read from the stream using whatever functions you wanted (wide or narrow) and have it automatically do conversions to follow the behavior of fwide? Maybe. I tend to hate behavior that will silently but happily do a ton of conversion work that may not be required. But I can't claim that there aren't people who would find it useful.

    Regardless, at this time it is not out there, so the results shouldn't be expected to conform.

    Now clearly conformance is, on the whole, a good thing. But one would be hard put to claim with authority that the lack of rush to support the standard for this one case would be likely to hurt anyone.

    Does anyone disagree?

    I mean, based on practical reasons, not lofty "it's the standard" type reasons, of course....

    Those keys aren't going to be extended; they're dead!

    Rimas Kudelis asked over in the Suggestion Box:

    Hi Michael,

    I was wondering if you know if there are any plans to extend the Windows keyboard driver in future? In particular, I'm interested about the limitations related to dead keys.

    Back in 2006 (in Only ONE WCHAR per dead key) you said that the dead key infrastructure is actually here only for legacy reasons. However, I'm pretty sure that you know how widely it's being used. One case where dead keys are very handy is adding stress marks to letters. For example, Lithuanian alphabet has 32 pairs of upper/lower letters, and that's fine, they all have distinct positions in Unicode. However, there also exist 34+34 possible combinations of these letters with the three stress marks used in Lithuanian, only 33 of them having distinct Unicode positions. Employing dead keys to enter them is a very intuitive solution. However, it doesn't work for 35 cases that have to be composed from the base letter and a combining accent mark...

    The answer is not going to inspire warm fuzzies, sorry. :-(

    The blog in question (Only ONE WCHAR per dead key) and the ones it links to (in particular Dead keys are not intuitive) clearly explain the architectural limitation, and the basis for the feature.

    And they explain that the only reason this feature exists at all is a legacy of typewriters not smart enough to handle not advancing on a letter when the accent is typed after the letter.

    I mean, is it really expected that people would WRITE the language this way? Would you write the accents and stress marks and then write the letter?

    Of course not!

    So, as I have said many times before, the way to do this is the way that even children in school learn to write the language, and the way that Unicode encodes scripts -- first type the base and then type the combining character or characters.

    It works well, it is well supported, and it gets the job done. For Vietnamese, and Lithuanian, and all of the other languages that need this functionality....
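    The difference between the two input models can be sketched in a few lines of C. The table below is hypothetical and tiny; it is not a real keyboard layout. In the dead-key model, each (accent, base) pair must map to a single precomposed code unit. That leaves nowhere to go for a Lithuanian combination such as ą with a tilde stress mark (U+0105 followed by U+0303, which has no precomposed form). The base-then-combining model simply emits both code units. The function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dead-key table: each (accent, base) pair maps to ONE code unit. */
struct dead_entry { uint16_t accent; uint16_t base; uint16_t result; };

static const struct dead_entry dead_table[] = {
    { 0x0301, 'a', 0x00E1 },   /* acute + a -> U+00E1 */
    { 0x0300, 'a', 0x00E0 },   /* grave + a -> U+00E0 */
    { 0x0303, 'a', 0x00E3 },   /* tilde + a -> U+00E3 */
};

/* Dead-key model: returns the single precomposed code unit, or 0 if none exists. */
uint16_t dead_key_compose(uint16_t accent, uint16_t base)
{
    for (size_t i = 0; i < sizeof dead_table / sizeof dead_table[0]; i++)
        if (dead_table[i].accent == accent && dead_table[i].base == base)
            return dead_table[i].result;
    return 0;
}

/* Base-then-combining model: append the base, then the mark; no table needed.
   out must have room for 2 code units. Returns the number written. */
size_t type_base_then_mark(uint16_t base, uint16_t mark, uint16_t *out)
{
    out[0] = base;
    out[1] = mark;
    return 2;
}
```

    The dead-key lookup has to return 0 for U+0105 plus a tilde, because no single code unit exists for that combination. The base-then-combining path produces the two-unit sequence U+0105 U+0303 with no table at all.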

    ทำไห้เผ็ดมาก ("Make it very spicy", with my crappy attempt at a Thai accent)

    I love my Thai food very spicy.

    I don't have the same issues that Margaret Cho has with spicy food (ref: her Blog, in particular The Hotness), though I think that is because my body does not object to the food quite as strongly, or as loudly!

    Anyway, after being in Thailand many years ago and using the above phrase (it sounds kind of like tum hi pet mahk and is one of the few Thai phrases that stuck with me, albeit with my bad pronunciation)....

    I would go into most Thai food restaurants in the US, and since the phrase never worked, even when I was with people who were Thai and ordered in Thai, I assumed that the combination of my white face and bad pronunciation just made them play it safe. I assumed that food as hot as one found in Thailand, or even on the same order of magnitude, was simply unavailable to me due to the well-meaning interference of strangers making me food for money.

    Four star, five star, six star, ten star -- more often than not this just mediated how many red pepper flakes they put in the food.

    And no offense to any red peppers that might be reading this, but they just don't have the punch.

    Recently I finally found out the magic words for when I am here (since even ทำไห้เผ็ดมาก wasn't cutting it), and I wish I had been smart enough to look with a Google^G^G^G^G^GBing search long ago.

    The phrase is Thai Spicy.

    You tell the waitress "I want it to be six stars, I want Thai Spicy," and then wait.

    Like a lover into sex games unsure whether she had been told the safe words, she will want to verify that you meant what you just said.

    You can throw in a "like I was in Thailand" to your affirmation, or just nod sagely.

    Either way, you can almost imagine them pulling out the other spice. The one treated the same way as the Novantrone they used to infuse into me, where they wouldn't even handle the infusion bag until they were double gloved, abiding by maximum sharps precautions.

    Perhaps I am imagining that part, but I usually can spot the server waiting to watch me take a bite so they can see if I really meant it....

    And at the Typhoon! station in the Microsoft Building 9 cafeteria, where they have occasionally piled the red pepper flakes on so thick that one lunchmate asked me if I wanted any chicken with my red pepper flakes, and where they started putting the flakes underneath the food instead of on top of it since seeing them was making people unhappy?

    There, they can't do Thai Spicy. When I ordered it the first time, they were so apologetic ("we don't have the spices here," they told me) and unhappy enough that from then on I would order it "five stars, really five stars not Microsoft five stars, as close to Thai Spicy as you can get here," and they would smile and nod and do the best they could with enough red pepper flakes to choke someone not expecting 'em.

    At the Thai food restaurants that accept the Thai Spicy instruction, is it as hot as it was in Thailand?

    Who knows? That was years ago and miles away, and memories fade.

    I think it is on the same order of magnitude.

    The phrase doesn't work for Chinese food (which has never been as spicy for me, even in China or Taiwan or Hong Kong). I don't know what phrase to use here to get it spiciest, though.

    And the phrase doesn't work for Indian food (which has been even spicier in India but never even close in the US, no matter how I order it). I don't know that phrase, either. And that one probably would scare me a bit even if I did know it. I'll be cautious on that one if I find out what it is. :-)

    Too bad I can't just tell them to make it "Thai Spicy", since that may be as hot as I can take my Indian food, too! :-)

     

    This blog brought to you by ท (U+0e17, aka THAI CHARACTER THO THAHAN)

    No disassemble Number 5!

    Very very offtopic!!!

    The question I got recently was simple enough:

    Michael, I googled iBot air travel and found your blog.

    So you actually flew with your iBot and they didn't dismantle it??

    My husband got one last summer, and has been driving back and forth to Mexico with it because we've heard that the airlines disconnect it and who knows what so that it's not operational when you arrive.

    What airlines did you use??

    I have had excellent luck with Alaska Airlines, using that sign I mentioned way back in From I SCOOT to IBOT, #1 of ??, with English on one side and Spanish on the other.

    I have never had any problem flying, and I have now made trips all over the place.

    Now I will tell you what I believe is the secret.

    I check the iBot PLANESIDE -- meaning I ride it up to the gate and check it right at the side of the plane. This is perhaps the secret to making sure no one tries to take it apart, since a folded-up iBot does offer people the opportunity to take things apart.

    On arrival, I occasionally find people ignoring the sign (it is not them being stupid people -- they never make this mistake on the outbound flight and it is basically the same people!) and trying to put the leg pieces back on the device, but I tell them to let me do it and they stop.

    Now would I trust this to work when flying within India where they screwed up my Scooter? No. But inside the US with the leg pieces and the UCP removed and everything folded up, there is not a lot to go wrong -- and with planeside checkin you get the people with the least amount of time to mess with stuff and the most customer contact. I have now flown on 22 round trip flights since I got the iBot less than a year ago and have never seen them break anything on it or try to disassemble anything.

    Now I admit I have never yet crossed a border with it, and I imagine they may want to look closer at a border. But as a general principle, if you are sitting in what is essentially a wheelchair, full disassembly is unlikely, even if they want to closely inspect the chair and unzip the things that zip (as TSA folks have also sometimes wanted to do). But disassembly of a wheelchair that has a laminated sign explaining the fact that it can't be unpacked? I have found a general lack of desire to break wheelchairs, and it is just as secure to inspect without disassembly anyway....

    Note that there are other bonuses to the planeside checkin, such as a better quality wheelchair and no need to tip the airport worker making less than minimum wage since he lives on tips. But that is just an added extra, the main thing is protection of the expensive medical device!

     

    This post brought to you by ♿ (U+267f, a.k.a. WHEELCHAIR SYMBOL)

    UCS-2 to UTF-16, Part 10: Variation[ Selector] on a theme...

    Previous blogs in this series of blogs on this Blog:

    It has been a while since the last part of this series. Kind of coinciding with when I took a break here.

    Sorry about that! :-)

    This part kind of starts with a previous blog -- CharNext(ch) != ch+1, a lot of the time.

    No, wait. That isn't the one.

    It really came from a follow-up to that blog, namely We broke CharNext/CharPrev (or, bugs found through blogging?).

    Where I talked about diacritics now being classified by GetStringTypeW as C3_DIACRITIC rather than C3_NONSPACING | C3_DIACRITIC.

    When I fixed this bug, I (wisely?) chose to make the check include C3_NONSPACING | C3_DIACRITIC rather than the minimal change of just C3_DIACRITIC.

    Because there is a whole class of characters, added to Unicode recently enough that support for them was new in Vista, that has the C3_NONSPACING classification.

    UNICODE VARIATION SELECTORS!!!

    These characters exist to change the visible representation of the character preceding them or, if not supported, to be invisible (as the Unicode Display of Unsupported Characters FAQ states).

    Certainly either way it should never be treated independently of the character preceding it; any operation of selection or truncation must treat it like a part of the preceding character, as much as a low surrogate and its preceding high surrogate!
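    Here is a minimal C sketch of a truncation routine that respects that rule. It assumes UTF-16 input and hard-codes the variation selector ranges: U+FE00 to U+FE0F, and U+E0100 to U+E01EF, which arrive as the surrogate pairs DB40 DD00 through DB40 DDEF. The name truncate_utf16 is hypothetical. On Windows, the approach this series describes would instead ask GetStringTypeW for CT_CTYPE3 flags and check C3_NONSPACING, which is the more general test.

```c
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* VS1..VS16 live at U+FE00..U+FE0F. */
static int is_bmp_variation_selector(uint16_t u) { return u >= 0xFE00 && u <= 0xFE0F; }

/* VS17..VS256 live at U+E0100..U+E01EF, which UTF-16 encodes as the
   high surrogate DB40 followed by a low surrogate in DD00..DDEF. */
static int is_supplementary_variation_selector(uint16_t hi, uint16_t lo)
{
    return hi == 0xDB40 && lo >= 0xDD00 && lo <= 0xDDEF;
}

/* Hypothetical helper: returns how many code units of s (length len) to keep
   so that the result holds at most max units and never splits a surrogate
   pair or separates a variation selector from the character before it. */
size_t truncate_utf16(const uint16_t *s, size_t len, size_t max)
{
    size_t n = (max < len) ? max : len;

    while (n > 0 && n < len) {
        if (is_low_surrogate(s[n]) && is_high_surrogate(s[n - 1]))
            n--;    /* cut would split a surrogate pair */
        else if (is_bmp_variation_selector(s[n]))
            n--;    /* cut would orphan a BMP variation selector */
        else if (n + 1 < len && is_supplementary_variation_selector(s[n], s[n + 1]))
            n--;    /* cut would orphan a supplementary variation selector */
        else
            break;  /* s[n] starts a new character, so cutting here is safe */
    }
    return n;
}
```

    For instance, cutting {'A', U+845B, VS17, 'B'} at three code units backs up to just 'A'. Keeping U+845B would mean cutting off its selector (or splitting the selector's surrogate pair).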

    Of course my GetStringTypeW discussion that I used to introduce the topic also points out the detection solution, one that would incidentally be used for a lot of the other cases I have discussed in the series.

    Of course this leads to another interesting case, which I will discuss next time....

    And maybe more on variation selectors and other related matters, some other time entirely.

     

    This blog brought to you by U+fe00, a Unicode Variation Selector.

    When keeping things on a level Plane[ 1] doesn't work anymore

    It has been over three years since I remarked about Notepad, in an offhand manner (in Every character has a story #4: U+feff (alternate title: UTF-8 is the BOM, dude!)), that "Application is perhaps an overstatement; its just an uber-wrapper for a Win32 EDIT control."

    It never occurred to me that Chris Walker, the development owner of Notepad for over a decade, was a reader of my blog. He was. And he noticed. And commented.

    Oops.

    Now what I said was true only in the comparative sense of which technology is doing the drawing. But in addition to being wrong in the sense that Chris noticed (and has since forgiven me for!), I was also wrong in another sense, which I'll get to in a moment.

    By the way (I mention in a not entirely offhand manner), I'm actually on the Windows team.

    I checked in and reviewed code that will ship with Windows 7, and I officially signed off on a few features in Vista (I suppose I could have used the development equivalent of jury nullification and refused to sign off, but it seemed kind of silly given the fact the things I owned kind of worked?). For Windows 7 I find myself in triage meetings pretty regularly, and in shiprooms as infrequently as I can get away with.

    Now you may wonder why I pointed this out.

    Well it had something to do with the mail I got the other day from someone who we will call Bob. Why? Because that is his name:

    Hi Michael

    More a topic for the Windows 7 guys rather than your blog. Using the RC Build.

    As an example, consider U+10901 PHOENICIAN LETTER BET introduced in Unicode 5.0. Displays in WordPad and Paint but not in Notepad. I used the Aegean font http://users.teilar.gr/~g1951d/download.html.

    Indeed, someone has broken Notepad! Guess this means the Edit control is down the tubes too.

    Furthermore I've been looking at Unicode 5.2 Egyptian Hieroglyphs and same problem in Notepad using the font I built with Fontlab when wrote the proposal with Michael Everson. In this case Paint and Wordpad appear broken too.

    Any idea how to report this problem? Pretty major incompatibility with Vista apart from being bad news for users of recent and upcoming versions of Unicode.

    Regards

    I work on Windows 7!

    Since this is actually about the ScriptString* functions in Uniscribe that do simpler, higher level Uniscribe work, and about how Uniscribe deals with Plane 1 of Unicode, I am probably a good person to forward the note on to the people in Uniscribe who are (I trust they will pardon the expression I will use here) most culpable for the "pretty major incompatibility with Vista," as Bob put it.

    One of them can fill in the details here, but....

    Applications that rely on Uniscribe, which has quite a bit of knowledge about Unicode, are somewhat at the mercy of the fact that Uniscribe has some knowledge about Plane 1 but a lot of targeted ignorance about the parts it does not cover -- because once it starts paying attention to the plane, things that used to work by accident but never by specific design will suddenly stop.

    An occupational hazard for anything that "works by accident" I suppose....
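    Both characters in Bob's mail are outside the BMP, which is why Plane 1 handling matters: in UTF-16, each one is a surrogate pair rather than a single code unit. A small C sketch of the standard UTF-16 encoding arithmetic shows what a UTF-16 text stack actually sees for them. The function names to_surrogates and plane_of are hypothetical.

```c
#include <stdint.h>

/* Encodes a code point in U+10000..U+10FFFF as a UTF-16 surrogate pair. */
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v = cp - 0x10000;                /* 20-bit value */
    *hi = (uint16_t)(0xD800 + (v >> 10));     /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (v & 0x3FF));   /* bottom 10 bits */
}

/* Plane number: 0 for the BMP, 1 for the plane holding Phoenician
   and Egyptian Hieroglyphs. */
unsigned plane_of(uint32_t cp)
{
    return (unsigned)(cp >> 16);
}
```

    U+10901 PHOENICIAN LETTER BET comes out as D802 DD01. U+13000, the first code point in the Egyptian Hieroglyphs block, comes out as D80C DC00. Any code that treats either unit on its own as a character will go wrong.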

    The workaround is simple enough at the micro level: ExtTextOutW with the ETO_IGNORELANGUAGE flag I have mentioned before -- I even "renamed" the flag to ETO_STOPTREATINGMELIKEIAMSOCOMPLEXYOUMISERABLECONTROLFREAK for entertainment purposes, though in this case it would sound more reasonable, I think. :-)

    At the macro level, like Notepad, Paint, and Wordpad, this is not of much help, of course.

    Now sometimes (and I am not saying this time, but you never know?) there are specific things you can do in the font that will make things work better without turning things off -- once ranges get around to being covered in the OpenType docs....

    I almost forgot to explain the other place I was wrong about Notepad -- it was that "uber-wrapper for a Win32 EDIT control" bit. It is actually a wrapper around the Shell EDIT control, which is not exactly the same thing, though it tries to be and usually is. Perhaps I'll explain why another day....

     

    This post was specifically not sponsored by any Unicode character, for hopefully obvious reasons

    Text can sometimes just look wrong when it seems like it shouldn't. Why?

    The other day, Wayne Shu asked:

    Hello Michael:

       I have visited your blog, and know that you are an expert in Windows Uniscribe, here I have some questions about Uniscribe to ask you.

       now I am using Uniscribe to do a program to generate glyphs from a arbitrary unicode string.

       I use ScriptItemize, ScriptShape, ScriptPlace functions, but it does not work correctly for some font,
       for example the font "Arial Unicode MS",  the unicode string is "ཀྲུང་ཧྭ་མི་དམངས་སྤྱི་མཐུན་རྒྱལ་ཁབ།" (Tibetan  version of "People's Republic of China")

       ScriptShape always return USP_E_SCRIPT_NOT_IN_FONT for this string,
       but I have checked "Arial Unicode MS" using Character Map, "Arial Unicode MS" do support Tibetan characters.

       why ScriptShape return USP_E_SCRIPT_NOT_IN_FONT here?

      another question about function ExtTextOut.
      for the same font "Arial Unicode MS",  and string "ཀྲུང་ཧྭ་མི་དམངས་སྤྱི་མཐུན་རྒྱལ་ཁབ།", I have tried to use ExtTextOut to display it directly, ExtTextOut can display it correctly.

      even for some characters that "Arial Unicode MS" does not support, for example sinhala characters, ExtTextOut still can display these character correctly,

      why?  as I know ExtTextOut has indirectly invoke Uniscribe for displaying texts. but they behave so differently, does ExtTextOut do some font back internally?

    Thanks and forgive my poor english.

    --
    Best Regards.
    Wayne Shu

    Now when you stack his English up against my typos, I think he does pretty well, but that's just my opinion. :-)

    Anyway, it would be easy to dismiss the whole question with a quick reference to a previous blog of mine like Arial Unicode MS effectively [bites|sucks|blows].

    But the case of Tibetan is by its nature more complicated, taking the relatively minor issue of a few characters in Bengali to an extreme.

    It is best to think of what Uniscribe does as a careful dance between the data inside the font and its own knowledge of Unicode generally and of certain scripts within it specifically. This is not universally true, but it usually is for anything requiring complex script processing (how true will vary with the script, and is described within the documentation Microsoft provides for that script).

    In the case of Tibetan, the need for a font with the correct supporting data is crucial.

    To some this may seem unfair -- this is not just different parts of Microsoft talking to each other; this is the same part of Microsoft (the Typography team) talking to itself! But in truth it is not unfair, since (and this is a gross over-simplification that I might get into more specifically another day) a lot of the font "data" is the look of specific combinations, which really amounts to substituting two or more separate Unicode code points with a particular grapheme that shapes better with surrounding text and often looks little like the original code points would by themselves. How could Uniscribe alone contain such data without knowledge of what is within the font? The split between shaping engine and font in such cases does make a lot of sense.

    And in this particular case (Arial Unicode MS and Tibetan) the answer is easy: the data does not exist in the font at all! It has the individual graphemes, but a lot of the rest of the data simply ain't there.

    Thus if you pick a font like Arial Unicode MS to do such work, you are never going to get the best result....

    For the other question: when one calls ExtTextOutW, one gets some of the higher level Uniscribe functionality, like font substitution on a per script basis (something the lower level functions will never do), and as a bonus it will generally pick better fonts than Arial Unicode MS, so one will tend to see better results when the support is there.
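    The difference Wayne noticed can be pictured with a small C sketch of per-script font substitution, the kind of thing ExtTextOutW layers on top of the lower level Uniscribe calls. Everything here is hypothetical. That includes the table and its two entries: Iskoola Pota for Sinhala and Microsoft Himalaya for Tibetan are fonts Windows Vista shipped for those scripts, but whether GDI picks exactly these is an assumption. It also includes the requested_has_layout flag, which stands in for whatever check a real renderer makes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-script fallback table for illustration;
   it is NOT the list GDI actually uses. */
struct script_fallback {
    uint16_t first, last;     /* code point range of the script */
    const char *font;         /* font assumed to carry the script's shaping data */
};

static const struct script_fallback fallbacks[] = {
    { 0x0D80, 0x0DFF, "Iskoola Pota" },        /* Sinhala */
    { 0x0F00, 0x0FFF, "Microsoft Himalaya" },  /* Tibetan */
};

/* Returns the font to shape ch with: the requested font if the caller knows
   it has the needed layout data, else a per-script fallback from the table,
   else the requested font unchanged. */
const char *pick_font(uint16_t ch, const char *requested, int requested_has_layout)
{
    if (requested_has_layout)
        return requested;
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++)
        if (ch >= fallbacks[i].first && ch <= fallbacks[i].last)
            return fallbacks[i].font;
    return requested;
}
```

    A low level caller that always passes the requested font straight to shaping never reaches the fallback branch. That is roughly the position ScriptShape leaves you in with Arial Unicode MS and Tibetan.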

     

    This post brought to you by ུ (U+0f74, aka TIBETAN VOWEL SIGN U)

  • Sorting it all Out

    Almost winning an award can be pretty damn stressful!


    A few days ago, and I'm really pleased about this, the Keyboard Layout Creator was up for a Microsoft Engineering Excellence award.

    It was funny: at the ceremony I was talking to one of the staff people, mentioning that we were up for an award and asking whether there was a ramp up to the dais (gesturing to the iBot). He looked horrified as he said no, but I kept him from scrambling too fast. He calmed down and then suggested I could always be right in front of the stage and all of the folks could be right at the edge around me. He eagerly asked if that would be okay.

    That would be better than okay, I assured him. But we probably aren't going to win anyway, so we won't need any special handling. I was just doing some due diligence....

    Inside I was quaking -- was that true?

    There is some pretty stiff competition, so most likely we wouldn't win.

    Damn, I should have said something earlier. They could have put in a ramp really quick, it wouldn't have been a big deal at all. If we win you're going to leave them scrambling to direct a senior VP to do a bunch of unrehearsed placements for no reason other than the fact that you couldn't open your mouth at the right time.

    Dammit.

    Maybe we won't win.

    I find myself invoking the will of the flying spaghetti monster to intervene here.

    They are reading off the honorable mentions now. There are several of these.

    Some I feel we are not in the same league as; with others we are in the same neighborhood, the same order of magnitude in contribution. Maybe we'll get one of these, and no one will have to go up.

    Suddenly they call us, MS Keyboard Layout Creator is receiving an honorable mention.

    I exhale, realizing I had been holding my breath all that time.

    I can't tell whether the light-headed feeling is due to not breathing for so long, the relief of avoiding the on-stage circus, or the excitement of the recognition.

    I decide the latter is the least wimpy option so I choose to smile and go with that. :-)

    So there it is. We received an honorable mention at the Microsoft Engineering Excellence awards!

    It is really awesome having MSKLC recognized as an important contribution to both the keyboard ecosystem itself and the process by which keyboards are created and integrated into Windows. We changed that whole process and in fact fundamentally rearchitected it, massively improving quality while doing so. And after way over a third of a million downloads, a literally uncountable number of MSKLC-authored keyboard layouts are out there in the wild.

    Even Apple used MSKLC to create the Microsoft versions of their Apple style keyboard layouts for people who preferred them when running Boot Camp. Wow, a developer at Apple used a tool I wrote for a product they produced and released to customers, one they even charged money for.

    Somehow the honorable mention still seemed really cool, even cooler than the Apple thing. :-)

    Later at the banquet I am eating and answering questions about the iBot (as usual), and I find myself talking to one of the Engineering Excellence folks. I mention my panic attack about the fear of winning and my gratitude about getting the honorable mention, and he asks me what we were up for. I tell him and he seems impressed, telling me that ours was a very close call and could even have won. He asks me enough questions about MSKLC that I am convinced he isn't just blowing sunshine up my butt, and I point out that it is likely better that things worked out this way for the sake of a smooth ceremony. He smiles, we talk about some of the winners, and after a bit he has to excuse himself.

    I find myself in another huddle of people, this time one that includes John Devaan, a man I have actually met about five times before at various events over the last 13 years, though never with the iBot, so I'm sure he doesn't remember me. After a moment he does remember the MSKLC proposal ("the keyboard one") and mentions that it was debated a bit. Outwardly I am calm and embarrassed; inwardly I am freaking out -- but I remember that we got lucky and MSKLC lost those debates.

    Eventually he excuses himself, after a bit I excuse myself too, and I head home. It has been a long day, but I need to head out for a drink. Or three.

    It is really pretty exciting to have received that honorable mention, and to be stacked alongside a lot of the great tools, processes, and accomplishments that have won over the years, things like Script# and FxCop. It is quite humbling to be told one was close.

    And, given the poor planning at the end, it is kind of a relief that things turned out the way they did! :-)

     

    This blog brought to you by U+2aca (aka SUPERSET OF ABOVE ALMOST EQUAL TO)

  • Sorting it all Out

    I've got [SCS]U under my skin....


    Apologies to Frank for the title!

    Just last night, old friend and compadre from the Unicode days¹ Doug Ewell asked me:

    Hey Michael,

    This should be an easy question, but I can't find anything about how to do this in C#.  (I know, should have used Live or Bing or whatever instead of Google.)

    I'm talking about a System.Text.Encoder, of course, and I want to do it for SCSU.  And of course I want to do a decoder too. :)

    Thanks for any pointers,

    --Doug
    (brought to you by the byte 0x0E, which is the SCSU tag SQU)

    I probably should explain that Doug has an unhealthy² affection for SCSU (UTS#6: A Standard Compression Scheme for Unicode) and perhaps to a lesser extent BOCU (UTN#6: BOCU-1 - MIME-Compatible Unicode Compression), two compression schemes for Unicode which in theory can do a better job than programs like WinZip because of their specific knowledge of the properties of Unicode itself.

    Of course, neither SCSU nor BOCU is your typical "encoding", since both are basically just Unicode. But then again so are UTF-8, UTF-16, and UTF-32, and .Net has no trouble calling them encodings, so there should be no problems there.
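
    Since SCSU is "basically Unicode", a rough sketch may help show where its savings come from. What follows is a deliberately tiny Python subset, a hypothetical illustration rather than anything taken from a shipping implementation: it only knows SCSU's initial single-byte state (ASCII, plus the default dynamic window covering U+0080..U+00FF) and the SQU tag from Doug's signature (0x0E, followed by one big-endian UTF-16 code unit). The function names are made up, and there is no window switching and no Unicode mode, so it is nowhere near a conforming encoder.

```python
# A deliberately tiny SCSU (UTS #6) subset: single-byte mode in its initial
# state only, plus the SQU tag for everything else. No window switching and
# no Unicode mode, so anything outside U+0000..U+00FF costs 3 bytes per
# UTF-16 code unit. Hypothetical illustration, not a conforming implementation.

SQU = 0x0E  # "quote Unicode": the next two bytes are one big-endian UTF-16 code unit

# Controls that single-byte mode passes through unchanged (NUL, TAB, LF, CR);
# the other bytes 0x01..0x1F are SCSU tags and cannot appear as plain text.
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D}


def scsu_encode_subset(text: str) -> bytes:
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if unit in PASS_THROUGH or 0x20 <= unit <= 0xFF:
            # 0x20..0x7F is ASCII; 0x80..0xFF goes through the initially active
            # dynamic window 0, whose default offset is U+0080 (Latin-1).
            out.append(unit)
        else:
            # Other controls, CJK, surrogate halves, ...: quote the code unit.
            out += bytes((SQU, utf16[i], utf16[i + 1]))
    return bytes(out)


def scsu_decode_subset(data: bytes) -> str:
    utf16 = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == SQU:
            if i + 3 > len(data):
                raise ValueError("truncated SQU sequence")
            utf16 += data[i + 1:i + 3]
            i += 3
        elif b in PASS_THROUGH or b >= 0x20:
            utf16 += bytes((0x00, b))
            i += 1
        else:
            # Window switches, single-byte quotes, SCU, ... are out of scope here.
            raise ValueError(f"SCSU tag 0x{b:02X} not supported by this sketch")
    # Quoted surrogate halves are reassembled into a supplementary character here.
    return utf16.decode("utf-16-be")
```

    With this subset, "héllo" takes 5 bytes instead of the 10 it takes in UTF-16. A real SCSU encoder goes further by switching among its dynamic windows so that runs of, say, Greek or Cyrillic also cost roughly one byte per character, which is exactly the "specific knowledge of Unicode" that lets it beat a general-purpose compressor on short text.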

    Anyway, under the old theory about what it means when no samples exist for a technology, I'll point to a sample here from Shawn that overrides System.Text.Encoding (with fallback behavior) and in his words "just reverses a-z, A-Z & 0-9 in ASCII". With some changes, this should allow one to implement the encoding work needed to support SCSU (or even BOCU!) if one wants.
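
    Shawn's sample is .NET code. For the shape of the same idea outside .NET, here is a rough Python analog, built on a couple of assumptions: registering a codec via codecs.register stands in for subclassing System.Text.Encoding, and Python's errors argument ("strict", "replace", ...) stands in for .NET's fallback objects. The codec name reversed_ascii_demo is hypothetical, and "reverses" is read here as mirroring each range (a to z, b to y, 0 to 9, and so on), which may not be exactly what Shawn's sample does.

```python
import codecs
import string

# Hypothetical demo codec: mirrors a-z, A-Z and 0-9 (a<->z, B<->Y, 0<->9, ...)
# and otherwise behaves like plain ASCII. Not Shawn's code, just the same idea.
# The mapping is its own inverse, so one table serves both directions.
_MIRROR = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase + string.digits,
    string.ascii_lowercase[::-1] + string.ascii_uppercase[::-1] + string.digits[::-1],
)


def _encode(text, errors="strict"):
    # Mirror first, then let the standard ASCII codec apply the caller's error
    # handler, which plays roughly the role of an encoder fallback.
    return text.translate(_MIRROR).encode("ascii", errors), len(text)


def _decode(data, errors="strict"):
    text = bytes(data).decode("ascii", errors)
    return text.translate(_MIRROR), len(data)


def _search(name):
    # Python passes a normalized (lowercased) encoding name to search functions.
    if name == "reversed_ascii_demo":
        return codecs.CodecInfo(_encode, _decode, name="reversed_ascii_demo")
    return None


codecs.register(_search)
```

    Once registered, "Hello 2009".encode("reversed_ascii_demo") gives b"Svool 7990", and "café".encode("reversed_ascii_demo", errors="replace") falls back to b"xzu?". This demo codec is stateless; SCSU is not (the active window carries across chunks of input), so a real SCSU codec would also want Python's incremental encoder and decoder classes, much as .NET pairs an Encoding with stateful Encoder and Decoder objects.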

    I would do it myself, but there is the whole "involvement with Unicode" thing that I grappled with previously, and I think it's best for the actual working sample to be left to someone without the conflict.

    Like Doug! :-)

    Now this is not "implementing an Encoder and a Decoder", but since you can get text in and out of SCSU I think it is good enough. There may be nuances; I have never looked too closely at the differences. It should be enough to get started, at least. And if not, Doug will surely let me know!

     

    1 - I have made my peace with the fact that my connection with Unicode is over; it was always confusing from a Microsoft standpoint, since it was largely outside of the company's own efforts to maintain that relationship. I take the Bulldog award as quite an ending achievement, like a pitcher throwing a "perfect game" right before retiring. And thus I can talk about "the Unicode days" as a distinct entity. :-)
    2 - Not in the clinical sense, mind you. I am just having some fun at Doug's expense.

     

    This blog brought to you by U+eeee (a private use character known for its resemblance to 50's housewives being scared onto chairs by mice!)
