
November, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Apocalypse Font (aka Guess they must have picked the wrong eight characters.)

    • 13 Comments

    The title of this blog is an allusion to Coppola's Apocalypse Now, and eventually I'll be quoting a bit of the Herr-provided narration (those are the pieces Martin Sheen read)...

    It all started with a seemingly innocent question the other day. It went something like this (product and component names removed to protect whatever might deserve protecting):

    We are hitting an issue where surrogate pair characters do not display correctly on localized builds, but display correctly on English builds. This appears to be because the MS UI Gothic font used in the localized builds doesn’t “automatically” do the correct font linking.  (This can be verified by e.g. opening Wordpad, setting the font to MS UI Gothic, and typing some surrogate pair characters—you just get squares.  If the font is something else, e.g. Arial, the font linking works correctly.)

    Is this a known issue with the MS UI Gothic font face? We are currently using one function to obtain the desired font face. Should we be calling a different function instead of this, or in addition to this?

    Now as it turns out, there were several different issues going on here.

    It starts with the involvement of GDI font linking and Uniscribe font fallback, discussed previously in blogs like Font Linking vs. Font Fallback.

    First and foremost was the fact that this was what they call a tester scenario. Because of this, the actual supplementary CJK Extension B characters in question were not ones that are in any version of JIS (including the latest JIS X 213), which is why they were seeing notdef glyphs (aka square boxes).

    Uniscribe largely stays out of the world of CJK (Chinese, Japanese, and Korean) text, allowing GDI font linking to do most of the work here. Usually this will guarantee that some ideograph will make an appearance, because as long as it is in one of those core CJK fonts, it will be on the screen.

    But there is one time when Uniscribe is completely involved and GDI font linking is not -- and that is supplementary characters.
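    For reference, a supplementary character is any code point above U+FFFF, which UTF-16 stores as a surrogate pair (hence the "surrogate pair characters" in the question). Here is a minimal Python sketch of that arithmetic, using the eight code points listed at the end of this post. The helper name to_surrogate_pair is just for illustration; it is not a Windows or Uniscribe API.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


# The eight Extension B code points named at the end of this post.
EIGHT = [0x20087, 0x20089, 0x200CC, 0x2099D, 0x215D7, 0x2298F, 0x241FE, 0x27FB7]

for cp in EIGHT:
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:05X} -> {high:04X} {low:04X}")
```

    Every one of the eight lands in plane 2, so each needs two UTF-16 code units, which is exactly the case where Uniscribe rather than GDI font linking gets involved.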

    And Uniscribe is not quite as sophisticated in its efforts here -- it will see if the current font claims to support the Unicode supplementary ideographic plane (which contains e.g. CJK Extension B). If it does then the font will be used, even if there turn out to be some missing characters.

    For the Japanese fonts, such as MS Gothic:

    MS Gothic

    and MS PGothic:

    MS PGothic

    and MS Mincho:

    MS Mincho

    and MS PMincho:

    MS PMincho

    and Meiryo:

    Meiryo

    each font is actually pretty much limited to the 300-some CJK Extension B characters in JIS X 213.

    If you pick one of these fonts to display any other random Extension B ideograph, then you will get a square box.

    And if you pick a font with no Extension B support at all, then it will pick one font to look in, based on its algorithm and system locale settings -- thus if you choose Arial or Tahoma or Microsoft Sans Serif or Segoe UI, then you will possibly also get an ideograph!
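    The behavior described above can be modeled in a few lines. To be clear, this is a toy model of what I described, not Uniscribe's actual code or API: the Font class, the glyph_source function, and the coverage sets for the two made-up fonts are all hypothetical, and the real choice of fallback font depends on an algorithm and locale settings that are not modeled here.

```python
from dataclasses import dataclass, field

NOTDEF = "notdef (square box)"


@dataclass
class Font:
    name: str
    claims_ext_b: bool                         # font advertises supplementary ideographic support
    glyphs: set = field(default_factory=set)   # code points the font really has glyphs for


def glyph_source(code_point, chosen, fallback):
    """Return the name of the font that supplies the glyph, or NOTDEF."""
    if chosen.claims_ext_b:
        # The chosen font is trusted; no other font is consulted,
        # even when the specific character turns out to be missing.
        return chosen.name if code_point in chosen.glyphs else NOTDEF
    # A font with no Extension B claim gets exactly one fallback font.
    return fallback.name if code_point in fallback.glyphs else NOTDEF


EIGHT = {0x20087, 0x20089, 0x200CC, 0x2099D, 0x215D7, 0x2298F, 0x241FE, 0x27FB7}
few = Font("font with eight Ext B glyphs", True, EIGHT)
plain = Font("font with no Ext B claim", False)
wide = Font("hypothetical wide Ext B font", True, set(range(0x20000, 0x20100)))

print(glyph_source(0x20087, few, wide))    # one of the eight: drawn from the chosen font
print(glyph_source(0x20001, few, wide))    # any other ideograph: square box
print(glyph_source(0x20001, plain, wide))  # no claim at all: the fallback font gets a chance
```

    The counterintuitive part is the second call: a font that covers a little of Extension B can do worse than a font that covers none of it.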

    Korean does not have Extension B in any of its fonts. Given the general tendency toward de-emphasis of Hanja in South Korea and the virtual illegality of it in North Korea, this is hardly a surprise (though this could change in the future if customer demand drives change here).

    And for the most part Chinese has the widest support. Because whether one uses the Simplified Chinese SimSun-ExtB font:

    SimSun-ExtB

    or the Taiwanese style Traditional Chinese font MingLiU-ExtB:

    MingLiU-ExtB

    or the Taiwanese style Traditional Chinese font PMingLiU-ExtB:

    PMingLiU-ExtB

    or the Hong Kong style Traditional Chinese font MingLiU_HKSCS-ExtB:

    MingLiU_HKSCS-ExtB

    one has a much larger number of ideographs to choose from.

    The ranges are of course based on preferred glyphs in the PRC GB18030, Taiwan CNS11643, and Hong Kong HKSCS standards, respectively -- kind of the ultimate exercise of using a code page as a repertoire fence (something I have discussed before).

    But the bug did not quite end there.

    You see, it seems that the application had its own custom font choosing behavior, which in this case happened to be preferring the newer ClearType Simplified Chinese Microsoft YaHei font.

    A font that also has some Extension B in it.

    Eight CJK Extension B Ideographs, in fact:

    Microsoft YaHei

    These eight ideographs are:

    So far, these eight characters as a set seem to have no special relationship in China, Taiwan, Hong Kong, Macao, Singapore, Japan, Korea, or Vietnam, those being the major places where ideographs either are in use or have been within the last 1000 years.

    If the characters spelled something special, I'd assume it was some kind of Easter Egg in the font (imagine the challenge of coming up with such an egg that relied on eight Unicode characters displayed in code point order -- talk about a fun word challenge in any language!).

    I am reminded of a bit from Apocalypse Now where Martin Sheen describes a report about Col Kurtz. Specially modified for the current situation, for the conspiracy theory minded:

    Late Summer-Fall 2008:
    The proper glyphs for ideographic text in the supplementary
    planes show up fine in Vista. Then in November in one font
    is noted the presence of eight specific ideographs. Two of
    them are in JIS X 213, three are from a list of Hong Kong
    Cantonese, one is from some from China. The number of
    Extension B ideographs visible in the application in China
    drops off to nothing. Guess they must have picked the
    wrong eight characters.

    Kind of a stretch, obviously. But still fun to write (had I time to really draw this one out it would have been as much fun in my opinion as that Matrix one!).

    Whatever the reasons, their presence (due to the Uniscribe design here) can really break Extension B display support if someone is using the cool font with ClearType support.

    If I had to guess, I'd wonder whether they were in there as part of an experimental effort at looking at ClearType Extension B support that just never got taken out (why would they? It's not like they are wrong, except in the meta sense of their effect!). But again that is just a guess. Probably more likely than my Apocalypse Font scenario above! ;-)

    An interesting situation, in any case....

     

    This blog brought to you by 𠂇 𠂉 𠃌 𠦝 𡗗 𢦏 𤇾 𧾷 (U+20087, U+20089, U+200cc, U+2099d, U+215d7, U+2298f, U+241fe, and U+27fb7)

  • Sorting it all Out

    From I SCOOT to IBOT, #5 of ?? (sometimes it is phase 3 that is ????)

    • 12 Comments

    Warning: although slightly technical, this blog is mostly non-technical, and/or technical about stuff related to the iBOT. If the technical issues related to SQL Server and/or PASS interest you then they will probably show up in future blogs...

    Prior blogs in the series here and here and here and here.

    It's kind of funny.

    I just spent last week at SQLPASS 2008, and had a really great time. I was there as an "Expert" in the ATE (Ask the Expert) area, and I talked officially about migration, though I got a bunch of upgrade questions since customers don't tend to distinguish that much between migration (moving from any non-SQL Server database to SQL Server) and upgrade (moving from an earlier version of SQL Server to a later one).

    Plus I got many questions about Unicode (especially both UTF-16 and UTF-8), collation support in 2000/2005/2008, and also rich text support in SQL Server Reporting Services (drawing some on my typography knowledge, and my work with the RS team). It was really a lot of great conversations with customers and colleagues and partners and friends.

    Did I mention that I had a great time? Well, I'll say it again. I had a great time.

    And not just because I got to see friends and colleagues from years prior and find out what they are up to like Debra Dove (now a Group Program Manager!) and Tom Casey (now a General Manager!) and a whole bunch of others too numerous to mention, not to mention all the people I met in SQL Server marketing. I mean that was very cool, but that wasn't it.

    And not just because in my heart of hearts I'm still a databases person, just like I was way back almost in the beginning for me when Ashton-Tate's DBase II would spit out the "30 days hath September" rhyme when you gave it an out-of-range date (the first software "Easter Egg" I ever remember finding, running on an Osborne 1!). That was cool too, but that wasn't it.

    And also not just because the evening parties and after-parties are superior to any other conference I've been to, though they are (it just seems like SQL folks work hard and know when to play harder, and especially when to not talk about work). This is one of the many reasons I have not been blogging; to be honest I wasn't even sleeping much last week. Every developer should go to a few database conferences like SQLPASS. :-)

    The cool part I am referring to isn't that, either.

    As you can guess from the blog title, it was really because this was my first technical conference with the iBOT.

    I won't say that I was nervous, exactly. Though I admit I took the charger since I did not want to be stuck in Seattle with no way to get home!

    As one of the "Elite 300" ATE folks, I had $25,000 in SQL Bucks to give away to people who asked me questions. But I realized early on that iBOT questions couldn't count -- if they did I would have required a bailout from the program organizers since I'd otherwise be broke within half a day!

    To be fair there were a lot of people I talked to about SQL Server issues, too (as I mentioned before).

    But even some of those conversations started with a few iBOT questions -- people were just interested and curious....

    There was even one conversation with some of the WIT (Women in Technology) volunteers about the predictable way that iBOT questions tended to split on gender lines even when they were kind of the same question (e.g. "how does that chair stay upright?" from men versus "how do you keep from falling out?" from women).

    I was even prepared now, armed with the answer to the ultimate question, the one John McConnell (yes, the one who talked about When will we support Rongo-Rongo) asked weeks before, at a group mixer.

    You see, the most common question people would ask is some variation of "How is that thing balancing?"

    My iBOT shtick eventually developed more fully as the week went on, some based on the boilerplate in the iBOT FAQ's How does the iBOT® Mobility System work?:

    A revolutionary mobility system from Independence Technology, L.L.C. the iBOT® Mobility System utilizes our patented iBALANCE® Technology. It is custom-programmed and calibrated to the owner’s center of gravity. Reach forward in Balance Function to shake hands, and your iBOT® 4000 Mobility System moves with you. Lean back and it moves with you as well. It is subtle and responsive in a way no other mobility device can match.

    but it always eventually made it to the marketing point, about how the iBOT, unlike the Segway, uses six gyroscopes to do its work.

    This always impressed people, though I suspect that was mainly because they don't really know much about gyroscopes. So there was no follow-up question along those lines.

    But John is one of those really smart guys who does know about things like gyroscopes, so he asked the very reasonable next question -- how does it use six gyroscopes?

    Now I even do know a bit about gyroscopes, but it had never occurred to me to think of that next question, let alone ask it.

    To John I had to admit I never asked, but that I would find out.

    So I asked the iBOT folks.

    Perhaps not entirely surprisingly, the front line of the iBOT's helpdesk didn't know either. :-)

    But they tracked down a better answer, which I will give in a moment.

    It's funny (minor segue for a moment), over the course of the week at SQLPASS, I was reminded of the Underpants Gnomes from South Park, and their three step business plan:

    • Phase 1: Collect underpants
    • Phase 2: ????
    • Phase 3: Profit!

    The thing I noticed over the course of the week was that in the old days there would be some women who would be waving to me, looking at me with interest. They would almost invariably be looking at someone behind me or nearby, rather than at me.

    Now with the iBOT, they actually were looking at me. Well, technically they were looking at my ride, my iBOT, but they still wanted to talk about it. But at least they weren't looking at the person behind me, right? :-)

    Of course while it clearly appears that this "second phase" represented some form of progress, I really didn't know how to get to the third phase.

    In part because I'm not sure what it would be (I suppose having them being interested in me? I'm not sure!), but it still is easy to put it in phases and be curious about how to get to the third phase!

    Anyway, that answer.

    First, I'll borrow an image from Wikimedia to define Pitch, Roll, and Yaw in much less time than a verbal description ever would:

    Pitch, roll, and yaw... 

    If you absolutely must have words then the Wikipedia Flight Dynamics article can probably help...

    It has other pictures that can help people with the concepts, like these:

     

    Pitch
    Yaw
    Roll

    Now the gyroscopes on the iBOT are there to look at the movement of the chair and measure these forces.

    The computers in the iBOT generally work under the principle of three independent subsystems so that if any one of them is getting different results than what one would expect, you will either see corrections being made or end up in a warning state if none can be made. But that does not mean that the gyroscopes are all there to provide nothing but redundancy.

    Like with the Segway, they are laid out a bit more cleverly than that....

    Three of them are placed on the main axis of the chair, in a straight line from front to back.

    And the other three are off axis at various strategic places to handle detection of various other kinds of movement.

    Combined with the data of the chair's mode and factors like the acceleration and turning being applied, the computer system can determine three things:

    • where it is;
    • where it is moving;
    • where it is trying to move.

    With that information, it can also know if its progress is being blocked or hindered or accelerated in some way, so it can use its internal governor to speed up or slow down, to correct an incorrect situation, or to trigger a fault condition if it cannot recover from a problem without user assistance.

    The gyroscopes work together to provide the data, rather than two of them simply trying to get the same answer redundantly as a re-check -- because if the computer knows by how much two gyroscopes should differ, then it can know when something isn't right. The combination is described on many sites online, like this one that talks about the Segway but which partly applies here too (the main difference being that the iBOT is not trying to get steering data from the movements of the passenger; it is trying to get data to keep the passenger upright, in an inverted pendulum kind of system).

    Other sites like this one also have other somewhat useful descriptions:

    Like the Segway HT, the iBOT contains patented dynamic stabilization (iBALANCE) technology, an integrated combination of sensor and software components and multiple computers that work in conjunction with gyroscopes. Gyroscopes are motion sensors that help maintain balance. When the gyroscopes sense movement, a signal is sent to the computers. The computers process the information and tell the motors how to move the wheels to maintain stability. This electronic balance system is custom-programmed to the user's center of gravity, to monitor and respond to subtle changes in motion. Reach forward to shake hands, and the iBOT moves with you. Lean back and it moves away as well. The iBOT constantly realigns and adjusts its wheel position and seat orientation to keep the user upright and stable at all times, even when driving up and down curbs or inclines. In addition, the iBOT includes built-in triple redundant backup systems, as well as auditory and visual signals to provide even more safety and assurance. With input from the rider or an assistant, in "Stair Function" the iBOT utilizes gyroscopes and adjusts to the driver's center of gravity, climbing stairs by rotating wheels up and over each other. The iBOT can allow riders to stand up to the same eye-level as colleagues. The "Balance Function" of the iBOT can raise the rider to eye level for any number of business or social interactions. It lets the rider see over counters, and reach a high shelf in the office, kitchen or supermarket, safely and easily.

    And this page from Silicon Sensing, one of the component providers, gives some more info:

    The Segway PT is instantly recognisable across the world as a unique and alternative means of transport. Its press launch in December 2001 attracted enormous attention. But, by the time of its launch, Silicon Sensing had been working with the inventor of the Segway PT – Dean Kamen – for several years helping to develop and deliver its key design element – the balancing technology. This close relationship stemmed from our role in providing the balancing gyros for the IBOT®, the equally novel balancing wheelchair, from the same inventor.

    Technically speaking, the Segway design is classic implementation of the 'inverted pendulum control theory' – balancing a broomstick on your fingertip is another example of the same thing. But to enable an automatically-balancing system based on this theory demands the availability of sensing, processing and actuation, all of which are fast and accurate enough. And for a commercially-viable product to emerge, this further demands the availability of these technologies at affordable prices, with sufficient robustness and reliability, and being of a suitable size.  The overall system concept demanded that the Segway PT could always continue to balance if a component fails, whilst providing alarms and reversionary action to ensure that the rider is able to dismount safely.

    Being involved from the very early days, Silicon Sensing were able to propose and develop an innovative design, to be called the Balance Sensor Assembly, in which the size, reliability and affordability criteria were met through use of our VSG3-based silicon MEMS gyro technology. A key requirement was at least dual redundancy in balance sensing – and the desire for triple redundancy in at least the pitch axis. Although not immediately obvious, the other two axes of yaw and roll also required to be sensed for the situation in which the Segway PT is balancing on a slope.

    The resulting solution is ingenious. Rather than providing dual and triple redundancy on each axis separately, the gyros are set at angles such that, by applying trigonometry to any pair of gyros, it is possible to deduce pure pitch, roll or yaw in more than one way. In summary, the solution provides three ways of measuring pitch and two each of measuring yaw and roll. To complete the module, two dual-axis liquid tilt sensors are included which sense the true 'down' direction and thus the pitch and roll angles. Processing within the BSA – again duplicated both electrically and physically – continuously checks the sensor data and monitors for any failures.

    And then this document from BAE Systems, another one of the component providers, gives some more technical info on the nature of how the results of multiple gyroscopes are combined:

    Using Maths to support design engineering
    Ideas from maths are also important in engineering. The Segway HT makes use of a simple but ingenious bit of maths to reduce the number of silicon sensors used in the balance sensor assembly (BSA).

    Directions of motion
    There are three kinds of rotational motion that the Segway can experience; pitch, roll and yaw. These can be detected by a gyroscopic sensor.

    Pitch Stand up and lean forwards and backwards
    Roll Stand up and lean from side to side
    Yaw While standing upright turn from your left to your right

    For safety reasons each direction of motion needs to have two sensors, apart from pitch (the motion used to control the Segway HT) which needs three independent sensors. So, seven (rather expensive) sensors in all should be needed.

    Geometry to the rescue
    Cleverly, the BSA has one independent sensor (sensor 1) measuring just pitch and a set of four other sensors, angled such that each has two jobs. Sensors 2 and 3 both measure pitch AND roll. These are physically arranged so that positive pitch motion will cause both sensors to give a positive output signal. A positive roll motion will reduce the signal from sensor 2 and increase the signal from sensor 3.

    Sensor 2 measures pitch minus roll. We can write this as Sensor 2 = P – R.
    Sensor 3 measures pitch plus roll. We can write this as Sensor 3 = P + R

    If we add these two equations together like this:

    Sensor 2 = P – R
    plus
    Sensor 3 = P + R
    Sensor 2 + Sensor 3 = 2P (the +R and the -R cancel each other out)

    If we subtract these two equations from one another like this:

    Sensor 2 = P – R
    minus
    Sensor 3 = P + R
    Sensor 2 – Sensor 3 = P – R – (P + R)
    Sensor 2 – Sensor 3 = -2R (the +P and –P cancel each other out).
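    The sum and difference above are easy to check numerically. Here is a minimal Python sketch that recovers pitch and roll from the two angled sensors and then cross-checks the result against the dedicated pitch sensor (sensor 1). The function names, the tolerance value, and the idea of a simple absolute-difference check are my own assumptions for illustration; neither the Segway's nor the iBOT's actual fault logic is described in the sources quoted here.

```python
def pitch_and_roll(sensor2, sensor3):
    """Recover pitch P and roll R from Sensor 2 = P - R and Sensor 3 = P + R."""
    pitch = (sensor2 + sensor3) / 2.0   # Sensor 2 + Sensor 3 = 2P
    roll = (sensor3 - sensor2) / 2.0    # Sensor 2 - Sensor 3 = -2R
    return pitch, roll


def pitch_is_consistent(sensor1, sensor2, sensor3, tolerance=0.5):
    """Compare the dedicated pitch sensor with the pitch derived from sensors 2 and 3.

    The tolerance (in whatever units the sensors report) is a made-up value.
    """
    derived_pitch, _ = pitch_and_roll(sensor2, sensor3)
    return abs(sensor1 - derived_pitch) <= tolerance


# A pitch of 3.0 and a roll of 1.0 give Sensor 2 = 2.0 and Sensor 3 = 4.0.
print(pitch_and_roll(2.0, 4.0))
print(pitch_is_consistent(3.0, 2.0, 4.0))   # sensor 1 agrees with the derived pitch
print(pitch_is_consistent(5.0, 2.0, 4.0))   # sensor 1 disagrees: something isn't right
```

    This is the point of angling the sensors: the same pair gives a second, independent reading of pitch for free, and a disagreement between the readings is itself useful information.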

    The steering is done via a joystick. For the lateral directions it works the way a boat would be directed, while forward and backward movement are handled by pushing the joystick forward and back, in a way that does not exactly match any I have seen (usually this is handled elsewhere, in a throttle). That leads to the not entirely intuitive backward movement that I am still learning.

    Usually I rely on the zero radius turns so I can just go forward; this is definitely slowing down my "backwards" learning. :-)

    Anyone want to take a guess on the difference between the Segway and the iBOT, with the former needing five gyroscopes and the latter needing six, in particular as to how the slightly changed mission of the one unit changes the way the gyroscopes are laid out? I give some of the answer away above in the Segway descriptions and the requirements of the two different units, but not all of it.

    Anyway, no one else ever asked the question John did, which in truth seems like a very reasonable question to ask in response to the six gyroscopes answer. No one else has yet managed to get that far....


    This post brought to you by ♿ (U+267f, a.k.a. WHEELCHAIR SYMBOL)

  • Sorting it all Out

    From I SCOOT to IBOT, #6 of ?? (The cost of standing still?)

    • 9 Comments

    Once again, a blog for your reading pleasure that is technical though not on the usual subjects. If iBOT crap bores you, then please skip gracefully!

    Prior blogs in the series here and here and here and here and here.

    It was actually in a response to that last one that regular reader and long-time friend and colleague Tony Toews asked:

    So I gotta ask a basic question.   Do the batteries last all day when you're spending large parts of the day in balance mode on two wheels?

    Now of course we know from past experience that it is the basic questions that can have some of the most complicated answers!

    Now at one level it is simple -- they did manage to last that long while I was at PASS.

    And also simple at a more technical level, as the provided documentation for the iBOT describes it all.

    Of course the usefulness of that text:

    When tested in conformance with ISO 7176-4 (1997), the theoretical distance range while driving steadily on flat, level ground exceeds 15.5mi (22km) range in Standard, 4-Wheel, and Balance Functions.

    is somewhat obviated by the disclaimers of the following sentences:

    This data should only be used for comparison purposes with other power wheelchairs. The actual distance you can travel will depend on total weight carried, weather, surface conditions, driving style, other power usage (e.g., transitions between functions), and climbing stairs.

    and really it kind of tends to ignore the actual question Tony is asking -- which is a very reasonable question to have.

    Now ISO 7176-4 (Wheelchairs -- Part 4: Energy consumption of electric wheelchairs and scooters for determination of theoretical distance range) is useful as far as those things go, though it ignores some crucial information such as the vastly different characteristics of the Ni-Cad batteries of the iBOT from the Lead Acid Gel batteries seen in most power wheelchairs -- differences that will really impact actual usage. Even ISO 7176-4 says as much in its introduction:

    Distance range is also strongly dependent on the way in which a wheelchair is driven, and a single value for theoretical range can be insufficient to provide an understanding of the performance of a wheelchair. Two methods for determining theoretical range are provided in this part of ISO 7176, for driving and for manoeuvring. These values are intended to facilitate wheelchair comparison in a manner analogous to the extra-urban and urban fuel consumption figures published for motor vehicles.

    Funny, that's what I was about to say this was as useful as. :-)

    So why is it that the document claims that the various modes (Standard, 4-Wheel, and Balance) will all allow for analogous ranges?

    Well, technically it doesn't -- it just says they can all go THAT far, without claiming any one can go farther. Even if it can.

    Is there truly no power cost to the more heavy-duty 4-wheel mode even when it is on flat ground, analogous to the way that a car in 4-wheel drive mode needs more gas than one on 2-wheel drive mode?

    Maybe -- because as observers have noted, the wheels do not work independently; all four always turn even if two are off the ground.

    But is the power cost of the iBALANCE functionality (used in Balance mode and to a lesser but still present extent in 4-wheel mode) truly so negligible that the three modes are considered to be the same in terms of power cost?

    This just seems wrong....

    Every taximeter I have ever seen while sitting in a taxi charges for both the driving time and the waiting time by some unknown (to some) formula that one could clearly use to one's advantage if one wanted to charge too much. It would be easy to think the iBOT above such petty nickel-and-diming as the taximeter and such criminal designs as the one who would actively subvert one to make more money.

    But I have seen my battery meter decrease while I was just standing in balance mode in a presentation or at the bus stop waiting for the 545 to Seattle.

    So I know that such a thought is baseless.

    Even more than I "know" that the whole Ni-Cad memory effect is probably a crock since it hasn't really been proven and I am the sort of person who thinks that anything people "know but can't bother to prove" is more likely to be a crock than not. More on my opinions here in a blog post some other day....

    Because this one, I know through direct observation, through admittedly suspect measurements.

    I trust the battery meter on my iBOT like I have never trusted a battery meter before -- because this is the first time it ever really mattered that the gauge looked accurate enough that it could give me useful information.

    And that meter showed some cost to iBALANCE in the iBOT's Balance mode.

    Not a lot. But some....

    Could the decreased cost of some other factor cancel out the increased cost of the iBALANCE stuff, making all three modes really have the same range? Hmmm...

    This is, by the way, the kind of thing that irks me some. I mean, all of the testing and work they do and they can't just provide some data on average power usage of iBALANCE that could be compared as amount of time one could have instead been driving if the human dignity thing wasn't as important as the distance?

    Perhaps the next time I am reading a book I'll get in the iBOT in Balance mode and read until the battery drains two ticks on the battery meter, then move to 4-wheel mode and read until the battery drains two ticks on the battery meter.

    If either takes an unconscionably long time, I'll apologize for the part of my irk not related to the fact that they have almost certainly done this kind of testing but never bothered to publish the results, and be happy with the original non-information they provided. And if either time is significantly short, I'll be at full irk mode and try to look into getting a fuller, more formal amount of data provided....
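    For what it is worth, the comparison I have in mind is nothing fancier than this. A rough Python sketch, where the function names are mine, the minute values in the example are invented placeholders rather than measurements, and the whole thing assumes each tick on the battery meter represents the same amount of charge (which I have no documentation to confirm):

```python
def ticks_per_hour(ticks_drained, minutes_elapsed):
    """Average drain rate while sitting still in one mode."""
    if minutes_elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return ticks_drained / (minutes_elapsed / 60.0)


def relative_cost(balance_minutes, four_wheel_minutes, ticks=2):
    """How many times faster Balance mode drains than 4-wheel mode.

    Both arguments are the minutes it took to lose the same number of ticks.
    """
    balance_rate = ticks_per_hour(ticks, balance_minutes)
    four_wheel_rate = ticks_per_hour(ticks, four_wheel_minutes)
    return balance_rate / four_wheel_rate


# Placeholder numbers only: two ticks in 90 minutes versus two ticks in 120 minutes.
ratio = relative_cost(balance_minutes=90, four_wheel_minutes=120)
print(f"Balance mode drains about {ratio:.2f}x as fast as 4-wheel mode")
```

    A ratio near 1 would mean the published "same range in all modes" figure is at least plausible for standing still; anything much larger would be the number I wish they had published.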

    Until then, the question is still pending, Tony. Stay tuned. :-)


    This post brought to you by ♿ (U+267f, a.k.a. WHEELCHAIR SYMBOL)

  • Sorting it all Out

    If he don't like it, or me, then why the hell is he here exactly?

    • 8 Comments

    This blog is as off topic as you can get without a prescription from your doctor....

    Sometimes when one doesn't get the answer one wants, one can feel somewhat bitter about that fact.

    Technical problems with computers can cause a person to be particularly susceptible to that kind of reaction, actually.

    Though there can also be more to it, sometimes.

    Case in point: a response to a blog from almost two years ago, Vista turns on everything, which explains how you can't turn off the Text Services Framework anymore, like you could in the old days of prior versions.

    Admittedly not great news for people who wanted to turn it off for application compatibility reasons.

    Anyway, the response that Luke sent on (with no return address):

    Useless as ever. You are nothing but a fool. This post is less useful than a broken key. I come here wanting to learn how to turn off advanced text services, and you take up several paragraphs to say "You can't". Don't ever attempt at helping anyone you useless 9 year old.

    There is something particularly hateful about these words that really gives me pause.

    It could be simple frustration leading to an emotional over-reaction, one that the seemingly anonymous nature of the Internet only encourages.

    And some of the words, such as "I come here wanting to learn how to turn off advanced text services" (though clearly the title doesn't even suggest that the blog is about Advanced Text Services at all), tend to suggest a man who found Vista turns on everything via a Google search (as I have to admit so many do).

    And someone who found the blog by searching specifically for how to turn off TSF in Vista who read the whole post might get very frustrated about the "waste of time" and all.

    And the conclusion of the comment (Don't ever attempt at helping anyone you useless 9 year old) certainly does display a certain amount of impatience and frustration. The kind that makes people lash out in perhaps strange ways that seem vaguely inappropriate.

    Though there is something else in those words and others like "you are nothing but a fool", something that does not fit the picture -- it is not just the one blog that has Luke so unhappy with me. There really is something more going on here, running much deeper than a momentary frustration at not solving one single problem.

    And then the initial bit of the comment (i.e. useless as ever) really doesn't seem to match here either, and makes no sense in the context of someone who had never been here before and (after the mistake of visiting the one time) would never visit again.

    This is someone who doesn't like me, or maybe my online "persona", or maybe after having met me in person. Someone who just really finds no use for me whatsoever.

    It's funny, I think that some of the people who hate me the most spend more time dissecting my words for inappropriate meanings to prove their beliefs than the people who are actually fans. This Luke may be one of them, one of the people who just really doesn't care for the taste of my brand of chai.

    I used to talk with my friend Liz about this, and I have talked about it with Andrea too - in fact I've had this conversation with both of them long before I even had a Blog, nay before Blog was even a word. And both of them have pointed out that if I wanted to reverse everything I could, but that I speak with a very distinctive voice and would probably have a very hard time changing that since it mirrors the way I think about things.

    I gave up three decades ago (in the third grade) trying to please everybody, and have never had cause to think I made the wrong decision back then.

    The Blog is perhaps a megaphone, but not one that is changing what I say or how I say it all that much. I use it (and occasionally even abuse it!) in the same way that I would have done in any book or website or email or newsgroup post or presentation or conversation. I can name both people I have maddeningly frustrated and people I have ecstatically delighted. And I think I do serve a "net positive" purpose with what I do -- for myself, for my group, for Windows, for Microsoft, etc.

    And of course you do have a choice here -- you could just not read me if you don't like me, or what I say, or both.

    So Luke (or whatever your name actually is), if you want to come out from where you are hiding and tell me what your actual concerns with me are then I'd be happy to hear them or even discuss them. Or if you'd rather hide grudges or hatreds behind anonymous venomous messages then I suppose that is okay too.

    Though the likelihood of having either influence or impact is much greater in the former approach than the latter. A friendly suggestion. :-)

     

    This blog no sponsor, just as this sentence no verb.

  • Sorting it all Out

    "We" don't tell you how to spell *our* language in *yours*, so...

    • 8 Comments

    Now if you look at all of the following blogs:

    The real issue we are talking about (once everyone stops complaining, which can take a while!) is the problem I explain in Who owns English, exactly?.

    Of course if we "owned" English (assuming "we" could define who "we" are in this case!), then wouldn't we take all of the following and more:

    Angielski
    anglais
    Anglè
    Anglès
    angleščina
    Anglické
    Angličtina
    Anglis
    anglisli
    anglizča
    Anglu
    anglų
    Angol
    Engels
    Engelsk
    engelska
    englanti
    Engleski
    Englezã
    Englisch
    English
    English Hol
    Ingelesa
    Inggeris
    Inggris
    İngilis dili
    ingilizçä
    İngilizce
    ingles
    Inglés
    Inglês
    Inglese
    inglise keel
    TiếngAnh
    Αγγλικά
    англiйская
    англизча
    Английски
    Английский
    Англис
    англисū
    Англиски
    Англия
    Англійська
    Енглески
    Инҝилис дили
    Անգլերեն
    אנגלית
    ענגליש
    الإنكليزية
    انگليسي آمريكايي
    अंग्रेज़ी
    ஆங்கிலம்
    ภาษาอังกฤษ
    ინგლისური
    영어
    英語
    英语

    and have opinions about the way English is spelled in other languages?

    Perhaps since no one in Great Britain or Australia or Canada or the USA is dictating how the items in this list are to look, people should not spend so much time trying to tell people how their language is to be spelled in English? :-)

    You might be living in Iran, and bothered by the English word Farsi.

    Or perhaps you are living in the Xinjiang Uyghur Autonomous Region of China and bothered by the English word Uighur.

    And so on.

    But it might be a good idea to take a deep breath and relax....

     

    This blog brought to you by E (U+0045, aka LATIN CAPITAL LETTER E)

  • Sorting it all Out

    The sort order of the Language Bar (and Michael is in heaven on this one, other than...)

    • 8 Comments

    At this point, I am convinced that people are afraid of the Suggestion Box.

    Because even if the question would completely make sense to be there, people still send it to me via the Contact link, and thus it becomes part of a big file of potential questions that I may or may not get to at some point and that no one else ever sees.

    I think I am going to have to reword my text meant to dissuade people from doing that or something....

    Anyway, one of the sent-via-the-Contact-link-but-shoulda-been-put-in-the-Suggestion-Box questions I got was:

    Is there a way to (manually) rearrange the order of input languages in the language bar (and also possibly the different keyboard layouts under each of those languages)?

    Excellent question!

    Well, actually I'll say excellent questions since there are two in there.

    These days it does seem like the two subject areas I do the most advisory work in are still keyboards and collation, despite the fact that my actual job is so very different now. So it is interesting to get a question involving changing the order of lists of keyboards, the same way it presumably would be interesting to get a question on how one would type in a sort. :-)

    Unfortunately, the answer to the first question is no; there is no documented or supported way to change the order of the languages in the Language Bar.

    Also, just as unfortunately, the answer to the second question is also no; the keyboard layouts underneath each language also have no documented/supported way to be re-ordered.

    For both questions, the intent was to ask if there was a way to do them manually, but there is no programmatic way, either.

    As features go, this is one that in my opinion would make some sense to provide, if not programmatically then at least manually. Because if you have two or more layouts then being able to control not just the default for each new input queue but also the ordering of the various lists when lists exist would make some real sense, for the sake of usability.

    You know, handled in a way similar to the whole "lock the taskbar" type functionality where if you turn off the checkbox you are free to drag stuff around a bit. I could get behind that kind of a feature. Especially in my case where I am constantly adding and removing random keyboards for various reasons, but even users who have a more static list might want a simple re-ordering they could do once and then not have to worry about again....

    Though surprising as it may be, Windows features are not always triaged according to what I personally think might make sense. :-)


    This blog brought to you by ᥡ (U+1961, aka TAI LE LETTER TSHA)

  • Sorting it all Out

    From I SCOOT to IBOT, #3 of ??

    • 7 Comments

    Prior blogs in the series here and here.

    I had to set a security code for the IBOT.

    Peter, the guy who was programming the code into the IBOT as soon as I had decided what it would be, was shaking his head as he watched me going through the exercise of choosing the code.

    He asked me if I was sure I wanted to make it as complicated as that, reminding me I'd have to enter the code again (requiring me to remember it).

    "Are you kidding?" I asked him. "That's just the first part!"

    "Okay, you're the boss."

    The code had to be good, though.

    Because the IBOT, unlike the scooter, has no key.

    I learned very early on not to leave the scooter in the hallway at work with the key in it.

    Nobody would steal it, but joyrides were certainly not beyond them....

    So with no key, I needed the activation code to be complicated enough that no one would be able to just get it.

    It was actually a scene out of Ocean's Thirteen that I was reminded of:

    Rusty: Okay, where's Eugene's trapdoor?
    Livingston: Under the Dragon, first machine on the left.
    Rusty: Got it. What's the secret?
    Livingston: Coin, 3 counts. Coin, 6 counts. 3 Coins, 5 counts. 2 Coins half count.
    Rusty: Could you make it anymore complicated?
    Livingston: That's just the first sequence....

    That's the scene. The complicated set of instructions to enable the "make the slot machines pay out" feature they conspired to get into the casino.

    Okay, perhaps slightly less at stake here. But it's the same principle....

     

    This blog brought to you by ⎇ (U+2387, aka ALTERNATIVE KEY SYMBOL)

  • Sorting it all Out

    From I SCOOT to IBOT, #4 of ?? (with some pictures!)

    • 7 Comments

    Prior blogs in the series here and here and here.

    In response to I SCOOT TO IBOT, #2 of ??, Gwyn commented:

    Can you provide some pictures of the different modes? I'm not really sure what they all are exactly like.

    Very good idea!

    I had a few minutes and my camera so I decided to take some pictures.

    I had no one else here and I did not think of it earlier in the weekend when I was around some people, so this would have to be a solo operation. I'll probably try to get some more when there are people around, eventually....

    WARNING: Although I took these pictures of different modes while I was not in the chair, you should NEVER do that. EVER. The chair is calibrated with a me-sized person in it, and me not in it is not something that it knows what to do with!

    We'll start with standard mode.

    First from the front:

    IBOT standard (from the side)

    and then from the side:

    IBOT standard mode, from the side

    and then from the back:

    IBOT standard mode, from the back

    Special things to note in this mode (not including the I'm a PC sticker on the side, the Microsoft parking pass you can see in the front and the license plate you can see in the back, the latter two of which came from the Saab that is no more) -- this is the mode that is fastest -- up to 6.8 MPH.

    Note those extra caster wheels in the front -- they are only used in this mode.

    The control is fairly lousy so I really only spend time in this mode when I want to be parked or as low as possible or if I have to go somewhere in a hurry.

    Then there is the four-wheel mode.

    Here it is from the front:

    IBOT four-wheel mode, from the front

    and from the side:

    IBOT four wheel mode, from the side

    Now notice how those caster wheels are just kind of hanging there? They are not used in this mode.

    And this is the mode that can bring up the carpet squares if it is used indoors. So that is something to not drive around in, inside.

    But it is very rugged and can take on some really steep hills (both up and down).

    I find it to be the best all-around travel mode, and the one I use (for example) to go to work with unless it isn't raining and I do not mind the extra few minutes of the next mode.

    Finally, there is balance mode.

    First from the front in its shortest view:

    IBOT balance mode (shortest), from the front 

    and from the back in its shortest view:

    IBOT balance mode at its shortest, from the back

    and then penultimately (and more impressively) from the side, in the shortest setting:

    IBOT balance mode at its shortest, from the side

    and then finally, and most impressively, balance mode in its tallest view:

    IBOT balance mode at its tallest, from the side

    This difference may be impossible to see from the pictures; I think it'll have to have people next to it while I'm in it to be meaningful.

    Though there was one important difference which I took some brief video footage of.

    Basically, in the shortest balance mode, the empty IBOT shuffled back and forth a little bit as you can see in the low-res video here (WMV, ~679 KB zipped).

    But in the tallest balance mode, the empty IBOT was moving a lot and was clearly missing me, or some reasonable me-sized weight in there as you can see in the low-res video here (WMV, ~1 MB zipped).

    When taking it out of balance mode, the fact that I confused it was even more apparent, as its struggle to balance a me-sized weight that wasn't there made it go forward at least five feet.

    I'll try to do this again when I have someone else there to do the video while I switch the modes. It really was quite a sight!

    Anyway, hope that will do well enough until I have time to get some more made, with people so that they will be more useful for all the reasons I mentioned! :-)

    On a side note, there is an IBOT over on eBay right now for 12000 which, while definitely better than the full price, might see more challenges getting an insurance company to cover it....

     

    This post brought to you by ♿ (U+267f, a.k.a. WHEELCHAIR SYMBOL)

  • Sorting it all Out

    I think we're taking the wrong approach, mostly

    • 5 Comments

    In the past, I've done a lot of presentations on globalization and localizability issues.

    In different companies where I was brought in to do this, they were very well received, because generally a company is being asked to do the work to support another language and the people being trained found themselves quite hungry for the info.

    But when it came to conferences, most of the positive feedback went along the lines of "very interesting presentation" but in the checkbox for whether it was useful for their immediate work, often they'd say no. Because if a company is spending thousands on a conference, they don't usually have such a focused requirement. If the person even signed up for the talk, it was either out of curiosity, or they thought they'd see me cause trouble, or maybe they'd heard of me, or whatever.

    There are exceptions to this, like the Internationalization and Unicode Conference. But this only proves my point -- you could probably fit over 30 IUCs in a TechEd or a PDC. So you end up with the very small number of people being sent to a specialized conference, often with a generic requirement of "we need to support GB 18030" or "we have to do Japanese" or whatever.

    When it comes to Unicode support, NT shipped well over 10 years ago and quite a few applications out there still don't support it. Slides like

    Language Matters! 

    are only interesting for shock value -- they aren't going to convince anyone who isn't already convinced and looking for their own presentation to justify it to upper management.

    Because how many companies are thinking of shipping their software overseas and not just shipping it as is?

    MUI is a cool technology, but it is not of general interest to anyone other than people trying to build in-box drivers in Windows who are told they must support it by contract. People are given an assignment to ship the product in Japan; they don't wake up and say "we should support 10 languages in a switchable fashion for the hell of it, and then the sales people can figure out what to do with it."

    The flaw is that by trying to get people interested based on some nebulous notion of "best practices" it is hard to get people interested.

    Best practices for globalization? Manuel Garcia O'Kelley Davis would say null program.

    But when asked to do presentations on security issues with string comparisons or the consequences of  user settings breaking applications, I often get a lot more interest.

    People care a lot more about consequences than they do about nebulous features (since selling software in another country is a lot more complicated than just these issues -- there are legal issues that by themselves would block most people from ever even considering it).

    I mean, a lot of the PC game industry and a lot of the driver industry "supports" Unicode. 

    I put supports in air quotes because they may not be and in many cases probably aren't doing anything outside of the ASCII range.

    But they support Unicode because the OS underneath them does and they want to avoid the extra OS conversions.

    I guess what I'm saying is that we have to stop trying to appeal to "best practices" or the miracle of dynamically supporting UI in 40+ languages. We need to focus on:

    • the bugs that break their own code as it is;
    • security issues caused by their code as it is;
    • performance issues in their code as it is.

    People care about those kinds of issues a lot more than they care about good globalization or localizability.

    Unless they are already in those markets, or want to be acquired by a company that is and that pays attention to whether this work is already done, of course!

    Now I have a couple of regular readers here.

    But most of the traffic comes from people searching for information on bugs or issues hitting them now.

    All of this is why I think we're mostly taking the wrong approach....

     

    This blog brought to you by ? (U+003f, aka QUESTION MARK)

  • Sorting it all Out

    'It's Not Easy' saying WTF to an 'Ant in Alaska'

    • 4 Comments

    This blog title is not a reference to Kermit's It's Not Easy Being Green, as any diehard Liz Phair fan would recognize...

    I'm going to dig a little into one of the random questions that came out of this last April's I'm aware of that: an Andreaesque segue and intervention, of sorts.

    Andrea: I don't think people understand your relationship with your old team. Especially since you are still writing about a lot of the same things you were before. Do they read it? Do they agree with you or disagree? Do they still talk to you?
    {deep breath from Andrea}
    Andrea: And I don't just mean other people for this one. I don't get this one either. What's your connection to them, now?
    Michael: Oh wow, that one is a bit harder.
    Andrea: I'm aware of that. But I sincerely doubt that I am the only one who is curious.
    Michael: Okay, I'll think about that one, too. Maybe that'd be worth a post or two, at least for the "me" half of it. I wouldn't try to speak for the other half....
    Andrea: It might help the confused among us

    My old team.

    NLS.

    National Language Support.

    Not Localization, Stupid!

    Globalization Services.

    They have a lot of names.

    The song I'm playing now might tell you something about it.

    This is the one singing it:

    Liz Phair, lying there 

    It is Liz Phair, from the way back in the time of the girlysound days (the song, not the picture), the title is I Know It's Not Easy and it was re-recorded for the Exile in Guyville re-release under the name Ant In Alaska. They took out a line or two, I think, but you'll barely miss 'em if you don't have the original version.

    It has a lot of the same raw feeling, and I know some have argued whether it was re-recorded at all but I think most people agree that it was.

    The re-recording is most notable for the prefixed 58.5 seconds of silence, which for me symbolizes something too. Maybe I'll talk about that some other time, or maybe not.

    For now that is something between Me and Liz.

    And Liz, actually.

    The lyrics for the song go something like this:

    Call me when you think the coast is clear
    I've been hiding out almost a year
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy
    You said if I waited it'd pay off
    But my eyes are growing wild and my body's gone soft
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy

    You said I should let go of your hand
    But I don't even know if I can
    You're the only one, you are the very sun to me
    And you know it's not easy

    You'd tell me, wouldn't you, if we needed to talk?
    And you'd tell me, wouldn't you, if I'd pissed you off?
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy

    Well, I look at the stars and I know you're under them
    I look at the cars and I know you insure them
    I look at the books and things people are reading
    I know that you've written them, too
    You've got so many little things to do
    But then I look at my life and I know you've forgotten
    The promise you made to me, I think that's rotten
    I'm hopelessly lost and there's hardly a sound anymore
    Coming through that can show me around
    'Cause I'm endlessly endlessly searching the crowd
    Looking for something from you
    Just one fucking measly clue
    Any shitty little tipoff would do
    But I'm just an ant in Alaska to you

    Then I look at the stars and I know you're under them
    I look at the cars and I know you insure them
    I look at the books and the things people are reading
    I know that you've written them, too
    You've got so many little things to do
    But then I look at my life and I know you've forgotten
    The promise you made to me, I think that's rotten
    I'm hopelessly lost and there's hardly a sound anymore
    Coming through that could show me around
    'Cause I'm endlessly endlessly searching the crowd
    Looking for something from you
    Just one fucking measly clue
    Any shitty little tip-off would do
    But I'm just an ant in Alaska to you
    I'm just an ant in Alaska
    An ant in Alaska
    An ant in Alaska to you 

    Now most of the themes of this song are not what I am saying my relationship with my old group is like.

    Seriously.

    Our "break-up" (such as it was) was nothing like this, at all.

    But that last line....

    This world in which I now live, in almost the southwest most place in the building on the opposite side of the group's East side abode in the building, when I haven't been enlisted in their branch for way over a year since Track change (a.k.a. A new job that has a few things in common with the old one) happened, I think that buried in the line is what I think of as the connection I have with my old group.

    At least symbolically, I'm an Ant In Alaska.

    Sometimes I meet with them and they ask me questions about stuff as they work on new features.

    Sometimes they tell me about their plans (since in theory a lot of the other groups I help out might find it helpful if I know about future plans, though in practice not so many are directly impacted).

    But not much (or at times any) of my feedback actually ends up in the new features, and the final plans are often wildly different than I was originally told.

    I probably have more influence and impact on their clients and on customers in other parts of Microsoft (in part due to this unofficial blog, in part due to past relationships) and even on external customers (again via this very Blog!) than I do on them -- to them, I think I really am an Ant In Alaska, even if they do read here (several don't, and it isn't like they have to, but some still do).

    To be honest, I don't know that I'm particularly bitter about that though. I suspect I'd be a lot less happy about the work I do if I knew more about what was going on there, due to the natural desire to be unhappy with things that change, especially if it is not changing the way I would have done it, given the chance. Not knowing gives me a better sense of distance.

    Other people in the building do read the Blog, I know -- they ask me stuff all the time and some of them even feed me ideas that end up becoming blogs here (and others are on the list to be done like the one of the double L, you know who you are!).

    But on the whole, I do feel closer to customers now, which was really the whole point of the Track change thing anyway. Which means I'm happier, a lot more often than I'm not.

    I mean, I won that cool award:

    Bulldog

    and the only people who really knew about it were the folks who came by my office (not many of the folks from the old group) and the ones who read about it here. No mail was ever sent (amusing in and of itself since as I suspected it was never mentioned in any mail to the group), so people have just found it kind of randomly if they happened to be coming by for something else.

    We don't get a lot of visitors from the rest of the building, though.

    I was talking to a teammate from the NLS days the other day about an issue that had come up and it had the same cordial feel of a conversation I had with a former manager from nine years prior. No bad feelings, a lot of mutual respect and interest, and very little real idea of what the other person was up to, which kind of made the small talk much more purposeful as we both tried to "catch up" on things. Not self-consciously since there was no expectation that the other one would know things, but a collegial kind of "good to chat from time to time" sort of thing.

    Know what I mean? Just like the manager from nine years prior.

    So we are in the same building, but not the same team....

    And then a different other day, a colleague in another group entirely who managed to embarrass me with his praise a bit asked me:

    Out of curiosity… you have been with int’l for a while, yes? Do you ever feel pressured by yourself or others to “move on and do something else”?

    And I guess the answers are yes and yes some times and yes some other times.

    I've even had a tempting offer or two.

    But there are still things I can contribute here, so I have not been giving into the occasional temptation yet.

    And I'm really not saying that It's Not Easy (the old song title) being an Ant in Alaska (the new song title). Because for the most part, it is -- and I find that I actually enjoy being an ant in Alaska. Kind of collegial isolation technique!

    You know, I debated not writing this blog. And then after I wrote it I debated not posting it. But sometimes you've just got to say WTF....


    This blog brought to you by ⾍ (U+2f8d, KANGXI RADICAL INSECT)
  • Sorting it all Out

    What's the shape of the sort?

    • 4 Comments

    There is an old Marx Brothers routine that goes something like this:

    Groucho: What's the shape of the world?
    Harpo: It's terrible.
    Groucho: No, I'm talking about the shape.
    Harpo: Oh, that's different.
    Groucho: So what's the shape of the world?
    Harpo: I don't know.
    Groucho: Well, what's the shape of my cuff links?
    Harpo: Square.
    Groucho {becoming exasperated}: Not these cuff links, the ones I wear on Sunday.
    Harpo: Round.
    Groucho: So, what's the shape of the world?
    Harpo: Square in the weekdays, round in Sunday!

    I was thinking about it the other day, when a question came up about whether one would prefer CompareStringA or _stricmp for string comparisons in a particular situation.

    That could really be thought of as a trick question -- the preference is obviously to use Unicode strings and be worried about CompareStringW or _wcsicmp!

    But in this case the component happened to be dealing with "ANSI" strings, so I'll temporarily reject the premise that the question is flawed. I'll get back to it in a moment.

    The question is somewhat similar to the one between _stricmp and _stricoll, with the added bonus of the confusion over which code page is used to interpret the 256 byte values of an "ANSI" string.

    In fact, to make the functions more analogous for comparison/contrast purposes, it might be better to ask about either CompareStringA with NORM_IGNORECASE vs. _stricmp_l (so you always get to specify the locale to use for determining the code page) or CompareStringA with the LOCALE_USE_CP_ACP flag vs. _stricmp (so you never get to do so).

    Maybe I should explain the situation for both of these options, now that I have brought the matter up....

    To start, when looking at CompareStringA with the LOCALE_USE_CP_ACP flag vs. _stricmp, in both cases the ANSI string is assumed to be in the "default code page". In the former case that is the default system code page, i.e. what GetLocaleInfoA returns for LOCALE_IDEFAULTANSICODEPAGE of the default system locale (aka language for non-Unicode programs). In the latter case it is the equivalent of the LOCALE_IDEFAULTANSICODEPAGE of the current CRT locale (retrievable via _get_current_locale).

    In the CompareStringA with NORM_IGNORECASE vs. _stricmp_l case, you specify the locale to use rather than using one controlled by settings that exist prior to the call.

    Once you know how to interpret the 256 bytes, you know what the characters are and you know how to determine the "case" of the characters, if they in fact have case.

    For example, if the code page in question is Windows code page 1252, then in all of the above cases if you have the byte 0xC5 then you have the letter Å (U+00c5, aka LATIN CAPITAL LETTER A WITH RING ABOVE) which the case insensitivity of all of the functions will interpret as being the same as å (U+00e5, aka LATIN SMALL LETTER A WITH RING ABOVE), as I explain in an earlier blog (CompareString ignores case by lowercasing).

    But if on the other hand the code page in question is Windows code page 1255, then in all the above cases if you have the byte 0xC5 then you have the point ֵ (U+05b5, aka HEBREW POINT TSERE), which will not lowercase to anything, whatsoever. Hebrew letters don't even have case, so of course its accents and points and marks and punctuation wouldn't!

    Now this is not the same as "linguistic casing" which I first explained back in What does "linguistic casing" mean?, since in the CompareStringA case the flag is not being passed and in the _stricmp case it is never ever being passed.

    This is why, even though _stricmp is not considered a function that uses locale-specific information (in contrast to _stricoll), it actually does use some locale information, albeit indirectly and (unless you parsed the first five of the previous six paragraphs with one reading) confusingly. And, to get back to the Unicode (CompareStringW or _wcsicmp) issue: if you keep it in Unicode then you can ignore the first five of the previous six paragraphs, which are really the most confusing parts anyway.
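    To make the 0xC5 example concrete, here is a minimal sketch in Python rather than Win32 or the CRT. It does not call CompareStringA or _stricmp at all; it only leans on Python's built-in cp1252 and cp1255 codecs to show how one byte turns into two different characters, only one of which has case. The helper name describe_byte is made up for the illustration.

```python
import unicodedata


def describe_byte(byte_value: int, code_page: str) -> dict:
    """Decode a single byte under the given code page and report its casing."""
    ch = bytes([byte_value]).decode(code_page)
    return {
        "char": ch,
        "name": unicodedata.name(ch),
        "lower": ch.lower(),
        # A character "has case" here if its lower and upper forms differ.
        "has_case": ch.lower() != ch.upper(),
    }


latin = describe_byte(0xC5, "cp1252")   # Windows code page 1252
hebrew = describe_byte(0xC5, "cp1255")  # Windows code page 1255

print(latin["name"], "->", unicodedata.name(latin["lower"]))
print(hebrew["name"], "| has case:", hebrew["has_case"])
```

    Same byte, two different characters, and only one of them has anything for a case-insensitive comparison to fold. Once the text is held as Unicode the character is already pinned down, and the code page question goes away.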

    This then lets you look at the real issue that the CompareStringA vs. _stricmp (and also CompareStringW vs. _wcsicmp) question was always about -- it really is a linguistic comparison vs. binary/ordinal one, which has all the issues that the whole invariant versus ordinal question dredges up, plus actually specifying any locale for comparison rather than having the single INVARIANT (yet still linguistic) choice.

    And the answer to that question is, and has always been, that depends. On what is being compared.

    This particular time, it happened to be looking at the name of a SQL Server, to see if it was the same as the computer it is running atop.

    In most cases, this would suggest to me the need for a binary/ordinal-ignore-case type semantic, though there might be some edge cases (since a SQL Server is involved) where you'd want SQL Server type comparisons, which would be dependent on the collation of the SQL Server, either linguistic or the SQL Server notion of binary, which has no case ignoring facility.
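    As a rough sketch of what an ordinal-ignore-case semantic means, here is a small Python stand-in. It is an assumption-laden illustration, not what Windows does internally: Windows uses its own uppercase table for this kind of comparison, while this sketch borrows Python's str.upper and skips any mapping that would change the length. The function names and the server name SQLBOX01 are hypothetical.

```python
def simple_upper(ch: str) -> str:
    """Uppercase one character, keeping it unchanged if the mapping expands."""
    up = ch.upper()
    return up if len(up) == 1 else ch


def ordinal_ignore_case_equal(a: str, b: str) -> bool:
    """Compare position by position after simple uppercasing; no locale involved."""
    if len(a) != len(b):
        return False
    return all(simple_upper(x) == simple_upper(y) for x, y in zip(a, b))


server_name = "SQLBOX01"      # hypothetical SQL Server name
computer_name = "sqlbox01"    # hypothetical computer name

print(ordinal_ignore_case_equal(server_name, computer_name))  # True
# Precomposed vs. decomposed A-ring: different code points, so not equal here.
print(ordinal_ignore_case_equal("\u00c5", "A\u030a"))         # False
```

    The second comparison is the kind of pair a linguistic comparison would generally be expected to treat as the same text, which is one way to see that the choice depends on what is being compared.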

    Am I the only one who sees a conversation about which function to use as a very geeky, extended mapping of that Marx Brothers bit? :-)

    If I am ever at some random event and can find someone to play the Harpo role, I'd even try and script it out and perform it!


    This blog brought to you by ֵ (aka U+05b5, aka HEBREW POINT TSERE)

  • Sorting it all Out

    Inspiration, and a code chart

    • 4 Comments

    Way back in September after I did that presentation at the Internationalization and Unicode conference that I mentioned and provided the slides of in Behind the Proposed Change to Tamil in Unicode (five different ways), Scott sent me the following via the contact list:

    Michael,

    After your talk today, I was inspired to put up a Unicode syllabary chart for Tamil on the Tamil Script page in the English Wikipedia, complete with  the new Tamil named sequences from Unicode 5.1, in the hopes of building support for the current Unicode encoding model.  Anyway, you can check it out if you're curious:

    http://en.wikipedia.org/wiki/Tamil_script#Tamil_in_Unicode

    If you find anything horribly wrong, I'd be happy to fix if you let me know about it, so you won't have to violate your policy of Wikipedia non-editorship.

    I just hope this doesn't earn me death threats!  ;)

    -Scott

    I think that what Scott did here was excellent, and I did not note anything horribly wrong at all....

    And it humbles me to think that I helped inspire it.

    Because even though that is the secret hope I have for some of my talks (especially including this one), it is really awesome to see it spelled out in such a way.

    The chart he provided was similar to but not the same as the ones I provided in Learn Tamil in 30 Days (or something like that), and helps people look at Tamil in Unicode the way that they might learn Tamil, something the simple code allocation chart would never be able to do -- in its own way something Unicode cannot do without providing this same crucial bit of information in a familiar form.

    Thanks, Scott -- both for this and for supporting my non-interference policies WRT Wikipedia! :-)

    Which reminds me that I promised to talk more about some of the issues I didn't have time to cover in the talk. I'll be sure to get on that....


    This blog brought to you by ஹ (U+0bb9, aka TAMIL LETTER HA)

  • Sorting it all Out

    UCS-2 to UTF-16, Part 6: An exercise left for whoever needs some exercise

    • 4 Comments

    Previous blogs in this series of blogs on this Blog:

    Now continuing on from prior blogs in the series, I thought I'd quote a bit from a recent email thread about an issue very similar to some of the prior discussion (especially Part 3) related to caret stops, aka the points where you can put the cursor as you navigate the string.

    The thread had wandered a bit (as threads tend to do!), and then colleague Jerry Dunietz (an architect I have worked with before on issues related to Unicode and cultures and locales and such) offered a great description of many of the issues that I have discussed here in this series. With his permission, I will quote from his response:

    Logically, a Unicode string is a sequence of UCS-4 code-points.  (I intend to carefully distinguish between “code-point” and “code-unit” in the text below.)

    There are several ways to encode a Unicode string in memory.  Ignoring for today the existence of a byte-order-mark (and of byte-ordering variations), there are three ways that one could represent such a string in memory:

    As a sequence of 32-bit code-units, each representing a single UCS-4 code-point.  (UTF-32.)
    As a sequence of 16-bit code-units.  Some UCS-4 code-points are represented by a single code-unit, and some are represented as a “surrogate pair” of two code-units.  (UTF-16.)
    As a sequence of 8-bit code-units.  The code-points from U+0000 to U+007F are represented as a single code-unit, but all other code-points are represented by a longer sequence of code-units.  (UTF-8.)
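    The three representations can be compared with a short Python sketch that counts code-units per encoding. The helper name code_unit_counts is made up for illustration; the little-endian codec names are used only so that no byte-order-mark is emitted:

```python
def code_unit_counts(text: str) -> dict:
    # Encode without a byte-order-mark and divide by the code-unit size.
    return {
        "utf-32": len(text.encode("utf-32-le")) // 4,
        "utf-16": len(text.encode("utf-16-le")) // 2,
        "utf-8": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    samples = ["A", "\u00e9", "\U00020000"]  # ASCII, Latin-1 range, CJK Extension B
    for sample in samples:
        print("U+%04X" % ord(sample), code_unit_counts(sample))
```

    A supplementary code-point such as U+20000 is one code-unit in UTF-32, a surrogate pair in UTF-16, and four code-units in UTF-8.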

    Now imagine that you’re a programmer, and you want to pick a representation in memory for a Unicode string.  UTF-32 seems like the easiest to work with.  (But it is the fattest encoding of the three for any real-world corpus of text.)  If you want the fifth code-point in a string, you just array-index to the fifth code-unit in the string.  It seems (but wait) that if the user hits backspace after inputting a string, you just back up one code-unit.  It seems that if a user hits an arrow key to move a caret from one character to the next, you would just advance the caret position by one code-unit. (Given all of this apparent simplicity, there seems to be a compelling argument for defining C++’s wchar_t to be 32-bits long.)

    But it turns out that the stuff I wrote using the word “seems” is an over-simplification.  Unicode has code-points that correspond to combining characters.  Such characters combine with a previous code-point (or sequence thereof) to present to the user what appears to be a single character.  Section 2.11 of the Unicode 5.0 spec provides lots of examples of different combining characters, in different languages.  From a font-rendering glyph selection point of view, or from the point of view of a user attempting to move a text cursor from character to character, a single glyph or character may correspond to a sequence of multiple UCS-4 code-points, and thus multiple UTF-32 code-units.  Given that such situations exist, an internationally-robust program working with a UTF-32 string must be prepared for the concept that multiple code-units or code-points correspond to the user’s concept of a single character.
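    As a rough illustration of that point, the Python sketch below groups each base code-point with the combining marks that follow it. The function name split_clusters is hypothetical, and the rule used (a non-zero canonical combining class means "attach to the previous code-point") is a deliberate simplification; real text segmentation considers a good deal more than this:

```python
import unicodedata


def split_clusters(text: str) -> list:
    clusters = []
    for ch in text:
        # A non-zero combining class marks a combining character, which
        # attaches to whatever cluster came before it (if there is one).
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    text = "e\u0308f"  # e + COMBINING DIAERESIS + f
    print(len(text), "code-points")
    print(len(split_clusters(text)), "user-perceived characters")
```

    Even with one code-unit per code-point, the count the user cares about is the number of clusters, not the number of code-units.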

    But if our program needs to deal with the possibility of a user’s concept of a character corresponding to multiple code-units, then the apparent advantage to the programmer of using a UTF-32 representation instead of a UTF-16 one goes away. (And given the real-world size advantages of UTF-16, it now seems that making wchar_t be 16 bits is a better choice than making it be 32-bits.)

    Whether you choose UTF-16 or UTF-32 encoding, suppose you want to spec a file format that can be easily displayed, but for which a viewer program can easily support text selection.  You could ask the viewer program to build in lots of smarts to determine where a logical caret-stop can occur.  Or you can ask the program that built the file to encode the caret-stop information within the file itself, reducing the burden on the viewing program.

    This is really a great summary of the issues surrounding the UTF-16 vs. UTF-32 debate, as well as the whole UCS-2 vs. UTF-16 one I have been covering already. I can't really take full credit for his knowledge in this area since I was just one of several sources, but I know I'm in there somewhere and it is always good to know when one has been helpful in influencing an influencer (which indirectly puts me in mind of both the influence vs. impact issue and the fact that Ms. Phair might be wrong about the influence of an Ant in Alaska, sometimes!).

    Now the question of whether to store the information about the caret stops explicitly or to calculate it via a StringInfo-type technique (StringInfo is something I have talked about previously, in blogs like this one) is an interesting one.

    Really in the end it depends on whether you think the pure "derived from Unicode data" answer is sufficient or whether you have additional sources of information. In the particular case Jerry was thinking of, they rely on much more sophisticated methods, such that constantly calculating on the fly could really impact performance. Since the data itself in his case is read-only there was never any worry about the potential need for recalculation, so calculating once and storing the information was a no-brainer for them.

    But even in a read-write situation, providing a sorted array of indexes for easy enumeration that allows for

    • enumeration in either direction (for cursor movement and selection operations), and
    • insertion (for the insertion of text), and
    • deletion (for the deletion of text), and
    • an initial population that will be already sorted (for a string with no data that needs to build up its initial cache)

    is really just an ordinary interview question in disguise, and one that is pretty easily solved, too. Were it even vaguely "international" I'd say we should solve it here, but it's not so I'll leave that as an exercise for whoever feels that they need the exercise. :-)

    Though of course I'll point out a few issues to keep in mind for anyone who would want to implement such a cache....

    Obviously one has to be ready to re-order the items after the point of change for the insertion/deletion case. This could be expensive, depending on where the action is happening and how much text follows it (when to move to slightly more complex schemes than the doubly linked list that the initial problem suggests is also left as an exercise!).
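    For anyone who wants a head start on the exercise, here is one possible sketch in Python. It swaps in a plain sorted list searched with the standard bisect module rather than a linked list, and the function names (next_stop, prev_stop, shift_from) are made up for illustration:

```python
from bisect import bisect_left, bisect_right


def next_stop(stops, pos):
    # Smallest stop strictly greater than pos; stay at the end if none.
    i = bisect_right(stops, pos)
    return stops[i] if i < len(stops) else stops[-1]


def prev_stop(stops, pos):
    # Largest stop strictly less than pos; stay at the start if none.
    i = bisect_left(stops, pos)
    return stops[i - 1] if i > 0 else stops[0]


def shift_from(stops, at, delta):
    # After inserting delta code-units at offset `at`, every stop at or
    # beyond `at` moves by delta. This touches the whole tail of the list,
    # which is the potentially expensive part.
    i = bisect_left(stops, at)
    return stops[:i] + [s + delta for s in stops[i:]]


if __name__ == "__main__":
    stops = [0, 1, 3, 4]  # hypothetical cache for a four code-unit string
    print(next_stop(stops, 1), prev_stop(stops, 3))
    print(shift_from(stops, 3, 2))
```

    This handles only the bookkeeping for the tail. Deletion would also need to drop any stops inside the removed range, and the neighborhood of the edit still has to be recalculated.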

    But less obviously, in the area immediately preceding and following the insertion point, one can potentially need to recalculate caret stops due to changes in the text. For example if one has the letters "abcdef" then one will have seven indices (covering the points before each caret stop as one moves across the string, plus one for the end):

    {0, 1, 2, 3, 4, 5, 6}

    Then if one decides to add a combining umlaut after the initial "e", the new string is "abcdëf" and the indices would now have to be:

    {0, 1, 2, 3, 4, 6, 7}

    and so on. And in other cases a formerly combining character before or after might now not be....
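    The two index lists above can be reproduced with a small Python sketch. It assumes the only rules are "no caret stop in front of a combining character" and "no caret stop inside a surrogate pair", which is far simpler than what a real implementation would use; the name caret_stops_utf16 is hypothetical:

```python
import unicodedata


def caret_stops_utf16(text: str) -> list:
    stops = []
    offset = 0  # position measured in UTF-16 code-units
    for ch in text:
        # Allow a stop before this code-point unless it is a combining
        # character (the very start of the string is always a stop).
        if offset == 0 or unicodedata.combining(ch) == 0:
            stops.append(offset)
        # Supplementary code-points occupy two UTF-16 code-units.
        offset += 2 if ord(ch) > 0xFFFF else 1
    stops.append(offset)  # the stop at the end of the string
    return stops


if __name__ == "__main__":
    print(caret_stops_utf16("abcdef"))
    print(caret_stops_utf16("abcde\u0308f"))  # combining umlaut after the "e"
```

    Recomputing this way for the whole string is the calculate-on-the-fly option; a cache would instead rerun it over just the text around the edit.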

    From there the exact methods used for calculation of caret stops and how to integrate them together comes into play, as does how the different methods [might] interact. It really can become a rather fascinating technical question, in the end. Though mostly not for this blog, except more on the various methods eventually. :-)

    Getting back to the series for a second, there are a few points left to cover, such as the actual means of support and then the whole UTF-8 question (which is still fair game here!).

    I'll cover these in upcoming blogs....


    This blog brought to you by ឿ (U+17bf, aka KHMER VOWEL SIGN YA)

  • Sorting it all Out

    No need to throw out the baby with the streamwriter; they probably could have just put in a replacement

    • 3 Comments

    So anyway, Kim's other recent blog, entitled Making a StreamWriter usable even after given garbage characters, highlights an interesting difference in methodology between the way that Windows and .Net handle encoding and codepages.

    In Windows (in contrast to the behavior of most NLS API functions, as I have mentioned previously), the WideCharToMultiByte and MultiByteToWideChar functions will use the target buffer up until the point of failure, so that in the case of failure you may be able to do something with the partial results.

    Now without a length indication the options of what can be done are more limited, but if nothing else then at least subsequent calls will not be affected by their predecessors.

    .Net, on the other hand, has a default behavior here when you write to the stream that causes the StreamWriter to be useless.

    The description in Kim's blog did not fully explain the problem, so I'll fill in the blank to it. :-)

    She said:

    For example, on an attempt to write U+DFC9, which is only half of a Unicode character (not a complete surrogate pair) an EncoderFallbackException was thrown

    Now we have a stream here, so why is the story over? Isn't the point of the stream thing that you can do it in chunks? Why would this be unrecoverable?

    Well, the problem is that U+dfc9 is a low surrogate.

    See The basics of supplementary for a glossary update here!

    As I mention in Why do the high surrogates have the low numbers? and other places, a surrogate pair is a high surrogate followed by a low surrogate.
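    The arithmetic behind that ordering is short enough to sketch in Python. The function names are hypothetical; the ranges (high surrogates U+D800 to U+DBFF, low surrogates U+DC00 to U+DFFF) are the standard UTF-16 ones:

```python
def to_surrogate_pair(code_point: int) -> tuple:
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    # Top ten bits go in the high surrogate, bottom ten in the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x20000)
    print("%04X %04X" % (high, low))
    print("%X" % from_surrogate_pair(high, low))
```

    Passing 0xDFC9 as the first argument of from_surrogate_pair raises the ValueError, since it falls in the low surrogate range and cannot start a pair.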

    A lone high surrogate is recoverable because it is incomplete.

    But a lone low surrogate with no preceding high surrogate has no place to go, nothing to do -- it is toast unless you have a fallback plan in place, as Kim mentioned.

    Though to be perfectly honest, after situations like that described in The torrents of U+fffd, I would much rather have had the default fallback plan be the U+fffd insertion.

    I'm not a fan of the whole U+fffd thing, as I pointed out many times before. But given the huge push to change behavior from "drop illegal sequences" to "replace illegal sequences with the replacement character", I think behavior that did not throw in this case would have made for a better default....
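    To show the two policies side by side, here is a sketch in Python rather than .Net, so it only mirrors the idea and says nothing about the actual StreamWriter internals. Python's strict UTF-8 encoder refuses a lone surrogate, while the hypothetical helper below substitutes U+fffd first:

```python
REPLACEMENT = "\ufffd"


def replace_lone_surrogates(text: str) -> str:
    # In a Python str a well-formed supplementary character is a single
    # code point, so any code point in the surrogate range is a lone one.
    return "".join(
        REPLACEMENT if 0xD800 <= ord(ch) <= 0xDFFF else ch for ch in text
    )


def encode_with_replacement(text: str, encoding: str = "utf-8") -> bytes:
    return replace_lone_surrogates(text).encode(encoding)


if __name__ == "__main__":
    garbage = "a\udfc9b"
    try:
        garbage.encode("utf-8")  # the "throw" policy
    except UnicodeEncodeError as error:
        print("strict encoding failed:", error.reason)
    print(encode_with_replacement(garbage))  # the "replace" policy
```

    With replacement as the default, the writer keeps going and the damage is limited to the one bad code unit.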

    And yes, I know there is a backcompat question here for the behavior, but since behavior was being changed anyway in this "in a service pack" change, there was a good opportunity to take a hard look at changing that default (since even already compiled applications were going to change their behavior!).

     

    This post brought to you by (U+fffd, a.k.a. REPLACEMENT CHARACTER)

  • Sorting it all Out

    If you say I'll PASS, that means you'll be there!

    • 3 Comments

    Warning: Excessive PASS puns follow; if you don't like that sort of thing, then do not PASS...

    It was one of those fun conversations that I find myself in from time to time.

    Over IM, I was talking to my friend Rachel (a very smart ASP.NET developer I know who rather delightfully has a properly spelled last name Appel rather than those more fruity types of names out there, if you know what I mean).

    We were just kind of PASSing the time.

    I asked her (in PASSing) whether she was going to PASS (I was referring to the upcoming conference put on by PASS, the Professional Association for SQL Server).

    She told me that due to the dates in question, she was pretty sure she'd have to PASS this time.

    I explained that I was given a PASS for PASS, working in the Ask the Experts area.

    She couldn't PASS up the opportunity to ask, "When is PASS?"

    The dates were, I explained, "The PASS Summit is November 18th-21st in Seattle."

    "Oh, I'll definitely have to PASS, I have client work I have to do then."

    "You're going to PASS on PASS?" I asked.

    "I think I'll have to PASS on PASS, even if you could get me a PASS to PASS."

    "Bummer. I probably shouldn't try to get you a PASS to PASS anyway. People might think it inappropriate -- like I was trying to make a PASS at you or something!"

    "Nash, they know me. I could probably get my own PASS to PASS, if I didn't have to PASS on PASS. I do have that client work to do."

    "Yeah, plus those boarding PASSes don't pay for themselves."

    "That too."

    "Okay, as excuses go it is a pretty good one. I'll let it PASS."

    It went on a bit longer with PASS puns, though we never made it to the NONE SHALL PASS scene from Monty Python and the Holy Grail; I suspect that was because I didn't think of it until later.

    Though as a point of fact, unlike the black knight, way over 3000 people will PASS, to go to PASS -- and not PASS on PASS.

    It is an awesome SQL Server conference and I promise that I have all of the PASS puns out of my system now. I promise. :-)

    If you will be there, be sure to look for me on my iBOT with the I'm a PC stickers on each side....

     

    This blog brought to you by ⎑ (U+2391, aka PASSIVE-PULL-DOWN-OUTPUT SYMBOL)
