November, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    If you say I'll PASS, that means you'll be there!

    • 3 Comments

    Warning: Excessive PASS puns follow; if you don't like that sort of thing, then do not PASS...

    It was one of those fun conversations that I find myself in from time to time.

    Over IM, I was talking to my friend Rachel (a very smart ASP.NET developer I know who rather delightfully has a properly spelled last name, Appel, rather than those more fruity types of names out there, if you know what I mean).

    We were just kind of PASSing the time.

    I asked her (in PASSing) whether she was going to PASS (I was referring to the upcoming conference put on by PASS, the Professional Association for SQL Server).

    She told me that due to the dates in question, she was pretty sure she'd have to PASS this time.

    I explained that I was given a PASS for PASS, working in the Ask the Experts area.

    She couldn't PASS up the opportunity to ask, "When is PASS?"

    The dates were, I explained, "The PASS Summit is November 18th-21st in Seattle."

    "Oh, I'll definitely have to PASS, I have client work I have to do then."

    "You're going to PASS on PASS?" I asked.

    "I think I'll have to PASS on PASS, even if you could get me a PASS to PASS."

    "Bummer. I probably shouldn't try to get you a PASS to PASS anyway. People might think it inappropriate -- like I was trying to make a PASS at you or something!"

    "Nash, they know me. I could probably get my own PASS to PASS, if I didn't have to PASS on PASS. I do have that client work to do."

    "Yeah, plus those boarding PASSes don't pay for themselves."

    "That too."

    "Okay, as excuses go it is a pretty good one. I'll let it PASS."

    It went on a bit longer with PASS puns, but we never made it to the NONE SHALL PASS scene from Monty Python and the Holy Grail, probably because I didn't think of it until later.

    Though as a point of fact, unlike the black knight, way over 3000 people will PASS, to go to PASS -- and not PASS on PASS.

    It is an awesome SQL Server conference and I promise that I have all of the PASS puns out of my system now. I promise. :-)

    If you will be there, be sure to look for me on my iBOT with the I'm a PC stickers on each side....

     

    This blog brought to you by ⎑ (U+2391, aka PASSIVE-PULL-DOWN-OUTPUT SYMBOL)

  • Sorting it all Out

    Apocalypse Font (aka Guess they must have picked the wrong eight characters.)

    • 13 Comments

    The title of this blog is an allusion to Coppola's Apocalypse Now, and eventually I'll be quoting a bit of the Herr-provided narration (those are the pieces Martin Sheen read)...

    It all started with a seemingly innocent question the other day. It went something like this (product and component names removed to protect whatever might deserve protecting):

    We are hitting an issue where surrogate pair characters do not display correctly on localized builds, but display correctly on English builds. This appears to be because the MS UI Gothic font used in the localized builds doesn’t “automatically” do the correct font linking.  (This can be verified by e.g. opening Wordpad, setting the font to MS UI Gothic, and typing some surrogate pair characters—you just get squares.  If the font is something else, e.g. Arial, the font linking works correctly.)

    Is this a known issue with the MS UI Gothic font face? We are currently using one function to obtain the desired font face. Should we be calling a different function instead of this, or in addition to this?

    Now as it turns out, there were several different issues going on here.

    It starts with the involvement of GDI font linking and Uniscribe font fallback, discussed previously in blogs like Font Linking vs. Font Fallback.

    First and foremost was the fact that this was what they call a tester scenario. Because of this, the actual supplementary CJK Extension B characters in question were not ones that are in any version of JIS (including the latest JIS X 0213), which is why they were seeing notdef glyphs (aka square boxes).

    Uniscribe largely stays out of the world of CJK (Chinese, Japanese, and Korean) text, allowing GDI font linking to do most of the work here. Usually this will guarantee that some ideograph will make an appearance, because as long as it is in one of those core CJK fonts, it will be on the screen.

    But there is one time when Uniscribe is completely involved and GDI font linking is not -- and that is supplementary characters.

    And Uniscribe is not quite as sophisticated in its efforts here -- it will see if the current font claims to support the Unicode supplementary ideographic plane (which contains e.g. CJK Extension B). If it does then the font will be used, even if there turn out to be some missing characters.

    For the Japanese fonts, such as MS Gothic, MS PGothic, MS Mincho, MS PMincho, and Meiryo, each font is actually pretty much limited to the 300-some CJK Extension B characters in JIS X 0213.

    If you pick one of these fonts to display any other random Extension B ideograph, then you will get a square box.

    And if you pick a font with no Extension B support at all, then it will pick one font to look in, based on its algorithm and system locale settings -- thus if you choose Arial or Tahoma or Microsoft Sans Serif or Segoe UI, then you will possibly also get an ideograph!
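To make the decision described above a bit more concrete, here is a toy model of it in Python. It is only a sketch of the behavior as I described it, not Uniscribe's actual code: the `Font` class, the `render_supplementary` function, and the glyph sets are all hypothetical, and the fallback font is simply assumed to cover the whole Extension B block.

```python
from dataclasses import dataclass, field

NOTDEF = "\u25a1"  # stand-in for the square "notdef" box


@dataclass
class Font:
    """Hypothetical font record, just enough for this sketch."""
    name: str
    claims_sip: bool  # font says it supports the Supplementary Ideographic Plane
    glyphs: set = field(default_factory=set)  # code points that really have glyphs


def render_supplementary(ch, current, fallback):
    """Return (font name used, what ends up on screen) for one supplementary ideograph."""
    cp = ord(ch)
    if current.claims_sip:
        # The claim alone is enough: the current font is used even when
        # this particular character turns out to be missing from it.
        return (current.name, ch if cp in current.glyphs else NOTDEF)
    # No claim at all: exactly one fallback font gets looked at.
    return (fallback.name, ch if cp in fallback.glyphs else NOTDEF)


# Assumed data for illustration only.
EIGHT = {0x20087, 0x20089, 0x200CC, 0x2099D, 0x215D7, 0x2298F, 0x241FE, 0x27FB7}
yahei = Font("Microsoft YaHei", claims_sip=True, glyphs=EIGHT)
arial = Font("Arial", claims_sip=False)
simsun_extb = Font("SimSun-ExtB", claims_sip=True, glyphs=set(range(0x20000, 0x2A6D7)))

other = "\U00020000"  # an Extension B ideograph that is not one of the eight
print(render_supplementary(other, yahei, simsun_extb))  # the box, from Microsoft YaHei
print(render_supplementary(other, arial, simsun_extb))  # the ideograph, from SimSun-ExtB
```

The point of the sketch is the asymmetry: the font with partial coverage gives you the box, while the font with no coverage at all gives you the ideograph.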

    Korean does not have Extension B in any of its fonts. Given the general tendency toward de-emphasis of Hanja in South Korea and the virtual illegality of it in North Korea, this is hardly a surprise (though this could change in the future if customer demand drives change here).

    And for the most part Chinese has the widest support. Because whether one uses the Simplified Chinese SimSun-ExtB font, the Taiwanese style Traditional Chinese fonts MingLiU-ExtB or PMingLiU-ExtB, or the Hong Kong style Traditional Chinese font MingLiU_HKSCS-ExtB, one has a much larger number of ideographs to choose from.

    The ranges are of course based on preferred glyphs in the PRC GB18030, Taiwan CNS11643, and Hong Kong HKSCS standards, respectively -- kind of the ultimate exercise of using a code page as a repertoire fence (something I have discussed before).

    But the bug did not quite end there.

    You see, it seems that the application had its own custom font choosing behavior, which in this case happened to be preferring the newer ClearType Simplified Chinese Microsoft YaHei font.

    A font that also has some Extension B in it.

    Eight CJK Extension B Ideographs, in fact:

    Microsoft YaHei

    These eight ideographs are: 𠂇 𠂉 𠃌 𠦝 𡗗 𢦏 𤇾 𧾷

    So far, these eight characters as a set seem to have no special relationship in China, Taiwan, Hong Kong, Macao, Singapore, Japan, Korea, or Vietnam, those being the major places where ideographs either are in use or have been within the last 1000 years.

    If the characters spelled something special, I'd assume it was some kind of Easter Egg in the font (imagine the challenge of coming up with such an egg that relied on eight Unicode characters displayed in code point order -- talk about a fun word challenge in any language!).

    I am reminded of a bit from Apocalypse Now where Martin Sheen describes a report about Col. Kurtz. Specially modified for the current situation, for the conspiracy theory minded:

    Late Summer-Fall 2008:
    The proper glyphs for ideographic text in the supplementary
    planes show up fine in Vista. Then in November in one font
    is noted the presence of eight specific ideographs. Two of
    them are in JIS X 0213, three are from a list of Hong Kong
    Cantonese, one is from some from China. The number of
    Extension B ideographs visible in the application in China
    drops off to nothing. Guess they must have picked the
    wrong eight characters.

    Kind of a stretch, obviously. But still fun to write (had I time to really draw this one out it would have been as much fun in my opinion as that Matrix one!).

    Whatever the reasons, their presence (due to the Uniscribe design here) can really break Extension B display support if someone is using the cool font with ClearType support.

    If I had to guess, I'd wonder whether they were in there as part of an experimental effort at looking at ClearType Extension B support that just never got taken out (why would they? It's not like they are wrong, except in the meta sense of their effect!). But again that is just a guess. Probably more likely than my Apocalypse Font scenario above! ;-)

    An interesting situation, in any case....

     

    This blog brought to you by 𠂇 𠂉 𠃌 𠦝 𡗗 𢦏 𤇾 𧾷 (U+20087, U+20089, U+200cc, U+2099d, U+215d7, U+2298f, U+241fe, and U+27fb7)
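Since every one of those eight code points is above U+FFFF, each is stored in UTF-16 as a surrogate pair, which is where the "surrogate pair characters" in the original question come from. A minimal sketch of the arithmetic in Python (the helper name `to_surrogates` is just something made up for this example):

```python
def to_surrogates(cp):
    """Split a supplementary code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                      # 20 bits left to distribute
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


CODE_POINTS = [0x20087, 0x20089, 0x200CC, 0x2099D,
               0x215D7, 0x2298F, 0x241FE, 0x27FB7]

for cp in CODE_POINTS:
    high, low = to_surrogates(cp)
    print(f"U+{cp:05X} -> {high:04X} {low:04X}")
```

For example, U+20087 comes out as D840 DC87.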

  • Sorting it all Out

    I think we're taking the wrong approach, mostly

    • 5 Comments

    In the past, I've done a lot of presentations on globalization and localizability issues.

    In different companies where I was brought in to do this, they were very well received, because generally a company is being asked to do the work to support another language and the people being trained found themselves quite hungry for the info.

    But when it came to conferences, most of the positive feedback went along the lines of "very interesting presentation" but in the checkbox for whether it was useful for their immediate work, often they'd say no. Because if a company is spending thousands on a conference, they don't usually have such a focused requirement. If the person even signed up for the talk, it was either out of curiosity, or because they thought they'd see me cause trouble, or maybe because they'd heard of me, or whatever.

    There are exceptions to this, like the Internationalization and Unicode Conference. But this only proves my point -- you could probably fit over 30 IUCs in a TechEd or a PDC. So you end up with the very small number of people being sent to a specialized conference, often with a generic requirement of "we need to support GB 18030" or "we have to do Japanese" or whatever.

    When it comes to Unicode support, NT shipped well over 10 years ago and quite a few applications out there still don't support it. Slides like

    Language Matters! 

    are only interesting for shock value -- they aren't going to convince anyone who isn't already convinced and looking for their own presentation to justify it to upper management.

    Because how many companies are thinking of shipping their software overseas and not just shipping it as is?

    MUI is a cool technology, but it is not of general interest to anyone other than people trying to build in-box drivers in Windows who are told they must support it by contract. People are given an assignment to ship the product in Japan; they don't wake up and say "we should support 10 languages in a switchable fashion for the hell of it, and then the sales people can figure out what to do with it."

    The flaw is that it is hard to get people interested based on some nebulous notion of "best practices".

    Best practices for globalization? Manuel Garcia O'Kelley Davis would say null program.

    But when asked to do presentations on security issues with string comparisons or the consequences of user settings breaking applications, I often get a lot more interest.

    People care a lot more about consequences than they do about nebulous features (since selling software in another country is a lot more complicated than just these issues -- there are legal issues that by themselves would block most people from ever even considering it).

    I mean, a lot of the PC game industry and a lot of the driver industry "supports" Unicode. 

    I put supports in air quotes because they may not be and in many cases probably aren't doing anything outside of the ASCII range.

    But they support Unicode because the OS underneath them does and they want to avoid the extra OS conversions.

    I guess what I'm saying is that we have to stop trying to appeal to "best practices" or the miracle of dynamically supporting UI in 40+ languages. We need to focus on:

    • the bugs that break their own code as it is;
    • security issues caused by their code as it is;
    • performance issues in their code as it is.

    People care about those kinds of issues a lot more than they care about good globalization or localizability.

    Unless they are already in those markets, or want to be acquired by a company that is and that pays attention to whether this work is already done, of course!

    Now I have a couple of regular readers here.

    But most of the traffic comes from people searching for information on bugs or issues hitting them now.

    All of this is why I think we're mostly taking the wrong approach....

     

    This blog brought to you by ? (U+003f, aka QUESTION MARK)

  • Sorting it all Out

    If he don't like it, or me, then why the hell is he here exactly?

    • 8 Comments

    This blog is as off topic as you can get without a prescription from your doctor....

    Sometimes when one doesn't get the answer one wants, one can feel somewhat bitter about that fact.

    Technical problems with computers can cause a person to be particularly susceptible to that kind of reaction, actually.

    Though there can also be more to it, sometimes.

    Case in point: a response to a blog from almost two years ago, Vista turns on everything, which explains how you can't turn off the Text Services Framework anymore, like you could in the old days of prior versions.

    Admittedly not great news for people who wanted to turn it off for application compatibility reasons.

    Anyway, the response that Luke sent on (with no return address):

    Useless as ever. You are nothing but a fool. This post is less useful than a broken key. I come here wanting to learn how to turn off advanced text services, and you take up several paragraphs to say "You can't". Don't ever attempt at helping anyone you useless 9 year old.

    There is something particularly hateful about these words that really gives me pause.

    It could be simple frustration leading to an emotional over-reaction, one that the seemingly anonymous nature of the Internet only encourages.

    And some of the words, such as "I come here wanting to learn how to turn off advanced text services" (though clearly the title doesn't even suggest that the blog is about Advanced Text Services at all), tend to suggest a man who found Vista turns on everything via a Google search (as I have to admit so many do).

    And someone who found the blog by searching specifically for how to turn off TSF in Vista who read the whole post might get very frustrated about the "waste of time" and all.

    And the conclusion of the comment (Don't ever attempt at helping anyone you useless 9 year old) certainly does display a certain amount of impatience and frustration. The kind that makes people lash out in perhaps strange ways that seem vaguely inappropriate.

    Though there is something else in those words and others like "you are nothing but a fool", something that does not fit the picture -- it is not just the one blog that has Luke so unhappy with me. There really is something more going on here, running much deeper than a momentary frustration at not solving one single problem.

    And then the initial bit of the comment (i.e. useless as ever) really doesn't seem to match here either, and makes no sense in the context of someone who had never been here before and (after the mistake of visiting the one time) would never visit again.

    This is someone who doesn't like me, or maybe my online "persona", or maybe doesn't like me after having met me in person. Someone who just really finds no use for me whatsoever.

    It's funny, I think that some of the people who hate me the most spend more time dissecting my words for inappropriate meanings to prove their beliefs than the people who are actually fans. This Luke may be one of them, one of the people who just really doesn't care for the taste of my brand of chai.

    I used to talk with my friend Liz about this, and I have talked about it with Andrea too - in fact I've had this conversation with both of them long before I even had a Blog, nay before Blog was even a word. And both of them have pointed out that if I wanted to reverse everything I could, but that I speak with a very distinctive voice and would probably have a very hard time changing that since it mirrors the way I think about things.

    I gave up three decades ago (in the third grade) trying to please everybody, and have never had cause to think I made the wrong decision back then.

    The Blog is perhaps a megaphone, but not one that is changing what I say or how I say it all that much. I use it (and occasionally even abuse it!) in the same way that I would have done in any book or website or email or newsgroup post or presentation or conversation. I can name both people I have maddeningly frustrated and people I have ecstatically delighted. And I think I do serve a "net positive" purpose with what I do -- for myself, for my group, for Windows, for Microsoft, etc.

    And of course you do have a choice here -- you could just not read me if you don't like me, or what I say, or both.

    So Luke (or whatever your name actually is), if you want to come out from where you are hiding and tell me what your actual concerns with me are then I'd be happy to hear them or even discuss them. Or if you'd rather hide grudges or hatreds behind anonymous venomous messages then I suppose that is okay too.

    Though the likelihood of having either influence or impact is much greater in the former approach than the latter. A friendly suggestion. :-)

     

    This blog no sponsor, just as this sentence no verb.

  • Sorting it all Out

    This is not yet my take on DirectWrite

    • 2 Comments

    More news out of the PDC. :-)

    After they saw the talk I pointed to in From ____ to ____ to MUI to ELS -- World Ready @ the PDC! (the one called Windows 7: Writing World-Ready Applications, whose content I liked but whose title I didn't care for, and which I thought could have used more material), a few people pointed out another very interesting talk.

    One called Windows 7: Introducing Direct2D and DirectWrite, a presentation you can see right here.

    Now people who would talk to me aren't gonna go ga-ga about Direct2D in front of me, since they know that it is not the sort of thing that impresses me.

    On a good day I'll nod and smile and then move on to the next thing.

    On a bad day I'll snarl out something deprecating, like they sell that crap at the airport before moving on to something I find more interesting.

    A man who lives so much of his life in source code and DOS (well, CMD) prompts is just not gonna always get excited every time you have a new faster way to do cool graphics.

    Nothing personal, but I am not their target customer, and they kind of do sell that crap at the airport, from my point of view. :-)

    The people who pointed out the talk to me, they were talking about DirectWrite.

    They don't sell that crap at the airport....

    This is something I do plan to talk about more at some point, though most likely not so much right away since no one has really had a chance to look at it yet.

    For now, I'm going to be cautiously optimistic and withhold any other judgment.

    I have been burned getting too enthused about this kind of thing before. In retrospect, I found reasons to be less excited, as I have pointed out in stuff on text stacks in Khmer, and I'll tell you about all the text stacks, and the recent Silverlight as Esau: selling its implementation for a pot of interface.

    The Khmer thing coming up in both of them is coincidence; I'm talking about deeper issues.

    And when I do get around to assessing DirectWrite, it will be on the basis of my core issues that you have heard me blog about in the past:

    • CJK text handling
    • vertical text
    • script coverage -- both breadth and depth
    • complex script support
    • ease of use
    • interoperability issues
    • parity or the lack thereof
    • the generic question about yet another entry in the collection of text stacks
    • what's missing -- what scripts, what languages, what scenarios
    • the downlevel story
    • the customers
    • the standards/Unicode/community story - de facto and de jure conformance and leadership
    • the managed/native split

    All of these issues are huge in my mind, and not very many of them make sense to beat up a pre-beta for, and not that many would be top of my list in a beta, either.

    But I'll talk about them, and how issues I have talked about before stack up against the new kid on the block.


    This blog has no Unicode sponsor yet, but lots of characters are jockeying for position for the upcoming ones....

  • Sorting it all Out

    From I SCOOT to IBOT, #5 of ?? (sometimes it is phase 3 that is ????)

    • 12 Comments

    Warning: although slightly technical, this blog is mostly non-technical, and/or technical about stuff related to the iBOT. If the technical issues related to SQL Server and/or PASS interest you then they will probably show up in future blogs...

    Prior blogs in the series here and here and here and here.

    It's kind of funny.

    I just spent last week at SQLPASS 2008, and had a really great time. I was there as an "Expert" in the ATE (Ask the Expert) area, and I talked officially about migration, though I got a bunch of upgrade questions since customers don't tend to distinguish that much between migration (moving from any non-SQL Server database to SQL Server) and upgrade (moving from an earlier version of SQL Server to a later one).

    Plus I got many questions about Unicode (especially both UTF-16 and UTF-8), collation support in 2000/2005/2008, and also rich text support in SQL Server Reporting Services (drawing some on my typography knowledge, and my work with the RS team). There were really a lot of great conversations with customers and colleagues and partners and friends.

    Did I mention that I had a great time? Well, I'll say it again. I had a great time.

    And not just because I got to see friends and colleagues from years prior and find out what they are up to like Debra Dove (now a Group Program Manager!) and Tom Casey (now a General Manager!) and a whole bunch of others too numerous to mention, not to mention all the people I met in SQL Server marketing. I mean that was very cool, but that wasn't it.

    And not just because in my heart of hearts I'm still a databases person, just like I was way back almost in the beginning for me when Ashton-Tate's DBase II would spit out the "30 days hath September" rhyme when you gave it an out-of-range date (the first software "Easter Egg" I ever remember finding, running on an Osborne 1!). That was cool too, but that wasn't it.

    And also not just because the evening parties and after-parties are superior to any other conference I've been to, though they are (it just seems like SQL folks work hard and know when to play harder, and especially when to not talk about work). This is one of the many reasons I have not been blogging; to be honest I wasn't even sleeping much last week. Every developer should go to a few database conferences like SQLPASS. :-)

    The cool part I am referring to isn't that, either.

    As you can guess from the blog title, it was really because this was my first technical conference with the iBOT.

    I won't say that I was nervous, exactly. Though I admit I took the charger since I did not want to be stuck in Seattle with no way to get home!

    As one of the "Elite 300" ATE folks, I had $25,000 in SQL Bucks to give away to people who asked me questions. But I realized early on that iBOT questions couldn't count -- if they did I would have required a bailout from the program organizers since I'd otherwise be broke within half a day!

    To be fair there were a lot of people I talked to about SQL Server issues, too (as I mentioned before).

    But even some of those conversations started with a few iBOT questions -- people were just interested and curious....

    There was even one conversation with some of the WIT (Women in Technology) volunteers about the predictable way that iBOT questions tended to split on gender lines even when they were kind of the same question (e.g. "how does that chair stay upright?" from men versus "how do you keep from falling out?" from women).

    I was even prepared now, armed with the answer to the ultimate question, the one John McConnell (yes, the one who talked about When will we support Rongo-Rongo) asked weeks before, at a group mixer.

    You see, the most common question people would ask is some variation of "How is that thing balancing?"

    My iBOT shtick eventually developed more fully as the week went on, some based on the boilerplate in the iBOT FAQ's How does the iBOT® Mobility System work?:

    A revolutionary mobility system from Independence Technology, L.L.C. the iBOT® Mobility System utilizes our patented iBALANCE® Technology. It is custom-programmed and calibrated to the owner’s center of gravity. Reach forward in Balance Function to shake hands, and your iBOT® 4000 Mobility System moves with you. Lean back and it moves with you as well. It is subtle and responsive in a way no other mobility device can match.

    but it always eventually made it to the marketing point, about how the iBOT, unlike the Segway, uses six gyroscopes to do its work.

    This always impressed people, though I suspect that was mainly because they don't really know much about gyroscopes. So there was no follow-up question along those lines.

    But John is one of those really smart guys who does know about things like gyroscopes, so he asked the very reasonable next question -- how does it use six gyroscopes?

    Now I even do know a bit about gyroscopes, but it had never occurred to me to think of that next question, let alone ask it.

    To John I had to admit I never asked, but that I would find out.

    So I asked the iBOT folks.

    Perhaps not entirely surprisingly, the front line of the iBOT's helpdesk didn't know either. :-)

    But they tracked down a better answer, which I will give in a moment.

    It's funny (minor segue for a moment), over the course of the week at SQLPASS, I was reminded of the Underpants Gnomes from South Park, and their three step business plan:

    • Phase 1: Collect underpants
    • Phase 2: ????
    • Phase 3: Profit!

    The thing I noticed over the course of the week was that in the old days there would be some women who would be waving to me, looking at me with interest. They would almost invariably be looking at someone behind me or nearby, rather than at me.

    Now with the iBOT, they actually were looking at me. Well, technically they were looking at my ride, my iBOT, but they still wanted to talk about it. But at least they weren't looking at the person behind me, right? :-)

    Of course while it clearly appears that this "second phase" represented some form of progress, I really didn't know how to get to the third phase.

    In part because I'm not sure what it would be (I suppose having them being interested in me? I'm not sure!), but it still is easy to put it in phases and be curious about how to get to the third phase!

    Anyway, that answer.

    First, I'll borrow an image from Wikimedia to define Pitch, Roll, and Yaw in much less time than a verbal description ever would:

    Pitch, roll, and yaw... 

    If you absolutely must have words then the Wikipedia Flight Dynamics article can probably help...

    It has other pictures that can help people with the concepts, like these:

     

    Pitch
    Yaw
    Roll

    Now the gyroscopes on the iBOT are there to look at the movement of the chair and measure these motions.

    The computers in the iBOT generally work under the principle of three independent subsystems so that if any one of them is getting different results than what one would expect, you will either see corrections being made or end up in a warning state if none can be made. But that does not mean that the gyroscopes are all there to provide nothing but redundancy.

    Like with the Segway, they are laid out a bit more cleverly than that....

    Three of them are placed on the main axis of the chair, in a straight line from front to back.

    And the other three are off axis at various strategic places to handle detection of various other kinds of movement.

    Combined with the data of the chair's mode and factors like the acceleration and turning being applied, the computer system can determine three things:

    • where it is;
    • where it is moving;
    • where it is trying to move.

    With that information, it can also know if its progress is being blocked or hindered or accelerated in some way, so it can use its internal governor to speed up or slow down, to correct an incorrect situation, or to trigger a fault condition if it cannot recover from a problem without user assistance.

    The gyroscopes themselves are used to work together to provide the data rather than just having two items trying to get the same answer redundantly to re-check the answer -- because if the computer knows by how much the two gyroscopes should be different, then it can know when something isn't right. The combination is described on many sites online, like this one that talks about the Segway but which partly applies here too (the main difference being that the iBOT is not trying to get data on steering from the movements of the passenger, it is trying to get data to keep the passenger upright, in an inverted pendulum kind of system).

    Other sites like this one also have other somewhat useful descriptions:

    Like the Segway HT, the iBOT contains patented dynamic stabilization (iBALANCE) technology, an integrated combination of sensor and software components and multiple computers that work in conjunction with gyroscopes. Gyroscopes are motion sensors that help maintain balance. When the gyroscopes sense movement, a signal is sent to the computers. The computers process the information and tell the motors how to move the wheels to maintain stability. This electronic balance system is custom-programmed to the user's center of gravity, to monitor and respond to subtle changes in motion. Reach forward to shake hands, and the iBOT moves with you. Lean back and it moves away as well. The iBOT constantly realigns and adjusts its wheel position and seat orientation to keep the user upright and stable at all times, even when driving up and down curbs or inclines. In addition, the iBOT includes built-in triple redundant backup systems, as well as auditory and visual signals to provide even more safety and assurance. With input from the rider or an assistant, in "Stair Function" the iBOT utilizes gyroscopes and adjusts to the driver's center of gravity, climbing stairs by rotating wheels up and over each other. The iBOT can allow riders to stand up to the same eye-level as colleagues. The "Balance Function" of the iBOT can raise the rider to eye level for any number of business or social interactions. It lets the rider see over counters, and reach a high shelf in the office, kitchen or supermarket, safely and easily.

    And this page from Silicon Sensing, one of the component providers, gives some more info:

    The Segway PT is instantly recognisable across the world as a unique and alternative means of transport. Its press launch in December 2001 attracted enormous attention. But, by the time of its launch, Silicon Sensing had been working with the inventor of the Segway PT – Dean Kamen – for several years helping to develop and deliver its key design element – the balancing technology. This close relationship stemmed from our role in providing the balancing gyros for the IBOT®, the equally novel balancing wheelchair, from the same inventor.

    Technically speaking, the Segway design is classic implementation of the 'inverted pendulum control theory' – balancing a broomstick on your fingertip is another example of the same thing. But to enable an automatically-balancing system based on this theory demands the availability of sensing, processing and actuation, all of which are fast and accurate enough. And for a commercially-viable product to emerge, this further demands the availability of these technologies at affordable prices, with sufficient robustness and reliability, and being of a suitable size.  The overall system concept demanded that the Segway PT could always continue to balance if a component fails, whilst providing alarms and reversionary action to ensure that the rider is able to dismount safely.

    Being involved from the very early days, Silicon Sensing were able to propose and develop an innovative design, to be called the Balance Sensor Assembly, in which the size, reliability and affordability criteria were met through use of our VSG3-based silicon MEMS gyro technology. A key requirement was at least dual redundancy in balance sensing – and the desire for triple redundancy in at least the pitch axis. Although not immediately obvious, the other two axes of yaw and roll also required to be sensed for the situation in which the Segway PT is balancing on a slope.

    The resulting solution is ingenious. Rather than providing dual and triple redundancy on each axis separately, the gyros are set at angles such that, by applying trigonometry to any pair of gyros, it is possible to deduce pure pitch, roll or yaw in more than one way. In summary, the solution provides three ways of measuring pitch and two each of measuring yaw and roll. To complete the module, two dual-axis liquid tilt sensors are included which sense the true 'down' direction and thus the pitch and roll angles. Processing within the BSA – again duplicated both electrically and physically – continuously checks the sensor data and monitors for any failures.

    And then this document from BAE Systems, another one of the component providers, gives some more technical info on the nature of how the results of multiple gyroscopes are combined:

    Using Maths to support design engineering
    Ideas from maths are also important in engineering. The Segway HT makes use of a simple but ingenious bit of maths to reduce the number of silicon sensors used in the balance sensor assembly (BSA).

    Directions of motion
    There are three kinds of rotational motion that the Segway can experience; pitch, roll and yaw. These can be detected by a gyroscopic sensor.

    Pitch: Stand up and lean forwards and backwards
    Roll: Stand up and lean from side to side
    Yaw: While standing upright turn from your left to your right

    For safety reasons each direction of motion needs to have two sensors, apart from pitch (the motion used to control the Segway HT) which needs three independent sensors. So, seven (rather expensive) sensors in all should be needed.

    Geometry to the rescue
    Cleverly, the BSA has one independent sensor (sensor 1) measuring just pitch and a set of four other sensors, angled such that each has two jobs. Sensors 2 and 3 both measure pitch AND roll. These are physically arranged so that positive pitch motion will cause both sensors to give a positive output signal. A positive roll motion will reduce the signal from sensor 2 and increase the signal from sensor 3.

    Sensor 2 measures pitch minus roll. We can write this as Sensor 2 = P – R.
    Sensor 3 measures pitch plus roll. We can write this as Sensor 3 = P + R

    If we add these two equations together like this:

    Sensor 2 = P – R
    plus
    Sensor 3 = P + R
    Sensor 2 + Sensor 3 = 2P (the +R and the -R cancel each other out)

    If we subtract these two equations from one another like this:

    Sensor 2 = P – R
    minus
    Sensor 3 = P + R
    Sensor 2 – Sensor 3 = P – R – (P + R)
    Sensor 2 – Sensor 3 = -2R (the +P and –P cancel each other out).
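
    The sum-and-difference trick quoted above is easy to check numerically. What follows is only a minimal Python sketch: the function names, the sample pitch/roll values and the tolerance are made up for illustration, and real sensor outputs would involve scaling and noise that are ignored here.

```python
def sensor_outputs(pitch, roll):
    """Simulate the two angled sensors from the quoted description."""
    sensor2 = pitch - roll  # Sensor 2 = P - R
    sensor3 = pitch + roll  # Sensor 3 = P + R
    return sensor2, sensor3


def recover_pitch_roll(sensor2, sensor3):
    """Invert the two equations: S2 + S3 = 2P and S2 - S3 = -2R."""
    pitch = (sensor2 + sensor3) / 2
    roll = (sensor3 - sensor2) / 2
    return pitch, roll


def pitch_agrees(sensor1, sensor2, sensor3, tolerance=0.1):
    """Compare the independent pitch sensor with the derived pitch.

    The tolerance is an arbitrary illustrative value, not a real spec.
    """
    derived_pitch, _ = recover_pitch_roll(sensor2, sensor3)
    return abs(sensor1 - derived_pitch) <= tolerance


s2, s3 = sensor_outputs(pitch=3.0, roll=1.0)
print(recover_pitch_roll(s2, s3))               # (3.0, 1.0)
print(pitch_agrees(3.0, s2, s3))                # True
print(pitch_agrees(5.0, s2, s3))                # False: something isn't right
```

    The last call is the interesting one: because the derived pitch and the independent pitch sensor should agree, a disagreement is a cheap signal that some sensor has failed.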

    The steering is done via a joystick. Lateral steering works the way a boat would be directed, while forward and backward movement is handled by pushing the joystick forward and back, a combination that does not exactly match any I have seen (usually this is handled elsewhere, in a throttle). The result is the not entirely intuitive backward movement that I am still learning.

    Usually I rely on the zero radius turns so I can just go forward; this is definitely slowing down my "backwards" learning. :-)

    Anyone want to take a guess on the difference between the Segway and the iBOT, with the former needing five gyroscopes and the latter needing six, in particular as to how the slightly changed mission of the one unit changes the way the gyroscopes are laid out? I give some of the answer away above in the Segway descriptions and the requirements of the two different units but not all of it.

    Anyway, no one ever asked that question John did, which in truth seems like a very reasonable question to ask in response to the six gyroscopes answer. No one else has yet managed to get that far....


    This post brought to you by ♿ (U+267f, a.k.a. WHEELCHAIR SYMBOL)

  • Sorting it all Out

    Same shit, different decade/century/millennium

    • 2 Comments

    Lest you have any doubts, I speak here for myself and only for myself, not for Microsoft or for any person, group, or division within Microsoft. This statement is so simple that anyone can get it, right?

    Now although I work for Microsoft, for everything I am about to discuss I am just a user of the technologies, not someone who even knows who owns it or works on it. So for this particular blog within the Blog, think of me as an outsider....

    At the end of last month I got a mail that many other people got as well. The mail went:



    Dear MSN Groups Customer,

    As a valued MSN Groups or MSN Communities Web Folders customer, we want to notify you that the MSN Groups service will close on February 21, 2009 and you will have the opportunity to move your group to our new partner service, Multiply. We understand the importance of keeping your group together, so we partnered with Multiply to create a migration process that moves your group to their service to preserve your online community and its history. Read on to find out about how to kick off the automatic migration of your group to Multiply.

    We realise this may be unexpected, so before presenting your options we want to briefly share why we've made this decision.

    Why?
    Because we are dedicated to providing our customers with the most current and user friendly technology available today we made the difficult decision to close the MSN Groups service. This decision is part of an overall investment to update and re-align our online services with Windows Live. In the long term we believe that closing the service is the best way to continue to offer innovative and effective services that help you stay in touch with the people you care about. We plan to launch a new Groups service in the coming weeks, but unlike MSN Groups, Windows Live Groups will focus on offering a place for small groups to collaborate. Multiply is available now, making it your best option today for continuing to share and communicate together online.

    Options for moving your group to a new service
    We've listed some options and resources below to help you decide what to do with your group.

    • Option 1: Automatically move your group and its data. We have established a partnership with Multiply, an online group and media sharing service so our users can choose to migrate their group to Multiply's service. Choosing this option is free and easy to use: Multiply will move the Group's content on your behalf and invite members to re-join your group in its new location. To begin the migration click here.
    • Option 2: Start again on another service. You can start from scratch and create your group on a different service but we recommend having your Group moved automatically by Multiply. This will enable your Group to transition easily and continue to enjoy the community you have created.
    • Option 3: Start again on Windows Live Groups. To further expand our mix of communications and sharing services, Windows Live will launch a new service this autumn, Windows Live Groups. We plan to launch Windows Live Groups to the public in the coming weeks as a service that helps small groups or clubs collaborate online.

     

    Options for MSN Communities Web Folders users
    If you use save files to the MSN Communities web folders (also known as "My Web Sites on MSN" or the web folder "My Groups"), these services are part of MSN Groups and will therefore will also be closed on February 21, 2009. We recommend that if you store files online using MSN Communities web folders that you back up these files locally, then upload them to another online storage service such as Windows Live SkyDrive. For more details on how to find and move files saved to your web folders, visit the MSN Groups Resource Center.

    Your Next Steps
    We have sent this letter to each MSN Groups user, whether member or manager. If you are:

    • A member or user of MSN Groups: Check with your group manager to determine whether they plan to migrate the group.
    • A manager: Visit the MSN Groups Resource Center to learn more about your options and consider soliciting feedback from your group members about what they would prefer to do, when and how. The Resource Center also provides a sample splash page you can use to notify your members that the group will move. If you're ready to move the group now, click here.

     

    What to Expect between now and the closing date
    Between today and February 21, 2009 the MSN Groups service will remain the same as it is now. We will remove the option to add more storage to your group but other features will remain until the service is shut down and you can use it the same way you do today until the date of closure.

    Where can I learn more?
    You probably have more questions, and that's why we created a website to address them. Please visit the MSN Groups Resource Center at any time for the most up to date answers to common questions, information about migrating your group to Multiply, contact information for our support staff, and important dates.

    Our support staff are equipped to answer your questions and guide you through issues that may arise as you decide what to do with your group. They are ready to help so don't hesitate to contact them at MSN Groups Customer Support with your questions.

    We thank you for using our services and regret any inconvenience this may cause.

    MSN Groups, Microsoft Corporation



    Microsoft respects your privacy. To learn more, please read our online Privacy Statement.

    Microsoft Corporation, One Microsoft Way, Redmond, WA 98052


    Now the groups I belong to are pretty much limited to the VOLT and WEFT groups, and I don't own any groups myself.

    The whole situation seemed eerily familiar, though.

    It was several years ago, in the CompuServe forums.

    Microsoft had a huge presence there -- for betas, for product support, for product insiders.

    Suddenly, they were moving out -- everything was moving to the new (non-replicated) NNTP servers that Microsoft put up.

    There was a new Microsoft-provided newsgroup reader that was in beta; it had a code name of Athena, I believe. Though the goddess would undoubtedly smite the folk who thought it was ready to handle the traffic in question, and the users in question (many of whom had never been in a newsgroup, some of whom had never been on the Internet outside of a closed client like CIS).

    Several products in beta kept their CIS forums after people made a strong push to explain that this move could risk their product ship dates....

    I remember a few months later talking to a product manager I knew who remarked how impressed he was at the level of sophistication of the questions being asked in the new Microsoft newsgroups, as compared with the old CIS forums. I had to break it to him by pointing out that the reason was that the move was so poorly done that most of the customers had gotten lost along the way.

    An effective way to improve the sophistication of your audience, that.

    Reminds me of an old joke:

    A man takes his wife to the doctor because she is ill.

    The doctor explains that he hasn't run all the tests yet and it will take him several days to do so. But in the meantime he has narrowed it down to either Alzheimer's Disease or AIDS.

    The man is horrified. "What do I do until the test results come back?" he asks, fearfully.

    The doctor responds: "That's simple. Take her to the mall and leave her there. If she comes home then don't sleep with her."

    Now this joke is truly offensive, yet in its own way this is kind of what was done to a whole bunch of customers.

    And given the differences between Athena/all later Microsoft newsgroup clients and the clients that were already out there, many issues with differences in the way the MS clients work still plague the newsgroups community to this day -- phenomena like fully quoting old posts by default, top posting, etc. Microsoft managed to make itself even less popular with a large group of people that really didn't like them much anyway, and they managed to lose a bunch of their own customers too. MVPs like me went from posting thousands of responses a month to low hundreds -- and if I skipped a month, I lost no sleep. And I was not unusual in this regard -- many regularly posting experts disappeared or massively decreased their support due to the real annoyances with the client software. And they never came back.

    That product manager I mentioned figured that Microsoft should put up a white paper explaining how to get to the newsgroups, which prompted me to ask him where to put it up. "On the Internet!" he exclaimed, not even realizing the irony of the response....

    Now as it turns out, the scuttlebutt of the CIS to newsgroups migration (reportedly) had to do with a limited time offer that Microsoft had to get out of its contract with CompuServe, which they jumped at even though their migration plans were not fully ready. It might be total fiction and I have no evidence that this is the case, but since it kind of explains all of the facts I am willing to take it as the most likely hypothesis of the many I heard.

    What am I to make now of this new announcement that the MSN Groups are shutting down, and that what would otherwise be the most obvious intended replacement (Windows Live Groups) is not being provided a migration path like the Multiply option? What's up with that?

    Now I am not a group owner, so I can't say whether they informed owners first or if they truly told everyone at the same time and told group members to ask their owners, who may not have even heard about the plans. But I think this is probably just stupidity in the mail and not the plan -- I assume they sent an earlier mail to the owners and just didn't mention it in case people got offended that they were not given as much notice.

    Primarily, I'm annoyed that they are doing all this before the replacement is ready -- it looks like the CIS thing all over again. And I suspect that a lot of people will get lost in the shuffle, either intentionally because they go somewhere else (perhaps Google Groups, as one person in the VOLT group joked -- I wonder if Google is going to add a migration plan of their own to pick up some of these folk) or unintentionally because they just got lost on the way somewhere.

    With MSN Groups and Windows Live Groups apparently targeting two different audiences, and with no replacement coming from Microsoft for some of those who will now be disenfranchised unless they go to some other company entirely (such as Multiply), this looks a lot more like Microsoft getting out of a market (one they themselves took advantage of, given groups like VOLT and WEFT) without being willing to admit why. The non-specific "Because we are dedicated to providing our customers with the most current and user friendly technology available today" implies that Microsoft thinks itself unable to provide those customers with something current or user-friendly. Surely that is not the message they intended here?

    Given that Microsoft did this kind of thing before (in the forum to newsgroup debacle), the whole thing doesn't really even seem all that innovative, to me. More of a "same shit, different group" kind of thing. Or, since the CIS thing happened back in the 90s, a "same shit, different decade/century/millennium" kind of thing. :-(

     

    This blog brought to you by 𒁁 (U+12041, aka CUNEIFORM SIGN BAD)

  • Sorting it all Out

    From I SCOOT to IBOT, #3 of ??

    • 7 Comments

    Prior blogs in the series here and here.

    I had to set a security code for the IBOT.

    Peter, the guy who was programming the code into the IBOT as soon as I had decided what it would be, was shaking his head as he watched me going through the exercise of choosing the code.

    He asked me if I was sure I wanted to make it as complicated as that, reminding me I'd have to enter the code again (requiring me to remember it).

    "Are you kidding?" I asked him. "That's just the first part!"

    "Okay, you're the boss."

    The code had to be good, though.

    Because the IBOT, unlike the scooter, has no key.

    I learned very early on not to leave the scooter in the hallway at work with the key in it.

    Nobody would steal it, but joyrides were certainly not beyond them....

    So with no key, I needed the activation code to be complicated enough that no one would be able to just get it.

    It was actually a scene out of Ocean's Thirteen that I was reminded of:

    Rusty: Okay, where's Eugene's trapdoor?
    Livingston: Under the Dragon, first machine on the left.
    Rusty: Got it. What's the secret?
    Livingston: Coin, 3 counts. Coin, 6 counts. 3 Coins, 5 counts. 2 Coins half count.
    Rusty: Could you make it anymore complicated?
    Livingston: That's just the first sequence....

    That's the scene. The complicated set of instructions to enable the "make the slot machines pay out" feature they conspired to get into the casino.

    Okay, perhaps slightly less at stake here. But it's the same principle....

     

    This blog brought to you by ⎇ (U+2387, aka ALTERNATIVE KEY SYMBOL)

  • Sorting it all Out

    From I SCOOT to IBOT, #4 of ?? (with some pictures!)

    • 7 Comments

    Prior blogs in the series here and here and here.

    In response to I SCOOT TO IBOT, #2 of ??, Gwyn commented:

    Can you provide some pictures of the different modes? I'm not really sure what they all are exactly like.

    Very good idea!

    I had a few minutes and my camera so I decided to take some pictures.

    I had no one else here and I did not think of it earlier in the weekend when I was around some people, so this would have to be a solo operation. I'll probably try to get some more when there are people around, eventually....

    WARNING: Although I took these pictures of different modes while I was not in the chair, you should NEVER do that. EVER. The chair is calibrated with a me-sized person in it, and me not in it is not something that it knows what to do with!

    We'll start with standard mode.

    First from the front:

    IBOT standard (from the side)

    and then from the front:

    IBOT standard mode, from the side

    and then from the back:

    IBOT standard mode, from the back

    Special things to note in this mode (not including the I'm a PC sticker on the side, the Microsoft parking pass you can see in the front and the license plate you can see in the back, the latter two of which came from the Saab that is no more) -- this is the mode that is fastest -- up to 6.8 MPH.

    Note those extra caster wheels in the front -- they are only used in this mode.

    The control is fairly lousy so I really only spend time in this mode when I want to be parked or as low as possible or if I have to go somewhere in a hurry.

    Then there is the four-wheel mode.

    Here it is from the front:

    IBOT four-wheel mode, from the front

    and from the side:

    IBOT four wheel mode, from the side

    Now notice how those caster wheels are just kind of hanging there? They are not used in this mode.

    And this is the mode that can bring up the carpet squares if it is used indoors. So that is something to not drive around in, inside.

    But it is very rugged and can take on some really steep hills (both up and down).

    I find it to be the best all-around travel mode, and the one I use (for example) to go to work with unless it isn't raining and I do not mind the extra few minutes of the next mode.

    Finally, there is balance mode.

    First from the front in its shortest view:

    IBOT balance mode (shortest), from the front 

    and from the back in its shortest view:

    IBOT balance mode at its shortest, from the back

    and then penultimately (and more impressively) from the side, in the shortest setting:

    IBOT balance mode at its shortest, from the side

    and then finally, and most impressively, is balance mode in its tallest view:

    IBOT balance mode at its tallest, from the side

    This difference may be impossible to see from the pictures; I think it'll have to have people next to it while I'm in it to be meaningful.

    Though there was one important difference which I took some brief video footage of.

    Basically, in the shortest balance mode, the empty IBOT shuffled back and forth a little bit as you can see in the low-res video here (WMV, ~679 KB zipped).

    But in the tallest balance mode, the empty IBOT was moving a lot and was clearly missing me, or some reasonable me-sized weight in there as you can see in the low-res video here (WMV, ~1 MB zipped).

    When taking it out of balance mode, the fact that I had confused it was even more apparent: its struggle to balance a me-sized weight that wasn't there made it go forward at least five feet.

    I'll try to do this again when I have someone else there to do the video while I switch the modes. It really was quite a sight!

    Anyway, hope that will do well enough until I have time to get some more made, with people so that they will be more useful for all the reasons I mentioned! :-)

    On a side note, there is an IBOT over on eBay right now for 12000 which, while definitely better than the full price, might see more challenges getting an insurance company to cover it....

     

    This post brought to you by ♿ (U+267f, a.k.a. WHEELCHAIR SYMBOL)

  • Sorting it all Out

    UCS-2 to UTF-16, Part 6: An exercise left for whoever needs some exercise

    • 4 Comments

    Previous blogs in this series of blogs on this Blog:

    Now continuing on from prior blogs in the series, I thought I'd quote a bit from a recent email thread about an issue very similar to some of the prior discussion (especially Part 3), related to caret stops, aka the points where you can put the cursor as you navigate the string.

    The thread had wandered a bit (as threads tend to do!), and then colleague Jerry Dunietz (an architect I have worked with before on issues related to Unicode and cultures and locales and such) offered a great description of many of the issues that I have discussed here in this series. With his permission, I will quote from his response:

    Logically, a Unicode string is a sequence of UCS-4 code-points.  (I intend to carefully distinguish between “code-point” and “code-unit” in the text below.)

    There are several ways to encode a Unicode string in memory.  Ignoring for today the existence of a byte-order-mark (and of byte-ordering variations), there are three ways that one could represent such a string in memory:

    As a sequence of 32-bit code-units, each representing a single UCS-4 code-point.  (UTF-32)
    As a sequence of 16-bit code-units.  Some UCS-4 code-points are represented by single code-unit, and some represented as a “surrogate pair” of two code-units.  (UTF-16.)
    As a sequence of 8-bit code-units.  The code points from U+0000 to U+007F are represented as a single code-unit, but all other code-points are represented by a longer sequence of code-units (UTF-8.)

    Now imagine that you’re a programmer, and you want to pick a representation in memory for a Unicode string.  UTF-32 seems like the easiest to work with.  (But it is the fattest encoding of the three for any real-world corpus of text.)  If you want the fifth code-point in a string, you just array-index to the fifth code-unit in the string.  It seems (but wait) that if the user hits backspace after inputting a string, you just back up one code-unit.  It seems that if a user hits an arrow key to move a caret from one character to the next, you would just advance the caret position by one code-unit. (Given all of this apparent simplicity, there seems to be a compelling argument for defining C++’s wchar_t to be 32-bits long.)

    But it turns out that the stuff I wrote using the word “seems” is an over-simplification.  Unicode has code-points that correspond to combining characters.  Such characters combine with a previous code-point (or sequence thereof) to present to the user what appears to be a single character.  Section 2.11 of the Unicode 5.0 spec provides lots of examples of different combining characters, in different languages.  From a font-rendering glyph selection point of view, or from the point of a view of a user attempting to move a text cursor from character to character, a single glyph or character may correspond to a sequence of multiple UCS-4 code-points, and thus multiple UTF-32 code-units.  Given that such situations exist, an internationally-robust program working with a UTF-32 string must be prepared for the concept that multiple code-units or code-points correspond to the user’s concept of a single character.

    But if our program needs to deal with the possibility of a user’s concept of a character corresponding to multiple code-units, then the apparent advantage to the programmer of using a UTF-32 representation instead of a UTF-16 one goes away. (And given the real-world size advantages of UTF-16, it now seems that making wchar_t be 16 bits is a better choice than making it be 32-bits.)

    Whether you choose UTF-16 or UTF-32 encoding, suppose you want to spec a file format that can be easily displayed, but for which a viewer program can easily support text selection.  You could ask the viewer program to build in lots of smarts to determine where a logical caret-stop can occur.  Or you can ask the program that built the file to encode the caret-stop information within the file itself, reducing the burden on the viewing program.
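
    To make the distinctions in the quoted description concrete, here is a small Python sketch that counts code points and code units for the three encoding forms. The helper name and the sample strings are just illustrative choices; the little-endian codec names are used only so that no byte-order-mark gets counted.

```python
def code_unit_counts(text):
    """Count code points and code units for each encoding form."""
    return {
        "code_points": len(text),                           # Python str is indexed by code point
        "utf32_units": len(text.encode("utf-32-le")) // 4,  # 32-bit code units
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf8_units": len(text.encode("utf-8")),            # 8-bit code units
    }


# U+0061 (ASCII), U+00E9 (BMP, non-ASCII), U+10400 (supplementary plane)
mixed = "a\u00e9\U00010400"
print(code_unit_counts(mixed))
# {'code_points': 3, 'utf32_units': 3, 'utf16_units': 4, 'utf8_units': 7}

# "e" followed by U+0308 COMBINING DIAERESIS: one character to the user,
# but two code points (and so two code units) even in UTF-32.
combined = "e\u0308"
print(code_unit_counts(combined))
# {'code_points': 2, 'utf32_units': 2, 'utf16_units': 2, 'utf8_units': 3}
```

    The second sample is the one that undercuts the apparent simplicity of UTF-32: even with one code unit per code point, the user's idea of a character can still span several code units.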

    This is really a great summary of the issues surrounding the UTF-16 vs. UTF-32 debate, as well as the whole UCS-2 vs. UTF-16 one I have been covering already. I can't really take full credit for his knowledge in this area since I was just one of several sources, but I know I'm in there somewhere and it is always good to know when one has been helpful in influencing an influencer (which indirectly puts me in mind of both the influence vs. impact issue and the fact that Ms. Phair might be wrong about the influence of an Ant in Alaska, sometimes!).

    Now the question of whether to store the information about the caret stops explicitly or to calculate them via a StringInfo-type technique (StringInfo is something I have talked about previously, in blogs like this one) is an interesting one.

    Really in the end it depends on whether you think the pure "derived from Unicode data" answer is sufficient or whether you have additional sources of information. In the particular case Jerry was thinking of, they rely on much more sophisticated methods, such that constantly calculating on the fly could really impact performance. Since the data itself in his case is read-only, there was never any worry about the potential need for recalculation, so calculating once and storing the information was a no-brainer for them.

    But even in a read-write situation, providing a sorted array of indexes for easy enumeration that allows for

    • enumeration in either direction (for cursor movement and selection operations), and
    • insertion (for the insertion of text), and
    • deletion (for the deletion of text), and
    • an initial population that will be already sorted (for a string with no data that needs to build up its initial cache)

    is really just an ordinary interview question in disguise, and one that is pretty easily solved, too. Were it even vaguely "international" I'd say we should solve it here, but it's not so I'll leave that as an exercise for whoever feels that they need the exercise. :-)
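
    Without spoiling the whole exercise, the enumeration and bookkeeping parts of that list can be sketched in a few lines of Python using the standard bisect module over a sorted list. The function names here are hypothetical, and the sketch deliberately does not recompute caret stops around the edit point; it only shifts the stops that follow it.

```python
from bisect import bisect_left, bisect_right


def next_stop(stops, position):
    """Return the first caret stop strictly after position, or None at the end."""
    i = bisect_right(stops, position)
    return stops[i] if i < len(stops) else None


def prev_stop(stops, position):
    """Return the last caret stop strictly before position, or None at the start."""
    i = bisect_left(stops, position)
    return stops[i - 1] if i > 0 else None


def shift_from(stops, position, delta):
    """Shift every stop at or after position by delta code units.

    Use a positive delta after an insertion at position and a negative
    delta after a deletion. Stops near the edit may still need to be
    recalculated separately; that part is not handled here.
    """
    i = bisect_left(stops, position)
    return stops[:i] + [stop + delta for stop in stops[i:]]


stops = [0, 1, 2, 3, 4, 6, 7]
print(next_stop(stops, 4))   # 6
print(prev_stop(stops, 6))   # 4
print(next_stop(stops, 7))   # None
print(shift_from([0, 1, 2, 3, 4, 5, 6], 5, 1))  # [0, 1, 2, 3, 4, 6, 7]
```

    A plain sorted list keeps lookups at O(log n) but makes every insertion or deletion touch all the stops that follow, which is exactly the cost worth thinking about before reaching for something fancier.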

    Though of course I'll point a few issues to keep in mind for anyone who would want to implement such a cache....

    Obviously one has to be ready to re-order the items after the point of change for the insertion/deletion case. This could be expensive, depending on where the action is happening and how much text follows it (when to move to slightly more complex schemes than the doubly linked list that the initial problem suggests is also left as an exercise!).

    But less obviously, in the area immediately preceding and following the insertion point, one can potentially need to recalculate caret stops due to changes in the text. For example if one has the letters "abcdef" then one will have seven indices (covering the points before each caret stop as one moves across the string, plus one for the end):

    {0, 1, 2, 3, 4, 5, 6}

    Then if one decides to add a combining umlaut after the initial "e", the new string is "abcdëf" and the indices would now have to be:

    {0, 1, 2, 3, 4, 6, 7}

    and so on. And in other cases a formerly combining character before or after might now not be....
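
    The two index lists above can be reproduced with a deliberately crude Python sketch. It treats any code point with a non-zero canonical combining class as "not a caret stop", which is an assumption made purely for illustration: real caret stop logic (StringInfo, or the Unicode grapheme cluster rules) covers far more cases than this. Note also that Python indexes strings by code point rather than by UTF-16 code unit; for the BMP-only strings here the two are the same.

```python
import unicodedata


def caret_stops(text):
    """Return caret stop indices using canonical combining class only.

    A stop is placed before every non-combining code point and at the
    end of the string. This is a rough approximation, not a full
    grapheme cluster implementation.
    """
    stops = [0]
    for index in range(1, len(text)):
        if unicodedata.combining(text[index]) == 0:
            stops.append(index)
    if text:
        stops.append(len(text))
    return stops


print(caret_stops("abcdef"))        # [0, 1, 2, 3, 4, 5, 6]
print(caret_stops("abcde\u0308f"))  # [0, 1, 2, 3, 4, 6, 7]
```

    The second string spells out the combining umlaut as U+0308 explicitly, since a precomposed U+00EB would be a single code point and would not change the index list at all.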

    From there the exact methods used for calculation of caret stops and how to integrate them together comes into play, as does how the different methods [might] interact. It really can become a rather fascinating technical question, in the end. Though mostly not for this blog, except more on the various methods eventually. :-)

    Getting back to the series for a second, there are a few points left to cover, such as the actual means of support and then the whole UTF-8 question (which is still fair game here!).

    I'll cover these in upcoming blogs....


    This blog brought to you by ឿ (U+17bf, aka KHMER VOWEL SIGN YA)

  • Sorting it all Out

    Grease is the word; ░░░░░░ not so much...

    • 3 Comments

    The question from the other day was an interesting one. It was something like this:

    I’m trying to do a word-boundary check, and I noticed regex doesn’t handle boundaries correctly for some extended characters  (░╤╞╬═╣etc.).

    A simple example is “\b░” which should match “░” but doesn’t. Any normal character in front (“\bg░” : “g░”) will match correctly.

    If I manually check for boundaries (^$\W\s etc.) it works correctly.

    I haven’t found any of the regex options fix it.

    Is this a known issue?

    Does anyone have the equivalent pattern for \b so I recreate it myself?

     First let's look at those characters. They are:

    • ░, aka U+2591, aka LIGHT SHADE
    • ╤, aka U+2564, aka BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
    • ╞, aka U+255e, aka BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
    • ╬, aka U+256c, aka BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
    • ═, aka U+2550, aka BOX DRAWINGS DOUBLE HORIZONTAL
    • ╣, aka U+2563, aka BOX DRAWINGS DOUBLE VERTICAL AND LEFT

    Did you realize there was all this graphical crap in Unicode? :-)

    All of them have a Unicode General Category of So, also known as Symbol, Other. What the CharUnicodeInfo class I mentioned earlier would call UnicodeCategory.OtherSymbol.
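
    If you want to check the category yourself without .NET, here is a quick sketch using Python's unicodedata module instead of CharUnicodeInfo (the variable names are just for illustration):

```python
import unicodedata

# The six characters from the question, written as escapes
SYMBOLS = "\u2591\u2564\u255e\u256c\u2550\u2563"

# Map each character to its Unicode General Category
categories = {ch: unicodedata.category(ch) for ch in SYMBOLS}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {cat} {unicodedata.name(ch)}")
```

    Each line should report So, the two-letter abbreviation for Symbol, Other.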

    And then we'll look at how \b is defined when it comes to regular expressions, in topics like Atomic Zero-Width Assertions:

    Assertion    Description

    \b           Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries (that is, at the first or last characters in words separated by any nonalphanumeric characters). The match can also occur on a word boundary at the end of the string.

    \B           Specifies that the match must not occur on a \b boundary.

    There we go -- the explanation!

    It would be unrealistic to assume that a regular expression engine even remotely Unicode aware would think that ░ or any other symbol would be a \w character -- because those symbols aren't words!
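
    The same behavior is easy to see in other Unicode-aware engines. The sketch below uses Python's re module, which is a different engine from the .NET one in the question, so treat it as an illustration of the \b definition rather than a statement about .NET:

```python
import re

SHADE = "\u2591"  # LIGHT SHADE, General Category So

# No \w character on either side of position 0, so there is no boundary there
no_match = re.search(r"\b" + SHADE, SHADE)

# "g" is a \w character, so the start of the string is a boundary
match = re.search(r"\bg" + SHADE, "g" + SHADE)

# The symbol itself is not a \w character
is_word_char = re.fullmatch(r"\w", SHADE) is not None

print(no_match, match, is_word_char)
```

    The first search finds nothing and the second one succeeds, which matches what the question reported.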

    When this was pointed out, the person asking the question definitely didn't expect anything different here; he said:

    That seems reasonable enough.

    If I need to support this scenario (probably don’t) I can create my own \w patterns that include those Unicode characters, like [^\p{L}\p{Nd}\p{Pc}…].

    which gives the workaround if anyone is looking for it (I suspect the actual need here to treat a symbol as a word would be pretty uncommon in text scenarios, as is the use of these symbols anyway).
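
    One way to build such a hand-rolled boundary is sketched below. The pattern quoted above is elided, so this does not try to reproduce it; instead it assumes a "word" class of \w plus the six symbols, and uses lookarounds in Python's re module. The names WORD, BOUNDARY and find_at_boundary are hypothetical.

```python
import re

# Hypothetical extended "word" class: the usual \w plus the six symbols
WORD = r"[\w\u2591\u2564\u255e\u256c\u2550\u2563]"

# A position is a boundary when exactly one side of it is a WORD character
BOUNDARY = rf"(?:(?<!{WORD})(?={WORD})|(?<={WORD})(?!{WORD}))"

def find_at_boundary(literal, text):
    """Search for `literal` starting at the custom word boundary."""
    return re.search(BOUNDARY + re.escape(literal), text)

print(find_at_boundary("\u2591", "\u2591"))   # symbol now starts a "word"
print(find_at_boundary("\u2591", "g\u2591"))  # no boundary between g and the symbol
```

    Note the side effect in the second call: once the symbol counts as a word character, there is no longer a boundary between it and a preceding letter.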

     

    This blog brought to you by the previously mentioned symbols, obviously!

  • Sorting it all Out

    'It's Not Easy' saying WTF to an 'Ant in Alaska'

    • 4 Comments

    This blog title is not a reference to Kermit's It's Not Easy Being Green, as any diehard Liz Phair fan would recognize...

    I'm going to dig a little into one of the random questions that came out of this last April's I'm aware of that: an Andreaesque segue and intervention, of sorts.

    Andrea: I don't think people understand your relationship with your old team. Especially since you are still writing about a lot of the same things you were before. Do they read it? Do they agree with you or disagree? Do they still talk to you?
    {deep breath from Andrea}
    Andrea: And I don't just mean other people for this one. I don't get this one either. What's your connection to them, now?
    Michael: Oh wow, that one is a bit harder.
    Andrea: I'm aware of that. But I sincerely doubt that I am the only one who is curious.
    Michael: Okay, I'll think about that one, too. Maybe that'd be worth a post or two, at least for the "me" half of it. I wouldn't try to speak for the other half....
    Andrea: It might help the confused among us

    My old team.

    NLS.

    National Language Support.

    Not Localization, Stupid!

    Globalization Services.

    They have a lot of names.

    The song I'm playing now might tell you something about it.

    This is the one singing it:

    Liz Phair, lying there 

    It is Liz Phair, from way back in the time of the girlysound days (the song, not the picture), the title is I Know It's Not Easy and it was re-recorded for the Exile in Guyville re-release under the name Ant In Alaska. They took out a line or two, I think, but you'll barely miss 'em if you don't have the original version.

    It has a lot of the same raw feeling, and I know some have argued whether it was re-recorded at all but I think most people agree that it was.

    The re-recording is most notable for the prefixed 58.5 seconds of silence, which for me symbolizes something too. Maybe I'll talk about that some other time, or maybe not.

    For now that is something between Me and Liz.

    And Liz, actually.

    The lyrics for the song go something like this:

    Call me when you think the coast is clear
    I've been hiding out almost a year
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy
    You said if I waited it'd pay off
    But my eyes are growing wild and my body's gone soft
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy

    You said I should let go of your hand
    But I don't even know if I can
    You're the only one, you are the very sun to me
    And you know it's not easy

    You'd tell me, wouldn't you, if we needed to talk?
    And you'd tell me, wouldn't you, if I'd pissed you off?
    Is there something wrong? What's taking you so long?
    And yeah, I know it's not easy

    Well, I look at the stars and I know you're under them
    I look at the cars and I know you insure them
    I look at the books and things people are reading
    I know that you've written them, too
    You've got so many little things to do
    But then I look at my life and I know you've forgotten
    The promise you made to me, I think that's rotten
    I'm hopelessly lost and there's hardly a sound anymore
    Coming through that can show me around
    'Cause I'm endlessly endlessly searching the crowd
    Looking for something from you
    Just one fucking measly clue
    Any shitty little tipoff would do
    But I'm just an ant in Alaska to you

    Then I look at the stars and I know you're under them
    I look at the cars and I know you insure them
    I look at the books and the things people are reading
    I know that you've written them, too
    You've got so many little things to do
    But then I look at my life and I know you've forgotten
    The promise you made to me, I think that's rotten
    I'm hopelessly lost and there's hardly a sound anymore
    Coming through that could show me around
    'Cause I'm endlessly endlessly searching the crowd
    Looking for something from you
    Just one fucking measly clue
    Any shitty little tip-off would do
    But I'm just an ant in Alaska to you
    I'm just an ant in Alaska
    An ant in Alaska
    An ant in Alaska to you 

    Now most of the themes of this song are not what I am saying my relationship with my old group is like.

    Seriously.

    Our "break-up" (such as it was) was nothing like this, at all.

    But that last line....

    In this world in which I now live, in almost the southwesternmost spot in the building, on the opposite side from the group's East side abode, and not having been enlisted in their branch for well over a year since Track change (a.k.a. A new job that has a few things in common with the old one) happened, I think that buried in that line is what I think of as the connection I have with my old group.

    At least symbolically, I'm an Ant In Alaska.

    Sometimes I meet with them and they ask me questions about stuff as they work on new features.

    Sometimes they tell me about their plans (since in theory a lot of the other groups I help out might find it helpful if I know about future plans, though in practice not so many are directly impacted).

    But not much (or at times any) of my feedback actually ends up in the new features, and the final plans are often wildly different from what I was originally told.

    I probably have more influence and impact on their clients and on customers in other parts of Microsoft (in part due to this unofficial blog, in part due to past relationships) and even on external customers (again via this very Blog!) than I do on them -- to them, I think I really am an Ant In Alaska, even if they do read here (several don't, and it isn't like they have to, but some still do).

    To be honest, I don't know that I'm particularly bitter about that though. I suspect I'd be a lot less happy about the work I do if I knew more about what was going on there, due to the natural desire to be unhappy with things that change, especially if it is not changing the way I would have done it, given the chance. Not knowing gives me a better sense of distance.

    Other people in the building do read the Blog, I know -- they ask me stuff all the time and some of them even feed me ideas that end up becoming blogs here (and others are on the list to be done like the one of the double L, you know who you are!).

    But on the whole, I do feel closer to customers now, which was really the whole point of the Track change thing anyway. Which means I'm happier, a lot more often than I'm not.

    I mean, I won that cool award:

    Bulldog

    and the only people who really knew about it were the folks who came by my office (not many of the folks from the old group) and the ones who read about it here. No mail was ever sent (amusing in and of itself since as I suspected it was never mentioned in any mail to the group), so people have just found it kind of randomly if they happened to be coming by for something else.

    We don't get a lot of visitors from the rest of the building, though.

    I was talking to a teammate from the NLS days the other day about an issue that had come up and it had the same cordial feel of a conversation I had with a former manager from nine years prior. No bad feelings, a lot of mutual respect and interest, and very little real idea of what the other person was up to, which kind of made the small talk much more purposeful as we both tried to "catch up" on things. Not self-consciously since there was no expectation that the other one would know things, but a collegial kind of "good to chat from time to time" sort of thing.

    Know what I mean? Just like the manager from nine years prior.

    So we are in the same building, but not the same team....

    And then a different other day, a colleague in another group entirely who managed to embarrass me with his praise a bit asked me:

    Out of curiosity… you have been with int’l for a while, yes? Do you ever feel pressured by yourself or others to “move on and do something else”?

    And I guess the answers are yes and yes sometimes and yes some other times.

    I've even had a tempting offer or two.

    But there are still things I can contribute here, so I have not been giving in to the occasional temptation yet.

    And I'm really not saying that It's Not Easy (the old song title) being an Ant in Alaska (the new song title). Because for the most part, it is -- and I find that I actually enjoy being an ant in Alaska. Kind of a collegial isolation technique!

    You know, I debated not writing this blog. And then after I wrote it I debated not posting it. But sometimes you've just got to say WTF....


    This blog brought to you by ⾍ (U+2f8d, KANGXI RADICAL INSECT)
  • Sorting it all Out

    On blowing a font cache, and overwhelming a Fonts folder with the raw power of typography

    • 3 Comments

     In response to About the Fonts folder in Windows, Part 3 (aka What changes in Vista?), Shaun asked in a comment:

    I unzipped a large number of font folders into my Windows/Fonts folder and now the unzipped folders are not showing up… my Fonts folder is only showing about 4,500 fonts and there are 65,000 fonts in there somewhere but they can’t be viewed and they’re not installed, just sucking up space and invisible. My “show hidden folders” option is enabled in Folder Options, and I can see the folders when I go into “Install Fonts”, but I can’t delete them!

    Any ideas on how to access these folders that are obviously there, but unaccesible?

    This comment took me back.

    Way back, in fact.

    To the Spring and Summer of 2000.

    I was in the final stages of a book.

    This book:

    Internationalization with Visual Basic

    Now most of the production was done in Microsoft Word for Windows, and the machines they were running it on were almost all running Microsoft Windows NT 4.0.

    And the folks doing the production work were having problems.

    It seemed like every chapter would have some characters missing. They would exit programs, log off, and reboot. The symptoms changed each time as the exact characters missing would vary, but invariably something would go wrong.

    They were desperate.

    Folks were getting more frantic in Indiana (where Sams was located), and the stress was being transferred to Redmond (where I was).

    So we had a nice long conversation where we went through the issue with the fragile font cache in NT 4.0 and how easy it was to blow by having too many fonts. And the large number of fonts that the book needed were more than enough to blow the font cache. And blow it huge.

    In Windows 2000, a huge push to fix these problems was very successful, but switching the production machines was just not an option -- the IT staff was just not set up to do this. Even if they were, the results in Word were not the same between NT 4.0 and Windows 2000, and I was working on NT 4.0 for the book. Moving to a new platform would mean major reformatting work for the whole book. So if it seemed like there was a competition between them and me to decide who would drag their heels the most on the idea of a Win2000 upgrade, then the appearance probably wasn't far from reality.

    So my suggestion was to strip down the fonts to the absolute bare minimum, then add just the necessary fonts for each chapter and take them off the machine when done. And reboot in between each chapter, just in case.

    It mostly worked just fine (I say mostly because there were some problems in Chapter 3 that were not caught prior to print), and I swore to move all of my machines to Windows 2000 as soon as the book was completely done.

    Now like I said the problem was largely fixed in Windows 2000.

    But just because things don't blow up as easily as they did in NT 4.0 doesn't mean that GDI and the Fonts folder are prepared to scale beyond 65,000 -- or even 4,500 -- fonts! :-(

    In the words of the folks from In Living Color, Homey don't play that.

    But the deletion of the extra fonts is easy enough via an elevated CMD prompt, which should allow the deletion to happen for all of the extraneous font files.

    Obviously the situation with my book was what I at the time thought of as quite an extreme case of a machine being overwhelmed by the raw power of typography, but all things considered I am pretty sure that 65,000 fonts would probably top that on any version of Windows (as the folder itself obviously has its own problems scaling to that quantity, beyond whatever problems the underlying infrastructure hits!).

    What are the scenarios that one would really need 65,000 fonts installed?

    How many fonts do you have on your machine, and in what version of Windows?

    When I think about Windows 7 and Long Zheng's Improvements to fonts in Windows 7 over in I Started Something, I can't help wondering if the new Fonts folder in Windows 7 will scale up to 65,000 fonts. I realized three things:

    • The mere fact that I ask the question here does not really make it a meaningful one, and
    • The mere fact that I ask the question here probably does mean someone will have to try it out if they aren't doing so already -- to have a description of the behavior in the future KB article, if nothing else, and
    • The behavior between now in the Windows 7 PDC build and in the final release of Windows 7 might be improved if this scenario wasn't already being tested....
    There is a class of bug that if someone finds it you kind of have to do something about it; this scenario might qualify. So I apologize to whoever is tasked to look further into things. :-)

     

    This blog brought to you by T (U+0054, aka LATIN CAPITAL LETTER T)

  • Sorting it all Out

    Trying to ignore the small stuff is harder, if you're Arabic

    • 0 Comments

    Via the Contact link, Alain asked:

    Hello Michael,

    I ask you about a problem I searched on the net all morning and get no response.

    We work à UNESCO (Paris/France) on a multi-lingual database (SQL Server 2005). We actually add Arabic to a English/French/Spanish/Russian thesaurus.

    We have Arab people at Alexandria test our application and they complained about not getting response when searching in Arabic with letters having not the same diacritics (e.g. Alif with and without Hamzah).

    We use SQL_Latin1_General_CP1_CI_AI, but I tried with Latin1_General_CI_AI and Arabic_CI_AI and got the same result.

    My questions : is there a way to add my own collation to a SLQ 2005 server. Or is there a collation just ignoring *all* diacritics for every UNICODE character ? And why does Arabic_CI_AI  not ignore Hamzah on Alif ?

    I wonder if I am the only guy around the world searching Arabic text on a SQL Server database. I am not an Arabic reader not speaker, but it seems the the requirment is very basic for Arabic...

    Thank you and, please, forgive my poor English

    Very good bunch of issues in there that all deserve some coverage! :-)

    Starting with the easiest part: SQL collations are terrible and essentially useless in most cases. My words in SQL Server: compatibility collations vs. Window collations are probably the best answer here to explain why not to use them. It is just that given how awful the SQL compatibility collations are for text outside of English, they are pretty much only the default in the US (otherwise SQL Compatibility collations are a bit too retro because Unicode and SQL Collations have nothing to do with each other).

    So less than ideal results there are kind of par for the course....

    Then there is the fact that Latin1_General_CI_AI and Arabic_CI_AI return the same results. This is actually also expected since both collations use the default table and the only difference between them in SQL Server is how they have different code pages attached to them for non-Unicode columns (1252 for the one, 1256 for the other).
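
    To give a feel for what "different code pages attached" means for non-Unicode data, here is a small sketch using Python's cp1252 and cp1256 codecs. It says nothing about how SQL Server stores or compares anything; it only shows that an Arabic letter has a home in one code page and not the other. The helper name can_encode is made up for the illustration.

```python
def can_encode(text, codec):
    """Return True if every character in `text` exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

BEH = "\u0628"  # ARABIC LETTER BEH

print(can_encode(BEH, "cp1256"))  # Arabic code page
print(can_encode(BEH, "cp1252"))  # Western European code page
```

    For Unicode columns none of this matters, which is why the two collations behave the same way there.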

    Therefore, this too is expected.

    Ok, enough stalling -- let's get to the actual issue -- the incorrect results!

    This is a longstanding bug that I have previously described in Is it punctuation, symbol, or diacritic?, which explains the nature of the problem and describes how in some cases NORM_IGNORESYMBOLS will help here when one is dealing with Windows 2000, XP, or Server 2003.

    Unfortunately there is no way to set this flag in SQL Server, so in the end there is no collation setting to work around the bug in SQL Server 7.0, 2000, or 2005.
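
    To make concrete what Alain's testers expected from "ignoring Hamzah on Alif", here is a minimal sketch outside of the database. It is not how Windows or SQL Server collation works; it just assumes canonical decomposition (NFD) followed by dropping nonspacing marks, using Python's unicodedata module, and the function name strip_marks is hypothetical. Note that this blunt approach also drops every other nonspacing mark, such as the Arabic vowel marks.

```python
import unicodedata

def strip_marks(text):
    """Decompose to NFD, then drop nonspacing marks (category Mn).
    An illustration of diacritic-insensitive matching, nothing more."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

ALEF = "\u0627"              # ARABIC LETTER ALEF
ALEF_HAMZA_ABOVE = "\u0623"  # ARABIC LETTER ALEF WITH HAMZA ABOVE
ALEF_HAMZA_BELOW = "\u0625"  # ARABIC LETTER ALEF WITH HAMZA BELOW

print(strip_marks(ALEF_HAMZA_ABOVE) == ALEF)
print(strip_marks(ALEF_HAMZA_BELOW) == ALEF)
```

    Both forms of Alif with Hamzah reduce to the bare Alif here, which is the kind of equivalence the search was expected to give.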

    However, Is it punctuation, symbol, or diacritic? explains how Vista and Server 2008 actually fix this longstanding issue, and the cost of fixing eight separate problems with Arabic script collations was just one bug, in Persian (ref: Hello Madda, Hello Father (Iranian style)).

    And how does SQL Server get this fix?

    Ah, for that you can find the answer in On changing the world, or at least the way people order things in it, which explains that SQL Server 2008 has the absolute latest version of the tables to date when SQL Server shipped, and thus has the fix for this bug in it.

    There is, however, no downlevel fix for this problem that has really been around in Windows for as long as Arabic support has been in the product and in SQL Server for as long as Arabic support has been in that product.

    Custom collations or any way to modify collations? That is a feature that does not exist in either Windows or SQL Server....


    This blog brought to you by ب (U+0628, aka ARABIC LETTER BEH)

  • Sorting it all Out

    Inspiration, and a code chart

    • 4 Comments

    Way back in September after I did that presentation at the Internationalization and Unicode conference that I mentioned and provided the slides of in Behind the Proposed Change to Tamil in Unicode (five different ways), Scott sent me the following via the contact list:

    Michael,

    After your talk today, I was inspired to put up a Unicode syllabary chart for Tamil on the Tamil Script page in the English Wikipedia, complete with  the new Tamil named sequences from Unicode 5.1, in the hopes of building support for the current Unicode encoding model.  Anyway, you can check it out if you're curious:

    http://en.wikipedia.org/wiki/Tamil_script#Tamil_in_Unicode

    If you find anything horribly wrong, I'd be happy to fix if you let me know about it, so you won't have to violate your policy of Wikipedia non-editorship.

    I just hope this doesn't earn me death threats!  ;)

    -Scott

     I think that what Scott did here was excellent, and I did not note anything horribly wrong at all....

    And it humbles me to think that I helped inspire it.

    Because even though that is the secret hope I have for some of my talks (especially including this one), it is really awesome to see it spelled out in such a way.

    The chart he provided was similar to but not the same as the ones I provided in Learn Tamil in 30 Days (or something like that), and it helps people look at Tamil in Unicode the way that they might learn Tamil, something the simple code allocation chart would never be able to do -- in its own way something Unicode cannot do without providing this same crucial bit of information in a familiar form.

    Thanks, Scott -- both for this and for supporting my non-interference policies WRT Wikipedia! :-)

    Which reminds me that I promised to talk more about some of the issues I didn't have time to cover in the talk. I'll be sure to get on that....


    This blog brought to you by ஹ (U+0bb9, aka TAMIL LETTER HA)
