
November, 2010

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    If you lie, that replacement character might pop in (the one that isn't Paul Westerberg)

    • 2 Comments

    Late last month, JC Ahangama sent me the following question via email (to the trigeminal.com webmaster address):

    Hi,

    I am writing to this address because I could not find the address of great
    Guru Michael Kaplan. The attached file explains my question as well as my
    complaint. I do not have high connections to Unicode to do this but sure Dr.
    Kaplan is the man for it.

    It is about European characters getting step-motherly treatment.

    Thanks.

    JC (Ahangma)

    The attachment was an HTML file that had two simple features. It had:

    • a charset meta tag set to UTF-8, and
    • some of the so-called "High ANSI" characters (characters between 0xA1 and 0xFF)

    Those characters would come up as square boxes (notdef glyphs) in Internet Explorer, and the diamond question mark character in FireFox.

    Here is the page in FireFox:

    Contrary to the claim at the bottom of the page, this is not a bug.

    Browsers must explicitly make some choices, and in this case, both Internet Explorer and FireFox are choosing to trust the charset meta tag.

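    They both dutifully convert this non-UTF-8 data as if it were UTF-8.

    And since bytes from 0xA1 to 0xFF are not valid UTF-8 on their own (they only mean something as part of a well-formed multibyte sequence, which this data rarely happens to contain), each such byte is converted to the replacement character (U+FFFD).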

    As he noted, the problem is fixed if you change the charset meta tag -- at which point the page is no longer lying about how it is encoded....

    The moral of the story is not to put the wrong charset meta tag in the page -- if you want it to be tagged as UTF-8, make sure it is saved as UTF-8.
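
    For the skeptical, the effect can be reproduced outside of a browser. What follows is a minimal sketch in Python, not what either browser literally runs; the sample word and the choice of Windows-1252 as the encoding the file was really saved in are assumptions, since the attachment itself is not reproduced here.

```python
# Bytes as an editor saving in Windows-1252 would write them:
# "café" with the é stored as the single byte 0xE9.
raw = "café".encode("cp1252")

# Roughly what happens when the page claims charset=UTF-8 and is believed:
# the stray 0xE9 cannot be decoded, so it becomes U+FFFD.
trusted_the_lie = raw.decode("utf-8", errors="replace")

# What happens when the tag matches how the file was saved.
honest_tag = raw.decode("cp1252")

# The other fix: actually save as UTF-8 and keep the UTF-8 tag.
saved_as_utf8 = "café".encode("utf-8").decode("utf-8")

print(repr(trusted_the_lie))
print(repr(honest_tag))
print(repr(saved_as_utf8))
```

    Either fix works; the only thing that does not work is the tag and the bytes disagreeing.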

    The fact is that according to Mark Davis of Google, there are a lot of incorrectly tagged web pages out there, which they index using the correct encoding that the page is actually believed to be in. Now this leads to interesting problems since so many browsers will not display the page in the same way that Google indexed it (meaning you may not be able to see the text that you were searching for and Google claimed to find on such a page).

    I wish that it were so easy to get, rather than U+FFFD Replacement Character, the "Replacements character," Paul Westerberg.

    Though if Mark is right about the number of incorrectly tagged pages then that would mean one hell of a touring schedule (and a lot more common than saying 'Biggie Smalls' three times!)....

  • Sorting it all Out

    Math is hard, let's do keyboards...that do math.

    • 3 Comments

    Back at the beginning of the month, swalleh asked over in the Suggestion Box:

    Is there a way I can use MSKLC to type [characters outside the BMP]?

    I am trying to type the unicode   character    𝖳  1D5B3  using  MSKLC  .

    The character appear on my keyboard layout plan, but  nothing appear when I try to use it to type in word 2010  using cambrian math font.

    I need 4  characters  with   fixed case  namely :      T  D  H  and W  in capital form.    I found them in the extended characters -plane 1  series.

    I could not find them anywhere else in unicode.

    Is there a way  I can use  MSKLC  to type them?    I note the code for them is a  5  character one compared to the usual  4 for others.

    If MSKLC  is not suitable , would you be able to suggest  an alternative .

    Many thanks

    ASwalleh

    Crap.

    That was my first thought when I read this -- had someone broken that core piece of MSKLC that supports characters beyond the Basic Multilingual Plane?
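
    For anyone wondering why these characters have a five-digit code and why that matters: anything above U+FFFF does not fit in one 16-bit UTF-16 value, so it is stored as a surrogate pair. Here is a rough sketch in Python of what that looks like for the character in the question. It only illustrates the encoding and says nothing about how MSKLC handles it internally; the helper name is made up for the example.

```python
ch = "\U0001D5B3"  # the character from the question, U+1D5B3


def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


units = utf16_code_units(ch)

# One code point, but two 16-bit units: a high and a low surrogate.
print(f"U+{ord(ch):04X} ->", " ".join(f"{u:04X}" for u in units))
```

    A keyboard layout that emits such a character has to deliver both halves, which is the piece I was worried might have been broken.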

    I created a special "math symbols" keyboard by typing all of the U+1d### characters in directly:

      

    You may not be convinced I actually did anything here.

    People who know me will often say I am not above a little smoke and mirrors.

    Fine, I'll change the font to Cambria Math:

    I also made the size really small because this font's metrics are pretty outrageous.

    Let's try again:

      

     Anyway, it works just fine.

    I also tried other methods of adding characters to keys in MSKLC like copy/paste from other applications and drag/drop from IE to the MSKLC key -- the former always worked, the latter had mixed results.

    If you still don't trust me, then

    1. shame on you, and
    2. you can download the .KLC file here if you want to play with it

    The download can make it easier than the DIY aspects of reproducing all of the results in this blog.

    I did notice a few bugs in MSKLC itself that have never (as far as I know) been reported:

    • The "AutoCasing" behavior that hooks up CAPSLOCK and shift state doesn't work with supplementary characters;
    • When typing U+####[#][#] characters, the case of the "U" is important, the case of the # isn't (e.g. U+fEfF is legal but u+0061 isn't);
    • When adding via drag/drop, characters outside the BMP don't always seem to work -- much worse in IE than FireFox.
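
    To make the second bug concrete, here is a hedged sketch in Python of a more forgiving parser for the U+####[#][#] notation, one that ignores case for the "U" as well as for the hex digits. This is not MSKLC's code (which is not shown anywhere in this blog); the function name and the exact error handling are invented for illustration.

```python
import re

# "U" or "u", a plus sign, then four to six hex digits in either case.
_U_PLUS = re.compile(r"[Uu]\+([0-9A-Fa-f]{4,6})")


def parse_u_plus(text):
    """Turn a string such as 'U+1D5B3' or 'u+0061' into the character it names."""
    match = _U_PLUS.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in U+#### notation: {text!r}")
    value = int(match.group(1), 16)
    if value > 0x10FFFF:
        raise ValueError(f"beyond the Unicode range: {text!r}")
    return chr(value)


print(repr(parse_u_plus("U+fEfF")))
print(repr(parse_u_plus("u+0061")))
print(repr(parse_u_plus("U+1D5B3")))
```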

    Someone on the GDX team should put those bugs in, just in case Microsoft ever ships another version.

    Though no one seems interested these days, even for the meaningful bugs....

    Okay, so it works in MSKLC. Let's try the next part, and type in Word 2010 using Cambria Math:

    Okay, that sort of worked maybe.

    Word decided to autoselect MS Mincho instead of the font it started with (Calibri) even though it didn't change the language.

    Even choosing the font first and then typing didn't work -- I am prepared to forgive the former much more readily than the latter.

    I'd say the blame lies somewhere between Word 2010's "be clever with font stuff" and MS Mincho's font definitions. I would guess that it is at least 90% Word's fault and MS Mincho happens to claim it supports some math stuff, even if not this particular math stuff.

    Word should fix its bug after all the work they did for math support. Any Word testers hanging out here? :-)

    Okay, let's just select the text and change the font ourselves, that should do the trick:

    Ta da!

    So minus a Word 2010 bug for some Plane 1 recognition weirdness, I'd say we are all set here with MSKLC 1.4.

    And then there are some other bugs that won't block users that someone may want to look into some day.

    One thing I must admit about MSKLC is that it has many different ways to enter characters because of some unofficial usability studies we did by giving it to people to try and seeing what they did. So on the assumption that swalleh was simply trying to enter the characters in a way that did not work but maybe ought to in some future version, knowing what that method was would also be useful....

  • Sorting it all Out

    Oh I know that I am no sage but I won't be an ANSI code page

    • 9 Comments

    The title of this blog will seem a little less stupid if it is hummed along with the section of the Schoolhouse Rock song I'm Just a Bill that goes "Oh I hope and pray that I will but today I am still just a bill."

    Over in the Suggestion Box, yuhong2 asked:

    Shift-JIS_2004 and Big5 with HKSCS can't be the ACP as some of the characters convert to UTF-16 surrogates. Are there other requirements that must be met for a codepage to be the ACP/OEMCP?

    Well, the first and most important rule is that it must be one of the existing ACP or OEMCP values since none are being added!

    Beyond that, the rules don't have much to do with surrogates directly, though other rules disqualify them anyway.

    I suppose the rules can be enumerated:

    1. The code page must already be on the list of ACP or OEMCP code pages;
    2. The Unicode side must be a single UTF-16 code unit;
    3. For non-DBCS code pages, the non-Unicode side can be only one byte;
    4. For DBCS code pages, the non-Unicode side can be only one or two bytes;
    5. The ACP and OEMCP values can never change the length of the string if the same string is round-tripped through one versus the other.

    Now rule #2 does slam out all of the ones yuhong2 mentioned.

    And rules #2 and #3 and #4 eliminate UTF-8.
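
    Rules #2 through #4 are mechanical enough to sketch. The Python below is only an illustration that counts bytes and UTF-16 values per character using Python's own codecs; it is not how Windows decides anything, the function names are invented here, and rules #1 and #5 are not modeled at all.

```python
def utf16_units(ch):
    """Number of 16-bit UTF-16 values needed for one character."""
    return len(ch.encode("utf-16-le")) // 2


def broken_rules(ch, codec, dbcs):
    """Return which of rules 2, 3 and 4 the character breaks in the given codec."""
    broken = []
    if utf16_units(ch) != 1:
        broken.append(2)           # Unicode side needs a surrogate pair
    size = len(ch.encode(codec))
    if not dbcs and size > 1:
        broken.append(3)           # single-byte code page, multi-byte character
    if dbcs and size > 2:
        broken.append(4)           # double-byte code page, more than two bytes
    return broken


print(broken_rules("é", "cp1252", dbcs=False))          # a typical SBCS case
print(broken_rules("あ", "cp932", dbcs=True))            # a typical DBCS case
print(broken_rules("€", "utf-8", dbcs=True))            # three bytes in UTF-8
print(broken_rules("\U0001D5B3", "utf-8", dbcs=True))   # four bytes, two UTF-16 values
```

    The first two come back clean and the UTF-8 cases do not, which is the whole point of the list.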

    It is rule #5 that forces locales using one of the double-byte code pages (932, 936, 949 or 950) to use it as both the ACP and the OEMCP; there are many unpleasant hard-to-predict consequences to them not always matching length....

  • Sorting it all Out

    They refer to the quality of existing, not the quality of living (or the quality of gaming)

    • 7 Comments

    So it was not too long ago that I blogged a blog for this Blog entitled The bizarre variation of a skeleton that is iBot + me, to a Kinect.

    In it I talked about a meeting inside Microsoft with internal folks with an interest in accessible gaming.

    That meeting took place maybe a week before one with lots of external influential folks in accessible gaming (mentioned here).

    I was told that with the exception of some purely internal issues that couldn't be discussed, the two meetings were remarkably similar, with the same slide decks and analogous setups for the labs.

    Even at that meeting, barely a few months before the release, we were told that feedback we gave was almost certainly for future versions, because it really was mere months until the release.

    But now, on the other end of the Kinect release, I know that even if none of the feedback people mentioned in the meeting was applied to the Kinect, the feedback they recorded with the iBot and the Kinect was used -- because the Kinect was no longer confused about me in almost every way that it was confused two months prior.

    Although a tiny bit of me can be frustrated about the fact that I was now to be shut out of some games, the vast majority of me knows that if they had shipped that version of the Kinect I would have ripped it to shreds. Simply to shreds. And thus there is no way I can say that they made the wrong decision here or did the wrong thing.

    In some ways, they are one of the few parts of Microsoft that I feel is on course these days!

    But ACCESSIBILITY is a big word, much bigger than just my experiences -- which are at best limited to physical mobility issues of one guy in a wheelchair.

    For Microsoft, that word isn't just a line item.

    It's over a dozen different line items, since it covers just about any kind of physical or mental challenge that makes a computer or a computer program or a game harder to use, or impossible to use.

    Now there are blogs like Joe Clark's Where open-source is as good as Microsoft (which cites my iBot/Kinect blog even though most of his blog is focusing on a different aspect of accessibility than the ones I have obviously been focusing on).

    I am not going to respond to Joe's blog directly since it is mostly about issues I have no experience with, except to say that I have talked with people who have other disabilities and accessibility challenges who don't always say as many nice things about Apple as Joe does. Which is not to say they say much that is better or nicer about Microsoft either.

    Both of them kind of suck for games, for many people with accessibility issues. The few exceptions relate either to unplanned benefits or government regulations, for both companies, and actually most companies.

    My own experience with getting the iBot approved by insurance, described in Cogito ergo cathedra... (I think, therefore IBOT...), reminded me that they didn't care about how much easier it would make traveling -- they cared how it helped around the house and such. Because that was what they cared about -- quality of existing, not quality of living.

    Other exceptions (e.g. the XBOX "Indie Games" In the Pit game which is audio only) actually get panned by people who point out all of the "flaws" in no video (e.g. this negative review).

    I guess you could say that this person never really considered the fact that I have had blind friends of mine tell me they were able to kick the ass of any sighted person when they played this game because it actually forces a person to deal with the same limitations and issues that someone who is actually blind can face.

    So I suppose one could look at that review, somewhat ignorant of the wider issues of accessibility, and conclude (since a game with those limitations on it is bad) that the reviewer is saying that being blind makes for bad games. Because in that reviewer's view, games without those features are bad.

    I doubt John Kershaw (the reviewer) would extend his "the graphics are rubbish" to claim that "blind people are rubbish". And if asked to review it in the wider context of someone who is blind, he might have written a very different review of the game -- one that was a bit more sensitive to the situation.

    But games are a real wilderness when it comes to accessibility in general. None of the big companies making games target disabled players, and with the exception of such Indie games and modified controllers that cost a lot of money, no one is doing much to make sure that the gaming situation itself isn't rubbish.

    What I liked about the summit and about the general feeling of the team is that there is lots that can be done in the games space for accessibility, since for every extreme case like me there are many who are tired when they get home and can't jump around standing up. And for every extreme case of someone who is deaf there are many who can't discern sound well when there is lots of background noise, and for every extreme case of someone who is blind there are many who are colorblind or unable to see as well. And so on.

    For everyone, ACCESSIBILITY is about quality of existing, not quality of living (or quality of gaming).

    My biggest hope is that this realization can have more impact on the Kinect, and the XBOX folks, and the game studios. Because much of the work to help people with disabilities can help everyone else -- since we are all getting older and find things we can't do, and we need programs and games that help everyone. The fact that the people behind the XBOX and Kinect are looking at the problem without the kinds of government regulation that drive most of the industry to do work here means a LOT to me.

    Because no one else is really trying to help just the people with accessibility issues.

    Not even the games that run on Apple products....

  • Sorting it all Out

    One Uyghur walked into a Blog, and...

    • 1 Comment

    The title sounds almost like it might be the beginning of a joke, but it's not. This really happened!

    I should explain.

    You see, just a couple of days ago, a reader with the handle One Uyghur wrote the following in the Suggestion Box:

    Dear Michael Kaplan,

    How can we have Uyghur (Uighur) LIPs for Microsoft products? How does Microsoft develop the LIPs for some local languages? If we help Microsoft to translate the terminology into our language, will Microsoft release the LIPs for us? Help us to use Microsoft products in our language.

    Thank you.

    One Uyghur

    Now this was not the only way the message got to me.

    Many other comments were posted in another, unrelated blog (A lot of LIPs for South Africa!) that expanded on the "Uyghur LIP" topic by making arguments about relative population sizes, though I ended up deleting those since three of those comments with no end in sight concerned me about topicality a bit. No offense, but I do reserve the right blah blah blah....

    If you were wondering why the One Uyghur handle seemed familiar, and whether there were earlier comments in other blogs? You were right. There was The inappropriate nature of getting the Feh out of Uighur, Windows 7 edition and The report of the need for a Uyghur hotfix may be an overstatement. They were about other issues though.

    For the LIP question itself, the topic is one I actually discussed recently -- less than a month ago, in Why one LIP and not another? -- covering points such as:

    • the fact that individual user requests aren't really the way such things can be decided;
    • the near irrelevance of the pure "number of speakers" issue;
    • the need for some kind of "sponsorship" or other information to help quantify the business case;

    all of which of course apply here.

    Now I like my blog, most of the time.

    My management kind of likes it too, for the most part (I appreciate this a lot, since it hasn't always been true!).

    It was less than a month ago in a team meeting when -- in a round-the-room intro I gave my name and mentioned I had an "annoying but lovable" Blog -- my manager's manager's manager (Corporate VP Julie Larson-Green) agreed with the assessment and smiled.

    Hopefully more about the loveable part than the annoying part, of course. My return smile was just as warm as hers, but a bit more nervous! :-)

    Anyway, in making decisions about whether to assign the resources to translate the resources of up to 20% of Windows into a given language? This Blog is clearly not where such things are determined, or how. Even on the occasions where I have responsibilities to assist others who are making those types of determinations, this Blog is an entirely separate part of my job. My contribution there is more on the technical side anyway, not on the business case proper (i.e. much more on the "can we do it?" than the "should we do it?" side).

    But I will forward this information on, since that never hurts. Besides, people who make contact like this can often make good beta testers for this or anything related later. :-)

    However, one can't lose sight of the fact that all of this (Windows) is a business to some of the people here (even if not me personally most of the time), and therefore a standard process has to be used to be sure that decisions and actions and implementations are proper and useful and appropriate....

    In this specific case, for not just friend One Uyghur but for All Uyghurs!

  • Sorting it all Out

    UTF-8 on a platform whose support is overwhelmingly, almost oppressively, UTF-16

    • 4 Comments

    I had some developers running an interesting scenario by me the other day.

    They had some strings that were being kept in a cache.

    Pretend that the title has given you no clues as to what is about to happen, please.

    The nature of the strings and the purpose of the cache isn't relevant to the topic of this blog, I'll just say that the strings aren't file paths but are potentially much longer than that.

    Anyway, the cache itself had certain limitations which amount to a maximum size per string, and the strings themselves can be visible to the user.

    If you are a regular long-time reader here then at this point, your first thought may be the same as mine was -- the lessons from my whole UCS-2 to UTF-16 series, in particular the blogs within it dealing with truncation and not changing the meaning/appearance of the strings in unexpected ways.

    Under ordinary circumstances, that would handle it -- somewhere between implementing nothing in that series and (to cover all of the locale-specific and linguistic issues) implementing all 110% of it, the truth lies (sadly enough in most cases the final decision ends up closer to the 0% than the 110%, but finding that sad is the occupational hazard of being me, something I wouldn't recommend if you can avoid it!).

    However, in this case there was an additional complication.

    The string cache I mentioned? The strings were being stored in UTF-8.

    In fact, they were looking for help since their attempted solution code was using IsDbcsLeadByteEx and CharNextExA, neither of which seemed to support UTF-8.

    Very true, neither function does (I previously discussed this in Is CharNextExA broken?).

    And now we have a ball game.

    The problem here points to a different smaller series, the You may want to rethink your choice of UTF one:

    In this particular case, the tripping point is in Part #3 -- by implementing a solution using UTF-8 on a platform whose support is overwhelmingly, almost oppressively, UTF-16.

    Certainly one can crack the byte semantics of UTF-8 (you can use the information in Getting exactly ONE Unicode code point out of UTF-8 as a roadmap), and figure out code point boundaries.
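
    As a rough illustration of just that byte-cracking part, here is a sketch in Python that trims a UTF-8 buffer to a byte limit without cutting a code point in half. It assumes the input is already well-formed UTF-8, the function name is invented for the example, and it knows nothing about combining sequences or anything else that changes how the truncated text looks or what it means.

```python
def truncate_utf8(data, max_bytes):
    """Cut well-formed UTF-8 bytes to at most max_bytes, on a code point boundary."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut lands inside a multibyte
    # sequence, so walk back until it points at the start of one.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


sample = "aé€".encode("utf-8")   # 1 + 2 + 3 = 6 bytes
for limit in range(len(sample) + 1):
    print(limit, repr(truncate_utf8(sample, limit).decode("utf-8")))
```

    That gets you valid UTF-8 of the right size, and nothing more than that.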

    But if one is using either native or managed code coming from Microsoft, then all of the rest of the goodies aren't available to you, since all of that is in UTF-16.

    Maybe it's time to talk to someone about that implementation decision to use UTF-8 here?

    Okay, most of the time in these situations if someone is talking to me at the point where they are asking the "Does CharPrevExA support UTF-8?" question, then my knowledge isn't the issue, and neither is my ability to make implementation suggestions.

    At this point everything is already written and potentially already shipped in some other, lesser manner that is only now being looked at in order to try and fix this problem.

    I don't take it personally. :-)

    Things are now complicated though.

    If the string is too long, one has to walk back a certain number of bytes, and that number can only be known after one knows a lot more about the characters....

    There is no easy answer here.

    Though thinking about the Microsoft developer interview question, how would you attack the problem?

    And don't suggest including ICU, we don't currently do that, as I said yesterday!

  • Sorting it all Out

    Perhaps not evil, but certainly getting hella snarky

    • 2 Comments

    On a note that is wry but unrelated to the rest of the content of today, I will note that traffic to my blog is much heavier on the weekend than during the week. The patterns suggest a lot of the things I may talk about some other day, though for now I'll leave you all with one thought on this: my blog may be interesting to many of you, but it is not important to most of you. I am okay with this, and feel that less traffic would allow me to feel less guilty about typos and such. So if you want to stop reading you should not feel guilty about leaving. :-)

    On a second note, people seem to be interested in my social life (such as it is). I will say someone and I just broke up (well, technically I was dumped) but we hadn't really been dating all that long so classifying it as a breakup may be more charitable to the relationship (such as it was) than it deserved. Technically the cause of said relationship's termination is a geopolitically insensitive statement that may be worthy of its own blog though someone else would have to write it. I'll just say for the record that there are people of Nordic descent who take the subject of Ragnarök very seriously, and that it should not be joked about. I have decided she is right, though the realization came too late to save the situation. She did like my prior blog On Thokks who don't give a Frigg, under the mistletoe, for what it's worth.


    So everyone who cares about internationalization really hates how lame JScript/JavaScript/ECMAScript is in regard to the whole internationalization/globalization/localizability space.

    This has been true for really most of a decade.

    Microsoft, which has been burned before for taking such frustrations and using them to extend their own support, decided not to go that route. They did add support to VBScript that mirrored much of what was in VB/VBA in that regard, which led to the reasonable conclusion of its users that JScript's ultimate globalization hero: VBScript was really true.

    I mean, really lame. But really true.

    This does make one tremendous freaking albatross to hang on poor JScript. An albatross that is largely unsustainable given the entire ecosystem of potential browser choices that a user has (since VBScript was available on pretty much none of the others while some form of ECMAScript was available on pretty much all of them).

    Better than nothing? Sure. But only in the sense that 35 years in prison is better than 50 years there.

    So obviously, news like the jQuery Globalization Plugin from Microsoft that I described back in June in JavaScript's got a whole new ultimate globalization hero is pretty amazing.

    It's open source, it's jQuery, it is Microsoft trying to provide solutions that everyone can use.

    Which is not to say that others weren't trying to do something similar.

    I mean as long ago as 1999, Richard Gillam was suggesting something better be done (ref: Adding internationalization support to the base standard for JavaScript).

    And many of us at the most recent Internationalization and Unicode conference were interested to see Nebojša Ciric and Jungshik Shin (both of Google) present in Making JavaScript Multilingual their submission to try to update ECMAScript to better support globalization -- with what is best thought of as an ICU port to JavaScript given the syntactical similarities between them.

    Now jQuery is in use by (the last time I heard) something like 30% of the sites using JavaScript (I just looked at Wikipedia and their article cites "31% of 10,000 most visited websites"). Anyway, I see a lot of juice behind jQuery.

    Here we go again, some might think. I mean, look at some of the typical examples in this space where Microsoft does one thing and the industry does something else:

    • Microsoft's collation support vs. the Unicode Collation Algorithm;
    • Microsoft's time zone support vs. the Olson data;
    • Microsoft's locale support vs. Unicode's CLDR.

    You get the idea.

    Now I was not in all of the meetings for these things.

    But I was in a few of the key decision-making meetings for some of them and I can unequivocally state that there was no conspiracy to make Microsoft do its own thing as a matter of policy. In the case of both collation and locales, the people who eventually created alternate standards even approached Microsoft to share its data and participate in creating a wider standard; in both cases we declined to share data explicitly though we did provide expertise in both the creation and the maintenance of the standards behind them.

    Looking at the two different approaches to solutions on the ECMAScript problem here for a moment:

    MICROSOFT: By providing an Open Source jQuery implementation, this can clearly be thought of (for simplification purposes) as a reversal of prior "let's keep it to ourselves because it is a strategic advantage to not share our IP (Intellectual Property)" type approaches. The syntax was (in my opinion, and the opinion of a few less-biased people I asked) driven much more by jQuery than by any Microsoft-specific language or platform. As far as I know (from both my own personal knowledge as someone often brought into globalization discussions at Microsoft and from talking to several of the others involved with the project) it was created from a "purer" place in terms of motivations, noting the history of nearly perpetual lack of interest in updating the standard for this issue and the desire to finally get something out there for people to use.

    Microsoft's motives are not always as pure and I have been quick to point out when they weren't, so I will admit some pride at being "one of the good guys" this time.

    GOOGLE: I don't have a particular bias against an ICU frame of mind, but I don't use it myself -- I don't ship ICU and I work for a company that doesn't ship it either. And I know that on several occasions I have been asked whether and/or when that would change: both officially and unofficially, both directly and indirectly. Adding something with a syntax so similar to ICU to an open, widely adopted standard that is essentially the one that all browsers follow will be an interesting way to see (depending on your point of view) either how arrogantly stubborn Microsoft would be at not shipping ICU by being willing to re-implement the same functionality or how annoyingly manipulative Google and others would be at trying to shoehorn ICU onto Windows.

    Or most likely both -- since the net effect of the move into ECMAScript will indeed give ample opportunity to showcase some of the qualities most negative about all of the people involved.

    I have talked to several of the Google developers involved and I doubt this was their motivation. But an old quote comes to mind "The gun may fancy itself the surgeon's scalpel, but the assassin must know the task."

    Now despite the fact that this is yet another of those "Google vs. Microsoft" things, the coverage will probably stay here, since as I have often mentioned no one cares about internationalization.

    I could try and jump on the implied intent to pervert Open Standards that is there in whatever decision was made that allowed them to proceed here -- I mean standards should be a shield, not a sword. Or, if that was not the intent and this was just some engineers not realizing the consequences of their actions then I could just ask Google to grow the hell up before sitting at the big kids table, like standards -- or better engage the people they have in their company to watch the people involved. 

    But even saying that one paragraph, I'm exhausted and I realize why I hate the TechCrunches of the world -- I lack the energy to be that kind of blogger just to increase my hit count. And I have no interest in the fight here, really.

    But with that said....

    No matter what happens in the future versions of Chrome, of Internet Explorer, of ECMAScript, whether you think Microsoft was a monopolist (or still is one!), whether you think Google is now evil (a mortal sin) or merely snarky (a venial one).....

    In this particular instance, Microsoft was the one doing good here, not Google.

  • Sorting it all Out

    Who the hell orders a slice with no toppings?

    • 4 Comments

    Disclaimer: The company I work for is a full member of Unicode and the provider of space for Unicode's home office, and I have been a full and associate member in the past.

    I have never claimed to be a big fan of the emoji, in any of the times they have come up here, e.g.

    and so on. I could go on, but the topic tends to drain me of a lot of energy. You can probably find more if you are interested, though as goals go it isn't very exciting.

    In the end, searching for blogs on some topics is a bit like trying to pick people up at the nude beach. The flaw there is that most people -- myself included -- look more attractive with clothes on than clothes off; the flaw in the task of finding good emoji blogs is not that there aren't any -- though there aren't -- it is that once you have the blogs in hand you feel less clean, less good, less fulfilled.

    Or maybe that's just me....

    Anyway, the one thing about the emoji, that thing which in my mind can make Unicode feel less about interesting language issues and more about how to build a clip art factory, is that it can appeal to others.

    If you need proof, see Adam Kuban's Unicode 6.0 Includes New Pizza Slice Character on "America's Favorite Pizza Weblog" Slice, which points to Typographer and pizza lover Nick Sherman's U+1F355: “SLICE OF PIZZA” character in Unicode 6.0 on his own Pizza Rules! Blog.

    And suddenly see who the heirs of Bernard R. Miler might be.

    Eventually some of these people who see so much more in the text than, well, the text, will be people affluent enough to join Unicode as members.

    And perhaps one day they will even vote Unicode beyond the constraints it has created for itself in this regard.

    I am reminded of a bit from an episode of The West Wing:

    You know, I never understand why you gun control people
    don't all join the N.R.A. They've got two million members.
    You bring three million to the next meeting... call a vote...
    All those in favor of tossing guns - [Snaps fingers] - Bam! Move on.

    I'm not saying it's necessarily a sensible strategy.

    But it may cost less for the fans of Pizza to get some of their topping into Unicode than the NRA thing would cost people.... :-)

  • Sorting it all Out

    Still with the Itanium?

    • 4 Comments

    So I walked into a bar the other day.

    And it hurt. Where'd that bar come from? I should have ducked or something!

    Just kidding. :-)

    Anyway, I walked into a bar, and I was having a drink, with my friend Sam.

    Samantha.

    She was telling me about some show she had been at in Los Angeles, and the time she spent with the band after the show.

    The band itself did not impress me much and I told her so. But she said I was just being a music snob.

    Nolo contendere.

    Anyway, she was talking about how interesting it was to just be hanging out with the band when a small meet and greet happened, and fans got to come in to meet the band. She enjoyed watching people come in and gush about how much they loved the music, etc.

    I was non-committal about how interesting that sounded. Maybe it was me just thinking they didn't deserve to be celebrities. :-)

    Anyway, while we were having this conversation, somebody came up to us.

        "You've got to be Michael Kaplan!" he excitedly stated.
        I turned to Sam as I replied. "Yes, I've got to be."
        "I knew it had to be you because of the chair. And the license plate on it. Dude, I love your blog!"
        I smile a little, and again turn to Sam and tell her "He loves my blog."
        "Well I can't blame him, I love it too Michael."
        "Probably not for the same reasons," I suggest.
        Sam looks the guy over. "Probably not," she agrees.
        He pauses, nervous. "Can I ask you a question?" I nod to this.
        "I love MSKLC, it's amazing," he gushes. Hurrying on quickly, he continues, "but I was wondering why it still supports IA64 if Microsoft doesn't."
        I point out that I no longer own MSKLC, so I can't say for sure, but the team that does is working on other stuff.
        "Wow, does that make you mad?" he asks.
        "No, I can't really control what they do with their time. I know they're busy...."
        He seems disappointed. "I guess I understand," he offers.
        "Hey, it's still a cool tool. And they did sell a few of those IA64 boxes. You can just not ship the MSI and the IA64 subdirectory and chop 120kb off your setup if you know you don't need to support it."
        Sam chimes in at this point. "Michael, how the hell do you know how much space it takes up?"
        "Just an estimate. The keyboards are all like 6-8kb each, but the MSIs are like 110kb. I rounded up slightly."
        Sam rolls her eyes and the guy nods thoughtfully. "That makes sense."
        "I'll mention it to them," I say. "Maybe write something up in the blog about it."
        The guy is suddenly feeling shy. "Oh, you don't have to do that. Um, I mean, unless you want to."
        "No worries. What's your name?"
        "Dan."
        Sam puts on her best 'Boon from Animal House' voice. "He's damn glad to meet you, Dan."

    After that, Sam decided she understood how I wasn't really impressed by the celebrity thing. Because a lot of times, the person of interest may not see themselves as a celebrity. I mean, I certainly don't. :-)

    Anyway, getting back to the issue Dan raised.

    Support for IA64 adds about 7mb to the MSKLC package itself, plus the extra code to generate the MSI and the DLL. That could be worth removing. After all as Dvorak pointed out, Itanium killed the computer industry. No need to keep supporting it further, right?

    Maybe this is worth doing at some point....

  • Sorting it all Out

    He got the cheap thrill. If he wants intelligent conversation about my iBot too, I better be getting dinner out of it....

    • 2 Comments

    Humming this variation on the song: You may know what it's like to be held up, to be felt up, behind blue gloves...

    The patdowns.

    There is a lot in the news about the TSA patdowns.

    I think about it, but for me it is old news.

    As a guy in a wheelchair (and an iBot is a wheelchair no matter how cool people may think it is), I have been felt up by TSA on every flight I have taken.

    I don't care for it, I find it to be kind of humiliating.

    I'm not fond of radiation either, but I currently spend 6-18 hours a day sitting atop two NiCad batteries, so all I can say about radiation danger is ship sailed.

    So I'd choose the humiliating dangerous scanner if they would let me, but they can't -- the scanner won't work on me.

    So I endure being felt up by strangers behind blue gloves.

    For me it is a little worse, interestingly enough.

    Even though I am not currently given the "embarrass the living crap out of me intense patdown intended to embarrass me about not choosing the backscatter" patdown (presumably because I cannot even opt-in, which would make punishing me for opting out kind of stupid), the iBot itself gets some eyes on it. I remember the 6-year-old who asked his mom as she and he walked by me "Why are they doing that to him? That's a bad touch!" and it would be hard to say whether it was the mom or I who was more mortified.

    So I get the eyes on me as if they were trying to shame me into backscattering.

    Yet every time I go through I know the iBot distracts the TSA people too -- and I have to routinely point out to them the steps they have missed during the screening, steps involving the iBot itself.

    The TSA agents are occasionally suspicious but more often grateful for the assist.

    Sometimes they want to ask me about the iBot but I find myself drawing the line.

    Total strangers who entreat me with questions about the iBot will get a tiny slice of my time, if they have time.

    People I know can get a bit more if they want it.

    People I am involved with get as much time as they want (though talking about the iBot does not put me in the mood, if you know what I mean).

    But the stranger in the airport who feels me up while they wear blue gloves? He (or she -- it has sometimes been a she) got the cheap thrill. If intelligent conversation about my chair is also desired, I better be getting dinner out of it.... 

  • Sorting it all Out

    Oriya vs. Odia?

    • 7 Comments

    It isn't Korea vs. Corea.

    Or Chaudhuri vs. Chaudhary, for that matter.

    It isn't Farsi vs. Persian, either.

    And really it isn't Uighur vs. Uyghur, either.

    It does have a lot more in common with Macao vs. Macau.

    And Bangalore vs. Bengaluru is even closer to the mark still.

    Now Oriya has a name, in its own script:

    ଓଡିଆ  

    as I mentioned in O-O-O-Iced Win7 and an Oriya Cookie, They forever go together, What a Classic Combination....:

    • It would seem that many people now refer to Orissa as Odissa or Odisha; given the spelling in the native language (ଓଡିଶା or O DDA I SHA AA) this does not seem entirely unreasonable. Microsoft has not caught up to that just yet.
    • Also, it would seem that many people now refer to Oriya as Odiya or Odia; given the spelling in the native language (ଓଡିଆ or O DDA I AA) this does also not seem entirely unreasonable. Microsoft has not caught up to that just yet, either.
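    The "O DDA I AA" style spellings quoted above are just the tail ends of the Unicode character names, so they are easy to reproduce. Here is a minimal sketch using Python's standard unicodedata module (my choice of tool for illustration; the helper name spell_out is made up):

```python
import unicodedata

# The two native-script names quoted above, written as escapes so the
# code points are unambiguous.
ODIA = "\u0b13\u0b21\u0b3f\u0b06"           # the language name
ODISHA = "\u0b13\u0b21\u0b3f\u0b36\u0b3e"   # the state name

def spell_out(word):
    """Spell a word using the last word of each character's Unicode name."""
    parts = []
    for ch in word:
        # Names look like "ORIYA LETTER DDA" or "ORIYA VOWEL SIGN I".
        parts.append(unicodedata.name(ch).split()[-1])
    return " ".join(parts)

print(spell_out(ODIA))    # O DDA I AA
print(spell_out(ODISHA))  # O DDA I SHA AA
```

    Either way you spell the language in English, the letter in the second position is DDA, which is where the "r" vs. "d" question comes from.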

    This particular English transliteration, like the Bangalore one, is widely described as a Britishism, as a result of the British occupation.

    This does tell part of, well okay most of, the story, though I have had some additional information suggested to me about it.

    I can't speak to the veracity of the information, so hopefully readers who know will either confirm the logic or refute the claim.

    Many of the British transliterations were based on the Sanskritized transliteration of names, so apparently while it is true that ଡ (U+0b21, aka ORIYA LETTER DDA) is generally pronounced more like the "d" in "dark" or "darn", the Devanagari counterpart ड (U+0921, aka DEVANAGARI LETTER DDA) is almost like the "d" in "drum", with perhaps a less "guttural" bent where one has the tongue further back in the mouth and one is exhaling less. So one has not a true "r" but an "r" style mouth shaping.

    If true, then a transliteration scheme trying to figure out what to do to differentiate

    ଡ from ଢ from ଦ from ଧ

    by which I mean

    U+0b21 from U+0b22 from U+0b26 from U+0b27

    by which I mean

    ORIYA LETTER DDA from ORIYA LETTER DDHA from ORIYA LETTER DA from ORIYA LETTER DHA

    one might really cling to this difference as the way to differentiate ORIYA LETTER DDA from the others.

    Now as I said I do not know enough about Oriya to say for sure and although some casual web search led to some mild confirmation of some of the pronunciation "logic" suggested here, I haven't found anything overwhelming to confirm or refute this "logic".

    Though in the end, even if it is true it still represents something a native speaker may not be such a huge fan of. Who wants to rehabilitate the incorrect opinions of someone who is wrong about your language? :-)

    Now as articles like Orissa is now Odisha, Oriya is Odia point out, the government moved on this change with all due haste once it came up on the ballot.

    As with Uyghur (from Uighur) and other such changes, companies like Microsoft will probably jump on it where they can (i.e. in the next version), though there are limits there. I mean, since Unicode can't change character names, all of those names will need to stay

    Oriya Letter *

    and such, which means that Microsoft couldn't change Character Map or the Word Insert Symbol... dialog's notion of these character names either since it depends on the names as they are....

    I imagine descriptions of things like the HTML meta tags like "ISCII Oriya ##charset=x-iscii-or" might also be slow to update given the places it is buried. And there may be some cases like that which stay that way for a long time. People will need to know about both names for a while.

    But I support Odia here, even though I lose an "Oreo" pun out of the deal. Transliteration accuracy trumps my silly pun requirements any day (and the Oriya/Oreo one is on shaky ground anyway!).

  • Sorting it all Out

    Every character has a story #32: There are CJK Compatibility Ideographs, and then there are CJK Compatibility Ideographs

    • 2 Comments

    Now the line from Finding Forrester is what ran through my head. The one near the end. Earlier, Jamal had missed two free throws despite having a nearly perfect record for free throws, mainly to prove that he was not going to "dance" for the bigwigs on the board. After the climactic scene, Forrester asks him "So, those free throws....did you miss them, or did you miss them?"

    When I read the line that became the title of this blog, I knew I had to write this blog....

    You see, Unicode has a history.

    The whole Every Character Has a Story category is all about the colorful histories of some of the characters in Unicode. And what real characters they can be.

    The experts in Unicode are also "characters" in another sense, and perhaps deserve a new series of biographies called Every Unicode Expert is a Character. But maybe that can be a subject for another day.

    Anyway, the other day, a complaint made while looking at the current state of a particular bit of the standard without the benefit of history, helped us get from Ken Whistler the real story of compatibility characters in Unicode. It was some great info, and I asked for (and received) permission to extract this information and write about it here.

    So without further ado....


    > >> FA47 is a "compatibility character", and would have a compatibility
    > >> mapping.
    > > Faulty syllogism.
    > Formally correct answer but only because of something of a design
    > flaw in Unicode. When the type of mapping was decided on, people
    > didn't fully expect that NFC might become widely used/enforced,
    > making these distinctions appear wherever text is normalized in a
    > distributed architecture.

    O.k., I'm gonna have to intervene again. *hehe* Yes, there is a design flaw here, but Asmus' explanation is also somewhat faulty, because it flattens out the history in a way that is liable to be misunderstood.

    There is a *reason* why "when the type of mapping was decided on" that "people didn't fully expect that NFC might become widely used/enforced" -- but it wasn't that they were goofing up in understanding the implications of normalization. Rather, at that point in Unicode history NFC didn't *exist* yet, nor had the normalization algorithm been designed.

    Here, for the benefit of the standards geeks out there, are the relevant highlights of the historical timeline involved.

    June, 1992

      The canonical mappings for the CJK Compatibility characters were *printed* (with off-by-one errors for some of them!) in Unicode 1.0, volume 2 (= Unicode 1.0.1).
     
      Actually, at the time, we didn't know they were "canonical" mappings, because that concept hadn't formally been invented yet, but the intention was clear. They were the mappings from the "CJK compatibility ideographs" to the "real" unified Han ideographs in the standard. The CJK compatibility characters were all considered to be duplicates in the source standards that didn't follow the unification rules.
     
    July, 1996

      The formal definitions of "canonical decomposition" and "compatibility decomposition" were first published in Unicode 2.0. There wasn't a data file for the CJK Compatibility Ideographs block, but the canonical mappings were *printed* (correctly, this time) on pp. 7-470 to 7-472 of the standard.
     
    August 4, 1998

      The first published version of UnicodeData.txt that contained the canonical mappings for the CJK Compatibility Ideographs was UnicodeData-2.1.5.txt for Unicode 2.1.5. (Actually, they got into UnicodeData-2.1.4.txt on July 9, 1998, but that wasn't a published version of the data file.)
     
    July 23, 1999

      This was the publication date of the first approved version of UAX #15 (Revision 15), and so is the first published definition of NFC. (Of course UAX #15 had been in draft for some time earlier than that, so the term "NFC" can be tracked back in the drafts to mid-1998.)
     
    September, 1999

      Release of Unicode 3.0 -- the first release of Unicode formally tied to the Unicode Normalization Algorithm. (The revision of UAX #15 for the release was actually Revision 18, dated November 11, 1999.)
     
    March 23, 2001

      UAX #15, Version 3.1.0. This was the version of the Unicode Normalization Algorithm that specified the composition version to be Version 3.1.0 and locked down normalization forever more.
     
      So essentially, there was a 9 year period between when the first mappings were defined for the CJK Compatibility Ideographs and the date beyond which it became impossible to reinterpret or change a canonical mapping because of the lockdown of normalization.

      The problems resulting from the normalization for CJK Compatibility Ideographs only started to become visible to people *after* the lockdown, and when Unicode normalization started to become a regular feature of actual processing.

      And it wasn't because "people didn't fully expect that NFC might become widely used/enforced" -- or at least not the people in the UTC. The UAX #15 text published with Unicode 3.0 already stated: "The W3C Character Model for the World Wide Web requires the use of Normalization Form C for XML and related standards..."

      And it wasn't because of some oversight about the canonical mappings involving the CJK Compatibility Ideographs per se. That same UAX #15 for Unicode 3.0 also stated: "With *all* normalization forms singleton characters (those with singleton canonical mappings) are replaced." So the ground facts for the FA10 --> (NFC/NFD/NFKC/NFKD) 585C normalization pattern were well-established and explicitly stated in 1999.

    > > FA47 is a CJK Compatibility character, which means it was encoded
    > > for compatibility purposes -- in this case to cover the round-trip
    > > mapping needed for JIS X 0213.
    > > However, it has a *canonical* decomposition mapping to U+6F22.
    > And that, of course, destroys the desired "round-trip" behavior if it is
    > inadvertently applied while the data are encoded in Unicode. Hence the
    > need to recreate a solution to the issue of variant forms with a different
    > mechanism, the ideographic variation sequence (and corresponding
    > database).

      Yes, that is basically correct. But, this architectural "design flaw" actually results from two additional requirements that accrued to the Unicode Standard well after its initial design:

    1. The requirement to be able to carry "round-trip" behavior through distributed environments.

      In the original design, the notion of how one would deal with legacy data was conceived of primarily as a controlled and contained conversion issue. An application/system would convert legacy data to Unicode, and if it needed to convert back, it could use compatibility characters for round-trip conversion. The system would know how and when it could normalize, because it controlled the data and the conversion.

    2. The requirement to be able to maintain CJK variant glyph distinctions in plain text data.

      Again, that was not at all a part of the original Unicode Standard design.

      So the essential nature of the problem is that these new requirements have mostly accrued to Unicode implementations *after* 2001, more or less at the point when the lockdown of Unicode normalization made it impossible for normalization to be adjusted in any way to account for them.

      Hence the need to construct an *alternative* approach involving variation selectors, which would be robust and invariant under normalization transformations.

    > > The behavior in BabelPad is correct: U+6F22 is the NFC form of U+FA47.
    > > Easily verified, for example, by checking the FA47 entry in
    > > NormalizationTest.txt in the UCD.
    > While correct, it's something that remains a bit of a gotcha.

      Yeah, well, the basic gotcha is that no matter how many times I say it or what the Unicode Standard says, people will continue to just assume "compatibility character" implies "compatibility decomposition". For everybody on the list, I recommend frequent re-reading of Section 2.3, Compatibility Characters, of the standard:

    http://www.unicode.org/versions/Unicode5.2.0/ch02.pdf

    whenever somebody mentions "compatibility" in discussion of Unicode. Yes, I suspect that people will find their heads hurting -- but this subject *is* complex, and generalizations that people make about "compatibility characters" are often wrong when they don't pay attention to the details.

    > Especially now that Unicode has charts that go to great
    > length showing the different glyphs for these characters,

      Well, even there the issue is complicated, because there are CJK Compatibility Ideographs, and then there are CJK Compatibility Ideographs. They fall into at least 3 important classes:

    1. Ones which really are *unified* ideographs, despite their names.
    2. Ones which are *pronunciation* variants from KS X 1001, and which are *not* intended to show different glyphs.
    3. Ones which are *graphical* variants from other legacy standards, and which *are* intended to show different glyphs.

      And even class 3 has subtypes, because some show variants that are distinguished only in one legacy standard, whereas some are themselves cross-mapped between more than one legacy standard -- putatively because each legacy standard shows the same variant glyph.

      It is class 3 that may be adversely affected *visually* by the application of normalization in a distributed environment.

    > I would suggest adding a note to the charts that make clear that these
    > distinctions are *removed* anytime the text is normalized, which, in a
    > distributed architecture may happen anytime.

      The CJK Compatibility Ideographs already have warnings attached to them in the standard. They are repeatedly documented as "only for round-trip compatibility with XYZ" and "They should not be used for any other purpose."

      However, I think your point is a valid one. Now that the clear answer for maintaining legacy CJK glyph variant distinctions in a distributed environment is via ideographic variation sequences as registered in the IVD, it would make sense to beef up the CJK Compatibility Ideograph documentation with better pointers (and with accompanying rationale text) to UTS #37 and the IVD, and to post stronger warning labels in the code charts for CJK Compatibility Ideographs.

      Perhaps someone would like to make a detailed proposal to the UTC for how to fix the text and charts? ;-)


    Well, I won't go that far.

    But I will capture the conversation so people can learn something about the meaning of compatibility characters in Unicode.
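    For anyone who wants to see the gotcha for themselves, here is a small sketch using Python's standard unicodedata module (my choice of tool, not something from the thread). It checks the FA47 case discussed above. The helper name mapping_kind is made up, U+FB01 is only there as a contrast case I picked, and the variation selector at the end is just an illustration of a sequence surviving normalization, not a claim about what is registered in the IVD:

```python
import unicodedata

COMPAT = "\ufa47"    # CJK COMPATIBILITY IDEOGRAPH-FA47
UNIFIED = "\u6f22"   # the unified ideograph it maps to

def mapping_kind(ch):
    """Classify a character's decomposition: 'none', 'canonical' or 'compatibility'."""
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        return "none"
    # Compatibility mappings carry a tag such as "<compat>"; canonical ones do not.
    return "compatibility" if decomp.startswith("<") else "canonical"

# A "compatibility character" with a *canonical* mapping.
print(mapping_kind(COMPAT))     # canonical
# Contrast: LATIN SMALL LIGATURE FI really does have a compatibility mapping.
print(mapping_kind("\ufb01"))   # compatibility

# Singleton canonical mappings are replaced in all four normalization forms.
forms = ("NFC", "NFD", "NFKC", "NFKD")
print(all(unicodedata.normalize(f, COMPAT) == UNIFIED for f in forms))   # True

# A base character plus a variation selector (U+E0100 here, for illustration)
# is left alone by normalization, which is why that mechanism is robust.
SEQUENCE = UNIFIED + "\U000e0100"
print(all(unicodedata.normalize(f, SEQUENCE) == SEQUENCE for f in forms))  # True
```

    So the distinction a compatibility ideograph was meant to carry is gone the moment any layer normalizes the text, while the variation sequence comes out the other side untouched.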

  • Sorting it all Out

    "Crap, I didn't start Schedule+", [some of] why I still don't love Outlook, and other nostalgias

    • 1 Comments

    I can't claim I have ever been the biggest fan of Microsoft Outlook.

    I didn't hate it or anything. It just had some qualities that I didn't care for.

    Like when Outlook 97 first came out and I had a very old Dell Laptop with an 800x600 screen. And I needed as much vertical screen real estate as I could get.

    I begged several of the Outlook folks I knew to add a way to turn off the annoying gray bar that had the folder name on it.

    "But Michael," they implored, "then you won't know what folder you are in!"

    "Of course I will!" I replied, testily. "I have Folder View on, I like Folder View."

    "But Michael," they insisted, "no one uses folder view. This is part of our distinctive look!"

    Screw that. I liked the Exchange client's unique look. I stuck with that.

    As a consequence, I got to miss HTML mail in its prime. And I missed the early batch of email viruses that hit so many people, as a bonus.

    People throughout Office would be running daily builds of the latest pre-release build of Office, and they would send emails that would say stuff like

    Sent using Outlook {insert version and daily build number}

    To inject the appropriate amount of snark, I would add the following in a 1pt font so no one could read it without expanding it manually:

    Sent using Exchange 5.0.1457.3, because HTML mail makes my Sent Items folder feel bloated!

    and a few dozen other random punchlines. I actually automated the sig generator to randomly put them in occasionally and vary them, and every once in a while someone would blow it up, see the message, and smile.

    In fact, I only gave up on doing that after too many occasions of people pasting mail into Raid (the bug database) which had no rich text and thereby revealed my hidden messages to all.

    I stuck with the Exchange client until about a year and a half after Outlook started supporting Unicode PST files.

    But it was a painful move.

    I miss having the mail client (Exchange) and the calendaring app (Schedule+) being separate -- because by accepting every meeting invite I received in email but never launching Schedule+, I:

    • made others happy by saying I'd be at their important meetings;
    • made myself happy by being able to miss their silly meetings;
    • had a built-in affirmative excuse when anyone complained I missed the meeting that any developer can appreciate:

      Crap, I didn't start Schedule+

    :-)

    And over the years more and more features were there that I didn't like.

    But in all that time, the one feature I really missed most of all....

    You see, I missed the Schedule+ Meeting Wizard.

    What I missed most about the Meeting Wizard was when you could set the Meeting Length, defined as follows:

    Meeting Length

    Specify the duration and travel times, if appropriate, for the meeting. The specified travel time is added automatically to the meeting's time slot in the attendees' schedule.

    Duration: Type or select the number and units, such as hours, for the meeting's length.

    To meeting: Type or select the number of minutes attendees may need to travel to the meeting location.

    From meeting: Type or select the number of minutes attendees may need to travel from the meeting location

    I missed that feature even as it was.
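    The behavior that help text describes is simple enough to sketch. What follows is only an illustration in Python, not how Schedule+ actually did it; the function name blocked_slot and its parameters are made up to mirror the Duration / To meeting / From meeting fields:

```python
from datetime import datetime, timedelta

def blocked_slot(start, duration_min, to_min=0, from_min=0):
    """Return (block_start, block_end) for a meeting, padded with travel time.

    start        -- when the meeting itself begins
    duration_min -- length of the meeting in minutes
    to_min       -- minutes needed to travel to the meeting
    from_min     -- minutes needed to travel back afterwards
    """
    meeting_end = start + timedelta(minutes=duration_min)
    return (start - timedelta(minutes=to_min),
            meeting_end + timedelta(minutes=from_min))

# A one hour meeting at 2:00 PM that is 40 minutes away in each direction.
begin, end = blocked_slot(datetime(2010, 11, 15, 14, 0), 60, to_min=40, from_min=40)
print(begin.strftime("%H:%M"), "-", end.strftime("%H:%M"))   # 13:20 - 15:40
```

    The point is that the calendar blocks out the padded slot rather than the bare meeting, so nobody can book you into the time you need to get there and back.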

    I am tired of being fully booked for meetings that are 40 minutes away rolling (or over half an hour sometimes to get a shuttle, given the terrible accessible shuttle situation!). With 25% of my job in Building 27 on the main campus and 25% of it in Buildings 84/85 in the Safeco buildings, having to miss up to half a meeting due to the inability to have Scotty beam me directly over makes me wish Outlook cared about such real productivity problems, which the product they killed was solving before they were even an idea, let alone an outlook!

    Forget about then though. What about now? Today?

    When I think about the long distances between various buildings on the main campus and the north campus and redwest and so on....

    And when I think about the abilities Active Directory gave us with printers and the ability to find printers near me....

    And when I think about Bing maps and travel times and such....

    This would be a kick-ass and cool feature for Outlook to have!!!

    A natural extension of a feature that Exchange's Schedule+ had for years and that Outlook didn't deign to add for the last seven versions (since they were too busy screwing up unfinished holiday support and Hijri calendars and such, I guess -- it takes a long time to under-architect).

    If only there were an awesome mail/scheduling client around, this would be a really cool Exchange/AD integration feature.

    I can't help wishing that the Exchange client and Schedule+ were still around.

    Maybe one day if Outlook could be as cool as its predecessor in some of these really practical ways, I could learn to love it....

  • Sorting it all Out

    The bizarre variation of a skeleton that is iBot + me, to a Kinect

    • 6 Comments

    I'll start by saying I think Kinect is, generally speaking, very cool.

    Just so you know where I stand, generally. :-)

    Others have written about the Kinect and wheelchairs before, like in I Connect with Kinect from August 18th.

    I have only interacted with the Kinect twice:

    • Once at the end of August (I think the 26th), at an all day meeting talking about the Kinect and accessibility (they took some recordings of the iBot there);
    • Once at the end of October (I think the 28th), where someone on the Kinect team got a bunch of recordings of the iBot interacting with the Kinect.

    Now I read reviews and see really cool displays like this one, and the possibilities here are very exciting.

    I was not given the opportunity to take pictures or video on either occasion, but maybe if I get a chance I'll get some another day!

    That meeting in August was a half day talking about the device, and a half day with us trying it out. Most of the time I was "trying it out" various people on the Kinect team were having me try things out on different devices (that were running at least three different builds of the Kinect firmware/software).

    I guess they had already tried out the Kinect with wheelchairs, but the iBot really confused the Kinect in every build I tried. The primary reason for this is that the detection has two core "tracks" or "modes" that it uses -- one standing, and one sitting.

    For the iBot, it couldn't figure out which one was right. Because I seemed to be sitting, yet I also seemed to have legs far enough down to the ground that I appeared to be standing. Not to mention my iBot joystick, especially when it was pushed off to one side a bit, seemed a lot like my right arm.

    Skeletal tracking was obviously a little confused in this bizarre variation of a skeleton that is iBot + me.

    Now the October build I tried a few months later was pretty much the final or close to it. They had actually done some work to fix a lot of the problems with skeletal tracking as it applies to a guy in an iBot.

    I guess I should say they "fixed" it, actually, since the fix was to make sure I was always in the sitting mode (they call it the sitting "pipe") and never the standing mode.

    This "fix" actually makes a lot of games not work, since many of them require standing mode -- even if you are doing something that clearly implies sitting like driving a car.

    I've talked to enough people on the team at this point that I know this could be considered a limitation with two clear architecting causes:

    • The best practices in the Kinect SDK given to game developers, which do not push harder on the need to work with the seated pipe, and
    • The game developers themselves, who gave so few options here for the seated pipe.

    In my mind, these are the immediate problems whose solution would make the Kinect more generally useful for me (in its current state, and given the state of games, most of it is unavailable to me).

    So for now, the Kinect took a person who generally does not feel very disabled (me) due to my iBot and made me feel excluded. The same way that I might be if I were not allowed in certain buildings or to certain events.

    But in the long term there is more to it.

    If you look at the community of people who are in wheelchairs and those who have problems such as missing limbs as one big group for a moment, they fall into two distinct categories:

    1. Those who see themselves as whole people with four fully active limbs, and
    2. Those who see themselves with the particular limitation their disability gives them.

    This can be thought of the way Morpheus described it to Neo, the residual self image, also described here:

    Residual self image is the concept that individuals tend to think of themselves as projecting a certain physical appearance. The term was popularized in fiction by the Matrix series, where persons who existed in a digitally created world would subconsciously maintain the physical appearance that they had become accustomed to projecting.

    Now I think it is fair to say (once these issues related to best practices and game developers are more widely addressed) that every person who is either missing a limb or limbs or who is in a wheelchair sees themselves in one of the two ways I mention above, and they would ideally want their Kinect avatar to match that, in large part. Furthermore, I think that if they (by which I also mean me) are asked to have an avatar that denies them their residual self image then they will not enjoy the experience as much.

    Once this is addressed, there is remarkable potential to give people back the life they want if the ability to virtually extend themselves into a digital world lets them be who they feel they really are, with their own residual self image, even if circumstances have worked to interfere with that image a bit.

    I could spitball (and have spitballed) thoughts on how this might work, i.e. configuration options that would alter the way the Kinect projects the user to the games based on their images of themselves. But I have no idea how all of that will go, how much they will do.

    Some of these issues are issues I hinted at in a blog back in February (this blog), though at the time I did not give the option of a digital way out of things -- or a way to move beyond things.

    But I think the Kinect has some very real possibilities here, and I look forward to seeing what they come up with....

    If I find myself in front of a Kinect again, I'll try to get some pictures and/or video. It really is pretty cool to see it work with something it is clearly not entirely comfortable with!

  • Sorting it all Out

    A lot of LIPs for South Africa!

    • 5 Comments

    THE WINDOWS 7 LANGUAGE INTERFACE PACKS FOR SOUTH AFRICA ARE LIVE!

    You can click on the links below to download them via the Microsoft.com Download Center:

    Please note that the South African Windows 7 LIPs can only be installed on a system that runs an English client version of Windows 7. They are all available to download for both 32-bit and 64-bit systems.

    The South African Windows 7 LIPs are produced as part of the Local Language Program sponsored by Public Sector.

    A LITTLE BACKGROUND INFORMATION ON THE SOUTH AFRICAN LANGUAGES

    NUMBER OF FIRST LANGUAGE SPEAKERS:

    Language            Speaker numbers
    isiZulu             ~10 million
    isiXhosa            ~8 million
    Sesotho sa Leboa    ~4 million
    Afrikaans           ~6.5 million

    PREDOMINANT DISTRIBUTION IN SOUTH AFRICA:

    SOME FUN FACTS:

    • IsiZulu and isiXhosa stand apart from the majority of Bantu languages by including a series of click sounds which are borrowed from the indigenous inhabitants of the region, the Khoi and San people (also known as the Bushmen).  The first consonant in the stem Xhosa (isi- being a prefix) is a click sound, one of a total of 15 such sounds in that language. The most famous speaker of isiXhosa is Nelson Mandela. 
    • isiZulu has a rich literary tradition that dates back to 1933, when the first novel in isiZulu appeared.  Banned from use in the education system and suppressed during the years of Apartheid everywhere in South Africa except the KwaZulu Bantustan, isiZulu is enjoying a linguistic and cultural revival. Radio and TV shows are broadcast in the language, and the isiZulu-language film “Yesterday” was nominated for Best Foreign Language Film at the 77th Academy Awards.
    • Sesotho sa Leboa is also often referred to as Sepedi or Northern Sotho to distinguish it from its close relative, Sesotho, which is also an official language of South Africa.  Sesotho sa Leboa is a cluster of some 30 closely related dialects.
    • There is a marked degree of mutual intelligibility between Afrikaans and its European ancestor, Dutch.
    • Afrikaans, finally, is famous for its double negative nie ...nie, which functions very much like the French negation ne ... pas.  Two Afrikaans words have made their way into the English lexicon, namely Apartheid and Trek (as in “Star Trek”).

     

     For more information on these languages, see:

    Enjoy!
