May, 2005

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Typing in random Unicode code points

    People ask all the time how they can type in random Unicode data.

    Some people point out the vast array of supported Keyboard Layouts on Windows.

    Others point out how you can create your own keyboards with MSKLC.

    Still others talk about fancy things you can do with the numeric keypad.

    And then still others like to go on about typing a code point value in Word, highlighting it, and then hitting <Alt+X>.

    Personally, I like to just install the Unicode IME, first added for Traditional Chinese in Windows 2000 and available in every version of Windows since then. Once you install it, it will be on your list of available input languages....

    Simple to use -- just switch to it with <Left Alt+Shift> and start typing hex numbers in any application....

    and then when you type a full Unicode code point, it will commit the character automatically!

    A very cool stealth feature available in all even moderately recent versions of Windows! :-)
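    Committing a typed hex value really means turning a code point number into UTF-16 code units. The C sketch below illustrates that step only; it is not how the Unicode IME is implemented, and hex_to_utf16 is a hypothetical name. Code points above U+ffff come out as a surrogate pair.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a hex string such as "1b7" and store the
   UTF-16 code units for that code point in out[0] (and out[1]).
   Returns the number of code units written (1 or 2), or 0 if the text
   is not hex or is not a valid Unicode scalar value. */
int hex_to_utf16(const char *hex, uint16_t out[2])
{
    char *end;
    unsigned long cp = strtoul(hex, &end, 16);

    if (end == hex || *end != '\0')
        return 0;                                  /* empty, or trailing non-hex text */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                  /* out of range, or a lone surrogate */

    if (cp < 0x10000) {                            /* BMP: a single code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                 /* supplementary: surrogate pair */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```

    Typing 1b7 gives the single code unit U+01b7 (the EZH sponsoring this post), while 10480 gives the pair U+d801 U+dc80.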

     

    This post brought to you by "Ʒ" (U+01b7, a.k.a. LATIN CAPITAL LETTER EZH)
    A character that was feeling a little cheated by the small post it ended up sponsoring earlier -- thus the second sponsorship!

    Zipping up Unicode file names

    Let's create the following filenames:

    • αβγδεζηθ.txt
    • АБВГДЕЖЗ.txt
    • אבגדהוזח.txt
    • กขฃคฅฆจ.txt

    (they can be empty or have data in them)

    And then try to zip them up with your favorite program (I'll use WinZip, you can use anything you like here).

    The zip will fail. In the case of WinZip, it fails with the following error:

    ---------------------------
    WinZip
    ---------------------------
    Error: No files were found for this action that match your criteria - nothing to do. (C:\TEMP\TEMP.zip).
    ---------------------------
    OK   Help  
    ---------------------------

    And then if you choose to look at the error log, you will see why you had zero files instead of the four you asked it to zip up:

    Action: Add (and replace) files Include subfolders: yes Save full path: no
    Include system and hidden files: yes
    "C:\TEMP\aß?de???.txt" is not a valid file name and was skipped
    "C:\TEMP\????????.txt" is not a valid file name and was skipped
    "C:\TEMP\????????.txt" is not a valid file name and was skipped
    Warning: name not matched: C:\TEMP\????????.txt
    "C:\TEMP\aß?de???.txt" is not a valid file name and was skipped
    "C:\TEMP\????????.txt" is not a valid file name and was skipped
    "C:\TEMP\????????.txt" is not a valid file name and was skipped
    Warning: name not matched: C:\TEMP\????????.txt
    "C:\TEMP\???????.txt" is not a valid file name and was skipped
    Warning: name not matched: C:\TEMP\???????.txt
    "C:\TEMP\aß?de???.txt" is not a valid file name and was skipped
    Warning: name not matched: C:\TEMP\aß?de???.txt
    Error: No files were found for this action that match your criteria - nothing to do. (C:\TEMP\TEMP.zip)

    Your mileage may vary if your default system code page supports one of these filenames. Those question marks (along with the best fit mappings for the Greek name) probably give the clue as to what is going on here: the names are being converted to the default system code page, and every character that code page cannot represent is lost.

    The ZIP format is fine with Unicode data in filenames, but it is not so fine with the filenames themselves being outside of the default system code page.

    Curses, foiled again!
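    To see where those question marks come from, here is a minimal C sketch of a lossy conversion from UTF-16 to a single-byte code page. It assumes ISO-8859-1 (Latin-1) as a stand-in for the default system code page, and to_latin1_lossy is a hypothetical helper. Real Windows code pages also apply "best fit" mappings (which is how the Greek name above partly survives as aß?de???), and the sketch does not model those.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts len UTF-16 code units to Latin-1, which
   stands in here for the default system code page. Any code unit above
   0xFF cannot be represented and becomes '?'. dst must have room for
   len + 1 bytes. Returns the number of bytes written (not counting NUL). */
size_t to_latin1_lossy(const uint16_t *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (src[i] <= 0xFF) ? (char)src[i] : '?';
    dst[len] = '\0';
    return len;
}
```

    Fed the code points of АБВГДЕЖЗ.txt, it produces ????????.txt, the same name WinZip reports as invalid.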

    Now one could work around this by using the short file names, but this would have a negative impact on being able to use them in the ZIP file:

    • αβγδεζηθ.txt --> 3864~1.TXT
    • АБВГДЕЖЗ.txt --> 833B~1.TXT
    • אבגדהוזח.txt --> A0E9~1.TXT
    • กขฃคฅฆจ.txt --> 0344~1.TXT

    I think we need to have someone look into an extension to the ZIP format....

     

    This post brought to you by "Ž" (U+017d, a.k.a. LATIN CAPITAL LETTER Z WITH CARON)
    (which is unfortunately not a zippable file name character on most code pages)

    Virtual PC needs international thought about its keyboard support

    I will start by saying that Virtual PC 2004 rocks. It truly does.

    I spent a bunch of time yesterday building interesting Virtual Machines based on my simple Windows 2000, XP, and Server 2003 VMs that I keep around, and the fact that it really rocks as a product was never too far from my mind.

    Well, until I was changing the default keyboard layouts of the guest machines in the VMs and my host machine for some demos I will be using them for. As a product, Virtual PC did fall a tiny little bit in my otherwise high opinion of it....

    To start with, it is not so great in its default choice of the key used to switch out of the guest machine and move to the host machine -- it uses the Right Alt key. This has two problems, one not really international and one very much so:

    1. Not all keyboards have a Right Alt key; there are many laptops, for example, that do not (I admit that they themselves kind of stink internationally for reason #2!).
    2. Almost every language keyboard used in Europe and India, and several keyboards used in other places, need that key to sensibly reach the AltGR shift state.

    Take for example the German keyboard layout's base and AltGR shift states (cribbed from screen shots on the GlobalDev Windows keyboard layouts page).

             

    By the way, who thinks GlobalDev ought to have an RSS feed? I do! Russ, are you reading this post? :-)

    Now generally, a keyboard does not have a good way to get to the AltGR functionality without that key (Ctrl+Alt is unwieldy for ordinary typing). German is a minor example, with a mere twelve assigned keystrokes (though someone who wanted to type a Euro sign might take exception at calling it a "mere" anything!); some of those other keyboards have many more assigned, and a few even have Shift+AltGR assignments as well.

    Now of course there is a way to change the keystroke assignment here (item #9 in the Keyboard functionality, troubleshooting topic):

    When the right ALT key is configured to be the host key, navigating through menus using only the keyboard does not work.

    Cause:  On some keyboards, when the host key is the right ALT key, the Host key + ALT + menu accelerator does not work to access menus.

    Solution:  You must change the host key to be something other than ALT or CTRL. For more information see, To change the host key used for Virtual PC.

    And then that topic has the actual instructions:

    To change the host key used for Virtual PC

    1. Open Virtual PC Console.
    2. On the File menu, click Options.
    3. Click Keyboard.
    4. In the Current host key box, click the key name to select it, and then, on the keyboard, press the key that you want to use as the host key.

      The name of the new host key is displayed in the Current host key box.

    Notes

    • To open Virtual PC Console, on the host operating system, click Start, point to All Programs or Programs, and then click Microsoft Virtual PC.
    • The host key is used for several functions, including mouse control and providing CTRL key functionality in menu shortcuts.

    Now we will ignore the tortuous nature of the instructions for a moment. Let us remember that up to 60% of Microsoft's market is outside of the US and that 70-100% of that majority do not want to use English a significant percentage of the time (numbers that will only get worse for English as products like MSKLC, ELKs, LIPs, and XP Starter Edition help to further expand the ability of people to read and write in their native languages). In such an environment, is using the Right Alt key as the default host key a smart choice, when large parts of the market for Microsoft products (most or all of which do not have a localized version of Virtual PC 2004!) need that key for AltGR?

    I just checked; there is not an International English version of it at present. I do see a note in a Virtual Server 2005 Evaluation FAQ about this:

    Q. Is the Virtual Server 2005 Evaluation Kit available in languages other than English?

    A. Localized versions of the Evaluation Kit may be available in some locales. We will release information about non-English versions of the Kit as it becomes available. However, until then, you are welcome to order the English-language version through this site.

    There is also a note in the Virtual Server 2005 FAQ about this:

    Q. Will localized versions of Virtual Server 2005 be made available?

    A. Virtual Server 2005 will be available in English and Japanese. 

    And it is true -- there is a localized Japanese version of Virtual Server 2005 (though not of Virtual PC 2004). Japanese thankfully does not have most of these problems, though I believe I have seen a few Japanese laptops over the last few years that (hard pressed to fit the Hiragana/Katakana, English/Japanese, and Fullwidth/Halfwidth buttons) did not have a Right Alt key. :-)

    Beyond choosing a better default, some work could be done to detect the situation (on either end) and offer other options, such as changing the setting. Maybe allowing a shifted keystroke would also be a helpful option (currently you can only pick a single key). There are a lot of things that could be done here to make the situation better for the English version if additional localized versions are not planned, or in all versions if they are.

    ....

    The other problem that comes up is the potential difference in layouts between the host and guest machine, and dealing with that difference in a way that is as intuitive as possible for users. This is of course a problem that Terminal Services has long had to deal with, and is not an easy one, either in the logon screen where options are admittedly more limited or once one is logged on, where they could potentially have interesting solutions they could architect.

    But the Virtual PC solution is not the most original or intuitive of possible options. It is in item #7 in the Keyboard functionality, troubleshooting topic:

    My keyboard supports multiple languages but certain characters are not working in a virtual machine.

    Cause:  The guest operating system must be configured to support the language that contains the characters you want to use.

    Solution:  Refer to the documentation for the guest operating system to configure support for the language you want to use.

    Yikes! I guess that is a solution. But I can't help wondering if more could be done here. Now obviously Virtual PC is casting a much wider net than Terminal Services (I am sure we will see clients for Linux long before we would see servers!), but if the guest is running a Windows OS, couldn't something be done? Or at least there could be an explanation of why nothing can be done, and more help on how to deal with the problem than one semi-hidden help question in an FAQ. :-(

    Now I will keep on using Virtual PC, because frankly it still does rock, and a lot of the above can be explained by the fact that it is a product that Microsoft first picked up through acquiring another company, and perhaps future versions will work harder to help solve the issues now that the company that owns the product does have that larger focus.

    But it definitely has some work to do in its next major version on the international front. :-)

    And in the meantime it can serve to help application developers out there think hard about what they do with keyboard issues in their own applications, to avoid similar issues of greater or lesser scale....

     

    This post is brought to you by "V" (U+0056, a.k.a. LATIN CAPITAL LETTER V)

    Similar descriptions does not mean similar methodologies

    The other day, I had to take a look at the various unmanaged case insensitive string comparison functions. I thought I would post the comparison/contrast information.

    First the locale sensitive functions:

    • CompareStringW (kernel32.dll) -- the mother of all of the functions below, you can choose the locale, the flags, and whether the strings are counted or null-terminated. Embedded nulls are allowed.
    • lstrcmpiW (user32.dll) -- assumes null-terminated strings, then calls CompareStringW with the NORM_IGNORECASE flag and the thread locale (if that fails then it tries again with the system locale; in the unlikely event both fail, it uses a call to _wcsicmp).
    • _wcsicoll (CRT) -- assumes null-terminated strings. If using the "C" locale, does an ASCII (A to Z) ToLowercase followed by a binary compare; otherwise it calls CompareStringW with the LCID of the CRT locale and the SORT_STRINGSORT and NORM_IGNORECASE flags.
    • _wcsnicoll (CRT) -- takes one count parameter for both strings, but will also exit on an embedded null. If using the "C" locale, does an ASCII (A to Z) ToLowercase followed by a binary compare; otherwise it calls CompareStringW with the LCID of the CRT locale and the SORT_STRINGSORT and NORM_IGNORECASE flags (note that using just one count parameter will break compressions on locales that use them and expansions on all locales).
    • StrCmpIW (shlwapi.dll) -- assumes null-terminated strings, then calls CompareStringW with the NORM_IGNORECASE flag and the thread locale (if that fails then it tries again with the system locale). Manages to look a lot like lstrcmpiW, though not completely so in rare scenarios.
    • StrCmpNIW (shlwapi.dll) -- takes one count parameter for both strings, but will also exit on an embedded null. It calls CompareStringW with the thread locale and the NORM_IGNORECASE flag (note that using just one count parameter will break compressions on locales that use them and expansions on all locales). Manages to look a lot like a hybrid of lstrcmpiW and _wcsnicoll.
    • StrCmpLogicalW (shlwapi.dll) -- does linguistic comparisons using the thread locale (falling back to the system locale on failure), cleverly wrapping multiple calls to CompareStringW to support treating the 0123456789 digits as numbers.
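    To make the StrCmpLogicalW idea concrete, here is a rough C sketch of comparing runs of the digits 0123456789 as numbers. It is a simplification under stated assumptions, not the shlwapi implementation: logical_compare is a hypothetical name, it works on narrow ASCII strings, and it does a binary compare for everything that is not a digit instead of calling CompareStringW.

```c
#include <ctype.h>

/* Hypothetical sketch of "digits as numbers" comparison.
   Returns <0, 0, or >0, like strcmp. */
int logical_compare(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* skip leading zeros so "02" and "2" compare as the same number */
            while (*a == '0') a++;
            while (*b == '0') b++;
            const char *ea = a, *eb = b;
            while (isdigit((unsigned char)*ea)) ea++;
            while (isdigit((unsigned char)*eb)) eb++;
            if ((ea - a) != (eb - b))              /* more digits = bigger number */
                return (ea - a) < (eb - b) ? -1 : 1;
            for (; a < ea; a++, b++)               /* same length: compare digit by digit */
                if (*a != *b)
                    return *a < *b ? -1 : 1;
        } else {
            if (*a != *b)                          /* non-digits: binary compare */
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    return (unsigned char)*a - (unsigned char)*b;  /* shorter string sorts first */
}
```

    With it, file2 sorts before file10, whereas a plain binary compare puts file10 first because '1' < '2'.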

    And now the locale insensitive functions:

    • RtlCompareUnicodeString (ntdll.dll) -- taking lengths in its UNICODE_STRING parameters (and allowing embedded nulls), it converts characters to uppercase and then does a binary comparison on them. This comparison matches what a lot of the operating system does for many of its objects (most of which use this very function!).
    • _wcsicmp (CRT) -- assumes null-terminated strings. If using the "C" locale, on each character it does an ASCII (A to Z) ToLowercase followed by a binary compare; otherwise on each character it does a full ToLowercase followed by a binary compare.
    • _wcsnicmp (CRT) -- takes one count parameter for both strings, but will also exit on an embedded null. If using the "C" locale, on each character it does an ASCII (A to Z) ToLowercase followed by a binary compare; otherwise on each character it does a full ToLowercase followed by a binary compare.
    • StrCmpICW (shlwapi.dll) -- assumes null-terminated strings. On each character it does an ASCII (A to Z) ToLowercase followed by a binary compare. It matches the "C" locale behavior of _wcsicmp, which of course does not match the OS behavior at all.
    • StrCmpNICW (shlwapi.dll) -- takes one count parameter for both strings, but will also exit on an embedded null. On each character it does an ASCII (A to Z) ToLowercase followed by a binary compare. It matches the "C" locale behavior of _wcsnicmp, which of course does not match the OS behavior at all.

    A few interesting points about these functions:

    1) According to comments in the SHLWAPI source, many of them were initially added because the CRT and user32 counterparts were not supported on earlier versions of Win9x. Kind of ironic when you note the small behavior differences between them all, huh?

    2) Given the Georgian casing issue, it is a little sad that almost all of these functions that convert prior to comparison use a lowercasing operation when so much of the core OS uses uppercasing. Especially given how often people use the functions to emulate the OS behavior for tidier validation messages. Luckily, the amount of data in Khutsuri is small so the inconsistency is not often noticed.

    3) Am I the only person who thinks it is weird that _wcsicmp and _wcsnicmp have locale-specific behaviors, especially such really weird ones? They doc this a bit I guess, but until I looked at the code I would never have guessed.

    4) CompareStringW is definitely the king of the linguistic comparison -- everyone else is either (a) calling our function, (b) doing the job wrong, or (c) both!
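    Point 2 is easy to underestimate, so here is a small C sketch (ASCII only, with hypothetical helper names) showing that folding to lowercase versus uppercase before a binary compare can change the answer. The six ASCII characters between Z and a ([ \ ] ^ _ `) sort before the letters when you lowercase and after them when you uppercase; the Georgian issue is the same kind of problem in the wider Unicode range.

```c
/* ASCII-only case folding, like the "C" locale behavior described above. */
static int fold_lower(int c) { return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c; }
static int fold_upper(int c) { return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c; }

/* Folds each character with fold() and then does a binary compare, the
   general shape of the locale insensitive functions listed above.
   Returns <0, 0, or >0. */
int fold_compare(const char *a, const char *b, int (*fold)(int))
{
    for (;; a++, b++) {
        int ca = fold((unsigned char)*a);
        int cb = fold((unsigned char)*b);
        if (ca != cb || ca == 0)      /* a difference, or both strings ended */
            return ca - cb;
    }
}
```

    Both folds agree that "ABC" equals "abc", but "_" sorts before "a" with fold_lower and after it with fold_upper.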

    Now there is no king (nor good heir apparent) for the non-linguistic comparison right now in unmanaged code, like I talk about here.

    Yes, I am still thinking about it. :-) 

    The situation is kind of like when you have a vacancy in management and a lot of "wannabe" replacements (like these other functions), none of whom really fit the bill and none of whom can get the job done themselves. If you know what I mean....

     

    This post brought to you by "ς" (U+03c2, a.k.a. GREEK SMALL LETTER FINAL SIGMA)

    UCS-2 vs. UTF-16 (not quite Kramer vs. Kramer)

    Rasqual asked, in the suggestion box:

    Windows associates the idea of Unicode with 'Wide char', that is a 2-byte long character (currently).

    A comment on Raymond Chen's blog stated that Windows 2000 uses the UCS-2 representation of Unicode
    and Windows XP and higher use UTF-16 (both little-endian).

    Can you put up an explanation of whether a UCS-2 byte stream may be considered valid UTF-16?

    The point is: can I use UTF-16 generically to handle "wide char" text or are there some caveats? Would a
    call to, say, CreateFileW with a filename containing surrogates fail on Windows 2000?

    Is it unsafe to assume 1 WCHAR == 1 Unicode character?

    Note that this has no practical application, just things I'm wondering about.

    Well, in an absolutely technical sense, at the file system level (or even at the level where CreateFileW works), Windows is neither. The OS simply takes an array of WCHAR values with a maximum size and a null WCHAR at the end, with a small number of illegal WCHAR values representing doublequotes and such. There are all sorts of obnoxious things you can put in there -- illegal Thai sequences, unpaired surrogate code units, undefined code points -- and they will simply work. The only thing that is done is that an uppercase table is consulted to provide case insensitivity.

    However, when you move up to the level of displaying the list of files in a directory, which is the job of the Windows Shell (indeed where Raymond Chen works!), suddenly we start becoming conformant to all sorts of different standards and practices. And all of those misbehaving strings suddenly don't look very good (a fact that you do not notice when it's just a small array of WCHAR values). And here is where the issue of surrogate pairs gets interesting....

    Now when Windows 2000 first shipped, there were not any actual defined supplementary characters (other than the Plane 14 language tags that no one liked or the Plane 15 and 16 private use characters that no one used).

    Because of this more than anything else, Windows 2000 is not really "surrogate-enabled" by default. But there is nothing to stop surrogate code units from being used legally with valid pairings of high and low surrogates to represent supplementary characters. So people (if they said anything) would tend to just say it supports "UCS-2" as a shorthand for saying that it was "surrogate neutral." It had no knowledge or understanding of what these code points are, but is not actively destructive. But usually it would not come up....
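    For the "is 1 WCHAR == 1 Unicode character?" question, a short C sketch helps: counting code points means treating a correctly paired high and low surrogate as one, and an unpaired surrogate is exactly the kind of value the file system accepts but that is not well-formed UTF-16. count_code_points is a hypothetical helper using uint16_t as a stand-in for WCHAR, and it counts code points, which is still not the same as user-perceived characters (combining sequences, for example).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts code points in a UTF-16 buffer, where a high
   surrogate (U+D800-U+DBFF) followed by a low surrogate (U+DC00-U+DFFF)
   counts as one. Returns -1 on an unpaired surrogate, i.e. text that is a
   legal "array of WCHAR" but not well-formed UTF-16. */
long count_code_points(const uint16_t *s, size_t len)
{
    long count = 0;
    for (size_t i = 0; i < len; i++, count++) {
        uint16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {            /* high surrogate */
            if (i + 1 < len && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
                i++;                                 /* consume its low surrogate */
            else
                return -1;                           /* high surrogate with no partner */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return -1;                               /* low surrogate with no partner */
        }
    }
    return count;
}
```

    A buffer holding A followed by U+d801 U+dc80 is three WCHARs but two code points.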

    By the time of Windows XP, the landscape had changed a bit.

    The OpenType spec extensions to support supplementary characters were mature and people were making use of them. Although there were not yet fonts shipping in the operating system, there were fonts out there, some available in Microsoft products and others from third parties. And any time Uniscribe was turned on, the extra work to make sure that surrogate pairs got treated as one character (showing just one NULL GLYPH if no font was available), partially supported in Windows 2000, became more fully supported.

    At some point, a switch was flipped and everyone started talking about all of the work that had been done. But how do you describe infrastructure when you do not have fonts to actually display the characters? The only way it could be described was that we now support UTF-16, whereas before it was just UCS-2. And the whole distinction between the two platforms was made, kind of after the fact.

     

    This post brought to you by "𐒀" (U+10480, a.k.a. OSMANYA LETTER ALEF)
    (or U+d801 U+dc80 for people who prefer to work in surrogate pairs!)

    The last word on the FINAL SIGMA

    Back in the beginning of April, I explained about the one scenario where casing does not need to roundtrip in .NET -- the Greek final sigma.

    Anyway, the day before yesterday I got an email from someone who had been reading my blog and was looking at all of the one-way mappings that are in the linguistic tables (accessed with the LCMAP_LINGUISTIC_CASING flag, which I have discussed previously). He was wondering why that FINAL SIGMA could not be put into the linguistic tables since it is a one-way mapping.

    A fair question, one I thought worthy of a post. :-)

    If you are a native speaker of Greek, then you know that both ς (U+03c2, a.k.a. GREEK SMALL LETTER FINAL SIGMA) and σ (U+03c3, a.k.a. GREEK SMALL LETTER SIGMA) do indeed uppercase to Σ (U+03a3, a.k.a. GREEK CAPITAL LETTER SIGMA). But if we put this mapping only in the linguistic table, then suddenly ς would never work in the CharUpper/CharUpperBuff functions and would not work in the default call to the LCMapString function with the LCMAP_UPPERCASE flag.

    Obviously that would not be a good thing.

    Try to imagine how you would feel if attempting to uppercase the string hello came out as HELLo. Wouldn't you consider it a bug? Especially if it used to come out as the HELLO you were expecting? You might be thinking about telling the platform GooDBYE, if you know what I mean.

    Of course ideally the functions would notice whether the Σ was at the end of a word and then decide whether to use ς or σ, depending. But LCMapString does not really look beyond the character level here, so until it does that would not really be an option.

    Though of course a more sophisticated application might work to provide results beyond the character boundary. Though I do not envy such programs; the boundary for them becomes quite fuzzy if you have non-Greek characters after the ς. Does that count as a new word or doesn't it? That is the kind of question where an API can never win -- no matter which way it goes, there will be some people who do not like the answer.
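    As a sketch of what looking beyond the character level might involve, here is some C that lowercases only Σ, choosing ς at the end of a word and σ elsewhere, plus the one-way uppercase mapping. The word test (is_word_char) is a hypothetical simplification that treats only basic Greek and ASCII letters as word characters, loosely in the spirit of the Final_Sigma condition in Unicode's SpecialCasing data; it is not how LCMapString behaves.

```c
#include <stddef.h>
#include <stdint.h>

#define CAPITAL_SIGMA     0x03A3   /* GREEK CAPITAL LETTER SIGMA */
#define SMALL_SIGMA       0x03C3   /* GREEK SMALL LETTER SIGMA */
#define SMALL_FINAL_SIGMA 0x03C2   /* GREEK SMALL LETTER FINAL SIGMA */

/* Hypothetical word test: basic Greek letters and ASCII letters count as
   "inside a word". Deciding this is exactly the fuzzy part. */
static int is_word_char(uint16_t c)
{
    return (c >= 0x0391 && c <= 0x03A9) || (c >= 0x03AC && c <= 0x03CE) ||
           (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Lowercases only the capital sigmas in s, in place: final sigma if a word
   character precedes it and none follows, ordinary sigma otherwise. */
void lower_sigmas(uint16_t *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] != CAPITAL_SIGMA)
            continue;
        int after_word = i > 0 && is_word_char(s[i - 1]);
        int before_word = i + 1 < len && is_word_char(s[i + 1]);
        s[i] = (after_word && !before_word) ? SMALL_FINAL_SIGMA : SMALL_SIGMA;
    }
}

/* The one-way direction: both small sigmas uppercase to the capital. */
uint16_t upper_sigma(uint16_t c)
{
    return (c == SMALL_SIGMA || c == SMALL_FINAL_SIGMA) ? CAPITAL_SIGMA : c;
}
```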

    Anyway, that is why ς is not uppercased only in the linguistic table. Because there are too many cases where the results simply don't make sense, at least not as things are implemented currently....

     

    This post brought to you by "ς" (U+03c2, a.k.a. GREEK SMALL LETTER FINAL SIGMA)
    A character that wonders whether Unicode would have been simpler if it did not exist as an independent entity, and fonts could then decide whether to make it a "final" form or not....

    Getting exactly ONE Unicode code point out of UTF-8

    Now this is a question that I would make into an interview question, if only there were some way to do all the setup work in time.

    Unfortunately, unless the candidate is very knowledgeable about internationalization on the way in, there is no way to get them all of the information to solve the problem in such a short time.

    I figure that if it can't make a good interview question, it might at least make a good blog post! :-)

    Anyway, the question is simple -- how can you get a single Unicode code point out of a stream of UTF-8 data?

    There are many times you may want to do this -- like if you are looking for a UTF-8 BOM. But the real question is how can you do it....

    (If you are going to test yourself then read no further and try to work out a solution, then come back to look at the rest of the post!)

    Now legal (valid) UTF-8 will be between one and four bytes per code point. Thus if converting four bytes of UTF-8 to UTF-16, you could end up with between one and four code points (a code point from the four-byte range becomes a surrogate pair, so it takes two WCHARs).

    Using the distribution I pointed out yesterday (the range table repeated below), one method may come to mind right away: just take four bytes and convert them into a four-WCHAR scratchpad with the MultiByteToWideChar function. Then ignore all of it except for the first code point, like this:

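    #include <windows.h>   // MultiByteToWideChar, CP_UTF8, WCHAR

    // lpsz points to at least four bytes of UTF-8 data
    WCHAR Scratch[4] = {0};
    int cch = MultiByteToWideChar(CP_UTF8, 0, lpsz, 4, Scratch,
                                  sizeof(Scratch) / sizeof(WCHAR));

    if (cch > 0) {
        // Use Scratch[0] to do whatever you wanted to do here; if it is a
        // high surrogate (U+d800-U+dbff), Scratch[1] holds the low surrogate
        // of the same code point
    }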

    Now if this were an interview I'd expect that the candidate (who would have impressed me if she thought of this answer) would take care of the obvious issues like making sure that the UTF-8 buffer represented by lpsz was at least four bytes in size.

    And I would then point out that a call to lstrlenA is not the best answer since that would walk the whole string when you only needed to walk at most four bytes of it. The easiest solution then would be to just write a mini lstrlenN-esque function or just walk a few bytes right there.

    And then (if this were an interview) I would ask him to perhaps build this approach into a general function whose job is to "nibble" that first code point out.

    But first I would ask her about how she would conditionally increment the pointer past that one character.

    Hmmm.... almost a brain teaser!

    Counting the code points in that result rather than the WCHARs (a surrogate pair counts as one), you get somewhere from 1 to 4, which gives some hints:

    • If it is 4, you only have to increment by 1 byte;
    • If it is 1, you have to increment by 4 bytes;
    • If it is 2, you have to increment by 1, 2, or 3 bytes (and then the other code point would be 3, 2, or 1 bytes, respectively);
    • If it is 3, you have to increment by 1 or 2 bytes (and then the other two code points would also be 1 or 2 bytes each).

    There might be an attempt at a blind alley as he tries to figure out how to determine the answer with one or two more calls to MultiByteToWideChar. I'd quickly hint him away from that idea, and point him back at that range table:

         U+0000 -   U+007f        1 byte
         U+0080 -   U+07ff        2 bytes
         U+0800 -   U+ffff        3 bytes
        U+10000 - U+10ffff        4 bytes (U+d800 U+dc00 - U+dbff U+dfff)

    The candidate could then look at this table and work out the easiest way to code the answer based on it and that one code point.
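    Here is a minimal C sketch of that first approach's last step, assuming the Scratch buffer from earlier (with uint16_t standing in for WCHAR): rebuild the first code point, joining a surrogate pair if needed, then read the byte count straight off the range table. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the first code point in a UTF-16 buffer such as
   Scratch, joining a high/low surrogate pair into one value. The buffer
   must hold at least two code units if w[0] is a high surrogate. */
uint32_t first_code_point(const uint16_t *w)
{
    if (w[0] >= 0xD800 && w[0] <= 0xDBFF && w[1] >= 0xDC00 && w[1] <= 0xDFFF)
        return 0x10000 + (((uint32_t)w[0] - 0xD800) << 10) + ((uint32_t)w[1] - 0xDC00);
    return w[0];
}

/* Hypothetical helper: bytes that code point takes in UTF-8, read off the
   range table. Returns 0 for surrogates and values past U+10FFFF. */
int utf8_len_from_code_point(uint32_t cp)
{
    if (cp <= 0x7F)                    return 1;
    if (cp <= 0x7FF)                   return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF)  return 0;   /* surrogates are never encoded */
    if (cp <= 0xFFFF)                  return 3;
    if (cp <= 0x10FFFF)                return 4;
    return 0;
}
```

    The result of utf8_len_from_code_point(first_code_point(Scratch)) is how far to advance lpsz.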

    Or, if she was going to really impress me and she asked (or even better if she already knew!) the way that the UTF-8 bits are laid out:

    UTF-16                 1st Byte    2nd Byte    3rd Byte    4th Byte
    00000000 0xxxxxxx      0xxxxxxx
    00000yyy yyxxxxxx      110yyyyy    10xxxxxx
    zzzzyyyy yyxxxxxx      1110zzzz    10yyyyyy    10xxxxxx
    110110ww wwzzzzyy +
    110111yy yyxxxxxx      11110uuu    10uuzzzz    10yyyyyy    10xxxxxx

    (where uuuuu = wwww + 1)

    She could do a tiny little bit of "bit nibbling" on the very first byte as a way to quickly know exactly how many bytes the first code point would need.
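    A sketch of that bit nibbling in C, reading the length from the high bits of the first byte only; utf8_len_from_lead_byte is a hypothetical name. For brevity it does not reject the lead bytes that can only start overlong or out-of-range sequences (0xC0, 0xC1, 0xF5-0xF7), which a validating decoder would.

```c
/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
   with lead byte b, read off the bit layout above. Returns 0 for a
   continuation byte (10xxxxxx) or a byte that can never start a sequence. */
int utf8_len_from_lead_byte(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110yyyyy */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110zzzz */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110uuu */
    return 0;                           /* 10xxxxxx, or 0xF8-0xFF */
}
```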

    Now while I do consider both of these methods to qualify as both smart and clever as far as solutions go, this second method obviously has many advantages over the first in that you do not need to look at as much of the string -- you do not have to walk past the first byte. And no guessing games are needed with the return value of the function call; in fact you get to skip the function call entirely. Obviously a much cleaner solution, all the way around.

    In fact, it is fun to think about the quickest code you would write to do that nibble, with the fewest number of assembly operations once the code was compiled. Anyone want to take a stab at this last part of how to make the faster solution as speedy as possible? :-)
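    For one (unoptimized) stab at the generic "nibble" function, here is a C sketch that decodes the first code point and advances the pointer past it. utf8_nibble and its 0xFFFFFFFF error value are hypothetical choices; it checks the continuation bytes and rejects overlong forms, surrogates, and values above U+10ffff, and it makes no attempt at minimizing assembly operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "nibble": decodes the first code point of [*pp, end),
   advances *pp past it, and returns it. Returns 0xFFFFFFFF (without
   advancing) on malformed or truncated input. */
uint32_t utf8_nibble(const unsigned char **pp, const unsigned char *end)
{
    const unsigned char *p = *pp;
    if (p >= end) return 0xFFFFFFFF;

    unsigned char b = p[0];
    int len;
    uint32_t cp;
    if (b < 0x80)                { len = 1; cp = b; }
    else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
    else return 0xFFFFFFFF;                      /* continuation or invalid lead byte */

    if (end - p < len) return 0xFFFFFFFF;        /* truncated sequence */
    for (int i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0xFFFFFFFF;
        cp = (cp << 6) | (p[i] & 0x3F);          /* append 6 payload bits */
    }

    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0xFFFFFFFF;                       /* overlong, too big, or a surrogate */

    *pp = p + len;
    return cp;
}
```

    Fed the bytes EF BB BF, it returns U+feff and advances three bytes, so it doubles as a BOM check.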

     

    This post is sponsored by "" U+feff (ZERO WIDTH NO-BREAK SPACE, a.k.a. the BOM, of course!)

    When will this line end? And how?

    I have talked about Chris Walker before.

    He has been one of the guys behind Notepad.exe for several versions, watching this uber-layer around a Win32 EDIT control be morphed into what some consider to be the most-used plain text editor on the planet.

    Often when people complain about behavior of international text in Word or WordPad, I ask them to try it in Notepad (I can easily determine if the problem is an issue in Word, RichEdit, or Uniscribe in this way).

    Anyway, after the first time I had posted about Notepad, Chris had suggested a bunch of interesting topics, and this post is about one of those topics.

    How can you tell how a line ends?

    Easy on Windows -- just put in a CARRIAGE RETURN followed by a LINEFEED (U+000d U+000a).

    Easy in a completely incompatible way on UNIX platforms -- just a LINEFEED (U+000a) and nothing else (the C standard kind of does this, too, thus the rules about files opened in TEXT mode in the C Runtime).

    And also easy in a completely different, completely incompatible way for some Apple systems, which use the CARRIAGE RETURN (U+000d) alone (although the fact that the newer versions have a UNIX base makes me wonder whether all of this is harder on an Apple now, given the CR backcompat and the LF platform issue!).
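    The three conventions are easy to tell apart in code. Here is a minimal C sketch (count_line_endings is a hypothetical name) that tallies each kind in a buffer, taking care to count a CR immediately followed by an LF once, as CR+LF:

```c
#include <stddef.h>

/* Counts of each kind of line terminator found in a buffer. */
typedef struct { size_t crlf, lf, cr; } line_ending_counts;

/* Hypothetical helper: CR+LF (Windows), LF alone (UNIX), and CR alone
   (older Apple systems) are each counted separately. */
line_ending_counts count_line_endings(const char *buf, size_t len)
{
    line_ending_counts n = {0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                n.crlf++;
                i++;                      /* skip the LF that belongs to this CR */
            } else {
                n.cr++;
            }
        } else if (buf[i] == '\n') {
            n.lf++;
        }
    }
    return n;
}
```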

    As Raymond Chen discussed last year in Why is the line terminator CR+LF?, there are a lot of people who wished that Notepad dealt with files that had only an LF, since lots of text files (such as the ones in the Unicode Character Database) have a .TXT filetype but Notepad cannot open them directly without treating the whole file as one line.

    But of course it is not Notepad that is responsible for this functionality as much as the system EDIT control, which has its own rules about lines used by messages like EM_GETLINE and EM_GETLINECOUNT. Those rules would need to undergo some pretty big changes if the fundamental plain text definition of a line delimiter on Windows platforms ever changed. It would probably have to be a new set of messages, or a mode for the control. Or people could just use WordPad and the RichEdit control, which does the right thing with different line delimiters already, with some very interesting (where interesting is defined as potentially scary!) performance concerns....
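    Short of changing the EDIT control, an application can normalize the text itself before handing it over. A hedged C sketch, with normalize_to_crlf as a hypothetical helper, that rewrites LF-only and CR-only line endings as CR+LF while leaving existing CR+LF pairs alone:

```c
#include <stddef.h>

/* Hypothetical helper: rewrites any mix of CR+LF, LF, and CR line endings
   as CR+LF. dst needs room for 2 * len + 1 bytes (worst case: every byte is
   a lone LF or CR). Returns the length written, not counting the NUL. */
size_t normalize_to_crlf(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\r') {
            if (i + 1 < len && src[i + 1] == '\n')
                i++;                      /* already CR+LF: consume the LF too */
            dst[out++] = '\r';
            dst[out++] = '\n';
        } else if (src[i] == '\n') {      /* UNIX-style LF alone */
            dst[out++] = '\r';
            dst[out++] = '\n';
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = '\0';
    return out;
}
```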

    Fixing an occurrence of this problem was actually one of the changes I was able to make in the Microsoft Access Import Text Wizard, which had the same problem for many versions. Then Jet 4.0 came out, with the ability to handle not only the multiple line terminators (which existed before) but also different encodings (which was definitely a new feature). The problem for the prior versions was that the wizard used VBA's file I/O functions to load its sample text, and VBA is limited to the default system code page and CRLF (so the wizard would either show junk or throw an error for a single line being too big -- a problem described in KB article 149946). It was a pleasure to fix both problems at the same time by getting away from VBA's inflexible file I/O system here. :-)

    The Architect (apologies to 'The Matrix Reloaded')

    I wrote this silly little piece just after what people called the "Longhorn Reset" and toyed with posting it a bunch of times since then.

    In real life I am not nearly important enough to be meeting with Bill Gates. I was just having a little fun after watching The Matrix Reloaded one night.

    Hope you enjoy!

    The Architect - Hello, Michael.

    Michael - Who are you?

    The Architect - I am the Chief Software Architect. I created Microsoft. I've been waiting for you. You have many questions, and although the process of becoming a Technical Lead has altered your viewpoint, you remain irrevocably a Software Design Engineer, a Developer. Ergo, some of my answers you will understand, and some of them you will not. Concordantly, while your first question may be the most pertinent, you may or may not realize it is also the most irrelevant.

    Michael - Why am I here?

    The Architect - Your career is the sum of a remainder of an unbalanced equation inherent to development at Microsoft. You are the eventuality of an anomaly, which despite my sincerest efforts I have been unable to eliminate from what is otherwise a harmony of mathematical precision. While it remains a burden assiduously avoided, it is not unexpected, and thus not beyond a measure of control. Which has led you, inexorably, here.

    Michael - You haven't answered my question.

    The Architect - Quite right. Interesting. That was quicker than the others.

    *The responses of the other Technical Leads appear on the monitors: "Others? What others? How many? Answer me!"*

    The Architect - Microsoft is older than you know. I prefer counting from the emergence of one reorganization to the emergence of the next, in which case this is the 6,823rd version.

    *Again, the responses of the other Technical Leads appear on the monitors: "6,823 versions? Over 6,000? I've been lied to. This is bullsh*t."*

    Michael - There are only two possible explanations: either no one told me, or no one knows.

    The Architect - Precisely. As you are undoubtedly gathering, the anomaly's systemic, creating fluctuations everywhere, from the smallest teams to the largest divisions.

    *Once again, the responses of the other Technical Leads appear on the monitors: "You can't control me! F*ck you! I'm going to kill you! You can't make me do anything!"*

    Michael - Upgrades. The problem is upgrades.

    *The scene cuts to Ballmer fighting a penguin, and then back to the Architect's room*

    The Architect - The first organization I designed at Microsoft was quite naturally perfect, it was a work of art, flawless, sublime. A triumph equaled only by its monumental failure. The inevitability of its doom is as apparent to me now as a consequence of the imperfection inherent in every developer, thus I redesigned it based on your history to more accurately reflect the varying grotesqueries of your nature. However, I was again frustrated by failure. I have since come to understand that the answer eluded me because it required a lesser mind, or perhaps a mind less bound by the parameters of perfection. Thus, the answer was stumbled upon by another, an intuitive manager, initially created to investigate certain aspects of the developer psyche. If I am the father of Microsoft, he would undoubtedly be its cool uncle who jumps around talking about developers.

    Michael - The Ballmer.

    The Architect - Please. As I was saying, he stumbled upon a solution whereby nearly 99.9% of all developers accepted the programming, as long as they were constantly being re-org'ed, even if they were only aware of the re-org at a near unconscious level since it happened so often. And the rest would just go off to form Internet startups. While this answer functioned, it was obviously fundamentally flawed, thus creating the otherwise contradictory systemic anomaly, that if left unchecked might threaten Microsoft itself if the stock price did not go up. Ergo, those that refused the re-organizations, while a minority, if unchecked, would constitute an escalating probability of disaster.

    Michael - This is about Longhorn.

    The Architect - You are here because Longhorn is about to be reset. Its every feature re-assessed, its very entire existence re-organized.

    Michael - Bullsh*t.

    *The responses of the other Technical Leads appear on the monitors: "Bullsh*t!"*

    The Architect - Denial is the most predictable of all developer responses. Just ask the devs who worked on Cairo. But, rest assured, this will be the 6,824th time we have re-org'ed it, and we have become exceedingly efficient at it.

    *Scene cuts to Ballmer fighting a Macintosh apple, and then back to the Architect's room.*

    The Architect - The function of the developers is now to return to the source, allowing a temporary dissemination of the code you all carry, reinserting the prime program. After which you will all be required to select from amongst the members of NTDEV several hundred developers, to rebuild Longhorn. Failure to comply with this process will result in a cataclysmic system crash firing everyone connected to development, which coupled with the explosion of Building 26 will ultimately result in the sd obliterate of the entire project.

    Michael - You won't let it happen, you can't. You need the source code to survive.

    The Architect - There are levels of survival we are prepared to accept. However, the relevant issue is whether or not you are ready to accept the responsibility for every developer in this company losing their jobs.

    *The Architect presses a button on a pen that he is holding, and images of developers from all over Microsoft appear on the monitors*

    The Architect - It is interesting reading your reactions. Your 6,823 predecessors were by design based on a similar predication, a contingent affirmation that was meant to create a profound attachment to your customers, facilitating the function of the Technical Leads. While the others experienced this in a very general way, your experience is far more specific. Vis-a-vis the feature you are so fond of, collation.

    *Images of Ballmer fighting a faceless black-suited person from Boca Raton appear on the monitors*

    Michael - Ballmer.

    The Architect - Apropos, he entered the Microsoft campus to save your job at the cost of yet another re-organization.

    Michael - No!

    The Architect - Which brings us at last to the moment of truth, wherein the fundamental flaw is ultimately expressed, and the anomaly revealed as both beginning, and end. There are two doors. The door to your right leads to the source, and back to Microsoft. The door to the left leads also back to Microsoft. As you adequately put, the problem is not choice, it is upgrades -- because customers simply want them, whether they claim to want them or not. But the re-organization happens either way, and the changing nature of the org keeps you invested in the process of building the features that customers want. We already know what you're going to do, don't we? Already I can see the chain reaction, the chemical precursors that signal the onset of emotion, designed specifically to overwhelm logic, and reason. An emotion that is already blinding you from the simple, and obvious truth: there is no difference which door you go through -- you may still be doing a different job after the re-org. And the illusion of choice on your part drives you to keep writing features, and so customers keep upgrading.

    *Michael walks to the door on his left*

    The Architect - Humph. Hope, it is the quintessential Technical Lead's (and developer's) delusion, simultaneously the source of your greatest strength, and your greatest weakness.

    Michael - If I were you, I'd realize that all the "choice/re-org" stuff does not make very much sense.

    The Architect - It doesn't.

    So, the reset happened, and there have been a bunch of smaller re-orgs since then, and everyone is still here. The illusion of the Microsoft Matrix with its ever shifting re-orgs and changes is still going strong, and we manage to do a lot of really interesting and impressive work here. And customers still want to upgrade and get the latest thing....

     

    The Unicode code points all voted and decided I was quite a character to write this piece.

    Some more Windows acronyms explained

    One of the good reasons to keep MSDN Blogs on the list of Blogs I Read is that a lot of stuff goes past my eyes via a FeedDemon radar. :-)

    Though I have to admit that FeedDemon seemed to have some kind of memory leak problem when I ran it on Server 2003 SP1, one that no longer seems to repro now that I run it on XP SP2. I am now back to rebooting only when installing software. I probably should have reported it, but by the time it occurred to me that it was the only new program I had installed before I started needing reboots every few days, it was too late.

    Anyway, on to what I saw earlier....

    I noticed on Brian Welcker's blog (cleverly entitled Direct Reports) a fun post entitled Brotherhood.

    In this post, he explained the meaning of CTP (Community Technology Preview), and explained the meaning of IDW, while not literally defining it. He said:

    While no one seems to know exactly what "IDW" stands for, it is an interim build that meets a set of criteria for release to a larger audience outside of the product testers. The frequency of IDWs depends on the team and varies from weekly IDWs to bi-monthly. While they are full interim builds of the product, until recently, IDWs rarely went outside of Microsoft, and if they did, only to a select set of customers.

    I can clear up one mystery here, at least. According to Jack Mayo, IDW stands for:

    Internal Developer Workstation - a build that has additional focus on it in order to make sure it meets a level of quality such that anyone at MS could use it as their main machine without too much pain.

    Armed with this knowledge (and the historical fact that in older versions of NT like NT 4.0 the more personal SKU was the "workstation" SKU), one can attempt to logically deduce the meaning of the next highest quality of internal build, the IDS build:

    Internal Developer Server - a build that has additional focus on it in order to make sure it meets a level of quality such that someone at MS could consider using it as an internal server without too much pain.

    Of course the terms IDW and IDS are today used in a much wider sense than the prior Windows workstation/server context (and probably were even back then), but the quality bars themselves make sense for people who are not developing or testing the product daily, but who have either reason to periodically install the product or who just wish to do a little dogfooding.

    Now while it seems that both SQL Server and the .NET Framework have embraced the idea of getting builds out more often based on a more arbitrary, time-based model, Windows seems to be using more of an "event based" model. For example, we just released a "WinHEC" build of Longhorn to attendees of the Windows Hardware Engineering Conference and have in the past sometimes done the same thing for the PDC (Professional Developers Conference) and at other events.

    I myself currently have all of the following installed:

    • The February CTP of SQL Server (a.k.a. IDW 13) on two machines
    • The April CTP of SQL Server (a.k.a. IDW 14) on one machine
    • The Beta 2 build of Whidbey on one machine
    • Four different daily debug and release builds of Whidbey built on my own machines over the last two weeks or so
    • At least nine daily x86 and x64 builds of Longhorn across different partitions of four machines (though one of them is entirely broken, and three of them are very old, so if I ever boot into them again it will only be to upgrade)

    Notice how I use IDW builds for the products I don't do development on and whatever seems to build on the ones I do? I only trust daily builds if I'm building them or know the people who are personally. :-)

    A few of the gotchas of CompareString

    CompareString is one of the coolest APIs. I thought so even before I owned it, before I really even met the people who used to own it (or the woman who wrote it, for that matter).

    But like any API, it can have its gotchas, its problems.

    Now if you think of the NLS information as a huge database, then the Locale Identifier (a.k.a. the LCID) is its primary key. It is the very first parameter of CompareString.

    If you are calling the non-Unicode version (CompareStringA), then rather than converting via the default system code page, it converts parameters via the default code page of the locale you pass in. Among other things, this means that you can't ever use CompareStringA to handle UTF-8 text.
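
    To see why the locale matters for the A version, remember that the same byte means different characters in different code pages. This Python sketch does not call CompareStringA at all; it uses Python's built-in codecs as a stand-in for the conversion Windows performs, assuming the Western European (1252) and Cyrillic (1251) code pages:

```python
# One byte, two code pages, two different characters.
raw = b"\xe9"
as_western = raw.decode("cp1252")   # 'é' LATIN SMALL LETTER E WITH ACUTE
as_cyrillic = raw.decode("cp1251")  # 'й' CYRILLIC SMALL LETTER SHORT I

# UTF-8 bytes pushed through a single-byte code page become mojibake:
# the two bytes of 'é' turn into two unrelated characters.
utf8_bytes = "é".encode("utf-8")       # b'\xc3\xa9'
misread = utf8_bytes.decode("cp1252")  # 'Ã©'
```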

    Ok, let's move on to the all important second parameter, the one with the flags. I'll talk about each of the flags here:

    NORM_IGNORECASE - Ignore case. A better name for this flag might have been IGNORE_TERTIARYWEIGHT since that is what it accomplishes (it masks the tertiary weight), although it is obviously too late to consider such a change. It can cause undesirable results when used in the comparison of strings containing characters that depend on that weight for vital information, which thankfully is a very small number of cases. But if you are not expecting "ʏ", "Y", and "y" (U+028f, U+0059, and U+0079, a.k.a. LATIN LETTER SMALL CAPITAL Y, LATIN CAPITAL LETTER Y, and LATIN SMALL LETTER Y) to all be equal, then you may want to think twice about throwing this flag into the mix. You will also lose the distinctions of the final forms for Hebrew (e.g. "מ" and "ם", U+05de and U+05dd, a.k.a. HEBREW LETTER MEM and HEBREW LETTER FINAL MEM), Arabic (e.g. "ش", U+0634 a.k.a. ARABIC LETTER SHEEN, and its isolated, final, initial, and medial forms ﺵ, ﺶ, ﺷ, and ﺸ at U+feb5, U+feb6, U+feb7, and U+feb8), and other languages.

    NORM_IGNORENONSPACE - Ignore nonspacing characters. A better name for this flag might have been IGNORE_SECONDARYWEIGHT since that is what it accomplishes (it masks the secondary weight). It can cause undesirable results when used in the comparison of strings containing characters that depend on that weight for vital information. The most visible example of this is in Korean, where U+ac00 (가, Hangul Syllable Kiyeok A) can suddenly be considered equivalent to all of the following characters: 伽 佳 假 價 加 可 呵 哥 嘉 嫁 家 暇 架 枷 柯 歌 珂 痂 稼 苛 茄 街 袈 訶 賈 跏 軻 迦 駕 仮 傢 咖 哿 坷 宊 斝 榎 檟 珈 笳 耞 舸 葭 謌. For the rest of the Hangul syllables, some are better and some are worse, and the problem exists in other languages as well.
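
    To make the idea of "masking a weight" concrete, here is a toy three-level key in Python. It is only a sketch under loud assumptions: the key layout and function names are hypothetical, accents stand in for secondary weights and case for tertiary weights, and real Windows weights come from its own collation tables (so, for example, this toy does not reproduce the Korean behavior described above):

```python
import unicodedata


def toy_sort_key(s, ignore_case=False, ignore_nonspace=False):
    """Hypothetical three-level key (primary, secondary, tertiary).

    primary   - base characters, lowercased
    secondary - combining marks attached to each base character
    tertiary  - whether each base character was uppercase
    Dropping a level loosely mimics NORM_IGNORENONSPACE (secondary)
    or NORM_IGNORECASE (tertiary).
    """
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            if secondary:  # attach the mark to the preceding base character
                secondary[-1] += ch
            continue
        primary.append(ch.lower())
        secondary.append("")
        tertiary.append(ch.isupper())
    return (
        tuple(primary),
        () if ignore_nonspace else tuple(secondary),
        () if ignore_case else tuple(tertiary),
    )


def toy_equal(a, b, **flags):
    return toy_sort_key(a, **flags) == toy_sort_key(b, **flags)


print(toy_equal("Résumé", "resume"))                                          # False
print(toy_equal("Résumé", "resume", ignore_case=True))                        # False
print(toy_equal("Résumé", "resume", ignore_case=True, ignore_nonspace=True))  # True
```

    Each flag throws away one level of information, which is exactly why the results get broader, and occasionally surprising, as more flags are added.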

    NORM_IGNORESYMBOLS - Ignore symbols such as "_", "#", and "*". The list of symbols is "increased" when SORT_STRINGSORT is specified, since punctuation is then also treated as symbols. This is often useful but can wreak havoc if you are searching for things like C++ or C#.
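
    A quick way to see the C++/C# problem is a toy stand-in for the flag. The Python below is an assumption-laden sketch: it treats every character in the Unicode symbol (S*) and punctuation (P*) categories as ignorable, whereas Windows decides what counts as a symbol from its own sorting tables:

```python
import unicodedata


def toy_strip_symbols(s):
    """Hypothetical stand-in for NORM_IGNORESYMBOLS: drop symbols and punctuation."""
    return "".join(ch for ch in s if unicodedata.category(ch)[0] not in "SP")


def toy_equal_ignoring_symbols(a, b):
    return toy_strip_symbols(a) == toy_strip_symbols(b)


# Once the symbols are gone, three different languages collapse into one:
print(toy_equal_ignoring_symbols("C++", "C#"))  # True
print(toy_equal_ignoring_symbols("C#", "C"))    # True
```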

    SORT_STRINGSORT - Treat punctuation the same as symbols. For example, a STRING sort treats co-op and co_op as strings that should sort together since the hyphen and the underscore are both treated as symbols. On the other hand, a WORD sort treats the hyphen and apostrophe differently, so that co-op and co_op would not sort together but co-op and coop would. The real documentation for this is built into the winnls.h header file:

    //
    //  Sorting Flags.
    //
    //    WORD Sort:    culturally correct sort
    //                  hyphen and apostrophe are special cased
    //                  example: "coop" and "co-op" will sort together in a list
    //
    //                        co_op     <-------  underscore (symbol)
    //                        coat
    //                        comb
    //                        coop
    //                        co-op     <-------  hyphen (punctuation)
    //                        cork
    //                        went
    //                        were
    //                        we're     <-------  apostrophe (punctuation)
    //
    //
    //    STRING Sort:  hyphen and apostrophe will sort with all other symbols
    //
    //                        co-op     <-------  hyphen (punctuation)
    //                        co_op     <-------  underscore (symbol)
    //                        coat
    //                        comb
    //                        coop
    //                        cork
    //                        we're     <-------  apostrophe (punctuation)
    //                        went
    //                        were
    //
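
    The two orderings in that header comment can be reproduced with a toy pair of sort keys. This Python sketch is hypothetical and only matches this particular example list: it relies on plain code-point order for lowercase ASCII, which happens to put "-", "'", and "_" before the letters, while the real WORD sort uses the Windows weight tables:

```python
def word_sort_key(s):
    # WORD sort: hyphen and apostrophe are special-cased. Here they are simply
    # removed for the main comparison, so "coop" and "co-op" end up adjacent;
    # the length breaks ties so the unpunctuated form comes first.
    return (s.replace("-", "").replace("'", ""), len(s))


def string_sort_key(s):
    # STRING sort: hyphen and apostrophe compare like any other symbol.
    # For these lowercase ASCII words, code-point order does exactly that.
    return s


words = ["went", "co-op", "cork", "we're", "coat", "co_op", "comb", "coop", "were"]

print(sorted(words, key=word_sort_key))
# ['co_op', 'coat', 'comb', 'coop', 'co-op', 'cork', 'went', 'were', "we're"]
print(sorted(words, key=string_sort_key))
# ['co-op', 'co_op', 'coat', 'comb', 'coop', 'cork', "we're", 'went', 'were']
```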

    NORM_IGNOREKANATYPE - Do not differentiate between Hiragana and Katakana characters. Corresponding Hiragana and Katakana characters compare as equal (e.g. "げ" U+3052 HIRAGANA LETTER GE versus "ゲ" U+30B2 KATAKANA LETTER GE). Calling LCMapString with the LCMAP_HIRAGANA or the LCMAP_KATAKANA flag on both strings would flatten the comparison in an analogous manner. There are many times that the distinction is important (the two scripts are used in different situations, so searching through both at once may often give unexpected results).
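
    For the common case, the flattening that LCMAP_HIRAGANA performs can be approximated with a fixed offset, because the Katakana letters U+30A1 through U+30F6 line up with the Hiragana letters U+3041 through U+3096. The Python below is a hypothetical sketch, not the Windows implementation; it leaves alone characters such as the prolonged sound mark (U+30FC) and the Katakana-only letters U+30F7 through U+30FA:

```python
KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6  # ァ .. ヶ
KANA_OFFSET = 0x60  # distance from each Katakana letter to its Hiragana twin


def toy_to_hiragana(s):
    """Map Katakana letters onto Hiragana; leave everything else unchanged."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in s
    )


def toy_equal_ignore_kanatype(a, b):
    return toy_to_hiragana(a) == toy_to_hiragana(b)


print(toy_equal_ignore_kanatype("\u3052", "\u30b2"))  # True: げ vs ゲ
```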

    NORM_IGNOREWIDTH - Do not differentiate between the halfwidth and fullwidth forms of characters. These two forms exist in Unicode for the sake of backward compatibility with legacy CJK standards that encoded the two forms. In those legacy standards, the halfwidth forms used one byte while the fullwidth forms used two bytes, and by convention the fullwidth glyph was twice as wide (e.g. "ヲ" U+30F2 KATAKANA LETTER WO versus "ｦ" U+FF66 HALFWIDTH KATAKANA LETTER WO). Calling LCMapString with the LCMAP_FULLWIDTH or the LCMAP_HALFWIDTH flag on both strings would flatten the comparison in an analogous manner. Generally speaking, each form is used at different times for the sake of appearance or functionality, so while the initial purpose was those legacy standards, modern usage is a bit more reasoned (for example, properties in Japanese Access are fullwidth, while the descriptive string in the property sheet often uses halfwidth characters since they have a preferred appearance).
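
    Outside of Windows, Unicode compatibility normalization (NFKC) is a rough analogue of this flattening, as in the Python sketch below. Treat it as a stand-in with different behavior, not an equivalent: NFKC maps halfwidth Katakana to the ordinary forms and fullwidth Latin to ASCII (so it does not consistently pick one width), and it also folds many non-width compatibility characters such as ligatures and superscripts. The function names are hypothetical:

```python
import unicodedata


def toy_fold_width(s):
    """Hypothetical width folding via NFKC (it folds more than width!)."""
    return unicodedata.normalize("NFKC", s)


def toy_equal_ignore_width(a, b):
    return toy_fold_width(a) == toy_fold_width(b)


print(toy_equal_ignore_width("\u30f2", "\uff66"))  # True: ヲ vs halfwidth ｦ
print(toy_fold_width("\uff21\uff42\uff43"))         # Abc (from fullwidth Ａｂｃ)
```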

    Looking at the third and fifth parameters, they are the actual strings being compared.

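    And then finally, the fourth and sixth parameters give the lengths of the strings in characters; for CompareStringW that means UTF-16 code units (WCHARs), not code points, and a negative value such as -1 means the string is null-terminated.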
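
    The difference between code units and code points shows up as soon as a string contains a supplementary character. This small Python sketch (the helper name is hypothetical) counts the way the CompareStringW length parameters do:

```python
def utf16_code_units(s):
    """Count UTF-16 code units (WCHARs) rather than code points."""
    return len(s.encode("utf-16-le")) // 2


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane
print(len(clef))                # 1 code point
print(utf16_code_units(clef))   # 2 code units (a surrogate pair)
print(utf16_code_units("abc"))  # 3
```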

    Now for actual usage, the intent is clear: through the use of meaningful strings that have defined weights in the Windows collation tables, developers have the opportunity to get back linguistically appropriate results. When you veer outside of this realm, you may not get the results you (or your users) are expecting. And as the info about flags above really indicates, the indiscriminate use of flags here is a really bad idea that can lead to non-intuitive results.

    Now what would be intuitive? In my opinion the following approach is best:

    1. Passing potentially destructive flags with the API call, which will produce more search results
    2. Calling again without the flags, to get the smaller and more specific list
    3. Using these "preferred results" from #2 to prioritize #1 in any type of search list

    We could call this the "Google" principle: the large result list is impressive not because it offers many choices to review, but because the most relevant items are near the top and you seldom need to look at the full list. I would highly recommend such an approach, to go along with the versioning issues that I have discussed in the past. Such an approach can give you intuitive results while minimizing confusing result sets.
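
    The three steps above can be sketched in Python as follows. The two comparison functions are hypothetical stand-ins: casefold() plays the role of a call with the loosening flags, and exact equality plays the role of a call without them. The sketch also assumes, as is true for flags that only mask weights, that anything matching strictly also matches loosely:

```python
def ranked_search(query, candidates, loose_equal, strict_equal):
    """Two-pass search: strict matches first, then the remaining loose ones.

    Step 1: compare with the destructive flags (loose_equal) for a wide net.
    Step 2: compare again without them (strict_equal) for preferred results.
    Step 3: list the preferred results first, keeping the original order.
    """
    loose = [c for c in candidates if loose_equal(query, c)]
    preferred = [c for c in loose if strict_equal(query, c)]
    rest = [c for c in loose if not strict_equal(query, c)]
    return preferred + rest


def loose_equal(a, b):
    return a.casefold() == b.casefold()  # stand-in for NORM_IGNORECASE and friends


def strict_equal(a, b):
    return a == b  # stand-in for a call with no flags


names = ["STRASSE", "Straße", "strasse", "Strasbourg"]
result = ranked_search("Straße", names, loose_equal, strict_equal)
# result == ['Straße', 'STRASSE', 'strasse']
```

    Nothing the loose pass found is thrown away; the strict pass only decides what the user sees first.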

    Now there are more issues that I could discuss, but I thought they might wait until another day....

     

    This post brought to you by "ʏ" (U+028f, a.k.a. LATIN LETTER SMALL CAPITAL Y)

    Revenge of something, that's for sure

    "I have a sneaking suspicion that if there were a way to make movies without actors, George [Lucas] would do it." -- Most often attributed to Mark Hamill

    I saw Star Wars Episode III (Revenge of the Sith) on Saturday, and like just about everybody else I was disappointed by it from a plot perspective.

    Note: there are no spoilers here, mainly because the next movie in the series (Episode IV: A New Hope) came out 18, er, 28 years ago.

    So we knew from the movies of the last almost two, er, three decades:

    • Anakin turns bad -- so bad that he becomes the worst anti-Jedi anyone has ever known.
    • Something really awful happens to him that forces him into a metallic shell.
    • Palpatine becomes really really bad -- so bad that Darth Vader calls him master.
    • All of this happens in THIS episode -- no more time, since he was firmly evil and known to be so by the "next" movie.
    • Leia vaguely remembered her real mother (this from Episode VI), who seemed very sad, and who died when she was quite young.
    • Leia gets placed with a senator on one planet, Luke is dumped in a sand pit on another.
    • Ben Kenobi (who felt responsible for Anakin's conversion) retires to Tatooine to become Sir Alec Guinness, and presumably to watch Luke.
    • No one apparently noticed the fundamental plot problem with needing to split up Luke and Leia so they do not give away the secret, yet sticking one of the only two known surviving Jedi right next door to Luke.
    • Yoda retires to some jungle planet in the Dagobah system.
    • Count Dooku must die, since he was obviously hugely important in II but never even mentioned in IV/V/VI.
    • Since in Episode II Jedi Knights were pretty central to the Republic and in Episode IV they are all but extinct, this must mean that large parts of the Republic go down, too.
    • The whole senate does not go away yet (since the Emperor does not disband the senate until Episode IV).

    We go into Episode III knowing that Natalie Portman dies, Palpatine is unmasked as the uber-evility, Anakin becomes Darth Vader, Obi Wan and Yoda get banished.

    Yet in Episodes I and II Anakin was being built up as a sympathetic hero, other than a few brief episodes of impetuous, rebellious behavior that just did not fit the characterization (a fact we discount since we know George Lucas sucks with people).

    Which means that this movie is going to explain how the hero (who we know deep down is really the villain) is going to become the villain. All I want to know is how much of the movie he will be sympathetic for, since we ended last time on his [secret] wedding.

    Note that I reveal no spoilers by claiming any of the above; that is just from looking at Episodes I/II/IV/V/VI!!! I actually wrote the above on Friday night, before seeing the movie. :-)

    Ah, the downside of telling stories out of order, such that we know most of what is going to happen....

    The quote I started this post with says it all. Mark Hamill was right -- George Lucas cannot plot character scenes. Even the actors can only do so much.

    I will say the visuals rocked, and the space battles rocked, as did the battles that were not quite in space. Light sabers rocked, as always.

    But as a movie, as the last piece in the puzzle of a saga that has dominated more years of my life than it has not, I think I deserved better. I think we all did.

    As far as I am concerned, this was a really sour note to leave things on, George. If you ended up making Episodes VII, VIII, and IX, I'm sure that I would end up seeing them. But the best I can say about what you did for everyone is that you aren't....

    When do time zones and cultural settings get updated?

    Yesterday, Jeff Parker posted a comment to a non-post about time zones and backcompat from Larry Osterman (Larry's blog is so cool that ideas pop up even when he only posts about the fact that he couldn't put up anything substantive since he was working on an internal presentation!). Jeff's comment was:

    Hey I was thinking of something about your backwards compatability and how long should you keep API's. Maybe when you get time you could elaborate on another situation with that. This year Indiana has voted to go along with Daylight Savings Time. Where previously they did not. Microsoft even has a Eastern (Indiana) Time Zone. Now would this go away? Why would you keep it? And more importantly are they going to patch it since there is no longer and Indiana time zone. What does Microsoft do if an API specifically affects a culture and the culture changes.

    Just a suggestion, something I am curious about. Since I went to Purdue I know a lot about the old Indiana time. When I heard they were changing I was wondering how a shift like this would affect API's and do they still then remain valid.

    If you needed proof that you should read the comments on people's posts, this is it, since I saw it before Larry sent me email about the other issue. :-)

    The issue of backcompat and time zones that Jeff brings up is an interesting one (and the news that Indiana plans to join the rest of the country is fascinating, now if we could just get Arizona to follow suit!). In this specific case, lots of people never even knew about the Indiana rules until an episode of The West Wing had some fun with it. But for the time zones in Windows, the principles are easier to decipher:

    • We update any time the time zones update -- the rules are such that correctness is more important than consistency. There are applications that do not use the APIs to match the behavior, but the average user will usually blame the application, not the operating system (in fact they might wish that the OS had easier updating methodology!).
    • Once the Indiana time zone is the same as its surrounding area, the fact that it has a separate time zone 'slot' is more of a cosmetic issue. It cannot ever be removed in existing products even if the settings in the zone are updated. After all, while time zones change, the zone itself is a setting on people's machines and removing it can cause a lot more problems than it would solve. There is no hurry to do it, if you ask me; given the contention surrounding the issue, it may end up getting reversed at some point anyway (say if Troy Woodruff loses his seat and legislators' remorse sets in, parts of the old rules could find their way back in Indiana!).

    Now what to do in future versions is a different story -- it is easy enough to migrate people when they upgrade, when/if you need to. There were, after all, at least 75 different time zones the last time I had cause to look at them all, last year. More get added from time to time, both for good reasons and bad (I will not make judgments or cast aspersions by giving you my own opinions on categorizations here -- let's just say that neither common sense nor maturity always figure into official policies or requests!).

    The issue of supporting a time zone "slot" past its useful life as a distinction, for the sake of backwards compatibility, is a fascinatingly difficult one, and it is a decision I am glad I do not have to make.

    For myself, I usually do not change my time zone settings even when I travel -- it is easier for me to just do a little math in my head; if everything goes wrong and I miscalculate, maybe I can get out of a few meetings! :-)

    Now the principles here also very much apply to locale/culture settings. They must be updated to match cultural expectations. If you have code that either does not query it or code that assumes it will not change, the code is just wrong....

     

    This post brought to you by "¿" (U+00bf, a.k.a. INVERTED QUESTION MARK)

    Doing something with TechEd slides

    In years past, I had seen shows at the Front Row Theatre in Cleveland (it is no longer around; the Rock and Roll Hall of Fame is on the site where it used to be). I was struck at the time by the way that the performers would be facing different parts of the audience at different times. So you did not always get to face the performers, but you got all of them for some of the time....

    Anyway, in years past when doing technical presentations in places like Stockholm and Amsterdam, I would usually work to get my slides localized -- of course in a bilingual form so I could still read them! :-)

    (Also, when speaking in London I would try to get rid of the Americanisms when I "localised" them.)

    I was thinking about maybe trying to do the same thing for TechEd in July, but I realized that a lot of the attendees will be coming from all over Europe, so trying to get them localized into Dutch may not really capture as much.

    So I thought about that long-gone venue and started toying with the idea of trying to get different slides done in different languages, all over Europe.

    It would be roughly analogous to letting lots of people see some of the show localized, whether it was text in Nederlands, Frysk, Deutsch, français, ελληνικά, español, suomi, Magyar, íslenska, italiano, norsk, polski, Português, română, русский, hrvatski, slovenčina, shqipe, svenska, Türkçe, українська, Беларускі, slovenski, eesti, bosanski, latviešu, lietuvių, euskara, македонски, srpski, Elsässisch, Occitan, Corsu, brezhoneg, hornjoserbšćina, Lëtzebuergesch, Rumantsch, Cymraeg, åarjelsaemiengiele, or any other language across Europe.

    So, does it sound interesting? Comments welcome!

    Do you speak any of these languages and would you like to help out? You can send a piece of email to me at michkap -at- microsoft.com (munge in the obvious way!).

    Dry the rain

    (Nothing technical in this post, sorry!)

    It was really sunny yesterday. This works well since I have a convertible and got to run with the top down.

    I do need to point out that only an optimist could ever live in Seattle and own a convertible. It's not the rain so much as the gray, the "it might start raining soon" that plagues all but the summer months here.

    So a big raspberry to all of the people who think I am a pessimist. :-)

    Anyway, today was not quite so sunny. I decided to get some of the General's Noodles from Typhoon! and I had not put the top up yet.

    I decided to leave it down; it was not raining right now, and I figured if everything went wrong it would just be a few drops.

    So I stuck a Beta Band CD into the car and made it to Typhoon and back, all on just one playing of the song. There were a few drops of rain on the windshield but nothing you'd really notice.

    Amazing good luck, I think. Maybe to counter all the bad luck lately.

    I do really like that song. I liked the Beta Band even before they saw a small revival from the movie version of Nick Hornby's excellent book, High Fidelity. Though I liked the book better; England is a cooler setting than Chicago, even if that meant John Cusack wouldn't be able to star in it.

    No intent to disrespect John; he did fine in the surrounding movies of 1999 and 2001 anyway. And while we are on that topic, I think Serendipity is at least 70% of a Say Anything (and Rob Gordon is about 60% of a Lloyd Dobler). If you don't know what I mean, you probably ought to see both movies at some point (I have seen them 41 and 111 times, respectively -- more proof that I suck as a pessimist?). Take someone you love or someone you think you might love at some point. If you know what I mean (if you don't, then you should definitely see the movies).

    Nothing else to post about really. Just enjoying the day, and the noodles.

    One more thing -- the movie Meet Joe Black is on right now. Now I think Claire Forlani is just fine as an actress (though I probably liked her better in Antitrust, a movie which did not do much for me otherwise). Anyway, wouldn't Meet Joe Black have been just two hours (rather than the 3:10 it clocked in as) if more of this woman's meaningful pauses and stares had ended up on the cutting room floor? I don't think the movie would have suffered any for it.

    I'm just saying....
