Blog - Title

August, 2005

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    What the hell does HTTP_ACCEPT_LANGUAGE mean?

    • 18 Comments

    The question is a simple one: what the hell does HTTP_ACCEPT_LANGUAGE mean?

    The answer is also quite simple: IT DEPENDS.

    The user is sending information from their browser, and it could mean any of the following things:

    • language/locale to use for formatting/collation preferences
    • language/locale to use for the UI
    • language/locale about which to provide content
    • location for which to provide information

    Now sometimes all of the settings will be the same. It is obviously more common for that to be the case. But it is a huge Internet and frankly there are a lot of times that they're not the same. It is unfortunate that all of these different items have to be filtered through a single setting across all of the browsers. But life is about dealing with things as they are, not as we want them to be.

    It is therefore crucial to recognize that a user may have any of these in mind, and to be careful not to assume too much based on the HTTP_ACCEPT_LANGUAGE -- giving them an easy way to change the settings if you assumed more than they wanted you to....
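    As a rough illustration of treating the header as a hint rather than a verdict, here is a minimal Python sketch. The function names, the `supported` list and the `user_override` parameter are all hypothetical, and the parsing is simplified (it ignores wildcards and malformed input beyond the basics); the point is only that an explicit user choice wins over whatever the browser sent.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs from an Accept-Language value, best first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Sort by descending q, then by original position in the header.
        prefs.append((-q, position, tag))
    prefs.sort()
    return [(tag, -neg_q) for neg_q, _, tag in prefs]


def choose_ui_language(header, supported, user_override=None):
    """Pick a default language, but let an explicit user choice win."""
    if user_override in supported:
        return user_override
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return supported[0]  # hypothetical site default
```

    Note that this only guesses a UI language; formatting, content language and location would each need their own setting.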

     

    This post brought to you by "Ǯ" (U+01ee, a.k.a. LATIN CAPITAL LETTER EZH WITH CARON)

  • Sorting it all Out

    You probably don't want to use Microsoft's code page 20269

    • 5 Comments

    Yes, there is a problem with code page 20269. And there has been, since birth.

    It is intended to be an implementation of ISO-6937. Unfortunately it cannot really be used for its intended purpose, to provide a form for combining characters for Latin-1. The ISO standard works as follows:

    ISO 6937 has for characters single letters and combinations of a letter with a diacritic. Only those which occur in a list are legal, the "repertoire" of ISO 6937. The diacritic shall preceed the letter, but is no character in itself. A diacritic as a free-standing character is created by coding a space behind the byte that represents the "diacritical mark". In this way some characters are coded with one, others with two bytes. The number of codeable characters is finite, basically the 333 characters defined in the repertoire.

    (The scheme of 6937 was abandoned in favor of the ISO-8859 scheme, which uses precomposed characters.)

    Now both Windows and Unicode do things the other way around (base character followed by combining character). In order to properly handle conversions for ISO 6937, any of the following characters would have to be swapped with the character preceding it when calling WideCharToMultiByte(20269,...) and with the character following it when calling MultiByteToWideChar(20269,...):

    Unicode   cp 20269   Character
    U+0306    0xC6       Combining Breve
    U+0307    0xC7       Combining Dot Above
    U+0308    0xC8       Combining Diaeresis
    U+030a    0xCA       Combining Ring Above
    U+030b    0xCD       Combining Double Acute
    U+030c    0xCF       Combining Hacek
    U+0327    0xCB       Combining Cedilla
    U+0328    0xCE       Combining Ogonek
    U+0332    0xCC       Combining Low Line

    Technically, we should only do this for chars within the legal list of 333 chars, all others should fail to convert properly. But the simple reversal above might be enough....
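    The "simple reversal" can be sketched in a few lines of Python. This is only an illustration of the reordering step, working on Unicode strings; it is not what code page 20269 does, it does not check the legal repertoire of 333 characters, and the function names are made up for the example. The mark list is the one from the table above.

```python
# Combining marks from the table above, mapped to their cp 20269 bytes.
ISO6937_MARKS = {
    "\u0306": 0xC6, "\u0307": 0xC7, "\u0308": 0xC8,
    "\u030a": 0xCA, "\u030b": 0xCD, "\u030c": 0xCF,
    "\u0327": 0xCB, "\u0328": 0xCE, "\u0332": 0xCC,
}


def unicode_to_iso6937_order(text):
    """Move each listed mark in front of the base character it follows."""
    out = []
    for ch in text:
        if ch in ISO6937_MARKS and out and out[-1] not in ISO6937_MARKS:
            base = out.pop()
            out.append(ch)
            out.append(base)
        else:
            out.append(ch)
    return "".join(out)


def iso6937_order_to_unicode(text):
    """Move each listed mark behind the base character that follows it."""
    out = []
    pending = []
    for ch in text:
        if ch in ISO6937_MARKS:
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending = []
    out.extend(pending)  # a trailing mark with no base is kept as-is
    return "".join(out)
```

    A string such as "Zu" + U+0308 + "rich" comes out with the diaeresis ahead of the "u", and the second function puts it back.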

    Since 20269 is a table based code page, this kind of special handling is not being done and really cannot be done; to fix, a new (algorithmic or 'baby DBCS') code page would have to be defined. And we are not defining new code pages, so this one is going to need to be filed under the "do not expect useful results without doing a lot of work yourself" category....

    Not the end of the world or anything, but it seemed worthy of at least a blog entry. :-)

     

    This post brought to you by "A" (U+0041, a.k.a. LATIN CAPITAL LETTER A)

  • Sorting it all Out

    The Milk Bet lives!

    • 30 Comments

    They are never going to learn this one.

    Marlins suspend batboy for milk-drinking dare

    I'll ignore the suspension issue and talk about the "milk bet" here.

    Now this particular bet has been around for a long time. I first heard about it when I was working for the Access team, probably around eight years ago.

    Heath, a fellow developer on the team with an office right next door to mine, was certain that he could drink a gallon of milk in an hour without throwing up. He went to CalTech and because of this had a very logical way of thinking this through. He could easily drink one of those one pint milk containers in just a few minutes. So the gallon could be polished off easily since it really is just eight of those one pint containers.

    (for those outside the US, there are four quarts to a gallon and two pints to a quart!)

    And he had a whole hour to do it, so he could take his time and make it with no problem, right?

    Well, actually, it is wrong.

    Fellow developer Nicholas Shulman volunteered the explanation for why the bet is never won as Heath was running to the restroom to avoid throwing up in the conference room to which we had all adjourned.

    "A stomach," Nick explained with the just the right inflection for irony, "is about a half a gallon."

    Milk needs time in the stomach to be broken down before it can go on -- it does a body good, but it needs a little time to do that good. And there is simply no way to break the milk down fast enough to take in a full gallon in an hour. If you try to do so, your body will rebel and if you try and force the issue, your stomach will settle the argument for you.

    Perhaps some future CalTech or MIT student who has read this blog will either refuse the bet, or anticipatorily buy something that will break down the milk and drink a bunch of that right before the bet starts.

    (via Spencer)

  • Sorting it all Out

    Why I think the thread locale really stinks

    • 14 Comments

    I do not like the thread locale.

    Yes, GetThreadLocale and SetThreadLocale are two of the many NLS functions that the GIFT team supports.

    And yes, if we are to look at the functions we own as if they are our children then we are supposed to love them all.

    But in that case, I guess I am a lousy parent (if you will recall, I think that SetLocaleInfo really stinks, too).

    First of all, there are the weird dependencies in USER32 and SHELL32 that probably ought to be using the user locale but instead use the thread locale and fall back to the system locale if something goes wrong.

    Second of all, there is the poor story in GetThreadLocale:

    Return Values

    The function returns the system's default user locale.

    Remarks

    When a thread is created, it uses the system default user locale. The system reads the system default user locale from the registry when the system boots. This system default can be modified for future process and thread creation using Control Panel's International application.

    Since it always returns the thread locale, which starts its life as the user locale (set in Regional Options) but can be changed by a call to SetThreadLocale, you can make a fair case that both parts of the text are lousy.

    Third of all, there is the worse story in SetThreadLocale:

    Return Values

    If the function succeeds, the return value is a nonzero value.

    If the function fails, the return value is zero. To get extended error information, call GetLastError.

    Remarks

    When a thread is created, it uses the system default thread locale. The system reads the system default thread locale from the registry when the system boots. This system default can be modified for future process and thread creation using Control Panel's International application.

    The SetThreadLocale function affects the selection of resources that are defined with a LANGUAGE statement. This affects such functions as CreateDialog, DialogBox, LoadMenu, LoadString, and FindResource, and sets the code page implied by CP_THREAD_ACP, but does not affect FindResourceEx.

    Windows 2000/XP: Do not use SetThreadLocale to select a UI language. To select the proper resource that is defined with a LANGUAGE statement, use FindResourceEx.

    Where do I start? The function should return an LCID on success -- the previous thread locale! -- not a BOOL. Just like functions such as SetWindowLong do. Oh well, I guess that is not the end of the world.

    There is that same type of silly text about the 'system default thread locale' which is a beast not found in nature. The notion that it is read from the registry on boot is also crap. It is based on the user locale. Always.

    Then there is the text about how resource loading is affected -- and a warning sans explanation to not use that functionality on Win2000 and later. Since it is not supported on Win9x, what it is really saying is "good for only NT 3.1, 3.5x, and 4.0". In other words a warning that the function is not useful on modern platforms unless you want to affect the Shell and User subsystem in strange ways. Although it does not bother to say so.

    Fourth of all, the fact is that resource loading is incredibly complicated, in part because of this questionable functionality. It is not based on the user locale, and it is not in essence based on the thread locale unless you change it. In which case it suddenly is based on the thread locale. Unless it is based on the UI language -- which it should always be. I swear you need a spirit dancer and a Ouija board to know what resources load if you start mucking with the various locale settings, and that is mostly because of this weird setting that hovers between UI language and user locale -- the thread locale.

    On older platforms it is worse -- the system locale is the one that is used except when you set the thread locale (even though the thread locale is initially the user locale). I guess that is about the same as Win2000 and later, just substitute system locale for UI language.

    This is by the way (now that I think about it) the first reasonable explanation for the strange Visual Basic <= 6.0 behavior where in the IDE the user locale is the language used for resource loading, even though in the compiled application that would not work -- VB was setting the thread locale to the user locale and confusing millions of VB developers with this never-before-now-fully-explained behavior. Geez.

    Anyway, we document that developers should use LOCALE_USER_DEFAULT and not GetThreadLocale when they are trying to respect the user's preferences.

    If you ask me, we should treat every case where the thread locale is currently used as a bug to be fixed and not as a legacy behavior to be coddled. We have been slowly breaking the behavior anyway with each version without explaining why since it was already broken, so why not just cut the cord and stop using this dastardly functionality?

    The thread locale really stinks, after all!

     

    This post brought to you by "ײ" (U+05f2, HEBREW LIGATURE YIDDISH DOUBLE YOD)

  • Sorting it all Out

    Unicode text in a VB6 RichTextBox?

    • 5 Comments

    Galit Avieli asked (via email):

    My name is Galit, we are a few students from Israel working on a project using visual basic. We ran into a problem we're dealing for a few days now. Searching for a solution on the web, we saw a response you wrote in a forum that answered our problem, only it referred to Microsoft Access. We'll be more than happy if you'll help us since we can't find the answer to the problem elsewhere.

    Our problem is:

    I'm writing in RichTextBox under "ARIAL" font, when I switch (alt-shift) to Hebrew the font is changed to "ARIAL (Hebrew)". When I send the text to TextBox the fonts are changed to "ARIAL" and on the screen appear unreadable chars.

    I would like to know how can I send the proper font name so that it would stay as "ARIAL (Hebrew)" (exactly like Microsoft WordPad is working).

    That was an example, our problem is wider and refers to all languages (not just Hebrew).

    Again, it would be a great help and would save months of work.

    Thanks a lot and have a wonderful weekend

    Galit Avieli
    Israel

    Unfortunately the answer is not a good one. :-(

    In this particular case, at the point where you switch to the Hebrew keyboard layout, the RichEdit control underneath the RichTextBox notices the switch and changes the charset of the underlying font to Hebrew (this is also what happens in WordPad).

    However, when text is sent to the control, VB does not keep it in Unicode; instead it sets the text using methods that convert it to the default system code page, which may or may not contain the characters in question. The RichEdit control is smart enough to know that the text it is being sent is not, in fact, Hebrew text. So it converts the font back.

    The same problem will happen for any text that is not on the system default code page....

    Unfortunately, as I talk about in Chapter 6 of my book (no longer in print, sorry!), the answer to the question "is VB <= 6.0 Unicode or not?" is not a simple one.

    The answer is to not use the MS-provided RichTextBox ActiveX control; instead use one of the controls specially designed to keep Unicode text as it is, like the UniToolbox controls from Woodbury Associates, Ltd. The intrinsic controls and the VB-provided ActiveX controls for the most part will not do this.

    This problem is for the most part solved with .NET, though in some cases Win9x will not see the right "Unicode" behavior due to use of underlying OS controls. But that is a subject for another post, on another day....

     

    This post brought to you by "ג" (U+05d2, a.k.a. HEBREW LETTER GIMEL)

  • Sorting it all Out

    Suits you to a _T()

    • 7 Comments

    The other day, Jeremy asked me:

    I thought that with your wealth of unicode knowledge, you may be able to answer a few questions for me.

    In a C/C++ program, is it necessary to wrap single character conversions in a _T( ) macro?

    For instance...

    TCHAR tch = _T('S');

    The MSVC compiler happly converts the literal 'S' to a double byte '\0''S' for UNICODE builds, so that the following appears to compile fine...

    TCHAR tch = 'S';

    I'm unclear if the compiler is actually substituing L'S' for 'S', or simply promoting the value of 'S' to a double byte.

    Is there any case in which a single-byte character has a unicode representation that is not a simple double byte promotion?

    If you are not dealing with both Unicode and non-Unicode builds of a program, then all of the _T()/TEXT() macro stuff as well as all of the TCHAR stuff is fairly superfluous. As I mentioned a few days ago, new functions NLS adds to Vista are not going to have non-Unicode versions added (a trend started in Server 2003).

    To answer the specific question about whether the macro is required (and keeping the last paragraph in mind), I would always suggest using the L prefix on Unicode characters and strings, even though the compiler seems to not feel the need to use it for characters. It is definitely still needed any time you specify a string literal, and the consistency seems like a good thing, doesn't it?

    For the ASCII range, you will not find a difference between that "double byte promotion" and a Unicode representation. However, for anything single byte that is outside of ASCII but inside of the default system code page, I would go so far as to say that the "promotion" would usually be wrong, and possibly also subject to different interpretations depending on what the default system code page happens to be. If you are going to write UNICODE/_UNICODE applications, then it seems best to keep them using Unicode everywhere....
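    To make the code page dependence concrete, here is a small Python sketch. It models the "promotion" as plain zero-extension of the byte value, which is an assumption for illustration and not a statement about what any particular compiler does with a non-ASCII char literal; the helper names are made up.

```python
def promote(byte_value):
    """Zero-extend one byte to a code point, as in 'S' -> 0x0053."""
    return chr(byte_value)


def decode_with(byte_value, code_page):
    """Interpret the same byte through a real code page table."""
    return bytes([byte_value]).decode(code_page)


def compare(byte_value, code_pages=("cp1252", "cp1251", "cp1253")):
    """Map each code page to True if promotion matches the real conversion."""
    promoted = promote(byte_value)
    return {cp: decode_with(byte_value, cp) == promoted for cp in code_pages}
```

    For 0x53 ('S') every code page agrees with the promotion. For 0xE9 the promoted value is U+00E9, which matches code page 1252 but not 1251 (where the byte is U+0439) or 1253 (U+03B9). And for 0x80 even code page 1252 disagrees, since that byte is U+20AC there.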

     

    This post brought to you by "S" (U+0053, a.k.a. LATIN CAPITAL LETTER S)

     

  • Sorting it all Out

    Vietnamese is a complex language on Windows

    • 17 Comments

    Back in May of 2004, Quan Nguyen sent a message to Dr. International about Vietnamese collation in Windows and the .NET Framework:

    I tried to sort Vietnamese characters according to Vietnamese collation rules, as precribed in http://vietunicode.sourceforge.net/charset/vietalphabet.html. However, .NET Framework's built-in sort order for CultureInfo("vi-VN") seems not correct. What should I do to get it to sort according to Vietnamese alphabetical order? 

    This was not the only place that this question was asked -- Quan had asked this same question on several newsgroups and other places. We requested some more details, did the investigation, and were able to report on the claim -- he was right, there were a few letters that did not sort properly. In the end, the problem basically consisted of the uppercase and lowercase versions of the following letters:

    Of course since these letters are in Unicode and are used by several other languages, they have some default weights -- but they are not in the Vietnamese exception table. And their weights in the default table are not completely correct....

    Now no one had reported this problem before, so hopefully these are letters that are not used often in Vietnamese in situations where the small but definite differences in collation would be noticed.

    Which is not to say it is not a bug or that it should not be fixed -- it definitely is.

    But it is to perhaps explain why it took so long for someone to report to Microsoft a bug that has been in the code page and sorting tables since the very first Vietnamese enabled versions of Windows....

    Now Windows code page 1258 has its own set of problems here, because the above characters are not in cp1258, either. Well, they sort of are as combining characters since the code page has U+0300, U+0301, and U+0303 on it -- but the conversion to and from Unicode of the above characters can be quite nightmarish, for the reasons I mentioned when I pointed out a few of the gotchas of MultiByteToWideChar. We would have had to include them as the precomposed form listed above, and there are not enough free slots to do so (even if we were able to modify code pages, which we are not, as I explained when I wrote about why we cannot change the code pages).
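    A quick way to see the round-trip problem is with Python's built-in cp1258 codec. The letter used below (U+1EF9, y with tilde) is chosen purely for illustration, and the helper name is made up. The precomposed form has no slot in the code page, the decomposed form does, and what comes back from decoding is the decomposed form unless you normalize it yourself.

```python
import unicodedata

PRECOMPOSED = "\u1ef9"                                   # y with tilde
DECOMPOSED = unicodedata.normalize("NFD", PRECOMPOSED)   # 'y' + U+0303


def try_encode(text, codec="cp1258"):
    """Return the encoded bytes, or None if the codec has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


encoded = try_encode(DECOMPOSED)        # base letter byte + combining byte
round_tripped = encoded.decode("cp1258")
```

    Here `try_encode(PRECOMPOSED)` is None, while `round_tripped` compares unequal to `PRECOMPOSED` until it is passed through `unicodedata.normalize("NFC", ...)`.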

    So let's just assume that cp1258 is about as limited in use as all of the rest of the attempts at the other (at last count 42!) 8-bit encodings of Vietnamese are (they all have problems due to the fact that there are too many characters or not enough slots to put them in) and stick with Unicode....

    Getting back to collation, this particular problem that Quan Nguyen reported is fixed in the updated sorting tables in Vista Beta 1. It could not be fixed in earlier versions of Windows or the .NET Framework, as it requires a major version change for Vietnamese to change the weights of code points that already have weights defined, so Vista is our first chance to make the fix (Whidbey's sorting tables are not being updated so the fix could not be made in .NET 2.0).

    On a happier note, the font story for Vietnamese has been really good on Windows for a while now, for all of these various letters.

    And the Vietnamese LIP was released in March 2005 which is also pretty awesome.

    It just took a little while for the NLS side of GIFT to catch up with everyone else, that's all. :-)

     

    This post brought to you by "Ý" (U+00dd, a.k.a. LATIN CAPITAL LETTER Y WITH ACUTE)

  • Sorting it all Out

    Mitigation tools for IDN security problems

    • 16 Comments

    Back in January, just before the flap at the hacker's convention with the paypal.com link that used a Cyrillic 'a' to prove that IDN without a way to ferret out phishing attacks is a problem, I posted my own post entitled International Domain Names? The sign on the door says 'Gone Phishing'....

    It was an interesting flap because the RFCs for Internationalized Domain Names clearly point out the dangers and talk about the need to do some extra work to avoid security issues, but several browsers jumped ahead to support them and then just as quickly rushed out to turn them off by default.

    Folks at Microsoft, who knew about the need to do work here first, did not jump ahead without looking. And Microsoft was complimented for not jumping in too quickly. :-)

    Unicode has moved in to assist with Unicode Technical Report #36: Unicode Security Considerations.

    And now Microsoft has some functions to help ISVs jump in (functions that can and will also be used in future versions of Microsoft products!).

    Here it is: Microsoft Internationalized Domain Names (IDN) Mitigation APIs 1.0.

    From the overview:

    The "Internationalized Domain Names Mitigation APIs" download includes several API functions to convert an IDN to different representations, as well as several API functions specifically intended to allow applications to mitigate some of the security risks presented by this technology. The functions IdnToAscii, IdnToUnicode, and IdnToNameprepUnicode each convert an IDN string to a particular form. The functions DownlevelGetLocaleScripts, DownlevelGetStringScripts, and DownlevelVerifyScripts allow applications to verify that the characters in a given IDN are drawn entirely from the scripts associated with a particular locale or locales. However, these functions are only helpers; applications have still to perform comprehensive threat modeling and create appropriate mitigation for these threats.

    Also included are the Unicode normalization APIs IsNormalizedString and NormalizeString, which are used by the mitigation APIs.

    This package is supported on XP (Service Pack 2 or later) and Server 2003 (Service Pack 1 or later). And differently named functions will also be in Vista!

    For info on the Normalization API functions, look here.

    For info on the IDN API functions, look here.

    The cool functions in the package to help with the mitigation (they make use of ISO 15924 for their script definitions):

    You can use these functions as part of your strategy for dealing properly with internationalized domain names -- warning users of potentially dangerous links to information.
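    To give a feel for the kind of check involved, here is a rough Python sketch. It does not call or emulate the IdnTo* or Downlevel* functions; it uses the first word of each character's Unicode name as a crude stand-in for its script, which is an assumption that only holds for simple cases. The function names are made up, and the spoofed host name is the Cyrillic 'a' example mentioned above.

```python
import unicodedata


def rough_scripts(label):
    """Crude script guess: first word of each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(hostname):
    """True if any dot-separated label mixes letters from several scripts."""
    return any(len(rough_scripts(label)) > 1 for label in hostname.split("."))


def ascii_form(hostname):
    """ASCII (Punycode) form, using Python's built-in IDNA 2003 codec."""
    return hostname.encode("idna").decode("ascii")


SPOOF = "p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A
```

    `looks_mixed(SPOOF)` flags the label that mixes Latin and Cyrillic letters, and `ascii_form(SPOOF)` shows the "xn--" name that would actually be looked up. A real mitigation strategy would still need the threat modeling the overview talks about.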

    Awesome!

     

    This post brought to you by "а" (U+0430, a.k.a. CYRILLIC SMALL LETTER A)

  • Sorting it all Out

    Let's get vertical

    • 11 Comments

    (computerized apologies to Olivia Newton John)

    Dmilat asked (in the Suggestion Box):

    @-prefixed fonts

    If you try to manually type a font name like @Arial Unicode MS in MS Word font selection combo-box and then enter a text with some CJK hieroglyphs (make sure the font name did not change), those characters will be turned 90 degree. I believe this allows for vertical text layout that may be used by people from east asian countries. What is amazing that I failed to find any info on that in MSDN. Is it kind of undocumented feature ? Can you give more info on that ?

    This feature has been around for a long time, actually. Nadine Kano's book is the first place I had seen mention of it (see the mention here in Vertical Writing and Printing). An excerpt here:

    As the following illustration shows, displaying text vertically doesn't mean that you simply rotate an entire line of text by 90 degrees. Most characters remain upright, but others, such as those identified by arrows, change orientation.

    Fortunately, with Win32 you don't need to write code to rotate characters. To display text vertically on Windows, enumerate the available fonts as usual and select a font whose typeface name begins with the at (@) character. Then create a LOGFONT structure, setting both the escapement and the orientation to 270 degrees. Calls to TextOut are the same as for horizontal text.

    The Far East Win32 SDK contains a sample application called TATE (short for tategaki, meaning "vertical writing") which demonstrates how to create fonts and display vertical text. Figure 7-22 shows a sample file displayed in TATE using a horizontal font. Selecting a vertical font from the Font dialog box (see Figure 7-23 below) causes the text to be displayed vertically. (See Figure 7-24 below.)

    And so on. See the link for the full story. :-)

    There are probably other mentions in both the Platform SDK and MSDN, but it is harder to find them with symbols like @ usually being ignored in searches. :-)

     

    This post brought to you by "＠" (U+ff20, FULLWIDTH COMMERCIAL AT)

  • Sorting it all Out

    Not every code page value is supported

    • 4 Comments

    The other day, John Bates asked in the suggestion box:

    This suggestion is probably just a documentation update, but here goes.

    One of my applications (compiled for Unicode) allows the caller to specify a code page for output. During testing I found WideCharToMultiByte works for most CPs but it fails for 1200, 1201, 12000 and 12001. The "Code-Page Identifiers" page lists these as valid CP values, but my system's NLS key doesn't have any values for these CPs.

    Is there something that has to be installed for this to work, or is there another API (or series of APIs) that should be called instead?

    I think there's a need for a (simple) encoding-to-encoding conversion API!

    Regards,

    John Bates

    Well, I will have to take this apart one piece at a time. :-)

    Now, if there ever were a function to handle "code page" 1200, it would not be WideCharToMultiByte, which has the job of converting UTF-16 LE into a byte-based encoding of some type, and by no stretch of the imagination can "cp 1200" be considered such a thing. :-)

    I'll break that one piece at a time rule for the rest -- the other three "code pages", 1201, 12000, and 12001 (a.k.a. UTF-16 BE, UTF-32 LE, and UTF-32 BE), also fall into a similar rule. They are not byte based and thus really not something I would want to see us bend the WideCharToMultiByte and MultiByteToWideChar functions to do. It is (in my humble opinion) unfortunate that we went this route with the Encoding class in the .NET Framework, but that is not by itself a reason to mess up the model for Win32 NLS API functions....

    Further, there is no need to have a conversion with "code page 1200" since that is converting something to itself. If you want to convert an LPWSTR or a WCHAR * to an LPBYTE or a BYTE * then you can just use a cast and then you are done, no need to go through a conversion function. Just cast it and you are done....

    As for the UTF-16 BE, UTF-32 LE, and UTF-32 BE cases, Murray Sargent of Microsoft once explained to Asmus Freytag of Unicode fame (who accosted me at a Unicode conference to make a similar demand for UTF-32 support) that there was no need for this -- the conversions in question are macros and do not have to be full functions. I think Asmus mostly backed down after being out-accosted, but I very much appreciated the support. :-)

    The only useful excuse for functions in any of these cases would of course be to also handle validation (i.e. is it actual, valid Unicode), and I do not want to minimize that. But it is not a reason to back down from that model (in my opinion). Perhaps it is a reason for another function in the Win32 NLS API for these types of conversions, if there were a lot of customer requests that expressed such a need. We are not quite there yet, though; at this point those macros can still handle the immediate need....
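    For a sense of how small those conversions are, here is a Python sketch working on lists of integer code units. These are not the macros mentioned above, just the same arithmetic written out with made-up function names, and like a bare macro they do no validation: an unpaired surrogate is passed through untouched.

```python
def swap_utf16_byte_order(units):
    """UTF-16 LE <-> UTF-16 BE: swap the two bytes of every 16-bit unit."""
    return [((u & 0xFF) << 8) | (u >> 8) for u in units]


def utf16_to_utf32(units):
    """Combine surrogate pairs into single code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP unit, or an unpaired surrogate (not validated)
            i += 1
    return out


def utf32_to_utf16(code_points):
    """Split supplementary code points into surrogate pairs."""
    out = []
    for cp in code_points:
        if cp >= 0x10000:
            cp -= 0x10000
            out.append(0xD800 + (cp >> 10))
            out.append(0xDC00 + (cp & 0x3FF))
        else:
            out.append(cp)
    return out
```

    As for "code page 1200", in Python terms the equivalent of the cast is simply `text.encode("utf-16-le")`: the bytes are the string, with no table lookup involved.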

    Sorry, John. :-( But I will talk to someone about the doc issue here, in any case. :-)

     

    This post brought to you by "𐒑" (U+10491, a.k.a. OSMANYA LETTER MIIN)
    A character that is just as comfortable as U+10491 as it is as U+d801 U+dc91, because it is not self-conscious about its weight. :-)

  • Sorting it all Out

    what's with the anchor tattoo?

    • 2 Comments

    I have had a few people ask me for a little of the back story here....

    You see, track 9 of Aimee Mann's new album (The Forgotten Arm) is entitled 'That's How I Knew This Story Would Break My Heart' and the first verse goes something like this:

    I drew a picture of you
    You and your anchor tattoo
    And saw the face that I knew
    Covered in shame
    You drew a bird that was here
    A kind of sweet chanticleer
    But with a terrible fear
    That the cage couldn't tame

    That's how I knew this story would break my heart
    When you wrote it

    In the concept album, this is kind of a low point for our hero John and heroine Caroline, as the lyrics would kind of indicate -- they have gone their separate ways (several songs ago) and it looks kind of hopeless as a romance (things get a little better later, but clearly not in this song).

    Anyway, it turns out that Aimee got a large anchor tattoo on her right arm. This was just before her show at the Fillmore last weekend (the tattoo was just 'one of the things she did' in San Francisco that day, before the show).

    She was wearing blue jeans and a white tank top, so it was hard to miss the tattoo (I was behind the curtain, stage left, so I only got a few glimpses of it, when she was turning to switch guitars between songs; people who were on the other side of her probably got a better view).

    There was wide speculation that it was not real, but people have since then insisted that it is.

    It was kind of hard to understand the purpose of getting the tattoo -- the song, which I did like, is not one that is in the regular set they have been doing during the tour, and by report it has only been performed once in all those shows. And it is hard to fit it in with the themes of the album when it is sitting on her arm, although the fact that she actually does some boxing now confuses roles a bit. Perhaps that is why she did it, although it seems like a pretty huge step -- much bigger than just wearing a jacket and tie to shows, as she has been known to do. But she has a different look now, during shows. And you can't change out of a tattoo, after all -- so what happens the next time the look changes?

    On the other hand it is really not up to people who listen to music to understand everything the artists do, so that really is okay.

    Now when I hear the song, which was formerly one I liked a lot, wondering about the tattoo kind of distracts from the music itself (others have said the same thing, so I know it is not just me). Maybe it is to remind her of the song or some special meaning behind it, but maybe it is for the best that she is not playing it at shows, certainly the ones she wears tank tops to!

  • Sorting it all Out

    The Keyboard Convert Service

    • 14 Comments

    Some of you will remember a while back when Kate Gregory inspired me to talk about why sometimes the keyboard does not do what I tell it to!

    Sometimes you think you are typing in one keyboard, but it turns out you are typing in another.

    So some of the truly awesome GIFT team folks in Ireland decided that maybe there was something better that could be done about the problem than just blogging about it.

    Turns out they were right -- they created the Keyboard Convert Service, a free download that will do its darndest to fix the text in these cases!
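    For anyone wondering what "fixing the text" could mean, here is a toy Python sketch of the idea. It says nothing about how the Keyboard Convert Service actually works; the tiny mapping below covers just four keys of the US and Hebrew layouts and is only there for illustration, as are the names.

```python
# Four keys only: what the same physical keys produce on a US layout
# versus a Hebrew layout (shin, lamed, vav, final mem).
US_TO_HEBREW = {"a": "\u05e9", "k": "\u05dc", "u": "\u05d5", "o": "\u05dd"}
HEBREW_TO_US = {hebrew: us for us, hebrew in US_TO_HEBREW.items()}


def retype(text, mapping):
    """Re-map each character as if the same keys had been pressed on the
    other layout; characters not in the mapping are left alone."""
    return "".join(mapping.get(ch, ch) for ch in text)
```

    Typing the Hebrew word for "hello" with the US layout still active produces "akuo"; `retype("akuo", US_TO_HEBREW)` recovers the intended text.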

    I will be trying it out this week to see if they were able to integrate some of the feedback I gave them early in the project cycle. :-)

  • Sorting it all Out

    What if my strings are > 2 gb?

    • 24 Comments

    We do get our fair share of silly questions here in NLS.

    I should perhaps explain what I mean by silly. :-)

    I don't think I'd ever consider a question where somebody is asking about language and how it might work in a certain situation and call that silly. I mean, that's how people learn. It's the kinds of questions that I ask of native speakers and of linguists, and even if they smile or laugh I never get the sense that they are thinking me silly for the question.

    But today, somebody who is thinking about 64-bit Windows and who assumed that one day strings that are greater than 2 GB would be common looked at our signature for CompareString:

    int CompareString(
        LCID Locale,
        DWORD dwCmpFlags,
        LPCTSTR lpString1,
        int cchCount1,
        LPCTSTR lpString2,
        int cchCount2
    );

    and suggested that perhaps those int parameters containing the string lengths ought to be size_t instead.
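
    For illustration only: a caller that tracks its lengths as size_t already has to narrow them to int before handing them to a signature like the one above. A minimal sketch of that check follows, assuming nothing beyond standard C; length_fits_int is a hypothetical helper name made up for this example and is not part of the NLS API.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NLS API): returns 1 and stores
   the narrowed length in *out when cch fits in an int; returns 0 and
   leaves *out untouched when it does not. */
static int length_fits_int(size_t cch, int *out)
{
    if (cch > (size_t)INT_MAX) {
        return 0;   /* too long for an int cchCount parameter */
    }
    *out = (int)cch;
    return 1;
}
```

    A caller would run both lengths through a check like this and only then pass the resulting int values as cchCount1 and cchCount2.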

    Now I would like to forget about the argument that this is a public API that has been around since NT 3.1. It's obviously important here, and makes the suggestion a little bit silly, but not everyone really pays attention to what's in the NLS API or how long it's been there.

    I'd also like to forget about the argument that 2 GB strings are uncommon, because one day they may not be. Especially in the 64-bit world. There may be a perfectly valid reason to have huge strings.

    The real problem I have here, and what makes the question silly to me, is the notion that you need to do linguistic comparisons on strings that are greater than 2 GB in size.

    There is simply no way to justify this as a reasonable use of the collation functionality in the NLS API.

    Perhaps some of you may disagree with this notion, and I'll be curious how people respond to this post. If you are somebody who disagrees, please be sure to include information about your "reasonable example" so that people have a chance to appropriately judge the judgment being used. :-)

     

    This post brought to you by "§" (U+00A7, a.k.a. SECTION SIGN)

  • Sorting it all Out

    'Our move. What do you think?' I asked. 'Napalm' she said.

    • 13 Comments

    (nothing technical here, at least not for computers -- if the MS thing doesn't interest you then you may want to skip this one)

    I did lift and modify the title quote from the movie Barbarians at the Gate. George Roberts (played by Peter Dvorsky) is talking with Henry Kravis (well played by Jonathan Pryce) just after a meeting where they were stonewalled. The clock is ticking, and time is running out. So George asks the question and Henry makes clear that a slash and burn strategy is called for.

    A great movie that I enjoy a lot, and a scene that has been running through my head a lot over the last few days, which have had some fairly huge distractions....

    I had an unusual appointment with my neurologist on Monday (I think I had already mentioned that it was a fairly odd day, professionally). We were talking about recent declines in my MS that did not seem related to a specific clinical exacerbation. Now this is a disturbing trend, any way you look at it. If you want to reverse these things and not have them stay as permanent deficits, you have to act quickly.

    But the options for acting are limited -- there are just not very many conventional treatments available, and as unconventional as some people believe me to be, I am about as likely to start the snake oil rounds as I am to sprout wings and fly to Guam. I certainly am not going to jump into the exciting world of stem cell research. There is just not a lot out there....

    So my neurologist made a very responsible suggestion -- that it may be time to think about Novantrone.

    Now Novantrone is pretty much the napalm of the MS world -- an antineoplastic agent (basically chemotherapy), and you can watch how much the official site for the drug tries to tap dance about how much milder the drug is when used for MS than for cancer (whoever decided it was okay to sell it under the same brand name rather than a different one was not thinking too clearly that day, if you ask me).

    But even though it is the napalm treatment, it is pretty much one of the only treatment options that really exist for what they like to euphemistically call "worsening MS". It is the sort of thing I had already learned a lot about because I have a morbid fascination for such things. The party line about how it works is pretty harsh, even after the spin doctors have had their way with it:

    Novantrone works differently from other MS drugs. This difference provides hope for people who have experienced worsening symptoms of MS while getting other treatments.

    Other drugs used to treat MS (like interferon) "moderate" the immune system. You can think of it as "calming down" certain immune cells, but not completely reducing their numbers.

    Novantrone works by "suppressing" the immune system. The active substance in Novantrone (called mitoxantrone) affects DNA, a basic building block of all cells. You can think of Novantrone as killing certain cells in the immune system (called T cells, B cells, and macrophages). These are the cells thought to become abnormal and lead the attack on the myelin sheath—producing the brain or spinal lesions characteristic of MS.

    You may hear the term chemotherapy associated with Novantrone. For some people, the word "chemotherapy" triggers scary images. However, the word simply means using chemicals to treat disease.

    When used to fight cancer, chemotherapy drugs are given in doses that are strong enough to kill aggressively-growing tumor cells.

    However, when Novantrone is used to treat MS the goal is to inactivate or destroy any immune cells that are "misguided." When Novantrone is prescribed by a doctor for MS, the recommended treatment schedule is far less intensive than for cancer. The overall dosage is given less frequently over a longer period of time.

    Well, okay. But the list of side effects is the same, the worries about the potentially destructive effect on the heart are in there, the small (less than 1%) chance of getting leukemia is present, and the fact that the effects can still be a worry years after you have stopped the treatment is sobering. The Patient Information Leaflet goes into things that would be amusing under any other circumstances, like blue-green urine and the whites of the eyes turning blue. Some of the other risks are also interesting....

    But the fact that hits me hardest is not even one of the side effects -- it is that this drug has a lifetime maximum dosage of somewhere around 8-12 doses.

    This is where that morbid fascination thing kicks in -- a drug with such long-term destructive potential that there is not just a daily or weekly maximum but an actual maximum over the period of a lifetime.

    Wow. Or whatever the negative version of the word 'Wow' is.

    Am I really doing that badly?

    Of course that is the problem, and that is why it might be the right thing to try -- because I am not doing so badly that this would be a useless idea; it may well be the perfect time to knock this monkey off my back for a little, try to unseat the mostly unfettered ascendency of multiple sclerosis a bit.

    The other problem is a psychological one -- when the MS was not affecting me as much, I was in denial and everyone would go on about how great my attitude was and what a great example I was setting (and everyone has a friend or relative who is doing worse than I am). But now that it is affecting me and I finally may be reaching a more real and healthy attitude, a lot of people don't know what to do with me and have trouble thinking of me as setting a great example for anybody.

    I used to think that just about everything I ever needed to know I learned from the movie Apocalypse Now (yes, many will find that to be disturbing!), but this time Coppola failed me -- despite Robert Duvall's memorable Kilgore quote, I am not loving the smell of this napalm in the morning.

    On the other hand, I am finding comfort in Captain Willard's (Martin Sheen's) words, modified for this situation. "I took the Novantrone consult. What the hell else was I going to do? But I really didn't know what I'd do when I found if it was right."

    So we'll see what happens.

    This post may not have been everyone's cup of tea, so I am sorry for that. But it is hard for me to completely separate the various pieces of my life since they are all me, in the end. One way or another I do have to sort it all out....

  • Sorting it all Out

    Yellow and blue make green

    • 8 Comments

    WARNING - at least one regular reader of this blog has told me that this post contains TMI (too much information). If you are easily offended, you probably shouldn't read this post -- on the other hand if you're easily offended, you probably shouldn't read this blog!

    This is something I learned back in kindergarten finger painting, and Glad bags have had the slogan for many years.

    But today, I got to see it firsthand in a way that I have never seen it before....

    You see, I received my first Novantrone infusion today. All I can say is that this is a very very blue drug. Like blue enough to make the sky feel self-conscious about its color.

    A little while ago, I went to the bathroom. Yada, yada, yada and without further ado I can verify via the equation in the title that the Novantrone is in my system. :-)

    Getting the infusion done was a little bit of an adventure, mainly because the nurse practitioner had suggested that I should show up an hour in advance but by the time they started the infusion they were running an hour behind. I was spending a little more time there than I wanted to, a pill made no less bitter by the fact that the Auto-Pay system did not recognize the parking validation and I was charged $24 for the time I was parked.

    Ah well, no biggie. From now on, if they still haven't fixed the elevator to the third floor from the Triangle garage, I'll just use the valet parking....

    Feeling good so far, and I haven't checked the whites of my eyes to see if they're still white. Maybe I should ask somebody, if I can figure out a way to ask without sounding really really strange or like I'm using some type of pickup line!
