Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
I used to write a bunch of wizards and add-ins for Microsoft Access. One of the things that these add-ins include is a USysRegInfo table that instructs the Microsoft Access Add-In Manager on what registry keys to add so that the wizard can be installed.
But that Add-In Manager has a flaw.
If you mark registry keys as needing to be written to HKEY_LOCAL_MACHINE (which Access documents you must do to get the wizard or add-in installed properly) and the user who runs the Add-In Manager has no permission to write there, the result is a not-so-helpful error message (translated into your local language) that never mentions permissions.
So now people will send email to me complaining about the error my wizard/add-in puts up when they try to install it.
How hard would it be for the error message to say that the problem has to do with permissions, and is something to take up with your administrator rather than with the developer of the add-in? Instead it says very little, which naturally leads the poor customer to assume that the error is the fault of the person who developed the free wizard.
Hopefully someone who is involved with Microsoft Access will pass this feedback along for the next version of Access and help out the poor ISV who is being blamed for something that is not their fault! :-(
And even more hopefully, this problem will be fixed some time soon. :-)
Yet another 'New in Vista Beta 1' post!
Now, I answer a lot of questions in this blog: some that people ask me directly, a lot of others that no one has asked but that I personally find interesting, and many that people have asked in the past.
Back in November of last year, I answered one of my favorite questions: "Why is my Korean text in random order?"
This was back in the early days when I was a bit more popular, so even though none of the people commenting on the post had actually witnessed the issue themselves, they found it kind of interesting (I have since mostly slipped into something a bit more obscure, except when I have art that has women falling head first off of mechanical bulls!).
:-)
I am not complaining, mind you. But it is hard not to notice that my posts about any topic other than internationalization seem to draw about 2-20 times the interest!
On top of that, earlier today I looked at an advance copy of a soon-to-be-released book on internationalization. It is probably going to run well over 500 pages, only 5-6 of which are about my favorite topic (collation). It manages to split collation and string comparison into two different topics, spends two pages talking about alternate sorts, and covers almost none of the topics I go on about here as real concerns for internationalization in Windows and the .NET Framework. And I think the author might read this blog!
Some days it does not pay to get up in the morning.
And that was just the technical stuff; the non-technical stuff was just as helpful (more on this in another post I'll do later).
Nevertheless, I carry on. Someone likes what I am doing here, I'm sure of it. And I get to say it all my way, too. Sometimes people point out bugs, and other times I find bugs myself while posting. Which is undeniably cool. Maybe a year from now a Google search will dig up an answer to a question that helps someone save the day or whatever. And all of that is really good enough for me.
But every once in a while (and here is where I pop the stack a bit to the original purpose of the post) I get to post about something obscure but fun that no one out there in the world knows about yet.
A whole bunch of people read about the odd use of the word linguistic when I answered that other question (What does "linguistic casing" mean?). At the time I proposed that we could have called the LCMAP_LINGUISTIC_CASING flag LCMAP_UNICODE_SIMPLE_CASING and been just as close to what was actually going on with the flag. But this sells the flag a little short: it handles Turkic casing, after all, and it takes many of the lookalike symbols in Unicode that are identically shaped to Greek and other letters and converts them to those letters. These are all operations that are sensible linguistically, even if they are not a good idea in filesystems and other less linguistic operations.
And I guess that is kind of linguistic.... well, more "language-like" than "computer-like".
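To make the Turkic part a little more concrete, here is a minimal Python sketch of what Turkic-aware casing has to do differently from the default Unicode mappings. It only illustrates the behavior; it is not how LCMapString implements it, and the helper names turkic_upper and turkic_lower are hypothetical.

```python
# Default Unicode casing pairs i with I, which is wrong for Turkish and Azeri:
# there, dotted i pairs with U+0130 (İ) and dotless U+0131 (ı) pairs with I.
TURKIC_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}
TURKIC_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}


def turkic_upper(s: str) -> str:
    """Uppercase s with Turkic rules for the i/I letters (hypothetical helper)."""
    # Map the special letters first; the default mapping handles everything else.
    return s.translate(TURKIC_UPPER).upper()


def turkic_lower(s: str) -> str:
    """Lowercase s with Turkic rules for the i/I letters (hypothetical helper)."""
    return s.translate(TURKIC_LOWER).lower()


if __name__ == "__main__":
    # Default casing: "i" becomes plain "I", and "İ" lowercases to two
    # characters, "i" followed by U+0307 COMBINING DOT ABOVE.
    print("istanbul".upper())        # ISTANBUL
    print(len("\u0130".lower()))     # 2
    # Turkic casing keeps the dot (or its absence) where it belongs.
    print(turkic_upper("istanbul"))  # İSTANBUL
    print(turkic_lower("DIŞ"))       # dış
```

The sketch only special-cases the four letters i, ı, I and İ. The full Turkic rules in Unicode's SpecialCasing.txt also include context-sensitive mappings (such as dropping U+0307 after a capital I when lowercasing) that it ignores.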
And in Windows Vista, this use of the word "linguistic" will continue with three new flags, all of which can be used alongside the other flags in CompareString and LCMapString, all of which are available in Vista Beta 1, and all of which will be documented in the Longhorn SDK as soon as the part that covers Win32 is available:
LINGUISTIC_IGNORECASE -- Could have been called NORM_THEREALANDACTUALIGNORECASE, as it does what NORM_IGNORECASE ought to do and only masks the case information in scripts that actually have a notion of case.
LINGUISTIC_IGNOREDIACRITIC -- Could have been called NORM_WHATWEREALLYMEANTFORIGNORENONSPACE, as it does what NORM_IGNORENONSPACE was really meant to do and only masks the diacritic weight for the small range of scripts where an actual, European-style notion of diacritics is used (which is a great way to address that issue I mentioned earlier that affects Korean and other languages!).
NORM_LINGUISTIC_CASING -- The flag that will allow comparisons on Win32 to handle Turkic case properly, whether or not either NORM_IGNORECASE or the new LINGUISTIC_IGNORECASE is specified (something that could not be done with the existing flags due to the breaking effect that would have on existing code involving the filesystem and other situations).
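The difference between masking every nonspacing mark and masking only European-style diacritics can be sketched in Python. This is a rough approximation under a loudly stated assumption: it treats "the base letter's Unicode name starts with LATIN, GREEK or CYRILLIC" as the test for a European-style diacritic, whereas Windows uses its own sorting tables. Devanagari appears only as an example of a script where a nonspacing mark (the vowel sign U in कु) is part of the spelling rather than an accent. Both function names are hypothetical.

```python
import unicodedata

# Hypothetical heuristic: scripts assumed to have a "European-style" notion
# of diacritics. This is NOT the rule Windows uses internally.
EUROPEAN_PREFIXES = ("LATIN ", "GREEK ", "CYRILLIC ")


def strip_all_nonspacing(s: str) -> str:
    """Drop every nonspacing mark, roughly in the spirit of NORM_IGNORENONSPACE."""
    decomposed = unicodedata.normalize("NFD", s)
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", kept)


def strip_european_diacritics(s: str) -> str:
    """Drop nonspacing marks only when they follow a Latin/Greek/Cyrillic base."""
    out = []
    strip_marks = False  # decided by the most recent base (non-mark) character
    for c in unicodedata.normalize("NFD", s):
        if unicodedata.category(c) == "Mn":
            if not strip_marks:
                out.append(c)  # keep marks that are part of the spelling
        else:
            strip_marks = unicodedata.name(c, "").startswith(EUROPEAN_PREFIXES)
            out.append(c)
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(strip_all_nonspacing("Ångström"))       # Angstrom
    print(strip_european_diacritics("Ångström"))  # Angstrom
    # The Devanagari vowel sign survives only in the script-aware version.
    print(strip_all_nonspacing("\u0915\u0941") == "\u0915")               # True
    print(strip_european_diacritics("\u0915\u0941") == "\u0915\u0941")    # True
```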
Now, none of these three operations are what a linguist would actually call linguistic. But all three certainly give collation on Win32 behavior that is more linguistically appropriate than the existing support on Windows. Which is pretty awesome....
And it is pretty darn linguistic if you accept the tap-dancing act above. Which is a much better dance than the one done about the naming of the ANSI code page, or the Visual InterDev product! :-)
This post brought to you by "İ" (U+0130, a.k.a. LATIN CAPITAL LETTER I WITH DOT ABOVE)
This last week, Dean Harding asked in the suggestion box:
Hey Michael, after all these years of reading your blog I finally got a question for you (your topics have always been so well-covered that I never needed to suggest anything before, but now I got a specific problem which I hope you can help me with :)
Anyway, one of the most interesting parts of my job is that I get to do a lot of interfacing with SMS (and I use the term "interesting" as in that old Chinese proverb ;) and one of the things about SMS is that it has a very limited character set.
Now, one of the new applications we're setting up is essentially a database of live events (shows, bands, DJs, etc) which you can access via the web and via SMS. Occasionally, however, you'll get a band with non-US-ASCII characters in their name (most popular are of course the famous Umlauts!) but of course accented characters and so forth cannot be displayed in SMS.
Now, we don't want to miss out the Umlauts and such for the web interface, but for SMS we don't have much choice. So there's a couple of solutions. First is we have two fields in the database, the "web" name and the "sms" name - but that's no good cause it means we have to keep both up-to-date.
The solution I was hoping to go for was to do a simple run through an Encoding.GetBytes followed by Encoding.GetString with a US-ASCII encoding. My hope was that this would be equivalent to WideCharToMultiByte followed by MultiByteToWideChar /without/ the WC_NO_BEST_FIT_CHARS flag which would convert all the accented characters to their non-accented equivalents. But that doesn't seem to be the case - they get converted to ?'s which is no good.
I was hoping for a .NET-only solution, but it looks like I'll have to p/invoke the WideCharToMultiByte/MultiByteToWideChar calls. Unless you've got some good news for me :)
Well, I did start posting in November of last year, so technically it has been "years", but it really has been less than a full year that I have been blogging, Dean. :-)
But I can definitely speak against relying on the encoding support directly for the plan here -- mainly because the "best fit" support in the Win32 encoding API is not a reliable way to remove all of the diacritics!
Offhand, I would say the best way is the Stripping Diacritics... post I did this last February, which will handle this case quite well, and quite a bit more completely than the Win32 encoding APIs used in concert would.
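For Dean's SMS case, here is a minimal Python sketch of the decompose-and-drop-the-marks idea: decompose each name, throw away the nonspacing marks, and replace whatever is still not ASCII. The helper name sms_name and its missing parameter are hypothetical, and NFKD (compatibility decomposition) is chosen here so that things like the "ﬁ" ligature also come out as plain letters.

```python
import unicodedata


def sms_name(name: str, missing: str = "") -> str:
    """Return an ASCII-only version of name for SMS (hypothetical helper).

    Decompose with NFKD, drop nonspacing marks (the diacritics), and replace
    anything that is still outside ASCII with `missing`.
    """
    out = []
    for c in unicodedata.normalize("NFKD", name):
        if unicodedata.category(c) == "Mn":
            continue  # a diacritic: drop it and keep the base letter before it
        out.append(c if ord(c) < 128 else missing)
    return "".join(out)


if __name__ == "__main__":
    print(sms_name("Mötley Crüe"))   # Motley Crue
    print(sms_name("Straße", "?"))   # Stra?e
```

Since the SMS version is computed from the web name on the fly, there is only one field to keep up to date. Letters with no decomposition at all, such as "ß", have no base letter to fall back on; a real application might add a small table of its own for those.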
Or, if you really wanted to do it through encoding, you could use the .NET Framework 2.0 support for custom encoding fallbacks with the ASCII encoding to drop anything you want and replace it with whatever you like, including the ASCII-fied version of the text sans diacritics....
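Python has no EncoderFallback class, but its codecs.register_error hook plays a similar role: the codec calls your handler for each run of characters it cannot encode, and the handler returns the replacement text plus the position to resume from. Here is a minimal sketch of that fallback idea, with "sms_ascii" as a hypothetical handler name:

```python
import codecs
import unicodedata


def ascii_sans_diacritics(err: UnicodeError):
    """Encoding error handler: replace unencodable text with its ASCII base letters."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    decomposed = unicodedata.normalize("NFD", bad)
    # Keep only the ASCII part of the decomposition (the base letters). The
    # marks, and characters with no ASCII base such as "ß", are dropped.
    replacement = "".join(c for c in decomposed if ord(c) < 128)
    return replacement, err.end


# "sms_ascii" is a hypothetical handler name chosen for this example.
codecs.register_error("sms_ascii", ascii_sans_diacritics)

if __name__ == "__main__":
    print("Mötley Crüe".encode("ascii", errors="sms_ascii"))  # b'Motley Crue'
```

The replacement string has to be encodable by the target codec itself, which is why the handler filters down to ASCII before returning.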
Is that close enough to good news? At least, since I included the warning about using the Win32 functions? :-)
This post brought to you by "è" (U+00e8, a.k.a. LATIN SMALL LETTER E WITH GRAVE), a character that might have resented having its grave stripped from it, but then realized that in your application it might be a little further from the grave due to the joy of pronunciational ambiguities!
The other day, John Bates asked in the suggestion box:
This suggestion is probably just a documentation update, but here goes.
One of my applications (compiled for Unicode) allows the caller to specify a code page for output. During testing I found WideCharToMultiByte works for most CPs but it fails for 1200, 1201, 12000 and 12001. The "Code-Page Identifiers" page lists these as valid CP values, but my system's NLS key doesn't have any values for these CPs.
Is there something that has to be installed for this to work, or is there another API (or series of APIs) that should be called instead?
I think there's a need for a (simple) encoding-to-encoding conversion API!
Regards,
John Bates
Well, I will have to take this apart one piece at a time. :-)
Now, if there ever were a function to handle "code page" 1200, it would not be WideCharToMultiByte, which has the job of converting UTF-16 LE into a byte-based encoding of some type, and by no stretch of the imagination can "cp 1200" be considered such a thing. :-)
I'll break that one-piece-at-a-time rule for the rest -- the other three "code pages", 1201, 12000, and 12001 (a.k.a. UTF-16 BE, UTF-32 LE, and UTF-32 BE), fall under similar reasoning. They are not byte-based, and thus really not something I would want to see us bend the WideCharToMultiByte and MultiByteToWideChar functions to do. It is (in my humble opinion) unfortunate that we went this route with the Encoding class in the .NET Framework, but that by itself is not a reason to mess up the model for the Win32 NLS API functions....
Further, there is no need for a conversion to "code page 1200", since that would be converting something to itself. If you want to convert an LPWSTR or a WCHAR * to an LPBYTE or a BYTE *, you can just use a cast; there is no need to go through a conversion function. Just cast it and you are done....
As for the UTF-16 BE, UTF-32 LE, and UTF-32 BE cases, Murray Sargent of Microsoft once explained to Asmus Freytag of Unicode fame (who accosted me at a Unicode conference to make a similar demand for UTF-32 support) that there was no need for this -- the conversions in question can be done with macros and do not have to be full functions. I think Asmus mostly backed down after being out-accosted, but I very much appreciated the support. :-)
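To show how little is involved, here is a Python sketch of the byte-order part of those conversions, the sort of thing a macro can do. Going between the LE and BE forms of UTF-16 or UTF-32 is nothing more than reversing the bytes of each code unit. The function names are hypothetical.

```python
def swap_utf16(data: bytes) -> bytes:
    """Convert UTF-16 LE <-> BE by swapping the two bytes of each code unit."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    out = bytearray(data)
    # Even positions get the old odd bytes, and vice versa.
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


def swap_utf32(data: bytes) -> bytes:
    """Convert UTF-32 LE <-> BE by reversing each 4-byte code unit."""
    if len(data) % 4:
        raise ValueError("UTF-32 data must be a multiple of 4 bytes")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


if __name__ == "__main__":
    text = "A\U00010491\u00e9"
    # The results match what Python's own big-endian codecs produce.
    print(swap_utf16(text.encode("utf-16-le")) == text.encode("utf-16-be"))  # True
    print(swap_utf32(text.encode("utf-32-le")) == text.encode("utf-32-be"))  # True
```

Note that the swap works on code units, so a surrogate pair in UTF-16 is simply two units that each get swapped; no knowledge of surrogates is needed for the byte-order change.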
The only useful excuse for functions in any of these cases would of course be to also handle validation (i.e. is it actual, valid Unicode), and I do not want to minimize that. But it is not a reason to back down from that model (in my opinion). Perhaps it is a reason for another function in the Win32 NLS API for these types of conversions, if there were a lot of customer requests that expressed such a need. We are not quite there yet, though; at this point those macros can still handle the immediate need....
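The validation part is not much bigger. Here is a minimal Python sketch that checks UTF-16 code units for well-formedness, which for UTF-16 comes down to surrogate pairing: every high surrogate (U+D800..U+DBFF) must be followed by a low surrogate (U+DC00..U+DFFF), and no low surrogate may appear on its own. It does not check for noncharacters or unassigned code points, and both function names are hypothetical.

```python
from typing import List, Sequence


def utf16le_units(data: bytes) -> List[int]:
    """Split UTF-16 LE bytes into 16-bit code units (assumes an even length)."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


def is_valid_utf16(units: Sequence[int]) -> bool:
    """Return True if every surrogate in units is part of a proper pair."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            return False  # not a 16-bit code unit at all
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            return False  # a low surrogate with no high surrogate before it
        i += 1
    return True


if __name__ == "__main__":
    print(is_valid_utf16(utf16le_units("\U00010491".encode("utf-16-le"))))  # True
    print(is_valid_utf16([0xDC91, 0xD801]))  # False: pair in the wrong order
```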
Sorry, John. :-( But I will talk to someone about the doc issue here, in any case. :-)
This post brought to you by "𐒑" (U+10491, a.k.a. OSMANYA LETTER MIIN), a character that is just as comfortable as U+10491 as it is as U+d801 U+dc91, because it is not self-conscious about its weight. :-)
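For the curious, the arithmetic behind that surrogate pair fits in a few lines of Python: subtract 0x10000, then put the top 10 bits in the high surrogate and the bottom 10 bits in the low one. The helper names are hypothetical.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000                       # a 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 surrogate pair back into a code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


if __name__ == "__main__":
    print([hex(u) for u in to_surrogates(0x10491)])  # ['0xd801', '0xdc91']
    print(hex(from_surrogates(0xD801, 0xDC91)))      # 0x10491
```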