
August, 2005

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Sometimes it *does* pay to be neutral

    • 9 Comments

    I can hardly believe it has been a year since I asked and answered the questions What is a neutral culture? What is a neutral locale?

    I spent a bunch of time talking about how lame neutral locales were on Win32, given that none of the NLS functions support them.

    They are very useful for resources though, something Stuart reminded me of the other day when he asked the following question in the newsgroups:

    In XP my user is set to German (Switzerland) and in my COM dll (built in Visual Studio 6) I have resources for German (Germany) and English (U.S.)

    However the German (Germany) string table is ignored and the LoadString win32 call loads the English (U.S.) resource.

    Why is the German (Switzerland) user loading the English (U.S.) resource?

    Well, once upon a time this is what may have happened -- if you look at the section of Chapter 4 of the first edition of Developing International Software for Windows 95 and Windows NT entitled Multiple Language Resources:

    On Windows NT, FindResource searches for resources tagged with the language ID of the calling thread. On Windows 95, it searches for resources tagged with the default system language ID. On Windows NT, you can search for a resource in a specific language by calling FindResourceEx, which takes a language ID as a parameter. Both FindResource and FindResourceEx first attempt to find a resource tagged with a language ID, as described above. If they don't find anything, they then search for a resource tagged with the same primary language as that of the specified language ID. (If several resources with the same primary language but different sublanguages exist, the functions will return whatever they encounter first.) If, for example, the program requests resources in Standard German that aren't available, the program can retrieve Austrian German or Swiss German resources and still provide a user interface that the user can understand.

    If the FindResource and FindResourceEx functions do not find any resources that match the language ID's primary language, they search for resources tagged as "language-neutral." This language ID is useful for resource elements such as icons or cursors that are identical for all languages. If a bitmap or an icon will differ for some languages, you can define one language-neutral bitmap as the default and specify language IDs for as many other customized bitmaps as required. For example, bidirectional applications might require bitmaps with right-to-left directionality. Because the FindResource and FindResourceEx functions always search for specific language IDs first, they will always find a bitmap tagged with that language ID before they find one tagged as language-neutral. The search algorithm they follow is summarized in the following list:

    1. Primary language/sublanguage
    2. Primary language
    3. Language-neutral
    4. English (skipped if primary language is English)
    5. Any
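
    To make the order in that list concrete, here is a minimal sketch in plain C of the lookup as the book describes it. It is not the actual loader code: the names LangId, MAKE_LANGID, PRIMARY_LANG, and find_resource_index are made up for illustration, the bit layout mirrors the LANGID macros in winnt.h, and treating step 4 as "any resource whose primary language is English" is an assumption.

```c
#include <stddef.h>

/* Same layout as a Win32 LANGID: low 10 bits = primary language,
   high 6 bits = sublanguage. All names here are hypothetical stand-ins. */
typedef unsigned short LangId;
#define MAKE_LANGID(primary, sub) ((LangId)(((sub) << 10) | (primary)))
#define PRIMARY_LANG(id)          ((LangId)((id) & 0x3ff))

#define LANG_NEUTRAL_ID 0x00
#define LANG_ENGLISH_ID 0x09

/* `tags` lists the language ID on each available copy of a resource.
   Returns the index of the copy chosen for `wanted`, or -1 if there are none. */
int find_resource_index(const LangId *tags, size_t count, LangId wanted)
{
    size_t i;

    /* 1. Primary language/sublanguage: exact match. */
    for (i = 0; i < count; i++)
        if (tags[i] == wanted)
            return (int)i;

    /* 2. Primary language: any sublanguage, whichever is encountered first. */
    for (i = 0; i < count; i++)
        if (PRIMARY_LANG(tags[i]) == PRIMARY_LANG(wanted))
            return (int)i;

    /* 3. Language-neutral. */
    for (i = 0; i < count; i++)
        if (tags[i] == MAKE_LANGID(LANG_NEUTRAL_ID, 0))
            return (int)i;

    /* 4. English, skipped if the primary language is English.
       Assumption: any English sublanguage qualifies. */
    if (PRIMARY_LANG(wanted) != LANG_ENGLISH_ID)
        for (i = 0; i < count; i++)
            if (PRIMARY_LANG(tags[i]) == LANG_ENGLISH_ID)
                return (int)i;

    /* 5. Any. */
    return count > 0 ? 0 : -1;
}
```

    Under these rules, a request for German (Switzerland), 0x0807, against resources tagged English (U.S.), 0x0409, and German (Germany), 0x0407, would stop at step 2 and return the German (Germany) copy.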

    But a lot has changed since this book was published (hell, just after the section I quoted is the one that talks about how you can use SetThreadLocale to choose the language, and if you are a regular reader here then you know how I feel about SetThreadLocale).

    But there have been other changes in resource loading since NT 3.1 and Windows 95 that invalidate some of the priorities in resource loading. To understand why, it might be a good idea to think back to my post in May when I talked about The weird, weird world of the SUBLANGID. The truth is that each of these locales is different, and there are times that the behavior Stuart wanted (if not the full language I wanted, give me something close) may actually cause more problems than it could solve. Possibly not for German, but you never know. And there are times that you want to define an entirely different fallback logic, such as Basque or Catalan falling back to Spanish, and times that you do not want Bosnian to fall back to Croatian (or Simplified Chinese to fall back to Traditional Chinese).

    So things had to change, they really did. And they have been changing over the last few versions.

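    The specific change prior to Windows Vista is that resource loading still follows the numbered steps above but no longer does what the parenthetical text above the list describes, i.e. (If several resources with the same primary language but different sublanguages exist, the functions will return whatever they encounter first.). So a German (Switzerland) request no longer picks up a German (Germany) resource just because the primary language matches, which is why Stuart ended up with English (U.S.).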

    (At this point, people pretty much universally recommend that "best practice" in this case involves separate language resource DLLs, one per language. I will get into why this is the case at a later time.)

    But specifically, if you do not want to follow the multiple DLL advice, it is important to tag with the neutral language the resources that you want all of the various specific locales to fall back upon. As I mentioned back in June, Using full locales rather than the neutral ones is often not a terribly good idea.
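
    As a rough illustration of why the neutral tag matters, here is a small sketch in plain C that lists the language IDs a lookup would probe once sublanguages are no longer interchangeable. The names (LangId, MAKE_LANGID, PRIMARY_LANG, probe_order) are invented for the example, the bit layout mirrors the LANGID macros in winnt.h, and the assumption that the English step means English (U.S.) is mine rather than anything documented here.

```c
#include <stddef.h>

/* Same layout as a Win32 LANGID: low 10 bits = primary language,
   high 6 bits = sublanguage. All names here are hypothetical stand-ins. */
typedef unsigned short LangId;
#define MAKE_LANGID(primary, sub) ((LangId)(((sub) << 10) | (primary)))
#define PRIMARY_LANG(id)          ((LangId)((id) & 0x3ff))

/* Writes the language IDs that would be probed, in order, into `out`
   (room for 4 entries) and returns how many were written. The final
   "any" step is not a specific ID, so it is not listed. */
size_t probe_order(LangId wanted, LangId out[4])
{
    size_t n = 0;
    LangId primary_neutral = MAKE_LANGID(PRIMARY_LANG(wanted), 0);
    LangId english_us = MAKE_LANGID(0x09, 0x01); /* assumption: en-US */

    out[n++] = wanted;                  /* exact language/sublanguage */
    if (primary_neutral != wanted)
        out[n++] = primary_neutral;     /* same language, neutral sublanguage */
    if (primary_neutral != 0)
        out[n++] = 0;                   /* language-neutral */
    if (PRIMARY_LANG(wanted) != 0x09)
        out[n++] = english_us;          /* English, unless already English */
    return n;
}
```

    For German (Switzerland), 0x0807, this yields 0x0807, 0x0007, 0x0000, 0x0409. German (Germany), 0x0407, never appears in the list, while a string table tagged as neutral German, 0x0007, would be found on the second probe.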

    Because sometimes, it pays to be neutral!

    (Editorial request of readers who wish to comment -- I would prefer that comments, if any, for this post not try to get into the religious issue of separate language resource DLLs and limit themselves to the religious issue of neutral vs. specific resources. I promise I will be covering the other issue soon!)

     

    This post brought to you by "₦" (U+20a6, NAIRA SIGN)

  • Sorting it all Out

    Keyboard Convert Service goes Bidi!

    • 5 Comments

    This is undeniably cool -- the Keyboard Convert Service localized into Arabic!

    If you are interested in getting it now (or even if you are just interested in looking at the download page in Arabic!) you can see it at:

    خدمة تحويل لوحة المفاتيح لـ

    Arabic of course joins that ever growing list of languages for this useful tool....

    It also highlights something important that developers do not think about as often as they perhaps should -- the difference between the logical order of the text and the visual order. The simple fact that you can use the Keyboard Convert Service to switch the text to that of an Arabic keyboard versus one in an LTR language relies on the even simpler fact that the typing of keystrokes (obviously based on a logical order) can do drastically different things, visually, depending on the keyboard used.

    Of course learning this lesson did not require the Arabic localized version of the tool (this functionality has nothing to do with the language of the user interface). But it does help remind us that every version of Windows has full internationalization support!

  • Sorting it all Out

    They're asking about Vista, even if they don't know it....

    • 6 Comments

    People keep on using that Contact link to ask me questions that essentially either shouldn't be asked at all or should be asked in the suggestion box. Sigh....

    Ok, I'm over it. Now here are some of those questions. And you, the reader, already know the answer, since it is right in the title!

    Question #1:

    Title: Longhorn

    hi,

    i looking for long time Windows Longhorn but i can find on the internet.

    can you help me where i can get windows Longhorn?

    greetings

    Kire

    Hi, Kire. The answer is Vista! More specifically, you have to get a copy of the Beta 1 release of Vista.

    Now of course, you sent the message to me on July 18th (before the beta was available), so the fact that you were unable to find the product on the Internet is sort of understandable -- it was not yet available. And it is not available on the Internet now, unless you are on the beta of course. Your message did not say much about what in particular interested you about the product or what features you wanted to take a look at, though. Was it anything in particular to do with internationalization or globalization functionality?

    Or did I just seem like a soft touch? :-)

    Ok, what's next? Hmmm... question #2:

    How to create calendar for Win32

    I would like to create a custom Win32 calendar, a variation on English with different month labels. I want programs such as Excel to use this calendar.

    Could you please provide some pointers?

    Google hasn't found much other than your blog.

    Thanks,

    Eric

    Well, Eric -- the answer once again is Vista! More specifically, if you use Whidbey's new custom cultures feature you can modify the month names and abbreviated month names. And if you create that custom culture on Vista, then it will also be a custom locale. And those month names will be used in the calendar and locale NLS functions provided in the Win32 API....

    Another time I'll post the code you can use to do this, if you have Whidbey. And the unmanaged code you can use to take advantage of it, if you have Vista.

    There are actually a few more questions I could post, but the answers are about things that I will not be able to talk about until Beta 2. So let's just put a bookmark here and I will answer more questions later.

    And next time consider using the suggestion box, people....

     

  • Sorting it all Out

    Keyboard Convert Service - русский вариант!

    • 3 Comments

    I have previously talked about the Keyboard Convert Service, and the fact that its UI can be found in English, Greek, and Czech.

    Let's add Russian to that list!

    Or in Russian, Служба преобразования клавиатуры.

    Who knows what language will be next? :-)

  • Sorting it all Out

    Compilers are hard working and dumb?

    • 4 Comments

    Philipp Lenssen explained Why Good Programmers Are Lazy and Dumb. I thought this was a great and funny post, made even funnier by a small piece in the middle:

    In the endless battle between a programmer and the compiler, it’s best for the programmer to give up early and admit that it’s always him and never the compiler who’s at fault (unless it’s about character encoding issues, which is the part even the compiler gets wrong).

    Like the line in the movie said, we laugh because it's funny, and we laugh because it's true. Compilers often get this bit wrong, but in fairness to the compiler it is usually the programmer's fault; the compiler is just following orders when, if it knew better, it would put up an error.

    Things are getting better for languages like C# and VB.NET, which are moving to Unicode. But no one is quite there yet, something that I will be blogging about soon.

    In the meantime, enjoy the article, it is pretty accurate and reminds me of the story of "The Man Who Was Too Lazy to Fail" from a Heinlein novel I enjoyed years back....

  • Sorting it all Out

    Everybody picks on Brett

    • 0 Comments

    Yes, it is true -- everybody picks on Brett. Adi has even gotten into it now....

    In some ways it seems totally unfair (I mean he has a full time job and it is not like they are sitting around doing nothing over there!).

    In other ways it seems immature (like everyone thinks he will bow to peer pressure?)

    Yet we cannot help it -- we all want Brett to post!

    C'mon Brett!

  • Sorting it all Out

    About [not] writing books

    • 8 Comments

    (Nothing but a bunch of navel gazing in this post, you may want to skip if you have a temperament like mine; I would have no patience for this type of rubbish, myself!)

    A few days ago when I shook my fist at the heavens and proclaimed my kingdom for some Unicode controls, I mentioned about how Joel Spolsky said some incredibly nice things about my book, and even mused that "Hopefully we'll see another book on international software from him soon."

    Well, I can honestly say that there is nothing planned at the moment.

    Though I have noticed the best way to get publishers to woo me for such a thing is to go out in public and talk about international stuff, and being wooed (is that a word?) is flattering, I am finally smart enough at this point to know that it is a lot of work to write a book. Even if I have a lot of the material I would cover already figured out.

    I have pretty much had a simple rule for as long as writing a book was ever a topic to discuss, one that I think I first said aloud to Sharon Cox and Brad Jones at a dinner meeting in San Francisco in early 1999 during VBITS. The rule?

    The rule is simple: I would never want to write a book that I wouldn't want to read.

    Sharon later became my acquisitions editor at Sams Publishing, mainly on the basis of what went on in that meeting (well, that and the fact that folks at Wiley told me I would not be able to choose the animal for my book cover!). Of course Brad left Sams before my book was published and Sharon left soon after for Hungry Minds, which was subsequently bought by Wiley, I think. How do editors and writers even know where to stand when the tectonic plates of the publishing world shift around so frequently? :-)

    Anyway, the rule does not mean I would feel some morbid need to read the book later; it would serve no purpose, I probably know what it will say half the time.

    What the rule means is that unless it was the type of book that I would want to go into a book store and buy then I would not want to put it out there. Frankly, there are way too many freaking books out there. And while there are some gems, there is also a lot of worthless crap out there, too. I wish more of the people who wrote books would just stop writing and spend some more time reading books. And articles. And documentation. Because what some of those people write just makes it harder to find the useful information!

    (I have similar feelings about a lot of the technical blogs out there, for what it is worth -- I'll blog about that some other time)

    When I wrote Internationalization with Visual Basic I had a bunch of years of consulting to draw on. And not just consulting for Microsoft, but for a lot of Microsoft customers, too. But in the time since then, I have not been doing nearly as much consulting, and therefore not nearly as much time looking at dozens of different projects and scenarios and helping people come up with solutions. I have mostly been working for one company, and while I have been working on projects that I loved, they are not the sort of thing you can write books about.

    Hell, they are barely the sort of thing I can write blog posts about!

    I think that any book I wrote now would be either (a) an insightful look into how to do international software on MS platforms, or (b) a rehash of what you could have gotten out of MSDN anyway. After seeing some of what the competition for the book is and will be and would be, I am reasonably certain I would come significantly closer to (a) than some of the others have on the grounds of accuracy, correct terminology, and scope of coverage alone.

    But hell, I could probably give them a run for their money on that by just publishing this freaking blog (with this entry as the introduction!). Which is not to say this blog has even nearly all that I would want to cover; I am saying a lot more about their weaknesses than my strengths, believe me.

    If you listen to the interview that Joel did, he points out one of the problems I would run into if I wrote a book -- the people who enjoy slamming me in anonymous book site reviews for (in their opinion deserved) revenge or whatever, certainly do come out of the woodwork. One of those people even pretended to be a Microsoft employee (this one was taken down from amazon.com after the real person got in touch with them and asked them to take it down -- since the person who posted it was not smart enough to realize that their review was linked to his previous reviews, we even found out who he was!).

    Anyway, it makes me realize that it is unlikely I could write something good enough to entirely escape this muck in which I have placed myself.

    This fact might rule out politics for me, too. Not that my unpopular political beliefs would get me elected anyway. :-)

    Now I was going to write another book for Sams called Internationalization with SQL Server. It had an ISBN (0-672-32099-1) and was not quite half written (8 chapters out of the 18 in the approved TOC!), but they decided not to go with it (they paid off the advance plus a bit more for my troubles, so I am not entirely bitter about the whole thing). Of course that was nearly five years ago, so even what was done would need a major update to be relevant now. It would be easier to start over, probably. But not being an internationally recognized SQL Server expert would probably work against me and I would not have the just-published other book with a similar title from the same publisher to help, either.

    And there was one great idea I had for a book and I even pitched it to my former acquisitions editor Sharon when she started working for Hungry Minds (when it was its own company). But it was a slightly radical idea and her boss said no, and none of the editors I have talked to since then have been interested either. So perhaps it was a little too radical (or maybe just a bad idea). Perhaps I'll blog about it some day and readers here can tell me if I was on track or on crack.

    But I really can't look at other books that aren't getting it done as a source of inspiration to get it done myself. The book situation for internationalization in .NET is pretty bleak (and does not look like it will be getting better any time soon), but I don't think I could tackle it and my job unless the book was a big part of the job while it was being written (which seems pretty unlikely in a product group that has too many projects and not enough resources as it is!).

    For now, I feel a little safer here anyway, where "for great justice somebody set us up the blog" and all that. I'll keep covering the internationalization issues I know about, and try to strike a chord here and there. Plus talk about the music I like from time to time, which seems to also entertain folks....

  • Sorting it all Out

    Vietnamese is a complex language on Windows

    • 17 Comments

    Back in May of 2004, Quan Nguyen sent a message to Dr. International about Vietnamese collation in Windows and the .NET Framework:

    I tried to sort Vietnamese characters according to Vietnamese collation rules, as precribed in http://vietunicode.sourceforge.net/charset/vietalphabet.html. However, .NET Framework's built-in sort order for CultureInfo("vi-VN") seems not correct. What should I do to get it to sort according to Vietnamese alphabetical order? 

    This was not the only place that this question was asked -- Quan had asked this same question on several newsgroups and other places. We requested some more details, did the investigation, and were able to report on the claim -- he was right, there were a few letters that did not sort properly. In the end, the problem basically consisted of the uppercase and lowercase versions of the following letters:

    Of course since these letters are in Unicode and are used by several other languages, they have some default weights -- but they are not in the Vietnamese exception table. And their weights in the default table are not completely correct....

    Now no one had reported this problem before, so hopefully these are letters that are not used often in Vietnamese in situations where the small but definite differences in collation would be noticed.

    Which is not to say it is not a bug or that it should not be fixed -- it definitely is.

    But it is to perhaps explain why it took so long for someone to report to Microsoft a bug that has been in the code page and sorting tables since the very first Vietnamese enabled versions of Windows....

    Now Windows code page 1258 has its own set of problems here, because the above characters are not in cp1258, either. Well, they sort of are as combining characters since the code page has U+0300, U+0301, and U+0303 on it -- but the conversion to and from Unicode of the above characters can be quite nightmarish, for the reasons I mentioned when I pointed out a few of the gotchas of MultiByteToWideChar. We would have had to include them as the precomposed form listed above, and there are not enough free slots to do so (even if we were able to modify code pages, which we are not, as I explained when I talked about why we cannot change the code pages).

    So let's just assume that cp1258 is about as limited in use as all of the rest of the attempts at the other (at last count 42!) 8-bit encodings of Vietnamese are (they all have problems due to the fact that there are too many characters or not enough slots to put them) and stick with Unicode....

    Getting back to collation, this particular problem that Quan Nguyen reported is fixed in the updated sorting tables in Vista Beta 1. It could not be fixed in earlier versions of Windows or the .NET Framework, as it requires a major version change for Vietnamese to change the weights of code points that already have weights defined, so Vista is our first chance to make the fix (Whidbey's sorting tables are not being updated so the fix could not be made in .NET 2.0).

    On a happier note, the font story for Vietnamese has been really good on Windows for a while now, for all of these various letters.

    And the Vietnamese LIP was released in March 2005 which is also pretty awesome.

    It just took a little while for the NLS side of GIFT to catch up with everyone else, that's all. :-)

     

    This post brought to you by "Ý" (U+00dd, a.k.a. LATIN CAPITAL LETTER Y WITH ACUTE)

  • Sorting it all Out

    Has it been three years?

    • 5 Comments

    Yes, as it turns out, it has been exactly three years since I started at Microsoft on a full-time basis. And my dork badge still let me in so I have not been fired just yet.... :-)

    Ordinarily, this would mean I should put out 3 lbs of M&Ms in front of my office.

    However, since I did not do this the first two years, I am making up for that by putting in the 1 lb. for year one and 2 lbs. for year two as well. With a little extra for interest (well, actually because the bags were 21.30 oz. in the store and I could not make it come out even, but why quibble?).

  • Sorting it all Out

    The Czech is in the email

    • 8 Comments

    Well, in an email I just received, at least (and now in yours if you have email notifications for this blog turned on). :-)

    The Czech localized version of the Keyboard Convert Service has been released, joining English and Greek....

  • Sorting it all Out

    The Milk Bet lives!

    • 30 Comments

    They are never going to learn this one.

    Marlins suspend batboy for milk-drinking dare

    I'll ignore the suspension issue and talk about the "milk bet" here.

    Now this particular bet has been around for a long time. I first heard about it when I was working for the Access team, probably around eight years ago.

    Heath, a fellow developer on the team with an office right next door to mine, was certain that he could drink a gallon of milk in an hour without throwing up. He went to CalTech and because of this had a very logical way of thinking this through. He could easily drink one of those one pint milk containers in just a few minutes. So the gallon could be polished off easily since it really is just eight of those one pint containers.

    (for those outside the US, there are four quarts to a gallon and two pints to a quart!)

    And he had a whole hour to do it, so he could take his time and make it with no problem, right?

    Well, actually, it is wrong.

    Fellow developer Nicholas Shulman volunteered the explanation for why the bet is never won as Heath was running to the restroom to avoid throwing up in the conference room to which we had all adjourned.

    "A stomach," Nick explained with just the right inflection for irony, "is about a half a gallon."

    Milk needs time in the stomach to be broken down before it can go on -- it does a body good, but it needs a little time to do that good. And there is simply no way to break the milk down fast enough to take in a full gallon in an hour. If you try to do so, your body will rebel and if you try and force the issue, your stomach will settle the argument for you.

    Perhaps some future CalTech or MIT student who has read this blog will either refuse the bet, or anticipatorily buy something that will break down the milk and drink a bunch of that right before the bet starts.

    (via Spencer)

  • Sorting it all Out

    My kingdom for some Unicode controls

    • 18 Comments

    I have certainly done my share of pushing for Unicode controls in various programming languages on Windows. From the UniToolbox controls link on this very blog to the book I wrote for Visual Basic (see Chapter 6 online!) -- this is the one that Joel Spolsky said all of the very nice things about in this post and the audio interview it links to (in the interview I was an example, he was mainly talking about how Amazon ratings/comments can be particularly biased/skewed by folks with a "smear tactic" agenda, a point on which I agree with him -- but I usually just filter the anonymous comments to get a more accurate answer!).

    (Joel, I'll cover my thoughts on a book in another post!)

    Anyway, I am a huge fan of Unicode controls.

    In prior versions of VB (<= 6.0) they were only half-Unicode, by which I mean they all had Unicode interfaces but for the most part were wrappers around non-Unicode intrinsic or common controls. Which means a lot of conversions back and forth (and back again in many cases on NT-based platforms since the underlying controls themselves are Unicode!). So you get all of the space and performance penalties of Unicode with none of the benefits (like the Shell Unicode interfaces in Windows 95!).
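
    To show what those conversions back and forth cost, here is a deliberately simplified sketch in plain C. It is not how VB or Windows actually converts text (the real work goes through WideCharToMultiByte and MultiByteToWideChar with the system code page); the functions narrow_to_8bit and widen_from_8bit are hypothetical, and the 8-bit side is assumed to be Latin-1 so that no tables are needed.

```c
#include <stddef.h>

/* Narrow UTF-16 code units to an 8-bit buffer. Assumption: the 8-bit
   side is Latin-1, so anything above U+00FF cannot be represented and
   becomes '?'. Returns how many characters were lost. */
size_t narrow_to_8bit(const unsigned short *wide, size_t len,
                      unsigned char *narrow)
{
    size_t i, lost = 0;
    for (i = 0; i < len; i++) {
        if (wide[i] <= 0xFF) {
            narrow[i] = (unsigned char)wide[i];
        } else {
            narrow[i] = '?';
            lost++;
        }
    }
    return lost;
}

/* Widen the 8-bit buffer back to UTF-16. The question marks stay
   question marks; the original characters are gone for good. */
void widen_from_8bit(const unsigned char *narrow, size_t len,
                     unsigned short *wide)
{
    size_t i;
    for (i = 0; i < len; i++)
        wide[i] = narrow[i];
}
```

    Feed U+65E5 U+672C followed by a plain "A" through the round trip and only the "A" survives, which is the half-Unicode experience in a nutshell.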

    It was very exciting that in .NET all of the WinForms controls are 100% Unicode any time the OS could support it happening. Even on Win9x all of the owner draw controls still support Unicode, and some of the common controls. You can see some of this in the documentation, like in this topic:

    However, certain controls do not support Unicode in Windows 98 and Windows Millennium Edition. These controls, all of which inherit from the common control, will process data with the Windows code pages, as ANSI. These controls are: TabControl, ListView, TreeView, DateTimePicker, MonthCalendar, TrackBar, ProgressBar, ImageList, ToolBar, and StatusBar. The result of this is that you cannot display Unicode data in these controls on the listed platforms. For example, you cannot display Japanese characters on an English Windows 98 system.

    We'll ignore the technical mistakes here and the fact that it does not mention some of the intrinsic-based controls like the TextBox also have this problem (and especially the fact that some of the common controls actually do support Unicode on Win9x, and will work properly in WinForms!) and concentrate on the issue that there are a few controls which will not support Unicode on Win9x, even in WinForms.

    It is easy to get worked up about this, but these days I do not. After all, the only time I ever run Windows 98 or Millennium these days is when I am looking at an MSLU bug, and it has been a long time since one of those has needed a look. And even if the controls fully supported Unicode, usually the fonts would not be there so all you would see is a bunch of square boxes a.k.a. NULL glyphs (□□□□□□□□) which is really not much better in terms of information than a bunch of question marks (????????).

    For me it is enough that everything is Unicode whenever it can be. That's cool.

    Now the final frontier is C++ projects -- since so many people still don't create the projects as Unicode ones, and a lot of developers still write that TCHAR code even if they are only writing for NT-based platforms like Win2000 or XP or Server 2003 or Vista -- LPTSTRs and TCHARs, yuck!

    In NLS we put our foot down in Windows Server 2003 -- no new NLS API functions will be written with ANSI counterparts. And we're continuing that in Vista. Not everyone has gotten the word on this yet, so we'll need to step up on the "internal evangelism" with other teams and groups. But it should be easier to suggest that people write less code, I think -- much easier than to suggest that people need to write twice as many functions and messages!

    The old functions will still work, sure. But there is plenty of new functionality like FindNLSString and NormalizeString and lots more that I will be covering in future posts -- and it is Unicode only, like many of the new locales in Vista are.

    So if you are writing C/C++ applications, you have to ask yourself if you really want half the world to have to speak fluent question mark to use the products you write?

     

    This post brought to you by "ཀ" (U+0f40, a.k.a. TIBETAN LETTER KA)
    (A letter that you are probably not looking at a NULL GLYPH for if you are running Vista Beta 1!)

  • Sorting it all Out

    The loader without the loaded

    • 0 Comments

    I have mentioned in the past about customers who did not need the Microsoft Layer for Unicode on Win9x Systems itself. They had their own layer, perhaps built based on the article written by F. Avery Bishop, David C. Brown, and David M. Meltzer.

    But they did like the loader in unicows.lib, and wanted to hook up their layer to this loader.

    So how hard is it to have the loader without having the loaded?

    It is actually quite easy!

    Basically, you need to include unicows.lib in your project just as the Platform SDK describes and add a file in your project that has a custom loader function -- something that will return an HMODULE such as that returned by LoadLibrary. What this needs to look like is on the Platform SDK, here. The code looks like this:

    #ifdef __cplusplus
    extern "C" {
    #endif
    extern FARPROC _PfnLoadUnicows = (FARPROC) &LoadUnicowsProc;
    #ifdef __cplusplus
    }
    #endif

    Of course you need to define your own LoadUnicowsProc procedure. :-)

    Now what you do next depends. You have two choices:

    A) If you have a library that has all of the functions you call in it and their names are identical, then once you have a LoadUnicowsProc() defined that returns its HMODULE, you are done! The MSLU loader does all of the work for you and loads the functions.

    B) If your library's function names are different or there are other reasons you want to handle having the pointers to functions yourself, you can actually add the following to that source file you added (as is described here) for each function:

    #ifdef __cplusplus
    extern "C" {
    #endif
    extern FARPROC Unicows_<function name one> = (FARPROC) <function you want MSLU to call>;
    extern FARPROC Unicows_<function name two> = (FARPROC) <function you want MSLU to call>;
    ...
    extern FARPROC Unicows_<function name n> = (FARPROC) <function you want MSLU to call>;
    #ifdef __cplusplus
    }
    #endif

    And then you are done with this method. The loader will never use that HMODULE you returned from LoadUnicowsProc(), so you can return any non-NULL value (I would recommend returning a guaranteed invalid value like (HMODULE)0x00000001 or something).
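
    For method B, the loader function can be about as small as a function gets. What follows is only a sketch: the __stdcall calling convention is my assumption about what the loader expects, and the typedef in the #else branch is a stand-in so the snippet also compiles away from the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#define LOADER_CALL __stdcall   /* assumption: the loader expects __stdcall */
#else
/* Stand-ins so the sketch also compiles without the Windows headers. */
typedef void *HMODULE;
#define LOADER_CALL
#endif

/* Method B: every function is supplied through a Unicows_<name> pointer,
   so the loader never uses this handle. Any non-NULL value will do; a
   deliberately invalid one makes accidental use easier to spot. */
HMODULE LOADER_CALL LoadUnicowsProc(void)
{
    return (HMODULE)(uintptr_t)0x00000001;
}
```

    For method A you would instead return the real HMODULE of your own library, for example the result of a LoadLibrary call.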

    Now as long as you cover all of the functions your application is calling (via either method), then you get the advantage of the platform-based switching that the MSLU Loader provides, without any worries related to your functions being called on non-Win9x platforms.

    Now not everyone needs such a method, but if it would be handy for you then it is available. :-)

     

    This post brought to you by "٭" (U+066d, a.k.a. ARABIC FIVE POINTED STAR)

  • Sorting it all Out

    It didn't make my brown eyes blue

    • 2 Comments

    Crystal Gayle would be so disappointed....

    The Novantrone did not make the whites of my eyes take on a bluish tinge, as I thought it might.

    Darn, I was thinking that might have been kind of cool.

  • Sorting it all Out

    You probably don't want to use Microsoft's code page 20269

    • 5 Comments

    Yes, there is a problem with code page 20269. And there has been, since birth.

    It is intended to be an implementation of ISO-6937. Unfortunately it cannot really be used for its intended purpose, to provide a form for combining characters for Latin-1. The ISO standard works as follows:

    ISO 6937 has for characters single letters and combinations of a letter with a diacritic. Only those which occur in a list are legal, the "repertoire" of ISO 6937. The diacritic shall preceed the letter, but is no character in itself. A diacritic as a free-standing character is created by coding a space behind the byte that represents the "diacritical mark". In this way some characters are coded with one, others with two bytes. The number of codeable characters is finite, basically the 333 characters defined in the repertoire.

    (The scheme of 6937 was abandoned in favor of the ISO-8859 scheme, which uses precomposed characters.)

    Now both Windows and Unicode do things the other way around (base character followed by combining character). In order to properly handle conversions for ISO 6937, each of the following characters would have to swap places with its base character: moved in front of the character preceding it in the Unicode text when calling WideCharToMultiByte(20269,...), and moved behind the character following it in the ISO 6937 text when calling MultiByteToWideChar(20269,...).

    Unicode   cp 20269   Character
    U+0306    0xC6       Combining Breve
    U+0307    0xC7       Combining Dot Above
    U+0308    0xC8       Combining Diaeresis
    U+030a    0xCA       Combining Ring Above
    U+030b    0xCD       Combining Double Acute
    U+030c    0xCF       Combining Hacek
    U+0327    0xCB       Combining Cedilla
    U+0328    0xCE       Combining Ogonek
    U+0332    0xCC       Combining Low Line

    Technically, we should only do this for chars within the legal list of 333 chars, all others should fail to convert properly. But the simple reversal above might be enough....
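
    If you do want to do that work yourself, the simple reversal could look something like the sketch below, in plain C. It is only an illustration of the idea, not anything Windows does: the function names are invented, it handles just the nine combining marks from the table above, it assumes at most one mark per base character, and it makes no attempt to validate against the 333 character repertoire.

```c
#include <stddef.h>

typedef unsigned short Utf16;

/* The nine combining marks from the table above. */
static int is_listed_mark(Utf16 c)
{
    switch (c) {
    case 0x0306: case 0x0307: case 0x0308:
    case 0x030A: case 0x030B: case 0x030C:
    case 0x0327: case 0x0328: case 0x0332:
        return 1;
    default:
        return 0;
    }
}

/* Run after a table-based MultiByteToWideChar(20269, ...): the text is in
   ISO 6937 order (mark, then base). Move each listed mark behind the
   character that follows it. */
void marks_after_base(Utf16 *s, size_t len)
{
    size_t i = 0;
    while (i + 1 < len) {
        if (is_listed_mark(s[i]) && !is_listed_mark(s[i + 1])) {
            Utf16 mark = s[i];
            s[i] = s[i + 1];
            s[i + 1] = mark;
            i += 2;     /* skip the pair just fixed */
        } else {
            i++;
        }
    }
}

/* Run before a table-based WideCharToMultiByte(20269, ...): the text is in
   Unicode order (base, then mark). Move each listed mark in front of the
   character that precedes it. */
void marks_before_base(Utf16 *s, size_t len)
{
    size_t i = 0;
    while (i + 1 < len) {
        if (!is_listed_mark(s[i]) && is_listed_mark(s[i + 1])) {
            Utf16 base = s[i];
            s[i] = s[i + 1];
            s[i + 1] = base;
            i += 2;     /* skip the pair just fixed */
        } else {
            i++;
        }
    }
}
```

    Calling marks_before_base on "a", U+0308, "b" gives U+0308, "a", "b", and marks_after_base turns it back again. A free-standing diacritic (mark followed by a space in ISO 6937) comes out as a space followed by the combining mark.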

    Since 20269 is a table based code page, this kind of special handling is not being done and really cannot be done; to fix, a new (algorithmic or 'baby DBCS') code page would have to be defined. And we are not defining new code pages, so this one is going to need to be filed under the "do not expect useful results without doing a lot of work yourself" category....

    Not the end of the world or anything, but it seemed worthy of at least a blog entry. :-)

     

    This post brought to you by "A" (U+0041, a.k.a. LATIN CAPITAL LETTER A)
