Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Yes, that's right. I am at zero bugs for the milestone and heading to Adobe in San Jose for the Unicode Technical Committee meeting, a quarterly gathering that goes through changes, additions, updates, and technical reports for the Unicode Standard.
I'll still be posting while I am out, though I cannot promise that I will be approving anonymous comments as often. So be patient if you are posting comments anonymously.
If you see anything odd in the 'Way Back Machine', like a post count that is bigger than the number of posts, it is just me posting ahead to make sure I can stay as consistent as I usually am....
And if you are going to the UTC also then I will see you soon!
(Although the title may suggest it to some people, this is not a post about whether I should be giving my ex-girlfriend a call; it is not!)
There are many functions in the Windows API set that have had Ex (or Ex-style) variations added to them.
In lots of cases, the Ex version acts as an extension of the original function. But in some cases it simply changes the functionality in a particular way.
So it is important not to try to pick the one to call by assuming that the old function just calls the new one.
Sometimes such an assumption will work -- TextOutW basically just calls ExtTextOutW, and FindResourceW does little more than call FindResourceExW. And in most cases on NT platforms the "A" functions convert and call their "W" equivalents.
But other times, it will not -- ToUnicode and ToUnicodeEx actually both call the same internal function, so all you get for trying to short-circuit what appears to be the wrapper is an extra 4-8 bytes pushed onto the stack.
The code for the Unicode and ANSI versions of CharNext is entirely different, written to solve entirely different kinds of issues. You will actually cause more problems than you solve by doing the conversion yourself and calling one of them.
And many people feel like you need a spirit dancer, a Magic 8-Ball and a ouija board if you want to know the difference between GetTextExtentPoint, GetTextExtentPoint32 and GetTextExtentExPoint, or between the EnumFonts, EnumFontFamilies, and EnumFontFamiliesEx functions. Every time you think you get a handle on the differences in the Platform SDK documentation, a topic like this one or this one makes you start wondering again.
Now on top of all that, in a particular very special case, the Ex version is actually a wrapper around the non-Ex version.
That case is GetStringTypeA, GetStringTypeW, and GetStringTypeEx, the main subjects of this post.
These three functions have an interesting, some would say the professional equivalent of sordid, history.
You see, there is no GetStringType function in the documentation or the header files, because the decorated functions (GetStringTypeA and GetStringTypeW) take different parameters -- the former takes an LCID and the latter does not.
The story of how this all came about varies depending on which old-timer you ask.
Perhaps more will chime in, ones without bias especially welcome -- bias clearly is what makes so many people have such different recollections!
But the most consistent story I have been able to glean from various sources is that GetStringTypeW was done by the NT team for NT 3.1 and GetStringTypeA was done by the Win9x team for Win95. Now NT 3.1 shipped first, and it was only when the Win95 project hit this bit of code that the need for an LCID became obvious. The NT team could not change their function signature and the 9x team could not proceed without a context for the conversion, so they split the difference and kept them separate, and Win95/NT 3.5 both shipped with the GetStringTypeEx function.
(No one could recall if NT 3.1 ever was supposed to support a GetStringTypeA that had no LCID parameter, but I have checked the NT 3.1 kernel32.dll and it did not export such a function (the NT 3.5 version matches the current function, as the documentation indicates). One person hypothesized that the Win9x issue came up in time to yank out the LCID-less GetStringTypeA but too late to change GetStringTypeW, but admitted that this was pure guessing!)
Now, looking at all of the functions to explain what they do:
Now 'uses the locale' means looking it up in our list of 'loaded' locales and, if it is not there, loading it and adding it to that list. Maybe GetStringTypeExW could have skipped looking at the locale (since it does not use it), but it was not written that way originally (and at this point who knows what kind of dependencies might be in customer code, so we can't change the behavior).
Notice what happened here, though -- GetStringTypeW (which technically came first) is the main function, and the other two that came later are the wrappers that do extra work just to call the original function. The other functions provide no additional functionality, unless you want to count a confusing way to get IsValidLocale-esque checking (which seems crazy, but is the reason we can't take out the behavior, unfortunately).
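To make those wrapper relationships concrete, here is a minimal Python sketch that models them. It is not how kernel32 is written. The function names, the two-entry LCID-to-code-page table, and the use of Python's str methods to approximate the CT_CTYPE1 classification are all assumptions made for illustration. Only the C1_* flag values come from the Windows headers.

```python
# Hypothetical model of the GetStringType* wrapper relationships.
# Nothing here calls Windows; the classification is approximated with
# Python's str methods and will not match kernel32 character-for-character.

C1_UPPER = 0x0001  # CT_CTYPE1 flag values from the Windows headers
C1_LOWER = 0x0002
C1_DIGIT = 0x0004
C1_ALPHA = 0x0100

# Hypothetical, tiny LCID -> ANSI code page table (en-US, ru-RU).
ANSI_CODE_PAGE = {0x0409: "cp1252", 0x0419: "cp1251"}


def get_string_type_w(text):
    """The real worker: classifies each character; needs no locale."""
    result = []
    for ch in text:
        flags = 0
        if ch.isupper():
            flags |= C1_UPPER
        if ch.islower():
            flags |= C1_LOWER
        if ch in "0123456789":
            flags |= C1_DIGIT
        if ch.isalpha():
            flags |= C1_ALPHA
        result.append(flags)
    return result


def _use_locale(lcid):
    """Stands in for 'uses the locale': fails for locales it cannot load."""
    try:
        return ANSI_CODE_PAGE[lcid]
    except KeyError:
        raise ValueError(f"invalid locale 0x{lcid:04x}") from None


def get_string_type_a(lcid, data):
    """Wrapper: converts bytes with the locale's code page, then calls W."""
    return get_string_type_w(data.decode(_use_locale(lcid)))


def get_string_type_ex_w(lcid, text):
    """Wrapper: looks up the locale (and can fail on it), then ignores it."""
    _use_locale(lcid)
    return get_string_type_w(text)
```

The same byte, 0xA3, classifies differently depending on which locale's code page the A wrapper uses to convert it: it is a pound sign in code page 1252 and a Cyrillic capital letter in code page 1251. The W worker never needed a locale at all.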
So, the answer to the original question "To Ex or not to Ex?" in this case is (ironically enough, just as with the ex-girlfriend?) probably not to Ex. And for the wider question, it is probably better to decide based on the functionality you need and not on which one looks newer, unless you have someone writing a blog entry to explain why you might want to do something different.
And how often does that really happen, anyway? :-)
This post brought to you by "¿" (U+00bf, INVERTED QUESTION MARK)
Yes, it is true. Julie Bennet has made an appearance on the TV show Charmed!
(note the way it is spelled, that is not a typo!)
Ah, we are probably thinking of two different people. I noticed that, in the big identity shift at the end of last season, Alyssa Milano's character Phoebe Halliwell had taken on the name Julie Bennet.
I did a double take when I was catching up on episodes from a few weeks ago and saw that name on an employee ID on the show!
Maybe we should start calling Julie by another name.... Phoebe and Alyssa are both available....
It is funny how you can't have a simple discussion about the Framework Design Guidelines about Case Sensitivity without bringing out a nearly religious hysteria with the two camps claiming their thoughts on the matter are best.
Personally, I don't care. Everyone should just do whatever they want, and then if they want case insensitivity, come back and talk to me about how to get the casing figured out....
But what I think is funniest about one of the VB stances on the whole matter is that they want case insensitivity because it is too easy to confuse people with mixed case but they have no problem with different normalized forms obfuscating meaning.
I mean, speaking personally, I have no problem distinguishing these two letters:
A a
But in an application like Visual Studio.NET (which has had complex script support in the designer since v.1), I would have serious trouble distinguishing these two letters:
ë ë
Of course, it is a lot of fun putting together samples like that obfuscation one, but that is just to help call attention to the problem.
I would not object if Visual Studio was bundled with a tack hammer so users tempted to do this sort of thing could hit themselves over the head to cure the temptation. :-)
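Suppose the two letters above are the usual pair: U+00EB as a single code point, and U+0065 followed by U+0308 COMBINING DIAERESIS. A short Python sketch shows why such a pair trips up any identifier comparison that does not normalize first. The same_text helper is hypothetical. unicodedata.normalize is the standard library function.

```python
import unicodedata

precomposed = "\u00eb"  # LATIN SMALL LETTER E WITH DIAERESIS, one code point
decomposed = "e\u0308"  # 'e' followed by COMBINING DIAERESIS, two code points


def same_text(a, b):
    """Hypothetical helper: compares two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings render identically but differ as raw code points. They compare equal only after both are normalized to the same form.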
Now here is a sample I could put together that might help -- a VB sample that had all of the following kinds of characters co-existing in ways that appear to violate the casing rules:
Ä ä
É é
Ì ì
Õ õ
Û û
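Suppose those pairs mix precomposed capitals with decomposed lowercase letters. That is an assumption, because the sample above does not say which form each letter uses. In that case, simple lowercasing is not enough to make the pairs compare as equal. This Python sketch contrasts a naive comparison with canonical caseless matching, which the Unicode Standard defines as NFD(casefold(NFD(x))). Both helper names are made up for the example.

```python
import unicodedata


def nfd(s):
    return unicodedata.normalize("NFD", s)


def naive_caseless_equal(a, b):
    """Lowercases both sides but ignores normalization."""
    return a.lower() == b.lower()


def canonical_caseless_equal(a, b):
    """Unicode canonical caseless matching: NFD(casefold(NFD(x)))."""
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())
```

With the naive approach, "Ä" (U+00C4) lowercases to U+00E4, which never matches "a" plus U+0308. With canonical caseless matching, both sides end up as the same decomposed sequence.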
I am going to have to find Jay Roxe (who was at the Karaoke Challenge on Friday and sang Great Balls of Fire!) and talk about putting together a "case sensitive VB" sample. Maybe then I could get the VB folks who want case insensitivity interested in getting the issue fixed? :-)
This post brought to you by "ë" (U+00eb, a.k.a. LATIN SMALL LETTER E WITH DIAERESIS)
Yoshihiro Kawabata asked via the suggestion box:
Hello, Michael. About U+221E. I test this code in SQL Server 2005. select convert(varchar, nchar(0x221E)) Result: '8' Collation(Latin1_General_CI_AS). '∞' Collation(Japanese_CI_AS). Why U+221E is convert to 8 in Latin1_General_CI_AS ?
The answer can be found in prior posts such as If the shoe [best-]fits.... and BestBetter than nothing fit mappings, unleashed, #1. In fact, if you look at that second post you will see that I marked this particular mapping as my favorite one.
What is clear is that SQL Server is simply calling WideCharToMultiByte without passing the WC_NO_BEST_FIT_CHARS flag. So any of the best fit mappings that exist in the code page are picked up.
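Python's own codecs do not apply Windows best fit mappings, so the sketch below fakes one. It uses a one-entry table plugged in through codecs.register_error. It only illustrates the difference that WC_NO_BEST_FIT_CHARS makes. The handler name demo_best_fit, the table, and to_ansi are all hypothetical, and the real code page 1252 best fit table contains many more mappings.

```python
import codecs

# Hypothetical one-entry best fit table; not how Windows implements it.
BEST_FIT_1252 = {"\u221e": "8"}  # INFINITY -> DIGIT EIGHT


def _best_fit_handler(err):
    """Encode error handler: substitute a best fit character, else '?'."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(BEST_FIT_1252.get(ch, "?") for ch in bad), err.end


codecs.register_error("demo_best_fit", _best_fit_handler)


def to_ansi(text, no_best_fit=False):
    """Rough analogue of WideCharToMultiByte(1252, ...) with/without WC_NO_BEST_FIT_CHARS."""
    return text.encode("cp1252", errors="replace" if no_best_fit else "demo_best_fit")
```

Code page 932, the one behind Japanese_CI_AS, actually contains U+221E. No best fit is needed there, so the infinity sign survives the conversion.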
At least now you know why a full eight-hour day at work can seem to take forever. Because depending on the code page you use, they can kind of be the same thing....
This post brought to you by "8" and "∞" (U+0038 and U+221e, a.k.a. DIGIT EIGHT and INFINITY). Two characters who are great friends, merely a single best fit mapping away from each other!
I'm sure that if you have been reading blogs on http://blogs.msdn.com/ you have been reading about the Giving campaign. I probably do not get into all of the activities that people like to do during this time (as much as some people do), but I found it mildly intriguing that today our group is going to be doing a Karaoke Challenge.
The idea is that you can bid to have somebody in the group sing a particular song. And of course others who are intrigued or amused at the prospect of that person singing that song may also put their bids in. Then, at the time of the actual event, that person will either need to sing that song or outbid the people who wanted the song sung. Where it gets very interesting is that people at the event can then raise their bid.
It would ordinarily be something kind of obnoxious, but the fact that all the money is going to charity mitigates things nicely.
When I was looking at the song list, I was amazed at the number of songs that I realized I would probably be willing to do if asked. Since I'm probably only doing zero or one songs, I figured it might be nice to capture the official list somewhere for people who are curious about the spectacle of me singing more than one song (which nobody is ever going to see).
Here is the official list (in no particular order!). If you are somebody who is able to get me into a bar on a karaoke night, you may find this information valuable. I consider this list something that I'd be willing to do only when either charity is involved or I'm very drunk:
Now there is a lot more than I would be willing to do if the songs were there. For example, there is no Matthew Sweet, no mollycuddle, no Michael Penn, no Pretenders, no other Aimee Mann or Kathleen Edwards or Joni Mitchell songs. And no other Coldplay songs that I would want to sing (since I hate Yellow immensely!).
Now one important thing to keep in mind about this list is that it has to be songs that I not only like but believe I could actually sing (sometimes I miscalculate, like when I tried to do The Scientist (Coldplay) in Amsterdam). They may not even be songs that I would listen to on a regular basis (e.g. Neil Diamond, though I loved the riff that David Spade did on Brother Love in an otherwise entirely forgettable movie a while back).
But it should be an interesting event. I may even sing one of those songs.... something I will report on later for those who are interested. :-)
In Windows 2000, support for several of the ISCII (Indian Standard Code for Information Interchange) code pages was added to Windows:
It happened in the same release of Windows that support for some of these Indic languages was first added (more were added in XP, still more in XP SP2, with the final ones being added in Vista).
It leads people to occasionally ask the question that Bob Eaton asked this morning on the Unicode List:
Does anyone know whether (or why not) it is possible to use the ISCII Devanagari code page (57002) as the default system code page in Windows? That is, there is no code page support for "Ansi" Devanagari, but with the ISCII encoding having code page support, why didn't they associate it with Ansi programs associated with the Hindi locale (c.f. cp. 932 with JIS)? The ISCII encoding seemed like a logical choice... but it wasn't used...
The answer to the first question is no, you can't, because it is technically impossible. The ANSI and OEM code pages on Windows act as the ACP and OEMCP for a locale (and as the CP_ACP and CP_OEMCP of the default system locale). In order for a code page to be used there, it has to be a table-based code page that can be used in both kernel and user modes, not one of the "DLL-based" ones in the 5xxxx range. And the conversion between ISCII and Unicode is not a simple affair that can be done with a table; it requires a more algorithmic approach....
Additionally, the ANSI Win32 API code throughout Windows has (in many places) a solid "no bigger than two bytes per character" bias, such that any attempt to add a code page that could ever be more than that is not technically possible without updating a great deal of code, code that is provided mainly for backwards compatibility.
So it would have been a logical choice, had it been possible. Unfortunately, it was not. All of these code pages are provided to help people in India who use them move to Unicode, by allowing them to convert their legacy data....
This post brought to you by "ॠ" (U+0960, a.k.a. DEVANAGARI LETTER VOCALIC RR)
The other day, my manager's manager's manager Delan was looking at various web sites and on one of them there was an unusual display issue on several pages.
Basically, each of the pages in question had text on them that looked like this:
"?/font>"
Very odd, and maybe a little frustrating too.
Of course, when you are the head of Globalization Infrastructure, Fonts, and Tools, what better place to start than with some of the font experts on your team? I mean, clearly something was messing with a font tag....
In actuality it wasn't a font issue. After a few hours, the net was widened a bit and I ended up on the mail thread. It reminded me of my MSLU days, when a few wrong characters usually meant a string had been converted with the wrong code page. So based on that, I responded thusly:
I would suggest looking at the source on the page to see what might be next to those font tags, and checking the IE detected encoding to see if it matches the page's encoding -- it may be a CJK font name that is being misunderstood and combined with bytes of the less than sign.
The page was sent on to me. So I set the encoding in IE (which for me was going through AutoDetect and deciding the page was in Windows-1252) to Chinese Traditional (Big5), and suddenly all of the news items that had bullets (0x95, or U+2022) wrapped in <font> tags showed the bytes of the bullet and the following less than sign turned into a single question mark.
Now, as it turns out, she was actually having the page auto-detected as Chinese Simplified (GB2312), but the results were the same -- U+2022 U+003c (•<), which for me was being read as 0x95 0x3c, was for her being converted to "?" (since 0x953c is undefined on both code pages 936 and 950: in the former 0x95 is a lead byte followed by an illegal trail byte, and in the latter it is an unused lead byte with no assigned trail bytes).
The page itself:
http://local.msn.com/t3/?zip=93301
had no charset meta tag and clearly the server was not communicating the charset. We both had the AutoDetect checkbox set (IE6 for me and IE7 for her), but clearly it was not detecting much to distinguish the page from the bias of our own individual locale settings.
Wouldn't the illegal sequence have been a good indication that the AutoDetect guess was wrong? And isn't the lack of any charset bad too? And that lack of other communication about the charset from the server?
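One cheap improvement along the lines of that question is to reject any candidate encoding under which the bytes contain an illegal sequence. The Python sketch below assumes that Python's gbk and cp950 codecs are close enough stand-ins for Windows code pages 936 and 950. The guess_encoding function and its candidate order are hypothetical. Real detectors, such as the one in IE, weigh much more evidence than this.

```python
def guess_encoding(data, candidates=("gbk", "cp950", "cp1252")):
    """Hypothetical detector: first candidate that decodes without an illegal sequence."""
    for name in candidates:
        try:
            data.decode(name)  # strict mode: any illegal byte sequence raises
        except UnicodeDecodeError:
            continue  # illegal sequence, so this guess must be wrong
        return name
    return None


page = b"<font>\x95</font>"  # a cp1252 bullet immediately followed by '<'
```

Here the bullet byte followed by the less than sign rules out both Chinese code pages. That leaves Windows-1252, which decodes 0x95 to U+2022. Windows itself substituted "?" instead of failing, which is how the bad guess went unnoticed.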
Of course it was a page from MSN, so I figure we found at least three bugs in various Microsoft offerings from the exercise, which was actually a lot of fun, too! :-)
This post brought to you by "•" (U+2022, a.k.a. BULLET)
Visual Basic ≤ 6.0 has Unicode strings. But as I point out in Chapter 6 of my book, it converts out of Unicode to the default system code page any time you do just about anything to those strings.
Lots of people try to avoid this by calling the StrConv function with the vbUnicode parameter first, figuring they convert the string to Unicode, VB converts it back out, and all is well.
Ugh.
Remember how I said that VB strings were already Unicode strings?
Well, calling StrConv to convert a Unicode string to Unicode is not a no-op; it actually converts the string to Double Secret Unicode, an encoding not found in nature.
So your string:
My name is Michael.
which is laid out as follows:
004d 0079 0020 006e 0061 006d 0065 0020 0069 0073 0020 004d 0069 0063 0068 0061 0065 006c 002e
will be converted by VB (which thinks it is not Unicode since you are converting it) to:
004d 0000 0079 0000 0020 0000 006e 0000 0061 0000 006d 0000 0065 0000 0020 0000 0069 0000 0073 0000 0020 0000 004d 0000 0069 0000 0063 0000 0068 0000 0061 0000 0065 0000 006c 0000 002e 0000
and this "Double Secret Unicode" is basically meaningless.
Luckily when VB converts it out of Unicode you will get what you are looking for. So maybe you converted twice for no reason. No need to worry, right?
Wrong.
If you look to my posts The new compiler error C4819 and How does it detect invalid characters? then you'll start to get a glimmer of where this method gets into trouble.
The trouble is that most Unicode strings do not have those zero bytes that look like NULL values; only plain ASCII text does. And any time you are on a machine whose default system code page does not contain all of the characters in question (and as those posts indicated, this is not unheard of), the conversion back from Double Secret Unicode will not round trip perfectly to the original string. And once you get into Chinese, Japanese, and Korean, the exacting requirements of particular lead bytes and trail bytes can cause you to lose ideographs in all kinds of unexpected ways.
In summary, you will lose information.
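Here is a rough Python model of that round trip. It assumes VB's implicit conversions behave like a plain encode/decode with the system code page. The function names are hypothetical. Python substitutes U+FFFD or "?" where Windows would use its own default characters, so the exact damage differs; the point is only that the damage is not zero. The string's UTF-16 code units sit in memory in little-endian order, so for ASCII text each widened character is followed by U+0000.

```python
def strconv_vbunicode(s, code_page):
    """Models StrConv(s, vbUnicode) on an already-Unicode VB string:
    its UTF-16LE bytes are treated as ANSI bytes and widened again."""
    return s.encode("utf-16-le").decode(code_page, errors="replace")


def pass_to_w_api(s, code_page):
    """Models VB's implicit Unicode -> ANSI conversion of an As String
    argument, followed by a W function reading those bytes as UTF-16LE."""
    raw = s.encode(code_page, errors="replace")
    return raw.decode("utf-16-le", errors="replace")


def round_trip(s, code_page):
    """What the W function finally sees after the StrConv 'fix'."""
    return pass_to_w_api(strconv_vbunicode(s, code_page), code_page)
```

On code page 1252, ASCII text survives the trip. On code page 932, "é" does not: its bytes E9 00 start an invalid double-byte sequence, and the character is lost.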
The way around this is really quite simple. Any time you are going to pass a string to an external function, declare it ByVal As Long rather than ByVal As String, and then pass StrPtr(<your string>) to that Long. For example:
Public Declare Function CompareStringW Lib "kernel32" ( _
    ByVal Locale As Long, _
    ByVal dwCmpFlags As Long, _
    ByVal lpString1 As Long, _
    ByVal cchCount1 As Long, _
    ByVal lpString2 As Long, _
    ByVal cchCount2 As Long) As Long

Public Enum CmpFlags
    STRINGSORT = &H1000&
    IGNORECASE = &H1&
    IGNORENONSPACE = &H2&
    IGNORESYMBOLS = &H4&
    IGNOREKANATYPE = &H10000
    IGNOREWIDTH = &H20000
End Enum

Public Enum CSTR_
    LESS_THAN = 1     ' string 1 less than string 2
    EQUAL = 2         ' string 1 equal to string 2
    GREATER_THAN = 3  ' string 1 greater than string 2
End Enum

' StrPtr hands the W function each string's own UTF-16 buffer,
' so VB performs no ANSI conversion in either direction.
Public Function CompareString(ByVal lcid As Long, ByVal flags As CmpFlags, _
                              ByVal st1 As String, ByVal st2 As String) As CSTR_
    CompareString = CompareStringW(lcid, flags, StrPtr(st1), Len(st1), StrPtr(st2), Len(st2))
End Function
And if you use this technique, the call will be faster (you avoid two extraneous conversions) and you will never lose data from conversion mistakes.
Best of all, you avoid the evil Double Secret Unicode!
This post brought to you by "٭" (U+066d, a.k.a. ARABIC FIVE POINTED STAR)
You can use the PrintDlg and PrintDlgEx functions to show the Print Dialog Box and Print Property Sheet, respectively. And these dialogs have full support for MUI in that they will do the right thing for your UI language in Windows (though not for the .NET Framework UI culture settings, as I just pointed out here).
But unfortunately, this will not work 100% of the time.
This is not due to lack of effort by people on the development or localization teams in Windows; it is due to the fact that the Advanced and Properties dialogs are in many cases printer-specific and provided within the printer drivers, and not all printer drivers support all of the MUI languages (since the drivers are mostly provided by the companies who make the printers, Microsoft obviously cannot force them to support all languages).
Furthermore, since these are external companies, some may support localized dialogs but not MUI; in this case you would install the driver for a particular language, and if you wanted a different language you would have to run another install program (possibly even removing the original one first).
There are folks at Microsoft who work with these companies and help provide information on how to support MUI, but obviously not every single one of them will necessarily do this.
The best thing that you can do if you run into this situation?
Don't blame Microsoft (in this case, that is; I am sure there are other problems you can blame us for!). Instead, send feedback to the maker of the printer and its drivers. Point out the problem, and how MUI support might be in their best interest. They hear it from us, but the people who buy the hardware really do have an important voice here (the people who provide the money obviously do!)....
This post brought to you by "Њ" (U+040a, a.k.a. CYRILLIC CAPITAL LETTER NJE)
If you go back to March of this year, I posted about how the WinForms DateTimePicker and MonthCalendar do not support culture settings. But if you look at the text, you will see that I had my 'NLS eyes' on, thinking about CultureInfo.CurrentCulture and how different calendars were not supported there.
Putting on my 'GIFT eyes' though, there is a wider issue, and that is the fact that the localized text in all of the Windows Shell's Common Dialogs picks the localization based on the UI language of Windows, rather than the .NET Framework's CultureInfo.CurrentUICulture.
Now this would obviously be the case when you do not have the language resources available -- after all (for example), the Russian version of Windows cannot spontaneously create French resources for the FontDialog class just because you write code to set the Thread.CurrentUICulture property to be new CultureInfo("fr-FR"). And that is because the underlying Font Dialog Box in the Shell loads its resources based on the UI language in Windows.
But the same problem occurs even if you have the Windows MUI language pack for the appropriate language installed; the common dialogs are still loaded according to the Windows UI language, not the CurrentUICulture that you might have set in .NET.
Of course it is terribly frustrating that the resources might be there yet still not be used. But there is no function to change the UI language without logging off, and of course it is not very likely that a managed application running in the context of the desktop can survive past a logoff/logon....
The ironic¹ part is that the Windows common controls do have a workaround for this issue, even though there do not seem to be many areas of localized text in them. If you look at the topic entitled Localization Support for the Common Controls, it talks about the InitMUILanguage function (which initializes the UI language for the common controls within a process) and the GetMUILanguage function (which retrieves the UI language for the common controls within a process, or the user UI language if it has not been set). In theory there is no reason why you could not call InitMUILanguage at any point, but in practice the documentation does not make clear whether it will work when called multiple times (hell, I am still having trouble picturing when it works at all, since I cannot think of any localized text!).
But in any case, for the time being it looks like the CurrentUICulture and Thread.CurrentUICulture cannot be used to affect the Windows Shell's Common Dialogs that are wrapped by Windows Forms in the .NET Framework. If you want to support this, you will need to wrap the Shell functions in comdlg32.dll and then use the HOOKPROC to do a bunch of owner draw work and/or custom template loading to get this done. Which is a bit much for any project....
1 - When I say ironic here, I mean it in the Alanis sense. You know, like rain on your wedding day. In other words, when I say IRONIC what I mean is UNFORTUNATE.
This post brought to you by "ǻ" (U+01fb, a.k.a. LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE)
Mike pointed me at this article by Charles Petzold entitled Does Visual Studio rot the mind?
Based on a talk he gave for the NY .NET Developer's Group, here is the abstract:
Visual Studio can be one of the programmer's best friends, but over the years it has become increasingly pushy, domineering, and suffering from unsettling control issues. Should we just surrender to Visual Studio's insistence on writing our code for us? Or is Visual Studio sapping our programming intelligence rather than augmenting it? This talk dissects the code generated by Visual Studio; analyzes the appalling programming practices it perpetuates; rhapsodizes about the joys, frustrations, and satisfactions of unassisted coding; and speculates about the radical changes that Avalon will bring.
I find myself agreeing with a lot more of it than I thought I would after reading the abstract. Definitely worth a read....
Regular reader Maurits asked:
The "official" euro is U+20AC http://www.fileformat.info/info/unicode/char/20ac/index.htm But U+0080... while officially is a "control character"... has a strikingly familiar image: http://www.fileformat.info/info/unicode/char/0080/index.htm What happened there?
Well, I cannot speak for someone else's web site, of course. :-)
If I had to guess, I would scour my brain and then recall the fact that in almost every single Windows code page the Euro is located at position 0x80.
Bonus points for anyone who knows the two exceptions to this, and the manifestation of each of the exceptions -- WITHOUT looking at the code page tables. You are all on the honor system on this one!
So perhaps it was based on a font that, having nothing better to do with a control character at U+0080 just ended up shoving a Euro in there, much the way one tucks a dollar in an out of the way place for a rainy day? :-)
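Continuing that guess, here is a small Python sketch of one plausible mechanism. Something treats the code point U+0080 as if it were byte 0x80 in code page 1252, which is where the Euro lives. misread_as_cp1252 is a hypothetical helper. The sketch says nothing about how that particular web site actually builds its images. To keep the honor system intact, it only looks at code page 1252.

```python
import unicodedata


def misread_as_cp1252(ch):
    """Hypothetical mix-up: treat a code point below U+0100 as a raw byte
    and decode that byte as code page 1252 instead of ISO 8859-1."""
    return ch.encode("latin-1").decode("cp1252")
```

U+0080 really is a C1 control character, but byte 0x80 in code page 1252 is the Euro sign. Confusing the two is enough to put a Euro glyph on a control character.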
This post brought to you by "€" (U+20ac, a.k.a. EURO SIGN)
Another one of those "new in Vista" posts. :-)
Unicode has added a great deal to support mathematics, from Unicode Technical Report 25 (Unicode Support for Mathematics) to the various Mathematical subranges in Unicode (see the Mathematical Symbols column in the Code Charts for Symbols and Punctuation).
My favorite range is the Mathematical Alphanumeric Symbols block in Unicode, which spans U+1d400 to U+1d7ff (almost 1000 characters in all, with some gaps left in, as you can see from the code chart).
Why is it my favorite?
Well, I was having a conversation a few years back with Murray Sargent of Microsoft (one of the representatives of MS at Unicode Technical Committee meetings and a co-author of UTR #25). He was explaining why Unicode, which is generally speaking a plain text standard, was going to approve a block of characters that included many different letters and numbers with bold, italic, and other variations usually reserved for "rich text" outside the scope of Unicode.
"It is all about mathematics, and representing it in plain text," he explained. And he has a point; while I may use bold or italicized text for emphasis, in mathematics there is actual semantic meaning that is expressed in symbols and variables that have such attributes.
At that point, thinking about collation, I asked him if there was ever a time that it would be interesting or important to fold those differences together, for all of the following:
At first, Murray thought I was trying to make them all equal, and objected strenuously to that; luckily I had something different in mind. I pointed out some scenarios:
And he definitely saw the benefit to such a collation.
So, after this conversation (and a few others with other various math experts), in Vista a special LCID is being added:
0x0001007f (MAKELCID(MAKELANGID(LANG_INVARIANT, SUBLANG_NEUTRAL), SORT_INVARIANT_MATH))
It is an alternate sort for the invariant locale, because mathematics is independent of specific locale (kind of like invariant is!).
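The LCID above is just the usual bit packing. Here is a minimal Python transcription of the MAKELANGID and MAKELCID macros. The function names are mine; the constant values come from the Windows headers. The sublanguage goes in bits 10 and up of the language ID, and the sort ID goes in bits 16 and up of the LCID.

```python
LANG_INVARIANT = 0x7F
SUBLANG_NEUTRAL = 0x00
SORT_INVARIANT_MATH = 0x1


def make_langid(primary, sub):
    """MAKELANGID: (sublanguage << 10) | primary language, as WORDs."""
    return ((sub & 0xFFFF) << 10) | (primary & 0xFFFF)


def make_lcid(langid, sort_id):
    """MAKELCID: (sort ID << 16) | language ID, as WORDs."""
    return ((sort_id & 0xFFFF) << 16) | (langid & 0xFFFF)


math_lcid = make_lcid(make_langid(LANG_INVARIANT, SUBLANG_NEUTRAL), SORT_INVARIANT_MATH)
```

The same helpers give the familiar 0x0409 for English (United States), which uses primary language 0x09 and sublanguage 0x01.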
This locale causes each of the above letters to be a mere secondary and/or tertiary difference away from everything else on the list. The same principles were applied to all of the Greek letters and numbers in the block.
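The general idea of 'same letter first, styling as a lesser difference' can be sketched with a multi-level sort key. This is not the Vista collation. It is a hypothetical Python approximation with three levels. The primary level uses NFKC normalization plus case folding to strip the mathematical styling and case. The second level keeps case. The third level breaks any remaining ties by the original code points.

```python
import unicodedata


def math_sort_key(s):
    """Hypothetical three-level key: letter, then case, then styling."""
    folded = unicodedata.normalize("NFKC", s)  # U+1D400 -> "A"
    return (folded.casefold(), folded, s)


words = ["b", "\U0001D400", "a", "A"]
```

MATHEMATICAL BOLD CAPITAL A ends up right next to "A" and ahead of "b". It stays distinct from "A" only at the last level.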
Please note that this is not something that can be selected in Regional and Language Options as a locale (neither can invariant, so obviously an alternate sort of invariant cannot be chosen). But it can be used in any programmatic situation where one is looking to compare strings, find within strings, or create sort keys.
And it is right there in Vista, for those who are mathematically inclined....
This post brought to you by "𝐀" (U+1d400, a.k.a. MATHEMATICAL BOLD CAPITAL A)
Warren asked me (via the contact link):
First, I'd like to say I enjoy reading your blog. Second, I've got a question about retrieving currency names. I'd like it if you could answer my question (if the solution is even possible). If it's not possible, maybe you could add a blog entry as to why it was decided not to have this available. I'm trying to enumerate all of the currency names for a system (I'm using MFC C++), like the second table in the following MSDN page:
e.g. I would like to programmatically retrieve "Canadian Dollar" from the system. I have no problem getting the ISO symbols, country name, three letter codes (etc) but I can't find a way to get the currency name. It seems like a common thing to list for a country (or it wouldn't be listed in the table on that MSDN page ?!?)
Thanks
Warren
Well, first I would like to say thank you very much. :-)
Second, I have some bad news. There is no currency object that can be used to enumerate currency names. However, currency information can be returned for every locale in Windows, by calling the GetLocaleInfo function with either the LOCALE_SENGCURRNAME or LOCALE_SNATIVECURRNAME LCType values.
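Here is a minimal Python sketch of that approach. It assumes you already have a list of LCIDs to walk; on Windows, EnumSystemLocales can supply one. The sketch calls GetLocaleInfoW for each locale with the two LCType values and collects the distinct English names. The ctypes wrapper only works on Windows. currency_names takes the lookup function as a parameter so that it can be exercised without Windows. Both helper names are hypothetical.

```python
import ctypes
import sys

LOCALE_SENGCURRNAME = 0x1007    # English name of the currency
LOCALE_SNATIVECURRNAME = 0x1008  # native name of the currency


def get_locale_info(lcid, lctype):
    """Calls kernel32's GetLocaleInfoW (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetLocaleInfoW is only available on Windows")
    buf = ctypes.create_unicode_buffer(256)
    if ctypes.windll.kernel32.GetLocaleInfoW(lcid, lctype, buf, len(buf)) == 0:
        raise ctypes.WinError()
    return buf.value


def currency_names(lcids, lookup=get_locale_info):
    """Maps each distinct English currency name to the set of its native names."""
    names = {}
    for lcid in lcids:
        english = lookup(lcid, LOCALE_SENGCURRNAME)
        native = lookup(lcid, LOCALE_SNATIVECURRNAME)
        names.setdefault(english, set()).add(native)
    return names
```

Several locales can share a currency; en-CA and fr-CA both use the Canadian dollar, for example. That is why the sketch groups native names under each English name rather than producing one entry per locale.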
This post brought to you by "₯" (U+20af, a.k.a. DRACHMA SIGN)