Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Just over a week ago I was posting about Unicode? Zip don't need no stinking Unicode!
Well, as Heath points out in this post, Palm has re-released their update, sans the non-ASCII characters in file names.
Given the current limitations in ZIP and the additional complications I pointed out in that earlier post, I suppose I could hardly blame them....
It is funny how Microsoft can't really win in this kind of situation -- if we do nothing, then we are branded a bunch of ignorant provincials for the limitations in the platform; if we provide any kind of extension to compression, then we are evil for ignoring existing standards in favor of our proprietary solutions.
Of course those two groups are (mostly) different people, so I guess it makes sense.
But the fact that we need to keep things in ASCII is more than a little disappointing. :-(
This post brought to you by "®" and "™" (U+00ae and U+2122, a.k.a. REGISTERED SIGN and TRADE MARK SIGN)
I guess there is no denying the problem. As this post from over a year ago proves, I am not even as smart as a starling.
But that's okay, because as Mark points out on Language Log, no one else is either....
:-)
Oh, just to keep the balance going (I am behind again!):
)))))))))))))))))))))))))))))))
This post brought to you by "］" (U+ff3d, a.k.a. FULLWIDTH RIGHT SQUARE BRACKET)
Remember how I have talked in the past about the difference between two different purposes for collation (comparison vs. identity, or alternately the difference between CompareString and EqualString)?
(if not you can follow those links for some of the backstory!)
Over on the BCLTeam WebLog, Krzysztof Cwalina explains about Why IComparable<T> does not extend IEquatable<T>.
Basically the same issue -- in most cases, one does not need to answer both the "which comes first?" and the "are they equal?" questions when two objects of the same type are being looked at.
Even when they are fairly basic items like strings, when both questions are valid in different contexts, the use of the strings themselves often suggests that only one of them is appropriate at a given time....
This post brought to you by "＜" (U+ff1c, a.k.a. FULLWIDTH LESS-THAN SIGN)
You don't want to ever use DEFAULT_GUI_FONT. Just like the thread locale, AVICAP32.DLL, and SetLocaleInfo, it really stinks.
Raymond Chen talked about some of the history of this GetStockObject-derived font last year when he answered the question What are SYSTEM_FONT and DEFAULT_GUI_FONT?
He even explained one of the big problems about it in international scenarios:
"...on a typical US-English machine, they map to bitmap fonts that do not support ClearType."
But beyond that, there is another big problem. The problem is that the actual font it resolves to is SKU dependent; it is not dependent on any of the various locale settings on the machine. It therefore becomes one of those "features" that keeps an MUI system with a changed system/user locale from being a 100% substitution for a localized system in testing.
And even in Vista, where the single world wide binary goal has really been largely achieved, the backcompat issues that come up with (for example) a Korean user who expects a particular legacy application's font settings to not change disallow 'fixing' this behavior. So we are stuck with this bit of cruft that will forever differentiate some particular language SKUs of what is otherwise a SWWB model system....
Don't forget that, as Raymond talked a bit about, all of the GetStockObject-derived fonts are no longer what the help topic claims them to be -- they are not used by the system for anything, and remain only for the sake of applications that have depended on these ugly beasts.
So please stay away from DEFAULT_GUI_FONT; it stinks, truly!
This post brought to you by "ᇸ" (U+11f8, a.k.a. HANGUL JONGSEONG HIEUH-PIEUP)
Hugh may have unintentionally come up with a great new slogan for the Microsoft Locale Builder....
Though it probably would never make it past the lawyers.... :-)
A few days ago, I wrote a post entitled Why do we call w 'double u' -- doesn't it look more like a 'double v' ?
The post itself had nothing to do with the title; it was actually about the Swedish Academy changes to consider 'w' and 'v' to be separate letters, as well as some of the potential consequences for Microsoft products if such a change is reflected in actual customer usage that needs to be captured in the future.
(the title was actually a little experiment to see whether a catchy title would get more attention in comments than the substance of the post; you can look for yourself what was proven in that experiment.)
Anyway, in the process of my speculation on products and what they might do in the future, I forgot that Vista already has such a change, in a country that is quite near to Sweden and Finland!
It has to do with the use of "aa" in Danish and Norwegian.
Both languages have the same basic alphabet -- the 26 Latin letters used in English and most places, plus æ, ø, and å. In earlier days, though, the letters aa were used instead of å (in fact in Danish this letter was not added until a spelling reform that happened in the late 1940s -- it was already widely used in both Norwegian and Swedish).
Now the Danish and Norwegian sort in Microsoft products was using the special sorting of the aa as a unique letter after z that was basically equal to å.
But while this was a (relatively) recent addition for Danish, it has been in Norwegian for much longer. And feedback had come in from customers such as the following:
If you take a Norwegian dictionary you would in fact find the German town Aachen as one of the first entries under the letter A, but Explorer will put it at the end after the letter Z. As I pointed out earlier Aa is never interpreted as Å for anything but family names dating pre-1917, and even then it is not uncommon in Norway to sort those names as double A. A person with the name Aalberg might frown to see his name listed at the top, but would be far from surprised. However a person searching for the town Aachen would not understand why Explorer put it at the end.
Feedback such as this led to some more investigation, and the final assessment was made that the time had come to remove the aa entries from the compression tables for Norwegian.
"But Michael," you might be asking, "What about Danish?"
Well, the answer there is that it is still too common to expect it to be treated as a letter -- and there are way too many textbooks and websites that put an extra (aa) at the end of the alphabet. Danish could not make the same change that Norwegian needed.
So, in Vista, the Norwegian tables have had the three variants of the aa compression removed, while the Danish (and also the Greenlandic, which uses the same sort as Denmark) have not. Therefore, these formerly unified sorts will now be disunified.
That theoretical question I posed the other day has become decidedly non-theoretical! :-)
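To make the difference concrete, here is a toy sketch in portable C. It is not how the Windows sorting tables work (those use weighted sort keys); the function names, the weights, and the aa_is_letter flag are all made up for illustration. It only models one thing: whether the pair "aa" is compressed into a single letter that sorts after z (the Danish behavior) or is treated as two ordinary a's (the Vista Norwegian behavior).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy model only (hypothetical names and weights): letters a..z get
   weights 1..26. When aa_is_letter is nonzero, the pair "aa" in any
   case is consumed as ONE letter with weight 27, so it sorts after z,
   the way å does. Input must contain ASCII letters only. */
int next_weight(const char *s, size_t *i, int aa_is_letter)
{
    int c = tolower((unsigned char)s[*i]);
    if (c == '\0')
        return 0;                       /* end of string sorts first */
    assert(c >= 'a' && c <= 'z');       /* the toy handles nothing else */
    if (aa_is_letter && c == 'a' &&
        tolower((unsigned char)s[*i + 1]) == 'a') {
        *i += 2;                        /* consume both letters */
        return 27;
    }
    *i += 1;
    return c - 'a' + 1;
}

/* Returns <0, 0, or >0, comparing weight by weight. */
int toy_compare(const char *a, const char *b, int aa_is_letter)
{
    size_t i = 0, j = 0;
    for (;;) {
        int wa = next_weight(a, &i, aa_is_letter);
        int wb = next_weight(b, &j, aa_is_letter);
        if (wa != wb)
            return (wa < wb) ? -1 : 1;
        if (wa == 0)
            return 0;                   /* both strings ended together */
    }
}
```

With the flag set, toy_compare("Aachen", "Zebra", 1) is positive (Aachen lands after Z, the result the Norwegian customer complained about); with the flag clear, it is negative and Aachen sits near the top of the A's.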
You may be filled with one or more of the following questions:
The answer for questions 1-5 is simple -- not a bleeding thing. We can't change those prior version results.
The answer for question 6 is also simple -- it is going to be changed. The two sorts with the two different LCIDs (0x0406 versus 0x0414/0x0814) and different names (da-DK versus nb-NO/nn-NO) will return two different results.
The answer for questions 7-9 is also simple (though that may change at some point!) -- I do not have a freaking clue. But you can bet your lunch money that I will be asking people some questions about this issue for WinFS and for the next version of SQL Server and for the upcoming version of Access that ships with Office 12.
This post brought to you by "ø" (U+00f8, a.k.a. LATIN SMALL LETTER O WITH STROKE)
Regular reader Mike Lippert asked the Suggestion Box:
Hi Michael, Your blog is great and I really appreciate all you've written. I just ran into some odd behavior I was wondering if you could explain. Our app uses the Symbol font to display certain characters. We recently converted the app from ANSI to Unicode. While testing, QA set the system codepage to Russian (charset cyrillic 1251). Now many of characters drawn using the symbol font show as square boxes. One of the characters displaying as a square box is at 0xD9, which is a logical and (U+2227). In the 1252 charset, that has the same codepoint as in Unicode U+00D9 (Latin capital letter U with grave). In the 1251 charset that position contains the Unicode U+0429 (Cyrillic capital letter shcha). Since our app is now Unicode the MFC CDC TextOut maps to TextOutW. Here's the odd behavior: when TextOut is called with the symbol font selected in the DC and a string consisting of the single Unicode character U+00D9, a square box is displayed. When it is called with the character U+0429, the "logical and" glyph is displayed. So what seems to be happening when drawing with the symbol font selected, is that the Unicode string is converted to the current system codepage and those codepoints are drawn. Is that really what's going on? I couldn't find any documentation to that effect... Thanks, Mike
He then followed up the next day with:
Michael, If you've got a sec to look at the topic I just posted above I'd appreciate it as I'm trying to figure out how to work around that behavior now. If you can't I totally understand, and I'll come up with something. Thanks, Mike Lippert ps I understand if you want to delete this comment as it isn't really a topic request, but was the best way I could think of to communicate w/ you.
Apparently Mike thought he would make me his personal support line representative. I decided to get my revenge by pointing this out. :-)
Now I am the first person to tell people to move to Unicode, believe me.
But symbol fonts aren't really Unicode. In fact, like I pointed out in More than you ever wanted to know about CP_SYMBOL, GDI and NLS can't even agree on how to try and fit them into the "character" metaphor.
In this case, it is clear from the behavior that Mike is seeing that the claim folks in GDI made that "GDI maps by a different scheme and will accept U+0020 - U+00ff" is not always going to be true -- especially (as in this case) if you are calling an MFC method that maps the bytes to Unicode for you before the call to TextOut/ExtTextOut happens....
So although I am a fan of Unicode, these symbols aren't Unicode in their current form -- so making an app Unicode but passing on symbol bytes like this will cause them to be mapped using CP_ACP. Which is pretty much guaranteed to be wrong.
To fix? Well, you can make sure to pass the symbols as the appropriate Unicode characters -- either by converting the bytes you have directly with a MultiByteToWideChar call with the CP_SYMBOL code page, or by making sure to put them in that U+0000 - U+001f, U+f020 - U+f0ff range that GDI will recognize as being symbol font stuff.
OR you could just not do this one piece with Unicode at all -- symbols are just as happy not having to be in Unicode. So the advice in the title of this post can help a lot!
Then (with any of those three methods) you should be able to see the symbols from that symbol font.
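For the option of putting the symbol bytes into the U+0000 - U+001f, U+f020 - U+f0ff range by hand, a minimal sketch in portable C might look like the following. The function name is hypothetical, and the offset of 0xF000 for bytes 0x20 and up is simply read off the range quoted above; on Windows the resulting buffer would then be handed to TextOutW/ExtTextOutW with the symbol font selected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map symbol-font bytes to UTF-16 code units in
   the range the post names. Bytes 0x00-0x1F stay at U+0000-U+001F;
   bytes 0x20-0xFF move to U+F020-U+F0FF. dst must have room for n
   code units. */
void symbol_bytes_to_utf16(const unsigned char *src, size_t n, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] < 0x20) ? (uint16_t)src[i]
                                 : (uint16_t)(0xF000u + src[i]);
}
```

Under this mapping the "logical and" byte 0xD9 from Mike's report becomes U+F0D9 rather than U+00D9, so nothing along the way has a reason to push it through CP_ACP.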
Now at this point I will apologize to Mike Lippert for teasing him; hopefully he won't be too angry (and I doubt I have enough readers to start scaring away the ones who aren't insufferably rude!).
Though I will say to everyone that you may want to look at the text in the Contacting Michael link about looking here for Product Support. I might have to add that the punishment for violations may be a tiny bit of good-natured ridicule.... :-)
This post brought to you by "∑" (U+2211, a.k.a. N-ARY SUMMATION)
I have talked about digit substitution many times in the past.
I was reminded of it recently when developer Kollen pointed out a pretty lame article:
I have a bug on parsing fullwidth Unicode digits and I noticed the (poorly named) "How to: Parse Unicode Digits" section of the .NET docs that indicates this behavior is by design. Are there any alternatives? Why doesn't this API support more than just the ASCII equivalent digits in Unicode?
Kollen is right, this article is poorly named. But beyond that, what is the benefit of putting together a huge function with no other purpose than to not work? How many times will people not read the surrounding text and copy/paste that code into their applications?
Yuck!
Well, at least the answer to Kollen's other question can be found here.... :-)
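For anyone who does need to accept fullwidth digits, one workaround is to fold them to their ASCII values yourself before (or while) parsing. The sketch below is portable C rather than .NET, the function names are hypothetical, and it deliberately handles only two digit ranges: ASCII (U+0030 - U+0039) and fullwidth (U+FF10 - U+FF19). It takes code points rather than an encoded string to keep the example short.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Return 0-9 for an ASCII digit (U+0030-U+0039) or a fullwidth digit
   (U+FF10-U+FF19), or -1 for any other code point. */
int digit_value(uint32_t cp)
{
    if (cp >= 0x0030 && cp <= 0x0039)
        return (int)(cp - 0x0030);
    if (cp >= 0xFF10 && cp <= 0xFF19)
        return (int)(cp - 0xFF10);
    return -1;
}

/* Parse n code points as a non-negative decimal number.
   Returns 1 and stores the value in *out on success; returns 0 on
   empty input, on any non-digit, or on overflow. */
int parse_digits(const uint32_t *cps, size_t n, unsigned long *out)
{
    unsigned long value = 0;
    assert(cps != NULL && out != NULL);
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        int d = digit_value(cps[i]);
        if (d < 0)
            return 0;
        /* value * 10 + d must not exceed ULONG_MAX */
        if (value > (ULONG_MAX - (unsigned long)d) / 10)
            return 0;
        value = value * 10 + (unsigned long)d;
    }
    *out = value;
    return 1;
}
```

A mixed input such as U+FF11, U+0032, U+FF19 parses to 129. Other scripts' digits would need their own ranges, which is exactly the kind of decision a parsing API has to make one way or the other.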
This post brought to you by "９" (U+ff19, FULLWIDTH DIGIT NINE) (a Unicode character that is dressed to the nines....)
Back at the Unicode Conference, after the "Design Principles for A Regional, Multilingual Keyboard" birds-of-a-feather, I had a chance to talk with Klaas Ruppel, who has been helping with the Finnish government standards.
(Among other things, he gave us some data about how the Cyrillic script versions of Sami work to help with our collation efforts. I'll talk more about this another day....)
One of the interesting things he mentioned was something that Raymond Chen mentioned in passing about Swedish collation:
(In marginally related news, the Swedish Academy recently released its latest official Swedish word list, and it changed its longstanding policy and now lists the words beginning with "W" separately from words beginning with "V". Up until now, "W" and "V" had been considered merely typographical variants of one another and had been treated as identical for alphabetization purposes.)
For the record, neither the government contacts in Sweden nor the MS subsidiary PMs in Sweden have asked Microsoft to follow this particular recommendation from the Swedish Academy, in part due to the general reluctance that the Swedes have to cause their sort to be different from the one in Finland (and the Finns have not agreed to make this change at this time).
It will put an interesting cat among the pigeons for Access 12, Jet 4.0, SQL Server 7.0, SQL Server 2000, and SQL Server 2005, given the fact that they have folded the two locales into a single sort (called Swedish/Finnish in Access and Finnish_Swedish in SQL Server).
As I mentioned in International Features of SQL Server 2000:
It should be emphasized that the developers of SQL Server are not "political" people and there really is no desire to offend any one country/region by asking them to "use another country/region's sort order." In fact, someone living in Serbia and Montenegro may not have to worry about using the Croatian sort order; because both the Croatian and Serbian languages use the same collation, the name is arbitrary. In working with customers in other countries/regions, just use the numbers, because the names are really arbitrary descriptions. What is most important is that you can choose a collation that will allow your data to be handled appropriately.
Of course what will be most important down the road (if the Swedish language goes along with the recommendations and Finnish language does not) is how they plan to deal with the disunification of the two sorts!
There is no way to know what will happen eventually, though luckily change will not come too terribly quickly, if it does come....
This post brought to you by "w" and "v" (U+0077 and U+0076, a.k.a. LATIN SMALL LETTER W and LATIN SMALL LETTER V)
I honestly can't believe how strange the conversations I have seem to get, sometimes.
I mean, Cathy came by to grab a snickers bar from the candy dish, and she was fussing with the stuffed cows on the bookshelf. She was trying to figure out why they were all facing the books instead of out at the room. I pointed out that this is just something Shawn does while he is telling me that he is stealing something from the candy dish.
So as she was sitting there repositioning the cows to more outward facing positions I asked her not to hurt the cows (the three stuffed cows and one beanie baby cow are all I have left of the Godot project, UniCOWS!).
She seemed shocked and offended that I suggested she would ever hurt them, which forced me to remind her of the "cow on wheels" incident.
We were at Crate and Barrel waiting on a dinner reservation (meeting someone else there, if memory serves), and Cathy had found one of those European toys, a 'cow on wheels', something like this one:
She was pulling it around by a string that was attached, and she accidentally ran it into the wall.
Shocked, I asked her how did she think Julie would feel if she saw Cathy's little exercise in cow abuse?
She paused, and said apologetically "you're right."
And then she proceeded to run it into the wall repeatedly. I was certain that she'd be giving them a credit card to pay for damage to the wall (and the cow!).
Of course at this point I remembered the recent posts from over at Language Log about the cow, and I asked whether that cow on wheels (assuming it was still ambulatory after Cathy's exercise in bovine abuse) clouded the issue of the "cow a motor vehicle" further....
Occasionally we manage to get work done there, too. I promise.
This post brought to you by "⺧" and "𐂌" (U+2ea7 and U+1008c, a.k.a. CJK RADICAL COW and LINEAR B IDEOGRAM B109F COW)
It is easy to take cheap shots at documentation writers for their mistakes.
But it is usually placing the blame in the wrong place, since in many cases it is the actual functionality that was confusing first. So if the writer's attempts to clarify that which is hard to explain fall short then it falls upon the PM/developer/tester of the functionality to step up and help with that....
(In fact this week we are doing a doc. review for Vista to try and help with this!)
Anyway, way back near the beginning of this whole blogging adventure for me, I posted about API Consistency and Developer Comfort. In it I talked about some of those consistencies between small groups of functions within the Win32 API.
It is a comfort thing, as I said. But sometimes it can be a burden.
Let's take return values for example. In most of the NLS API that does any kind of conversion, transformation, or retrieval, the pattern is simple:
It is simple, sure. And pretty consistently applied across most of the functions.
Of course one problem with this pattern is that you have to call it at least twice to get the size to use, or else risk allocating too much. Another is that there is a performance hit to getting the exact size. And yet another is that to not touch the destination buffer, the function may have to allocate.
So, when Shawn did the dev work on the NormalizeString function, a different pattern was used:
On success, the function returns the length of the normalized string in the destination buffer. If the destination buffer is NULL or if cwDstLength is zero, the return value is the estimated buffer length required to do the actual conversion. If the string in the input buffer is null-terminated or if cwSrcLength is -1, then the string written to the destination buffer will be null-terminated and the returned count of characters will include the terminating null character.
If a problem occurs, the function return will be less than or equal to zero. The application should call GetLastError, which will return one of the following values:
ERROR_SUCCESS: No error; this occurs when the actual size of the output string is zero.
ERROR_INSUFFICIENT_BUFFER: Need a bigger destination buffer. The return value is the negative of a better estimated guess of the required length. Try the conversion again with a buffer of -(Return Value) size.
ERROR_INVALID_PARAMETER: Input pointers were incorrect or normalization form was incorrect.
ERROR_NO_UNICODE_TRANSLATION: Invalid Unicode was found in string. The return value is the negative of the index of the location of the error in the input string.
ERROR_BADDB: The configuration registry database is corrupt.
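The calling pattern that this return convention implies is "estimate, allocate, try, and retry with the suggested size". The sketch below shows that loop in portable C. To keep it runnable anywhere, it does not call NormalizeString: toy_convert is a hypothetical stand-in that imitates only the size-related part of the contract (an estimate when the destination is NULL, a negative "better guess" when the buffer is too small), and its first estimate is deliberately too low so the retry path is exercised. The real function also requires a GetLastError check to tell the failure cases apart, which this sketch leaves out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in imitating the quoted return convention:
   > 0 : chars written, or an ESTIMATE when dst is NULL or dst_len is 0;
   < 0 : buffer too small; -(return value) is a better size to try.
   It just copies src; the low first estimate is intentional. */
int toy_convert(const char *src, int src_len, char *dst, int dst_len)
{
    if (dst == NULL || dst_len == 0)
        return src_len > 1 ? src_len / 2 : 1;   /* rough estimate only */
    if (dst_len < src_len)
        return -src_len;                        /* better estimate */
    memcpy(dst, src, (size_t)src_len);
    return src_len;
}

/* Estimate, allocate, and retry while the callee asks for more room.
   Returns a malloc'd buffer (caller frees) or NULL on failure. */
char *convert_with_retry(const char *src, int src_len, int *out_len)
{
    assert(src != NULL && src_len > 0 && out_len != NULL);
    int size = toy_convert(src, src_len, NULL, 0);
    char *buf = NULL;
    for (int attempt = 0; attempt < 8 && size > 0; attempt++) {
        char *grown = realloc(buf, (size_t)size);
        if (grown == NULL)
            break;
        buf = grown;
        int rc = toy_convert(src, src_len, buf, size);
        if (rc > 0) {
            *out_len = rc;              /* success: rc chars written */
            return buf;
        }
        size = -rc;                     /* rc < 0: use suggested size */
    }
    free(buf);
    return NULL;
}
```

Note that the first call returns an estimate, not an exact size, which is what lets the function avoid doing the whole conversion just to count; the price is that the caller has to be ready for a second (or third) attempt.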
There is indeed a lot that is different in there, and you will probably notice that it pretty aggressively tries to deal with the three problems I mentioned. In fact, there are really only two real problems with it that I can see:
Now if anyone decides that the documentation for this function is confusing, I think it is pretty obvious that it has more to do with the underlying functionality than the actual SDK topic. :-)
Everything up until now has been an introduction to a completely different example of overloading meanings that is much, much worse!
Let's talk about the LOGFONT structure.
Its most important characteristic (for our purposes) is that it is not an actual font. It is a description of characteristics of either an actual font (e.g. if it is returned by the GetObject function when it is passed an HFONT) or what a developer might want from a font (e.g. if it is being passed to the CreateFontIndirect function, or to the EnumFontFamiliesEx function -- though for the latter only a few members are looked at).
And its one member that is most completely overloaded to the point of confusion is the lfHeight member:
Specifies the height, in logical units, of the font's character cell or character. The character height value (also known as the em height) is the character cell height value minus the internal-leading value. The font mapper interprets the value specified in lfHeight in the following manner.
For all height comparisons, the font mapper looks for the largest font that does not exceed the requested size. This mapping occurs when the font is used for the first time. For the MM_TEXT mapping mode, you can use the following formula to specify a height for a font with a specified point size:
lfHeight = -MulDiv(PointSize, GetDeviceCaps(hDC, LOGPIXELSY), 72);
So basically, there are three different ways that this member can specify the height (depending on whether the 32-bit value in lfHeight is greater than, less than, or equal to zero). None of which map to what most humans (developer or otherwise) would use to specify a font size.
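For what it is worth, the arithmetic behind that formula is small. The sketch below restates it in portable C so it can be run anywhere: muldiv_nonneg is a stand-in for the Win32 MulDiv function restricted to non-negative inputs (multiply, divide, round to nearest), and dpi_y is an assumed parameter playing the role of GetDeviceCaps(hDC, LOGPIXELSY). The helper names are made up for the example.

```c
#include <assert.h>

/* Portable stand-in for Win32 MulDiv, non-negative inputs only:
   computes a * b / c rounded to the nearest integer. */
int muldiv_nonneg(int a, int b, int c)
{
    assert(a >= 0 && b >= 0 && c > 0);
    return (int)(((long long)a * b + c / 2) / c);
}

/* Point size -> lfHeight for MM_TEXT, following the documented
   formula. There are 72 points per inch and dpi_y pixels per inch. */
int lfheight_from_points(int point_size, int dpi_y)
{
    return -muldiv_nonneg(point_size, dpi_y, 72);
}

/* Inverse direction: a negative lfHeight back to a point size. */
int points_from_lfheight(int lf_height, int dpi_y)
{
    assert(lf_height < 0);
    return muldiv_nonneg(-lf_height, 72, dpi_y);
}
```

At 96 DPI a 12 point request becomes an lfHeight of -16 and a 10 point request becomes -13. Because of the rounding in both directions, a round trip is not guaranteed to return the point size you started with at every DPI, which is one more reason the member is hard to reason about.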
And, to add insult to injury, in attempting to translate between what those humans would want and what the member specifies, it gives a formula that most people do not understand that depends on a functionality that it does not explain (the MM_TEXT mapping mode). You can find out what the MM_TEXT mapping mode is by looking at the GetMapMode and SetMapMode functions. Though they take an HDC and it may be confusing to many people what happens to fonts in different mapping modes since the font and the mapping mode are set in such disconnected contexts....
But let's get back to those three different usages of the lfHeight member. They also have descriptions that start with "The font mapper..." which would indicate they forgot that the LOGFONT is also used to describe a font after the mapping has occurred. Oops?
Another oops along the same line -- there is no indication in the docs about which type of return is expected if you use GetObject with an HFONT. Or if the expectation is that the result would be inconsistent or not. Or how to definitively get the size in such a case.
And then of course, two of them talk about device units (which are also never defined within the function). You can look at the topic Device vs. Design Units to get some clarity around those, though it is of course completely unclear what insight talking about them here provides. How about, since the LOGFONT is in no way tied to a device, avoid mentioning them here entirely?
OK, never mind all that communal critiquing. Let's take those three cases apart one at a time:
When it is zero, "The font mapper uses a default height value when it searches for a match." What default value? How is it determined? Hmmm... a quick look at the source is very suggestive of a size of 12pt being considered a default, at least since NT 4.0. Hopefully it is never returned when one is querying font information like in a GetObject call. :-)
When it is less than zero, "The font mapper transforms this value into device units and matches its absolute value against the character height of the available fonts." Once again, an undefined phrase "character height" -- what is meant there? Maybe they mean the TEXTMETRIC/NEWTEXTMETRIC's tmHeight member, which "Specifies the height (ascent + descent) of characters." That kind of makes sense, right? Unfortunately, this would be incorrect -- the calculation in this case is based on the UnitsPerEm mentioned in that Device vs. Design Units topic, "the em square size for the font". And that topic even says how to get it -- by using the ntmSizeEm member of the NEWTEXTMETRIC structure.
When it is greater than zero, "The font mapper transforms this value into device units and matches it against the cell height of the available fonts." And of course the cell height is also not defined. Though this one actually is based on the TEXTMETRIC/NEWTEXTMETRIC's tmHeight member, which "Specifies the height (ascent + descent) of characters" specified in the font.
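The three cases can be summarized in a few lines of portable C. This is only a restatement of the quoted documentation, not of the font mapper itself: the enum and function names are invented for the example, and the character_height helper relies on nothing more than the sentence in the lfHeight description that says the character (em) height is the cell height minus the internal leading.

```c
#include <assert.h>

/* What a given lfHeight value asks the font mapper to match against,
   per the quoted LOGFONT description (names here are hypothetical). */
typedef enum {
    LFHEIGHT_DEFAULT,    /* == 0: mapper uses a default height       */
    LFHEIGHT_CELL,       /* >  0: matched against the cell height    */
    LFHEIGHT_CHARACTER   /* <  0: |value| matched against the
                                  character (em) height              */
} lfheight_kind;

lfheight_kind classify_lfheight(long lf_height)
{
    if (lf_height == 0)
        return LFHEIGHT_DEFAULT;
    return (lf_height > 0) ? LFHEIGHT_CELL : LFHEIGHT_CHARACTER;
}

/* Character (em) height = cell height - internal leading, as stated
   in the lfHeight description. */
long character_height(long cell_height, long internal_leading)
{
    assert(internal_leading >= 0 && cell_height >= internal_leading);
    return cell_height - internal_leading;
}
```

So a font whose cell height is 16 with 3 units of internal leading has a character height of 13, and would be the natural match for either lfHeight = 16 or lfHeight = -13.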
So given all of the above, at least this lfHeight member can theoretically be figured out, too.
Though it is not particularly intuitive, is it?
On Friday, typography PM Judy came by my office to explain to me that this was the topic that her fellow typography PM Carolyn had mentioned was really confusing. It should be of no surprise to anyone that even people who work in typography find this member to be really confusing -- there is no one outside of a few people over in GDI who would not be confused at this very low level member that has been partially exposed in this not-quite-so-low-level structure.
I mean, how often can we really expect that people will specify it correctly, except when they use the built-in specified formula that maps to point size of the font? And in that case why did they even bother to expose it any other way?
It does help things like old time KB article 74299, which actually explains the things I did above in more practical terms, and is still just as valid beyond NT 4.0, despite what the "Applies To" section claims.
And it helps third parties like Dr. Dobbs come up with useful topics like Font Creation and Rounding Differences when trying to understand why the MulDiv function is needed here (and the problems in the area help people understand why this one usage of MulDiv is easier to find than documentation on the function itself!).
This would be a case where the managed world made it all a little easier, not just in the GDI+ classes around fonts but in helpful topics like this one on interrogating fonts....
So, as bad as the documentation here may be, I think it is fair to blame the actual implementation for any actual confusion here....
This post brought to you by "ཛྷ" (U+0f5c, a.k.a. TIBETAN LETTER DZHA -- a character that is not afraid to use its descender powers!)
Back in January of this year, reader J. Daniel Smith asked:
Did I miss your other blog entry on this topic? ----- re: Comparing Unicode file names the right way Tuesday, October 18, 2005 3:51 PM by Michael S. Kaplan Ah, the reasons I am resistant to *that* particular path are the subject of another blog entry, coming soon! :-)
My sense of temporal logistics is pretty skewed compared to everyone else's. :-)
He was referring to this post, which was from last October, and the question that he was asking:
In <3 weeks, there's not going to be the big gulf between managed & unmanged code (I'm talking about VS2005 & C++/CLI). So if you show us the unofficial 100% correct "static int Compare(FileSystemInfo a, FileSystemInfo b)" in managed code, it will soon be fairly straight-forward to use it from unmanaged code. (Of course, this assumes that you're OK doing such a thing...there may be other reasons for staying completely in unmanaged code). Yes, I realize Whidbey is frozen harder than the Antarctic ice pack…I thought the "static" function taking two arguments made things clear; I guess not. Sorry. Show us the exact C# code for your FileNameCompare() utility function; that way there can be no confusion as to the proper technique. My preference would be to take stronger-typed FileSystemInfo parameters (rather than just strings)…and also to indicate that eventually such code should perhaps be part of that class.
I probably would have taken after Triumph the Insult Comic Dog and said that Whidbey was frozen colder than an Iditarod husky's nutsack, but that is neither here nor there; he was right in any case. :-)
The reason why I am troubled about the development architecture path that leads to a
static int Compare(FileSystemInfo a, FileSystemInfo b)
is that the FileSystemInfo class is of course just a base class for the FileInfo and DirectoryInfo classes, which contain a lot of information:
Given the level of complexity inherent in these objects, the exact definition of a meaningful comparison between two FileSystemInfo objects is far from clear -- in any given situation, a developer could clearly be expecting it to mean any of these things, and not all of those varied and sometimes conflicting definitions are very farfetched!
However, the fact that the OrdinalIgnoreCase semantic is a 100% match for what many of the methods and functions that make use of the NTFS/FAT filesystems do is itself clear (to anyone who reads this blog? <grin>), and it really is comparing the file names (which are in fact strings), not FileSystemInfo objects, which are something else entirely.
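To make the "it is just strings" point concrete, here is a minimal sketch of an ordinal, case-insensitive comparison over UTF-16 code units: uppercase each unit, then compare the raw values, with no linguistic collation and no normalization. Both `upcase_unit` and `file_name_compare_sketch` are hypothetical names, and the tiny uppercase mapping (ASCII and most of Latin-1 only) is a stand-in assumption for whatever full table the system actually uses:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real uppercase table: covers ASCII a-z and
   the Latin-1 lowercase letters U+00E0..U+00FE (skipping U+00F7, DIVISION
   SIGN). Everything else is returned unchanged. */
static uint16_t upcase_unit(uint16_t c)
{
    if (c >= 'a' && c <= 'z')
        return (uint16_t)(c - 0x20);
    if (c >= 0x00E0 && c <= 0x00FE && c != 0x00F7)
        return (uint16_t)(c - 0x20);
    return c;
}

/* Ordinal, case-insensitive comparison of two names given as UTF-16 code
   units. Returns a negative value, zero, or a positive value, in the style
   of strcmp. */
int file_name_compare_sketch(const uint16_t *a, size_t a_len,
                             const uint16_t *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < n; i++) {
        uint16_t ua = upcase_unit(a[i]);
        uint16_t ub = upcase_unit(b[i]);
        if (ua != ub)
            return ua < ub ? -1 : 1;
    }
    if (a_len == b_len)
        return 0;
    return a_len < b_len ? -1 : 1;
}
```

Under this sketch "Readme.TXT" and "readme.txt" compare equal, while "é" as the single code unit U+00E9 and "e" followed by U+0301 do not, since an ordinal comparison never normalizes. The function takes names, not file objects, which is the point being made above.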
Which is not to say that no one would ever add such a method; it happens all the time over in the Windows Shell -- it is why the Shell Lightweight API has so many wrappers around our stuff!
But this leads (unfortunately) to another, more insidious, problem:
And there are theoretical benefits to such a method if the various underlying filesystems exposed their actual comparison semantics such that one could wrap up the case sensitive nature of CDFS and the case insensitive nature of NTFS/FAT32/FAT and all of the other various differences.
The problem there is that the information is simply not exposed, and until it is, the only way to do such tests with 100% accuracy is to create the file, which is not always possible for a myriad of reasons related to permissions and media.
And what happens when you have multiple media forms connected in a single path via symbolic links? In such cases support in the underlying file systems is crucial -- since there is NO ONE TEST that would suffice.
I had high hopes that I could convince Kevin Phaup to require such a comparison method when he was still at Microsoft and that he could convince the people in charge of the various drivers. But now he is no longer at Microsoft and there is no one on that side of the company who I can say I have gone to a party with, or seen their house. Which for some reason dashes my hopes a bit.... :-(
But anyway, these are but a few of the reasons why the direction of a FileSystemInfo wrapper to handle comparison is something that I am highly resistant to.
This post brought to you by "𐃺" (U+100fa, a.k.a. LINEAR B IDEOGRAM VESSEL B305)
(This post is so frighteningly offtopic that it is impossible to quantify the issue!)
Last year some time I was back in Cleveland, and I saw my first episode of Ghost Whisperer. I made the offhand comment during a commercial break that Jennifer Love Hewitt in this show (with her new darker hair) looked a lot like Alyssa Milano (if you went back a hairdo or three -- both Alyssa and Rose among the Charmed coven change their hair styles and colors a lot!).
I wasn't saying they were twins or anything, just that it would be easy to confuse them at first glance....
My father disagreed with me and suggested that I might be a little bit crazy. Or if nothing else that I did not know what actresses looked like :-)
Anyway, looking at the following two pictures, as a baseline example:
It should be easy enough to see why I suggested there might be a similarity, I think.
Which is not to say that I am not crazy; but if I am it is for other stuff, not for this! :-)
Assuming you are on this planet, you have probably heard about the name of Tom and Katie's new baby.
Of course I lost the faith I had in Katie (dating from Teaching Mrs. Tingle) by the time the second season of Dawson's Creek was over, and I don't think I ever had faith in Tom.
Ah well, at least he was supposedly kidding about the placenta eating bit.
Are there any non-weird actors out there?
In any case, over on Language Log, Benjamin Zimmer gives a good summary of the whole Tom Cruise/Katie Holmes baby naming announcement snafu in the post Adventures in Celebrity Onomastics.
Not sure how many Jews there are in Oklahoma, but I thought I'd throw in my own theory about the name (see the post title!). :-)
So, the question I got in email from someone named Kenny was quite timely:
The .NET Framework has a file named normalize.dll, and Vista has one named normaliz.dll. What is the difference between these two files other than the name?
I say timely because yesterday was an interestingly busy day....
In the morning I was just doing laundry and such. That part was kind of boring.
But in the afternoon, I decided to use my FlexPass to run some errands downtown. I had had good luck with the scooter on the bus previously, and I had the day, so I figured why not?
First I wanted to go to Caffe Bella to give them some posters for the show that Kristin Connell is doing there on May 5th:
I left them in the care of Kim to hang up appropriately, and then I made my way over to the UW Medical Center to get some bloodwork done (not sure what they were testing for exactly but they got three lavender tops, one red top, and two jungle tops out of me; I was a bit worried about losing so much blood right after I donated a pint but I was encouraged not to worry!).
Anyway, I was waiting to head home on the 540 bus but there was a bit of a wait. While I was waiting around in front of UWMC for the bus, my phone rang. It was my old friend Liz!
Liz and I go way back, like about 15 years back, give or take. When we first became friends there was another Elizabeth in our circle who went by Liz also. But since the other Liz was really wild, we called her 'Crazy Liz' or 'Wild Liz'. And then my friend the not-wild one we just called 'Liz.'
But then, just after that fifth book in the Hitchhiker's Trilogy (Mostly Harmless) came out, the text that inspired us all further and became the subject of a late night discussion over kamikaze shots went something like this:
Trillian picked up the sandwich and looked at it. She sniffed it carefully. "Try it," said Arthur, "it's good." Trillian took a nibble, then a bite and munched on it thoughtfully. "It is good," she said, looking at it. "My life's work," said Arthur, trying to sound proud and hoping he did not sound like a complete idiot. He was used to being revered a bit and was having to go through some unexpected mental gear changes. "What's the meat in it?" asked Trillian. "Ah yes, that's um, that's Perfectly Normal Beast." "It's what?" "Perfectly Normal Beast. It's a bit like cow, or rather a bull. Kind of like a buffalo in fact. Large, charging sort of animal." "So what's odd about it?" "Nothing, it's Perfectly Normal." "I see." "It's just a bit odd where it comes from." Trillian frowned, and stopped chewing. "Where does it come from?" she said with her mouth full. She wasn't going to swallow until she knew. "Well, it's not just a matter of where it comes from, it's also where it goes to. It's all right, it's perfectly safe to swallow. I've eaten tons of it. It's great. Very succulent. Very tender. Slightly sweet flavor with a long dark finish." Trillian still hadn't swallowed. "Where," she said, "does it come from, and where does it go to?" "They come from a point just slightly to the east of the Hondo Mountains. They're the big ones behind us here, you must have seen them as you came in, and they sweep in their thousands across the great Anhondo Plains and, er, well, that's it really. That's where they come from. That's where they go." Trillian frowned. There was something she wasn't quite getting about this. "I probably haven't made it quite clear," said Arthur. "When I say they come from a point to the east of the Hondo Mountains, I mean that's where they suddenly appear. Then they sweep across the Anhondo Plains and, well, vanish really. We have about six days to catch as many of them as we can before they disappear.
In the spring they do it again, only the other way around, you see." Reluctantly Trillian swallowed. It was either that or spit it out, and it did in fact taste quite good. "I see," she said, once she had reassured herself that she didn't seem to suffer any ill effects. "And why are they called Perfectly Normal Beasts?" "Well, I think because otherwise people might think it was a bit odd. I think Old Thrashburg called them that. He says that they come from where they come from and they go to where they go to and that it's Bob's will and that's all there is to it." "Who--" "Just don't even ask."
So after many kamikazes one of us (I think it was Sam) was thinking about the fact that Liz's boyfriend at the time was named Robert and he said
She comes where she comes and she goes where she goes and that's Bob's will and that's all there is to it. She is a Perfectly Normal Liz.
What can I say, the nicknames stuck -- Perfectly Normal Liz when we were being formal, and Normal Liz or PNL when we weren't. I think her sister disagreed with our assertion that Elizabeth was actually, in fact, normal. But we didn't really hang out with her sister so we ignored that. :-)
Then, to get back on topic, a few years ago, I was adding the normalization files that were already in the Whidbey builds to the Longhorn build, so that they would be there for users to try out in M6 (Shawn was in a class that week, I can't remember which one it was). Some folks on the base team recommended that, since the file was being called by kernel32.dll, an 8.3 name would probably be better than normalize.dll. Obviously NORMALIZ.DLL was the best answer to that concern, and while I was testing the setup change, waiting for a reboot, I thought back to my friend Perfectly Normal Liz and realized what a hilarious tribute it would make, even if it was only realized thirty minutes after the actual change was made (I often find that I am only clever in retrospect).
Liz was (by her own report) on the floor laughing when I finally had a chance to tell her about it, a little while after that PDC build of Longhorn that contained the file NORMALIZ.DLL in the system32 directory....
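As a footnote to the 8.3 concern, here is a minimal sketch of a length-only check, assuming the classic shape of up to eight characters, an optional dot, and one to three characters of extension. The name `fits_8_3_sketch` is hypothetical, and the sketch deliberately ignores which characters are legal in a short name:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if name has a base of 1..8 characters and, optionally, a single
   dot followed by an extension of 1..3 characters; returns 0 otherwise.
   Length check only: it does not validate which characters are allowed. */
int fits_8_3_sketch(const char *name)
{
    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);

    if (base_len < 1 || base_len > 8)
        return 0;
    if (dot == NULL)
        return 1;                 /* no extension at all is fine */
    if (strchr(dot + 1, '.') != NULL)
        return 0;                 /* more than one dot */

    size_t ext_len = strlen(dot + 1);
    return ext_len >= 1 && ext_len <= 3;
}
```

By this measure "normalize.dll" fails (nine characters before the dot), while "NORMALIZ.DLL" and "kernel32.dll" both pass.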
This post brought to you by "e" (U+0065, a.k.a. LATIN SMALL LETTER E)