Is RtlCompareUnicodeString used correctly?

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

I'm not sure how many of you remember when I posted "Hungarian is even more complicated than I thought" and "More on the fabled EqualString".

Not because I don't have stats or anything, but because there is no way to gauge how many of you are new readers and how many of you really have nothing better to do than read what I am posting here. :-)

Anyway, I am going to talk again about RtlEqualUnicodeString and RtlCompareUnicodeString, the functions in ntdll.dll that do binary comparisons that can optionally be case-insensitive.

I found out something really interesting about them the other day.

Now it is obvious how RtlEqualUnicodeString might be used -- I mean, if you have two strings and you need to know in a binary sense whether they are equal (possibly ignoring case) then it can be very handy. Because no matter how unnatural the comparison seems to humans, the fact is that lots of Windows loves it.

Of course the actual usage of RtlCompareUnicodeString is a bit less clear -- I mean, the order has no meaning to humans. So a function that uses it to order two strings seems like a ripe source for incorrect usage.

Don't worry, it turns out that nobody is using that order inappropriately.

In just about every case, the return value of the function is tested to see whether it was equal to or not equal to zero.

Yes, that is right -- almost everyone who uses it is essentially duplicating the functionality of RtlEqualUnicodeString.

When you get down to it, one has to wonder how much more expensive operation A is than operation B:

A - compare two strings, one WCHAR at a time, return the difference if there is one as soon as you find it, then compare that number to zero to see if there is in fact a difference.

B - compare two strings, one WCHAR at a time, return TRUE or FALSE as soon as you know whether they are in fact equal.

Remembering for a moment that a difference that makes no difference, makes no difference -- do you think it makes a significant difference?

Hopefully not. Though it worries me that no one seems to be doing anything beyond what RtlEqualUnicodeString would provide. So why take a hit at all?
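
To make operations A and B concrete, here is a rough sketch in portable C. It is not the ntdll.dll implementation: the `counted_str` type and the `compare_ordinal` and `equal_ordinal` names are made up for illustration, `uint16_t` stands in for WCHAR, and case-insensitivity is left out. The up-front length check in the equality routine is an assumption about what an equality-only function could do; the real function may or may not work that way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a counted UTF-16 string (not the real UNICODE_STRING). */
typedef struct {
    const uint16_t *units;  /* UTF-16 code units */
    size_t          count;  /* number of code units, not bytes */
} counted_str;

/* Operation A: walk both strings and return the first difference found.
 * The result is <0, 0, or >0, which callers then often just test against 0. */
static int compare_ordinal(counted_str a, counted_str b)
{
    assert(a.units != NULL || a.count == 0);
    assert(b.units != NULL || b.count == 0);

    size_t n = a.count < b.count ? a.count : b.count;
    for (size_t i = 0; i < n; i++) {
        if (a.units[i] != b.units[i])
            return (int)a.units[i] - (int)b.units[i];
    }
    /* The common prefix matches, so the shorter string sorts first. */
    if (a.count == b.count)
        return 0;
    return a.count < b.count ? -1 : 1;
}

/* Operation B: answer only "equal or not". */
static bool equal_ordinal(counted_str a, counted_str b)
{
    assert(a.units != NULL || a.count == 0);
    assert(b.units != NULL || b.count == 0);

    /* Assumed shortcut: different lengths can never be equal, so no
     * code units need to be read at all in that case. */
    if (a.count != b.count)
        return false;
    for (size_t i = 0; i < a.count; i++) {
        if (a.units[i] != b.units[i])
            return false;
    }
    return true;
}
```

In this sketch a call site that only wants a yes/no answer could write `equal_ordinal(a, b)` rather than `compare_ordinal(a, b) == 0`. The loops are nearly identical, so any saving would come mostly from the length check and from the intent being easier to read.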

I resisted the temptation to just go and fix all of the occurrences (it is a 100% safe change but even so, I hate when people do it to code I own).

I also resisted the temptation to send out a bunch of mail to all of the owners to tell them to change their code (I hate when people do that to me, too).

Now that I read this post again, it occurs to me that this will probably not actually be very interesting to people. It just seemed weird to me.

Though if you own one of those calls to RtlCompareUnicodeString, then feel free to change it; at worst it will just be more self-documenting as to the intention, and at best (if the code is called many times in a tight loop) it could even help performance!

This post brought to you by "P" (U+0050, a.k.a. LATIN CAPITAL LETTER P)

Blog - Comment List
  • It is very useful
  • The links suffer from the usual MSDN URL instability, do they work for you Michael?

    RtlCompareUnicodeString ought to be useful except that it is poorly described, which tends to make one think that the people who wrote it didn't know what it was for either...

    Anyway, an optimist would expect it to be useful where strings (e.g. filenames) need an order, any unique order. This allows you to binary search, build various types of tree structure etc. Obviously doing such things with a locale-sensitive function would be silly (Microsoft has done it before, but it's still silly).

    The UTF-16 WCHAR algorithm described in this article would probably work for such a purpose, but if that's what was used it should be clearly indicated because UTF-16 isn't code point order, unlike UTF-8 or UTF-32. So the result is even /more/ unintuitive.
  • They do work for me, maybe something was down?

    Well, it is as useful as the *_BIN collations in SQL Server (which actually have the same order!).
  • Yeah, I would have imagined that RtlCompareUnicodeString would be useful for building sorted lists or trees needed for searching, but not necessarily by humans.

    It does seem a little silly to use it instead of RtlEqualUnicodeString, but surely the only difference would be the return statement (i.e. just return the difference between the two unequal WCHARs, instead of a constant TRUE/FALSE) so it's probably at most 1 extra assembly instruction...
  • Just what I wanted to write: comparing the strings and returning an order is needed for skip lists and other lists that need a way to find a key. This function is well needed.

    So my question would be: what's better with RtlEqualUnicodeString()? All us C/C++ programmers will mix up TRUE with "== 0" if the APIs are mixed... Who has not written "if (strcmp())" at least once and meant that the following code should be executed when the strings are equal? ;-)

    The execution speed should be the same; testing for a zero or nonzero result makes no difference.

    My conclusion: both are needed, but adding the "Equal" function helps for clarity of source code, in places where only the "equalness" counts.

    Christian
  • I agree both are potentially needed (each for different situations), it is just bothersome to me that most of the usage of one ought to be the other, that's all. :-)
  • Ouch... for a while, I figured Rtl stands for Right-to-left!
    I hope I don't need some coffee because I hate coffee :-D
  • "Ouch... for a while, I figured Rtl stands for Right-to-left! "

    Nope, it stands for "Run-Time Library". Do you know about the NT Native APIs in NTDLL.DLL? The Rtl* APIs are utility functions that, in user mode, do not call directly into kernel mode. In user mode, the Nt* and Zw* APIs are stubs that make a system call into kernel mode, which implements the API. In kernel mode, the Win32 API does not exist, so the NT Native APIs are always used, and there is a difference between Zw* and Nt*. These APIs are mostly undocumented; the third party that created Interix (a replacement for the POSIX subsystem) had to obtain NT source code to do so.
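
To illustrate the point raised in the comments that binary UTF-16 order is not code point order, here is a small C sketch. The function names are made up for this example and are not part of any Windows API. It is meant to be tried on U+FF5E versus U+10000, which UTF-16 encodes as the surrogate pair D800 DC00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binary order: compare UTF-16 code unit by code unit. */
static int cmp_code_units(const uint16_t *a, size_t na,
                          const uint16_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return (na > nb) - (na < nb);
}

/* Decode the code point starting at s[*i] and advance *i past it.
 * An unpaired surrogate is returned as-is (a simplification). */
static uint32_t next_code_point(const uint16_t *s, size_t n, size_t *i)
{
    assert(*i < n);
    uint32_t hi = s[(*i)++];
    if (hi >= 0xD800 && hi <= 0xDBFF && *i < n) {
        uint32_t lo = s[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*i)++;
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return hi;
}

/* Code point order: decode surrogate pairs before comparing. */
static int cmp_code_points(const uint16_t *a, size_t na,
                           const uint16_t *b, size_t nb)
{
    size_t i = 0, j = 0;
    while (i < na && j < nb) {
        uint32_t ca = next_code_point(a, na, &i);
        uint32_t cb = next_code_point(b, nb, &j);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* One string is a prefix of the other, or they are equal. */
    return (i < na) - (j < nb);
}
```

For those two strings, the code unit comparison sorts U+10000 first (because 0xD800 is less than 0xFF5E), while the code point comparison sorts U+FF5E first.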
