Every Problem Looks Like A Nail

I wish all the questions I got were this straightforward:

“I need to compare two strings for non-culture-sensitive equality. I notice that there are methods String.Equals and String.Compare which can both do that. What is the guideline on which one I should use?”

I’ll answer your question, but first, a funny story (*).

My buddy Steve and I were fixing the flashing that is leaking around the seal between my roof and my chimney the other day. “Steve,” I said, “hand me that screwdriver.” Steve handed me his roofing hammer. “No, dude, the screwdriver. Thin circular shaft, handle at one end, Robertson bit on the other.”

“Dude, if you wanted the screw remover you should have said so.”

Ha ha ha ha ha, I crack myself up. That joke never gets old.

Anyway, I seem to have digressed slightly from the topic of today’s blog.

The documentation in MSDN clearly states that the purpose of String.Equals is for comparing strings for equality, and the purpose of String.Compare is for sorting a list of strings into a culture-sensitive alphabetical order.

You can use a hammer to drive screws, and in a world without screwdrivers, you might have to. But in a world with screwdrivers, why would you? You would use the tool that was designed for the task.

You can use String.Compare to test for equality if you want, but why would you? You have a method that is designed to do exactly and specifically what you want to do, so use it.

Unless there is a really compelling reason to do otherwise, always use the tool that was specifically designed for your problem, if there is one. Don’t use some other tool designed for a different task that just happens to work; that’s a recipe for bugs.
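As a minimal sketch of the two options, assuming that "non-culture-sensitive" means an ordinal comparison (StringComparison.Ordinal). The variable names are made up for illustration, and the code is written as C# top-level statements (C# 9 or later):

```csharp
using System;

// Hypothetical inputs, built at runtime so they are distinct objects with equal content.
string left = string.Concat("flash", "ing");
string right = new string("flashing".ToCharArray());

// The tool designed for the job: a yes/no answer to "are these equal?"
bool sameByEquals = string.Equals(left, right, StringComparison.Ordinal);

// The hammer-as-screwdriver version: an ordering result bent into an equality test.
bool sameByCompare = string.Compare(left, right, StringComparison.Ordinal) == 0;

Console.WriteLine(sameByEquals);   // True
Console.WriteLine(sameByCompare);  // True
```

Both lines give the same answer here; the first one simply says what it means.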

*******************

(*) Not a true story, except for the part about my roof leaking at the flashing. I need to fix that.

  • The main reason that we use string.Compare() to do string comparison is simple: 99% of the time we want to do a case INSENSITIVE comparison. String.Equals() is case-sensitive.

    (Ok I made up the 99% number, but you get my point).

  • No, string.Equals has several comparison options:

    string.Equals(var1, var2, StringComparison.InvariantCultureIgnoreCase)

  • Yes, Equals has an InvariantCultureIgnoreCase option. I have been using string.Compare. Are there any reasons why it is better to use Equals?

  • I think part of the problem is there in the description of String.Equals: "for comparing strings for equality", it's got that pesky word "comparing" in it. And then you have Stuart's example of string.Equals(var1, var2, StringComparison.InvariantCultureIgnoreCase), where the 3rd parameter has Comparison in it; again we're mixing the language semantics of compare with equals. Perhaps it would be clearer if they had named it StringEquality.InvariantCultureIgnoreCase and the documentation said "for checking strings for equality".

  • That is so true! Like using FormsAuthentication.HashPasswordForStoringInConfigFile() to calculate MD5 values :) (http://blog.veggerby.dk/2008/07/06/abuse-of-formsauthenticationhashpasswordforstoringinconfigfile-method/)

  • This reminds me...  Every now and then, we get a question along these lines on microsoft.public.dotnet.languages.csharp:

    "I want to use LINQ in my program; I'm trying to replace my foreach loop with Select [sometimes Where], but it's not working for me. Please help me make it work!"

    What they're trying to do is something like:

     xs.Select(x => Console.WriteLine(x));

    Which obviously fails to compile. It's worse when the method that is called for its side effect actually returns the value, and so the code does compile... and then deferred evaluation kicks in :)
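A small sketch of the deferred-evaluation trap described in this comment. Record is a hypothetical helper invented for the example: it is called for its side effect but also returns its argument, so it compiles as a Select projection. Written as C# top-level statements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Hypothetical helper: has a side effect and also returns a value.
int Record(int x)
{
    log.Add($"saw {x}");
    return x;
}

int[] xs = { 1, 2, 3 };

// This only builds a query; nothing is enumerated yet, so Record has not run.
IEnumerable<int> query = xs.Select(x => Record(x));
int countBeforeEnumeration = log.Count;   // 0

// Enumerating the query is what finally runs the side effects.
List<int> results = query.ToList();
int countAfterEnumeration = log.Count;    // 3

// The tool designed for side effects is a plain loop.
foreach (int x in xs)
{
    Console.WriteLine(x);
}
```

If the query were never enumerated, the log would stay empty, which is usually the surprise people run into.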

    At those times, I think that whoever designed the syntactic sugar for LINQ queries had a very good reason to use "from" as a keyword, rather than the "for" that has traditionally been used in that context elsewhere (e.g. XQuery): "for" is easy to confuse with a traditional side-effect loop, while "from" emphasizes the pure and declarative nature of the query. I wonder if that was deliberate.

  • There is a difference. In my experience the two give the same results for English text, but you will get differences in other languages. See http://blogs.msdn.com/bclteam/archive/2007/05/31/string-compare-string-equals-josh-free.aspx.

  • There are overloads for string.Equals which can make it work like string.Compare, and there are overloads for string.Compare that make it work like string.Equals.

    The REAL difference between the two (apart from the "default" behaviour) is the return value, and in my opinion at least, THAT is the deciding factor when trying to choose which to use. If all you want is a true/false equal/not-equal, use string.Equals. If you want a "before"/equal/"after" result, use string.Compare.

    And I personally ALWAYS use the overload of each that takes a StringComparison. That makes it clear that I've at least spent a couple of seconds thinking about what kind of comparison I want :-)
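To illustrate the return-value distinction drawn in this comment, here is a short sketch that always passes an explicit StringComparison. The sample strings are arbitrary, and the code is written as C# top-level statements:

```csharp
using System;

// Equality question: a bool answer, with the kind of comparison spelled out.
bool sameIgnoringCase = string.Equals("README", "readme", StringComparison.OrdinalIgnoreCase);
bool sameExactly = string.Equals("README", "readme", StringComparison.Ordinal);

// Ordering question: only the sign of the int result is meaningful.
int order = string.Compare("apple", "banana", StringComparison.Ordinal);

// Compare is the natural fit when sorting.
string[] names = { "pear", "apple", "fig" };
Array.Sort(names, (a, b) => string.Compare(a, b, StringComparison.Ordinal));

Console.WriteLine(sameIgnoringCase);          // True
Console.WriteLine(sameExactly);               // False
Console.WriteLine(order < 0);                 // True: "apple" sorts before "banana"
Console.WriteLine(string.Join(",", names));   // apple,fig,pear
```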

  • Actually the return values (including out/ref parameters) define the method purpose. The method name only describes or names it. Virtually the same logic holds for all names in the program. So we may open a discussion about good naming conventions :)

    In my opinion the most important thing is to think this way: I need a method that does ... -> method signature -> name. It also correlates well with Hungarian notation principles.

  • In C# (and .NET for that matter), strings are immutable objects.

    So that being said, the comparison done by String.Equals(A,B) is basically "Is REF of A equal to REF of B?" versus String.Compare doing "Is Char at A[I] equal to Char at B[I], for all Chars in A and B?" with a return value indicating which is different, or 0 if they are the same?

    So, particularly for long strings, Equals would be O(1) because it's a value against a value, while Compare is O(n) for a match as it has to match every character, including the possibility of handling cases such as case insensitivity or differing cultural settings for writing, right?

    Your analysis is not quite correct.

    Yes, strings can be compared by reference for equality, and that gives you a fast "early out" if they are reference equal. But it does not give you the fast early out if they are NOT reference equal. If they are not reference equal then you cannot just assume that they are not equal! You could have two strings that are references to different memory but the memory contains the same characters.

    A system in which two strings which are content-equal are guaranteed to also be reference equal is said to "intern the strings". In .NET, string literals within a given module are guaranteed to be interned. If the literal "abc" appears twice in your program, both strings at runtime will be reference-equal. But if you magic up a new string by appending a to b to c at runtime then the resulting string is not interned. It is a new string with the same content, but a different reference. So comparing them is still O(n) in the length.

    We don't intern all strings by default for performance reasons. Interning makes every allocation expensive in return for making some comparisons cheap, and that's not a good tradeoff. It is better to make every allocation cheap and some comparisons expensive. If you need to make the comparisons cheap you can force the string to be interned when you allocate it. If you require a particular string to be interned, the String.Intern method searches the intern pool for the string, adds it to the pool if it's not there, and returns the ref from the pool.

    Contrariwise, if you need a string to be NOT interned for some strange reason then you can set the CompilationRelaxations attribute to "don't intern literals in this assembly". -- Eric
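The reply above can be sketched roughly as follows. The variable names are invented for the example, the runtime string is built from a char array so that the compiler cannot fold it into a literal, and the code is written as C# top-level statements:

```csharp
using System;

string literalA = "abc";
string literalB = "abc";

// Built at runtime from characters: a new object with the same content.
string built = new string(new[] { 'a', 'b', 'c' });

// The same literal appearing twice is expected to share one reference.
bool literalsShareReference = object.ReferenceEquals(literalA, literalB);

// Not reference-equal, so the "early out" is unavailable...
bool builtSharesReference = object.ReferenceEquals(literalA, built);

// ...and the characters have to be compared, which is O(n) in the length.
bool builtHasSameContent = string.Equals(literalA, built, StringComparison.Ordinal);

// String.Intern returns the pooled reference for this content.
string interned = string.Intern(built);
bool internedSharesReference = object.ReferenceEquals(literalA, interned);

Console.WriteLine(literalsShareReference);
Console.WriteLine(builtSharesReference);     // False
Console.WriteLine(builtHasSameContent);      // True
Console.WriteLine(internedSharesReference);
```

The first and last lines are left without expected values because they depend on the literal-interning behavior described above; on a runtime that interns literals, both should print True.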

  • Computer scientists are interesting to observe over time. It's like we have to go through a complex evolution to get back to basics.

    "99% of the time we want to do a case INSENSITIVE comparison"

    YES. This is true. This is so true and yet it is a truth which has been blatantly ignored for years with all sorts of pretentious rationalization behind it. No matter what computer geeks tell you, case-sensitivity is desirable in only the smallest minority of situations. Human beings RARELY work that way, and because your software is used by human beings it will rarely work that way. There are a handful of ordinary cases where people need to be case sensitive, e.g. passwords should be case sensitive for security reasons, whereas user names never should be, for usability reasons. Neither should email addresses, URLs, search queries, product numbers, license keys, IntelliSense, and on and on.

    I can't tell you how bothersome it is to work in SQL Server with a case-sensitive language collation turned on.

    Everyone thinks it's really great because you have to be really geeky to remember it, and it's very precise and cold, and it may be a performance optimization because binary equality is SOOOO much faster to check. But it's not cute or smart or "right"; it's just silly, and it's the kool-aid we've been drinking for years. (In another flame war: starting lists off with an index of 0 isn't any smarter. It's an implementation detail leaking through the abstraction that makes certain kinds of math easy but is otherwise counter-intuitive. I'm not complaining about it, just making the statement that it's not really all that great, no matter how smart it makes you feel that you "get it".)

    This is just common sense (and the lack of it in computer interfaces). If we want to do case-insensitive comparisons 95+% of the time, why is THAT the behavior we have to opt into? Why not default to the thing I do the most and let me opt into the thing I only occasionally do? That is to say, only make me special-case the cases that are ACTUALLY special.

    In summary:

    Option Compare Text 'A really good feature that makes = mean what you want it to mean.

  • This is similar to the "dude, let's use a hash table" phenomenon, when a generic Dictionary is usually a much more suitable approach.

  • The problem with "Option Compare Text" is that it also enables locale-specific comparisons by default. This is bad for a very simple reason: a lot of people don't understand how vastly different things can be in a locale different from the one they use (this particularly applies to Americans and native English speakers in general; no offense intended, it's just my personal observation). The result is that they don't understand why the code that ran perfectly fine in the en-US locale exhibits weird behavior when run in, say, ru-RU, much less something more exotic than that. What's worse is that often people can't even comprehend what could possibly go wrong. I call it the "what do you mean, one 'char' is not the same as one character?" syndrome.

    On the other hand, defaulting to ordinal comparisons means that you do not always get it right for a specific culture, but you always get the same, consistent behavior. For UI, not getting it right according to locale can be annoying; but if you happen to compare some internal strings with locale-specific operations (say, while parsing an XML file), the difference between two locales may be the difference between parsing it successfully and failing to do so.
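A rough sketch of the "one 'char' is not one character" point and of why ordinal comparison is predictable. The two spellings of "é" are chosen purely for illustration, and the code is written as C# top-level statements:

```csharp
using System;

// Two spellings of the same visible text:
string precomposed = "\u00E9";   // one char: LATIN SMALL LETTER E WITH ACUTE
string decomposed = "e\u0301";   // two chars: 'e' plus COMBINING ACUTE ACCENT

// Ordinal compares the UTF-16 code units, so the answer is the same on every machine.
bool ordinalEqual = string.Equals(precomposed, decomposed, StringComparison.Ordinal);

// A culture-sensitive comparison applies linguistic rules. Its result depends on the
// culture data available at runtime, so no expected value is shown for it.
bool linguisticEqual = string.Equals(precomposed, decomposed, StringComparison.InvariantCulture);

Console.WriteLine(precomposed.Length);  // 1
Console.WriteLine(decomposed.Length);   // 2
Console.WriteLine(ordinalEqual);        // False
Console.WriteLine(linguisticEqual);
```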

  • The difference between Equals and Compare is comparable (pun!) to the difference between == and the <, >, <= and >= operators.

    Equals can be faster than Compare for simple equality tests, but cannot be used to sort strings.
