We do get our fair share of silly questions here in NLS.

I should perhaps explain what I mean by silly. :-)

I don't think I'd ever call a question silly when somebody is asking about language and how it might work in a certain situation. I mean, that's how people learn. It's the kind of question that I ask of native speakers and of linguists, and even if they smile or laugh I never get the sense that they are thinking me silly for the question.

But today, somebody who is thinking about 64-bit Windows and who assumed that one day strings that are greater than 2 GB would be common looked at our signature for CompareString:

int CompareString(
    LCID Locale,
    DWORD dwCmpFlags,
    LPCTSTR lpString1,
    int cchCount1,
    LPCTSTR lpString2,
    int cchCount2
);

and suggested that perhaps those int parameters containing the string lengths ought to be size_t instead.
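
To make the trade-off concrete, here is a minimal sketch of how a caller holding lengths as size_t could still use the existing int parameters by checking the range first. The name length_to_cch is a hypothetical helper invented for illustration, not part of the NLS API, and the sketch only performs the range check; it does not call CompareString itself. Note that the counts are in characters (the cch prefix), not bytes.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not part of the NLS API): converts a size_t
   character count to the int count that CompareString's cchCount
   parameters expect. Returns 1 and stores the count on success;
   returns 0 and leaves *cch untouched if the length exceeds INT_MAX. */
int length_to_cch(size_t length, int *cch)
{
    if (length > (size_t)INT_MAX) {
        return 0;   /* too long to express as an int count */
    }
    *cch = (int)length;
    return 1;
}
```

A caller would run both lengths through a check like this and only call CompareString when both conversions succeed, treating failure as "this string is too big to collate".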

Now I would like to forget about the argument that this is a public API that has been around since NT 3.1. It's obviously important here, and makes the suggestion a little bit silly, but not everyone really pays attention to what's in the NLS API or how long it's been there.

I'd also like to forget about the argument that 2 GB strings are uncommon, because one day they may not be. Especially in the 64-bit world. There may be a perfectly valid reason to have huge strings.

The real problem I have here, and what makes the question silly to me, is the notion that you need to do linguistic comparisons on strings that are greater than 2 GB in size.

There is simply no way to justify this as a reasonable use of the collation functionality in the NLS API.

Perhaps some of you may disagree with this notion, and I'll be curious how people respond to this post. If you are somebody who disagrees, please be sure to include information about your "reasonable example" so that people have a chance to appropriately judge the judgment being used. :-)

 

This post brought to you by "§" (U+00A7, a.k.a. SECTION SIGN)