Which comes first, 'a' or 'A' ?

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

A wise man (well, I think it was the comedian Emo Philips; does he count?) once told the following little fable:

I had an argument with my father. I argued that Plato was the father of philosophy. My dad of course took the opposite position, that I should wax the kitchen floor.

I said: "Well, the kitchen floor doesn't exist! At least not in the permanent sense that the concept 'floor' does."

He said: "Do you think the concept 'your skull' exists?"

I said: 'Yes'. And then he surprised me by juxtaposing the two concepts.

Someone was trying to tell me about it the other day but I made it clear I had already heard it (my sources of knowledge are numerous but perhaps not impressive).

Later on, I decided I would juxtapose some things in a blog post. :-)

Here goes....

The concept of alphabetic case is interesting. And so is the concept of linguistic collation. So let's juxtapose those two concepts for a moment.

Which comes first -- uppercase or lowercase?

Well, in a binary sort, the answer is simple -- uppercase comes first. Every time. It is how code points are encoded in Unicode. Period.
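To see the binary case in action, here is a minimal sketch in Python, which compares strings by code point. The word list is made up for illustration, and only the ASCII letters are exercised here.

```python
# Python compares str values code point by code point,
# so a plain sorted() call is a binary sort.
words = ["apple", "Apple", "banana", "Banana"]  # hypothetical sample data
binary_sorted = sorted(words)
print(binary_sorted)

# ASCII uppercase letters sit at U+0041..U+005A,
# lowercase letters at U+0061..U+007A.
upper_a, lower_a = ord("A"), ord("a")
print(hex(upper_a), hex(lower_a))
```

Both capitalized words land ahead of both lowercase ones, since every ASCII uppercase letter has a smaller code point than every ASCII lowercase letter.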

In a dictionary, uppercase also often comes first (or the two are put together as multiple definitions in one entry).

In linguistic collations on Windows, in most locales[1], lowercase by convention comes first.

As I said in the post "Why do the high surrogates have the low numbers?", however, it is simply a conceptual construct.

When you deal with collation in terms of weights, it is easy to take the uppercase letters as being somehow heavier since they are usually (bordering on always) bigger and taller.
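The weight idea can be sketched as a two-level sort key. To be clear, this is a toy model and not how Windows actually builds its sort keys: the function name `sketch_key`, the `lowercase_first` flag, and the 0/1 weights are all invented for illustration.

```python
def sketch_key(s: str, lowercase_first: bool = True):
    """Toy two-level key: letters first, case only as a tie-breaker."""
    # First level: compare the letters with case differences removed.
    primary = s.casefold()
    # Second level: one case weight per character. The "heavier" case
    # gets weight 1 and therefore sorts after the lighter one.
    is_heavy = str.isupper if lowercase_first else str.islower
    case_weights = tuple(1 if is_heavy(ch) else 0 for ch in s)
    return (primary, case_weights)

words = ["B", "a", "A", "b"]  # hypothetical sample data
lower_first = sorted(words, key=sketch_key)
upper_first = sorted(words, key=lambda s: sketch_key(s, lowercase_first=False))
print(lower_first)
print(upper_first)
```

Flipping which case counts as heavier swaps 'a' and 'A', but in both orders every form of A still lands before every form of B. That is the sense in which the choice is a construct layered on top of the letters rather than part of them.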

I have had people tell me that they think this is incorrect; they believe that it should always be the other way around. But for the most part that is simply rebelling against the construct we are using, and preferring a different one.

So, those of you out there who think uppercase should be sorted before lowercase, what is the conceptual construct you are using?

Just curious....

 

[1] Bonus points for anyone who knows which collation(s) under Windows break this rule without testing them first!

 

This post brought to you by "ṏ" (U+1e4f, a.k.a. LATIN SMALL LETTER O WITH TILDE AND DIAERESIS)

Blog - Comment List
  • On the main blog page, the topic title is in all caps. Makes this particular post's title very strange.

    Vorn
  • I think you meant that uppercase letters are usually bigger and taller, not lowercase ones.

    I personally think uppercase should come first but that may be influenced by the years of using computers. Still, uppercase usually denotes two things - an abbreviation or a name. Should those be sorted before the lowercase variations, which would usually denote a generic term (Windows versus windows)? That'd be a matter of personal preference I guess.
  • If "uppercase comes first" is a rule in Unicode, then it is not without exceptions. ÿ comes before Ÿ in a comparison based on code points.
  • People preferring AaBb will argue that uppercase letters indicate proper names, and named things and people should have priority over generic words. It started from Adam, they say, and I bet there's a lot of women among them.
    aAbB fiends, on the other hand, may object that everything grows from small to big. It seems illogical if a tall letter precedes its lower counterpart in an ascending sequence. When I was a kid, I was lower and thin; only much later did I ascend to my six feet uppercase.
  • Old printers had only uppercase letters. To keep compatibility, they just added the lowercase letters to the already existing set.
    So, from this point of view, 'A' comes first.
  • The whole idea of sorting (at least for Latin-based scripts) is just convention anyway... I mean, you may well ask "why should 'A' come before 'a'?" but then why not ask "why should 'a' come before 'z'?" There doesn't seem to be any actual reasoning behind the 'sort order' of our alphabet at all!

    Unless, I'm missing something here...
  • Well, that is mostly true, Dean. Though individual languages and individual purposes often do have strong preferences for ordering that have to be respected. Even though it will at some level be arbitrary....
  • Note that someone might prefer AaBb for historical reasons -- Latin was originally written using majuscule only. (BTW, the current official Czech standard explicitly does not distinguish between upper- and lowercase; this differs from its previous version, where lowercase was sorted before uppercase.)
  • OT: Is half of the readers of this blog Czech? Or is it that we just like to comment on things?
  • Hmmmm.... not sure. I might be really big in the Czech Republic after my visit there a few years ago!
  • Another thing that is not clear here: are both upper- and lowercase A sorted before either case of B? Does an upper/lowercase letter weigh less than its counterpart but not less than the next letter in the alphabet, or do all upper/lowercase letters weigh less than all the characters of the other case?
  • Yes, they both come before B.

    'Ignore Case' here means literally ignore the case difference, do not consider there to be any difference.
  • Take any dictionary. Look at the page that lists the alphabet. In Russia, most sort the alphabet first by letter, then by case, with upper coming before lower.
  • I have decided that I need to channel the spirit of the father of comedian of Emo Phillips and juxtapose

  • Yesterday, I was blathering about how In SQL Server, the distance between A and Z is wider than you might
