The disunification of Norwegian and Danish sorting

Sorting it all Out
Michael Kaplan's random stuff of dubious value


A few days ago, I wrote a post entitled "Why do we call w 'double u' -- doesn't it look more like a 'double v'?"

The post itself had nothing to do with the title; it was actually about the Swedish Academy's change to consider 'w' and 'v' to be separate letters, as well as some of the potential consequences for Microsoft products if such a change is reflected in actual customer usage that needs to be captured in the future.

(The title was actually a little experiment to see whether a catchy title would get more attention in comments than the substance of the post; you can see for yourself what that experiment proved.)

Anyway, in the process of my speculation on products and what they might do in the future, I forgot that Vista already has such a change, in a country that is quite near to Sweden and Finland!

It has to do with the use of "aa" in Danish and Norwegian.

Both languages have the same basic alphabet -- the 26 Latin letters used in English and most other places, plus æ, ø, and å. In earlier days, though, the letters aa were used instead of å (in fact, in Danish this letter was not added until a spelling reform in the late 1940s, by which time it was already widely used in both Norwegian and Swedish).

Now, the shared Danish and Norwegian sort in Microsoft products has been giving aa special treatment: it sorts as a unique letter after z, basically equal to å.

But while å was a (relatively) recent addition for Danish, it has been in Norwegian for much longer. And feedback had come in from customers such as the following:

If you take a Norwegian dictionary you would in fact find the German town Aachen as one of the first entries under the letter A, but Explorer will put it at the end after the letter Z.

As I pointed out earlier Aa is never interpreted as Å for anything but family names dating pre-1917, and even then it is not uncommon in Norway to sort those names as double A. A person with the name Aalberg might frown to see his name listed at the top, but would be far from surprised. However a person searching for the town Aachen would not understand why Explorer put it at the end.

Feedback such as this led to some more investigation, and the final assessment was that the time had come to remove the aa entries from the compression tables for Norwegian.

"But Michael," you might be asking, "What about Danish?"

Well, the answer there is that in Danish it is still too common to expect aa to be treated as a letter -- and there are way too many textbooks and websites that put an extra (aa) at the end of the alphabet. Danish could not make the same change that Norwegian needed.

So, in Vista, the Norwegian tables have had the three variants of the aa compression removed, while the Danish tables (and also the Greenlandic ones, since Greenlandic uses the same sort as Danish) have not. Therefore, these formerly unified sorts will now be disunified.
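To make the difference concrete, here is a toy sketch in Python. It is not the actual Windows sorting tables, just an illustration of the idea: a made-up `sort_key` function assigns one weight per letter, and an `aa_contraction` flag (a hypothetical name) decides whether a pair of a's is compressed into the weight of å. The alphabet string, the word list, and the treatment of the three variants as "aa", "Aa", and "AA" are all assumptions for the example; lowercasing also catches "aA", which is a simplification.

```python
# Toy alphabet order: a-z, then æ, ø, å. Not the real Windows weights.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def sort_key(word, aa_contraction):
    """Return a list of per-letter weights for `word`.

    When `aa_contraction` is True, every "aa" pair is given the
    weight of å (the Danish-style behavior described above).
    When it is False, each a is weighted on its own
    (the Norwegian behavior after the change).
    """
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        if aa_contraction and text.startswith("aa", i):
            key.append(RANK["å"])  # two letters, one weight
            i += 2
        else:
            # Characters outside the toy alphabet sort last.
            key.append(RANK.get(text[i], len(ALPHABET)))
            i += 1
    return key


words = ["Zebra", "Aachen", "Aalberg", "Abel", "Ørsted", "Århus"]

danish = sorted(words, key=lambda w: sort_key(w, aa_contraction=True))
norwegian = sorted(words, key=lambda w: sort_key(w, aa_contraction=False))

print("Danish-style:   ", danish)
print("Norwegian-style:", norwegian)
```

With the contraction on, Aachen and Aalberg land after Zebra and Ørsted, next to Århus; with it off, they move to the top of the list, which is what the customer feedback above was asking for.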

That theoretical question I posed the other day has become decidedly non-theoretical! :-)

You may be filled with one or more of the following questions:

  1. What does this mean for prior versions of Windows and the .NET Framework?
  2. What does this mean for Microsoft Access (7.0 - 11.0) and their Norwegian/Danish sorts?
  3. What does this mean for DAO 3.5/3.6 and their dbLangNorwDan/dbSortNorwDan enumeration members?
  4. What does it mean for the Norwegian/Danish sort in SQL Server 7.0?
  5. What does it mean for the DANISH_NORWEGIAN_* collations in SQL Server 2000 and 2005?
  6. What does this mean for Windows Vista?
  7. What does it mean for future versions of Access, Jet, and SQL Server?
  8. What does it mean for WinFS?
  9. What does it mean for future versions of the .NET Framework?

The answer for questions 1-5 is simple -- not a bleeding thing. We can't change those prior version results.

The answer for question 6 is also simple -- it is going to be changed. The two sorts with the two different LCIDs (0x0406 versus 0x0414/0x0814) and different names (da-DK versus nb-NO/nn-NO) will return two different results.
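As a small aside on those LCIDs, the sketch below pulls each one apart the way the Windows `PRIMARYLANGID`, `SUBLANGID`, and `SORTIDFROMLCID` macros do. The `split_lcid` helper and the `LOCALES` table are names made up for this example; only the three LCIDs and locale names come from the text above.

```python
def split_lcid(lcid):
    """Split an LCID into (primary language, sublanguage, sort ID).

    Layout: the low 16 bits are the language ID, whose low 10 bits
    are the primary language and whose high 6 bits are the
    sublanguage; bits 16-19 of the LCID hold the sort ID.
    """
    lang_id = lcid & 0xFFFF
    sort_id = (lcid >> 16) & 0xF
    primary = lang_id & 0x3FF
    sublang = lang_id >> 10
    return primary, sublang, sort_id


# The three locales named in the post.
LOCALES = {0x0406: "da-DK", 0x0414: "nb-NO", 0x0814: "nn-NO"}

for lcid, name in LOCALES.items():
    primary, sublang, sort_id = split_lcid(lcid)
    print(f"{name}: LCID 0x{lcid:04X} -> primary 0x{primary:02X}, "
          f"sublanguage {sublang}, sort ID {sort_id}")
```

The two Norwegian LCIDs share one primary language value and differ only in the sublanguage, while Danish has a primary language value of its own.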

The answer for questions 7-9 is also simple (though that may change at some point!) -- I do not have a freaking clue. But you can bet your lunch money that I will be asking people some questions about this issue for WinFS and for the next version of SQL Server and for the upcoming version of Access that ships with Office 12.

 

This post brought to you by "ø" (U+00F8, a.k.a. LATIN SMALL LETTER O WITH STROKE)

Blog - Comment List
  • So, I have been reading your blog off and on for a while. This post did prompt one additional question for me though. Why was it not sponsored by the letter å? :^)
  • Well, å has already been getting a lot of ad time in other posts, so I figured ø deserved a little time, too. :-)
