Every character has a story #32: U+1e9e (CAPITAL SHARP S, Microsoft edition - Part 1)

Sorting it all Out
Michael Kaplan's random stuff of dubious value

Previous blogs about this letter:

Now once again, keep in mind that for most of the German speaking world this still isn't a letter....

So for Windows 7, which once again (just like Vista) was made to be updated to the most recent version of Unicode it could be, LATIN CAPITAL LETTER SHARP S needed to be integrated.

Let's take a look at it in WordPad, using the Segoe UI font:

Ok, interesting. Obviously they couldn't make it much taller. So they made it a little wider and left it at that.

I think that is probably what Unicode did too.

Makes me wonder what happened in Character Map. Let's take a look:

 

Hmm... Undefined? Oh I guess someone forgot to regenerate the list of names that Charmap uses. Luckily it can still display characters even when it doesn't know what they are.

Any testers want to put that bug in? :-)
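For comparison, here is a minimal sketch of a name lookup that falls back to "Undefined" when a character has no known name. It uses Python's standard `unicodedata` module and assumes the interpreter's bundled Unicode database is at version 5.1 or later; the `describe` helper is a made-up name for illustration and says nothing about how Charmap itself is implemented.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME', or 'U+XXXX Undefined' when the name table has no entry."""
    # unicodedata.name() returns the supplied default instead of raising
    # when the character has no name in the bundled database.
    name = unicodedata.name(ch, "Undefined")
    return f"U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    print(describe("\u1e9e"))  # capital sharp s
    print(describe("\u00df"))  # small sharp s
    print(describe("\ue000"))  # private use character, which has no name
```

A name table generated from an older version of Unicode would take the fallback path for U+1E9E, which matches the symptom shown above.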

You know I kind of wonder what they did for fonts that can't change the width.

Let's take a look at Consolas:

It's not there.

Oh damn, let's look at some other fonts, too.

Like Tahoma:

and Microsoft Sans Serif:

and the fixed width font that is in the font link chain, Courier New:

 

And I am sincerely curious what the upper and lower case look like next to each other on that one. Let's take a look:

Interesting!

And it does meet the fixed width rules -- notice how the surrounding text lines up?

Though it makes me wonder what might have changed from the old font's lowercase character. Just a little bit curious....

Okay, so Courier New has it, yet Consolas does not.

Uh oh -- is this a C* font thing?

Let's look at Calibri, the default font in WordPad:

Crap.

Notice how RichEdit doesn't seem to be looking very hard for the substitute. Thank goodness Word is not this lazy!

How about everyone's favorite uber-font, Arial Unicode MS?

Double crap.

Or maybe we'll get another 20 or 30 people who will agree with me that Arial Unicode MS effectively [bites|sucks|blows].

Silver lining of a sort....

One last font I want to check out though.

Times New Roman:

  

Wow, I think I like this one best -- this is one I can really tell the difference on. Much more than the others. Truly.

Okay, let's move on, there is kind of a pattern and kind of a logic here. I'm happy. Well, as happy as I can be about a letter that doesn't really exist in the first place....

But just wait until tomorrow when I do part 2 of this blog. :-)

Blog - Comment List
  • If you have a look at the string resources in getuname.dll, where Character Map gets the names from, you'll notice that many if not all character names that were added with Unicode 5.1 are missing. For example, the range for the Sundanese script starting at 0x1B80. The other scripts are:

    • Lepcha 0x1C00
    • Ol Chiki 0x1C50
    • Cyrillic Extended-A 0x2DE0
    • Vai 0xA500
    • Cyrillic Extended-B 0xA640
    • Saurashtra 0xA880
    • Kayah Li 0xA900
    • Rejang 0xA930
    • Cham 0xAA00
    • Ancient Symbols 0x10190
    • Phaistos Disc 0x101D0
    • Lycian 0x10280
    • Carian 0x102A0
    • Lydian 0x10920
    • Mahjong Tiles 0x1F000
    • Domino Tiles 0x1F030

    But what's more important is that the collation algorithms seem to process "ẞ" right. At least in explorer with filenames.

    Regards,

    Peter

  • Yep, the name thing was my point. :-)

    I'll be jumping into the other issues tomorrow....

  • I drew the one used in the Unicode chart in close consultation with Andreas Stötzner. There are fairly useful specifications out there about how to construct the character using bits and pieces of other characters in the font to get the right proportions.

  • Really stupid question here, but I can't seem to find that character in charmap. What did you do to be able to select it? The "Go to Unicode" function does not appear to be able to find it either. If I search for "sharp", it finds the LATIN SMALL LETTER SHARP S ok, but not the capital one.

  • The uppercase eszett didn't make it into the recent extensions to Calibri, Cambria and Consolas. It wasn't included in Unicode when work on those extensions was spec'd.

  • Gwyn -- new for Windows 7, it is....

  • Someone at Ubuntu should get a memo on how that is supposed to work.

    The conversion tables in Ubuntu (version 8.10) map lower case sharp s to upper case sharp s (all locales, including German).

  • Ok cool, thanks I'm not going mad then :) Carry on

  • Mihai -- I think you wanted the part 2 blog here.

    I am jealous of Ubuntu -- they did the thing I wish Windows had. It is the better behavior in my opinion....

  • > I am jealous of Ubuntu -- they did the thing I wish Windows had.

    > It is the better behavior in my opinion....

    It feels a bit tricky.

    I think the mapping should be to "SS", at least for the German locale.

    But Ubuntu is limited by the design of the POSIX API, which does the case conversion in place.

    I would really like a mapping to "SS" in a public case conversion API, the way ICU (and Mac OS) do. That is what the (German) users expect.

  • Microsoft does simple casing here too.

    I am willing to bet that within five years they will want simple (1 to 1) mappings to use the Capital Sharp S. What we should have done and what Ubuntu apparently does....

  • "I am willing to bet that within five years they will want simple (1 to 1) mappings to use the Capital Sharp S. What we should have done and what Ubuntu apparently does...."

    Yes, it might make sense because the API is crippled (like the POSIX one). But as an API consumer I would want what my client wants: proper linguistic behavior.

    So I would want a non-simple casing API, mapping to SS, like ICU.
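The two behaviors debated in the comments above can be put side by side in a short sketch. Python's built-in `str.upper()` applies the full (one-to-many) mapping of ß to "SS". The `simple_upper_with_capital_sharp_s` function is a hypothetical helper written only for illustration: it shows what a length-preserving 1-to-1 mapping that uses the capital sharp s could look like, and it is not the behavior of any Windows, POSIX, or ICU API.

```python
SMALL_SHARP_S = "\u00df"    # ß LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S


def full_upper(text: str) -> str:
    """Full case mapping: ß expands to 'SS', so the string may grow."""
    return text.upper()


def simple_upper_with_capital_sharp_s(text: str) -> str:
    """Hypothetical 1-to-1 uppercase: ß becomes ẞ and the length never changes."""
    out = []
    for ch in text:
        if ch == SMALL_SHARP_S:
            out.append(CAPITAL_SHARP_S)
            continue
        upper = ch.upper()
        # Keep only one-to-one results; leave any expanding character as-is.
        out.append(upper if len(upper) == 1 else ch)
    return "".join(out)


if __name__ == "__main__":
    word = "stra\u00dfe"
    print(full_upper(word))
    print(simple_upper_with_capital_sharp_s(word))
    print(CAPITAL_SHARP_S.lower() == SMALL_SHARP_S)
```

The full mapping changes the string length, which is why an API that converts case in place cannot offer it; the 1-to-1 variant keeps the length at the cost of producing a letter that many German readers do not expect.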
