Optimized for English (oh, and also Japanese, and maybe a few others)

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

The recent post about Are ligatures supposed to be thought of as 'single characters'? had a comment from RubenP that I thought could use some further conversation:

It must be said, but all the ClearType fonts with automatic fi ligatures look exceptionally bad for the sequence 'fij'; if you remember, the ij is quite frequent in Dutch, so that's a little troublesome.  (To me at least ;-)

But then again, the few fonts that contain a combining acute accent hardly ever actually combine it with the j, and if they do, the accent is markedly different from the accent on the (precomposed) i. Adding acutes to ij is actually something you'd want in Dutch (the acute is an emphasis mark and ij is a vowel; well, a diphthong actually). But because of the very poor support for this kind of thing, even the official rule has become i acute + j, rather than i acute + j acute.

Oh, and how does one stop these ligatures from happening? For example, in Turkish? IIRC the fi ligature is a big no-no in Turkish typography, because you cannot distinguish it from f + dotless i.
With such silly things, I guess non-American digital typography still has a long way to go...

It is a fair point. What is often hinted at (like in Bill Hill's first post on fontblog) is that the two languages that got the most research and attention when it comes to ClearType and the many ClearType fonts are English and Japanese. And there is no shortcut to skip that research step....

It becomes obvious, when one considers the needs of languages like Dutch and Turkish such as those that RubenP pointed out, that not all of the Western Latin script languages were truly having their individual needs considered when the development of some of the so-called "C* fonts" took place.

The needs here are indeed sometimes script-specific but more often language-specific. And it is way too easy (when adding features that might be thought to look good for one language) to unintentionally screw over another language. Not to screw it over too much, mind you. Just to screw it over about the usual amount, if you know what I mean.

It's not like you can change these defaults later -- imagine what it would do to page flow and formatting in documents if such a global change were made -- a backcompat nightmare, to say the least!
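To make the page flow concern a little more concrete, here is a minimal sketch in Python. It is a toy model and not how Word actually paginates: the `paginate` function, the 40-line page, and the 10-line unsplittable blocks are all assumptions made up for illustration. It just shows how one block growing by a single line (say, because a font's metrics changed) can push other blocks onto different pages.

```python
# Toy pagination model (hypothetical; not Word's real layout engine).
# A document is a list of block heights, measured in lines. Blocks are
# treated as unsplittable: if a block does not fit in the space left on
# the current page, it moves to the next page. Blocks taller than a page
# are not handled specially in this sketch.

def paginate(block_heights, lines_per_page=40):
    """Return the 1-based page number on which each block lands."""
    pages = []
    page, used = 1, 0
    for height in block_heights:
        if used > 0 and used + height > lines_per_page:
            page += 1      # block does not fit, so start a new page
            used = 0
        pages.append(page)
        used += height
    return pages


# 200 blocks of 10 lines each, then the same document with the very
# first block one line taller.
before = paginate([10] * 200)
after = paginate([11] + [10] * 199)

moved = sum(1 for b, a in zip(before, after) if b != a)
print("pages before:", before[-1])
print("pages after: ", after[-1])
print("blocks that changed page:", moved)
```

In this toy setup a one-line change at the very top is enough to move a fair number of blocks to a different page, which is the kind of thing that separates a figure from the text describing it.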

Perhaps, in retrospect, a more generic approach to issues like the fi ligature could have been taken in the C* fonts. After all, this is a lesson we already learned in Microsoft Sans Serif and Tahoma. But typeface design at its best is a much more organic process than trying to imitate another font.  So in the end if a particular feature is on by default in a font and that feature is not so good for your language, then perhaps using logic to come to the conclusion that this is not the best font for the language in question is in order? :-)

So while it is true that many people are excited about the optional language features in OpenType and the exciting readability of ClearType, I find myself much more excited about the next ten years -- when the work that has happened here can be further tuned to cater to the needs of even more languages than the ones for which ClearType is optimized now. And when the ability to work with optional OpenType features is available in products like Microsoft Word and Publisher. When the promises delivered upon in technologies in Vista and Office 2007 are extended to cover so much more of the world....

In the meantime, my Visual Studio font is either Consolas or Courier New, depending on how much "Terminal Services to XP" work I have to do (since "ClearType over TS to an XP box" is not really quite there just yet!).

Makes for an exciting future, in any case. :-)

 

This post brought to you by "ﬁ" and "ĳ" (U+FB01 and U+0133, a.k.a. LATIN SMALL LIGATURE FI and LATIN SMALL LIGATURE IJ)
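For anyone who wants to poke at these characters, here is a small Python sketch using the standard `unicodedata` module. The helper names (`describe`, `expand_ligatures`, `block_fi_ligature`) are made up for this example. The last helper inserts U+200C ZERO WIDTH NON-JOINER between f and i, which is one commonly suggested way to ask a renderer not to form the ligature; whether a given font and application honor it is an assumption you would have to test, and turning off the relevant OpenType feature is another route where the application allows it.

```python
import unicodedata

FI_LIGATURE = "\ufb01"   # the precomposed fi ligature character
IJ_LIGATURE = "\u0133"   # the precomposed ij character used for Dutch
DOTLESS_I = "\u0131"     # Turkish dotless i, easily confused inside an fi ligature
ZWNJ = "\u200c"          # zero width non-joiner


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def expand_ligatures(text):
    """Replace compatibility characters such as U+FB01 with plain letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def block_fi_ligature(text):
    """Insert ZWNJ between f and i as a hint that no ligature is wanted.

    Whether the hint is respected depends on the font and the renderer.
    """
    return text.replace("fi", "f" + ZWNJ + "i")


for ch in (FI_LIGATURE, IJ_LIGATURE, DOTLESS_I, ZWNJ):
    print(describe(ch))

print(expand_ligatures(FI_LIGATURE + IJ_LIGATURE))
print(repr(block_fi_ligature("fijn")))
```

Note that none of this changes what a font does with a plain "f" followed by "i"; that substitution happens at rendering time, which is exactly why the default matters so much.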

Blog - Comment List
  • "It's not like you can change these defaults later -- imagine what it would do to page flow and formatting in documents if such a global change were made -- a backcompat nightmare, to say the least!"

    Wha...?

    I'm imagining what it would do to page flow and formatting in documents, and I don't see what the problem is. Some words get put on different lines, and some paragraphs become one line shorter or longer. So some table columns get a tiny bit wider and some get a tiny bit thinner. Very occasionally a paragraph may spill onto an extra page, or unspill from an extra page just before a page breaking element (new section header) causing all the following elements to be a page (or even two if the section element is e.g. always on an odd page) different from last time the document was viewed.

    So what? Are you saying that our document processors can't handle that?

    Last I heard, Word does that between different versions anyway, and can also do it in the same version depending on the printer/paper settings of the computer it's loaded on.

    Heck, if my document processor gets a better layout engine, I *want* it to rearrange the document so it's better than it was before. That's what it's supposed to do! That's the point. It's *supposed* to be automating these things.

    Isn't it?
  • Hi Adam,

    Actually, even one pixel differences can cause that sort of problem -- and if documents suddenly become hundreds of pages longer, the universe breaks as far as Word is concerned. They look at regressions here *very* carefully and treat them as huge bugs.
  • "if documents suddenly become hundreds of pages longer, the universe breaks as far as Word is concerned."

    *boggle* Why? What can't Word handle about this? What's wrong with it? Can it be fixed?

    "They look at regressions here *very* carefully and treat them as huge bugs."

    How is it a "regression" if the new rendering is *better* than the old? That's like saying that all the CSS fixes in IE7 are "regressions" from IE6. "Different" does not necessarily imply "going backwards".

    Say, with the new processor speeds available, you figured out a way to add TeX's paragraph-at-a-time linebreaking/hyphenation rules to Word in realtime - something that improved the "color" of the document and could reasonably be objectively regarded by professional typesetters as "better". That could make large documents "hundreds of pages longer", but wouldn't be a regression.

    So how is fixing this sort of thing up a regression?
  • Hi Adam,

    Word "handles" it by repaginating.

    But if upgrading to a new version of Word means having to check and fix every document, then people don't upgrade. So the "fix" here is a break --- if the fonts change.

    So the fonts, for the sake of back compat, don't break old documents.
  • Sorry, having another thick day. What's wrong with repaginating? Why would you have to "check and fix" all your documents? How does this break a document?
  • Because documents can have lots of things that interact -- like figure labels on the same pages as the figures, pages that do not have just single lines on them, descriptions that are close enough to what they are describing, tables that line up, and so on.

    Not everyone cares, but lots of people do. So it is a big deal to keep things working here.
  • People do orphan control, "keep with previous/next", allowing breaks within tables, etc... *by hand* on 100+ page documents?!?

    Wow! People are /weird/.
  • Most people do not understand how to use some of those advanced features in Word and many do not even know they exist.

    I won't say people are weird, but I will say that people who do all that work don't want an upgrade to break them....
  • Well, I've known about the orphan control and such, but somehow Word still never seems to be willing to do my bidding. :-(

    Plus, you do need to apply such tools diligently, because adding a line halfway through your document can really screw up the partitioning of the rest of the document *because* of orphan control and "keep with next paragraph". Technically speaking, removing a word in TeX could potentially cause a longer paragraph than you started out with. (It's easier to get Word to behave on an 80+ page document, mind you. In other words: *very* hard.)

    Still, for some reason, TeX seems a lot more robust in this sense (and others). Now only if someone could coerce TeX into properly supporting multiple columns and floating figures with text wrapping around them... OpenType in TeX is already well underway (XeTeX, IIRC), and there you actually do get control over each and every feature (such as complex scripts, ligatures, or explicitly *no* ligatures).

    It boggles the mind when an ancient system like TeX can be extended to support OpenType even though it started out as a 7-bit system, but the only significant upgrade Word has gotten since Word 95 seems to be a spiffy new GUI. I know from good sources, that typographers were (again) not pleased with the 'new' Word.

    Still, Word c.s. beat the heck out of the ancient TeX when you're writing a simple document, and the ease of use argument still holds. Just don't expect pretty typography: it's a word processor, not a typesetter.
  • Ruben > "Plus, you do need to apply such tools diligently, because adding a line halfway through your document can really screw up the partitioning of the rest of the document *because* of orphan control and "keep with next paragraph". Technically speaking, removing a word in TeX could potentially cause a longer paragraph than you started out with."

    I still fail to see *why* having a longer paragraph just because you've removed a word is inherently a problem - especially if orphan control, "keep with next", etc... are doing their job properly.

    How do things "screw up" exactly? How are you differentiating between a "screw up" and "the right thing" for the new text?
  • Adam: Imagine you have a 1,000 page document. If page 1 suddenly becomes one line longer, it will cause page 2 to reflow, and page 3 and so on. After a few hundred pages, you might be moving stuff down dozens of lines on each page - it's a snowball effect.

    If you add figures and tables into the equation, you might find that the figure that used to be on page 547 is now on 548, but the description text for it is mixed in with the text on page 546 and 547 still. That means you'll have to go in and re-write that section so that the description of the figure is closer to the figure itself.

    Now repeat that for all of the figures you have, all the tables you have, and the 1,000 pages and you can see how a one-line difference on page 1 is such a huge deal.
  • "Imagine you have a 1,000 page document. If page 1 suddenly becomes one line longer, it will cause page 2 to reflow, and page 3 and so on. After a few hundred pages, you might be moving stuff down dozens of lines on each page - it's a snowball effect."

    Um, no. *I* won't be moving anything. The reflow/layout engine will do that. That's what it does. That's what it's for.

    "If you add figures and tables into the equation, you might find that the figure that used to be on page 547 is now on 548, but the description text for it is mixed in with the text on page 546 and 547 still. That means you'll have to go in and re-write that section so that the description of the figure is closer to the figure itself."

    Or just mark the preceding paragraph "keep with next", the next paragraph as "keep with previous" and both "do not break". Reflow engine sorts it all out again, and the whole section is now immune from being screwed up against any further changes.

    Your document is also now immune against screwups from being printed out on different paper sizes, or against weirdness if you decide to put extra whitespace above/below your chapter/section headings. (Given that with a long document you are probably more likely to worry about the content before playing with things like header style formatting, this is a valid concern.) Or if you've got a new image to replace diagram 34, but it's not exactly the same size/aspect ratio. In fact, the new diagram 34 is a better aspect ratio for what it's displaying than the old one was, and you don't want to squish it and spend ages trying to make it *exactly* the same size as the old one was.

    Laying stuff out on a page is a completely automatable task, and has been since around 1978. Having a document reflow to fit new edits and technologies (e.g. better font rendering) is *how it's supposed to work*.
  • Hi Adam --

    Having written a book in Word in the past, worked with publishers, and talked to them, I can tell you for a fact that WRITING everything that way is not a natural task that people simply do. It is only something that is done when one is having trouble making a particular break work.

    Once again, no one is doing it unless circumstances force it. And this is what *development editors* of books are used to seeing.

    I can understand that you may be a bit stubborn about accepting this one (esp. considering your single-minded traffic on the subject here!), but are you willing to concede that:

    (a) not everyone does these things, and
    (b) even those who do it may not do it all the time, and
    (c) given (a) and (b), NOT changing font metrics avoids breaking users as easily?
  • *grumble*

    Yeah, you're right, of course. :)

    Again, I am the weird one. I have to admit, I find writing the content first and sorting out presentation at the end more natural. But, as people have often told me, I do occasionally lose track of how much weight other people put on style. :-/

    It is a pity you guys can't /ever/ update Word's layout engine to fix this sort of thing though - something I'd never quite realised before. That's a real shame.
  • Ah, but what could Word really do to fix things, in the layout engine or even anywhere else?

    I do know that attempts to automatically set these attributes have been made in the past (though I am not sure how recently), and have tended to not work very well -- it is simply too hard to guess at the intent of the author in all of these situations....