Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
(This is information I talked a bit about at Microsoft Research India when I was in Bangalore)
Now there have been many times that I have talked about the wonderful feature that Uniscribe's font fallback provides for us (ref: these blogs).
Of course there is one thing I forgot to mention about it.
And that is that it has a pretty fundamental flaw that comes up quite easily in the most common scenario under which it is used....
You see, one of the main goals of this default built-in fallback behavior is to allow you to keep your fonts as they are and then any time your fonts don't have a glyph for the text you are showing, it finds one from its own little list of fonts that ought to work.
As an example, we'll take our poetic word from Learning to spell in Bengali (when one doesn't know the language) and intersperse it inside the English, making the string something like:
Godhuli গোধূলি Godhuli গোধূলি Godhuli
Now I am going to take that string and shove it over in Notepad, where my default font is usually Consolas 8 pt. Here is what it looks like:
OMG that is awful - take a poetic word and turn it into an ink smudgy looking kind of thing. I guess we have to make it bigger and try 10pt:
That still looks pretty awful, maybe 12pt will be better?
Okay, it is less of a smudge now -- I can actually tell it is Bengali. But I am struggling to read it. I guess we have to move to 14pt:
Okay, now that text I can read. For completeness we'll look at 16pt too:
Maybe you can see the problem when you look at these last two. The trick is to not look at the Bengali so much and instead look at the English.
Do you see it?
Yes, that's the problem -- you have the choice of either illegibly small Bengali or obnoxiously large Latin text.
Kind of the same problems that came up in When a font looks like crap.... and 'But there's no Latins in our Divehi font, Duckman.' 'Well, they don't *speak* Latin in the Maldives, do they?', come to think of it.
But wait a minute, why should that be? I mean even in XP, the Vrinda font does have those Latins in it. So why should this be a problem if the font used for Bengali has the Latins?
Think about it -- the way font fallback works, you are combining your original font for the Latins with the fallback font for the non-Latins!
In other words, Microsoft paid someone to build a font that might have done extensive work to make sure that the scripts within the font are scaled together quite well, and the most common way that people will use the feature will never benefit from the work!
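To make the mechanics concrete, here is a minimal sketch of per-run fallback. It is not Uniscribe's actual algorithm or API; the `COVERAGE` table, the `covers` helper, and the `fallback_runs` function are all hypothetical, and the code point ranges are simplified assumptions rather than data read from the real fonts. It only shows how the Latin text stays in the original font while the Bengali text lands in a different one.

```python
# Hypothetical, simplified coverage data (NOT read from real font files).
COVERAGE = {
    "Consolas": [(0x0020, 0x024F)],                  # assumed: Latin only
    "Vrinda": [(0x0020, 0x007E), (0x0980, 0x09FF)],  # assumed: ASCII + Bengali
}


def covers(font, ch):
    """Return True if the (assumed) coverage table says font has a glyph for ch."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COVERAGE[font])


def fallback_runs(text, base_font, fallback_fonts):
    """Split text into (font, substring) runs, choosing a font per character.

    The base font wins whenever it covers the character; otherwise the first
    fallback font that covers it is used. Adjacent characters that end up
    with the same font are merged into one run.
    """
    runs = []
    for ch in text:
        if covers(base_font, ch):
            font = base_font
        else:
            font = next((f for f in fallback_fonts if covers(f, ch)), base_font)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


runs = fallback_runs("Godhuli গোধূলি Godhuli", "Consolas", ["Vrinda"])
print([font for font, _ in runs])  # ['Consolas', 'Vrinda', 'Consolas']
```

Note that the Latins in the fallback font are never consulted here, because the base font already covers them.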
Let's test the theory and look at it with Vrinda as the font for all of the text, explicitly. First at 16pt:
Wow that looks better! Let's see 14pt:
Also really attractive. How about 12pt?
Still looking good! If we move much further down we'll run into problems again, as you can see with 10pt:
Well, let's also look at some of these in Vista. For example, here are 8pt and 14pt Consolas:
Looks like the same problems are there, and we'll verify with 8pt, 10pt, and 12pt Vrinda:
Wow, Vista even looks pretty good at the 10pt size!
And the main point stands: the effort to scale the two scripts so that they look good together is a very powerful way to support better looking text.
Now the only problem is that the principal goal of Uniscribe font fallback (to not require one to explicitly choose the best font) is kind of subverted by requiring that explicit choice to be made, now isn't it?
Now as I pointed out in 'But there's no Latins in our Divehi font, Duckman.' 'Well, they don't *speak* Latin in the Maldives, do they?', there is clearly an effort to include the Latins in the fonts used for fallback, but that has not completely happened yet, and even if it had, of course the problem for all the other characters in Unicode would still exist -- the hard work of typographers to do the best scaling should indeed be leveraged, but a reasonable automatic scaling should exist for the less than ideal case where not all of the needed characters are provided.
But what would be needed is an effort for Uniscribe to take a more holistic approach to font choices that works beyond the individual run when adjacent runs cross script boundaries, so that the right font will be selected in such cases.
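One way to picture that more holistic choice is the sketch below. It is only an illustration of the idea, not anything Uniscribe does: the `COVERAGE` table, `covers_all`, and `choose_font` are hypothetical names, and the coverage ranges are simplified assumptions. The idea is to look across the whole stretch of text first and prefer a single font that covers all of it, so that any scaling work the foundry did between scripts is actually used.

```python
# Hypothetical, simplified coverage data (NOT read from real font files).
COVERAGE = {
    "Consolas": [(0x0020, 0x024F)],                  # assumed: Latin only
    "Vrinda": [(0x0020, 0x007E), (0x0980, 0x09FF)],  # assumed: ASCII + Bengali
}


def covers_all(font, text):
    """Return True if every character of text is in the font's assumed coverage."""
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in COVERAGE[font]) for ch in text
    )


def choose_font(text, base_font, fallback_fonts):
    """Prefer one font for the whole text, looking past individual runs.

    Returns the base font if it covers everything, otherwise the first
    fallback font that covers everything, otherwise None (meaning the
    caller would have to go back to per-run fallback).
    """
    if covers_all(base_font, text):
        return base_font
    for font in fallback_fonts:
        if covers_all(font, text):
            return font
    return None


print(choose_font("Godhuli গোধূলি Godhuli", "Consolas", ["Vrinda"]))  # Vrinda
```

When no single font covers everything, the function gives up and returns `None`, which is exactly the less than ideal case where some automatic scaling between fonts would still be needed.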
This is a reasonable approach used for the analogous problem of handling combining diacritic placement -- ideally you should provide precomposed glyphs for the font to use, but if not, then information about attachment points can do a less than awful job for fallback.
Of course it would require some work to do that scaling between different fonts.
We will assume that the attempt to scale different scripts within the same font when the font foundry did not do the work to make them look good together -- case in point Arial Unicode MS -- is completely out of scope and thus will continue to look as bad as it always has looked; perhaps they wanted it this way!
Now some people might look at this blog oddly since I seem to be providing not just a workaround but suggestions on how the product itself could work to fix the problem. But the issues in providing that scaling are quite complex, to which I'll add the discussion from the crap cartoon font blogs (Part 1 and Part 2) that talk about other issues that complicate the problem further....
This post brought to you by গ (U+0997, aka BENGALI LETTER GA)
I thought a point was a fixed size?
Given the same point setting, the size of the glyphs (excluding overhang?) should be the same for both fonts. Obviously Vrinda's glyphs are smaller than Consolas's, so has the foundry misinterpreted the sizing? Or is there some more subtle point here that I've missed?
Every font has different metrics and always has, so if one is flawed, they all are. The point size has very little to do with what each font chooses to do with it....
Put another way -- compare Arial and Times New Roman, and then tell me again if you think it is reasonable to expect letters to be the same size. :-)
Yes, I do think it's reasonable (in terms of height anyway - width is a different matter), not for each letter/character but for the font as a whole.
The problem with the Latin/Bengali fonts in the main post is that the Bengali characters are all noticeably shorter than the Latin characters. Comparing W to ো is obviously unfair, but in terms of Godhuli and গোধূলি, it is obvious that every Bengali character is shorter than the Latin characters.
To (attempt to) draw an analogy, the Replay Gain functionality of some media players (Winamp calls it that, others have different names) doesn't make every single audio sample the same volume - it works on the entire collection of audio samples.
And automatic scaling would try to do the same, maybe?
As far as the height of those two LATIN fonts goes, try creating a 1 MB document in Word with each; the Times one will be hugely shorter. Try it if you don't believe me!
Interesting that you suggest a file size rather than a line count... I'm sensing a trick here.
It's reasonably obvious in the font previewer that Vrinda is much shorter than Times New Roman with bigger gaps between the lines, which I am assuming is related to the font rather than the previewer.
If I were to attempt to come up with an automatic scaling algorithm, it would be based on some 'standard' (which points are supposed to be...) and would map (again) the desired font size/height to a point value that will give that height. For example, Vrinda 72 is about the same height as Times New Roman 48. So transparently, taking TNR as the standard, 48pt would have Vrinda drawn using 72pt.
I'm sure there's some flaw here (besides the obvious backwards compatibility issues), but I haven't spotted it yet. :-)
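The arithmetic in that suggestion can be sketched as follows. This is a rough illustration only: `matched_point_size` is a hypothetical helper, the 48pt/72pt pair is the eyeballed figure from the comment above rather than a measured font metric, and the sketch assumes the height ratio stays the same at every size, which may well not hold in practice.

```python
def matched_point_size(requested_pt, standard_ref_pt, fallback_ref_pt):
    """Map a point size in the 'standard' font to a size in another font.

    standard_ref_pt and fallback_ref_pt are one pair of sizes at which the
    two fonts look about the same height. Purely linear scaling between
    them is an assumption made for this sketch.
    """
    return requested_pt * fallback_ref_pt / standard_ref_pt


# Using the comment's observation: Vrinda 72pt is about as tall as TNR 48pt.
print(matched_point_size(48, 48, 72))  # 72.0
print(matched_point_size(12, 48, 72))  # 18.0
```

So under that assumption, text requested at 12pt would have its Vrinda portion drawn at 18pt.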
No trick, the line count is also different. But in large documents the page count disparity is more impressive.
Conceptually the scaling is that simple, but in practice the scaling ends up being much more complicated than this (I'll talk about this in a future blog).
If the line count is different it means the width of the font is also different, hence it is not a valid comparison of height (alone).
I'll hang out for the scaling discussion when you blog about it :-)
The width is different on the characters, too. :-)
The height can affect the page count, independent of lines. Bottom line -- different fonts take up different amounts of space....
Now lots of people have already been talking about cool changes to fonts in Windows 7, such as blogs