Wednesday, June 21, 2006 12:01 AM
Michael S. Kaplan
Give me a break [Char] !
Over on the Shell team, Jeff Miller is one of those very cool developers who knows how to get stuff done. And I am not just saying that because he let me fix a bug in the Shell's code (this bug, in fact [1]).
Anyway, he sent email to one of the internal aliases asking whether there were actually fonts that specified anything other than U+0020 for the tmBreakChar member of the TEXTMETRIC structure.
It is an interesting question. And it is true that tmBreakChar is documented as:
Specifies the value of the character that will be used to define word breaks for text justification.
which suggests that, per language (or perhaps per script), any character might be a good candidate. Given that several different language keyboards put different characters on the space bar, that might make some sense. Considering especially the MSKLC limitations on the space bar that were affecting Tibetan and still currently affect Khmer and a few other languages and scripts, it seems perfectly reasonable that other characters might be here for other scripts and other languages.
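For anyone who wants to check what their own machine reports, here is a minimal Python sketch using ctypes. The TEXTMETRICW layout follows the documented structure (eleven LONG fields, four WCHAR fields, five BYTE fields), but the function name query_break_char is just something I made up for illustration, and it only looks at whatever font happens to be selected into the screen DC. On anything other than Windows it simply returns None.

```python
import ctypes
import sys


class TEXTMETRICW(ctypes.Structure):
    # Fixed-width types stand in for LONG (32-bit), WCHAR (16-bit) and BYTE,
    # so the layout is the same no matter where this file is imported.
    _fields_ = (
        [(name, ctypes.c_int32) for name in (
            "tmHeight", "tmAscent", "tmDescent", "tmInternalLeading",
            "tmExternalLeading", "tmAveCharWidth", "tmMaxCharWidth",
            "tmWeight", "tmOverhang", "tmDigitizedAspectX",
            "tmDigitizedAspectY")]
        + [(name, ctypes.c_uint16) for name in (
            "tmFirstChar", "tmLastChar", "tmDefaultChar", "tmBreakChar")]
        + [(name, ctypes.c_uint8) for name in (
            "tmItalic", "tmUnderlined", "tmStruckOut",
            "tmPitchAndFamily", "tmCharSet")]
    )


def query_break_char():
    """Return tmBreakChar for the font in the screen DC, or None.

    Hypothetical helper: returns None off Windows or if a GDI call fails.
    """
    if sys.platform != "win32":
        return None
    user32 = ctypes.WinDLL("user32")
    gdi32 = ctypes.WinDLL("gdi32")
    # Handles are pointer-sized, so declare them as such.
    user32.GetDC.restype = ctypes.c_void_p
    user32.GetDC.argtypes = [ctypes.c_void_p]
    user32.ReleaseDC.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
    gdi32.GetTextMetricsW.restype = ctypes.c_int
    gdi32.GetTextMetricsW.argtypes = [ctypes.c_void_p,
                                      ctypes.POINTER(TEXTMETRICW)]
    hdc = user32.GetDC(None)
    if not hdc:
        return None
    try:
        tm = TEXTMETRICW()
        if not gdi32.GetTextMetricsW(hdc, ctypes.byref(tm)):
            return None
        return tm.tmBreakChar
    finally:
        user32.ReleaseDC(None, hdc)


if __name__ == "__main__":
    value = query_break_char()
    print("not available" if value is None else "U+%04X" % value)
```

To look at a specific font you would create it and select it into the DC before calling GetTextMetricsW; I left that out to keep the sketch short.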
However, the truth is that none of the fonts that Microsoft uses appear to ever return anything other than the ordinary space at U+0020, even if you look at languages which in theory might consider a different character as the best one to use for word breaks. Carolyn suggested that this particular member is most likely coming from the usBreakChar entry in the OS/2 and Windows Metrics table:
This is the Unicode encoding of the glyph that Windows uses as the break character. The break character is used to separate words and justify text. Most fonts specify 'space' as the break character. This field cannot represent supplementary character values (codepoints greater than 0xFFFF).
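If you would rather look at the font file than ask GDI, the field can be read straight out of the OS/2 table. The sketch below is only that, a sketch: read_break_char is a hypothetical helper, it assumes a single-font TrueType/OpenType file (not a .ttc collection), and it relies on the OpenType layout in which usBreakChar sits at byte offset 92 of the OS/2 table and exists only in table versions 2 and later.

```python
import struct

OS2_BREAK_CHAR_OFFSET = 92  # usBreakChar, present in OS/2 version 2 and later


def read_break_char(font):
    """Return usBreakChar from a single-font sfnt blob, or None if absent.

    Hypothetical helper for illustration; `font` is the raw bytes of the file.
    """
    if len(font) < 12:
        return None
    # Offset table: sfntVersion (4 bytes), then numTables (2 bytes).
    num_tables = struct.unpack_from(">H", font, 4)[0]
    for i in range(num_tables):
        record_pos = 12 + 16 * i
        if record_pos + 16 > len(font):
            return None
        # Table record: tag, checksum, offset, length (all big-endian).
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font, record_pos)
        if tag != b"OS/2":
            continue
        end_of_field = OS2_BREAK_CHAR_OFFSET + 2
        if length < end_of_field or offset + end_of_field > len(font):
            return None
        version = struct.unpack_from(">H", font, offset)[0]
        if version < 2:
            return None  # older tables stop before usDefaultChar/usBreakChar
        return struct.unpack_from(
            ">H", font, offset + OS2_BREAK_CHAR_OFFSET)[0]
    return None
```

Feeding it the bytes of a font file (for example, the result of open(path, "rb").read() for a path of your choosing) gives back the code point as an integer, which you can then compare against 0x0020.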
Looking at the real complexities of complex script handling and how various languages and scripts have to handle word breaking, it is most likely that this particular member is not really used by most TrueType/OpenType implementations. So while it is true that some fonts may be setting it to something else, it does not seem like most fonts do (or that Windows would actually use the information if a different character were used!).
Maybe it is just one of those holdovers from the days before complex scripts, from a time when people were thinking far enough ahead to imagine that there might be some other character between words? :-)
1 - If memory serves, I asked Jeff just after I had posted that blog entry whether he would mind if I poached the bug, and he responded with something clever like "Please, poach all you want -- we'll make more!". And he had good feedback during the code review, to boot! :-)
This post brought to you by " " (U+0020, a.k.a. SPACE)