Sunday, August 26, 2007 12:41 AM
Michael S. Kaplan
Blame Kannada! (ಕನ್ನಡ)
(Inspired by the alternate title from Oh Kannada... (ಕನ್ನಡ) and the South Park movie!)
One of those interesting issues related to rendering Indic scripts properly came up the other day, this time with Kannada....
The string in question, first:
ಅಹ್ಮ್ದ್ ಷರೀಫ್
If you are running on an OS that does the rendering correctly, it will not look identical to this other string:
ಅಹ್ಮ್ದ್ ಷರೀಫ್
Or this third string:
ಅಹ್ಮ್ದ್ ಷರೀಫ್
In this case, the customer was seeing all three strings rendered like that third one with some fonts but not others, and in some technologies but not others. And it never worked correctly in .NET 1.1 using GDI+ and its Graphics.DrawString method.
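The strings can look the same on screen while containing different characters, so it helps to dump their code points. The sketch below assumes the difference is an invisible joiner (U+200C or U+200D) after a virama, which is what the discussion of those two characters suggests. It uses Python's standard unicodedata module. The sample strings are hypothetical: the name is spelled out code point by code point, and a joiner is assumed after the first virama. They are not copied from the strings above.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in s, including invisible ones."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


# Hypothetical variants (an assumption for illustration): identical letters,
# differing only in an invisible joiner placed after the first virama.
BASE = "\u0C85\u0CB9\u0CCD\u0CAE\u0CCD\u0CA6\u0CCD \u0CB7\u0CB0\u0CC0\u0CAB\u0CCD"
WITH_ZWNJ = BASE.replace("\u0CB9\u0CCD", "\u0CB9\u0CCD\u200C", 1)  # ZERO WIDTH NON-JOINER
WITH_ZWJ = BASE.replace("\u0CB9\u0CCD", "\u0CB9\u0CCD\u200D", 1)   # ZERO WIDTH JOINER

if __name__ == "__main__":
    for label, s in [("base", BASE), ("zwnj", WITH_ZWNJ), ("zwj", WITH_ZWJ)]:
        # Length differs by one even though the rendered text may look identical.
        print(label, len(s))
        for line in describe(s):
            print("   ", line)
```

Comparing the dumps shows where the variants diverge, even when a renderer draws all three the same way.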
Now, as you might have guessed, we are dealing with a combination of several different issues here, including:
- the one I pointed out in Why don't all the half forms sort right?, and the fact that unifying the meanings of U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) across all Indic scripts is a fairly new idea, recommended by Peter Constable and adopted only in recent versions of Unicode;
- the one I pointed out in A quick look at Whidbey's TextRenderer, and the fact that the GDI+ shaping engines are hopelessly out of date and stand little chance of being updated, which is why TextRenderer.DrawText is much preferred over Graphics.DrawString;
- the fact that, given all of the above, newer shaping engines, fonts, and technologies have a much better chance of displaying these strings correctly.
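The first item concerns what a joiner means when it follows a virama. The sketch below reports, for each Kannada virama in a string, whether a ZWNJ, a ZWJ, or nothing comes next. The meanings attached to each case are an assumption on my part. They follow the convention the Unicode Standard describes for Devanagari: ZWNJ after a virama requests a visible virama, and ZWJ requests a half or intermediate form. Whether a particular Kannada font or shaping engine honors that request is exactly what varies. The function name and descriptions are hypothetical.

```python
# Assumption: the Devanagari-style convention from the Unicode Standard, where a
# joiner immediately after a virama is a rendering request. Individual fonts and
# shaping engines may or may not honor it for Kannada.
VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA
JOINER_REQUESTS = {
    "\u200C": "ZWNJ: explicit (visible) virama requested",
    "\u200D": "ZWJ: half/intermediate form requested",
}


def virama_contexts(s: str) -> list[tuple[int, str]]:
    """Return (index, description) for every Kannada virama in s."""
    results = []
    for i, ch in enumerate(s):
        if ch != VIRAMA:
            continue
        following = s[i + 1] if i + 1 < len(s) else ""
        # No joiner: the font/shaper decides, e.g. a conjunct when one is available.
        results.append((i, JOINER_REQUESTS.get(following, "no joiner: default shaping")))
    return results


if __name__ == "__main__":
    # Hypothetical sample: HA+virama+ZWNJ, MA+virama+ZWJ, then DA+virama at the end.
    sample = "\u0CB9\u0CCD\u200C\u0CAE\u0CCD\u200D\u0CA6\u0CCD"
    for index, what in virama_contexts(sample):
        print(index, what)
```

A check like this only tells you what the text asks for. What actually appears on screen still depends on the rendering stack, as the items above explain.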
These issues will improve over time as the older implementations that do not have the right rendering story are replaced by ones that do. Though I can't help wondering whether it would have been so bad to update all of the supported technologies (including GDI+) so that customers could see text correctly without having to depend on technology shifts....
This post brought to you by ್ (U+0ccd, a.k.a. KANNADA SIGN VIRAMA)