Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
Please read the disclaimer; content not approved by Microsoft!
I have talked about Font Association several times in the past, like in the following posts:
but was able to get some more information on this... um... charming feature that so many people dislike (as I promised to do in some of the prior posts).
It is even more restricted than I originally thought.
Here are the conditions:
If any of these conditions are not met, then Font Association does not happen.
This information goes a long way toward explaining why the results have been so confusing for so long: most commonly the software is developed first, and it is only later on that all these conditions become true and the results suddenly change.
The bug is reported and the developer tweaks the font settings randomly until the problem goes away (usually they don't even find themselves able to reproduce the problem, though the judicious use of screenshots guarantees they'll believe the problem exists)....
It is things like this that make me wonder how things work at companies like Apple where they do have so many opportunities to chuck legacy with newer versions and not have to deal with the cruft like this that is so hard to figure out later on.
Is the loss of backward compatibility too high of a price to pay?
My career has been on platforms that either did go away, but not in a way that affected me since I was not tied to the technology, or that had extensive backcompat guarantees. So I can't help wondering what the starting over is like....
We get the starting over with new technologies all the time, but the old stuff is always there and sometimes the new stuff is even built on top of the old stuff!
This blog brought to you by ⺋ (U+2e8b, CJK RADICAL SEAL)
(Apologies to Steve Austin!)
The announcement went out to quite a few interested parties, and it occurred to me that some of them might also be here!
It went like this:
The Unicode Consortium has recently strengthened the Unicode Character Encoding Stability Policy in accordance with the recommendations of the Unicode Technical Committee, adding the following new stability constraints:
- Strong normalization stability
- Case Pair stability
- Immutability of the General_Category property values Control (Cc), Private_Use (Co), and Surrogate (Cs)
In addition, the text of the Property Value stability constraints has been edited for clarity, adding the formal property names and property value names.
See http://www.unicode.org/policies/stability_policy.html
I added the emphasis above (on Case Pair stability) since it was an interesting point for Microsoft in particular.
The exact addition was:
Case Pair Stability
Applicable Version: Unicode 5.0+
Two assigned characters form a case pair when the full uppercase of the first character is the second character, and the full lowercase of the second character is the first character.
If two characters form a case pair in a version of Unicode, they will remain a case pair in each subsequent version of Unicode.
If two characters do not form a case pair in a version of Unicode, they will never become a case pair in any subsequent version of Unicode.
More formally, for given versions V and U of Unicode, and any two characters X and Y that are both assigned according to both V and U:
toLowercase_V(X) = Y AND toUppercase_V(Y) = X
if and only if
toLowercase_U(X) = Y AND toUppercase_U(Y) = X
Note that these conditions apply to two existing, assigned characters. A character that is not part of a case pair could become part of one if the new case pair is formed at the time of the addition of a new character to Unicode. For example, a new capital version of U+028D ( ʍ ) LATIN SMALL LETTER TURNED W could be added in the future to form a new case pair.
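For anyone who wants to poke at the definition, here is a minimal sketch of the case pair test in Python. It assumes that Python's str.lower() and str.upper() are an acceptable stand-in for the full case mappings of whichever Unicode version the interpreter ships with; the function name is_case_pair is mine, not anything from the policy.

```python
import unicodedata


def is_case_pair(upper: str, lower: str) -> bool:
    """Return True when (upper, lower) satisfy the case pair definition.

    Mirrors the quoted formula: toLowercase(X) = Y AND toUppercase(Y) = X,
    with X = upper and Y = lower. Uses the case mappings of the Unicode
    version bundled with this Python build (an assumption of this sketch).
    """
    return upper.lower() == lower and lower.upper() == upper


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # U+0241 / U+0242: the capital and small glottal stops.
    print(is_case_pair("\u0241", "\u0242"))
    # U+0294 is the caseless glottal stop, so it pairs with nothing.
    print(is_case_pair("\u0241", "\u0294"))
    # U+1E9E lowercases to U+00DF, but U+00DF uppercases to "SS".
    print(is_case_pair("\u1e9e", "\u00df"))
```

The stability guarantee then amounts to saying that, for two characters assigned in both versions, this function must give the same answer no matter which version's tables sit behind it.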
You see, this was done largely at the request of Microsoft.
It was really due to the fact that both Unicode and Microsoft had casing stability rules that were not entirely compatible, a fact that could easily lead to future problems with Microsoft moving to keep more up to date with the standard (as they did in Vista) if issues like the additions to Unicode I talked about in Every character has a story #13: U+0241 and U+0294 (upper and lower case glottal stops) were to happen again.
Because there are so many components of Windows that depend on its casing tables, changes like that would really not be possible. Therefore it matters to be able to guarantee that two letters that were defined in a version of the standard, but were not considered to be cased variants of each other, can sit in the same directory in an NTFS partition without some future version claiming that they cannot anymore....
This is the kind of effort that I am really happy to see Microsoft make, after spending so many years sitting in Unicode without trying to drive the requirements of its own software into the standard as well.
I remember talking with Asmus, Mark, and Ken at separate times before the meeting, and all of them were very supportive -- primarily since a Microsoft that is closer to the published standard is not just a good thing for Microsoft; it is also a good thing for Unicode!
So it just makes sense if their stability policies can be aligned.
Now at the same time, it is important (in my opinion) for Microsoft to not abuse the implied power in people thinking along those lines, especially watching how another not-too-long-ago example played out in the end (the Devanagari Sindhi characters escalated into Unicode 5.0 despite the synchronization issues with 10646 when the originally hoped-for but never promised implementations never managed to appear).
But examples like this are pretty rare, and Microsoft has in the past been less proactive about things than they probably should have been, so as long as the behavior is responsible I think it really is a good thing. I wish Microsoft could be more involved in standards like Unicode than they are, sometimes!
This post brought to you by ॻ, ॼ, ॾ, and ॿ (097b, 097c, 097e, and 097f -- DEVANAGARI LETTERS GGA, JJA, DDDA, and BBA -- the Sindhi implosives)
Please read the disclaimer!
It is probably a good thing that there are people who feel comfortable turning to me to get information from time to time.
And it is probably better for my own peace of mind that, even while I am mired in conflict (such as is hinted at here!), they happen to have been doing it very recently, since that helps keep up both my resolve to help people and my morale while helping. :-)
Like the other day, with that developer who managed to impress me with her insight that led to finding some interesting bugs, as I mentioned previously.
She called the other day to ask me about a problem she was having when it came to the results she was getting back from GDI and GDI+ when trying to retrieve the size of text.
Obviously with two different technologies one could expect some differences, but when the font settings and the text are identical, you really wouldn't expect the differences to be too massive, right? I mean shouldn't it boil down to two different techniques that ought to come to nearly the same answer?
She was almost at that "guess it's by design" stage that developers get to when there doesn't seem to be a good explanation for behavior that they might have to live with, but before completely giving up she thought she'd ask if I had any ideas about what might be behind the differences, which in one of her test cases was leading to two whole characters' worth of difference when comparing the two sizes.
I immediately knew what the cause was because it was on my list of things to blog about.
But her call then accidentally saved me from embarrassing myself by actually writing that blog, since it is already explained in something I wrote back in September 2006, in Font sizes vary more than one might expect. So far I think I have only made that mistake once and I was quite happy to avoid doing it again. :-)
Many of the underlying issues are covered in the MS Knowledge Base in 307208 (Why text appears different when drawn with GDIPlus versus GDI), though the KB article's principal concern is with the unusual drawing results in GDI+ in the many edge cases that lead to narrower or wider glyphs. The basics of how to scale them together are almost out of scope in this rather impressive article!
Anyway, back to that Font sizes vary more than one might expect post: it explains the only real workaround one can use for managed applications:
One workaround that can be used is to ALWAYS request font sizes that will be on the right pixel boundary for GDI -- thus the size that is passed will be the same as the rounded size would be.
Note that this means you have to set the font size at runtime, since the twips and pixel settings can be different depending on runtime settings....
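The arithmetic behind that workaround can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that a point is 1/72 inch, that the pixel height is rounded to the nearest whole pixel (half rounding up), and that the DPI is queried at runtime by the caller. The helper name snap_to_pixel_boundary is hypothetical and not part of GDI, GDI+, or WinForms.

```python
import math
from typing import Tuple

POINTS_PER_INCH = 72


def snap_to_pixel_boundary(point_size: float, dpi: int) -> Tuple[int, float]:
    """Return (pixel_height, snapped_point_size) for a requested point size.

    The snapped point size converts to a whole number of pixels at the
    given DPI, so a rounding and a non-rounding consumer of the size
    should end up at the same pixel height.
    """
    # Assumed rounding rule: nearest whole pixel, halves rounded up.
    pixels = max(1, math.floor(point_size * dpi / POINTS_PER_INCH + 0.5))
    return pixels, pixels * POINTS_PER_INCH / dpi


if __name__ == "__main__":
    # The DPI values here are examples; a real application would ask
    # the system for the current setting at runtime.
    for dpi in (96, 120):
        print(dpi, snap_to_pixel_boundary(10, dpi))
```

So a request for 10 points at 96 DPI works out to 13.33 pixels, which snaps to 13 pixels, which is 9.75 points; asking for 9.75 points in the first place is the "right pixel boundary" size in that configuration.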
The workaround reminds me more than a little bit of the Private fonts: for members only blog from November 2005, which had the goal of providing a sample that would allow managed code to consistently work with private fonts, or of that Rhymes with Amharic series from April of last year that Scott Hanselman inspired me to write about font embedding in managed code.
Given the mixed nature of WinForms, this workaround has a nice holistic feel about it, just as these other two solutions do.
Which suggests that it might make an interesting future blog to write up a sample that does the sizing! Sound interesting to anyone?
And then, like Scott suggested with the font embedding sample and I have suggested to others previously about the private fonts sample, this also might actually make for an interesting core feature to consider in WinForms!
Imagine it -- a setting that would automatically adjust the font size at runtime to make sure that all controls (the purely managed ones, the managed wrappers around unmanaged ones, and the managed ones that work through TextRenderer) have a single size that will return consistent results. The only downside to such a feature is that it could easily lead to clipping based on the font size increasing a bit in that quest for the ideal GDI pixel boundary size, but that seems like a small price to pay for controls that have the same font size setting having identically sized text, right? :-)
This blog brought to you by ভ (U+09ad, aka BENGALI LETTER BHA)
I would like to think that all of the people who read some or all of this blog (whether said reading takes place occasionally, all the time, or somewhere in between), understand the blog's nature.
In particular, when I think this, I like to think specifically about the Blog's disclaimer, which is:
Maybe I should add some emphasis there. I'll try again:
That's better!
Now over the life of Sorting it all Out, this text has subtly changed, e.g. "if it were ever to" morphing to "whenever it manages to" after it turned out that my management was running across the odd post from time to time, or perhaps more often. But the spirit of the words has remained steadfast.
Everything I post here, no matter what the source (whether from within Microsoft or without), is posted through the uniquely strange prism that is my brain, and therefore it is:
In fact, I would go so far as to say that hypothesizing that any opinion I state could even result in policy maker(s) within Microsoft changing direction completely for the sole purpose of proving me officially wrong is, while probably not true in either the capital B Blog or lowercase B blog sense, not something I would be willing to bet heavily against, myself.
In previous posts I have spoken ill of the people who might show such lapses in judgment, even going so far as to cast aspersions like utter moronic wingnut on them. In retrospect this is kind of mean and even if true it was probably unkind of me to say so, and just as I do not make fun of those who do walk in front of trains or stick their tongues in light sockets, I should not make fun of these people.
Bottom line: I don't want people making fun of me for being unable to stand upright without falling, and therefore should not make fun of people who fail to grok something I write here (or who I'm speaking for when I write it).
Therefore I won't do that anymore.
Now as to the future, if you get a business card from me as of my next order then my blog URL won't be on it. And not that you can see my commitments (my management excluded of course!), but any mention of the blog within my commitments (incidental or not) will be removed during an explicit scrub during the midyear.
While various facets of my job and the people I work with will likely continue to inspire the Blog or the blogs therein, I neither desire nor expect any facet of my Blog or the blogs therein to inspire my management's perception of the work I do.
With all that said, I personally believe that what I do here in this blog actually does have some modicum of usefulness for both me as the author and for some of you as the readers. And after Denethor (and Tolkien!) I'll say that I will continue to blog in just the same way, "from this hour henceforth, until R.I.F. release me, or death take me, or the world end" (or I am directed not to!).
It is vaguely possible that, over time, some of the people within Microsoft who feel that (all things being equal) Sorting it all Out tends to have a less than stellar effect on customer perception of the work the group does will reconsider their position, though offhand I'm inclined to doubt it will make a bucket of piss's worth of difference, and therefore I won't hold my breath hoping for change.
Though if you are internal to Microsoft and run across a particular blog that helped you then feel free to tell my management about the experience, especially if you plan to be specific about it....
My own voice (enhanced by my opinions), is an unsuitable representative of Microsoft Corporation, and given the choice between being an honest rebel and a dishonest boy scout, I choose the former, with neither regret nor apology. So help me blog.
Many of the characters within Unicode were eager to sponsor this blog (and/or the "character" the author looks at in the mirror each morning!) but I did not want to tarnish their reputations and thanked them but then regretfully declined.
The announcement came in this afternoon:
Unicode 5.1.0 beta period now closed
The beta period for Unicode 5.1.0 has closed. We are now in the pre-publication phase and expect to have the final release around March 31. No more substantive changes are planned, beyond those already approved by the Unicode Technical Committee. However, if you have editorial comments on the text of Unicode 5.1.0 please report via the online reporting form.
Unicode 5.1.0 page: http://www.unicode.org/versions/Unicode5.1.0/
Online contact form: http://www.unicode.org/reporting.html
Regards,
Rick McGowan
Unicode, Inc.
Okay, we are not quite at the point where one would yell Stop the Presses! if there was some kind of urgent problem that would require attention. I have no idea what that process would look like -- say if the website were on fire or something?
Well, you can also report any editorial comments, too....
Anyway, we can sit around and wait for the release.
I'll open a box of Grapheme Clusters
and we'll make a party of it....
It is funny: it was just yesterday that I was mentioning grapheme clusters in Unicode in a comment, and today we're eating the cookies. Awesome!
From that link above on the version:
This is a draft page for the eventual specification of Unicode 5.1.0. This page is under development and may be modified without notice until Unicode 5.1.0 is released.
Unicode 5.1.0 is currently in the pre-publication phase and is due for release at the end of March 2008. No more substantive changes are planned, beyond those already approved by the Unicode Technical Committee. However, if you have editorial comments on the text of Unicode 5.1.0 please report via the online reporting form.
Last updated 26-February-08
A. Summary
Unicode 5.1 brings major benefits: improvements for security in data exchange, character additions to support Indic and South East Asian scripts, improvements to the Unicode Linebreaking Algorithm statement of conformance, standardized named sequences for Lithuanian, and provisional named sequences for Tamil. Identifiers were expanded to allow full support for Indic and Arabic scripts.
Implementers will find new test data files and additional new XML data files with character properties for all Unicode characters.
Several important property definitions were extended, improving linebreaking for Polish and Portuguese hyphenation. The Unicode Text Segmentation Algorithms, covering sentences, words, and characters, were greatly enhanced by creating extended combining character sequences that improve the processing of Tamil and other Indic languages. The Unicode Normalization Algorithm now defines stabilized strings and provides guidelines for buffering.
This latest version of Unicode adds new characters required for Malayalam and Myanmar and important individual characters such as Latin capital sharp s for German. Version 5.1 extends support for languages in Africa, India, Indonesia, Myanmar, and Vietnam, with the addition of the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts. Scholarly support includes important editorial punctuation marks, as well as the Carian, Lycian, and Lydian scripts, and the Phaistos disc symbols. Other new symbol sets include dominoes, Mahjong, dictionary punctuation marks, and math additions. Unicode 5.1 contains significant additions and improvements that extend text processing for software worldwide.
You can see the rest of the text here. :-)
This blog brought to you by 食 (U+2fb7, aka KANGXI RADICAL EAT, though not by Keebler since that was a bit of parody!)
Yun asked:
Hello,
We are investigating issues regarding to surrogated pair character on Japanese OS, we couldn’t see these chars in CharMap. Are there any other tools or do we need to do anything special to make these chars visible to CharMap?
Thanks,
Yun
Well, unfortunately Character Map does not handle supplementary characters.
This is something I have mentioned previously, in blogs like Is Vista Leaving on a Jet Plane [1]? and I'm simply saying that life^H^H^H^Hcharacters, uh...find a way, as regular readers may recall.
In fact, if you have a font that only supports supplementary characters (as some beta versions of one or two of the CJK fonts did in Vista!), then Character Map will simply show a blank grid -- a genuinely bizarre sight outside of RFS scenarios that one expects might even ASSERT on debug builds, let me tell you. :-)
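For anyone unsure what makes a character "supplementary" here, a small sketch in Python of the standard UTF-16 arithmetic may help: any code point above U+FFFF is stored as a high and a low surrogate. The function name to_surrogate_pair is just something I made up for illustration.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


if __name__ == "__main__":
    # U+10432 is a Deseret letter, well outside the Basic Multilingual Plane.
    high, low = to_surrogate_pair(0x10432)
    print(f"U+10432 -> {high:04X} {low:04X}")
    # Cross-check against Python's own UTF-16 encoder.
    print("\U00010432".encode("utf-16-be").hex().upper())
```

A tool that only walks the 16-bit range U+0000 to U+FFFF never sees such a pair as a single character, which is consistent with the blank grid described above.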
Fun tangent, and a bit of trivia --
I actually remember an ASSERT and AV from an MSLU-enabled debug version of Character Map that I ran on Windows 98 back in the day (I found the MSLU bug that caused the problem, thankfully - so no one else had to!).
Another fun tangent --
Once upon a time back when Vista was called Longhorn, Sergey came and asked me if we could add Unicode subranges to the list of ones included in Character Map.
I told him that we definitely could if he could find out how the .UCE files that contained the information were built.
And then he never mentioned it again (and the subranges weren't updated in Vista, either!)....
Anyway, you can use the Word 97 version of the Insert Symbol... dialog, something that many people feel should be a part of the operating system.
Yet another fun tangent --
Murray Sargent once asked me a few years back why we didn't just add the Office Insert Symbol... dialog to Windows.
I asked him how many calls to the Office shared DLL the Insert Symbol... dialog made.
"The exact number?" he asked, puzzled.
"Of course not," I reassured him, "just a ballpark figure."
"I'm really not sure. A lot, probably."
"So what does that dialog look like if the DLL isn't there? I mean, since the Office DLL doesn't ship with Windows, and all. I'm just curious..." I trailed off.
Murray never asked me about including the dialog in Windows again.
And it didn't end up in Vista, either. :-)
I think about those conversations and realize that I am probably overstating their importance. I mean Murray talked to a whole bunch of people after that one long-ago conversation, and Sergey talked to a whole bunch of people after that other one long-ago conversation.
I can't get program managers to give up on things even when they really ought to after years of conversation, so what are the odds that I inspired a couple of developers to do so just by pointing out an obstacle or two in single serving conversation? :-)
In any case, I think I've run out of tangents and trivia. Plus the question has been answered (and if it is not sufficient, then the comments of those other two posts have some suggestions for other tools that help make it happen)....
This blog brought to you by 𐐲 (U+10432, aka DESERET SMALL LETTER SHORT O)
So while I was in India, I picked up a bunch of books (my suitcase was probably 30 pounds heavier!).
One book that hardly weighed anything at all was a small one titled Learn Tamil in 30 Days by N. Jegtheesh, B.A., part of the National Integration Language Series.
No, it wasn't that I was necessarily looking to learn Tamil in 30 days or anything like that.
And although the different logic it had for character counts, compared with what others had been talking about lately, was interesting, that isn't what closed the sale, either.
I was mainly interested in seeing how a native speaker of Tamil would have explained the language to someone else.
Plus there was a big table spanning several pages that just caught my eye. I have mostly reproduced it here (though swapping the "X" and "Y" axes). You can note the different transliterations for letters that are used -- I'd say it would provide some hints for this post, though not as many as I might have liked since it has everything as uppercase.
(The most annoying part of this table was how much crap Word added to it, even when I saved it as filtered HTML. I guess the un prefix in that word filtered is silent in Microsoft Word? Though luckily my version of the filtered file is about 20% of the size yet looks identical!)
ஃ AKH | அ A | ஆ AA | இ E | ஈ EE | உ U | ஊ OO | எ Ā | ஏ AE | ஐ I | ஒ O | ஓ OH | ஔ OU
க் K | க KA | கா KAA | கி KE | கீ KEE | கு KU | கூ KOO | கெ KĀ | கே KAE | கை KAI | கொ KO | கோ KOH | கௌ KOU
ங் NG | ங NGA | ஙா NGAA | ஙி NGE | ஙீ NGEE | ஙு NGU | ஙூ NGOO | ஙெ NGĀ | ஙே NGAE | ஙை NGAI | ஙொ NGO | ஙோ NGOH | ஙௌ NGOU
ச் CH | ச CHA | சா CHAA | சி CHE | சீ CHEE | சு CHU | சூ CHOO | செ CHĀ | சே CHAE | சை CHAI | சொ CHO | சோ CHOH | சௌ CHOU
ஞ் GN | ஞ GNA | ஞா GNAA | ஞி GNE | ஞீ GNEE | ஞு GNU | ஞூ GNOO | ஞெ GNĀ | ஞே GNAE | ஞை GNAI | ஞொ GNO | ஞோ GNOH | ஞௌ GNOU
ட் D | ட DA | டா DAA | டி DE | டீ DEE | டு DU | டூ DOO | டெ DĀ | டே DAE | டை DAI | டொ DO | டோ DOH | டௌ DOU
ண் NN | ண NNA | ணா NNAA | ணி NNE | ணீ NNEE | ணு NNU | ணூ NNOO | ணெ NNĀ | ணே NNAE | ணை NNAI | ணொ NNO | ணோ NNOH | ணௌ NNOU
த் TH | த THA | தா THAA | தி THE | தீ THEE | து THU | தூ THOO | தெ THĀ | தே THAE | தை THAI | தொ THO | தோ THOH | தௌ THOU
ந் N | ந NA | நா NAA | நி NE | நீ NEE | நு NU | நூ NOO | நெ NĀ | நே NAE | நை NAI | நொ NO | நோ NOH | நௌ NOU
ப் P | ப PA | பா PAA | பி PE | பீ PEE | பு PU | பூ POO | பெ PĀ | பே PAE | பை PAI | பொ PO | போ POH | பௌ POU
ம் M | ம MA | மா MAA | மி ME | மீ MEE | மு MU | மூ MOO | மெ MĀ | மே MAE | மை MAI | மொ MO | மோ MOH | மௌ MOU
ய் Y | ய YA | யா YAA | யி YE | யீ YEE | யு YU | யூ YOO | யெ YĀ | யே YAE | யை YAI | யொ YO | யோ YOH | யௌ YOU
ர் R | ர RA | ரா RAA | ரி RE | ரீ REE | ரு RU | ரூ ROO | ரெ RĀ | ரே RAE | ரை RAI | ரொ RO | ரோ ROH | ரௌ ROU
ல் L | ல LA | லா LAA | லி LE | லீ LEE | லு LU | லூ LOO | லெ LĀ | லே LAE | லை LAI | லொ LO | லோ LOH | லௌ LOU
வ் V | வ VA | வா VAA | வி VE | வீ VEE | வு VU | வூ VOO | வெ VĀ | வே VAE | வை VAI | வொ VO | வோ VOH | வௌ VOU
ழ் ZH | ழ ZHA | ழா ZHAA | ழி ZHE | ழீ ZHEE | ழு ZHU | ழூ ZHOO | ழெ ZHĀ | ழே ZHAE | ழை ZHAI | ழொ ZHO | ழோ ZHOH | ழௌ ZHOU
ள் LL | ள LLA | ளா LLAA | ளி LLE | ளீ LLEE | ளு LLU | ளூ LLOO | ளெ LLĀ | ளே LLAE | ளை LLAI | ளொ LLO | ளோ LLOH | ளௌ LLOU
ற் RR | ற RRA | றா RRAA | றி RRE | றீ RREE | று RRU | றூ RROO | றெ RRĀ | றே RRAE | றை RRAI | றொ RRO | றோ RROH | றௌ RROU
ன் N | ன NA | னா NAA | னி NE | னீ NEE | னு NU | னூ NOO | னெ NĀ | னே NAE | னை NAI | னொ NO | னோ NOH | னௌ NOU
Anyway, I just thought I'd share it with all of you.
Note how it puts two N entries in there (in Unicode the first one is ந (U+0ba8, TAMIL LETTER NA) and the second one is ன (U+0ba9, TAMIL LETTER NNNA)). But beyond that, several of the transliterations do seem quite odd to me, as used to the character names as I am....
Some other letters are listed after these ones (like the ones used in Tamil Grantha, etc).
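As an aside, every cell in a grid like this is just a consonant followed by either the pulli (U+0BCD) or a dependent vowel sign, so the rows can be generated rather than typed. Here is a rough sketch in Python; the suffix keys follow the book's transliterations as reproduced here, and the names VOWEL_SIGNS, row_for, and code_points are hypothetical helpers of mine.

```python
PULLI = "\u0bcd"  # TAMIL SIGN VIRAMA, which suppresses the inherent vowel

# Book-style suffix -> dependent vowel sign ("" means the inherent A).
VOWEL_SIGNS = {
    "A": "",
    "AA": "\u0bbe",
    "E": "\u0bbf",
    "EE": "\u0bc0",
    "U": "\u0bc1",
    "OO": "\u0bc2",
    "\u0100": "\u0bc6",  # the book writes this one as A with macron
    "AE": "\u0bc7",
    "AI": "\u0bc8",
    "O": "\u0bca",
    "OH": "\u0bcb",
    "OU": "\u0bcc",
}


def code_points(text: str) -> str:
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def row_for(consonant: str, latin: str) -> list:
    """Build one row: the bare consonant, then each vowel combination."""
    cells = [(consonant + PULLI, latin)]
    for suffix, sign in VOWEL_SIGNS.items():
        cells.append((consonant + sign, latin + suffix))
    return cells


if __name__ == "__main__":
    for text, latin in row_for("\u0b95", "K"):  # U+0B95 TAMIL LETTER KA
        print(text, latin, code_points(text))
```

Feeding it each consonant in turn (U+0B95 with "K", U+0B99 with "NG", and so on) should reproduce the rows of the grid, code points included.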
Let's try it again with some Unicode code points in it, just for grins:
ஃAKH0B83
அA0B85
ஆAA0B86
இE0B87
ஈEE0B88
உU0B89
ஊOO0B8A
எĀ0B8E
ஏAE0B8F
ஐI0B90
ஒO0B92
ஓOH0B93
ஔOU0B94
க்K0B95 0BCD
கKA0B95
காKAA0B95 0BBE
கிKE0B95 0BBF
கீKEE0B95 0BC0
குKU0B95 0BC1
கூKOO0B95 0BC2
கெKĀ0B95 0BC6
கேKAE0B95 0BC7
கைKAI0B95 0BC8
கொKO0B95 0BCA
கோKOH0B95 0BCB
கௌKOU0B95 0BCC
ங்NG0B99 0BCD
ஙNGA0B99
ஙாNGAA0B99 0BBE
ஙிNGE0B99 0BBF
ஙீNGEE0B99 0BC0
ஙுNGU0B99 0BC1
ஙூNGOO0B99 0BC2
ஙெNGĀ0B99 0BC6
ஙேNGAE0B99 0BC7
ஙைNGAI0B99 0BC8
ஙொNGO0B99 0BCA
ஙோNGOH0B99 0BCB
ஙௌNGOU0B99 0BCC
ச்CH0B9A 0BCD
சCHA0B9A
சாCHAA0B9A 0BBE
சிCHE0B9A 0BBF
சீCHEE0B9A 0BC0
சுCHU0B9A 0BC1
சூCHOO0B9A 0BC2
செCHĀ0B9A 0BC6
சேCHAE0B9A 0BC7
சைCHAI0B9A 0BC8
சொCHO0B9A 0BCA
சோCHOH0B9A 0BCB
சௌCHOU0B9A 0BCC
ஞ்GN0B9E 0BCD
ஞGNA0B9E
ஞாGNAA0B9E 0BBE
ஞிGNE0B9E 0BBF
ஞீGNEE0B9E 0BC0
ஞுGNU0B9E 0BC1
ஞூGNOO0B9E 0BC2
ஞெGNĀ0B9E 0BC6
ஞேGNAE0B9E 0BC7
ஞைGNAI0B9E 0BC8
ஞொGNO0B9E 0BCA
ஞோGNOH0B9E 0BCB
ஞௌGNOU0B9E 0BCC
ட்D0B9F 0BCD
டDA0B9F
டாDAA0B9F 0BBE
டிDE0B9F 0BBF
டீDEE0B9F 0BC0
டுDU0B9F 0BC1
டூDOO0B9F 0BC2
டெDĀ0B9F 0BC6
டேDAE0B9F 0BC7
டைDAI0B9F 0BC8
டொDO0B9F 0BCA
டோDOH0B9F 0BCB
டௌDOU0B9F 0BCC
ண்NN0BA3 0BCD
ணNNA0BA3
ணாNNAA0BA3 0BBE
ணிNNE0BA3 0BBF
ணீNNEE0BA3 0BC0
ணுNNU0BA3 0BC1
ணூNNOO0BA3 0BC2
ணெNNĀ0BA3 0BC6
ணேNNAE0BA3 0BC7
ணைNNAI0BA3 0BC8
ணொNNO0BA3 0BCA
ணோNNOH0BA3 0BCB
ணௌNNOU0BA3 0BCC
த்TH0BA4 0BCD
தTHA0BA4
தாTHAA0BA4 0BBE
திTHE0BA4 0BBF
தீTHEE0BA4 0BC0
துTHU0BA4 0BC1
தூTHOO0BA4 0BC2
தெTHĀ0BA4 0BC6
தேTHAE0BA4 0BC7
தைTHAI0BA4 0BC8
தொTHO0BA4 0BCA
தோTHOH0BA4 0BCB
தௌTHOU0BA4 0BCC
ந்N0BA8 0BCD
நNA0BA8
நாNAA0BA8 0BBE
நிNE0BA8 0BBF
நீNEE0BA8 0BC0
நுNU0BA8 0BC1
நூNOO0BA8 0BC2
நெNĀ0BA8 0BC6
நேNAE0BA8 0BC7
நைNAI0BA8 0BC8
நொNO0BA8 0BCA
நோNOH0BA8 0BCB
நௌNOU0BA8 0BCC
ப்P0BAA 0BCD
பPA0BAA
பாPAA0BAA 0BBE
பிPE0BAA 0BBF
பீPEE0BAA 0BC0
புPU0BAA 0BC1
பூPOO0BAA 0BC2
பெPĀ0BAA 0BC6
பேPAE0BAA 0BC7
பைPAI0BAA 0BC8
பொPO0BAA 0BCA
போPOH0BAA 0BCB
பௌPOU0BAA 0BCC
ம்M0BAE 0BCD
மMA0BAE
மாMAA0BAE 0BBE
மிME0BAE 0BBF
மீMEE0BAE 0BC0
முMU0BAE 0BC1
மூMOO0BAE 0BC2
மெMĀ0BAE 0BC6
மேMAE0BAE 0BC7
மைMAI0BAE 0BC8
மொMO0BAE 0BCA
மோMOH0BAE 0BCB
மௌMOU0BAE 0BCC
ய்Y0BAF 0BCD
யYA0BAF
யாYAA0BAF 0BBE
யிYE0BAF 0BBF
யீYEE0BAF 0BC0
யுYU0BAF 0BC1
யூYOO0BAF 0BC2
யெYĀ0BAF 0BC6
யேYAE0BAF 0BC7
யைYAI0BAF 0BC8
யொYO0BAF 0BCA
யோYOH0BAF 0BCB
யௌYOU0BAF 0BCC
ர்R0BB0 0BCD
ரRA0BB0
ராRAA0BB0 0BBE
ரிRE0BB0 0BBF
ரீREE0BB0 0BC0
ருRU0BB0 0BC1
ரூROO0BB0 0BC2
ரெRĀ0BB0 0BC6
ரேRAE0BB0 0BC7
ரைRAI0BB0 0BC8
ரொRO0BB0 0BCA
ரோROH0BB0 0BCB
ரௌROU0BB0 0BCC
ல்L0BB2 0BCD
லLA0BB2
லாLAA0BB2 0BBE
லிLE0BB2 0BBF
லீLEE0BB2 0BC0
லுLU0BB2 0BC1
லூLOO0BB2 0BC2
லெLĀ0BB2 0BC6
லேLAE0BB2 0BC7
லைLAI0BB2 0BC8
லொLO0BB2 0BCA
லோLOH0BB2 0BCB
லௌLOU0BB2 0BCC
வ்V0BB5 0BCD
வVA0BB5
வாVAA0BB5 0BBE
விVE0BB5 0BBF
வீVEE0BB5 0BC0
வுVU0BB5 0BC1
வூVOO0BB5 0BC2
வெVĀ0BB5 0BC6
வேVAE0BB5 0BC7
வைVAI0BB5 0BC8
வொVO0BB5 0BCA
வோVOH0BB5 0BCB
வௌVOU0BB5 0BCC
ழ்ZH0BB4 0BCD
ழZHA0BB4
ழாZHAA0BB4 0BBE
ழிZHE0BB4 0BBF
ழீZHEE0BB4 0BC0
ழுZHU0BB4 0BC1
ழூZHOO0BB4 0BC2
ழெZHĀ0BB4 0BC6
ழேZHAE0BB4 0BC7
ழைZHAI0BB4 0BC8
ழொZHO0BB4 0BCA
ழோZHOH0BB4 0BCB
ழௌZHOU0BB4 0BCC
ள்LL0BB3 0BCD
ளLLA0BB3
ளாLLAA0BB3 0BBE
ளிLLE0BB3 0BBF
ளீLLEE0BB3 0BC0
ளுLLU0BB3 0BC1
ளூLLOO0BB3 0BC2
ளெLLĀ0BB3 0BC6
ளேLLAE0BB3 0BC7
ளைLLAI0BB3 0BC8
ளொLLO0BB3 0BCA
ளோLLOH0BB3 0BCB
ளௌLLOU0BB3 0BCC
ற்RR0BB1 0BCD
றRRA0BB1
றாRRAA0BB1 0BBE
றிRRE0BB1 0BBF
றீRREE0BB1 0BC0
றுRRU0BB1 0BC1
றூRROO0BB1 0BC2
றெRRĀ0BB1 0BC6
றேRRAE0BB1 0BC7
றைRRAI0BB1 0BC8
றொRRO0BB1 0BCA
றோRROH0BB1 0BCB
றௌRROU0BB1 0BCC
ன்N0BA9 0BCD
னNA0BA9
னாNAA0BA9 0BBE
னிNE0BA9 0BBF
னீNEE0BA9 0BC0
னுNU0BA9 0BC1
னூNOO0BA9 0BC2
னெNĀ0BA9 0BC6
னேNAE0BA9 0BC7
னைNAI0BA9 0BC8
னொNO0BA9 0BCA
னோNOH0BA9 0BCB
னௌNOU0BA9 0BCC
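The code point column in the table above can be regenerated mechanically from the Tamil text itself. Here is a minimal Python sketch of that; the helper name `code_points` is my own invention for illustration and does not come from the book or from the original table.

```python
def code_points(text: str) -> str:
    """Return the code points of `text` as 4-digit uppercase hex, space separated."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# TAMIL LETTER KA followed by TAMIL VOWEL SIGN AA, i.e. the "KAA" row.
print(code_points("\u0b95\u0bbe"))  # prints 0B95 0BBE
```

Running every left-hand cell through something like this is a cheap way to catch a row where a code point was dropped while typing the table by hand.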
I may talk more about this book from time to time -- like its different take on the letter count (even with the additional letters it includes outside of the above table) -- as well as other, similar books I picked up covering other languages of India. Or maybe even other books from the pile.
Language is fascinating the living crap out of me at the moment, people....
No Unicode character was unconfused enough to comfortably sponsor this post -- the unfamiliar transliterations and "unfiltered" filtered HTML in Word stunned the lot of them!
Brian was once telling me about a meeting of various folks from the .NET Framework. Anders was there (yes, that Anders!).
Anyway, the meeting started in the normal way but people found themselves rat-holing on some very obscure technical issues. It was spinning further and further out of control into scenarios that were very obscure, bordering on invented or even imaginary. It was easy to follow the logic if you were there, but otherwise this shared form of associative linkage would be impossible to grok.
Finally, Anders stopped everyone, and asked whether anyone knew precisely what was being discussed at the moment. There was an embarrassed pause, as everyone realized that no one did.
And then Anders ended the meeting.
Apropos of nothing, perhaps. But I'll let you decide. :-)
The unrelated question that Rags asked yesterday was straightforward enough:
DateTime.ParseExact (apparently) does a case-insensitive comparison of the names of the days and months. Is there a way to make it do case-sensitive comparison?
I’m dealing with parsing of date time strings given in RFC 1123 format. I use DateTime.ParseExact & pass ‘R’ to the format string to indicate RFC1123Pattern for date time format. However the following string is being considered as valid even though it’s invalid because the name of the day is case-sensitive (& must be Sun not SUN) according to RFC.
"SUN, 06 Nov 1994 08:49:37 GMT"
Thanks
Malcolm stepped up with some information from the referenced standard that helped understand the urgency of the request:
RFC 822, which RFC 1123 references re Date/Time formats, specifically states:
<<3.4.7. CASE INDEPENDENCE
Except as noted, alphabetic strings may be represented in any combination of upper and lower case.... When matching any other syntactic unit, case is to be ignored. For example, the field-names "From", "FROM", "from", and even "FroM" are semantically equal and should all be treated identically.>>
Ah good, so that takes care of that. :-)
Or maybe not?
Rags replied back:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.3 states the following in the context of HTTP:
All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception. For the purposes of HTTP, GMT is exactly equal to UTC (Coordinated Universal Time). This is indicated in the first two formats by the inclusion of "GMT" as the three-letter abbreviation for time zone, and MUST be assumed when reading the asctime format. HTTP-date is case sensitive and MUST NOT include additional LWS beyond that specifically included as SP in the grammar.
Thanks
Let's pretend that RFC1123 and RFC2616 are tied together despite no such reference and that RFC1123 and RFC822 are not despite having one, and then take this one around the dancefloor three times. :-)
Well, if one calls DateTime.ParseExact then one will have a DateTime.
One could take that DateTime and then turn around and call DateTime.ToString("R") on it.
And then compare the resulting string to the original one before the parse, using whatever means one wishes to use!
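The parse, re-format, and compare round trip can be sketched in a few lines. What follows is a Python stand-in rather than the .NET code itself: `email.utils.parsedate_to_datetime` plays the part of DateTime.ParseExact (it is similarly forgiving about the day and month names), and `email.utils.format_datetime(..., usegmt=True)` plays the part of ToString("R"). The function name `is_canonical_rfc1123` is hypothetical, and the final comparison is a plain code point equality, which I am assuming is acceptable because a well-formed RFC 1123 date is pure ASCII.

```python
from email.utils import format_datetime, parsedate_to_datetime


def is_canonical_rfc1123(text: str) -> bool:
    """Parse leniently, re-format strictly, and compare with the input."""
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return False  # not parseable as a date at all
    try:
        canonical = format_datetime(parsed, usegmt=True)
    except ValueError:
        return False  # parsed, but not a UTC/GMT timestamp
    return canonical == text


print(is_canonical_rfc1123("Sun, 06 Nov 1994 08:49:37 GMT"))  # True
print(is_canonical_rfc1123("SUN, 06 Nov 1994 08:49:37 GMT"))  # False
```

The parser never has to be made strict; the strictness comes entirely from the comparison at the end.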
I would definitely recommend against an Ordinal comparison here, since út (U+00fa U+0074) == út (U+0075 U+0301 U+0074) thanks to Unicode canonical equivalence, and you would almost certainly want them to be considered equal to each other. Since they are meant to be considered equal to each other.
If you wanted to use Ordinal comparisons in spite of all that, you could always normalize the strings, I suppose....
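To make the canonical equivalence point concrete, here is a small Python sketch using the standard `unicodedata` module. The helper name `ordinal_equal_after_nfc` is made up for this illustration, and NFC is just one reasonable choice of normalization form; any form works as long as both strings get the same one.

```python
import unicodedata

precomposed = "\u00fat"  # U+00FA U+0074
decomposed = "u\u0301t"  # U+0075 U+0301 U+0074


def ordinal_equal_after_nfc(a: str, b: str) -> bool:
    """Normalize both strings to NFC, then compare code point by code point."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                         # False
print(ordinal_equal_after_nfc(precomposed, decomposed))  # True
```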
Of course that seems like a lot of work, but then that's what artificial requirements often bring to the party.
I am going to take this opportunity to point out that we are so far from the original scenario that the methods may as well be called TryParseImaginary and ToStringFictional. The blog (though not the Blog) has lost sight of what it is trying to say.
And although I'm not Anders¹, I am now going to thank you all for reading this far, and end the blog post.
1 - Well, I suppose I could claim that by virtue of being the owner and sole author of SiaO that I am the Chairman, CEO, COO, CTO, CFO, VP, TF, and DE of the entire Blog, but that is really kind of silly, you know? :-)
This blog brought to you by ú (U+00fa, aka LATIN SMALL LETTER U WITH ACUTE)
Prior posts in the series:
Okay, we have now gone through a bunch of information on the Table Driven Text Service component and the text files that define the identity and behavior of individual Text Profiles.
So what happens next, exactly?
Well, let's start with the Text Profile I discussed, demo'd, and did not yet give to anyone in And we are the knights who say நீ (NII).
The framework I am using is the same as in that post, plus the feedback in the comments. Like the following for each consonant:
"n" = "ந்"
"na" = "ந"
"naa" = "நா"
"ni" = "நி"
"nii" = "நீ"
"nu" = "நு"
"nuu" = "நூ"
"ne" = "நெ"
"nee" = "நே"
"nai" = "நை"
"no" = "நொ"
"noo" = "நோ"
"nau" = "நௌ"
And then the following pure (independent) vowels:
a அ
aa ஆ
i இ
ii ஈ
u உ
uu ஊ
e எ
ee ஏ
ai ஐ
o ஒ
oo ஓ
au ஔ
And the following consonants:
k க்
ng ங்
c ச்
j ஜ்
ny ஞ்
tt ட்
d ட்
nnn ண்
th த்
n ந்
nn ன்
p ப்
f ஃப்
m ம்
y ய்
r ர்
rr ற்
l ல்
L ள்
zh ழ்
v வ்
ss ஷ்
s ஸ்
sh ஶ்
h ஹ்
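To show how a table like the ones above might be applied to typed text, here is a rough Python sketch. It is not how TableTextService.DLL works internally (an IME offers candidates interactively rather than converting a finished string); it simply assumes a greedy longest-match over the key sequences, and the names `N_ROW` and `transliterate` are hypothetical. Only part of the "n" row is included to keep it short.

```python
# A few entries from the "n" row above; keys are what the user types.
N_ROW = {
    "n": "ந்",
    "na": "ந",
    "naa": "நா",
    "ni": "நி",
    "nii": "நீ",
    "nu": "நு",
    "nuu": "நூ",
}


def transliterate(text: str, table: dict) -> str:
    """Greedy longest-match replacement; unknown characters pass through."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for length in range(longest, 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no entry starts here, keep the character
            i += 1
    return "".join(out)


print(transliterate("nii", N_ROW))
```

The longest match matters: without it, "nii" would stop at "n" and leave a stray "ii" behind.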
But as others have pointed out, this is kind of tedious -- there are so many combinations that really should be handled by using different cases rather than requiring a person to type two vowels.
Now this is currently a limitation in TableTextService.DLL but it may not always be -- some future version may address the limitation.
In fact if you look at the Amharic input method in Vista and its text file, you'll see that it mixes upper and lower case on the input side, in anticipation of that limitation being addressed at some point. In the meantime, when you have multiple entries with the same letter differing only by case, they will simply both show up in the candidate list.
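The "both show up in the candidate list" behavior can be modeled with a lookup that ignores case on the input side. This Python sketch is only my model of the observable behavior described above, not the component's real code, and `TABLE` and `candidates` are hypothetical names.

```python
# (typed key, output) pairs; "l" and "L" differ only by case.
TABLE = [
    ("l", "ல்"),
    ("L", "ள்"),
    ("n", "ந்"),
    ("nn", "ன்"),
]


def candidates(typed: str, table=TABLE) -> list:
    """Return every output whose key matches `typed` when case is ignored."""
    folded = typed.lower()
    return [output for key, output in table if key.lower() == folded]


print(candidates("l"))  # two candidates
print(candidates("L"))  # the same two candidates
```

Under this model the case distinction in the text file costs nothing today, and starts paying off if a later version ever compares keys case-sensitively.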
So what is the principle here, native Tamil speakers? Taking the above lists, which of the left side entries would you change, and how?
When I get back all of the rest of the feedback, we'll replace my "based on Unicode character names" input method with one that will perhaps be a bit closer to intuitive!
Then we'll configure the various settings and produce our ideal Tamil input method....
So, any native speakers want to chime in on their replacements? I have tried to do the ones that others suggested in the comments of that post, but I'd like to get them all done -- concentrate on the second and third lists above, noting how
l ல்
ll ள்
became
l ல்
L ள்
and going from there....
Now of course this is kind of a transliteration keyboard, but it does not have to be if there are keyboards that print just Tamil letters on them and we wanted to have this input method match them. Does anyone have such a keyboard? And if so could they take a picture of it?
This post brought to you by நீ (U+0ba8 U+0bc0, a.k.a. TAMIL LETTER NA + TAMIL VOWEL SIGN II)
One never knows how dumb one was until one gets a little bit smarter. :-)
I'll explain what I mean, how I was reminded of this the other day....
I must admit the question that came to me was a bit more polite than the title, it was actually more like:
Do you know anything about this prop? I checked the Japanese Keyboard and English keyboard with different locale, but it just seems to return the LCID of the corresponding Locale. Please let me know if you know any.
And the underlying question that person had been asked was also more polite:
I would like to check with you if you have information about who is using this particular property and how they use it. It does not have a matching LCTYPE [in Windows]. Any idea?
This property (in both CultureInfo.KeyboardLayoutId and CultureAndRegionInfoBuilder.KeyboardLayoutId) was kind of my idea because of an LCID dependency that the InputLanguage class has in its InputLanguage.FromCulture method.
Now in fairness that dependency was only there because the developers were looking at the Win32 GetKeyboardLayoutList function and seeing it was the only way to enumerate loaded keyboards (which is basically all that FromCulture was trying to do: map loaded input languages to the passed-in CultureInfo).
But now with custom cultures, the CultureInfo.LCID would not be a useful thing to compare; it would always fail.
So this KeyboardLayoutId was added as a stopgap -- so there would be a way to specify the keyboard to load.
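Here is a rough Python model of the matching that FromCulture does, to show why the LCID comparison breaks down. Everything in it is illustrative: the handle values are made up, `from_culture` is a hypothetical name, and I am assuming that custom cultures report the shared placeholder LCID 0x1000. The one Windows fact it leans on is that the low word of a keyboard layout handle is the input language identifier.

```python
# Hypothetical loaded keyboard layout handles (low word = language identifier).
LOADED_LAYOUTS = [0x04090409, 0x04110411]

# Assumption: every custom culture reports this same placeholder LCID.
CUSTOM_CULTURE_LCID = 0x1000


def from_culture(keyboard_layout_id: int, loaded=LOADED_LAYOUTS):
    """Return the first loaded handle whose low word matches, else None."""
    for handle in loaded:
        if (handle & 0xFFFF) == keyboard_layout_id:
            return handle
    return None


print(from_culture(0x0411) is not None)           # True: a real LCID can match
print(from_culture(CUSTOM_CULTURE_LCID) is None)  # True: the placeholder never does
```

A separate id that the culture's author fills in gives the lookup something real to compare against, which is all the stopgap was meant to do.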
Anyway, I am not taking the credit for this idea, more like I am taking the blame. Because the view from this CLR-specific problem was too narrow!
Working with folks later on MSKLC, the solution mentioned in Getting the language of an LCID-less keyboard and then documented in Getting the language (and more!) of an LCID-less keyboard with the apocrypha at MSKLC keyboard layout names in your own language became an obvious way to not only support custom languages to go along with custom keyboards, but a way for someone to query exactly what the custom locale name might be!
Of course this other property will always be there unfortunately, but now future versions have a method they can look into using to let them shed the LCID and now KeyboardLayoutId dependency!
I suppose there is a lesson here -- it is important to think about the whole problem rather than just the piece in front of you, if you want to come up with the best solution in the long run....
This post brought to you by ᕓ (U+1553, a.k.a. CANADIAN SYLLABICS FE)
Last week, over on the Volt Users Community, azhary asked:
Hi every one,
We are designing a meroetic font ( not supported yet by Unicode). The letters (about 25 letters) should display from right to left (rtl). At the same time they are isolated (not cursive).
We have used Coreldraw to create the outlines of the font.
The questions I have:
-- How to use Volt (or otherwise) to make it an rtl font?
-- What Unicode range should we use?
Thanks.
John Hudson swooped in to provide a solid, immediate answer:
This is a difficult situation, because normally one should avoid using defined Unicode ranges for things other than what they are, and you should avoid using reserved Unicode codepoints. The Private Use Area of Unicode is available, but it has the limitation that all PUA codepoints have a presumed left-to-right direction (directionality is a character property, not something that you set in the font).
If your goal is simply to be facilitate Meroetic text entry and display, while awaiting an eventual Unicode encoding of this script (or even as part of the process of getting the script standardised), then I think you could make use of codepoints for a script with similar features. From your description of the script, the closest match is probably Hebrew.
And Lorna Priest pointed out some important additional information:
John's reply brings up the issue that with OpenType you are not able to assign right-to-left behavior for PUA characters. However, you can be perfectly Unicode compliant (ie using PUA codepoints) and use the Graphite Description Language (http://scripts.sil.org/RenderingGraphite) which allows you to indicate RTL behavior in the PUA area. The main problem with Graphite is that there aren't a lot of applications that can use it. So, it kind of depends on what your purpose is. There is a version of OpenOffice that uses Graphite. There's a TeX publishing system (XeTeX) that use Graphite.
This truly is a limitation in both OpenType and rendering in general on Microsoft-provided text rendering platforms (by which I principally mean Uniscribe, because even though WPF exposes the most features in easily accessible ways like optional ligatures and such, Uniscribe still provides the most support overall, including support for those features WPF exposes).
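John's point that directionality belongs to the character rather than to the font can be checked directly against the Unicode Character Database. A small Python sketch using the standard `unicodedata` module follows; the helper name `bidi_class` is just for illustration.

```python
import unicodedata


def bidi_class(code_point: int) -> str:
    """Return the Unicode Bidi_Class of a code point, e.g. 'L' or 'R'."""
    return unicodedata.bidirectional(chr(code_point))


print(bidi_class(0xE000))   # first Private Use Area code point
print(bidi_class(0x05D0))   # HEBREW LETTER ALEF
print(bidi_class(0x1090E))  # PHOENICIAN LETTER SEMK
```

The Private Use Area code point reports L while the Hebrew and Phoenician letters report R, and nothing in a font file can change what this lookup returns. That is why borrowing Hebrew code points gets right-to-left layout for free and PUA code points do not.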
I find myself torn in situations like this, especially since I have spent so much time assisting with tools with significant components involving "do whatever the hell you want rather than what Microsoft specifically provides itself" like
and so on. Fonts are some of the most customizable items on the platform, in some ways -- but limitations involving the frameworks surrounding fonts are, while perhaps less common or even uncommon edge cases, the hardest limitations to overcome without either violating rules within Unicode, or using someone else's platform and technologies.
I am able to console myself a bit with the fact that these are not only edge cases but also temporary ones since the obvious intent in the bulk of these cases is to work with scripts that will eventually be in Unicode (like Meroitic, currently on the roadmap for the Supplemental Multilingual Plane).
But on the other hand, there are many ancient scripts on their way in, like Carian and Lydian, that Microsoft has not necessarily been quick to support but which need right-to-left character processing semantics in some or even most cases, and the behavior cannot be overridden or customized either in Win32's GetStringTypeW or in Uniscribe's own internal tables. If this is not changed in the future, then languages that do not have modern use but which do have special processing needs may not be best served by Microsoft-provided solutions, an issue likely to become more obvious as more and more of the ancient script proposals become ancient script Unicode subranges....
And then I wonder (as Microsoft did with Adobe's Type 1 fonts) whether support for other technologies should become part of the platform as well, though obviously doing so in a way that lets components like Office that use Uniscribe get their work done would have some truly significant challenges -- it would probably be easier on the whole to open up for customization the data and shaping engines of Uniscribe and the character data of Win32.
One could say that is the principal limitation of some of the platforms Microsoft provides -- those with requirements that are not part of the strategic investments of the platform find themselves much less ably supported....
This blog brought to you by 𐤎 (U+1090e, PHOENICIAN LETTER SEMK)
I am writing this blog from my own laptop waiting in the ER at the hospital (all of the quotes are from archives of old mails on my machine, not from memory!). It all happened when I was heading back after seeing a show in Ballard last night. By the time you read this I will actually be out again so there is no need to panic. Let's just say that Michael should not push himself too hard (especially when his scooter is in the shop) and leave it at that. I am sure that I am just fine and will probably explain what happened at some point. Crap like this keeps me humble, and anyone who knows me will claim that I can always use more of that....
Some time near the beginning of the month, Peter Edberg of Apple asked:
We have had several bug reports at Apple complaining that in case-insensitive string search, U+00DF "ß" matches "ss". Apparently this is due to the following line in CaseFolding.txt:
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
Isn't this an error? In the Unicode collation data, there is a secondary difference between U+00DF "ß" and "ss". The way I read the Unicode Collation Algorithm, case folding should preserve primary and secondary differences, and only eliminate differences at the tertiary level and below.
Am I misunderstanding something?
Peter is actually spot on here and as usual missing nothing. In fact, this is an issue I have talked about in relation to Windows before (long-time readers may recall Dere are qvestions? In zat case... and What the %#$* is wrong with German sorting?).
In any case, regular reader here John Cowan pointed out:
> We have had several bug reports at Apple complaining that in case-insensitive string search, U+00DF "ß" matches "ss".
It pretty much has to. There is no way to tell (without knowing German) whether any given "SS" in text is an upcased version of "ss" or "ß". Consequently, if you want "SS" and "ss" to match, and likewise "SS" and "ß", then "ss" and "ß" will naturally match too. You could special-case this in code at the expense of making all case-folding slower.
> Isn't this an error? In the Unicode collation data, there is a secondary difference between U+00DF "ß" and "ss".
That's because "ss" is also used as a fallback when "ß" is not available. It is in effect both a secondary and a tertiary difference.
In Unicode 5.1 there will be a capital sharp S, but that is never used in running text, only in all-caps display text, and not always then. So it doesn't solve the problem, which is simply an inescapable quirk of German orthography. (The real answer is not to upcase German, but that battle is long since lost.)
Microsoft specifically avoids that weirdness of the mixed secondary/tertiary difference, in part because our architecture kind of requires it, though I suspect it would have happened anyway....
Anyway, Peter responded to John:
John,
Thanks, that makes sense. Ken Whistler (here at the UTC meeting) also just clarified this. He also indicated that a particular implementation of case-insensitive string search could choose a different approach to matching of U+00DF "ß" and "ss" without being non-conformant with Unicode (It would just not be following CaseFolding.txt).
And Markus Scherer also added some additional interesting words to the mix:
A couple of years ago I wrote an email to DIN proposing to change DIN 5007 so that ß and ss are a tertiary difference, to make them consistent with mostly being case-different. However, their response was that they see ß more as a ligature, and ligatures sort as secondary differences in DIN 5007. The default UCA table follows DIN 5007 with respect to this as a secondary difference.
For Unicode case folding, there is really no choice: ß needs to uppercase to SS (at least for most users) which lowercases to ss. Therefore, ß and ss are in the same equivalence class, and that's how case folding is constructed.
In addition, Germans don't always understand when to use ß vs. ss, and in Switzerland ss is always used instead of ß, so it makes sense for somewhat-lenient string comparisons to equate them.
In my opinion, treating ß and ss as a case difference is the best behavior for this somewhat messy situation. (I did grow up in Germany, up to and including college.)
And there you have it!
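The equivalence class that Markus describes is easy to see in any environment that implements Unicode full case mapping and full case folding. Python's `str.upper`, `str.lower`, and `str.casefold` do, so here is a short sketch (the helper name `caseless_equal` is mine; this shows the Unicode default behavior, not what Windows does):

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after Unicode full case folding."""
    return a.casefold() == b.casefold()


print("\u00df".upper())                          # SS
print("SS".lower())                              # ss
print(caseless_equal("stra\u00dfe", "STRASSE"))  # True
print(caseless_equal("\u1e9e", "ss"))            # True: capital sharp s folds to ss too
```

Once ß uppercases to SS and SS lowercases to ss, a caseless match has nowhere else to put the three of them but in the same bucket.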
Of course the view is likely going to get rockier soon, with Unicode 5.1 and the new CAPITAL SHARP S. The Sharp S and the many issues surrounding it are of course something that I have been blathering about for some time, considering all of the following prior posts, just for starters....
That last one has in it my recommended changes for what I think Microsoft (and Windows) ought to do for both case and collation in the next version of Windows, which will be released at some unknown date after U+1e9e (CAPITAL SHARP S) is out there in the upcoming Unicode 5.1, which are:
For Microsoft, it raises some interesting questions for both collation and case for the next version of Windows.
I mean, think about the issues I have already talked about in posts like What the %#$* is wrong with German sorting? where we make ss equal to ß so that the uppercase version "SS" will sort near the ß in a sort ignoring case -- where we do things that make less linguistic sense in order to give regular results that are intuitive.
So who would expect that if U+00df is equal to ss that U+1e9e wouldn't be made equal to SS? Meaning that in the collation tables, U+00df and U+1e9e would simply be case variants, with no real choice in the matter.
And as to casing....
Now just because we make the relationship in casing does not mean we make it in collation. After all, as I have pointed out several times before, collation != case.
But on the other hand, the case table is used in order to enforce the case insensitivity in the NT object namespace and the file system. And one clear issue is that there is no good reason to allow one to put filenames differing only by the presence of U+00df and U+1e9e in the same directory. Users would either never try it or they would never expect it to work. So it is quite possible that in the next version of Windows (which only does simple casing) it may make the most sense to make the two characters case variants of each other -- to enforce reasonable use of both letters!
The collation change is kind of obvious -- what else could it be, ever?
The casing change is a bit more controversial, though, since it does not technically match Unicode.
Though the simple casing requirements of Windows, where the length can never change, keep SS from ever being an option there, and in a case-insensitive file system the notion of putting the lowercase and uppercase variants of the character in the same directory just feels like the wrong answer. Having these four entities:
all be the same in collation (when ignoring case distinctions) and having both the first two paired to each other and the second two paired to each other in the casing tables just makes sense -- anything else will lead to unintuitive results in normal situations -- and those variations would amount to genuine bugs from a user and a linguistic standpoint....
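To illustrate what "simple casing where the length can never change" means for file names, here is a Python sketch of a one-to-one uppercase table with the proposed ß/ẞ pairing bolted on. It is purely a model of the idea argued for above, not the actual Windows casing table, and `PROPOSED_PAIRS`, `simple_upper`, and `same_name` are hypothetical names.

```python
# The proposed pairing: U+00DF (ß) uppercases to U+1E9E (ẞ).
PROPOSED_PAIRS = {"\u00df": "\u1e9e"}


def simple_upper_char(ch: str) -> str:
    """One-to-one uppercase: never return more than one character."""
    if ch in PROPOSED_PAIRS:
        return PROPOSED_PAIRS[ch]
    upper = ch.upper()
    # Full casing may expand (ß -> SS); simple casing leaves such characters alone.
    return upper if len(upper) == 1 else ch


def simple_upper(name: str) -> str:
    return "".join(simple_upper_char(ch) for ch in name)


def same_name(a: str, b: str) -> bool:
    """Case-insensitive name comparison built on the simple table."""
    return simple_upper(a) == simple_upper(b)


print(same_name("Stra\u00dfe.txt", "STRA\u1e9eE.TXT"))  # True: ß and ẞ collide
print(same_name("Stra\u00dfe.txt", "STRASSE.TXT"))       # False: SS is a different name
```

With the pairing removed, ß has no single-character uppercase and maps to itself, so the first comparison would come back False and both files could sit side by side in one directory, which is exactly the outcome argued against above.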
Now of course I am the developer owner of neither case nor collation at this point, which means that having it make sense to me is not necessarily the principal criterion for having the idea championed and eventually seeing the behavior updated in either case or collation.
But I do still chat with the various owners in development, program management, and test from time to time.
And some of them even read this blog now and again....
So they will at least have the opportunity to have my opinion on the matter. :-)
This blog brought to you by the ever-popular ẞ (U+1e9e, aka LATIN CAPITAL LETTER SHARP S)
In the early days of my ownership of the collation functionality, I did have a bit of an inferiority complex about the more linguistic aspects to the work.
So I would talk about my delusions of linguistic aptitude and how I was the architect of all the collations that required no linguistic knowledge (being algorithmically derived), while looking on at the process by which the reverse engineering of dictionaries in fact took place and being amazed at the results. And to be frank, amazed by the people who were doing the work, who held some kind of secret knowledge that I didn't have and which (unlike most technologies and functionalities I run across in my work) I probably never would (blogs like Some sort of order to collation and Collation can actually be linguistic aside, of course -- they did tend to prove my fascination more than my abilities -- that I had become a sortophile of sorts).
And of course, as some of the people who had been doing this work for years were moving on to new challenges, I got to see firsthand that I was not the only one who didn't fully understand how to do it; we even had a false start or two as some people tried to do the work but were simply unable to handle that linguistic side that I (recognizing that I couldn't -- or didn't think I could -- do it) kept myself from trying to do.
Eventually others came in to fill those shoes, people who still had those seemingly intrinsic abilities, even as they learned the new abilities to bring the feature to Windows with new languages.
And in those early days that I usually think of as B.R. (Before Ryan) when Ryan was not yet the collation tester, we were at the time kind of between testers for the area. So the automation that was there kept running but we did not have someone looking at new things with their tester eyes just yet....
Somewhere in all of that, between the outgoing and the incoming who didn't quite get it and the incoming who did get it but was still learning how to use it and not having a great tester in place and the one who knew he didn't get it but was the one charged with checking in the final results, a bug or two could (and did) slip in, as I am sure you might be able to imagine....
In particular, the collation for Lao is kind of broken for both vowels and tone marks.
And as anyone who knows Lao can tell you, sorting Lao without the expected results for vowels and tone marks is a bit like trying to wash your foot with your shoes and socks on -- you may get things wet but it won't be very effective and later on quite uncomfortable!
Since I am the one who checked it in I actually take full responsibility, though as I have come to expect from working with competent people and bugs like this the program manager who used to work on the data but never saw the data in this case, the program manager who created the data while learning how the data was supposed to be created, and even the tester who was not even working in the area at the time could just as easily try to make the same claim and to be honest often do....
I get to win since it is my name in the checkin log, a dubious pleasure at best.
Kind of puts blogs like Not so Lao[d], at least not until Vista in perspective, though....
The various vowels and tone marks I mean are:
The bug is an interesting one since the results are much worse when comparing sort keys via LCMapString calls than via CompareString calls, with the former producing embedded 0x00 byte values in the middle of the sort key while the latter simply produces results that are a bit off (most noticeable in huge sorted lists).
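To see why an embedded 0x00 hurts so much more in a sort key than a slightly-off CompareString result, here is a minimal Python sketch. It assumes the keys are compared the way null-terminated byte strings usually are, with a strcmp-style comparison that stops at the first 0x00. The byte values are invented for illustration and are not the real Windows sort keys for any Lao string; strcmp_style and full_compare are hypothetical helpers, not Windows functions.

```python
def strcmp_style(a: bytes, b: bytes) -> int:
    """Compare like C's strcmp: only the bytes before the first 0x00 count."""
    a_visible = a.split(b"\x00", 1)[0]
    b_visible = b.split(b"\x00", 1)[0]
    return (a_visible > b_visible) - (a_visible < b_visible)


def full_compare(a: bytes, b: bytes) -> int:
    """Compare every byte, ignoring the null-terminator convention."""
    return (a > b) - (a < b)


# Made-up keys (an assumption for this sketch): identical up to a stray
# 0x00 in the third byte, different afterwards.
KEY_A = bytes([0x0E, 0x21, 0x00, 0x0E, 0x30, 0x01, 0x01, 0x01, 0x01, 0x00])
KEY_B = bytes([0x0E, 0x21, 0x00, 0x0E, 0x45, 0x01, 0x01, 0x01, 0x01, 0x00])

# The terminator-aware comparison never sees the bytes that differ.
truncated_result = strcmp_style(KEY_A, KEY_B)   # 0, "equal"
complete_result = full_compare(KEY_A, KEY_B)    # -1, KEY_A sorts first
```

In this sketch everything after the stray 0x00 is invisible to the comparison, so two different strings come out as equal rather than merely a bit out of order.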
Now since those early days everyone involved has progressed:
but the legacy of the not quite correct vowels and tones in Lao calls out to me quite LAOdly....
This blog sponsored by all of the above cited characters, not only in gratitude for the chance to be properly noticed in this new world but also in the hope of future adjustments
Disclaimer -- most of the examples given here are fictional, and most of the ones that aren't are not scheduled bug fixes for any future version.
I post a lot of my opinions here, as regular readers will readily admit.
A lot of my opinions are based on functionality that is less than ideal or missing or wrong or broken or strange in Microsoft software.
Yesterday I was asked by reader Bill whether the blogs themselves were a factor of the process in triaging which features would go in and which bugs would get fixed.
Now obviously some of those things might be fixed in a future version of Windows. Not specifically because I am blogging about them but because new versions fix bugs and add features.
Right?
Of course the triage process, where people go through the bug list or the potential feature request list and choose what gets in and what doesn't, is a different story.
And I am very pleased that (as far as I know) the process is not influenced by the cleverness of the blog post or amount of detail in it or whatever. Because if it were then I'd be really really unhappy for Windows! :-)
I mean it is all well and good to posit an EqualString function or an internationalized StrCmpLogicalW or Cantonese input method or new locales in Windows or fixes to bugs. But to find out that my random blathering helped decide to put a bug ahead of some feature or some feature ahead of some locale would frankly scare me a bit.
So my answer to Bill is "damn, I hope not!"
Because the whole process that decides that
and so on? All of that just gives me hives. Someone has to do it, sure. But the round-robin of meetings, emails, Product Studio poker, cut this, QFE that, and all of the rest, I just don't want to deal with it until and unless that becomes my job.
I want to make it clear that I do not think this is a waste of time, and I have a lot of respect for the people who do this job well (since in my experience not everyone does!). It just can't be me doing it, that's all.
For now, it is not on the list of things I do, so I am happy to just not weigh in with my two cents.
If they need help knowing whether nw comes before ng in Quenya then they can come by and ask -- I am still very happy to help answer questions, that is just my nature! :-)
This blog brought to you by a (U+0061, aka LATIN SMALL LETTER A)
Let me start by saying that I think the subsidiary model for software companies like Microsoft is a very powerful one.
It is a way to make sure that a group of people who are most interested in the success of a market are on the ground in that market and in a position to understand what people there want and then ask the core team for it, with appropriate information about the relative importance of the issues involved and (most importantly) a serious triaging effort on the relative importance of different requests within that market.
Further to that end, it is people in the subsidiaries who become the most natural "owners" of features that are specific to their market -- from Czech support being owned by the Czech subsidiary to IME support being owned by the various East Asian subsidiaries and so on. Who else would be the best people to provide information and resources and such for locale data, keyboard/input method information, sorting details, market requirements, and other features? It really does make sense and some very powerful and compelling products are produced this way that work and work well in those markets.
And since their success is quite literally tied in very real ways (i.e. financial ways) to the success of the company within their market, they are working within their own best interests to do their job well. In almost every case they are very interested and passionate about their market and their language anyway, but having the financial incentives is obviously never going to make people work less hard. :-)
But (and you knew that was coming, right?) there are some drawbacks to the subsidiary model, some gaps that it is not as effective at filling.
I was thinking about this the other day when (via the Contact link), Marti asked:
Hi Michael,
I'm not sure if I'm asking to the right person, but I haven't found anyone more specialized than you about this subject all over the net. The question is not strictly programmatic but the solution may involve some step in that direction.
I use Windows XP Pro with a Spanish keyboard, which looks exactly like: http://www.rockwood.k12.mo.us/Lafayette/Reed/images/spanish-keyboard_l.jpg
I also type in Japanese, and for that purpose I installed Microsoft IME Standard 2002 ver. 8.1. The problem I have is that when I switch to Japanese, then all the punctuation and arithmetic symbols are in a different location, just like a Japanese or US keyboard layout. I don't want to get used to this layout because I don't plan to use a real japanese keyboard. What I want is to use IME with a Spanish keyboard layout.
What I've done so far:
a) I've tried to install the Spanish layout under Japanese input language, but this way I cannot use IME and Spanish layout at the same time. Pointless.
b) Dirty job: I've used MSKLC to create a Spanish keyboard layout and I assigned it Japanese language under Properties. I installed it and then I changed the following registry keys:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layouts\00000411\ "Layout File"= (name of my DLL)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layouts\E0010411\ "Layout File"= (name of my DLL)
Now, when I switch to IME I have the Spanish layout enabled, but the dead keys (`,´, ^, ¨) are still not working, they just output the symbol twice (``,´´,¨¨,^^). Is it possible to have it all or is not possible to have accents in Direct Input mode? Is there a cleaner way to do this instead of that Registry hack?
On the other hand, I would like to translate the Right Control key to some specific key only present in Japanese keyboards. Which is the cleanest way to remap keyboard scan codes? I've read about the "Scancode Map" registry value, and it seems very attractive but again it looks like a hack (with pros and cons). Can the same thing be achieved via the keyboard layout DLL?
Thanks for your attention,
Martí
Now IMEs in Windows have never worked particularly hard to support the scenario of inputting an East Asian language using the keyboard of another language beyond the one they picked -- in fact some would argue that it has become harder and harder each version to see such support happen -- think about the extreme measures you used to have to go through (as described here) which then were later broken anyway (as described here). And notice the extreme efforts Marti is going to. Now none of it is supported (or indeed supportable), so it is not surprising that there are problems trying to enable it.
I'll start by telling Marti that there is no real solution to the problem -- this is not their scenario so no solution was envisioned or provided. What used to work only worked by accident, not by design. Further, MSKLC will not work for mapping the right control key to other functionality; editing the Scancode Map registry value is really the only way to support that sort of thing. Dead keys with the IME are a huge challenge given what the IME is doing to the input -- to get that done you really have to switch to an input language that is not hooked up to the IME.
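For anyone who does go the Scancode Map route, here is a rough Python sketch of how the binary value is commonly laid out: eight zero bytes of header, a four-byte count of entries that includes the terminator, one four-byte entry per remapped key (the code to send, then the physical key's code), and a four-byte zero terminator. The specific scan codes are assumptions on my part: 0xE01D for Right Control and 0x79 for the Henkan key as one example of a Japanese-only key (Marti did not say which key was wanted). Check both against your own hardware. build_scancode_map and to_reg_hex are hypothetical helper names, and the sketch only builds the bytes; it does not touch the registry.

```python
import struct

# Assumed scan codes for this sketch; verify them before using the result.
RIGHT_CONTROL = 0xE01D  # extended (E0-prefixed) Control key
HENKAN = 0x0079         # "convert" key on Japanese keyboards (an assumption)


def build_scancode_map(mappings):
    """Build Scancode Map bytes from (physical_key, key_to_send) pairs."""
    header = struct.pack("<II", 0, 0)              # version and flags, both zero
    count = struct.pack("<I", len(mappings) + 1)   # entries plus the terminator
    body = b"".join(
        struct.pack("<HH", key_to_send, physical_key)
        for physical_key, key_to_send in mappings
    )
    terminator = struct.pack("<I", 0)
    return header + count + body + terminator


def to_reg_hex(data: bytes) -> str:
    """Format bytes the way a .reg file writes a binary value."""
    return "hex:" + ",".join(f"{byte:02x}" for byte in data)


scancode_map = build_scancode_map([(RIGHT_CONTROL, HENKAN)])
reg_value = to_reg_hex(scancode_map)
```

The value applies machine-wide, for every user and every input language, and only takes effect after a restart, which is a good part of why it feels like a hack.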
But let's move beyond the Japanese case for a moment, and not only because I am not being of very much help there. :-)
Instead let's think about reversible transliteration in Korean (attainable through solutions like KORDA) that I mentioned in Is it Hangul? or Hangeul? or Han'gŭl? or what?, and the fact that Microsoft does nothing of the kind in its products today. But within Korea the thought of not just using the native language? Why would someone even bother, right?
Closer to the point, consider the Yi "IME" in Vista, which by every report I have heard works great for people who know Chinese but not as well for native speakers of the language. That is not entirely surprising, since the data that shapes the IME was done by the same team that does the IME work, as was a non-trivial amount of the testing (which led to more than one misunderstanding when language issues like this one came up -- causing problems in both the original IME work and the later testing of it and of the font!). Of course the principal push for Yi support comes from the government in China as a part of GB18030 compliance, and the support in Vista is more than good enough from that point of view.
The strong feelings of native language speakers within India versus experts outside of it is an issue I have talked about a great deal in the past and plan to talk about more in the future, and it again underscores the same kind of issue. To which I'll add that providing only the INSCRIPT keyboards and ISCII code pages as provided by the Government of India appears to show a certain specific degree of disinterest in the needs of the many languages in country, if not outright disdain -- like someone else's agenda being pushed, however indirectly.
And there are countless other examples, really.
In every case, the problem is that the feature, its future direction, and its ultimate destiny are in the hands of a nominal "owner": a group of people who are only allowed to benefit directly from in-market successes. Any feature that would work well for a language but not for its main market is not one that the subsidiary has within its charter or mandate to support, and it is often beyond their ability to research other solutions to the problem which may well work better outside a market.
Now obviously this is a sensible decision when it comes to choosing where you will get the most people supported, and the out-of-country solutions would almost certainly be unacceptable in-country.
And this means that changing the current model would be incredibly stupid and would not serve the majority of customers or potential customers.
But at the same time, there is a clear need to supplement the current model -- to make it a reasonable and sensible idea for the subsidiary contacts to directly benefit from the out-of-market features that support their language, as well as making available to them the right materials and research to make the job easier and more possible.
So that it makes sense for the Korean subsidiary to be directly concerned with the use of Korean outside of Korea, for the Japanese subsidiary to be aware of the problems of supporting their IME when non-Japanese keyboards are used and interested in getting solutions built into the IME, and so on.
And then of course for the separate issues in India and China a bigger push to go beyond the central government and into the needs of specific languages in country, beyond what the country looks for (which in many cases is not enough). Some work happens here, but really not nearly enough is ending up in product.
Sorry Marti about taking your question and focusing on a much larger problem that, while being the source of the problem with the Japanese IME you reported, quickly moved beyond it and into whole different areas. But there really aren't good answers for the original question, and I figured the time seemed ripe to be on a soapbox about the bigger issues that put us there....
This post brought to you by ^ (U+005e, aka CIRCUMFLEX ACCENT)