<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx</link><description>Warning to readers : this post is completely and totally my own opinions based on my efforts to assist with Tamil's representation in Unicode, and truly have nothing to do with Microsoft's opinions on the matter (whatever they are). If you quote anything</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#733682</link><pubDate>Thu, 31 Aug 2006 19:22:36 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:733682</guid><dc:creator>Mike Dimmick</dc:creator><description>You'd have to argue that this proposal is an artifact of the general poor support for complex scripts. If, say, French were encoded using only base letters and composing diacritics I suspect support would be greater, but of course for legacy reasons that isn't the case.&lt;br&gt;&lt;br&gt;In some ways it appears - and certainly could appear to a native of Tamil Nadu - that Unicode's solution is idealistic, and this proposal reflects the reality that few developers are going to the trouble of making their software work correctly for South Asian scripts. This may be because they're unaware of the issues, or even if they are, simply not interested in the market. The market may be large (Tamil Nadu has a larger population than the UK, where I am) but is not particularly wealthy (GDP of $56bn compared to the UK's $1,833bn, making GDP per capita of $901).
&lt;br&gt;&lt;br&gt;However, we have to invoke Raymond's 'what if everyone did this' rule - if all distinct glyphs in all scripts are encoded with no combining characters, then 16 bits is not enough. The BMP is probably not enough (correct my terminology - I mean UTF-16 with the high and low surrogates, which is enough to encode up to U+10FFFF IIRC).</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#733959</link><pubDate>Thu, 31 Aug 2006 21:56:22 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:733959</guid><dc:creator>Mihai</dc:creator><description>&amp;lt;&amp;lt;this proposal is an artifact of the general poor support for complex scripts&amp;gt;&amp;gt;&lt;br&gt;I think in most cases the lack of support is just &amp;quot;because ISVs don’t really care”&lt;br&gt;If a script is “by nature” complex, then a new encoding standard only shifts the problem from the text layout engine to the keyboard (for instance). And this will mess up other things, like searching.&lt;br&gt;It is a bit like the addition of the fi ligature for Latin (U+FB01). If I type ‘f’ then ‘i’ some applications will show it as a ligature using the text shaping engine (Notepad), others will not (Word). But there is no keyboard producing U+FB01, and no application handles properly searching (search for ‘f’ and find “half of U+FB01”), or case conversion (Uppercase(U+FB01) = &amp;lt;U+0046 U+0049&amp;gt;).&lt;br&gt;So, is TUNE going to solve anything in the area of ISV support? I bet not!&lt;br&gt;Is current Unicode support for Tamil going to help? Maybe. The OSes are committed to Unicode, there are a lot of complex scripts already supported, and more and more are added every day. If an application does not properly support Hindi today (for instance), is not because Hindi is complex, but because the application does not use the system API properly.
&lt;br&gt;Just look back to DOS, or Win 3.x, or Win 9x. Each one has its own limitations. Long ago very few ISVs supported anything but Latin 1. Then some added support for other single byte encodings, but no DBCS. Now it is common and easy to do DBCS, and dents are being made in complex script support. Does it take time? Yes. But is it easier to convince an American ISV to support Unicode, or some proprietary obscure Tamil encoding?&lt;br&gt;Call me in 30 years if I am wrong. I hope this is enough time :-)&lt;br&gt;</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#735686</link><pubDate>Fri, 01 Sep 2006 21:43:05 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:735686</guid><dc:creator>Shriram</dc:creator><description>First of all, this meeting is called for the purpose of a meeting of minds. A forum sanctioned by the highest levels of the Government to find a common ground. &lt;br&gt;&lt;br&gt;The author is definitely wrong in saying that dissenting voices are ignored. I have been in meetings where TUNE has evoked a lot of emotions. Not all of them in support. &amp;nbsp;All these have been heard and taken into account by the committee. I expect today's session to be stormy too, and it should be. &amp;nbsp;&lt;br&gt;&lt;br&gt;The TUNE effort is led by academics with distinguished track records who have maintained the highest levels of transparency.&lt;br&gt;&lt;br&gt;Why not listen to the deliberations before rushing to judgements? &lt;br&gt;&lt;br&gt;I am sure that there is potential for the opponents and supporters of this proposal to sit and discuss the way ahead in the future provided we leave our egos and bias at the door. &lt;br&gt;&lt;br&gt;Also, regarding the market for Tamil, please take into account the Tamil diaspora which is well spread over the world.
&lt;br&gt;&lt;br&gt;Lastly if ever there is a time to go for the TUNE/Mobile keypad standardization route it is now. With a favorable and committed people in key ministries in state and centre the timing is perfect. It just needs a concerted effort&lt;br&gt;&lt;br&gt;</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#735825</link><pubDate>Fri, 01 Sep 2006 23:47:49 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:735825</guid><dc:creator>Michael S. Kaplan</dc:creator><description>Hi Shriram, &lt;br&gt;&lt;br&gt;Since Unicode's own opinions on the matter are being ignored in the conversation, as are it's policies and procedures -- and that these facts, even though communicated directly, have not stopped the TUNE momentum -- I do not have to wait to see what they decide to know that they have no interests in the facts.&lt;br&gt;&lt;br&gt;I do take into account Tamils around the world -- they are the ones who have rejected TAB/TAM and who have also rejected TUNE. 
As such, they are in a better position to ensure the future of Tamil....</description></item><item><title>At the TONE, it will not be TUNE, but TANE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#741032</link><pubDate>Tue, 05 Sep 2006 15:30:14 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:741032</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;br&gt;I figure that since I initially posted about TUNE in And if your language starts playing a different...</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#744115</link><pubDate>Thu, 07 Sep 2006 12:43:44 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:744115</guid><dc:creator>Baskaran</dc:creator><description>Hi Shriram,&lt;br&gt;&lt;br&gt;Tinkering with a standard (Unicode) is never a good idea. Remember, we (read Indian languages) never had anything called a standard, either for encoding or keyboard, except for some that were in very limited use, such as TAB/TAM, ISCII etc.&lt;br&gt;&lt;br&gt;One should not be trying to RE-define a new standard, when something has been already accepted worldwide. We are already suffering with lack of standards and TUNE is just going to make the situation worse.&lt;br&gt;&lt;br&gt;It is understandable that the code point order is not the same as natural order, but what people don't understand is that the collation order is independent of code chart order. This is because people take these issues emotionally, while these should be approached with technical points.&lt;br&gt;&lt;br&gt;Many of the people supporting TUNE are independent software developers and have several tools including for word-processing and fonts. Developing a font for the Tamil Unicode block is difficult as the rendering engine needs to be intelligent enough to adjust the glyph positionings. 
Thus, (I believe that) these people think of TUNE as the best alternative, as it eliminates the need for any intelligence, making it easier for them to develop fonts.</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#768816</link><pubDate>Sun, 24 Sep 2006 08:46:58 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:768816</guid><dc:creator>CAPital</dc:creator><description>When you say:&lt;br&gt;&lt;br&gt;///This post brought to you by க் (U+0b95 U+0bcd, a.k.a. TAMIL LETTER KA + TAMIL SIGN VIRAMA, a.k.a. TAMIL KA puLLi, a.k.a. TAMIL LETTER K)///&lt;br&gt;&lt;br&gt;I right away notice that there is NO Tamil sign called VIRAMA!&lt;br&gt;&lt;br&gt;In fact Tamil language doesn't add the dot to form the 'ka' sound. &amp;nbsp;The form you have presented is the original form. &amp;nbsp;The letter without the dot is NOT the basic letter. &amp;nbsp;So you can see Unicode has already broken its rules of ONLY encoding the basic characters!&lt;br&gt;&lt;br&gt;______&lt;br&gt;CAPital&lt;br&gt;&lt;br&gt;</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#768819</link><pubDate>Sun, 24 Sep 2006 08:55:07 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:768819</guid><dc:creator>Michael S. Kaplan</dc:creator><description>Actually, no -- that is not how abugidas work. 
But see &lt;A href="http://blogs.msdn.com/michkap/archive/2006/09/16/758310.aspx"&gt;this post&lt;/A&gt; and &lt;A href="http://blogs.msdn.com/michkap/archive/2006/09/20/763705.aspx"&gt;this one&lt;/A&gt; for more info on the approach that was taken in the encoding, and why it is okay for you to not agree with the approach and still be able to work in Unicode....</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#768824</link><pubDate>Sun, 24 Sep 2006 09:01:01 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:768824</guid><dc:creator>Michael S. Kaplan</dc:creator><description>See also &lt;A href="http://blogs.msdn.com/michkap/archive/2005/06/13/428755.aspx"&gt;this post&lt;/A&gt; from over a year ago, which makes some additional related points, ones that I wish those who want to set Tamil implementations back by years would pay more attention to....</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#768837</link><pubDate>Sun, 24 Sep 2006 09:13:58 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:768837</guid><dc:creator>CAPital</dc:creator><description>I am not trying to say that only the ka + dot should have been encoded and not the other. &amp;nbsp;Then it would be almost impossible to display the other.&lt;br&gt;&lt;br&gt;All I'm saying is Unicode already broke its rules. &amp;nbsp;So too for other European and East Asian languages.&lt;br&gt;&lt;br&gt;IF it did NOT break its rules for ANY language, then I wouldn't even say a word.
&lt;br&gt;&lt;br&gt;So it looks like whoever had the power, broke the rules according to their &amp;quot;standardization&amp;quot; policies.</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#768846</link><pubDate>Sun, 24 Sep 2006 09:21:36 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:768846</guid><dc:creator>Michael S. Kaplan</dc:creator><description>Actually, it did not. It encoded an abugida. &lt;br&gt;&lt;br&gt;The rules that you envisage are not actually rules of Unicode, which may be the problem here? You are expecting promises to be kept that were never made. :-(</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#769295</link><pubDate>Sun, 24 Sep 2006 17:15:07 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:769295</guid><dc:creator>CAPital Z</dc:creator><description>In the sets of Latin, there are encoded characters as &amp;#224;, &amp;#225;, &amp;#226;, &amp;#227;, &amp;#229; [for the European languages]. &amp;nbsp;It is encoded as a single character. &amp;nbsp;Meanwhile, the same glyphs are present seperately, like a, ̀, ́, ˆ, ˜, ˚. &lt;br&gt;&lt;br&gt;So only for Tamil like scripts [South Asian languages] similar encoding is denied.&lt;br&gt;&lt;br&gt;East Asian languages did encoded their all characters. &amp;nbsp;And you know the famous Chinese governments stand about all chinese character encoding.&lt;br&gt;&lt;br&gt;Most recently, the new language to be included in Unicode, Balinese break the rules of Unicode. &amp;nbsp;As you said, it did not only encode the &amp;quot;basic abugida&amp;quot; but did others as well. &amp;nbsp;You may have already seen it in the new Unicode Charts.
&lt;br&gt;&lt;br&gt;Even in Tamil, your point is right that basic abugida is enough to display the most characters. &amp;nbsp;But the &amp;quot;ku கு, kuu கூ&amp;quot; &amp;nbsp;[and similar for all other characters] are almost totally different than the basic abugidas.&lt;br&gt;&lt;br&gt;I'm not saying TUNE is the best thing, but Tamil do lack the efficiency of what other similar anguages have in Unicode.&lt;br&gt;&lt;br&gt;&lt;br&gt;______&lt;br&gt;CAPital&lt;br&gt;</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#769305</link><pubDate>Sun, 24 Sep 2006 17:36:51 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:769305</guid><dc:creator>Michael S. Kaplan</dc:creator><description>CAPital, this is hardly breaking the rules of Unicode. &lt;BR&gt;&lt;BR&gt;In some cases, there were legacy standards that pre-dated Unicode which had to be represented. And other cases where you see rule breaking, the actual proposals give the justifications (as do the block descriptions in the Unicode book, in many cases). &lt;BR&gt;&lt;BR&gt;Did you look at the links I put in? They explain many of the reasons why strategies like TUNE are simply too late and do involve a re-encoding of a script already encoded, and would set bzck Tamil computing by 5-10 years or more. &lt;BR&gt;&lt;BR&gt;Even just looking at your blog (and the many other Tamil blogs out there), it would invalidate all of this existing data!</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#774715</link><pubDate>Thu, 28 Sep 2006 04:32:40 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:774715</guid><dc:creator>CAPital Z</dc:creator><description>East Asian Language encoding dilema and Balinese are not pre-dated problems.
&lt;br&gt;&lt;br&gt;Anyhow, yes whatever written in current unicode will be unreadable. &amp;nbsp;But that's the evolution right?. &amp;nbsp;Latin encoding had so many perfections &amp;nbsp;during the course of Computer. &amp;nbsp;Tamil is just a baby in computing. &amp;nbsp;So now the Tamil doesn't get the chance to improve just because, what is already there has to be THE ONE.&lt;br&gt;&lt;br&gt;So Tamil can never improve itself in the future. &amp;nbsp;Because Unicode Consortium will never accept any modification to what is already there!</description></item><item><title>re: And if your language starts playing a different TUNE</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#774764</link><pubDate>Thu, 28 Sep 2006 05:24:25 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:774764</guid><dc:creator>Michael S. Kaplan</dc:creator><description>Tamil improving itself has nothing to do with its encoding, because language is not just encoding. &lt;br&gt;&lt;br&gt;But Unicode is on record as rejecting these schemes, so eventually the illogic of waste is the factor....</description></item><item><title>On Thokks who don't give a Frigg, under the mistletoe</title><link>http://blogs.msdn.com/michkap/archive/2006/08/31/733302.aspx#961824</link><pubDate>Sat, 04 Nov 2006 23:55:44 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:961824</guid><dc:creator>Sorting It All Out</dc:creator><description>&lt;p&gt;I was thinking about Balder yesterday. Balder is a fascinating character in Norse mythology, and a good&lt;/p&gt;
</description></item></channel></rss>