
July, 2010

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    They're going the wrong way.

    • 6 Comments

    Like some of you (many of you?), I read TechCrunch from time to time.

    Not religiously, mind you, since it usually doesn't cover anything that is relevant to me, and rarely covers anything truly important/interesting to me. This is not a flaw in them; they just don't have me in their target demographic.

    Additionally, I tend to skip the posts by MG Siegler.

    I mean I don't hate the man, I can't. I don't even know him.

    But one has naught but to read 10-15 random posts of his to conclude that if iPhones and iPads were personally programmed by Steve Jobs to have their antennas electrocute every member of MG's family and all of MG's friends, that MG would have some kind of pro-Apple spin on the matter. It just doesn't feel to me like anything approaching journalism, which seems (to me) to conflict a bit with TechCrunch's stated purposes.

    Now I am not a member of the I Hate MG Siegler Facebook group or anything like that, but skipping his posts in most cases just seems like better time management.

    But in his hilarious Decoding Microsoft’s Fantastic Passive-Aggressive Numbers Post that a colleague pointed me to, there was a fun line I really liked:

    24%
    Linux Server market share in 2005.

    33%
    Predicted Linux Server market share for 2007 (made in 2005).

    21.2%
    Actual Linux Server market share, Q4 2009.

    What he really means: Remember when everyone was saying Linux was going to take over the market? They’re going the wrong way.

    Now the part that caught my eye was not the full quote, it was that last bit.

    The They're going the wrong way bit.

    I thought of it when someone pointed out to me the Plan for multilingual sites (SharePoint Server 2010) article on TechNet, published on May 12, 2010.

    In particular, this note:

    Although Microsoft Office SharePoint Server 2007 supported internationalized domain names (IDNs), SharePoint Server 2010 does not. If you currently use IDNs with Office SharePoint Server 2007 and you plan to upgrade or migrate to SharePoint Server 2010, you must stop using IDNs, delete any IDN settings, and set up a non-IDN environment before you upgrade or migrate to SharePoint Server 2010.

    Um.

    Now it turns out the reason for this is that in SharePoint 2010 they made an explicit move to use WCF (the Windows Communication Foundation).

    And although WCF is

    a part of the .NET Framework that provides a unified programming model for rapidly building service-oriented applications that communicate across the web and the enterprise

    it turns out that perhaps one of the ways it made all that work so rapid was by not supporting Internationalized Domain Names.

    Oops.

    Amazing how rapid one can be when one sticks to ASCII!
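
    For readers who have not dealt with IDNs directly: a host name containing non-ASCII characters travels through DNS in an ASCII-compatible form (Punycode, with an xn-- prefix on each converted label). A minimal sketch of that round trip, using Python's built-in idna codec rather than anything from WCF or SharePoint; the host name is made up for illustration:

```python
def to_ascii(hostname: str) -> str:
    """Convert an internationalized host name to its ASCII-compatible form."""
    # Python's built-in "idna" codec applies Punycode label by label.
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII-compatible host name back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "b\u00fccher.example"  # hypothetical host name: "bücher.example"
    ascii_form = to_ascii(name)
    print(ascii_form)
    print(to_unicode(ascii_form) == name)
```

    A stack that only ever sees the ASCII form, and never converts to or from it, is the kind of gap the TechNet note describes.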

    Now I look forward to some point in the future where I can excitedly point out that WCF is on the right track, and that SharePoint is back on the right track. I really am.

    But for now, unlike MG Siegler, I am not a fanboy of the company at the center of my professional life (Microsoft, in my case; Apple in his). And because of this, I call bugs BUGS, mistakes MISTAKES, and design flaws DESIGN FLAWS. It is how I believe things can be made better (and people who read here can be made aware of problems in the meantime). Whatever credibility I have rests on my honesty, in this regard. And I take that pretty seriously.

    With all that in mind....

    You know how WCF can help one to "...communicate across the web and the enterprise" ?

    Well, make sure to not include a world wide in front of the word web here, at the moment. Because every day from now until they fix this flaw in their design, there will be more and more sites created, built, and promulgated that cannot use WCF. Or the latest version of SharePoint.

    Because (unlike most of the rest of Microsoft), They're going the wrong way.

  • Sorting it all Out

    Unicode without UNICODE/_UNICODE?

    • 24 Comments

    A familiar question I got the other day:

    We are considering porting a Win32 application to use Unicode for internal string handling and are trying to decide which encoding to use. We would like to use UTF-8 and wondered whether there is any way to tell the application to use this encoding without having to compile using the _UNICODE compiler switch. What we'd really like is to be able to call an equivalent to the SetConsoleCP function for a full blown Win32 application as opposed to just a console application. We have tried to achieve this by changing the locale setting but see no way to set code page 65001 using the SetThreadLocale API function. Is there any way to do this without compiling with the _UNICODE compiler switch?

    Thanks in advance for your help.

    I think I've been asked this one before!

    The reason Andreas was unable to find a way to do this is that there is no way to do this.

    You cannot have a CP_ACP that is 65001 (UTF-8). Period.

    Having that would make migration to Unicode "easier", for developer "customers".

    But the cost to Microsoft would be scrubbing literally thousands (and when I say thousands I mean the high thousands!) of functions to make sure they behave okay with UTF-8 (a significant percentage of them will not, in fact).

     There would likely be little other time to do feature work in the next release of Windows beyond this one huge feature that most users would neither understand nor care about. That would be a hard sell to management, believe me!

    Now this is not to say it hasn't been prototyped to see what works, etc. Because it has (at least twice, over the years)....
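
    To illustrate why a function that assumes one code page cannot simply be handed UTF-8, here is a small sketch. It uses Python's codecs rather than the Win32 API, and it assumes Windows-1252 as a stand-in for CP_ACP on a Western European system; the sample word is arbitrary:

```python
def decode_as(raw: bytes, codec: str) -> str:
    """Decode bytes the way code bound to a single code page would read them."""
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    text = "caf\u00e9"                   # "café"
    utf8_bytes = text.encode("utf-8")    # the é takes two bytes in UTF-8
    ansi_bytes = text.encode("cp1252")   # one byte per character here

    print(len(utf8_bytes), len(ansi_bytes))
    # UTF-8 bytes read through a Windows-1252 lens come out garbled.
    print(decode_as(utf8_bytes, "cp1252"))
```

    Every function that counts bytes, splits strings, or walks characters under a one-or-two-bytes-per-character assumption would need a similar review, which is the scrubbing cost described above.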

  • Sorting it all Out

    4 out the door, in both 32 & 64 (aka What Irish, Malay, Maltese & Bengali have in common)

    • 0 Comments

    On the heels of the incredible news I relayed yesterday (to very little fanfare) in Reporting on 64 bits of awesomesauce, a whole bunch of Language Interface Packs have been released recently!

    Of the four I am going to talk about today, three (Maltese and Malay and Irish) predate the "Always release 64-bit for Windows 7 LIPs" policy but were in the Thirteen (13) can be a lucky number list so their 64-bit release was already going to happen anyway.

    And the fourth (Bengali - India) benefits from the new policy and I think may be the first official LIP released under it!

    This is pretty exciting stuff, if you ask me! :-)

    Anyway, without further ado, the languages:

    A little background on Maltese:

    The download page for the 32-bit and 64-bit Windows 7 LIPs, which can only be installed on a system with English resources, can be found here.

    NUMBER OF SPEAKERS:

    ~400,000

    NAME IN THE LANGUAGE ITSELF:

    Malti

    Maltese is spoken in Malta where it has been the official national language (together with English) since 1936. It was also recognized as an official language of the European Union when the country became a member state in 2004.

    Based on and most closely related to Arabic, Maltese also contains a lot of elements from Romance and Germanic languages which reflects the history of the Maltese islands. In 870, Malta was occupied by the Arabs, who brought their language. [Maltese is the only survivor of the Arabic dialects which were widely spoken in Spain and Sicily in the Middle Ages.] When the Normans conquered Malta in 1090 and Christianized it, Maltese started adopting many loan words and even phonetic and phonological features from Southern Italian and Sicilian. After a long rule by the Knights of Saint John of Jerusalem (from 1530 to the end of the 18th century), which prolonged the process of Latinization, the country became part of the British Empire as a crown colony in 1814, opening the language to English influences. Malta was granted independence in 1964.

    Over the centuries Maltese has developed a hybrid vocabulary. An analysis of the etymology of 40,000 Maltese words found that about a third of the words are of Arabic origin (though these are the most basic words), about half are of Sicilian and Italian origin, and 6% are derived from English.

    The Maltese grammar is still mainly Semitic, though for nouns of Romance origin there is a Romance pattern for inflection. The Maltese population, being fluent in both Maltese and English, displays code-switching (referred to as Minglish) in certain localities and between certain social groups.

    CLASSIFICATION:

    Maltese, most closely related to Arabic, belongs to the group of Semitic languages (which also includes, for example, Hebrew and Amharic).

    SCRIPT:

    Maltese is the only Semitic language written in the Latin alphabet. It has a few special characters:
    • ċ (pronounced like ch)
    • ġ (pronounced like the English j)
    • ħ (pronounced like an English h, but stronger)
    • ż (pronounced like English z, while z is pronounced like ts)

    Click here for more information on Maltese!

    A little background on Malay:

    The download page for the 32-bit and 64-bit Windows 7 LIPs, which can only be installed on a system with English resources, can be found here.

    Note that the Indonesian LIP has already been released for both architectures, as described in "Donesian"…just east of "Variant" and just north of "Cognito", right?

    NUMBER OF SPEAKERS:

    47 million speakers

    NAME IN THE LANGUAGE ITSELF:

    Bahasa Melayu

    The Malay language is the official language in Malaysia and Brunei, where it is spoken by about 23 million people. It is also one of four official languages of Singapore.

    It is a variant of a language diasystem, having its counterpart in the Indonesian language. Malay/Indonesian has been a trade language for at least a thousand years on the Malaysian peninsula and the Indonesian islands; the difference between the two languages started to form only in colonial times, when today's Malaysia was influenced by English while Indonesian was influenced by Dutch. The differences are still small enough to make both variants mutually intelligible.

    The grammar of this agglutinative language is rather simple: there is no inflection for either nouns or verbs, no articles are used for nouns, only very few words (those borrowed from Sanskrit) have a grammatical gender, and the plural is mostly indicated by using a numeral (often with a classifier) or simple duplication (orang, person, orang-orang, people). There are only two different tenses for verbs: the present tense and a form of future tense.

    While the language might sometimes be referred to as "Malaysian" (Bahasa Malaysia), linguists nowadays prefer "Malay". Bahasa means language, so shortening the name to bahasa does not make sense.

    See also my blog Malay or Malaysian? for more info on the name....

    FUN FACTS:

    • English words of Malay origin include amok, bamboo, compound (from kampong, enclosure), ketchup (originally referring to fish sauce), orangutan (literally meaning forest person).

    • While Indonesian was influenced by Dutch during colonial times, Malay borrowed many words from English. Striking examples for the resulting difference in the vocabulary include akaun (account, Indonesian: rekening), farmasi (pharmacy, Indonesian: apotek) and tiket (ticket, Indonesian karcis, from Dutch kaartje).

    CLASSIFICATION:

    Malay belongs to the Western Malayo-Polynesian languages, along with languages like Javanese and Balinese (spoken in Indonesia), Malagasy (spoken in Madagascar), and Tagalog (spoken in the Philippines). The Malayo-Polynesian languages form a subgroup of the Austronesian language family.

    SCRIPT:

    Malay is written in Latin script, which replaced the modified Arabic alphabet (jawi) used before the 20th century. In 1972 Malaysia and Indonesia agreed on a unified spelling for Bahasa Melayu and Bahasa Indonesia.

    Click here for more info on Malay!

    A little background on Bengali - India:

    The download page for the 32-bit and 64-bit Windows 7 LIPs, which can only be installed on a system with English resources, can be found here.

    NUMBER OF SPEAKERS:

    207 million speakers

    NAME IN THE LANGUAGE ITSELF:

    বাংলা (ভারত)

    Bengali is spoken in the region of eastern South Asia known as Bengal, comprising Bangladesh (where it is spoken by about 110 million people) and the Indian state of West Bengal (where it is spoken by 55 million people). With more than 200 million speakers it is the second most widely spoken language on the Indian subcontinent and among the 5 languages with the most native speakers worldwide. Bengali is the official language of Bangladesh, one of India’s official languages, and the official language of the Indian states of West Bengal and Tripura.

    The dialect spoken in Kolkata, capital of West Bengal, is considered standard for Bengali as it is spoken in India. The dialect spoken in Bangladesh is different.

    FUN FACT:

    • The first Asian author to ever be awarded the Nobel Prize in Literature was Bengali poet and writer Rabindranath Tagore (1861-1941). He won the Nobel Prize in 1913. He wrote the national anthems of both India and Bangladesh.

    CLASSIFICATION:

    Bengali belongs to the Eastern Indo-Aryan languages, which are part of the Indo-European language family. Together with its closest relatives Assamese and Oriya, Bengali is the easternmost of this large language family.

    SCRIPT:

    Bengali is written in the alphasyllabary called Bangla or Kutila-lipi, which closely resembles the Devanagari script used for Sanskrit, Hindi or Nepali. The script consists of 12 vowel characters and 52 consonant characters. As in all alphasyllabaries, or abugidas, characters for consonants have embedded vowels (or an extra diacritic showing that there is no vowel).

    Click here for more info on Bengali!

    A little background on Irish:

    The download page for the 32-bit and 64-bit Windows 7 LIPs, which can only be installed on a system with English resources, can be found here.

    NUMBER OF SPEAKERS

    ~400,000

    NAME IN THE LANGUAGE ITSELF: 

    Gaeilge

    Irish is the first official language of the Republic of Ireland (with English being the second one). On 13 June 2005, Irish was made an official working language of the European Union, starting on 1 January 2007. The number of Irish speakers steadily declined after England conquered Ireland in the 16th century, and even after the Republic of Ireland was founded in 1922 this trend continued. Today about a 260,0000 people can speak Irish as a second language; on a daily basis it is used by about 30,000 people, chiefly in the western and southwestern parts of the country. The use of Irish is strongly supported and encouraged by the government. The language is taught in all schools and in 1998 the first all-Irish-speaking TV station, Teilifis na Gaeilge (TnaG), was established in Ireland. In Northern Ireland there are another 100,000 speakers, and in the United States about 25,000 (mainly concentrated in the states of Massachusetts and New York).

    The most unfamiliar features of Irish are the orthography, the mutation of initial consonants and the Verb - Subject - Object word order. Irish has a long and rich written tradition. The oldest written examples can be found in inscriptions dating from the 5th to 8th century.

    Click here for more info on Irish

    Now two of these LIPs (Maltese and Irish) both have something else interesting in common, something that I'll talk about more another day....

    Enjoy!

  • Sorting it all Out

    Reporting on 64 bits of awesomesauce

    • 10 Comments

    There have been a few updates on the Language Interface Pack front that I want to communicate out to all of you....

    First and foremost, do you remember back in March, in my blog Thirteen (13) can be a lucky number?

    That was the blog where I explained that there was a magic list of 13 languages that would see 64-bit LIPs created.

    Well, due to the patience and persistence of some of the amazing folks just down the hall from me, and in part due to some of the feedback by regular readers here like Pavanaja U B that my colleagues were able to verify and use...

    ...All future Language Interface Packs will have both their 32-bit and 64-bit packages released!

    Note that this only applies to the download center; I have had it communicated to me that our OEM partners (i.e. the Dells and HPs and all of the others) specifically do not want these 64-bit packages because they do not get enough requests to make it worth creating all of the additional images (apparently they most often ship the 32-bit versions of Windows even on 64-bit hardware, still).

    This may be a future cause to take up if enough people are interested in being able to have full packages that support their preferred languages when they buy a new computer!

    Now all of that (well, the first part of that) is the good news. :-)

    The bad news is that the LIPs that have been released as 32-bit versions only will have to wait for a bit before the media verification can happen and their 64-bit counterparts can be released. They will be coming, but it will be a bit. If there were some other way it would happen sooner, but all of these work items are on a very tight schedule and the new work items can only be added when time can be found.

    But I can report that everyone does want to see it happen and that it will eventually happen!

    I promise that I will keep interested readers posted about this as more details become available. :-)


    Don't forget that later today I'll be doing that all-encompassing "Bidi support on Windows" presentation for MS internal type folk!

    If you are a full time Microsoft employee in Redmond who is interested, feel free to drop in....

  • Sorting it all Out

    An all-encompassing "Bidi support on Windows" presentation for MS internal type folk

    • 5 Comments

    The title says it all.

    Because Windows supports Hebrew, Arabic, Persian, Urdu, Pashto, Dari, Uyghur, Syriac, and lots of other languages/locales that support bidirectional text, and all of them have to work well!

    And so, to that end...

    ...on Wednesday, July 28th, 2010 at 10:00am in 86/2831, I'll be doing a huge presentation on supporting bidirectional script languages on Windows.

    So if you

    • work at Microsoft,
    • are a developer or a tester, and
    • have anything that runs on Windows, no matter whether it is Win32, MFC, ATL/WTL, WinForms, MMC console, PowerShell, WPF, or anything else

    this training should be of real interest to you!

    The presentation will cover UI mirroring, digit substitution, date and calendar formatting, text rendering, and all kinds of additional issues when you combine these various scenarios.
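
    As a rough illustration of two of those topics (not material from the presentation itself), the sketch below looks up the Unicode bidirectional class of a few characters and maps ASCII digits to Arabic-Indic digits. It uses Python's unicodedata module, and the helper names are made up. Digit substitution on Windows is typically applied at display time according to user settings; this sketch rewrites the string only to make the mapping visible.

```python
import unicodedata

# Map ASCII digits 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
ARABIC_INDIC = {ord("0") + i: 0x0660 + i for i in range(10)}


def bidi_classes(text: str) -> list:
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def substitute_digits(text: str) -> str:
    """Replace ASCII digits with Arabic-Indic digits (illustration only)."""
    return text.translate(ARABIC_INDIC)


if __name__ == "__main__":
    # Latin a, Hebrew alef, Arabic alef, ASCII 1, Arabic-Indic 1
    sample = "a\u05d0\u06271\u0661"
    print(bidi_classes(sample))
    print(substitute_digits("2010"))
```

    The differing classes for the two kinds of digits hint at why combining digit substitution with mixed-direction text needs care.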

    If you are not MS internal, don't fret too much; lots of it can eventually end up here in the blog as well.

    But this is a case where being inside the tent has its advantages, because of all of the things I can say and punches I don't have to pull (if you think I am outrageous here you should see me without those "NDA" filters!).

    If you are internal but cannot attend for geographical or scheduling reasons, the slides will be available and the presentation itself will be recorded....

  • Sorting it all Out

    You can violate the rules of decorum, just not the law of gravity

    • 0 Comments

    This blog does not mean I am taking up the cause of Bengali in Unicode in the same way I did with Tamil in Unicode a decade ago. I lack the passionate/interested contacts to sustain such a thing, and to be honest doubt they are out there (if they are, they are not nearly as vocal on the Internet!). I am simply making a point here....

    So two of the things I mentioned at the World Classical Tamil Conference (well, actually at the co-located Tamil Internet 2010) were:

    • that in August of 2001 Dr. Ponnovaiko told me that Unicode would listen to them eventually: they would come to Tamil Nadu and listen to what Tamils needed (which Mark Davis and I did, in January 2008), and
    • that at this point there is a bit of conversation happening across the subcontinent among native speakers whose mother tongues are not Tamil wondering what they can do.

    Now it is not as completely simple as e.g. taking that chart I put up in the bottom half of Learn Tamil in 30 Days (or something like that) that ended in Wikipedia thanks to Scott's efforts and in Unicode 5.1 thanks to Mark and I... taking that chart, substituting the word Tamil with the word Bengali and redoing the characters to magically get a new chart.

    Because there are differences between the two blocks that do not make them completely match. This is definitely the case.

    And some of the named sequences might be different. This might be the case.

    And even if you had this big chart in place and you have all 32 consonants in the block and all 11 dependent vowels in the block in a big grid with nearly 400 cells altogether, there is still a piece missing.

    There is the fact that there are thousands of conjunct consonants that take up to four consonants that are together with no vowels (I mean this in the language sense; in the Unicode sense "four consonants with no vowel" would be "four consonants with the first three followed by a Virama, or to be more accurate for Bengali a Hasant") and a chart of nearly 400 that ignores the other thousands is not all that complete of a chart.
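
    To make the Unicode-sense description concrete, here is a minimal sketch that builds the code point sequence for a conjunct by placing a hasant (U+09CD) after every consonant except the last. The helper name is hypothetical. The grid arithmetic at the end assumes each of the 32 consonants appears once bare and once with each of the 11 dependent vowels; that layout is an assumption about how "nearly 400" arises, not something the chart's definition confirms.

```python
import unicodedata

HASANT = "\u09cd"  # BENGALI SIGN VIRAMA, called hasant for Bengali


def conjunct(*consonants: str) -> str:
    """Join consonants so that every one but the last is followed by a hasant."""
    return HASANT.join(consonants)


if __name__ == "__main__":
    ka, ssa = "\u0995", "\u09b7"  # BENGALI LETTER KA, BENGALI LETTER SSA
    kssa = conjunct(ka, ssa)
    print([unicodedata.name(ch) for ch in kssa])

    # Assumed grid layout: 32 consonants, each bare or with one of 11 vowel signs.
    print(32 * (11 + 1))
```

    The sequence is easy to generate; the hard part, as described above, is that the rendered shape of a conjunct often cannot be predicted from its parts.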

    At which point someone has some explaining to do about how they would expect all of this to be described.

    You can look in that Wikipedia article on conjuncts and see lots of rules but they also have lots of exceptions. It is unclear (to me at least) whether there are algorithms that could be used to build the conjuncts but since some appear to have no connection in shape to the original consonants it is probably easier to just have the huge horking table.

    Now there are movements to dump many of the conjuncts (e.g. the swachchha font movement) but they have limited traction.

    So even though The script can make the language more complicated, the fact is that people are often quite fond of their complications. Not everyone will do stuff like experts chose to do in Want to hear about a cool new typographic convention? Khmer, and I'll tell you about it..., and not everyone wants to move from a primarily Top-to-Bottom/Right-to-Left language to a primarily Left-To-Right/Top-To-Bottom language just to get into computers like happened for Japanese (in retrospect they made a good choice, as vertical support is still not all that great -- though some would claim if they had stayed vertical this effort would have been more vigorously supported).

    So getting back to the less "language revolutionaries" thoughts of people wondering what they can do in Unicode.

    The original topic, remember?

    You (by which I mean They) are not asking the right question.

    If your (by which I mean their) language is Hindi or Bengali or Oriya or Telugu or Marathi or Malayalam or Konkani¹ or Assamese or Punjabi or whatever, the question you (by which I mean they) have to ask is whether people are able to widely support your language -- and by widely I mean there are input methods and fonts and search tools and all of the platform pieces you need to easily work with the language -- and if there are blockers, figuring out how to unblock them.

    The problems can be technical, they can be conceptual, there can be motivation to change or inertia against it.

    But the key is not to say "look what Tamil got, how can I get something?" but to figure out what you (by which I mean they) need that can genuinely help and then to push to get that need taken care of in a way that fits in the framework of how the Internet and everything else gets work done.

    Now obviously the fact that the Government of Bangladesh joined Unicode as an Institutional member on June 30th is an example of perhaps just such a movement. If it is, I hope it leads to productive conversations and contributions.

     To be honest after I saw the announcement my first thought was that I wish I had time/resources to go up there and talk to some people while I am on the right side of the world for such talks to happen, but I really can't afford to go to all of the places I want to go while I am in India without people sponsoring some parts of the trip (i.e. plane tickets and hotel rooms) involving visits to them. And no one from Bangladesh was asking anyway.

    So, we'll see what happens, I suppose.

    If you're interested, I'll keep you all posted (and to be honest since I am interested I'll keep you posted anyway!).

     

    1 - The spellcheck results for the word Konkani were quite disappointing. Blog spell check #fail, and I believe the only one of India's constitutional languages to not be in its dictionary.

     

  • Sorting it all Out

    Which code page to use? The right one, of course!

    • 1 Comments

    Over in the Suggestion Box, Pete asked:

    I've been reading what you have to say about the deep, insatiable depravity that is CP_THREAD_ACP, and I can't say I disagree. My goal when writing Win32 string code is to start in UTF-16 and stay there as long as possible, and, when I can't start there, get there as soon as possible. Sadly, though, one must sometimes mediate between good and evil, such as when one writes a general-purpose library which needs to call an API which not only lacks a W variant but also seems to think code pages make perfect sense to everyone on all systems at all times, so doesn't bother to mention its needs or expectations. An example would be FGetComponentPath, but I think there are some cases in the old Outlook Express API as well. In these sorts of cases, we might benefit from your guidance on which to use, CP_ACP or CP_THREAD_ACP. Can one even generalize? And, if not, how might one determine the correct constant for each case?

    I think Pete was referring to blogs like Nothing stinks worse than the thread locale, other than the thread code page, where I am so subtle about my feelings for some constructs that one might miss the subtext....

    But for the general scenario here, the question of what code page to use?

    There is no "always correct" answer.

    I mean for FGetComponentPath there is a simple answer, which is use CP_ACP. Since that is what it is going to use.

    And if the private mapi32.dll stub is to be shoved in a directory that contains off-CP_ACP or Unicode-only characters, you are basically screwed trying to do this in some cases (like if short file name generation is turned off).

    Okay, I guess there is an answer here of a sort, after all.

    You have to match the code page of whatever function you are calling!

    Usually that will be CP_ACP but there are hundreds of exceptions out there (many of which I have written about in the past), and sometimes the only way exceptions are found is to use CP_ACP and notice the bug caused thereby. If MSLU taught me anything, it is that nobody bothered to document most of this stuff (beyond my occasional efforts, of course!).
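
    One way to picture the "match the callee's code page" rule is to check, before calling, whether a string survives conversion to that code page at all. The sketch below uses Python codecs as stand-ins for Windows code pages; the function name and the sample paths are made up, and strict encoding is used only to make the loss visible (the Win32 conversion functions typically substitute a default character rather than fail).

```python
def fits_code_page(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded in codec."""
    try:
        text.encode(codec)  # strict mode raises on unmappable characters
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    western = "C:\\Program Files\\Caf\u00e9"
    japanese = "C:\\\u65e5\u672c\u8a9e"  # hypothetical path containing 日本語
    for codec in ("cp1252", "cp932"):
        print(codec, fits_code_page(western, codec), fits_code_page(japanese, codec))
```

    A path that fails this kind of check for the code page the function will actually use is the "basically screwed" case described above.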

    Generally speaking, there are better answers in most cases, even with functions like FGetComponentPath, depending on what you are trying to do with the function.

    As a by-the-way, the docs for FGetComponentPath are confusing anyway (they are all [in] parameters? Really? That would be more of an FSetComponentPath if it were true!)....

  • Sorting it all Out

    My 2 out of 119 (11 + 108) trips around the Chilkur Balaji Temple's inner shrine

    • 5 Comments

    Not really anything technical, you know what that means!

    I'm not really all that much of a tourist. It just isn't my thing in most cases.

    Occasionally though, I forget about that because there will be a place or site or thing I really want to see.

    Some of the time I spent in Thailand and Hong Kong and the Czech Republic for example.

    Now when I was in Tamil Nadu, both when I was in Chennai and when I was in Coimbatore, there were two questions that people kept asking me:

    1. Have you been to Kerala?
    2. Have you been to a temple yet?

    Kerala, I did not see -- though if I had not had all those American Express woes I had, then Kerala was on the list of things to do. Even back in Redmond Joe used to tell me I should visit Kerala. OK, something to do for next time.....

    The temple? Well no one ever suggested one kind of temple over another, which I thought was interesting (I could never imagine someone recommending people see a church when they visit the USA, any church and it does not matter what kind). Since my only prior history with temples in India was from Indiana Jones and the Temple of Doom it seemed to me that what temple and in particular what Hindu (or other) God was involved seemed important, and my movie experience really did little other than give me the appropriate warning of which of those temples to avoid (i.e. Kali Thuggee religious cult worshipping temples), so I figured I would just go with it.

    Now the temple posed an interesting challenge for me, because of the iBot.

    You see, most temples are not very accessible to wheelchairs.

    Even iBots, which can handle stairs, are pretty heavy to be taking up some of these old steps, so even if the folks at the temple were okay with it, I'd feel uncomfortable.

    Perhaps I lack faith. :-)

    Anyway, I was in Hyderabad, staying at Ista. And after a little work they found a temple that would work.

    The Chilkur Balaji Temple (చిలుకూరు బాలాజీ గుడి).

    You can read up about this temple here on Wikipedia. Most of the people I know from India already knew of this temple, whose alternate name is the Visa Balaji Temple or just the Visa Temple with Balaji as Visa God.

    The idea is that you would go there to pray before you got your visa, then you'd go back again after you got it to thank Balaji. That is pretty cool if you ask me. :-)

    Now I had no need of a visa (though perhaps divine intercession would have helped me with my own AMEX/Visa issues, they were resolved before I made it to Hyderabad), but I took this opportunity to see a temple since it was likely as close to accessible as I would get. And everybody really wanted me to see a temple while I was there, so....

    Now the ritual at this temple is described in the Wikipedia article I pointed to above:

    During the visit, the devotee goes through the usual rituals of prayer, including Eleven (11) circumambulations of the inner shrine, and makes a vow.

    Once the wish is fulfilled devotees then carryout 108 times around the sanctum sanctorum.

    Majority of wishes by devotees are VISA related, thus Chilkur Balaji is also referred to as 'VISA Balaji'

    11 Circumambulations (11 and 108 rounds) represents about the secret of creation, 1 means SOUL. 1 BODY, uniting both with Divotion and full Determination to fulfill wish , Dedicate on the lord, there is no second everything is god.

    108 represents 1 the EXISTANCE, ALMIGHTY, GOD (PARAMATHMA, here balaji in the minds of devotte), 0 represents CREATION (ILLUSIONARY WORLD,JAGATH), 8 Human Body need to come to this universe 8 months (JIVATMA).

    GOD is everything , GOD does not want anything from devottes, GOD want DDD (Divotion, Determination, Dedication). This Temple has chance to surrender GOD with DDD. Thanks to inventor of this Ritual.

    Having done my research about the temple, I showed up fully prepared to do those 11 trips around the inner shrine. Since I had no visa that I needed to get, I figured I wouldn't need to do the 108 circumambulations. Plus I was worried about the iBot battery a little. :-)

    But I was ready. I was primed!

    When I got there, I was asked to remove my shoes, though there was a brief argument between two of the people at the temple as to whether that was necessary since I was in a wheelchair and my feet would not be touching the ground. I resolved the argument by removing my shoes, holding my hands together, and saying Namaskar (which is Telugu; if you don't know what it means think Namaste, and if you don't know what that means look it up!). They both smiled and responded back in kind, and I then proceeded to move closer to the circle around the inner shrine.

    I was given some flowers by a lady to offer once I was in the appropriate place, and off I went.

    Suddenly another obstacle presented itself -- one incredibly huge step down with nowhere to grab on to so I could not use the iBot's stair climbing capabilities.

    I could get down, but there was no way I could get up again without being lifted by some people.

    That reminds me how much I love when people see the iBot and ask "Can that go down stairs?"  -- any chair can go down stairs -- the trick is to find the ones that can go up stairs!

    I described my conundrum to the person there and he assured me people would be able to come and lift me back up. I mentioned the weight (131 kilograms for the chair and 73 for me) which made him pause for a second worriedly, then he smiled and assured me it could be done.

    So down I went.

    I moved up to two wheels, and carefully proceeded around the shrine (the "carefully" part was to avoid running over bare feet!). I offered the flowers at the appropriate time into the appropriate place and bowed, and people seemed genuinely pleased as I was doing all this.

    One of the attendants started to move me toward that large step for the exit after just two trips around. I looked at him quizzically (since I expected to be doing nine more around) and with an embarrassed look he mentioned how distracting the iBot was to the people there. He did not want to offend me, but was genuinely concerned about the impact to the others (and their wishes) if I continued to make more rounds.

    As another side point, I thought I'd mention that it was interesting how the most common question men asked in the USA about the iBot is "Isn't that like a Segway?" while the most common question in India is "How much does it cost?". That 11.2 lakh rupees figure would usually stun them, let me tell you!

    But I became very serious and told him solemnly that I understood.

    I told him I had a couple of conditions, though....

    He had to

    1. help me make sure that I would be able to accomplish my vow (that I'd be able to get back up the big step with the iBot), and that
    2. when he was going around the inner shrine afterward he would have to do at least a few of the remaining 9 and 108 trips around for me, so that I would not be disrespecting Balaji by my incomplete circumambulations.

    He was a little surprised, but he covered it up quickly and became very serious himself. He said he would help lift me up, and he would be sure to do the remaining 9 and 108 on my behalf.

    I thanked him very warmly. :-)

    Getting back up did hit one snag; I was not smart enough to power the iBot down before it was lifted, which meant the chair had trouble with the uneven lifting that is pretty much guaranteed to happen with four guys lifting an awkward ~200kg weight.

    I knew I should have done those 9 myself!

    But I quickly fixed its confusion and I headed out to where the driver was waiting (I paid the lady for the flowers on the way out, which had been bothering me since she walked away too quickly for me to give her money earlier!).

    I was then outside the temple for over an hour enjoying the rare sunny day that time of year outside Hyderabad, talking to people, and them asking me lots of questions. Which was also lots of fun.

    Finally we found some people to help haul the iBot back into the Innova, which I also have down to a science now.

    And thus ended my visit to the Chilkur Balaji Temple....

    After that we headed off to the mall, which is a completely different story.

  • Sorting it all Out

    It used to be Windows doing it right, and Office following. But now...

    • 2 Comments

    I think I may be getting old.

    More and more, I find myself responding to various engineering situations at Microsoft where I am saying (or at least thinking) "When I was doing that work...", if you know what I mean.

    Like for example, there was a time when Cathy and I were on the NLS team and we were both heavily involved in Unicode.

    Windows pretty much set the trends, did the work that other parts of Microsoft would then pick up.

    Now over the last several years, some of which were happening while I was still on that team and some of which happened later, I have been blogging about the terrible job that both Microsoft and Unicode have been doing on core aspects of real world use of the Unicode Bidirectional Algorithm. Like all of these and more:

    As I said previously, the simple problem is best stated as:

    The Unicode Bidirectional Algorithm cannot handle text from both left-to-right and right-to-left languages together in the same line of text.

    That is it, right there.

    And since this scenario and its underlying problem come up so often in the real world (all of those neutral-type punctuation characters are so often there), this is not a theoretical problem; it is a real one.

    Now Unicode's Bidi Algorithm kind of wasn't very good in this scenario.

    And Windows was equally not good, though with the excuse that they were conformant to the UBA.

    Which we knew kind of sucked here.

    And of course Office, which uses the Uniscribe component from Windows, would (as an application suite downlevel of Windows) suck a bit too -- but was conformant to the UBA and doing the work to support what Windows did (or didn't).

    Folks in Windows were comfortable with this, knowing that when Office did their own thing they had incomplete stories like the one in Oh (Saka to me, Saka to me, Saka to me, Saka to me) Whoa Babe (Just a little bit) A little respect (just a little bit).

    Then, for Office 2007, one could say that they looked at that Bidi situation and decided enough was enough.

    Even if Windows was okay with the idea of sucking with a good excuse (Unicode conformance), they decided to say screw that....

    Murray Sargent described some of what they did in blogs like Tailoring the Unicode Bidi Algorithm and Bidi Paragraph with Parenthesized Text. Basically they tackled this huge scenario in a way that neither Windows nor Unicode were doing so well in.

    And people have been noticing the difference, e.g. blogs like It's a bug, it's always been a bug. In either direction....and how Office and many of its applications and components now do it correctly, while most of Windows and .Net do not.

    This all shipped in both Office 2007 and Office 2010. And was in beta before Vista shipped.

    And at the most recent Unicode Technical Committee meeting, Murray Sargent brought up the problem with Unicode and suggested that they make a change to describe this exact option to better support real world usages of bidirectional text.

    They were very interested, and are looking forward to the fully written up proposal at the next UTC meeting.

    Soon enough, Microsoft Office and RichEdit and Unicode will be working correctly in this scenario.

    And Uniscribe, GDI+, WPF, DWrite, Silverlight, and everyone else won't.

    In fact, all of the above is true right now, except for Unicode.

    The excuse of these components is that they had to stay conformant to the UBA.

    I wonder what their response will be when Unicode is updated too?

    I remember back when Windows was pretty much the one doing the right thing in such circumstances.

    Now (less than a half a decade later)?

  • Sorting it all Out

    When I say Graphics.MeasureString can hang with you, I mean it in a bad way!

    • 4 Comments

    I'm not up on the latest slang the kids today are using, you know?

    I do have some of the most awesome cousins in the whole world and I am friends with them on Facebook and I can see what they say. Occasionally I can follow along, but usually I just accept that this is a dialect I cannot master....

    It was just recently (in It's a bug, it's always been a bug. In either direction....) that I was talking about some specific issues that worked the way a user might expect in Word and WordPad and RichEdit but did not behave in such a way when GDI+ was used.

    The behavior is one requested/demanded in the current version of the Unicode Bidi algorithm, so that really was not a bug!

    This is not an issue like that.

    No sir.

    This is a bug.

    Everyone would call it such.

    The description:

    Hi all, 

    I’m working on a bug complaining that Graphics.MeasureString hangs with a long right-to left string. If the number of ش  in a string is greater than 2046, it will hang. My question is: Is it a known issue that there is a length limitation in GdipMeasureString? Is there a workaround approach available for us? Your suggestions will be sincerely appreciated.

    We can repro it with the following code: 

        using System;
        using System.Drawing;

        class Program {
            static void Main(string[] args){
                Bitmap b = new Bitmap(10, 10);
                Graphics g = Graphics.FromImage(b);
                Font objFont = new Font("Arial", 10);
                //Notice: i == 2046 works well.
                //Notice: i == 2047 hangs.
                for (int i = 2000; i < 2500; i++) {
                    // Create a string with a repetition of right-to-left characters.
                    // It works well with a normal char like 'a'.
                    // string ss = new string('a', i);  //this one works fine
                    // but it hangs when the length of ش is greater than 2046
                    string ss = new string('ش', i);
                    Console.WriteLine("Testing " + i.ToString());
                    SizeF oSize = g.MeasureString(ss, objFont);
                    Console.WriteLine("Size = " + oSize.Width);
                    Console.WriteLine();
                    Console.WriteLine();
                }
            }
        }

    That sounds like a bug, right?

    You can see the original report on the Connect site, right here.

    You can see there the resolution, too:

    Windows team has confirmed that it's a regression, but they decide not to fix in Win7 since a string with more than 2046 characters is a corner case. However, they would evaluate it for Win8.

    Addtionally, we have a workaround, using Graphics.MeasureCharacterRanges. Thefore, I'm going to close this issue.

    Thanks and keep the feedback coming.
    UIFx Team

    Kind of says it all, I guess. Right?

    I'll be honest, they are making a pretty optimistic assessment as far as being willing to look at it in the future given how little GDI+ work is happening, but perhaps the hang elevates the status a bit.

    Besides, who cares what Graphics.MeasureString is doing wrong as long as Graphics.MeasureCharacterRanges is around to pick up the slack?

    In any case, when I explain how Graphics.MeasureString may hang with you, don't assume it's a good thing!

  • Sorting it all Out

    It is easy (and obnoxious) to claim "size doesn't matter" if one has the size everyone wants

    • 14 Comments

    The other day, in The script can make the language more complicated [to use], I said:

    There is a lingering size issue I'll talk about another day....

    Well, think of today as another day! :-)

    In the realm of international politics (one of my favorite people in this or any other world is cringing as she reads me use that phrase but I promise I'll be good!), there are several different philosophies that can guide the way one does the work.

    There is a way known as

    SHARE AND SHARE ALIKE

    also known as basic fairness, where everybody is pretty much treated equally and contributes as much no matter who you are, when you showed up, how much you need, etc.

    Now this idea has some significant limitations as it means people who have less are forced to contribute just as much even if they don't have enough or if the problems were not of their own doing.

    So some prefer a different philosophy, more like

        DIFFERENTIATED RESPONSIBILITY

    which addresses that some by making sure that who pays in and how has a lot more to do with who can contribute and who did the most to require people to contribute.

    Now both of these try to be fair, but in different ways. There are specific times that each might be better than the other.

    Then, there is another way.

    This way might best be described as

        FIRST COME, FIRST SERVED

    and although popular, it has some drawbacks, especially in the situations where there is a specific cost to being late to the game.

    You could, by and large, think of Unicode and ISO/IEC 10646 in terms of international politics. It certainly is international and can often be quite political! :-)

    Now every script gets to play, which kind of fits in the share and share alike philosophy, and some scripts need more effort than others but those resources are often allocated which fits well into differentiated responsibility a little bit.

    But for the most part, if you are one of the scripts whose code points are allocated after 0x7ff for UTF-8 or after U+FFFF for UTF-8/UTF-16, it will cost you.

    The former puts you into three-bytes per Unicode code point land, and the latter puts you in four-bytes per Unicode code point land.
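    The thresholds above can be sketched in a few lines of C#. This is only an illustration: the class name, the helper names, and the sample code points (a Latin letter, a Syriac letter, a Devanagari letter, and a Gothic letter from beyond U+FFFF) are picked here for the example, and the helpers assume a valid, non-surrogate code point. The framework's own encoders are called alongside as a cross-check.

```csharp
using System;
using System.Text;

static class Utf8Cost {
    // Bytes UTF-8 needs for one code point (assumes a valid, non-surrogate value).
    public static int Utf8Bytes(int codePoint) {
        if (codePoint <= 0x7F) return 1;    // ASCII
        if (codePoint <= 0x7FF) return 2;   // up through Syriac, Thaana, etc.
        if (codePoint <= 0xFFFF) return 3;  // the rest of the BMP, including the Indic blocks
        return 4;                           // supplementary planes
    }

    // Bytes UTF-16 needs for one code point: one code unit, or a surrogate pair.
    public static int Utf16Bytes(int codePoint) {
        return codePoint <= 0xFFFF ? 2 : 4;
    }

    public static void Main() {
        int[] samples = { 0x0041, 0x0710, 0x0915, 0x10330 };
        foreach (int cp in samples) {
            string s = char.ConvertFromUtf32(cp);
            // Cross-check the arithmetic against the real encoders.
            Console.WriteLine("U+{0:X4}: UTF-8 = {1} ({2}), UTF-16 = {3} ({4})",
                cp,
                Utf8Bytes(cp), Encoding.UTF8.GetByteCount(s),
                Utf16Bytes(cp), Encoding.Unicode.GetByteCount(s));
        }
        // U+0041: UTF-8 = 1 (1), UTF-16 = 2 (2)
        // U+0710: UTF-8 = 2 (2), UTF-16 = 2 (2)
        // U+0915: UTF-8 = 3 (3), UTF-16 = 2 (2)
        // U+10330: UTF-8 = 4 (4), UTF-16 = 4 (4)
    }
}
```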

    And that will cost you, as I said.

    If you are in a land of broadband it may not be as important, but if you are in dial-up or heavily metered territory, those extra bytes will cost. Especially if some or all of the communication will be in that three-byte or four-byte range.

    I would argue that the USA is a country where connectivity among computer users including broadband is pretty prevalent. So it's easy to look at the one language that pretty much stays in the one-byte per Unicode code point land and feel a little bitter, thinking about how one of the countries most able to afford the extra cost is the one that gets off cheapest.

    And it is easy to wonder why e.g. Syriac is below 0x7ff when so many scripts in heavy modern use are above it (though I suspect if they had put Devanagari below and all the other Indics above -- for example -- we could have caused violence in lots of places!).

    Add to that the fact that for most Indics, which use the Virama/Abugida method to encode, native text in the script will be almost twice as big, taking two Unicode code points for all letters other than the ones that use the inherent "a" vowel.

    Plus there are those alternate forms that require ZWJ and ZWNJ to be there too (I've talked about them before in blogs like Which form to use if the form keeps changing?). I'll remind everyone that the Unicode implementation suggestion from the Indic FAQ adds yet another character -- a three byte one -- to the form most commonly used.

    The upshot is that for the Indic scripts, the cost per linguistic character in the script is 3-6 bytes per (usually 6) with conjuncts being 9-15 or 15-21 or 21-27 bytes per.
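    To put a few concrete numbers behind that, here is a small C# sketch that counts UTF-8 bytes for some Devanagari sequences. The sample syllables and the class and method names are chosen here for illustration only; the byte counts come straight from Encoding.UTF8.

```csharp
using System;
using System.Text;

static class IndicCost {
    // UTF-8 size of a piece of text, in bytes.
    public static int Utf8Length(string text) {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main() {
        // Latin "ka" for comparison: two code points, one byte each.
        Console.WriteLine(Utf8Length("ka"));                                            // 2
        // KA alone (carries the inherent "a" vowel): one code point.
        Console.WriteLine(Utf8Length("\u0915"));                                        // 3
        // KA + VOWEL SIGN I: two code points.
        Console.WriteLine(Utf8Length("\u0915\u093F"));                                  // 6
        // KA + VIRAMA + SSA: a two-consonant conjunct.
        Console.WriteLine(Utf8Length("\u0915\u094D\u0937"));                            // 9
        // The same conjunct with a ZWJ after the virama to request an alternate form.
        Console.WriteLine(Utf8Length("\u0915\u094D\u200D\u0937"));                      // 12
        // SA + VIRAMA + TA + VIRAMA + RA + VOWEL SIGN II: a three-consonant conjunct plus vowel.
        Console.WriteLine(Utf8Length("\u0938\u094D\u0924\u094D\u0930\u0940"));          // 18
    }
}
```

    Every code point in these samples sits between U+0800 and U+FFFF, so each one costs three bytes and the totals are simply three times the number of code points.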

    Having spoken to several people in Chennai and Coimbatore and Bangalore and Hyderabad, being told that this cost is no big deal by people who aren't paying metered usage and who use just 1 byte per character sounds just a tad condescending to more than just a few of the billion plus people potentially impacted.

    I can't argue with their logic, but I can say that one of the reasons so many of the original Indics were put together is that they were submitted at the same time by India, which really did think of them as a big Indic block. This is kind of what was asked for.

    Which points to another of those philosophies:

        BE CAREFUL WHAT YOU ASK FOR BECAUSE YOU MAY GET IT

    Now they did not get out of it completely unscathed. They too are paying that extra price, for Hindi (including a non-trivial possible amount of those prices for conjuncts). So it is not like they get it any easier. But obviously if they had originally requested one or more of the Indics be done as syllabaries in the original proposal (Tamil and Bengali are the only two I have seen suggested in such a way by native speakers though to be fair I have not spent a ton of time looking for many others!), then just like Ethiopic they might have gotten it.

    Which points to yet another of those philosophies:

        IF IT'S WATER UNDER THE BRIDGE NOW, ONE HAS TO MOVE ON

    With the benefits of hindsight and post-mortem review and more knowledge, it is easy to criticize. And people do, in fact, criticize. Sometimes I think that is what 75% of this blog ends up being about.

    But there is not a lot anyone can do about it anymore.

    For me the worst part is that some of the people who did the original work don't even see an issue here -- they feel quite good about all they have done and don't see the irony that happens when they are a bit piqued when those it was done for aren't "more appreciative." I have found myself apologizing for "those who don't know any better" quite a bit in recent times.

    And talking about the importance of broadband (to help make sure everyone can cross the bridge that the water is running under)....

    To be honest, it is almost frightening (though predictable in retrospect) how much better of a response one gets when one starts by trying to actually understand a concern rather than leading off by attacking it.

  • Sorting it all Out

    Amazon? Dumas would be spinning in his ` (grave) if he knew.

    • 8 Comments

    Yesterday one of the big news stories everyone covered was the information from Amazon that E-books for the Kindle were outselling hardcover books.

    Here's a link about it but this is fairly gratuitous, I'm sure you've seen it.

    You'll notice that none of the stories pointed to a source; in the new Internet, we don't always have those!

    Anyway, given my recent experience it made me weep for the future of books a little, to be honest.

    I'll explain why....

    So, the other day I was flying.

    Heading back from India, via Frankfurt.

    This is a very long couple of flights.

    So I was reading some old favorites of mine on my Kindle.

    Oh, did I mention?

    I have a Kindle.

    Pre-loaded with titles since I was heading to India and didn't set up the international download deal.

    And I was reading while movies were playing in the background.

    I admit that stories like The Count of Monte Cristo would be better in French, but my French is not really quite up to that task (I actually can accomplish it, but then it is much harder and is no longer reading for pleasure -- so I stick to translations).

    And suddenly, I find one of those issues I just can't ever seem to get too far away from:

    Geez, Amazon! What's up with that?

    How would Alexandre Dumas, père feel about this, exactly?

    Lucky for us he is nearly 140 years past being able to respond.

    I can't help wondering whether the Kindle is sold in France. Or in Quebec.

    What is wrong with these people?

    Now the Internet has watered down words like père a little, so that even French Wikipedia will find père when you look for pere (ref), but that is just to make search easier. We haven't yet changed the "Father" or "Senior" suffix in French to remove the grave accent above the letter e.
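    The difference between folding accents for search and stripping them from the text can be sketched in C#. This is just one common approach (decompose, drop the combining marks, recompose) and is not a claim about how Wikipedia or Amazon actually do it; the class and method names are made up for the example.

```csharp
using System;
using System.Globalization;
using System.Text;

static class AccentFolding {
    // Builds an accent-free key for matching only; the original text is not modified.
    public static string FoldForSearch(string text) {
        // FormD splits "è" into "e" followed by a combining grave accent.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        StringBuilder key = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed) {
            // Drop the combining marks, keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark) {
                key.Append(c);
            }
        }
        return key.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main() {
        string stored = "Alexandre Dumas, p\u00E8re";   // what the book should contain
        string query = "pere";                          // what a hurried reader types

        bool found = FoldForSearch(stored)
            .IndexOf(FoldForSearch(query), StringComparison.OrdinalIgnoreCase) >= 0;
        Console.WriteLine(found);                       // True

        // The stored text itself still has its grave accent.
        Console.WriteLine(stored.Contains("\u00E8"));   // True
    }
}
```

    The fold only ever touches the search key, so the accent survives in the content that gets displayed.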

    All of the words have been stripped of accents (even words that would usually retain their accents in English versions)....

    I should be grateful that the items list has the accent properly listed. Maybe. Though proper indexing of trashed content is hardly a point of pride, Jeff.

    Yet another "let's provide all the free content we can" project consisting of a Lucasian "shove as much crap on the list as we can even if it is, in fact, crap" that gave us Star Wars I-III.

    Sigh.

    I guess Dumas would not be allowed to spin in his grave since in the free content the grave is not available!

    I also had the Kindle Reader for PCs on my laptop, so I decided to try that too. Maybe it was just an issue on the device?

    And unfortunately....

    Sigh.

    Which is worse -- not having the font support for the character or stripping it from the text or choosing content that never had it in the first place?

    Between two and three of these crimes have been committed here....

    Is this merely representative of the shoddy job they do with free titles they offer? Or the crappy quality of the titles they pick up?

    Is there some special module required for international support?

    Or should I just return this otherwise nice device for one that can handle Western European languages better, given how unforgiving I tend to be of such fundamental issues?

  • Sorting it all Out

    It's a bug, it's always been a bug. In either direction....

    • 6 Comments

    So the other day someone pointed me to a thread going on over on the forums.

    The post in question was: Problem with symbols when drawing Hebrew text by Graphics.DrawString function from the RichTextBox control, if you want to skip ahead of my description and see what I’m going to talk about before I start talking…

    The stated problem from winrazor:

    Hi all,

    I have a question:
    I'm trying to draw RightToLeft (Hebrew) text on the Image object using "Graphics.DrawString" function.
    Text is previously loaded from the RTF file into RichTextBox control, and in this control it seems correct.
    But after drawing on the image symbols like "(" , ":"  and "." displayed on the incorrect positions when they are last or first in the text line.
    In other words mixing Hebrew with Simbols (English) does not work correct.

    Do you have any idea?

    Thanks a lot!
    Timur.

    Now regular readers here might know what is going on already.

    This is the typical problem with neutral characters at the end of an RTL run of characters that are embedded in an LTR context – the neutral character takes the overriding directionality to decide where it should be placed rather than the preceding text.
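    A heavily simplified C# sketch of that neutral-resolution step follows. It is modeled on rules N1 and N2 of the Unicode Bidirectional Algorithm and is not the code that GDI+ or Uniscribe run. The assumptions are loud ones: only Hebrew letters count as strong RTL, any other letter counts as strong LTR, everything else (digits included) is treated as neutral, and weak types, explicit embeddings, and bracket handling are ignored entirely. All names are invented for the example.

```csharp
using System;

static class NeutralSketch {
    // 'R' = strong right-to-left, 'L' = strong left-to-right, 'N' = neutral (simplified!).
    static char Classify(char c) {
        if (c >= '\u05D0' && c <= '\u05EA') return 'R';   // Hebrew letters only
        if (char.IsLetter(c)) return 'L';
        return 'N';
    }

    // Returns one direction letter per input character after resolving neutrals.
    public static string ResolveDirections(string text, char paragraphDirection) {
        char[] types = new char[text.Length];
        for (int i = 0; i < text.Length; i++) types[i] = Classify(text[i]);

        int pos = 0;
        while (pos < types.Length) {
            if (types[pos] != 'N') { pos++; continue; }
            int start = pos;
            while (pos < types.Length && types[pos] == 'N') pos++;
            // The edges of the text count as having the paragraph direction.
            char before = start > 0 ? types[start - 1] : paragraphDirection;
            char after = pos < types.Length ? types[pos] : paragraphDirection;
            // N1: same direction on both sides wins. N2: otherwise use the paragraph direction.
            char resolved = before == after ? before : paragraphDirection;
            for (int j = start; j < pos; j++) types[j] = resolved;
        }
        return new string(types);
    }

    public static void Main() {
        string hebrewWithPeriod = "\u05E9\u05DC\u05D5\u05DD.";   // a Hebrew word followed by '.'
        // In an LTR context the trailing period resolves LTR, so it lands on the right.
        Console.WriteLine(ResolveDirections(hebrewWithPeriod, 'L'));   // RRRRL
        // In an RTL context it stays with the Hebrew text.
        Console.WriteLine(ResolveDirections(hebrewWithPeriod, 'R'));   // RRRRR
    }
}
```

    The period only follows the Hebrew when the surrounding context is RTL, which is the behavior the poster was expecting and not getting.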

    The user has just noticed that the RichTextBox (as well as WordPad and Word and Outlook) has been doing a slightly better job than GDI+ (as well as plain old Notepad, plain old EDIT controls, plain old Uniscribe).

    We can start with an easy point.

    The bug in question will never be [intentionally] fixed. Period.

    GDI+ is in maintenance mode now and only high severity bugs like crashing or security exploit bugs are being fixed. And while a misplaced period can sometimes crash one’s plans for an evening, it is never going to crash a version of Windows.

    So in these “mixed directionality with neutral characters on the border of items” scenarios, Graphics.DrawString is going to live with the same limitations that Notepad, the regular EDIT control, the TextBox control, and plain old Uniscribe have had for roughly the lifetime of Bidi support in Microsoft products.

    Now for the rest of the thread, it gets a little weird, with the MS person claiming the bug can’t be reproduced even though the sample written by the MS person completely reproduces the bug that the customer reported. A screenshot of the non-repro that was repro’ed was provided:

    The post was then moved to “General Discussion” since requested information wasn’t provided after just four days. Even though it really had been (this is a problem that anyone with knowledge of Bidi can easily speak to).

    Ten days later, WinRazor responded (I added checkmarks to the lines that are accurate):

    Dear Friend,

    Hebrew is Right to Left language, it means that dot should be from the left side. (in the end of the frase)

    Initial string also has dot from the right - it's another bug of this online tool.

    Dots braces and other symbols are placed incorrectly by MS new functions.
    Finally I've fixed the problem by using old functions, which have awfull preformance with long texts, this problem is also fixed by manual separating pages with additional optimization.

    In other words:  MS should fix 2 bugs:
    ✔1) In the new function "MeasureString" - RTL is not supported as needed!!! (symbols mixed with Hebrew text have incorrect positions)
    ✔2) Bug in this online tool - typing and copy\paste have the same problem with symbols as 1) .

    I did serious investigation in the net, and a lot of people have the same problem and suggest to use old interface,
    it also suggested in the MS example. I think that a nice way or fix this bug or notice about this problem in the help of the new functions.

    And please don't say that you don't have enough info - you can see the problem in your answer.

    Anyway, thank you for your answer! And please let me know when 1) is fixed. (if it's possible)
    Timur.

    Well, the first few lines were right (in fact all the lines I put a check in front of are correct!).

    Now it is unclear what is meant here by “old functions” since this only works correctly in newer versions of certain controls. I assume the problem is that it used to work with other RichTextBox editing stuff and that can make the definition of “old” and “new” change a bit (it is easy to think of these functions as NEW to the people using it, I suppose).

    But there have been misunderstandings about what is going on here on both sides!

    So be it, though.

    Microsoft never responded to this last post, even after nine months, which is more than enough time for cow, countess, or response from support. So it probably won’t be responded to ever at this point.

    I mean except for me weighing in, of course! :-)

    This is the kind of bug that really ought to be just fixed given the many reports over the years and the real impact it has, though of course when/who/how type questions clearly will impact such a fix…

  • Sorting it all Out

    Where's the other Urdu?

    • 10 Comments

    What are the languages of India? is a rather loaded question.

    Not in the Have you stopped beating your wife yet? sense. But perhaps to some the two questions have a similar order of magnitude.

    In the constitution of India, it is clear that the official languages of the country are Hindi (in the Devanagari script) and English (in the Latin script). 

    But a part of the constitution allows the recognition of official languages in individual states, and since the states had their borders largely decided based on language it seemed best to leave it to the states to work to define the official languages within the states.

    With that said, there is a list of languages that have a special significance, whose latest incarnation is described here in Wikipedia:

    The Eighth Schedule to the Indian Constitution contains a list of 22 scheduled languages. At the time the constitution was enacted, inclusion in this list meant that the language was entitled to representation on the Official Languages Commission, and that the language would be one of the bases that would be drawn upon to enrich Hindi, the official language of the Union. The list has since, however, acquired further significance. The Government of India is now under an obligation to take measures for the development of these languages, such that "they grow rapidly in richness and become effective means of communicating modern knowledge." In addition, a candidate appearing in an examination conducted for public service at a higher level is entitled to use any of these languages as the medium in which he answers the paper.

    There are obviously benefits to being on this rather exclusive list -- this number 22 is out of either nearly 500 or over 1500 languages in India (depending on whose count you accept).

    The list (table modified from here) in order of population is:

    Language | Millions of speakers per last census | Locale within India defined in Windows? | State(s) giving the language official status
    Hindi | 422 | Yes | Andaman and Nicobar Islands, Arunachal Pradesh, Bihar, Chandigarh, Chhattisgarh, the national capital territory of Delhi, Haryana, Himachal Pradesh, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh and Uttarakhand
    Bengali | 180 | Yes | Andaman & Nicobar Islands, Tripura, West Bengal
    Telugu | 74 | Yes | Andaman & Nicobar Islands, Andhra Pradesh, Puducherry
    Marathi | 72 | Yes | Maharashtra, Goa, Dadra & Nagar Haveli, Daman and Diu, Madhya Pradesh, Karnataka
    Tamil | 61 | Yes | Tamil Nadu, Andaman & Nicobar Islands, Puducherry
    Urdu | 52 | No | Jammu and Kashmir, Andhra Pradesh, Delhi, Bihar, Uttar Pradesh
    Gujarati | 46 | Yes | Dadra and Nagar Haveli, Daman and Diu, Gujarat
    Kannada | 38 | Yes | Karnataka
    Malayalam | 33 | Yes | Kerala, Andaman and Nicobar Islands, Lakshadweep, Puducherry
    Oriya | 33 | Yes | Orissa
    Punjabi | 29 | Yes | Chandigarh, Delhi, Haryana, Punjab
    Assamese | 13 | Yes | Assam
    Maithili | 12 | No | Bihar
    Santhali | 6.5 | No | Santhal tribals of the Chota Nagpur Plateau (comprising the states of Bihar, Chhattisgarh, Jharkhand, Orissa)
    Kashmiri | 5.5 | No | Jammu and Kashmir
    Konkani | 2.5 | Yes | Goa, Karnataka, Maharashtra, Kerala
    Nepali | 2.5 | No | Sikkim, West Bengal, Assam
    Sindhi | 2.5 | No | non-regional language
    Manipuri | 1.5 | No | Manipur
    Bodo | 1.2 | No | Assam
    Dogri | 0.1 | No | Jammu and Kashmir
    Sanskrit | 0.05 | Yes | non-regional language

     Now I threw that third column in to point out that not every decision made in regard to Windows has a pure population reason behind it. I could have used other list items like version of Windows where support was added if I wanted to show even more interesting and/or strange trends, but I figure this one is enough for present purposes.

    Now of all of these languages the only one that cannot be displayed at all using the built in fonts in Windows 7 is Santali, which is written with the Ol Chiki script. But I was told that literacy rates among speakers are low, so perhaps that 6.5 million number shouldn't be thought of purely in terms of "theoretical potential customers". Though of course other numbers would change on this list as well, with that metric. :-)

    Microsoft Windows and Office don't seem all that well aimed at the "silent majority" (~93%) in India who don't speak English, but we'll leave that interesting issue for another day....

    There are only a few real anomalies on this list:

    • Kashmiri, whose font support is available for both the Arabic script and the Devanagari script, would really have to wait for built-in locale support until after the political situation is resolved in a less tense way;
    • Sanskrit having a locale is obviously also a very political thing, in the other direction;
    • Sindhi mostly uses the Devanagari script in India and has had those extra needed characters added both to Unicode and to Microsoft fonts. This is a small point of embarrassment for me personally (and for Microsoft): the characters went into Unicode before the relevant version of ISO/IEC 10646 had added them, putting the two out of sync for a version, and that was done on the basis of Microsoft requesting them (through the people in the UTC meeting at the time, which included me) for the sake of support in the next version of Windows and Office. Yet the locale was never officially added, either in the Devanagari script for India or the Arabic script for Pakistan.

    And the most unusual of the anomalies on this list? It can be seen in Urdu, which as I mentioned in Giving the people Urdu, we are! can really be thought of as the same underlying language as Hindi, with both of them grown in different directions.

    Directions that have helped to fuel the differences between India and Pakistan for lo these many years, in fact.

    Yet in Windows, where an Urdu - Pakistan locale exists, no Urdu - India one is to be found!

    Though space has been reserved for it, as charts in both Locale IDs Assigned by Microsoft and Language Identifier Constants and Strings indicate (technically the same could be said for Manipuri - India and Nepali - India and Sindhi - India, now that I look at the lists!). I'm not sure whether that counts as transparency or some people publishing the wrong lists!

    I was asked by five different people while I was in India about what is holding up an Urdu - India locale, but to be honest I have no earthly clue. I was told that the folks in the subsidiary have asked for it, but I was unable to verify that bit of information at the time this blog was written.

    The bulk of the data in the locale would be identical to Urdu - Pakistan, but there are incredibly good reasons to really want Urdu - India to be separate and not ask people to use "the wrong one".

    So, ignoring everything else but the customer requirement for a moment, I am going to use the method described in Where are the other Tamils? and create a custom locale for ur-IN. :-)

    Here is the code:

    using System;
    using System.Globalization;

    namespace CustomLocales {
        class CustomLocales {
            [STAThread]
            static void Main() {
                // Start from the existing Urdu (Pakistan) culture data and
                // the region data for India (taken here from en-IN).
                CultureInfo ci = new CultureInfo("ur-PK", false);
                RegionInfo ri = new RegionInfo("en-IN");
                CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder("ur-IN", CultureAndRegionModifiers.None);
                carib.LoadDataFromCultureInfo(ci);
                carib.LoadDataFromRegionInfo(ri);
                carib.CultureEnglishName = "Urdu (India)";
                carib.CultureNativeName = "اُردو (بھارت)"; // Ignore the way it looks, the string is right! :-)
                carib.CurrencyEnglishName = ri.CurrencyEnglishName;
                carib.CurrencyNativeName = "روپیہ";
                carib.RegionNativeName = "بھارت";
                carib.NumberFormat.CurrencySymbol = "Rs.";
                carib.ThreeLetterWindowsLanguageName = "URI"; // Instead of URD as ur-PK has
                carib.IetfLanguageTag = carib.CultureName;
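                // Write the definition out as LDML, then install it on this machine
                // (Register needs to be run with administrative privileges).
                carib.Save("ur-IN.ldml");
                carib.Register();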
            }
        }
    }

    In the course of putting all that together, someone pointed out an interesting issue in the Urdu (Pakistan) locale. Its native currency name in Windows 7 is

    روپيه

    which includes U+064a, ARABIC LETTER YEH. This seems like a bug, since U+06cc, ARABIC LETTER FARSI YEH, is almost certainly what Urdu-speaking people in either country would prefer.

    But in any case the following slightly different string was recommended to me:

    روپیہ

    so I chose that one in the case of the above code; if you disagree then of course you can change the string, as well as the ThreeLetterWindowsLanguageName I used....

    If I am right about the built-in ur-PK data, someone should put in a bug to get that fixed in some future version of Windows, by the way. Any former NLS testers reading this? :-)

    If it exists then it is a subtle bug, since as I mentioned in Every character has a story #18: U+06cc and U+064a (ARABIC LETTER FARSI YEH and ARABIC LETTER YEH), in the initial and medial forms the two letters look identical (and this is obviously the medial form since it is the penultimate character in the string).
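    Since the two letters can't be told apart by eye in this position, one way to check is to dump the code points. The following is only a minimal sketch: the YehCheck class and its CodePoints helper are names made up for this illustration, and the output depends entirely on the locale data of whatever version of Windows it runs on.

```csharp
using System;
using System.Globalization;
using System.Text;

static class YehCheck {
    // Hypothetical helper: formats each UTF-16 code unit of s as U+XXXX,
    // separated by spaces. Every character of interest here is in the BMP,
    // so looking at code units is enough.
    public static string CodePoints(string s) {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s) {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main() {
        // Whatever the running system reports for ur-PK; results vary by version.
        string name = new RegionInfo("ur-PK").CurrencyNativeName;
        Console.WriteLine(CodePoints(name));
        Console.WriteLine("Contains U+064A (YEH):       " + (name.IndexOf('\u064A') >= 0));
        Console.WriteLine("Contains U+06CC (FARSI YEH): " + (name.IndexOf('\u06CC') >= 0));
    }
}
```

    If the first check prints True, the built-in data is using ARABIC LETTER YEH as described above.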

    Anyway, just take the code, save it to a file as ur-IN.cs, and then compile it from the command line with the following command:

    csc /r:sysglobl.dll ur-IN.cs

    And once you do that and run the resulting ur-IN.exe (as an administrator, since it registers the locale), the landscape in Regional and Language Options will change a little bit:

    And there we go! :-)

    Now ideally one would be able to use the reserved LCID value mentioned in those other articles, but that is not an option in this case.
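    To see what that looks like from code, here is a minimal sketch that loads the newly registered culture and prints its LCID. The UrInCheck class and Describe helper are names made up for this illustration. The assumption, based on how custom cultures are documented to behave in the .NET Framework on the Windows versions discussed here, is that the LCID comes back as 0x1000 (LOCALE_CUSTOM_UNSPECIFIED) rather than the reserved value; newer versions of Windows and .NET may well behave differently.

```csharp
using System;
using System.Globalization;

static class UrInCheck {
    // Hypothetical helper: summarizes a culture as "name | English name | LCID in hex".
    public static string Describe(CultureInfo ci) {
        return ci.Name + " | " + ci.EnglishName + " | LCID 0x" + ci.LCID.ToString("X4");
    }

    static void Main() {
        try {
            // On the systems discussed here this only succeeds once
            // the custom locale has been registered.
            CultureInfo urIN = new CultureInfo("ur-IN");
            Console.WriteLine(Describe(urIN));
            Console.WriteLine(DateTime.Now.ToString("D", urIN));
            Console.WriteLine((1234.5m).ToString("C", urIN));
        } catch (ArgumentException) {
            // CultureNotFoundException (.NET 4 and later) derives from ArgumentException.
            Console.WriteLine("ur-IN is not registered on this machine.");
        }
    }
}
```

    This one needs no reference to sysglobl.dll, since it only consumes the locale rather than building it.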

    But no solution is perfect....

    Sometimes it really still is about opening it all up and getting out of the way, as best as we can....

  • Sorting it all Out

    I think of them as American Express, internationally not so much

    • 4 Comments

    It all started in the beginning of June.

    Not the very beginning, mind you. Not the 1st of June. But just after.

    This "seemingly worthless and over precise detail" is an important plot point, so please keep it in mind.

    This India trip was happening, and there were going to be some Microsoft expenses on it. Several people figured as long as I was there anyway, why not take advantage? :-)

    So I sent a piece of email to get my American Express corporate card reactivated.

    It had been rendered somewhat inert, since I hadn’t had need to use it in the last year, by a company policy that said as much.

    But the same email I received explaining it was deactivated said that if there was a business need to reactivate it, I could tell them and they would reactivate it.

    Well it did, and I did, so I did, and they did.

    Everything was now all set.

    But on the day I was leaving, June 17th, I noticed a pretty gruesome fact.

    The American Express card itself (which I had not really looked at in quite some time) expired 6/10.

    Crap! The biggest part of the trip involving Microsoft expenses was happening in the July half of the trip!

    So I call them, a bit frantic, since I am leaving in a few hours.

    On the phone call they are at first confused about why I was never sent a new card, until it turns out that the piece of the American Express system that sends out the updated cards has a pretty significant bug: it runs on the first day of the month of expiration. And at that point, a new card was not needed since the account was inert.

    And no program or process existed to notice the changed status and the fact that the card was going to expire soon.

    They felt terrible, and told me so. It was clearly their error.

    But it was only the 17th; if I knew where I would be in India then they could just send the new card there!

    So I gave them the address to send it to (one of the Microsoft buildings in Bangalore that I knew I would be at and could pick up the card).

    No worries, right?

    Well….

    I went to the Tamil conference and had a wonderful time, and all of that was on other people; my only expense was the wireless – highway robbery at Rs. 500 for 50 MB, but since it was my only real expense for most of the week I didn’t worry about it.

    Though I did learn that connecting to the Exchange Server is a huge no-no in metered situations like this. It's a pig, a real pig, in such cases! :-)

    Anyway, the conference over, I head over to Bangalore, and the day before I get to that Microsoft office, I send some mail there and find out the card isn’t there.

    By this point it is the 24th/25th, so I am really panicking.

    I burn up quite a few rupees on the international cellular call because waiting a day seems like a bad idea, and the hotel will charge even on collect calls – the mobile will too, but it is actually less for collect on the mobile!

    They look at the records and assure me the card has been sent to me.

    To my address in Redmond.

    They look at the complicated record and can see the address in India it was supposed to be sent to. And they are confused that it was not sent.

    I cut off the apologies and expressions of regret and confusion in this AMEX "Sumimasen" exercise by pointing out I could not afford to hear them all on a prepaid cellular.

    But they assure me they will get me a new card ASAP.

    However, they explain, it is harder to send new cards to areas considered to be “high crime” (which they defined as "countries like Iraq, Iran, and India" – clearly the American Express definition of “high crime” is “countries that start with I” – lucky for me I wasn't going to be in Iceland or Ireland or Iskandar!).

    So even the DHL office in India that I literally would be passing each day in Bangalore was not good enough for AMEX – they had to send the card by courier. And it would take three days.

    After it didn’t arrive, the next person I was talking to couldn’t believe I was told three days – there was no way it would be less than five to a place like India.

    Oh yeah, a "high crime" area. Even though they had DHL offices, they were not trusted in India by American Express as a matter of policy.

    I wonder how DHL feels about the opinions of American Express here, I really do.

    At this point, with no way to access any other money due to my inactive Bank of America Visa account (BoA flagged my card for “suspicious” charges in another country, a situation I eventually got straightened out with them, but at the time I only knew the card was being declined) and an about-to-expire AMEX, I admit I was getting a little nervous.

    I suppose I should have had them next-day it to my manager in Redmond and had him IO mail the card to India, but I didn't think of this until much later.

    So I just waited. And hoped that the hotel wouldn't notice that the imprint of the card they took when I checked in expired soon. Having them call AMEX would be a challenge since I had to be on the phone at least five minutes to get to someone who saw the right part of my account with the details ("Replacement Cards") here. Stress, defined.

    The initial plans for Community TechDays (a quarterly series of events targeted towards Developers and IT Professionals across Microsoft technology focused User Groups in India) were pushed back until mid-July given the end of the fiscal year, and no one ever set up time with the other MVPs either. But that is a conversation for another day that is much more likely to be an internal MS communication so that Soma can know that I was not given "other opportunities or ways to leverage [my] presence in India with the developer community" as he had envisioned (long-time readers will remember how last time I was in India he set up the chance for me to speak to students at Anna University as I described in To Err is human, but to Geek is divine).

    For now, we'll just note that I had to find other work to do to fill my time. The time created by this whole AMEX debacle.

    It didn’t arrive on Wednesday (June 30th) or Thursday (July 1st) or Friday (July 2nd). I can’t leave Bangalore and am at this point having to start shifting around and even cancelling plans for the “vacation” parts of the trip since I can’t go to them, and doing extra presentations of interest to the folks at Microsoft Research in India -- mainly I think out of guilt.

    The courier wasn’t going to be delivering over the weekend, I was told.

    I couldn't leave Bangalore even though I could get to money now (Bank of America fixed their little overeager denial issue -- remind me in the future to tell my BANK when I have travel plans!). Because if I left I couldn't get my AMEX card that would be sitting here in the city I would have just left.

    And Monday was the Bharat Bandh, the huge 1-day strike in India to protest increasing fuel costs (see here for more info). The strike really hit Bangalore more than almost anywhere else in India (in Tamil Nadu they took the precaution of jailing the most well-known troublemakers so by report almost no one felt it there, for example), so it was the wrong place to be on Monday (though I had no other choice, obviously). I was stuck in the hotel all day that day.

    On the glass half full side it let me ride the iBot on the nearly empty streets of Bangalore, which was actually a lot of fun. Not that it was worth that many days of lodging expenses in the wrong city.

    Finally on Tuesday in the very early morning (for me; for them it was Monday) I get word from Courtney, who was subbing for Rachel, some of the people behind the AskAmex Twitter account who had been trying to assist me without requiring expensive phone bills on my prepaid mobile. Rachel was out that day but had left a note for someone to tell me when they got word of news on my case.

    The card arrived in India!

    Though not to the requested address.

    The courier they use, who cannot deliver on weekends, also apparently can only deliver to a bank.

    My movie-based mental definition of couriers who are armed and handcuff themselves to their briefcases and can get the goods anywhere and the American Express definition have some serious differences!

    So I burn an hour getting to the bank and back on Tuesday morning to pick up the card, with the cab driver who was going to take me to MSRI and wasn't entirely pleased with the poor quality of the directions the bank gave to their address.

    And then the card was finally in my hands!

    Though when I was initialing the log where the card was listed at the bank and saw it had a bunch of entries after it, I was suspicious. There were an awful lot of cards listed there for a Tuesday morning. Could there really be that many cards delivered to this small bank in so short a time?

    “When did this card arrive here at the bank?” I ask in a hushed tone. A missile silo kind of a hush, though really in my mind aimed somewhere in the US, not at this teller in the bank with no accessible entrance that required a security guard to carry me up the stairs to get in. Though that is a whole 'nuther story.

    She traces her finger in the log to the date column, and then says “June 22nd.”

    I have to look and check her work to be sure she got that one right. I even look at the entries around it to be sure.

    What the hell?

    Five days after the first card was supposedly sent?

    The one that supposedly was sent to Redmond and got me all worried that I would never get a card in time in the first place.

    There is no way that could have been anything other than that first card they sent, back on June 17th, because there was no way this could be the second card I asked them to send three days later on June 25th.

    At this point I had been required to give up plans to visit Kolkata (Calcutta) and Kerala (highly recommended by almost everyone who knew I would be in India) this trip though I would have liked to do one and maybe both of those things, and stayed grounded in Bangalore for at least six days longer than I intended.

    All due to a series of software bugs, process problems, inept phone support, lies, errors, and bad communications coming from American Express.

    That company with a representative that equated doing business in India with doing business in Iraq or Iran.

    I will technically be violating several Microsoft corporate policies for use of the AMEX since I was forced to charge items that I will ultimately claim as personal (since they were) but had no other means of credit at the time. I think my boss will understand so I'm not worried, but as expense reports go it's one I'd rather have my gums scraped than fill out, and there is at least a part of me that thinks it would be easier to quit and work for some other company than fill out the expense report from hell that became so devilish due to American Express.

    I’ll say that in the end and for the record, I have no authority to make decisions on behalf of Microsoft for its choice in corporate cards.

    I also have no authority over American Express and its ability to discipline or terminate its employees.

    Nor am I hankering for any of those responsibilities.

    But it is quite lucky for American Express that this is the case, or I would fire a bunch of them for dishonesty and/or incompetence, and send them packing as a corporate card until they proved they could do a much more open, honest, and professional job in international scenarios. With the exception of one employee -- Rachel Toledo of the AMEX Social Media Response Team, who I would call out for her nearly heroic efforts to keep me informed that proved it was just as bad on the inside to get things done as it looked on the outside -- I find the behavior of every person involved with the case to be so consistently awful that it is hard not to come to the conclusion of serious infrastructural issues that random chance in my case happened to expose.

    This is not mere bad luck -- there are real problems in design here of their overall system.

    No wonder they spend so much money on commercials in the US about getting people tickets to the best shows in town, how else can they distract from all their other issues?

    I suppose they will just shrug and blame it on me for not noticing the expiration date on the card I had not used in a year and do nothing about their own internal issues. I am easier for them to blame as they prove to live up to their name of American Express, and internationally, not so much.

    It cost me a bit of my vacation and some personal money and some of Microsoft's money, and I’ll live (I suspect Microsoft will too). But their mistakes could have led to much more dire problems and with no evidence that anything will change here, there is little I’d be willing to recommend about them.
