I was flipping through Carl Sagan's Contact the other day, and ran across a part I must have seen before but don't recall:
The austere marble lobby displayed, perhaps incongruously, a real statue--not a holograph--of a nude woman in the style of Praxiteles. They ascended in an Otis-Hitachi elevator, in which the second language was English rather than braille, and she found herself ushered through a large barn of a room in which people were huddled over word processors. A word would be typed in Hiragana, the fifty-one letter Japanese phonetic alphabet, and on the screen would appear the corresponding Chinese ideogram in Kanji. There were hundreds of thousands of such ideograms, or characters, stored in the computer memories, although only three or four thousand were required to read a newspaper. Because many characters of entirely different meanings were expressed by the same spoken word, all possible translations into Kanji were printed out, in order of probability. The word processor had a contextual subroutine in which the candidate characters were also queued according to the computer's estimate of the intended meaning. It was rarely wrong. In a language which until recently never had a typewriter, the word processor was working a communications revolution not fully admired by traditionalists.
Now the book first came out in 1985.
But it was largely imagined starting back in 1980, at that time thinking about a movie.
That movie didn't happen, and the project eventually became the book a few years later.
And then, it became a movie 12 years after that.
The term IME or Input Method Editor is mostly regarded as having been originally coined by/for Windows, though as far as I know everyone uses it now. It just struck me that this fictional description that predates even the most primitive incarnations of the technology is probably better, more accurate, and to be perfectly frank more engaging than just about every description of an IME I have ever seen in my life, and written in a way that presumed very little prior knowledge of Japanese or for that matter Chinese in doing so.
That's pretty impressive.
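For the curious, the ranking idea in that passage can be sketched in a few lines of Python. This is a toy and nothing more: the reading, the candidate characters, the probabilities, and the context boost below are all made-up illustrative values, and no real IME is anywhere near this simple.

```python
# Toy sketch of "candidates in order of probability, re-queued by context".
# All data here is hypothetical and exists only for illustration.

# One phonetic reading can map to several characters with different meanings.
CANDIDATES = {
    "はし": {"橋": 0.5, "箸": 0.3, "端": 0.2},  # bridge, chopsticks, edge
}

# Hypothetical contextual hints: (previous word, candidate) -> multiplier.
CONTEXT_BOOST = {
    ("ご飯", "箸"): 3.0,  # after "meal", chopsticks becomes the likelier guess
}


def rank(reading, previous_word=None):
    """Return the candidates for a reading, most likely first."""
    scores = dict(CANDIDATES.get(reading, {}))
    for kanji in scores:
        scores[kanji] *= CONTEXT_BOOST.get((previous_word, kanji), 1.0)
    return sorted(scores, key=lambda k: scores[k], reverse=True)


print(rank("はし"))          # no context: plain probability order
print(rank("はし", "ご飯"))  # context moves a different candidate to the front
```

The first call is the "order of probability" list from the book; the second is the "contextual subroutine" re-queuing the same candidates.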
Of course perhaps he was just describing a demonstration of something that he had been shown which, even if not yet on computers and not yet as sophisticated as modern IMEs are, was working well enough that the future possibilities were a part of the demonstration. But the words are still immortalized in this tiny corner of a book that was about something else entirely (though it was still along the lines of one of the book's common themes).
Most of that theme was lost in the movie that came out over a decade later, as were several other themes of language and culture that lent a lot of depth to the book. And of course they didn't have this scene either (I watched the movie last night to make sure they didn't slip it in passively, because even in 1997 it might have seemed pretty amazing to people, though only a text description could have done it real justice). Most of these themes were replaced by the throwaway "awful waste of space" imagery that just made me groan.
I've never met an intelligent woman who would fall for that kind of a line the way Jodie Foster's Ellie Arroway did, but maybe I'm not meeting the right kind of intelligent woman yet.
But I digress.
Anyway...no worries, though. The book is still there....
I'd like to point out that all of this blog today is my opinion -- it is not a strategic statement of intent for Microsoft. Please set your expectations and interpretations accordingly....
It started on July 29, 2010, in 4 out the door, in both 32 & 64 (aka What Irish, Malay, Maltese & Bengali have in common) -- where I mentioned the release of the 32bit and 64bit versions of the Language Interface Pack for Bengali (India).
And then it continued on January 17th, 2011, in The Bangla LIP is out, only 5½ months after the Bengali LIP! -- where I mentioned the release of the 32bit and 64bit versions of the Language Interface Pack for Bangla (Bangladesh).
It culminated at the end of that second blog where (in true pot stirring form) I mentioned:
Perhaps in some future blog I may contrast the two LIPs, if people would perhaps find that interesting....
and in the comments where many people made it clear they were taking me up on the offer. :-)
Now the general issue of deciding to bifurcate a localization for two or more markets that speak the same language differently is something I have talked about countless times before, especially relating to customer satisfaction for English, Spanish, and Arabic. Especially when one considers the issues of mutual intelligibility across these many places. We don't really do any of those, though.
Now we do split Simplified Chinese and Traditional Chinese, though. There is simply some threshold of differences combined with threshold of market sizes that plays into that one.
And if you consider About that Portuguese localization question... and About that Portuguese localization question, redux..., I have clearly discussed the one other big significant case where the bifurcation took place -- between Brazilian and European Portuguese (note that it too involved definite market pressures and formal usage research).
One more article I need to bring into the mix here, since all of the language issues I mentioned so far relate to full Language Pack SKUs, is Why one LIP and not another?, which will help provide some of the additional context of LIP-type language decisions.
First, I'll add one explicit fact that was buried a little implicitly in Why one LIP and not another?, previously.
The truth is that there are some individual cases where the people who were asking for a LIP (whether Public Sector in Microsoft or a government) might have no real interest in language quality, whatsoever. Maybe someone is just trying to "sign a deal", maybe someone else is trying to "prove they care about a language" -- so perhaps to them it is just a line item, and if we shipped a brick with the language inscribed on it then it would meet *those* needs.
But the localization team tends to have higher standards than that, and of course the reason they care is because they know the customer will care too. A lot.
Let me introduce one more blog before I start talking about Bengali/Bangla: A way to say "this is who I am, where I am, what I think is wrong, and why" ?. It is right there that the problem of actionable feedback really comes into play.
Because it is much more difficult to respond to feedback that is neither directly actionable nor at least interpretable by others.
Now taking into account my own interest in Tamil¹, my own interest in Bengali/Bangla², and a lot of feedback on other LIPs targeting languages of South Asia, I am willing to go out on a limb and say that although translation quality has improved in both languages (in some ways significantly), there remain issues (of the localization quality type I described in A way to say "this is who I am, where I am, what I think is wrong, and why" ?).
Combined with the general lack of actionable feedback, it can be hard to address this problem, which is one of the reasons that it takes a while to do.
Even people who are unhappy with Bengali or Tamil or Kannada or Marathi in Windows 7 will readily admit that the quality is better than the initial LIP release in XP and/or Vista.
But if Microsoft has to tease the bugs out of the people who are saying something is wrong, it can take even longer than it would if we could just give up and go with what we had.
Okay.
Think of all of the above as introduction. :-)
When it comes to Bengali/Bangla, there are three issues I am going to cover here:
It would make a lot of sense for me to emphasize again that this is all my opinion -- it is not a strategic statement of intent for Microsoft. So please consider yourself reminded!
Now the first point is easy -- with more than 200 million speakers who consider it a first language across several countries (a not insignificant percentage of whom do not know English), it clearly makes sense to consider it for localization.
The second point is a bit more complicated.
A bunch of feedback had been coming in about how the Bengali (India) LIP was inadequate for Bangla-speaking computer users in Bangladesh.
Not a roar but a dull sense of dissatisfaction and frustration that one can expect from people who may not need the product for their own understanding (since they know English) but would need it to bring it to others (who may not, or at least not as well).
The issue can in theory be similar for Tamil though they tend to bring more passion and anger to language issues. So you can assume the volume will be higher, even if the message is just as hard to understand or harder!
It simply became more and more obvious as problems continued to come in as feedback that there was an honest feeling of inadequacy within the market, of a sort that could perhaps only be met by having localizers who were more attuned to that market on the job.
When one considers that the majority of those > 200 million people are in Bangladesh, it becomes even more interesting to consider. And consideration turned to action!
The third point is perhaps not as directly interesting, though as near as I can gather it would seem that while the Bengali (India) LIP is in its third version, the Bangla (Bangladesh) LIP is v.1. And the feedback that would come in was not as actionable. In essence, the biggest part of the delay seemed to be due to the fact it took longer for people to be comfortable with terminology choices and phrasings and usage, and to get the appropriate feedback when there were bugs to be fixed in these items.
But let me loop back to the second point for a moment.
Even in India, the language is actually known as Bangla, not Bengali, by almost all of the native speakers you might ask.
In fact, I may be wrong but I think calling it Bengali is yet another Britishism (there would be a surprise -- yet another problem in South Asia we could ultimately blame on the British!).
When one considers the fact that the language name was the first and most visible problem that Bangladeshis saw in the India-bound LIP, an important question comes up about whether this difference is typical of the many other differences that were imputed and suggested.
Because if it is, that would suggest that the differences between the two LIPs are ones that would really be better served if it were one single LIP with those Bangladeshi changes as the base.
So, now my challenge to all of you who read this blog who are in West Bengal or Assam is to:
I may not be correct here, but I think native speakers in India are the ones who can say for sure whether these differences are in fact ones that would benefit all Bengalis or not.
If the feedback is that both sides are right -- that some changes belong in both places but some genuinely are differences in the two markets -- then perhaps that would suggest that at a minimum the translation of one could benefit from a bit of review of the other!
I figure the first step is to ask the question. So, in the hope that a few of those 55 million Bangla speakers in India are reading this or have been pointed to it by friends: what do you think of the two language packs, compared?
And yes, I think both locale names in English should be updated to Bangla in future versions of Windows either way.
1 - Based on my decade of involvement with it, and the several native speaker friends I have.
2 - Based on the woman I dated, the several native speaker friends I have, and my desire to be able to read Tagore as he wrote.
THE WINDOWS 7 KYRGYZ LANGUAGE INTERFACE PACK IS LIVE!
Click here to download the Kyrgyz Windows 7 LIP via the Microsoft.com Download Center.
Please note that the Kyrgyz Windows 7 LIP can only be installed on a system that runs a Russian client version of Windows 7.
It is available for both 32-bit and 64-bit systems on the Download Center.
The Kyrgyz Windows 7 LIP is produced as part of the Local Language Program sponsored by Public Sector.
A LITTLE BACKGROUND INFORMATION ON KYRGYZ
NUMBER OF SPEAKERS:
3 million speakers
NAME IN THE LANGUAGE ITSELF:
кыргыз
Kyrgyz is (together with Russian) the official language of the Central Asian Republic of Kyrgyzstan. In Kyrgyzstan, the percentage of people claiming Kyrgyz as their first language dropped from 67% to 44% in the years between 1926 and 1970 due to the large influx of non-Kyrgyz speakers. After Kyrgyzstan became an independent nation in 1991, an aggressive language policy tried to reverse the trend and re-"Kyrgyzify" the country.
Outside of Kyrgyzstan, Kyrgyz is spoken in neighboring countries such as China (150,000 speakers, mainly in Sinkiang-Uighur Autonomous Region), Mongolia, Kazakhstan, Tajikistan, and Uzbekistan.
Kyrgyz is an agglutinative language: Suffixes are added to fixed stems to generate meaning. The word order in sentences is Subject-Object-Verb, and therefore Kyrgyz has postpositions rather than prepositions and relative clauses precede the verb.
FUN FACTS:
• There has been some confusion around the terms Kyrgyz and Kazakh. Kazakh has been referred to as Kirghiz, Kirghiz-Kaisak, or Kazakh-Kirghiz while Kara-Kirghiz or Kara-Kyrgyz (black Kyrgyz) was the term used for what we call Kyrgyz. Also, in Turkish, Kyrgyz is known as Kırgız Türkçesi (Kyrgyz Turkish).
• By only allowing Russian as a base language, they avoid several of the issues discussed in Майкрософт vs. Microsoft, aka On choosing the most reasonable inconsistency.
Click here for more information about the Kyrgyz language
CLASSIFICATION:
Kyrgyz is a member of the Kipchak group of languages which belongs to the Turkic languages (which also include Azerbaijani, Tatar, and Uzbek). Its closest relative is Kazakh.
Click here for more information about Kyrgyz classification
SCRIPT:
In Kyrgyzstan, Kyrgyz has been written in a modified Cyrillic script (with 3 special characters) since 1940 (while in China a modified Arabic script is used). Until 1923 an Arabic script was used in Kyrgyzstan and after, from 1928 to 1940, Kyrgyz used the Unified Turkic Latin Alphabet. There were plans to return to the Latin alphabet in the early 1990s, but these were never implemented.
Click here for more information about the Kyrgyz script
Enjoy!
So, every now and again we have a bug in a function that is part of Windows.
You may have run across this sort of thing in the past.
Now there are times that one can't convince people that the scenario that causes the bug is reasonable, so that even if they acknowledge that the problem exists, they don't feel it makes sense to fix.
The bugs in Excel and .Net described in Seeing double? You're not drunk; you're just running pseudo! (aka Announcement: Pseudo Day!) are an example of this kind of problem. The fact that a particular locale (pseudo, in fact!) hits problems that any user can repro by customizing some of the Regional and Language Options settings doesn't bother the appropriate owners of Excel or .Net.
That bug is not the subject of today's blog.
Today I'm going to talk about another bug, one found in GetTimeFormat and GetTimeFormatEx. One which has existed in Windows for over a decade. And one that you can see with an existing locale, in a very visible part of the user interface.
To see the bug, just switch to any one of the German locales, and change your time formats to one of the optional formats that includes a ' Uhr' literal in it:
The approved syntax for a string literal within a format string is to enclose the string in single quotes -- which is how these formats are designed, for German. My German is pretty rusty but I think Uhr means "time" or maybe "clock" or something like that -- the way in English we might say "O'Clock". A German speaker whose knowledge is not based on Yiddish (which is really just 16th century German with Hebrew letters) should feel free to correct/clarify that point!
The bug then shows up in any call to GetTimeFormat or GetTimeFormatEx with the TIME_NOSECONDS or TIME_NOMINUTESORSECONDS flags.
While the straight time format will be something like
11:04:27 Uhr
the format with TIME_NOMINUTESORSECONDS will be
11 11r
and the format with TIME_NOSECONDS will be
11:04 11r
Now it is pretty easy to see what is going on here -- the fact that ' Uhr' is a literal is ignored, and the fact that the 'h' when not treated as part of a literal has a specific meaning in the format string causes the hour to be inserted.
Of course the time in the system notification area (aka the "system tray") goes through this code path, as do other fun spots like the time in the CMD "dir" command.
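To make the point about literals concrete, here is a minimal Python sketch of what literal-aware handling could look like. It is emphatically not the Windows implementation: tokenize, strip_fields, render, and format_time are names I made up for illustration, and the drop argument is only a stand-in for what TIME_NOSECONDS and TIME_NOMINUTESORSECONDS ask for. The one idea it shows is that quoted text gets set aside as a literal before anyone goes looking for h, m, or s.

```python
import datetime


def tokenize(picture):
    """Split a time format picture into (kind, text) tokens.

    kind is 'lit' for quoted literals, 'fmt' for runs of h/H/m/s/t,
    and 'sep' for anything else (separators such as ':' or ' ').
    """
    tokens, i, n = [], 0, len(picture)
    while i < n:
        ch = picture[i]
        if ch == "'":
            # Quoted literal: copy verbatim up to the closing quote.
            # A doubled quote inside the literal stands for one quote.
            i += 1
            buf = []
            while i < n:
                if picture[i] == "'":
                    if i + 1 < n and picture[i + 1] == "'":
                        buf.append("'")
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(picture[i])
                i += 1
            tokens.append(("lit", "".join(buf)))
        elif ch in "hHmst":
            j = i
            while j < n and picture[j] == ch:
                j += 1
            tokens.append(("fmt", picture[i:j]))
            i = j
        else:
            tokens.append(("sep", ch))
            i += 1
    return tokens


def strip_fields(tokens, drop):
    """Remove the format fields named in drop, plus separators before them."""
    out = []
    for kind, text in tokens:
        if kind == "fmt" and text[0] in drop:
            while out and out[-1][0] == "sep":
                out.pop()
            continue
        out.append((kind, text))
    return out


def _num(value, run):
    # A doubled letter (HH, mm, ss) means "pad to two digits".
    return f"{value:02d}" if len(run) > 1 else str(value)


def render(tokens, t):
    parts = []
    for kind, text in tokens:
        if kind != "fmt":
            parts.append(text)  # literals and separators pass through as-is
        elif text[0] == "H":
            parts.append(_num(t.hour, text))
        elif text[0] == "h":
            parts.append(_num(t.hour % 12 or 12, text))
        elif text[0] == "m":
            parts.append(_num(t.minute, text))
        elif text[0] == "s":
            parts.append(_num(t.second, text))
        else:  # 't' or 'tt'
            marker = "AM" if t.hour < 12 else "PM"
            parts.append(marker if len(text) > 1 else marker[0])
    return "".join(parts)


def format_time(picture, t, drop=""):
    return render(strip_fields(tokenize(picture), drop), t)


t = datetime.time(11, 4, 27)
print(format_time("HH:mm:ss' Uhr'", t))             # full picture
print(format_time("HH:mm:ss' Uhr'", t, drop="s"))   # no seconds
print(format_time("HH:mm:ss' Uhr'", t, drop="ms"))  # no minutes or seconds
```

Because the ' Uhr' token is tagged as a literal up front, the field-stripping step never gets a chance to mistake its 'h' for an hour insert.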
Related bugs can be found with custom formats involving literals containing m or s as well as h, but given the issues raised in Seeing double? You're not drunk; you're just running pseudo! (aka Announcement: Pseudo Day!) I'm going to focus on the time formats in actual shipping locales....
And they can all show this terrible, weirdly formatted time string that has no excuse for the poor parsing job that it does, and has been doing for over ten years now.
Even after the refactoring work they did in the most recent version, described in We do seem to be short on time... (Windows 7 edition).
Okay, it is technically not a regression.
I would hardly recommend fixing it in a hotfix, or a GDR, or a service pack.
But I would never leave it broken this way in any function I owned for a major version....
Would you?
If not, then at a minimum the documentation should clearly warn against TIME_NOSECONDS and TIME_NOMINUTESORSECONDS since they do not work properly with string literals that contain format picture inserts. Punishing a locale for its letter choices is just unseemly....
The other day, colleague Gwyneth asked:
Is there a date mask for SSUPERSHORTDAYNAMES?

NLS Name               Date Mask Value
SDAYNAMES              dddd
SABBREVDAYNAMES        ddd
SSUPERSHORTDAYNAMES    ?

Thanks,
Gwyneth
The quick answer here is simple.
No.
And the slightly longer answer is also pretty straightforward:
No, there is no date mask that will make use of the LOCALE_SSHORTESTDAYNAME* Constants.
Of course the slightly longer answer is kind of candy-ass, and on top of it there is a prescriptivist passive/aggressive style correction on the field in question.
Kind of obnoxious of me, I'm sorry about that.
How about I give the full answer now?
Background:
Sometimes it can be hard to remember the original reasons that a feature was implemented in Windows, even though those very reasons may have guided the way that the feature was designed and any limitations thereof.
Now we don't make the situation any easier by not really documenting any of this, either. In essence we leave the average developer to look at the design and guess what we had in mind.
And as a special non-bonus, if a developer wants to design something that the implementation doesn't allow, they get to be really frustrated at being left out of the "blessed locus of supported scenarios."
For the less snarky answer, you can look to blogs of mine like Size matters (when it comes to day names, at least).
The full answer:
By most any honest, objective measure, the attempt to take the LOCALE_SMONTHNAME* Constants and the LOCALE_SDAYNAME* Constants and provide shortened versions of them via the respective LOCALE_SABBREV* Constants was and is an unmitigated disaster.
Because there are a number of languages that do not tend to, as a matter of course, abbreviate such words. Thus many of the providers of locale data for those specific locales returned the same values for the abbreviated month and day names as they did for their full brethren and said we do not abbreviate those terms.
But this situation does not mean there was no scenario in mind -- there was!
I mean if you are building a little calendar, for example:
then you must have a shorter name for the days if you want them to fit in this kind of grid.
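A rough Python sketch of that calendar scenario, using the standard calendar module. The month_grid helper is my own invention for illustration, and the Hungarian names are simply the ones quoted in this post, hard-coded here instead of being read from any locale data.

```python
import calendar

# Monday-first shortest day names for Hungarian, as quoted in this post.
# Hard-coded for illustration; a real app would ask the OS for them.
HUNGARIAN = ["H", "K", "Sze", "Cs", "P", "Szo", "V"]


def month_grid(year, month, names):
    """Return a text month grid whose columns are as wide as the longest name."""
    width = max(2, max(len(name) for name in names))
    cal = calendar.Calendar(firstweekday=0)  # 0 means Monday first
    lines = [" ".join(name.rjust(width) for name in names)]
    for week in cal.monthdayscalendar(year, month):
        # Days that belong to a neighboring month come back as 0.
        lines.append(" ".join((str(day) if day else "").rjust(width) for day in week))
    return "\n".join(lines)


print(month_grid(2011, 1, HUNGARIAN))
```

The column width is driven entirely by the longest day name, which is why full names (or abbreviations that are not really abbreviated) wreck this kind of layout.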
So we had some meetings.
And in those meetings we had some conversations.
Out of those conversations we had a spec for seven new LCType values, whose original spec'ed names were the LOCALE_SONELETTERDAYNAME* constants.
We were sending a firm message to everyone providing locale data that these fields were gonna be small.
If languages like Hebrew wanted to have day names like יום א and יום ב (compared to their full day names like יום ראשון and יום שני), they were going to see us truncate their day names and each of them would look identical. And they would look really dumb.
And it was by this kind and gentle method of brute force persuasion that we forced everyone to consider this bold new scenario of the calendar view when they came up with a very very very short day name.
Eventually we started calling them two-letter day names because lots of people didn't even like
S M T W T F S
for English due to the potential Saturday/Sunday and Tuesday/Thursday confusion.
These people who were complaining were forgetting about the scenario though -- no one would confuse the meanings in the context of a calendar!
But we gave in -- first calling them LOCALE_STWOLETTERDAYNAME* constants and then finally the LOCALE_SSHORTESTDAYNAME* Constants.
For some locales a few of them are even three letters, like Hungarian's
H K Sze Cs P Szo V
Though it is of note that both Cs and Sz are unique sort elements in Hungarian, so these were being used more because that is how Hungarian users think of letters than for disambiguation as was the case in English.
As an interesting corollary, a part of our hardcore sell and push was pointing out that we had wanted to use the LOCALE_SABBREV* Constants for this purpose but too many locales weren't cooperating. At this point, LOCALE_SABBREVDAYNAME1 had an implied meaning of "abbreviate if you ordinarily would" that no one wanted to change. Many locale data owners actually ignored that implied meaning and updated their abbreviated day names to match their shortest day names!
For the record, we neither encouraged nor supported such a change, since when it was done it amounted to a change in the semantic of the LCType, which is exactly what we were adding a whole new type in order to avoid. Though some locales did it anyway (like Hebrew).
I described this phenomenon previously in They pushed out of the formatt[ing|er].
Eventually, all of this led to an important question that the original spec did not address because we were all so focused on our bloodthirsty imperialistic scenario that no one thought of it:
What token should we use if we want to put these "one letter day names" in date format strings?
We had a bunch of rules for how dates and months were grabbed already, as in this table:
Symbol  Description                                     Example
M       Displays month with no leading zeros.           1/1/06
MM      Displays month with leading zeros.              01/1/06
MMM     Displays month abbreviation.                    Jan
MMMM    Displays full month name.                       January
d       Displays day of month with no leading zeros.
dd      Displays day of month with leading zeros.       01/01/06
ddd     Displays abbreviation of the day of the week.   Sun
dddd    Displays full name of the day of the week.      Sunday
We were clearly making use of items that mapped to LOCALE_SDAYNAME1 and LOCALE_SABBREVDAYNAME1 with dddd and ddd, respectively.
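To see why there was no obvious slot left for a new token, here is a small Python sketch of how run length picks the output for d and M. It is an illustration, not the NLS code: format_date is a hypothetical helper, the English names are hard-coded, abbreviations are faked by taking the first three letters, the y handling is my own addition so the examples can show a year, and quoted literals are not handled at all.

```python
import datetime
import re

DAY_FULL = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]  # indexed by date.weekday()
MONTH_FULL = ["January", "February", "March", "April", "May", "June", "July",
              "August", "September", "October", "November", "December"]


def format_date(picture, d):
    """Expand runs of d, M and y in picture; everything else passes through."""
    def expand(match):
        run = match.group(0)
        ch, n = run[0], len(run)
        if ch == "d":
            if n == 1:
                return str(d.day)
            if n == 2:
                return f"{d.day:02d}"
            name = DAY_FULL[d.weekday()]
            return name[:3] if n == 3 else name  # ddd vs. dddd (or longer)
        if ch == "M":
            if n == 1:
                return str(d.month)
            if n == 2:
                return f"{d.month:02d}"
            name = MONTH_FULL[d.month - 1]
            return name[:3] if n == 3 else name  # MMM vs. MMMM (or longer)
        # Year: y or yy gives two digits, anything longer gives the full year.
        return f"{d.year % 100:02d}" if n <= 2 else str(d.year)

    return re.sub(r"d+|M+|y+", expand, picture)


day = datetime.date(2006, 1, 1)
for picture in ("M/d/yy", "MM/dd/yy", "ddd", "dddd, MMMM d"):
    print(picture, "->", format_date(picture, day))
```

Every run length from one to four d's is already spoken for, which is the corner the spec found itself in.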
The program manager who owned the spec rattled off ideas we quickly shot down, from ddddd to changing the meaning of d or dd, before we went back to the original scenario and said these are for calendars, not for date formats.
This time, lacking an elegant technical solution (everyone hated the alternatives), this firm line in the sand stuck.
And now we are here, with no way to represent the shortest day names in format strings....
I'll bet many of you are wishing you had just taken the short answer now!
THE WINDOWS 7 MĀORI LANGUAGE INTERFACE PACK IS LIVE!
Click here to download the Māori Windows 7 LIP via the Microsoft.com Download Center.
Please note that the Māori Windows 7 LIP can only be installed on a system that runs an English client version of Windows 7. It is available for both 32-bit and 64-bit systems on the Download Center.
The Māori Windows 7 LIP is produced as part of the Local Language Program sponsored by Public Sector.
A LITTLE BACKGROUND INFORMATION ON MĀORI
100,000+ speakers as their first language, plus many additional speakers who can hold a conversation in Māori but don’t consider themselves fluent in the language.
Reo Māori
The Māori language is almost exclusively spoken in New Zealand, mostly by people of Māori descent. The Māori language was recognized as the second official language of New Zealand in 1987 (the other being English). It was the predominant language in New Zealand until the 1860s, then became a minority language in the shadow of English. In the 1970s concerns about the decline of Māori initiated a multitude of activities to support the survival and growth of the language. One of the language recovery programs is the Kohanga Reo (“language nests”) movement in which language immersion centers were established for pre-school children. Over 13,000 children have been enrolled in more than 700 Kohanga reo centers throughout New Zealand. There are fewer primary and secondary schools that continue the language exposure, but Kura Kaupapa, a primary school program in Māori, is a first step to address that issue.
With ten consonants and five vowels Māori has the richest phonological inventory of all East Polynesian languages, and the number of its speakers exceeds that of any other Polynesian language. Māori is the most southerly member of the East Polynesian branch of the Austronesian family. It is most closely related to Tahitian and Rarotongan. In the 1820s a phonetic script was developed for Māori in Cambridge. Long vowels are marked with a macron: ā ē ī ō ū.
Click here for more information about the Māori language
CLASSIFICATION:
Comparative linguists classify Māori as a Polynesian language; specifically as an Eastern Polynesian language belonging to the Tahitic subgroup, which includes Rarotongan, spoken in the southern Cook Islands, and Tahitian, spoken in Tahiti and the Society Islands. Other major Eastern Polynesian languages include Hawaiian, Marquesan (languages in the Marquesic subgroup), and the Rapa Nui language of Easter Island.
Māori is always written in a Roman script. Two digraphs are used: “ng”, which represents a velar nasal, and “wh”, which is nowadays mostly pronounced like an English /f/.
MICROSOFT-SPECIFIC INFORMATION:
MICHAEL-SPECIFIC INFORMATION:
As the title mentions, New Zealand is one of my five favorite places in the whole world. Many years ago, I had my (comparatively younger) heart broken in Melbourne (Australia), but a beautiful lady in Christchurch (New Zealand) helped me move on, and smile again....
And yes, for those keeping score linguistically, the title of this blog is kind of a garden path sentence!
Sometimes really useful documentation is created.
By Microsoft, I mean.
I mean, people never read it -- but that does not stop it from being useful sometimes!
Like this page containing the Windows 7 Keyboard Shortcuts.
But (and you knew there had to be a but in there, didn't you?) these are hard documentation pages to localize.
Because the shortcuts themselves are localized in some cases but not in others, and the people doing one kind of localization are not the same as the people doing the other kind. And there seems to be a marked lack of communication between them, or means to communicate what is going on.
Thus when you look at the Spanish version of that Windows 7 Keyboard Shortcuts page, you run into problems. For example, to save a document in English you would use CTRL+S (as in "Save") but in Spanish it would be CTRL+G (as in "Guardar").
But that Spanish page still lists CTRL+S for Save, in the otherwise properly localized text.
Now multiply this one mistake by many other shortcuts, and then multiply that by many other languages.
You now have a lot of different bugs, without a good mechanism or architecture to support correct descriptions.
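Just to sketch what one such mechanism might look like (in Python, with every name below being hypothetical and the data limited to the single Save example above): if both the product and the documentation pulled shortcuts from one per-language table, a page could not quietly keep the English keys, and a missing entry would fail loudly instead of shipping.

```python
# Hypothetical single source of truth shared by UI and docs.
# Only the Save example from this post is filled in.
SHORTCUTS = {
    "save": {"en": "CTRL+S", "es": "CTRL+G"},
}
LABELS = {
    "save": {"en": "Save", "es": "Guardar"},
}


def shortcut_for(action, lang):
    """Look up the shortcut for a language, refusing to fall back silently."""
    per_lang = SHORTCUTS[action]
    if lang not in per_lang:
        raise LookupError(f"no reviewed shortcut for {action!r} in {lang!r}")
    return per_lang[lang]


def doc_line(action, lang):
    """Build one line of a shortcuts help page from the shared table."""
    return f"{shortcut_for(action, lang)}: {LABELS[action][lang]}"


print(doc_line("save", "en"))
print(doc_line("save", "es"))
```

The design choice worth noticing is the refusal to fall back to English: a silent fallback is exactly how a Spanish page ends up listing CTRL+S.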
Of course over in Office this particular problem doesn't appear to exist on their analogous keyboard shortcuts documentation pages, though Office may have other breaks along the same lines. As might other products, too.
This ends up being a pretty huge item to clean up, of course. Especially when you consider that the centralized help pages and the plan to localize them were not really all decided at once, and the mechanisms to fully support everything weren't there (which makes solving it in documentation even harder).
I suppose someone needs to fix this.
Though I wonder why it is not reported more often by customers who speak these languages. I mean it has been reported by one person, a developer trying to write a Windows Phone 7 app showing people all the shortcuts, but wouldn't you expect user complaints too? Lots of them?
Sigh.
I suppose we're lucky sometimes that no one reads the documentation....
ABSOLUTELY NOTHING TECHNICAL. All of what happened that is described in this blog you are reading occurred either before, during, or in the time following Alaska Airlines flight 613 from Las Vegas to Seattle on January 9, 2011. It represents something that I'll call a failure of sorts, but only because I attached more to the situation than might have been there. Several friends have suggested that I am being a little too hard on myself, but that is, in the end, how I roll.
CES was over, and the AEE was over. The exhibits and the meetings were over. The various parties and after parties? They were over too.
For now I was in the Continental President's Club, even though I was not flying on Continental. It was some deal Alaska Airlines had made since I was flying first class.
Plugged in and charging, I worked on my laptop to try to catch up on mail.
I lost track of time and started packing up to head to the gate almost 30 minutes later than I wanted to, and when I reached the gate, it was empty. By the door to the jet-way there was one Alaska Airlines employee who saw me and said "you must be Michael Kaplan."
She smiled and told me not to worry, there was one other person they were waiting on, so I still had time.
Just outside the plane, the flight attendant admired me moving from two wheels to four, and she said "Wow, that's like the Cadillac of wheelchairs, isn't it?"
I smiled and agreed. She helped get my laptop on the plane.
I was in 1F. My preferred seat -- I like not having to get up out of the way of the person in the window heading to the bathroom, and I like the extra leg room that the front row provides. It allows me to be much more nimble once the plane lands!
Next to me, in 1D, was a woman.
Now despite what regular readers might think after Better than an elevator friend, but... and Better than a single serving friend!, it is seldom that the person sitting next to me on the plane is female, even less often that she is a female that strikes me as attractive, and it's lottery-winning odds that I have any substantive interaction anyway.
So I didn't think of this as any kind of special opportunity.
Truth be told I was looking forward to several hours of sleep on this particular flight. After less than six hours of sleep over the previous three days, I needed it.
It turned out that the one other person we were waiting for was someone who got off the flight to get something and promised he'd be back in time for the flight. His wife was still on the plane and they had come on late, so their bags were stuck in various random overhead bins. And the wife was now making noises about getting off the plane, so they had to find all her bags.
I made an offhand comment about this man and his wife to the woman next to me, and she smiled.
At this point, I knew absolutely nothing about this woman other than the fact that I loved her smile. :-)
Unsure how to really inspire random smiling, I was grateful when she turned out to be very curious about me!
Between my wild tales of parties and of "the other conference" and so forth, she had so many random questions, which led to things I would say which led to things she would respond to that we were talking for pretty much the entire flight.
Serious chemistry was happening in the first row of flight 613 that night, let me tell you!
When we were nearly ready to land I suddenly realized that she had no idea I was in a wheelchair, and there was no way to follow her to baggage claim since I had to wait for the iBot. So I broke the news about it, suddenly dreading the reaction for the first time in I don't know how long.
No need to worry, she didn't even blink. "So that was your wheelchair the flight attendant was talking about?" she inquired, smiling.
I asked her if she had seen Fight Club and she had. So I told her I had to wait for them to bring the chair up, so if we never saw each other again then she was a stellar single serving friend. But it would be really nice to see her again.
She smiled again. "You have a bag checked, right?"
Yes, I do.
"Well then don't worry, I'll see you in baggage claim!"
Okay, everybody filed off the plane. The two flight attendants sitting in the front of the plane came back, and one of them was clapping.
"That was quite a performance you two had going on!" she laughed.
I quipped "That lady who was sitting next to me? I think tonight you met my next ex-girlfriend!" thinking about the TV show NCIS and the way DiNozzo approached the marriages that Gibbs occasionally had.
"Really?" the flight attendant asked.
"Well, maybe just friends. Chemistry doesn't always lead to physics, after all."
Both flight attendants laughed.
Everyone was off the plane now. This was the longest time baggage handling had ever taken, including the time they broke my scooter in Germany.
I asked about where the iBot was; I was really getting nervous. "She isn't going to stay all night after all the baggage is collected and everyone leaves," I worriedly stated.
It was unfathomable that Alaska Airlines baggage handling was salting my game, all while the Alaska Airlines flight attendants were rooting for me. But there it is.....
"She'll still be there. We saw how she was looking at you" one of them says.
"Maybe she was meant to be something at some point, and baggage handling is mucking with our relationship mythology," I opined.
They smiled and one of them called down again.
Finally, the iBot came. I made sure it was working and rolled as fast as I could to baggage handling. Of course the belt was not moving, and no one was there.
Though when I fully rounded the corner, I saw her. There was that smile again....
My bag wasn't on the belt though we found it pretty quickly. And then we just started talking, once more.
Almost like a second date or something!
It was just about midnight, too late to pick up the light rail since no bus would be running to Redmond til morning. So I realized I'd be taking a cab.
I gave her my card and told her my alias (she's a Microsoft employee too), and suggested we continue this conversation some time soon....
It wasn't until I was in the cab that the warm buzz of the chemistry wore off and I realized that I didn't have her alias. Or her last name. And even though she said it earlier in the evening, I was drawing a blank on her first name, too! With a first name, knowing her group can make finding her easy, but without it there is no non-creepy way to find her....
Crap, that meant I couldn't even send the polite "thanks for the entertaining flight to Redmond" mail that I would have sent to almost anyone I flew with under these circumstances.
No worries, I thought. She has my name and my email and my twitter and my phone number and my blog (all on the business card), so it'll all work out, I reassured myself.
You can guess how this story ends, right?
I never heard from her.
That mystery woman who works at Microsoft with the enchanting smile that I clicked with on a flight to Redmond never made contact after that night.
I knew she wouldn't make contact sooner than 3 days (or maybe even 10 days following my friend Stacy's ultra-careful rule). But I wasn't banking on never. Even writing this blog is me giving up on ever finding her in a way that wouldn't make me really creepy....
Taking a brief trip through my neuroses, a myriad of potential reasons for the lack of contact pop up, each of which has happened to me at least once before:
Maybe I have the wrong take on it, though. The wrong take entirely.
Perhaps she thought that I wouldn't have been such a dork that I'd forget her name (knowing a team and a first name of a woman at Microsoft is usually more than enough to track her down). I was the one who ended the evening, maybe she took that to mean something. The fact that I know I would have sent that email suggests that she knew I would -- so perhaps the fact that I didn't is something that she took as a lack of interest on my part. Her whole version of the story might have culminated in my apparent eventual lack of interest after seeming to be very interested!
This is all pretty unlikely, but guys can work through scenarios and rationalize them in all kinds of hopeful ways. Maybe she will be reading this blog and be torn between being either much more interested or hella less so.
And maybe, just maybe, she's wondering how I had managed to turn myself into a single serving [boy]friend....
Now just as a by-the-way, I am aware that the title of this blog might at first glance be considered specious by people who both know who Gallagher is and who know about the long history of the Srivasian "laugh" pun. I will explain why it isn't at the end of this blog.
For many years, Sinnathurai Srivas has been trying to convince whoever he can that Unicode is missing out on some fundamental issues by not encoding two different pronunciations of what visually is the same letter, and how this impacts so many future technological problems that Tamil would be hitting.
He said it differently than this -- for example talking about how Tamil uses a scientific system that Unicode ignores, and going down the rathole of pointing out how the way Unicode encodes things would cause actual meaning to change since only one of two possible options exist for encoding. His favorite example was that sometimes one means ksh, other times one means x, and by only encoding it one way we are unscientifically limiting the language.
Or he'd put it in a slightly different and more "native" way and point out that "In Tamil sri mean laugh. In Tamil sRi is a religious symbol" to imply that putting these two words and making them the same is an insult -- laughing at a religious symbol, essentially.
People would point out that this is not a problem that Unicode tries to solve, that we have both Polish as someone from Poland and polish as something we use to make something smooth/clean. And even the fact that the rules in English can cause the latter word to be capitalized (e.g. when it's at the beginning of a sentence) does not make anyone want to treat the two forms of "o" as different letters in Unicode.
If we want to say the sentence "Hitler didn't polish off the Polish" then we are okay doing that, and that is how the language works.
This example not only takes a more formal and less formal word example, but by invoking Hitler there was hope it might end the conversation. Sadly, this was not the case....
If we were funnier we'd dig up an old Gallagher bit talking about how stupid English is, like this one:
But no one really thought of that, I guess. Or thought of mentioning it, at least.
This approach may one day lead (or may have already led) to a Tamil comedy routine along similar lines. If you are a Tamil comedian and choose to do this, make sure to thank me (or at least Gallagher) in the liner notes!
What he was trying to express was his own subconscious frustration over the fact that Tamil does not have (and has never had) a 100% correspondence between graphemes and phonemes -- i.e. one sound per letter shape. Few languages have this (Latvian comes close, though), and he is hardly the first person to express the frustration within their own language or the language of another. Examples like Vowel "harmony", enforced by political interests? show that people who do not fully understand this concept, but who nevertheless have the power to make changes to a script, sometimes will in fact make them.
Srivas never learned this lesson.
Anyway, none of that matters. Because just two days ago Srivas sent the following message to Unicode's Indic list (a list with less traffic than the general Unicode list but which makes up for it with per capita message silliness):
Subject: [indic] Scalable but simple Indic and Asian writing systems

English alphabet do not have some of the very basic alphabet and I'm proposing to add new characters to English through Unicode Consortium because I find it difficult to transliterate between English and other language.

Example, the "th" represented in English is not acceptable. It has to be a single and fundamental character. Then, there are at least 3 different basic sounds relating to this "th" such as 'thick', 'this', 'mother'. this means we need to add three more characters to English alphabet. Similarly there are many other alphabet require attention, with regard to English.

Further, the Asian, (primarily Indic) languages are very complex and random. These need to be made logical and simple, but represent all that is required in contemporary use in a simple way and also allow for expansion with simplicity in mind. So I'm going to make proposals to UC on highly scalable and simplified Indic and simplified Asian writing systems.

Please comment.
And now we have come full circle.
Unable to convince us that Unicode has been destroying Tamil, he is now pointing out how without more letters encoded to handle every different pronunciation in English, there will be severe problems trying to handle English too. Thus he is going to make proposals to stop that same problem in Asian/Indic languages.
But this ignores the different pronunciations in different parts of the world of the same words due to dialectal difference. It ignores periodic vowel shifts that occur. It ignores the fact that different languages using the same script can use the same letters pronounced differently (and never forget that Unicode encodes scripts and not languages).
The very first message I have from Srivas in my archives is from over eight years ago. It was about this same issue and the need to encode not only the ksh in riksha (ரிக்ஷா) but the x in Luxmi (லூக்ஷ்மி). The only progress we have made is that the old arguments would point out Tamil language puns (with examples that conflate formal and informal language that could cause offense) and the new arguments start from uses of the English language that aren't serious or amusing.
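To make the encoding side of that concrete, here is a minimal Python sketch. It assumes the two spellings exactly as they appear above (written as escapes so nothing gets mangled), and the helper name code_points is made up for this example. Both words contain the very same KA + VIRAMA + SSA sequence, whichever way a reader pronounces it:

```python
import unicodedata

# Spellings as given above, written as escapes.
RIKSHA = "\u0bb0\u0bbf\u0b95\u0bcd\u0bb7\u0bbe"            # riksha
LUXMI = "\u0bb2\u0bc2\u0b95\u0bcd\u0bb7\u0bcd\u0bae\u0bbf"  # Luxmi

# KA + VIRAMA + SSA: the one encoded sequence behind both "ksh" and "x".
KSSA = "\u0b95\u0bcd\u0bb7"

def code_points(text):
    """Hypothetical helper: list (U+XXXX, character name) pairs."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for word in (RIKSHA, LUXMI):
    # Same substring in both words; pronunciation is not part of the encoding.
    print(word, KSSA in word)

for cp, name in code_points(KSSA):
    print(cp, name)
```

Unicode records what is written, not how it is said, so there is nothing in the data to tell the two readings apart.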
He would have been better off finding videos like that Gallagher snippet. :-)
Though nothing would have been any different.
There are literally several dozen phonemes in the English language (exact number varying with dialect), and none of them are encoded separately in Unicode, which handles the Latin script and not the English language.
The things that look the same but are pronounced differently would, if added to Unicode, make all kinds of advanced natural language processing tasks easier, at the price of making simple input more difficult (and then you still have to deal with everything that wasn't input correctly like the tons and tons of existing data), all to solve a problem that Unicode never signed up to solve in the first place....
Of course people looking at the timeline will note that by citing a Srivas pun from 10 years ago and providing a Gallagher clip in today's blog, I appear to be implying a timeline of 10 years in the title on that basis, even though the YouTube clip is actually a collection of appearances that is much older. But actually, the clips are from HBO specials from slightly more than 10 years before the initial Srivas sri/sRi "laugh" puns. So really it is only the false assumption that my blog title implied a timeline from point A to point B that would lead to this misinterpretation. In fact the humor of Gallagher predates the puns of Srivas by over a decade, something that I knew because I know my Gallagher and have the 3-disc Smashing Watermelon Collection to prove it!
The question I was asked:
Did anything ever happen with the Y1C problem? I haven't seen it in the news.
Complicated question, that! On several levels.
It all started a long time ago.
And when I say a long time, I mean like millennia ago.
There was this tendency to base the Chinese calendar on the current emperor's era name and the year of the emperor's reign.
This system was a popular export of China's, making its way to Japan and even Korea, over the years.
It totally is how I would do things if I was an Emperor. Your Mileage May Vary here, of course, but that's how I would roll....
Anyway, all of that changed with Sun Yat-sen, though.
Sun Yat-WHO?
This guy:
This politically impressive man did many very cool things (you can read about some of them on Wikipedia, here). One of the smaller yet still significant things was to propose a way to keep the "same old" system of basing the calendar on the emperor even when there was no emperor anymore. The plan was to consider January 1, 1912 of the Gregorian calendar to be the first "year of the republic". Now since of course the intent was to not have an emperor and of course for the Republic to keep on trucking, it would serve to end the steady stream of calendar year resets that the old system was so known for, while keeping the very same system.
"Embrace and Extend", as it were!
The name of the Republic? 中華民國. And the two-character abbreviation for the sake of the calendar? 民國, all set. Very recognizable kind of scheme.
Of course it was not until a few decades after the fall of the last emperor that the calendar took hold, and after a few more decades, the Communist revolution in mainland China happened. While there was initially some talk of a similar "solution" by naming a new calendar era (perhaps 人國 or somesuch, like "for the people" to contrast with "for the republic"), the eventual decision was made to just move completely to the Gregorian calendar in the People's Republic of China. You should remember that back in the beginning they considered simplified Chinese to be an interim step to something even more romanized. One imagines the move to a purely Gregorian calendar may have come out of that age, but the forces involved are complex enough that I doubt they could ever be fully discerned.
Meanwhile Taiwan, which still calls itself and often thinks of itself as the "Republic of China", still uses this ROC calendar -- a calendar that was formally established in the late 1920s with a start date in 1912. Which means that they are going to have to start dealing with a problem that has been essentially unheard of in China or Japan or Korea -- a calendar era of more than 100 years.
Remember my Y oh Y does YYYY sometimes mean YY, you ask?
Yeah, that blog.
And this is the basis of the Y1C problem -- the year 100 rollover of the ROC calendar.
Now for the most part there was no problem here, since most systems were allowing three digits for the years, in part because the intent of this calendar was to not expect a reset based on a new ruler in some time less than 100 years. The effect was pretty minor. In theory problems could be popping up more this next year, but they are not widely expected to be debilitating. To be honest the 11-year difference with Gregorian dates is more often causing problems (mainly with expiration dates of food) than anything else.
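The arithmetic behind the rollover is simple enough to sketch. This is a minimal Python illustration, not anything lifted from Windows or .Net; the function names and the fixed two-digit field are hypothetical stand-ins for the kind of system that would actually be bitten:

```python
ROC_EPOCH_OFFSET = 1911  # Gregorian 1912 is year 1 of the Republic

def gregorian_to_roc(year):
    """Convert a Gregorian year (1912 or later) to an ROC year."""
    if year < 1912:
        raise ValueError("the ROC era starts in Gregorian 1912")
    return year - ROC_EPOCH_OFFSET

def two_digit_field(roc_year):
    """What a hypothetical fixed two-digit year field would store."""
    return f"{roc_year % 100:02d}"

for g in (1912, 2010, 2011):
    roc = gregorian_to_roc(g)
    print(g, roc, two_digit_field(roc))
```

A system that kept three digits for the year sails through 2011; one that assumed two is suddenly back at 00.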
As a side note, North Korea has a similar problem with the fact that their calendar dates from the original birth date of Kim Il-sung, the ruler of North Korea in 1948. I have yet to hear of specific problems in North Korea though it isn't like I am swimming in contacts there. Maybe someone reading here knows -- they don't tend to report out about their IT issues (and it's not like we have an open calendar model for Windows that would directly support North Korea anyway).
Anyway, the ROC calendar support is generally available in Windows only in Taiwan, a technologically epic compromise that keeps Microsoft from taking sides, and in Taiwan there are regularly people recommending they just move to the plain old Gregorian calendar, something that has been resisted to date given the important amount of symbolism that the ROC calendar represents to many people in Taiwan. And in the PRC it is also symbolic -- of a government that they consider themselves to have replaced.
One thing I found interesting in talking to several friends of mine who live in the PRC was how so many of them knew of Sun Yat-sen. When one considers comparable figures after the overthrow of the Tsar in Russia (e.g. Trotsky, Kerensky, etc.), their roles in Russian history were much more marginalized in Soviet rule years than in this somewhat comparable situation in China. Perhaps I am overstating that contrast, but it struck me as interesting....
Of course looking at the ROC calendar as a true extension of the Chinese calendar is probably a bit specious anyway, since it isn't like they want pre-1911 dates thought of as anything out of the end of the Qing Dynasty. All of which makes me happier that the folks in PRC didn't want to symbolically extend the Chinese calendar themselves since carrying around two versions of it, each of which the other would denounce, would make the Windows calendar story even weirder than it is already. And would stick Windows, and Microsoft, in the middle of a debate that we probably wouldn't want to be in.
All of this makes the job in Windows and .Net much easier since we don't have to track those other eras and dynasties and Emperors and such, something that
Common ground FTW? :-)
Anyway, all of this has gone way beyond the original question about the Y1C problem, which really isn't too much of a problem (except maybe in North Korea?).
I should probably say something about the Japanese calendar, too. But I'll save that for another day....
Ryan had a great question yesterday morning:
This function doesn’t seem to always tell the truth – I get compiler errors trying to use the string below as an identifier, but IsValidIdentifier/CreateEscapedIdentifier seem to have no problem with it. Is there a more reliable method to detect when a string has invalid characters for an identifier?

    CSharpCodeProvider cscp = new CSharpCodeProvider();
    string test = "イゟゝだヴゝヺドダぐるフもむ";
    Console.WriteLine(cscp.IsValidIdentifier(test)); // True
    Console.WriteLine(cscp.CreateEscapedIdentifier(test) == test); // Also true

Ok, try it:
Hmm, that's weird.
I mean, being told you have a valid identifier and then having it not work out as an identifier can be a little unsettling.
Wolf was able to point out some details on the specific character:
The problematic character (ゟ) is Unicode U+309F, “Hiragana digraph yori”. It’s a “vertical contraction” of (よ) and (り). I suspect it’s a late Unicode addition.
And Eric Lippert then gave the full answer soon after:
Yori was added in Unicode version 3.2.0, March 2002. The previous letter, KATAKANA LETTER I (U+30A4) was added in Unicode version 1.1.0, June 1993.

Odds are good that the (written in native C++) compiler has baked into it an extremely old version of the Unicode “is this a letter?” algorithm. The (written in C#) CSharpCodeProvider is apparently using a more recent version of that algorithm.

It seems to me the fact that we may be using an almost ten-year-old out of date version of the Unicode algorithm in the native compiler could reasonably be characterized as “a bug”. I’ll mention it to QA.

Cheers,
Eric
He got that in one!
I remember when the Yori Digraph was added, and the concern some people had about taking what was widely regarded as a complete set (Hiragana), and adding a new character that would not be in code pages like 932 (Shift-JIS). Those complaints were considered but ultimately rejected, which makes sense given that Unicode has always been a superset of these code pages anyway.
Now there are interesting issues that come up once you start adding version specific knowledge into the world of identifiers, which should make this an interesting one. It will thankfully be made easier by the fact that Unicode never removes characters, though there is no shortage of people who ask for this or that character to be removed because its very existence bothers somebody.
Such changes would break the companies unlike Microsoft that do proper version-specific checking/support of their identifiers, or maybe even one day companies like Microsoft if they more aggressively support such things. :-)
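For a feel of what "version-specific" means here, a rough Python sketch follows. It is not how the C# compiler or CSharpCodeProvider work internally; it only borrows the general categories the C# spec lists for a letter-character and checks them against whichever Unicode database it is handed. The is_letter helper is made up for the example:

```python
import unicodedata

# General categories the C# spec accepts as a "letter-character".
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

def is_letter(ch, db=unicodedata):
    """True if `ch` is a letter according to the given Unicode database."""
    return db.category(ch) in LETTER_CATEGORIES

YORI = "\u309f"  # HIRAGANA DIGRAPH YORI, added in Unicode 3.2

# The answer always depends on which version of the data is consulted.
print("current data:", unicodedata.unidata_version, is_letter(YORI))

# Python also ships one frozen snapshot (Unicode 3.2.0); swapping the
# database object is version-specific checking in miniature.
old = unicodedata.ucd_3_2_0
print("frozen data: ", old.unidata_version, is_letter(YORI, old))
```

Two checkers built on different snapshots of that data will disagree about any character added in between, which is the shape of the mismatch Eric describes.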
In a future blog I'll dig into the wacky world of identifiers and show that sometimes the only remedy for a Mark is a Ken....
The other day, the question came in again.
And when I say the question, I mean THE question.
You know, the What Unicode version do you support? question.
Well, technically it was a slight variation, more of a What version of Unicode is supported in passwords? but clearly the same question is being asked. Someone administering password policy for users of Windows in their organization wants to know what version of Unicode is used in password validation.
Unfortunately, not every question that is reasonable to ask is necessarily one that has a reasonable answer.
Sure, to start with there is everything I pointed out in the What Unicode version do you support? blog back in the end of 2005.
But there is a bigger issue here.
The fact is that no validation is done on text in the password to remove illegal characters and/or replace them with � (U+FFFD, aka REPLACEMENT CHARACTER).
It doesn't care about illegal sequences or non-characters or any other of the rules that Unicode may have.
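For contrast, here is roughly what that kind of scrubbing would look like if something did perform it. This is a hypothetical Python sketch of a strict validator, not anything Windows runs; the function names are made up, and treating noncharacters as rejectable is a policy choice of the example:

```python
REPLACEMENT = "\ufffd"

def is_noncharacter(cp):
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_surrogate(cp):
    """Surrogate code points are only well-formed as UTF-16 pairs."""
    return 0xD800 <= cp <= 0xDFFF

def scrub(text):
    """Hypothetical validator: swap suspect code points for U+FFFD."""
    return "".join(
        REPLACEMENT if is_surrogate(ord(ch)) or is_noncharacter(ord(ch)) else ch
        for ch in text
    )

raw = "pa\ud800ss\uffffword"
# A scrubbed password is no longer the same password.
print(scrub(raw) == raw)
```

Any scrubbing step like this would quietly change what the user typed, and its rules would need Unicode data of some vintage behind them.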
So really, it almost never ever matters!
I know, everyone is tripping on the use of the word almost in the previous sentence. They either know the exception, think they know it, or know that I am about to trot some exception out....
Well, whichever one it is for you, strap in now please!
The exception is password filters.
First there is the password filter Microsoft provides, described in Strong Password Enforcement and Passfilt.dll. Note the rules it runs under, particularly the character classification information:
Uppercase letters of European languages (A through Z, with diacritic marks, Greek and Cyrillic characters). Examples: A, B, C, … Z
Lowercase letters of European languages (a through z, sharp-s, with diacritic marks, Greek and Cyrillic characters). Examples: a, b, c, … z
Base 10 digits (0 through 9). Examples: 0, 1, 2, … 9
Non-alphanumeric characters (special characters). Examples: $,!,%,^,(){}[];:<>?
Any Unicode character that is categorized as an alphabetic character but is not uppercase or lowercase. This includes Unicode characters from Asian languages.
Note: A given character can satisfy only one category. The GetStringTypeW function is used to test whether each character in the password is uppercase, lowercase, or alphanumeric.
Okay, now obviously if you are using the built-in passfilt.dll then there are some Unicode version dependencies here, based on the version of Unicode used to supply data for GetStringTypeW. And possibly not accounting for different Unicode normalization forms.
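A small Python sketch can show both dependencies at once. It is only an approximation of the five categories quoted above: Python's own str predicates stand in for GetStringTypeW, the classify function and its category labels are invented for the example, and the results depend on the Unicode data your Python was built with (which is rather the point):

```python
import unicodedata

def classify(password):
    """Return the set of (assumed) passfilt-style categories present."""
    found = set()
    for ch in password:
        if ch in "0123456789":
            found.add("digit")
        elif ch.isupper():
            found.add("upper")
        elif ch.islower():
            found.add("lower")
        elif ch.isalpha():
            found.add("other-alphabetic")
        elif not ch.isalnum():
            found.add("special")
        # Anything else (e.g. non-ASCII digits) is left unclassified here.
    return found

nfc = unicodedata.normalize("NFC", "\u00c9t\u00e9")  # precomposed letters
nfd = unicodedata.normalize("NFD", nfc)              # base letters + U+0301
print(sorted(classify(nfc)))
print(sorted(classify(nfd)))
```

Under this stand-in classifier, the decomposed form of the same visible text picks up an extra category from its combining marks, so normalization form alone can change what a filter sees.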
And of course if you use other password filter providers instead of or in addition to this built-in one (or even create your own, ref: Password Filter Programming Considerations), it could contain rules with version dependencies as well.
I personally find these rules to be rather weird in some senses, since there are "Unicode-esque" tricks that could be used to make a password much harder to guess or divine or even get at via keyloggers in the cases of specially designed keyboards and unusual characters in Unicode, and yet all such complexities are given minimal weight under the default passfilt.dll algorithm.
In the past (prior to working for Microsoft) I have helped several customers create both unusual keyboards as alluded to above and password filter DLLs that treat many of the complexities of Unicode as a valid method of increasing password complexity. And Unicode is a great area to help provide such additional complexity and to increase the ability to ferret out common lookalikes that Unicode might provide.
Unicode version information has certainly bled into some of those (either dependent on or completely independent of the information GetStringTypeW provides).
But other than such things, passwords know no Unicode version....
Warning: the above applies to Windows; other applications may work by entirely different rules and when/if they do such rules tend to be STUPIDER in one or more senses.
THE WINDOWS 7 KISWAHILI LANGUAGE INTERFACE PACK IS LIVE!
Click here to download the KiSwahili Windows 7 LIP via the Microsoft.com Download Center.
Please note that the KiSwahili Windows 7 LIP can only be installed on a system that runs an English client version of Windows 7. It is available for both 32-bit and 64-bit systems on the Download Center.
The KiSwahili Windows 7 LIP is produced as part of the Local Language Program sponsored by Public Sector.
A LITTLE BACKGROUND INFORMATION ON KISWAHILI
5-10 million speakers as their first language, 50-100 million speakers as a second language.
NAME OF THE LANGUAGE ITSELF:
KiSwahili
KiSwahili (commonly called Swahili) is a language widely spoken in Eastern Africa and parts of Central Africa. It is a national language in Tanzania and Kenya and is extensively used in Uganda and the eastern part of the Democratic Republic of Congo. It is understood in areas of Rwanda, Burundi, the Comoros Islands, the southern part of Somalia, the northern parts of Mozambique and Zambia, and even the northwestern coast of Madagascar. While it is the native tongue of only 4-5 million people, more than 50 million people know Swahili as a second language and use it as a lingua franca, making it the most widely understood language in Africa after Arabic.
Click here for more information about the KiSwahili language
Swahili belongs to the Bantu group of languages which are part of the Niger-Congo language family.
The first script used for Swahili was Arabic. The Latin script used today was adopted in the middle of the 19th century.
Sites like Newsweek will post an article entitled Why learn Mandarin? China won’t make you speak it with a message about how the people in China, who are going to own everything and do everything, will be using English for so much of what they do that no one has to worry.
Just in case you were worried about this.
But were you?
I mean, people like what they like, right?
Looking at the less sensationalistic aspects of my recent blog PC LOAD LETTER? What the f**k does that mean [in Chinese]?, it is obvious that there are forces in government that look at what some consider to be the negative impact of Anglicization of Chinese enough to want to forbid it. Clearly, they want to keep Chinese being Chinese, in China.
All the article is saying is that the Chinese aren't stupid.
But I think we already knew that, didn't we?
I mean, I know I did. You probably did, too.
The article started with
The data would seem to be in: China is poised to become the world’s economic leader within the next few decades. But there are those under the impression that this will mean a sea change in the world’s linguistic terrain as well. Certainly, any human being who seeks education, influence, or power should be learning Mandarin, right?
Now let's think about this for a minute.
If you want to sell product in China, which is poised to become the world's economic leader and all that, do you think you will understand what they want and what they like if you give them your English products?
Remember the fact that the whole country doesn't speak English. How far will you get trying to convince people to buy your English product? Even ignoring the fact that the government will step in and make you translate it pretty quick anyway, and also fully support the Chinese language via GB18030 compliance and so on....
Some people will need to understand Chinese culture and language and concepts and attitudes in order to put the product or products in a place that makes them attractive and interesting for the market!
So having some smart people choose to not learn Mandarin is great if you want to have everyone grow up so that smarter people in China who learn English can sell products to you and your kids and their kids.
But if you and your children and your grandchildren want to sell products to Chinese people too, then perhaps you may want some of those English speaking kids to be as smart as those Chinese speaking kids that are learning English.
And perhaps they could read a little less Newsweek, if architected ignorance and catering to xenophobia is going to be their best contribution....
THE WINDOWS 7 BANGLA LANGUAGE INTERFACE PACK IS LIVE!
Click here to download the Bangla Windows 7 LIP via the Microsoft.com Download Center.
Please note that the Bangla Windows 7 LIP can only be installed on a system that runs an English client version of Windows 7. It is available to download for both 32-bit and 64-bit systems.
The Bangla Windows 7 LIP is produced as part of the Local Language Program sponsored by Public Sector.
For the Bengali - India LIP, see 4 out the door, in both 32 & 64 (aka What Irish, Malay, Maltese & Bengali have in common), published at the end of July, last year.
A LITTLE BACKGROUND INFORMATION ON BANGLA
150 million native speakers
বাংলা
Bangla is sometimes referred to as Bengali, but is usually referred to as Bangla in Bangladesh where it is the official language.
It was also made an official language of Sierra Leone to honor the Bangladeshi peacekeeping force from the United Nations stationed there.
Bangla is spoken in the region of eastern South Asia known as Bengal, comprising Bangladesh (where it is spoken by about 150 million people) and the Indian state of West Bengal (where it is spoken by 55 million people). With more than 200 million speakers it is the second most widely spoken language on the Indian subcontinent and among the 5 languages with the most native speakers worldwide.
INTERESTING FACTS:
Bangla belongs to the Eastern Indo-Aryan languages which are part of the Indo-European language family. Together with its closest relatives Assamese and Oriya, Bangla is the most eastern of this large language family. The dialect of Bangla spoken in Bangladesh is different from the Bengali dialect spoken in India.
Bangla is written in the alphasyllabary called Bangla or Kutila-lipi which highly resembles the Devanagari script used for Sanskrit, Hindi or Nepali. The script consists of 12 vowel characters and 52 consonant characters. Like in all alphasyllabaries, or abugidas, characters for consonants have embedded vowels (or an extra diacritic showing that there is no vowel).
Note that Unicode encodes the script as an abugida.
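A quick way to see that abugida encoding for yourself is to look at the code points behind a single consonant. This is a minimal Python sketch; the describe helper is made up for the example, and the three characters were picked only as representative samples:

```python
import unicodedata

KA = "\u0995"       # consonant letter, carries an inherent vowel
VOWEL_I = "\u09bf"  # dependent vowel sign, replaces the inherent vowel
VIRAMA = "\u09cd"   # the extra diacritic showing there is no vowel

def describe(text):
    """List (code point, general category, name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in text]

for syllable in (KA, KA + VOWEL_I, KA + VIRAMA):
    print(syllable, describe(syllable))
```

The consonant is a letter in its own right, while the vowel sign and the virama are combining marks that follow it in the text.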
Click here for more information on the Bangla language.