Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
Regular readers should keep in mind that all I said in The End? still applies; the allusion to the X-Files continues for people who understand such references....
The question that was asked over in the microsoft.public.dotnet.international newsgroup by Samuel was:
I upload a text file with the following French character 'ç' and the server receives the following: 'Ã§' instead
Any explanation?
Thank you,
Samuel
Regular readers might get an astounding sense of deja vu at that, due to remembering past blog posts like Should old aquaintance *not* be forgot, code pages may screw up their names anyhow or Do not adjust your browser, a.k.a. sometimes two wrongs DO make a right, a.k.a. dumb quotes or Linguistic and Unicode considerations (or Language-specific Processing #4) or What's the encoding, again? or Consistent garbage text can be incorrect encoding identification (or detection).
It is UTF-8. :-)
If one looks at Windows code page 1252 one will see the following mappings:
0xC3 is Ã (U+00c3, aka LATIN CAPITAL LETTER A WITH TILDE)
0xA7 is § (U+00a7, aka SECTION SIGN)
so it makes sense how someone could mix up those two bytes and think they are UTF-8.
In fact you can do it in Notepad! If your default system code page is 1252, type those two characters into a new text file, save it, close it, and open it again.
You will see your
Ã§
become
ç
the same way.
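For anyone who would rather see the round trip in code than in Notepad, here is a minimal sketch in Python (my choice of language for illustration; nothing in Samuel's report says what his server runs). It assumes the file was written as UTF-8 and read back as code page 1252, which is only one of the possible explanations.

```python
# The character from Samuel's report.
original = "\u00e7"  # LATIN SMALL LETTER C WITH CEDILLA

# Written out as UTF-8, the one character becomes two bytes: C3 A7.
utf8_bytes = original.encode("utf-8")

# Assumption: the receiving side decodes those bytes as code page 1252.
misread = utf8_bytes.decode("cp1252")

# Going back the other way undoes the damage, as long as nothing was lost.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex())
print([f"U+{ord(ch):04X}" for ch in misread])
print([f"U+{ord(ch):04X}" for ch in repaired])
```

The middle line of output lists U+00C3 and U+00A7, the two characters the server reported.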
In the end, Samuel's bug report has a few possible causes:
so it may or may not be a bug still -- but with whom, or where, does the bug lie? To answer that, more information is definitely required....
This post brought to you by ç (U+00e7, aka LATIN SMALL LETTER C WITH CEDILLA)
Randy asked over in the Suggestion Box:
While U+0131 is not at all new to your blog, this gives its importance quite a different spin.
http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
Now Randy was technically the third person to ask me about this one, but one of the other ones was via the Contact link and the other sent it by email, so in the interests of trying to use social engineering to encourage Suggestion Box usage, the other two will remain anonymous and all official credit will go to Randy. :-)
The Gizmodo story is entitled Localization Problems: A Cellphone's Missing Dot Kills Two People, Puts Three More in Jail.
It is quoted in full on the Hoax Forum with a title of Deadly Texting Error, and I will quote it again here:
The life of 20-year-old Emine, and her 24-year-old husband Ramazan Çalçoban was pretty much the normal life of any couple in a separation process. After deciding to split up, the two kept having bitter arguments over the cellphone, sending text messages to each other until one day Ramazan wrote "you change the topic every time you run out of arguments." That day, the lack of a single dot over a letter -- product of a faulty localization of the cellphone's typing system -- caused a chain of events that ended in a violent blood bath (Warning: offensive language ahead.)
The surreal mistake happened because Ramazan's sent a message and Emine's cellphone didn't have an specific character from the Turkish alphabet: the letter "ı" or closed i. While "i" is available in all phones in Turkey -- where this happened -- the closed i apparently doesn't exist in most of the terminals in that country.
The use of "i" resulted in an SMS with a completely twisted meaning: instead of writing the word "sıkısınca" it looked like he wrote "sikisince." Ramazan wanted to write "You change the topic every time you run out of arguments" (sounds familiar enough) but what Emine read was, "You change the topic every time they are fucking you" (sounds familiar too.)
Emine then showed the message to her father, who -- enraged -- called Ramazan, accusing him of treating his daughter as a prostitute. Ramazan went to the family's home to apologize and was greeted by the father, Emine, two sisters and a lot of very sharp knives.
Injured and bleeding, with a knife on his chest, Ramazan tried to escape. Emine was still trying to finish him but he managed to take the knife out of his chest and attacked her back. Ramazan finally escaped, but Emine bleed to dead as the family waiting for an ambulance to arrive.
Confused by all the events, he later killed himself in jail.
Apparently it's not the first incident of this kind caused by the damned dot on top of the letter i. The local press has pointed out that the faulty localization of cellphones in Turkey is causing "serious problems" when it comes to certain "delicate words" in Turkish, and they are calling to enhance localization of technology to avoid these mistakes.
Alternatively, the press could ask for banning knives from the homes of demonstrably stupid people.
You can also see it in Turkish over on HÜRRİYET, under the title Küçücük bir nokta tam 5 kişiyi yaktı.
Now the Hoax Forum has not yet passed judgment on whether it is in fact a hoax. Proof takes time. And actual journalism. Of course. :-)
For the record, I hope it is a hoax, since it involves death and injury, something I would not generally want to wish on people.
But if that part is just an exaggeration then I hope people in technology at least take the lesson and start choosing to take language issues seriously, due to potential problems in using and understanding in even non-dire circumstances.
And to all of you who are so sure that such different meanings cannot be ascribed to such similar words as
sıkışınca
and
sikişince
Well, I have plans to get into the vowel harmony topic soon but in the meantime keep in mind that we are talking about very different letters. Just like
d
b
p
are very different letters (each made up of a line and a circle) yet we do not find ourselves mixing up the bear and the pear from our dear loved ones. So the underlying story is believable to me, whether the dire consequences are or not.
If it is true, it does end up being a truly epic example of vowel disharmony - and a serious warning to cellular providers to spend a little more time caring about Unicode support for their devices and their software....
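To make the "very different letters" point concrete, here is a small Python sketch. The fallback table in it is purely hypothetical; it stands in for a device that cannot store the dotless letter and is not taken from any real phone or standard.

```python
import unicodedata

DOTLESS_I = "\u0131"  # the Turkish dotless letter
DOTTED_I = "i"        # U+0069

# Hypothetical lossy fallback table (an assumption made for illustration only).
HYPOTHETICAL_FALLBACK = str.maketrans({"\u0131": "i"})


def fold(text: str) -> str:
    """Apply the hypothetical fallback, losing the dotless/dotted distinction."""
    return text.translate(HYPOTHETICAL_FALLBACK)


# Two separate characters with two separate names.
names = [unicodedata.name(DOTLESS_I), unicodedata.name(DOTTED_I)]

# After folding, the receiver can no longer tell which letter was typed.
collapsed = fold(DOTLESS_I) == DOTTED_I

# Python's locale-independent uppercasing loses the distinction as well,
# since both letters map to the same capital I.
same_upper = DOTLESS_I.upper() == DOTTED_I.upper()

print(names, collapsed, same_upper)
```

Nothing here says how any particular handset behaved; it only shows how easily the distinction disappears once software treats one letter as a stand-in for the other.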
This post brought to you by ı (U+0131, a.k.a. LATIN SMALL LETTER DOTLESS I)
Ben Supnik asked over in the Suggestion Box:
Hi,
Your post on dead key states "Sometimes you *want* to interfere with the keyboard's state buffer" was very helpful in understanding what's going on with ToUnicodeEx. But I have the opposite problem of Sebastien:
I want to be able to calculate the unicode key that is generated by one or more virtual key presses. But I am supporting a legacy API; I need to dispatch this unicode key with the second key press (e.g. in the second WM_KEYDOWN) rather than wait for the WM_CHAR message. Similarly, I need to know when the first WM_KEYDOWN comes in that it is a dead key without waiting for WM_DEADCHAR.
Here's the part that surprises me: when I call ToUnicode with the virtual key code of the dead key (in response to the very first WM_KEYDOWN) I get two unicode chars back, typically a repeat of the spacing version of the diacritic. For example: on the first key press of the ^ key on a French keyboard, I get a return of 2 and a buffer with ^^ (a pair of non-spacing circumflexes) when I would have expected to get a return of -1 and a single ^, telling me the circumflex is getting set up.
What am I missing here? It looks like something is calling ToUnicode with my vkey code once before I get the WM_KEYDOWN message, and thus pre-loading the unicode buffer. I don't think I can drain the buffer because when the second (real) key is hit, I'll have drained out my precious dead key state.
Could you please revisit this topic? In particular, what part of the system is looking up my vkey so early that I'm losing my dead key state? Is there any programmatic access to accumulated dead key state?
The post that Ben refers to is Sometimes you *want* to interfere with the keyboard's state buffer.
The first answer is to the last question asked, and the answer is that there is no way to access that buffer directly.
But let's look a little deeper at the bigger questions here.
One of the changes that happened in Windows is hinted at in the help for WM_KEYDOWN and WM_KEYUP:
Windows 2000/XP: Applications must pass wParam to TranslateMessage without altering it at all.
Thinking aloud for a moment...
Why would it be necessary for you to not change the wParam (which contains the virtual key)? It shouldn't be important to avoid making such a change if the processing is all to come after the TranslateMessage call.
Perhaps if it was doing that lookup or that caching without waiting?
To be honest I am guessing aloud here, and will be forwarding this speculative blog to someone who should know the actual answer and who can explain the difference between them. :-)
But my advice here would be to do the same thing that MSKLC does -- it loads up the information contained in the entire keyboard, not only to display all of the information but also to simulate what the keystroke processing code does, without being negatively impacted by mixing up the system with multiple sources of ToUnicode and ToUnicodeEx type calls....
Although this is a bigger investment upfront, in the long run it is much more maintainable than trying to make queries that can (or apparently in this case, do) negatively impact the state of the buffer.
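As a rough illustration of the "load the table and simulate it yourself" idea, here is a minimal Python sketch. It is not how MSKLC is written. The three-entry table is hypothetical and hand-typed rather than loaded from a real layout, and keys are represented by the character they would produce rather than by virtual key codes. Treat it as the shape of the approach only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fragment of a dead key table. A real tool would load the full
# table from the keyboard layout once, up front.
DEAD_KEY_TABLE: Dict[Tuple[str, str], str] = {
    ("^", "e"): "\u00ea",  # circumflex + e
    ("^", "a"): "\u00e2",  # circumflex + a
    ("^", " "): "^",       # circumflex + space gives the spacing character
}
DEAD_KEYS = {"^"}


class DeadKeySimulator:
    """Tracks pending dead key state privately, without asking the system."""

    def __init__(self) -> None:
        self.pending: Optional[str] = None

    def key_down(self, ch: str) -> Tuple[bool, str]:
        """Return (is_dead_key, text_produced) for one keystroke."""
        if self.pending is None:
            if ch in DEAD_KEYS:
                self.pending = ch      # known immediately, on the first key
                return True, ""
            return False, ch
        dead, self.pending = self.pending, None
        composed = DEAD_KEY_TABLE.get((dead, ch))
        if composed is not None:
            return False, composed
        # Simplification: with no table entry, emit both characters.
        return False, dead + ch


sim = DeadKeySimulator()
results = [sim.key_down(ch) for ch in "^e^x"]
print([(dead, text.encode("unicode_escape").decode()) for dead, text in results])
```

The appeal for a case like Ben's is that both answers (is this a dead key, and what text does the second key produce) are available at key-down time, and the system's own buffer is never touched.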
If one really wants to go down the harder road, that solution is not too awful:
Now, as I said, I am going to try to get answers on the question of what is going on here, since it appears to violate my (admittedly older) understanding of what happens during keystroke processing. It was only after testing both of the methods I suggest above that I realized that even if the underlying behavior did change, neither method is broken.
So perhaps the only real bug at this point is in the documentation, which does seem to be a little off now (though I am trying to get my head entirely around how!).
This blog brought to you by අ (U+0d85, aka SINHALA LETTER AYANNA)
It was just hours ago that I pointed out in Windows doesn't let you choose the pinch hitter in digit substitution cases that there are even more problems with digit substitution than I had mentioned before.
Let's review:
Could it get worse?
Of course it can!
Let's look at this exciting case in Vista....
First we'll take a machine with an Arabic user locale and make sure that it is set to context:
Now with context definition:
the shape depends on the previous text in the same output
the meaning seems pretty clear. In fact you can even see it from the sample -- in an LTR context we'll get the regular Arabic digits1, and in an RTL context we'll get Hindi digits2.
Looks like it only cares about the user locale setting here for the purposes of the sample, right?
Now file paths, which start with drive letters, tend to start off with a rather LTR context, thus it is no surprise that we see Arabic digits:
and also in the breadcrumb bar view:
And that is kind of expected.
But then when we switch to an Arabic user interface language, while the full path looks the same:
The breadcrumb bar view is another story:
What happened?
Suddenly my "of course there is no difference in English" claim is refuted somewhat -- it is the correct and expected result, but for the wrong reason.
Because the Breadcrumb bar is looking at each chunk3 as if the context has been reset, and thus a folder that does not start with an LTR character is given RTL context, even if the overall path looks at things differently.
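Here is a toy model of that difference in Python. To be clear about the assumptions: this is not Uniscribe's algorithm or the Shell's code, the function name and the example path are made up, and "context" is reduced to "the direction of the most recent strongly directional character". It is only meant to show why resetting the context per chunk changes the answer.

```python
import unicodedata

ARABIC_INDIC_ZERO = 0x0660  # first of the ten digits U+0660..U+0669


def substitute_digits(text: str, start_rtl: bool = False) -> str:
    """Simplified context substitution: digits follow the last strong character."""
    rtl = start_rtl
    out = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            rtl = False
        elif bidi in ("R", "AL"):
            rtl = True
        if rtl and "0" <= ch <= "9":
            ch = chr(ARABIC_INDIC_ZERO + ord(ch) - ord("0"))
        out.append(ch)
    return "".join(out)


path = "C:\\Users\\2008"  # hypothetical path with an all-digit folder name

# Whole path, one context: the drive letter sets an LTR context up front.
whole = substitute_digits(path)

# Per chunk, with an RTL starting context (as an RTL control would give):
# the all-digit folder never sees an LTR character, so its digits change.
chunks = [substitute_digits(part, start_rtl=True) for part in path.split("\\")]

print(whole.encode("unicode_escape").decode())
print([c.encode("unicode_escape").decode() for c in chunks])
```

In this model the full path keeps its digits as typed, while the last chunk comes out as U+0662 U+0660 U+0660 U+0668.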
Also, it is not the user locale that runs the "context" setting, it is the user interface language. Despite the fact that the setting itself is tied to the user locale.
And if you have any setting other than context, then it is just the user locale. The implication is therefore that the context setting can have two very different levels of requirements:
Another fun experiment might be to look at whether that Arabic/Farsi limitation applies to the user interface language requirement the same way that it applies to the user locale setting. And on the assumption that Regional and Language Options is not synthesizing the user locale dependency for the sample and that USER controls are affected, it might be good to understand why Explorer and the Shell seem to be working under this additional level, and what rules it is working under.
I'll give away the answer in case you haven't spotted it -- if the control itself is an RTL one, then it gets the RTL rules. For whatever reason, Explorer in this case has rules for what kind of controls to create that are based on rules tied to the user interface language. It is unclear whether it uses the Uniscribe-like LANG_FARSI/LANG_ARABIC rules or the NLS rules, though. Hopefully it is the latter, which would be more correct even if also more confusing and less consistent (since even former Uniscribe owners often consider the Arabic/Farsi limitations to be bugs!).
Ironically, no one has been fast to fix these problems, in large part due to the fact that the context substitution setting is not widely used in these other places where it is broken. So there is not a huge scenario to make something work better that most people won't want anyway? :-)
No matter what, the whole area is still incredibly confusing and not-very-well-documented!
1 - Although they are called by some Arabic numerals, they are not really used for Arabic (which uses different digits)
2 - Although they are called by some Hindi digits4, they are not used for Hindi5
3 - I guess these items could be called something else, I don't know the terminology; if the technology were called a breadloaf bar, I'd call them slices!
4 - Note that ٠١٢٣٤٥٦٧٨٩ are actually (according to their names) "Arabic-Indic digits" in Unicode, despite the fact that they are actually known to some as Hindi digits
5 - Hindi itself uses what some call Devanagari digits, which are technically Indic (though they are not to be confused with the Arabic-Indic digits, which are actually Hindi digits)
This blog brought to you by ۶ (U+06f6, aka EXTENDED ARABIC-INDIC DIGIT SIX, which is also not an Arabic-Indic digit!)
Developer Gloria asks via the Contact link:
I recently tried to use SetLocaleInfo with LOCALE_SNATIVEDIGITS and the context setting for LOCAL_IDIGITSUBSTITUTION, and it failed. Even when set to native my digits aren't all being used. Are there rules for these settings?
Long-time regular readers might remember where I started to explain what is going on here, in Digits -- there is no substitute. In the end, the digits you select here are not used directly; instead, whatever you put in for the second entry in that ten-character array will guide Uniscribe to decide which hard coded list of characters to use.
The rest is in the LOCALE_IDIGITSUBSTITUTION value, which, if set to the context setting, will only work when PRIMARYLANGID(lcid) is either LANG_PERSIAN or LANG_ARABIC. All other Arabic script and Indic languages that have other digits have this setting ignored unless it is one of the other two settings....
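The language restriction is easy to express as a check. The Python sketch below only restates the rule from the paragraph above; it is not the actual Uniscribe source, and the function names are mine. The constant values are the ones from winnt.h, and the LCIDs in the example are the usual ones for ar-SA, fa-IR, ur-PK and hi-IN.

```python
LANG_ARABIC = 0x01
LANG_PERSIAN = 0x29  # winnt.h also defines LANG_FARSI with the same value


def primary_lang_id(lcid: int) -> int:
    """Same arithmetic as the PRIMARYLANGID macro: keep the low ten bits."""
    return lcid & 0x3FF


def context_substitution_applies(lcid: int) -> bool:
    """The rule as described above: only Arabic and Persian get 'context'."""
    return primary_lang_id(lcid) in (LANG_ARABIC, LANG_PERSIAN)


examples = {
    "ar-SA": 0x0401,
    "fa-IR": 0x0429,
    "ur-PK": 0x0420,  # Arabic script, but not LANG_ARABIC
    "hi-IN": 0x0439,  # has its own digits, but not covered
}
verdicts = {name: context_substitution_applies(lcid) for name, lcid in examples.items()}
print(verdicts)
```

Urdu is the telling case: Arabic script, digits of its own, and still on the wrong side of the check.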
There are other problems with that context setting, which I will discuss in an upcoming blog....
This post brought to you by U+0be6, a.k.a. TAMIL DIGIT ZERO
If you are easily offended then please stop reading now....
I had a regular reader ask for assistance very urgently on kind of a personal matter, and although I am usually immune to such requests, I decided to make an exception in this case.
It relates to my blog While vacationing, idle random thoughts on the potential influence of Unicode on 'alphabet soup'.
He initially found that his knowledge of Hiragana and Katakana led to the generally positive results I suggested:
If you ask, I will deny any personal knowledge of what I am about suggest, but if you are thinking in this direction and you know some Japanese then considering slow Hiragana eventually transitioning into slightly more rapid Katakana, in iroha order if you can manage it (again, without reciting the poem!) for reasons that are very complicated to get into but if you look at the orders you might have some hints, really can inspire miracles. Similar effects are possible with most "curvy" scripts (e.g. the handwritten form of Hebrew, Tamil, Telugu, Georgian) moving into less curvy ones (e.g. Armenian, Bengali, Devanagari); the Latin alphabet really seems less suited to the whole effect though if you know no other alphabets then lowercase to uppercase and maybe sans-serif to serif might have the (for lack of a better word) "appropriate" impact....
Unfortunately, the implementation in this case coincided with an out-of-town business trip, which apparently led to a bit of suspicion -- in this case suspicion of infidelity.
I guess you could say that sometimes things can work too well? :-(
Of course I am a huge fan of trying new things and being creative in order to facilitate a relationship which is more interesting and fresh, but perhaps I should have suggested a more measured pace in the introduction of new things if the actual introduction of new things is itself a new thing. Although it does seem slightly out of scope, I would want to add neither chaos nor disharmony and then simply argue such a technicality in my defense, so the warning seems (for lack of a better word) proper.
My advice, should one find oneself in such a situation, is to come clean about the alphabet thing -- but try to focus on the words being spelled out (iroha order is if nothing else quite a nice poem, and a nice song lyric or poem of any sort can't ever hurt!) rather than on the alphabet itself, which is not particularly sexy to anyone who doesn't have a crush on Big Bird!
Coming (for lack of a better word) clean is not such a bad way to go in this case, and even if there were no words or poem then one will likely find that the alphabet is something easy to prove in the (for lack of a better word) appropriate situation.
If everything goes wrong you can always point to the blog as where the idea came from, I believe it might go over better than some of the other available sources of this information. General overall happiness should quickly follow.... :-)
This post brought to you by ৡ and ৠ (U+09e1 and U+09e0, aka BENGALI LETTER VOCALIC LL and BENGALI LETTER VOCALIC RR)
I had a friend call me and tell me to watch Ocean's Twelve last night, after verifying that I was still watching TV with closed captions on.
Mysterious, but I figured what the hell.
Right near the beginning I saw what he wanted me to notice.
Now I have made my peace with the fact that closed captioning does not support Unicode (an issue I have talked about before, e.g. here).
All of The Amazing Yen's parts were captioned with:
Yen speaking Mandarin
I saw it right away, even before the actor could be seen:
Yet the big inside joke of the Ocean's Eleven remake was that Yen was speaking Cantonese, and that only Rusty understood him.
Mandarin? Was this yet another error in closed captioning?
Well, it depends, really.
I mean, by the time Ocean's Twelve came along, everyone seemed to understand Yen -- so they had dropped that particular joke.
And no clues come back from Qin Shaobo (the actor) since he was born in Guangxi, China -- where Mandarin is one of only two official languages but where Cantonese is widely known. So really he might know both well enough for the part (I don't know Cantonese or Mandarin well enough to guess from the small sample, where none of my small vocabulary came up. :-)
There was one terribly funny joke in there. From the script:
Yen pops his head out from a small tube and says something in Chinese. Frank shrugs...doesn't understand. Yen tries again.... This time he enunciates very clearly and talks very loudly (like Americans do when foreigners don't understand English). Frank nods, starts turning the handle of the water pump in the opposite direction. Yen climbs down out of the tube.
And that is funny. Notice in the movie how Bernie did understand him the second time. :-)
Compare that to the Ocean's Eleven script:
Silence. For a moment, each man keeps his two dozen questions or more to himself. At last, one speaks up... The Amazing Yen. In Cantonese. Of course, no one understands him. Except Rusty. RUSTY (in response) No. Tunneling is out. There are Richter scales monitoring the ground for one hundred yards in every direction. If a groundhog tried to nest there, they'd know about it. Anyone else? Another silence. Either the guys are too dumbfounded by that bilingual exchange or too numbed by the task ahead of them to speak.
Can any Chinese speakers who have seen any of the three parts identify The Amazing Yen's language? Is this a case of an error in the closed captioning content, or a simple change from script to screen for reasons unknown?
This blog brought to you by 係 (U+4fc2, a CJK UNIFIED IDEOGRAPH)
This blog is one of those ideas suggested by my friend Andrea back during that conversation I blogged about in I'm aware of that: an Andreaesque segue and intervention, of sorts.
She is kind of interviewing. Interviewing to be the new Liz. Kind of like how Kirsten Cohen was interviewing with Ryan to be "the new Seth" while Seth was away in the almost healthy almost bounceback in Season 4 of The O.C. Hopefully Andrea's interview will go better than Kirsten's. Or season four's.
Her suggestion went something like this:
Andrea: Well I was thinking about songs and song themes. Would you say that the Steve Taylor song you like and the Samantha Ronson song you like are opposite sides of the same relationship? Well, not relationship, but the same people, and the way that most relationships happen with the same like people?
Michael: Hmmm..... interesting. Not exactly that, though -- they are both about one person, each -- though from two different points of view. They could both be about the same person, though. I'll think about it, that could be interesting....
Andrea: Since it was my idea, I guess I'm aware of that.
Michael: Though I think your take on it was wrong.
Andrea: I'm aware of that.
Okay, so the two songs are Samantha Ronson's Built This Way, heard on the soundtrack of the movie Mean Girls, and the Chagall Guevara song Tale O' The Twister, heard on the CD cut of the soundtrack of the movie Pump Up the Volume.
Liz and I had previously talked about both of these songs, though at different times and neither of us ever connected them. But she was not beyond making new connections so the premise, though forced, is plausible, at least.
So we'll run it up the flagpole and see who salutes. :-)
First I'll put the lyrics out there for both:
Built This Way, Samantha Ronson
Tale O' The Twister, Chagall Guevara
Did you ever feel like you wanna besomeone else for just one daydid you ever feel like you wanna seethrough another pair of eyesDid you ever think I might wanna bewith anyone else for just one dayDid you ever really think of mewhen I walked away The look the dunks and the bottle of Jackthe smokes the slouch and my eyes backyou think you know what you think you'll findyou think you'll figure me out tonightbut you'll never know what I won't share'cause I don't care, no I don't careyou think you'll figure me out tonightbut I don't care And I wonder, if I'm just built this way'cause every man that I know makes me feel like I'm to blamewhen it's over, me and my selfish waysgo back to start againgo back to start again Did you ever feel like you should have saidsomething smarter at the timeDid you ever feel like you should have keptit all to yourselfDid you ever think it might be your faultI never promised any moreDid you ever think it might not be meno it was always meThe look the dunks and the bottle of Jackthe smokes the slouch and my eyes backyou think you know what you think you'll findyou think you'll figure me out tonightbut you'll never know what I won't share'cause I don't care, no I don't careyou think you'll figure me out tonightbut I don't care And I wonder, if I'm just built this way'cause every man that I know makes me feel like I'm to blamewhen it's over, me and my selfish waysgo back to start again And I wonder, if I'm just built this way'cause every man that I know makes me feel like I'm to blamewhen it's over, me and my selfish waysgo back to start again (2x) The look the dunks and the bottle of Jackthe smokes the slouch and my eyes backyou think you know what you think you'll findyou think you'll figure me out tonightbut you'll never know what I won't share'cause I don't care, no I don't careyou think you'll figure me out tonightbut I don't care And I wonder, if I'm just built this way'cause every man that I know makes me feel like I'm to blamewhen it's 
over, me and my selfish waysgo back to start againAnd I wonder, if I'm just built this way'cause every man that I know makes me feel like I'm to blamewhen it's over, me and my selfish waysgo back to start again (3x)
She was a cool blue redheadShe was a virgin vixenShe had the eyes of LassieShe had the lips of NixonLips like Tricia NixonI stole a sideways glance at her continental shelfAnd I know she was the devil himselfIt's a barstool yawn to a stuttered come-onIt's a dirt road rut,She said, "Button up, mister"I shook as she took another look"Have you ever been hooked," she said,"By the tail of the twister?"And with a brain like EinsteinAnd with a form like sinUp on the roof of Trump TowerShe said, "It's yours on a trade-in"Think about itShe was drawn to blood like a lean loan sharkA tornado to a trailer parkIt's a long black carIt's power like a czarIt's temporary blissIt's like kissing your sisterBig wheels and you're feeling real fineIt's a temporary rideOn the tail of the twisterBig, big wheelsAnd you're sitting real highIt's a temporary rideOn the tail of the twisterCars and girlsIt's the details of designAnother bait and switch gameThat hooks me everytimeAnd it's paydayAfter sleeping with the devil you'd love to close the bookBut you gotta wonder how the baby's gonna lookIt's a wide-eyed stealIt's another New DealIt's the whore before the cartMy head's starting to blisterShe said, "You could be the envyOf everyone you envy"It's the tailIt's a long black carIt's power like a czarIt's temporary blissMy head's starting to blisterI got took as she took another look"Have you even been hooked," she said"By the tail of the twister?"
Now I have had a chance to think about this one a bit further....
As a quick "by the way", many people claim that the line in the Samantha Ronson lyric was I'm too plain rather than I'm to blame, but I am certain they are wrong and have heard her say as much once (the lyrics have also reportedly been given as such on her MySpace page, further bolstering the claim).
I'll call them Built and Tale since otherwise using those nouns over and over again would really wear me down and all. :-)
Both songs are clearly about particular women (Built is about a woman talking about how all of her relationships seem to go a particular way, and Tale talks about one woman -- a woman who it is easy to imagine that her relationships go that way).
And both women are rather aloof about, or perhaps Built would prefer to say disconnected with, the men they are with.
I've met and occasionally even dated women in the past who seemed like the virgin vixen of Tale -- that kind of confidence in herself, that kind of belief in her ability to change a life. To have the influence and impact of a twister. Only having the one interaction per woman, it is hard to know if they were touched by the relationship (whether it was witnessed or experienced). I'd like to think so, but the front she puts up makes it seem clear that I will never know for sure. Calling them relationships is likely an overstatement, though -- there is a real disconnection there, and it does seem to be her that is driving it.
And I've been friends with and occasionally even dated women in the past who like Built seemed to wonder if they were just built that way when the relationship ends. They probably don't wonder it every time, but they do feel disconnected from the men they are with. In the end it doesn't work out, and though all of the men make her feel like she is to blame, she is perhaps not so quick to deny it as one might imagine. Maybe she is just built this way and there is little else for her to do but try again.
I'll flatter myself enough to claim that I likely occasionally had impact in the latter case (though probably not the former) when I was in them. And perhaps the tantalizing idea about them being the same woman and the only differences related to how close I was able to get (either as friend or lover) is tantalizing because it redeems the former more than a little bit. It implies that someone is able to touch them, though Tale seems all but untouchable. The fact that Built claims to be untouchable since we'll never know what she won't share and all, yet I know she was occasionally touched makes me wonder whether Tale is emotionally touchable by the right person.
And in the case of the Built woman (Wow, what an odd phrasing! Though at some level both were clearly "built" that is not how I meant it!) I had the chance to see they did have a heart, even if said heart had no eventual impact in changing the course of the relationships that don't happen. You can even see it in the song -- her asking him all of those questions, explaining that it is her. She cared, even if only about not wanting to inflict pain.
It is in the nature of men to live in hope and push it into delusion as need be. She will dance with him once and then refuse for the rest of the night. From her point of view she was being polite and now can't shake him; from his point of view she was interested and perhaps still is, so why not keep trying? Neither sees the way the other looks at it through the eyes of the other, but either way it clearly isn't meant to be.
So I think I will choose to be enough of an optimist to say we are talking about the same kind of woman in both songs, though the cynic in me is quick to point out that whether true or not the relationships end, so that even if it is true it doesn't make a ton of difference. Trying to rehabilitate the view of someone after they break up with you really is putting the whore before the cart (to use Steve Taylor's rather visceral phrase about Tale), but when one has decided that Built isn't a whore and that Tale is her, it is simple logic to decide that Tale isn't one either.
Does it matter to her? Not to Tale, since she doesn't know about it and probably wouldn't care if she did. But to Built it would matter -- the occasional bout of guilt about the cyclical nature of the relationships she had is somewhat assuaged if the men feel better about her as a person.
But it matters to him. And I'm not just saying that because at times it has been me. Once beyond the bitter phase it is much clearer that negative feelings about the whole thing aren't healthy.
It was Damon Wayans who (in The Last Boy Scout) once said "I want to meet the bitch who fucked you up". It is a sentiment even expressed to me, on occasion. But that kind of thinking isn't very productive in the long run, so it is seldom something that I respond to -- we are responsible for our own actions, and both Built and Tale need to get over themselves enough to realize that we all have impact on each other, whether we realize it or not.
Both songs are on the Zune and in some (though not all) of the playlists, so I hear them from time to time when they come up in the rotation. When they inspire memories (as they sometimes will) they inspire ones that are good. Which I think is a pretty healthy view on the whole.
But perhaps I will feel differently the next time I am dumped under either Tale or Built circumstances. So all of that is a tad situational. :-)
So Andrea, was that what you had in mind? I'm not sure how organic that felt, from a process standpoint (we mostly talked about what was important not to me but to her -- so the new Liz would have to be a bit more willing to bring her own concerns and interests to the mix -- mine just aren't as interesting to me!). But I suppose maybe another interview is possible, there is some potential here.
This post brought to you by ꒝ (U+a49d, aka YI RADICAL YO)
Regular reader Jan Kučera asked over in the Suggestion Box:
Hi again,
I know the behaviour I mention here is not problem you can solve, but I'm interested in handling RTL fragments in "plain text". What I've encounted is like this.
I have both IMAP and web access to my e-mail. I don't have a SMTP server, so I send the mails from web and read them in Outlook 2007. One day, I wanted to know the author of Hebrew lyrics to the Maya the Bee song, so I wrote an e-mail to an Izrael TV which had a page about the series. The title of the e-mail was "Maya the Bee (הדבורה מאיה)" and I repeated these words in the message body. Need to say, I have the web mail configured to write plain text e-mails.
The surprise came with the answer. On the web, everything was okay, as I had written it. But in the Outlook, although the title remained ok (the hebrew phrase being selected from right to left), in the message body, I saw הדבורה first, followed by מאיה, letting the user select the row with hebrew text without troubles, char by char, from left to right.
When I copied it and pasted to the notepad, everything was ordered and behaving okay again.
The mail was encoded in 1255 and the sender used Thunderbird 2, but I don't think this is too important since in IE and other applications the text is formatted as it should.
What is more important is the title, encoded as "Subject: Re: Maya the Bee ( =?windows-1255?Q?=E4=E3=E1=E5=F8=E4_=EE=E0=E9?= =?windows-1255?Q?=E4=29?=" which could prevent Outlook from interpreting badly the title too.
E-mail reply was in HTML.
Now, the question is, beside whether this is a bug at all, how could be RTL phrase rendered in LTR, and what could we, as developers, do to avoid this issue in our programs.
PS: The answer to my question is Dan Zakai (דן זכאי). Or... דן and זכאי, as shown by Outlook? :)
It is actually not that hard to discern the relationship between
הדבורה מאיה
and the weird part of the string in
"Subject: Re: Maya the Bee ( =?windows-1255?Q?=E4=E3=E1=E5=F8=E4_=EE=E0=E9?= =?windows-1255?Q?=E4=29?="
Just look at that Windows code page 1255 chart:
So it is some kind of encoding of text into cp1255 with the text in appropriate logical order that anyone who understands the format should be able to use to decipher the text.
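For anyone who would rather not squint at a code page chart, here is a minimal sketch of that deciphering in Python. It leans on the standard library's email.header.decode_header to unpack the two encoded words from Jan's subject line; the helper name decode_mime_words is just something I made up for the illustration, and it is in no way a claim about how Outlook or Thunderbird do the job.

```python
from email.header import decode_header


def decode_mime_words(value: str) -> str:
    """Decode a header value made of MIME encoded-words into a str.

    decode_header() yields (chunk, charset) pairs; a chunk is bytes when
    it came out of an encoded-word and may be str when nothing was encoded.
    """
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unlabeled byte chunks are plain ASCII header text.
            chunk = chunk.decode(charset or "ascii")
        pieces.append(chunk)
    return "".join(pieces)


# The "weird part" of the subject line, copied from Jan's message.
encoded = "=?windows-1255?Q?=E4=E3=E1=E5=F8=E4_=EE=E0=E9?= =?windows-1255?Q?=E4=29?="
decoded = decode_mime_words(encoded)

# Print code points in logical (storage) order, independent of how any
# console or browser chooses to lay out the Hebrew.
print([f"U+{ord(ch):04X}" for ch in decoded])
```

The code points come out in logical order (the order a Hebrew reader would say them), which is the point: the encoding itself is fine, and anything odd you see afterwards is a display problem.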
And on the other hand anything that doesn't understand the encoding technique is quite apt to misinterpret it and not show what is expected....
For the body, if whatever control is holding the body knows how to properly use the Unicode Bidi algorithm then it will properly display the text, though the behavior Jan describes suggests that at least some pieces do not know how to interpret the text properly. The fact that it does not corrupt the text makes it somewhat easier to be okay with the interim display issues. :-)
Avoiding this kind of issue? More or less the answer is to avoid processing text in these interim stages, since it is likely way too easy to corrupt the text in the meantime.
Other recent posts of mine like this one and this one and this one jump into the handling of RTL fragments within LTR text and LTR fragments within RTL text. Which is not easy under the best of circumstances, though stay tuned, as I might suggest some additional methodologies to consider. :-)
This blog brought to you by ה (U+05d4, aka HEBREW LETTER HE)
The problem has its roots in Mixing it up with bidirectional text and The Bug(s) Spotted, aka Design flaws are worse than bugs, two blog entries which talk about specific lamenesses with the bidirectional support within Windows.
I don't want to imply that there aren't more problems beyond these. Because to be perfectly honest, there are.
Microsoft is incredibly lame here, though to be frank for a moment only lame in a way that everyone else is too, right now. Including Unicode.
To illustrate, I'll need a sample bit of text.
Let's build up a path. :-)
We'll take a nice little English string:
NAME (BIG)
And then we'll make another one in Hebrew, kind of a localized version of that string.
שם (גדול)
It is really quite reasonable to hope one could take these chunks, create a path with them (one chunk per directory) and have everything come out right.
I mean a path like:
C:\NAME (BIG)\שם (גדול)\NAME (BIG)\שם (גדול)
may be a Destryian scenario, but at its root it's just a small valid scenario that you would really want to work.
Let's try it with no special decorative control characters and leave it to the whim of your browser:
C:\NAME (BIG)\שם (גדול)\NAME (BIG)\שם (גדול)
It didn't look right on any of the four that I tried (Safari, Firefox, Opera, and Internet Explorer).
How about in Notepad?
Well you can choose your means of failure there via the right-click menu:
vs.
Let's try it on the latest and greatest version of Windows, as a path:
Hmmmm. Not so great in the breadcrumb bar, huh? What if we click in the address bar space to get rid of the breadcrumb bar:
Still broken, those tokens. All of the English ones look fine, but the Hebrew ones are broken.
Maybe we can do better on a Hebrew user interface language.
We'll look at the breadcrumb bar again:
Well, good news and bad news here -- the Hebrew looks good now, but the English is broken!
Is the hope for
such a fruitless one? So very unreasonable?
Turns out that if you are running on Windows, it is. :-(
Now obviously you can do some work here with U+200e (LEFT-TO-RIGHT MARK) and U+200f (RIGHT-TO-LEFT MARK) or other Bidi control characters to try and make this better, but obviously this is something one wants to have happening behind the scenes without requiring the user to add control characters to the string.
Especially a string where the intent is so obvious and easy to discern.... a slightly more complicated case than the one in Mixing it up with bidirectional text but not all that much more complicated, is it?
But it is by no means an easy problem for users to have to solve, so it really would be much better if the OS could do the heavy lifting here, rather than forcing it on everyone else.
Which is not to say there is some other operating system that magically does everything right here. Last time I checked, no one was doing so well in this space, and bidirectional support in these edge cases is kind of a myth for now....
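To make the control-character idea mentioned above a bit more concrete, here is a rough sketch in Python of the kind of behind-the-scenes decoration I have in mind. Everything about it is an assumption on my part rather than something any shipping product does: the function names are made up, and the particular recipe (wrap each right-to-left directory name in U+202B RIGHT-TO-LEFT EMBEDDING ... U+202C POP DIRECTIONAL FORMATTING, and put a U+200E LEFT-TO-RIGHT MARK in front of each separator) is just one plausible choice among several.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
CONTROLS = {LRM, RLE, PDF}


def is_rtl(segment: str) -> bool:
    """True if the segment contains any strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in segment)


def decorate_path(path: str, sep: str = "\\") -> str:
    """Return a display-only copy of path with Bidi controls added.

    Each RTL segment is wrapped in RLE...PDF so its parentheses stay with
    it, and each separator is preceded by an LRM so it stays left-to-right.
    """
    parts = [RLE + seg + PDF if is_rtl(seg) else seg for seg in path.split(sep)]
    return (LRM + sep).join(parts)


def strip_controls(text: str) -> str:
    """Remove the controls again before the string is used as a real path."""
    return "".join(ch for ch in text if ch not in CONTROLS)


# NAME (BIG) and its Hebrew counterpart, one chunk per directory.
hebrew = "\u05e9\u05dd (\u05d2\u05d3\u05d5\u05dc)"
path = "\\".join(["C:", "NAME (BIG)", hebrew, "NAME (BIG)", hebrew])
display = decorate_path(path)
```

Note that the decorated string is only for showing to the user; the controls have to be stripped again (as strip_controls does) before the string goes anywhere near the file system. And whether a given renderer does the right thing with the decorated string is, of course, exactly the open question.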
Let's pause to do a little RCA (Root Cause Analysis) for the problems here. As a standard, the Bidirectional Algorithm sits several levels lower than what one needs to handle the mix of LTR and RTL scripts, and the various "clients" (be they application or operating system or browser or other) more or less support the standard but do not provide a whole lot beyond it (other than sometimes providing that notion of a higher level definition of default directionality). It does quite well with cases like Hebrew that actually have some LTR pieces within themselves, but there is no good way to handle LTR text from another script embedded within unless a bunch of other work happens. Work that no one really wants to provide. Remember what that one person said in response to that hack bug:
"The correct fix is to delete the test entirely. We are all-Unicode now. We don't need an old hack for Hebrew/Arabic Windows 95."
No one wants to do too much beyond Unicode even though plain Unicode alone (without making use of higher level protocols to place control characters) is insufficient for handling these cases....
Note that this is also one of the reasons RTL IDN is so complicated and looks so broken most of the time.
It all amounts to A place where everyone blows, equally.
This blog brought to you by U+200e and U+200f (aka LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)
This is not a post discussing some kind of geopolitical issue involving Myanmar (Burma) and the Philippines.
You see the other day, regular reader Andrew West, in a comment to my Who forgot the culture?, asked:
Completely off-topic, but I notice that you embed the sponsoring character (U+1831, MONGOLIAN LETTER SHA) in an html font tag specifying "Mongolian Baiti" as the font face. It drives me crazy that IE7 (like IE6 before it) lists Mongolian as a "language script" that you can configure the font for, but it will not populate the font lists with any fonts regardless of how many fonts you have on the system that support Mongolian (including Mongolian Baiti), so it is impossible to actually configure what Mongolian font to use! The good news is that it does display Mongolian using Mongolian Baiti without explicitly specifying the font in the html, but the bad news is that I can't get it to use a Mongolian font other Mongolian Baiti without messing with the html. I just wish someone would fix IE ... or is this one of those Kafkaesque examples like Uyghur where Microsoft can't fix something, however broken, in case it breaks user expectations?
I suspected that I knew what was going on here, but it was really worthy of its own blog and I wasn't sure how quickly I'd get to it so I recommended he put a note in the Suggestion Box just in case it wasn't going to be quick....
Which he did:
I know I'm not going to like the answer, but can you explain how the font configuration dialog in Internet Explorer works, in particular the behaviour for Mongolian (font list is never populated) and Myanmar (font list is populated with fonts that cover Tagalog, but none that cover Myanmar)?
Then I had to cancel the blog that was happening for this slot and I ended up deciding to do it right away instead....
Andrew is right that he probably won't like the answer, but it is something that is fixable, and even technically work-aroundable if a font author is willing to do something that he or she would ordinarily consider to be very stupid. :-)
Perhaps I should explain.
We'll start in the Tools|Internet Options... Fonts... dialog:
(I guess I have no Tagalog fonts!)
and for good measure we'll include one that has some fonts in it:
Now in the end the information on actual selections is stored in the registry, under
HKCU\Software\Microsoft\Internet Explorer\International\Scripts
which is clearly an Internet Explorer settings key with SCRIPT ID values 36 (Myanmar) and 39 (Mongolian):
But for the list of potential fonts, that is not IE at all; that is MLang.
Now I blogged this a bit over two years ago in Where are the IE plain text fonts?, and in that blog I mentioned:
Now the actual population of the two lists is happening via MLang, and as Paul points out you could think of the list on the left as being for proportional fonts and the list on the right as being for fixed pitch (monowidth) fonts.
MLang goes through a two step process that I will get into in another post, coming soon. :-)
And since I never did get back to it, I guess Andrew has proof that things often get lost if they aren't put in one of those lists like the Suggestion Box! I am actually happy to have the proof because otherwise I look kind of petty or something with my request....
Anyway, I'll explain it now -- it all works via a Trust; But Verify! mechanism.
The Trust part is where it trusts the font to describe its own Unicode ranges in its own internal FONTSIGNATURE.fsUsb bits, the Unicode Subset bits. That is step one.
The Verify part is where it does a spot check on a specific Unicode code point in the script range, to make sure that the FONTSIGNATURE is not lying. Because FONTSIGNATUREs, like men, lie. Like that bit from the movie Up the Creek and its fictional typographical version Up the Foundry between Tim Matheson (as the font) and Jennifer Runyon (as the user):
Font: I will tell you about my coverage.
User: You wouldn't lie to me?
Font: Of course I'd lie to you, I'm a font. But I'm not lying now....
In fact, it really relies on that Verify step and perhaps even skips the Trust step a bit, sometimes?
And it spot checks the font CMAP to make sure a specific candidate character is in it.
I mentioned there was a problem here, didn't I?
Here is where the problem sits.
Deep in the heart of MLang, in its mlflink.cpp source file, it has:
And this is why Mongolian never shows up (since it has no explicit character to check for) and Myanmar shows up when your font has Tagalog (since that is the character it looks for).
Which is the essential workaround for Myanmar -- add that one specific Tagalog character to your Burmese font? Totally obnoxious, but until/unless someone fixes MLang....
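If it helps to see the shape of the logic, here is a toy model of the spot check in Python. To be very clear, this is not the MLang code: the Font class, the probe table, and the function name are all invented for illustration, the claimed_scripts set is only a stand-in for the fsUsb bits, and the choice of U+1700 as the Tagalog character being probed is my assumption for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical probe table: script name -> code point looked up in the cmap.
PROBE_FOR_SCRIPT: Dict[str, Optional[int]] = {
    "Myanmar": 0x1700,   # assumed: a Tagalog code point rather than a Myanmar one
    "Mongolian": None,   # no character to check for at all
}


@dataclass
class Font:
    name: str
    claimed_scripts: Set[str] = field(default_factory=set)  # stand-in for fsUsb
    cmap: Set[int] = field(default_factory=set)             # covered code points


def listed_for_script(font: Font, script: str, trust_first: bool = False) -> bool:
    """Model of Trust; But Verify! for one font and one script."""
    # Trust: optionally require the font to claim the script in its signature.
    if trust_first and script not in font.claimed_scripts:
        return False
    # Verify: spot check one code point; no probe means nothing can ever pass.
    probe = PROBE_FOR_SCRIPT.get(script)
    if probe is None:
        return False
    return probe in font.cmap


# Three made-up fonts with honest signatures and full block coverage.
burmese = Font("Hypothetical Burmese", {"Myanmar"}, set(range(0x1000, 0x10A0)))
tagalog = Font("Hypothetical Tagalog", {"Tagalog"}, set(range(0x1700, 0x1720)))
mongolian = Font("Hypothetical Mongolian", {"Mongolian"}, set(range(0x1800, 0x18B0)))
```

In this model the honest Burmese font never makes the Myanmar list, the Tagalog font does (as long as the Trust step is skipped), and no Mongolian font can ever be listed. Adding the probe character to the Burmese font's cmap is the obnoxious workaround in miniature.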
Let's put all the values in a table so you can see them:
You can probably see the other problem here -- all of the scripts that are missing; perhaps the fix needs to be a bit more than just the two broken ones, in the long run....
Speaking of which -- any NLS testers stirring about who'd like to enter a bug on this small bundle of MLang issues that will also affect IE8 on the next version of Windows if it isn't fixed? :-)
This blog brought to you by ᜀ (U+1700, aka TAGALOG LETTER A)
Over in the Suggestion Box, Bob Richmond mentioned:
Silverlight 2.0 - font and script support. Font embedding, native vs downloaded fonts etc. I'm not clear this has been fully thought through on the 2.0 Beta.
I must admit I find it more difficult to deal with items from the Suggestion Box that don't have a clear suggestion or question in them. :-)
In response, I would say 1) okay, 2) probably not, and 3) both yes and no....
I was at the internal MS Text Summit recently, and there was a pretty clear presentation about the project plan over the next few versions. So I think that there is a plan and that things will get clearer here as the beta progresses. So although I do agree that things don't look to be fully thought out, I don't think things will stay that way. I do expect that things are going to get clearer here....
But if not, I'll definitely be talking about it as things get closer to release. :-)
This blog brought to you by ䷪ (U+4dea, aka HEXAGRAM FOR BREAKTHROUGH)
So today our MVPs (Mihai and Mike) came over to building 24 and got the chance to talk to people from the various Windows International teams.
Everybody kept thanking me for setting up the meetings, but I didn't actually do very much -- just set up some meetings. The real effort was put in by:
Value is relative, but I look at the above two contributions as being of greater relative value than proper use of the clipboard and Outlook meeting requests. :-)
Anyway, at one point, Ian (who is also on the Windows International Fundamentals team) was explaining what we do and the description was very involved, and Mike asked him what it would be if he had to describe it in just a sentence or two. Ian's answer never made it down that small, but he got as close as he could while describing what all of the various folks are doing.
I was thinking about what that shorter description would be, and just couldn't think of anything.
Then I thought about how the main purpose was really to help fill in the gaps in international support.
So we are a bunch of International Gap Fillers.
And that we provide an International Gap Filling Experience (the word "Experience" seems like one of those words that VP types like to hear these days).
And that as people on an engineering team, our titles (standard titles not being such a big deal now) could be International Gap Filling Engineers.
Of course this train of thought derailed when I thought about the consequence of being called an International GFE or that we provided an International GFE -- based on the Urban Dictionary definition of GFE.
Even on the consulting side we don't do that kind of work!
So I am going to stick with that longer answer that Ian gave, it just seems safer -- and if words can be scary then acronyms can be even scarier. :-)
This blog brought to you by ꇥ (U+a1e5, aka YI SYLLABLE GAP)
Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)! Note that this post is entirely offtopic and if that kind of thing bothers you then you are invited to get out right now....
Way back in May of last year, George asked over in the Suggestion Box:
I've been reading about Silverlight at http://msdn2.microsoft.com/en-us/library/bb404300.aspx and it says the iiii will be under 2 MB. How much of that is the .NLP files that support System.Globalization?
The piece of that document he is referring to goes something like this:
At the heart of Silverlight is the browser-enhancement module that renders XAML and draws the resulting graphics on the browser surface. It is a small download (under 2 MB) that can be installed when the user hits the site containing the Silverlight content. This module exposes the underlying framework of the XAML page to JavaScript developers, so interaction with the content on the page level becomes possible, and thus the developer can, for example, write event handlers, or manipulate the XAML page contents using JavaScript code.
Of course this article refers to Silverlight 1.0, which is actually more like 1 MB for the browser add-in, though currently even the 2.0 Beta 1 download claims to be about that same size. It does not include .NLP files within it (which altogether would have been more than the size of the entire download for just the data files), which could give rise to all kinds of speculation, assuming that either
I am personally more willing to bet heaviest on the third item, myself. Anyone want to do their own guesses?
Ashish Thapliyal's blog has a post called Silverlight Roadmap Questions that also gives some hints about the final expectations though not much about this question, while the VB Team blog suggests that the download size for the Silverlight runtime was bigger than this to start and would hopefully be shrinking, which kind of puts the whole question somewhere strange....
This blog brought to you by ՞ (U+055e, aka ARMENIAN QUESTION MARK)
Over in the Suggestion Box, mpz asked:
Suggestion: Write about the new Latin capital letter sharp S introduced in Unicode 5.1.0.
Fair enough....
Though to be honest, by the time I get through:
I think my thoughts on the matter have been pretty much covered.
It is hard to say how things will go on that last point, as my opinions are fairly controversial and it is just as likely that they will not go in that direction....
But otherwise, the invention of letters that do not actually exist is quite powerful, as is the decision to ignore intuitive casing behavior or make unrealistic case mappings. Unicode has been doing it for some time and they seem pretty popular.
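For the curious, a quick way to see the casing behavior in question is to ask a Unicode-aware runtime. The sketch below uses Python's built-in str methods and the unicodedata module; it reports whatever Unicode data the interpreter ships with, so treat it as a snapshot of the current default mappings rather than a statement about what any tailored (say, German-specific) casing would do.

```python
import unicodedata

small = "\u00df"    # LATIN SMALL LETTER SHARP S
capital = "\u1e9e"  # the capital form added in Unicode 5.1

# What the two characters are called.
names = (unicodedata.name(small), unicodedata.name(capital))

# Default case mappings as exposed by str.
upper_of_small = small.upper()      # does NOT give the new capital letter
lower_of_capital = capital.lower()  # but lowercasing the capital gives the small one
round_trip = small.upper().lower()  # so small -> upper -> lower loses the letter

# Case folding, which is what caseless matching uses.
folded = (small.casefold(), capital.casefold())

print(names, upper_of_small, lower_of_capital, round_trip, folded)
```

So the capital lowercases to the small letter, the small letter uppercases to two letters instead, and a round trip through uppercase never gets you back where you started.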
The whole issue makes me wonder about how Germany really feels about capital punishment, given all of the capital letter punishment they seem comfortable with. :-)
This post brought to you by ß and ẞ (U+00df and U+1e9e, LATIN SMALL LETTER SHARP S and LATIN CAPITAL LETTER SHARP S)