Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Sometimes I wonder if the posts I write are not clear.
The good thing about the blog is that it is a lot like me talking to people.
Of course the bad thing about the blog is that it is a lot like me talking to people....
I was thinking about this when I read Jan Kucera's contribution to the Suggestion Box:
Hello!
I'm reading your blog for couple of months and I've learned a lot of things. We've seen a couple of examples what we really should not do and some hints what is better.
I'd like to know what is the most right way to compare strings while ignoring the case. (I work with managed classes but others could welcome unmanaged way as well.)
From some of your posts it is clear that lower-casing is better than upper-casing, since there are lower case characters without upper case equivavalents.
Also StringComparison.OrdinalIgnoreCase seems to be not the best win.
So strA.ToLower() == strB.ToLower() ?
or strA.ToLowerInvariant() == strB.ToLowerInvariant() ?
or string.Compare(strA, strB, true) ?
or string.Compare(strA, strB, StrinComparison.InvariantCultureIgnoreCase)?
Does using CultureInfo.CurrentCulture for string operations mean that the code will behave differently over the same data when running under different culture? If so, wouldn't it be better to choose any particular culture?
Well...is trustworthy case unaware comparation possible at all? :-)
Thanks for any hints on this topics. Or have you already answered this in past?
Jan
There is a lot in there that does not represent best practices, unfortunately.
There is a post in which I suggested a few guiding principles, entitled Browsing the shoals of managed string comparisons. In particular there is the bit at the bottom:
That third rule is the most important one....
For a slightly more complex breakout of items, you can see the post I mention in Something .NET does less intuitively than they ought that Josh Free wrote. Though for every person who has told me they found the table helpful, I have talked to at least one other person who found it made things more confusing -- the same way having all those different methods does.
But using the above three principles one should be able to resolve just about any question about appropriate string comparisons (whether case sensitive or insensitive)....
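To make the upper-casing versus lower-casing question concrete, here is a minimal sketch in Python rather than .NET, so none of the calls below are the managed APIs Jan asks about. It compares OHM SIGN and GREEK CAPITAL LETTER OMEGA three ways using the Unicode data bundled with Python; the helper name `equal_ignoring_case` is made up for this illustration, and it only covers non-linguistic comparison, not culture-sensitive comparison.

```python
import unicodedata

OHM_SIGN = "\u2126"       # OHM SIGN
CAPITAL_OMEGA = "\u03a9"  # GREEK CAPITAL LETTER OMEGA


def equal_ignoring_case(a: str, b: str) -> bool:
    """Hypothetical helper: culture-independent, case-insensitive equality.

    Relies on Unicode case folding (str.casefold) rather than on
    upper-casing or lower-casing both sides.
    """
    return a.casefold() == b.casefold()


for ch in (OHM_SIGN, CAPITAL_OMEGA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Upper-casing leaves the two characters distinct.
print("upper equal:", OHM_SIGN.upper() == CAPITAL_OMEGA.upper())
# Lower-casing maps both of them to U+03C9.
print("lower equal:", OHM_SIGN.lower() == CAPITAL_OMEGA.lower())
# Case folding agrees with lower-casing for this pair.
print("folded equal:", equal_ignoring_case(OHM_SIGN, CAPITAL_OMEGA))
```

The same caveat applies here as in managed code: this answers "are these the same identifier?", not "would a user of a given language consider these equal?".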
This post brought to you by Ω and Ω (U+2126 and U+03a9, a.k.a. OHM SIGN and GREEK CAPITAL LETTER OMEGA)
Rasqual asks:
Hello Michael,
I'll keep the question short:
What makes a 'good' encoding, and what makes it broken?
When I think back to the various code pages that I have considered to be broken in one way or another:
And I really can't discern any particular pattern in them -- some just have weird implementation issues, some are done in less than ideal ways, and some are simply outright broken.
So it seems like any code page that has troubles is one that I would call broken. I have pretty high standards here, like I do in other areas. :-)
This post brought to you by ။ (U+104b, a.k.a. MYANMAR SIGN SECTION)
Support Engineer Scott Heim had a question he asked yesterday:
Hi all,
I have the MonthCalendar control on a form and when this is displayed in XP, the calendar displays correctly; however, on Vista machines the calendar appears larger and the “Saturday” dates are cut off. Has anyone seen this before? Is there a way around this?
The form the control is on is not much larger than the control itself. I have a small repro here:
[I compiled the repro and ran it on both Server 2003 and Vista to take the screen shots below -- michkap]
Thanks,
Scott
And support engineer Dave Anderson came to the rescue with the following response:
Yes, the MonthCalendar control is larger when using the V6 common controls on Windows Vista. You can adjust the size of the form based on the size of the control at runtime. I added the following code for the form’s Load event handler:
private void Form1_Load(object sender, EventArgs e)
{
    this.ClientSize = monthCalendar1.PreferredSize;
}
-Dave
And indeed, when you add this code things fit once again:
Perfect. :-)
Now obviously this is a special case (a form that is meant to be the same size as the calendar) but the general principle can be applied in situations where controls are packed too tightly and changing the size might affect a localized form by causing controls to overlap (definitely something to avoid).
Any time developers build dynamic UI metrics this way in projects that are going to be localized, they should be very careful to communicate to the localizer that the UI metrics change at runtime -- there are few things more frustrating than truncation bugs that a localizer can't do anything about, especially when it takes multiple iterations to discover that fact!
And now that I have hijacked the question to get up on my localizability soapbox, I'll close with a message of more general use. :-)
The message? The fact that the Shell common controls do not guarantee backward compatibility with their metrics is an important issue to keep in mind -- or you could find yourself getting truncated, too....
This post brought to you by ⺦ (U+2ea6, a.k.a. CJK RADICAL SIMPLIFIED HALF TREE TRUNK)
Those were the exact words of the person on the phone, those words in the title.
I should back up a minute.
I have a land line phone with Verizon that I only use for emergencies like that recent power outage and also to torture telemarketers (I am not on the do-not-call lists; I only give the phone number to the sort of people who might sell things so I can treat every call like entertainment if I am going to even bother answering it).
The phone has every service stripped down and it is listed (another source of telemarketers, the phone book!).
Anyway, where was I?
Oh yeah. This guy on the phone asked "Are you Mr. Kaplan?"
I figure there is no harm identifying myself since the number is listed. "Yes," I respond.
"Vice president of Microsoft Customer Service?" he asked.
Hmmmm.
"No, that's not me," I reply.
He ends the call quickly. "I'm sorry, I must have a wrong number."
I guess he was looking for Richard Kaplan. Perhaps just going through the phone book calling all the Kaplan entries in the Seattle metropolitan area.
I must admit that it is an interesting way to call product support!
I might have commented had he not hung up so quickly.
And you know I always tell people that I am not Microsoft product support, but Richard can't really get away with that, I suppose.
I'd love to know how this all turns out but Richard and I are not golfing buddies and I am pretty sure we aren't related, so I guess I'll never know....
As an FYI to people, this is likely not the most effective way to get in touch with customer support (and also I never answer technical questions on that phone so it is not the best way to get a hold of me, either!).
This post brought to you jointly by ℡, ⌕, ☎, ☏, and ✆ (U+2121, U+2315, U+260e, U+260f, and U+2706, a.k.a. TELEPHONE SIGN, TELEPHONE RECORDER, BLACK TELEPHONE, WHITE TELEPHONE, and TELEPHONE LOCATION SIGN)
They say that a good lawyer never asks a question in court without already knowing the answer.
Well, I'd probably make a lousy lawyer.
Because when I was doing the research for What's up with MB_ERR_INVALID_CHARS?, I did not know the full extent of the overall limitations in the flag.
But given what I discovered, I made some recommendations.
Though I find myself really agreeing with Yossi and the comment Yossi left:
This inconsistency is pretty bad (the difference between how the actual Code page and best fit tables treat invalid characters). It renders MultiByteToWideChar pretty much useless in certain cases where these invalid characters are finding themselves into the output stream. I'm using MSXML2 to read an XML file which was produced after converting MBSC character stream to Unicode. Since the following characters:
0x81 0x0081
0x8d 0x008d
0x8f 0x008f
0x90 0x0090
0x9d 0x009d
in the 1252 best fit appears to be "OK", the MSXML2 just fails to parse the file.
Is there a way to resolve this problem (other than to scan the stream in a for-loop and replacing this invalid characters?
Is there a version of MSXML2 that is consistent with the behavior of MultiByteToWideChar?
I do find myself curious about what method msxml2 is using here for its conversions that is managing to fail on these characters that are technically mapped in the code page 1252 that the system defines. How is this component doing its conversions, exactly?
But on the other hand, I am left with the knowledge that this never-before-defined behavior is hardly referring to bytes that are useful in a stream of text.
So if you are seeing them, it is entirely reasonable to consider the text to be corrupt. What is that expression? Garbage in, garbage out.
And then of course relying on code pages in this day and age is not the best plan even when you stay within the valid mapped characters that make up the long-documented portions of the code pages.
The best thing to do is just stay away from them, especially if you think you might have invalid data like Yossi was seeing (or maybe investigating whatever is converting text to these unexpected code points!).
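For anyone stuck with the "scan the stream" approach Yossi mentions, here is a rough sketch of what that scan could look like. It is written in Python and leans on Python's own `cp1252` codec, which happens to reject these five byte values in strict mode; that is Python's behavior and not a statement about what MultiByteToWideChar or MSXML2 do. The function name `find_undefined_cp1252` is invented for this example.

```python
# The five byte values listed in Yossi's comment.
SUSPECT_BYTES = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def find_undefined_cp1252(data: bytes) -> list:
    """Return (offset, byte value) pairs that a strict cp1252 decode rejects.

    Hypothetical helper: each byte is decoded on its own, so the result
    depends only on Python's cp1252 table, not on neighboring bytes.
    """
    bad = []
    for offset, value in enumerate(data):
        try:
            bytes([value]).decode("cp1252")  # errors="strict" is the default
        except UnicodeDecodeError:
            bad.append((offset, value))
    return bad


sample = b"caf\xe9 \x81 ok \x9d"
for offset, value in find_undefined_cp1252(sample):
    print(f"offset {offset}: byte 0x{value:02x} has no cp1252 mapping")
```

Finding such bytes is the easy part; deciding what to do with them is not, which is why treating the input as corrupt is usually the more honest response.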
This post brought to you by ǻ (U+01fb, a.k.a. LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE)
You know the drill, nothing technical....
Sometimes a request is made of me and the question is asked in such a way that I just don't know exactly how to respond....
Like when I got home today and my neighbor (who was outside on the porch, holding a cigarette like it was something semi-precious that she did not know quite what to do with) kind of held up the cigarette and asked if I had a light.
I thought back to the last time I had a cigarette (September 24, 1994) and that I honestly didn't remember the last time I had a lighter. I then corrected myself, I had those matches in my apartment still from the outage I mentioned before, though they were far enough away that I couldn't see scooting to my door, going inside, finding the matches, scooting back out to my neighbor, and giving them to her.
So I chickened out. I just kind of shrugged and told her "Sorry, I don't smoke any more" with what I hope was the right tone to indicate that I was, in fact, sorry.
Hopefully she is not a regular reader of this blog; if she sees this post then the jig is up, as it were. Sorry about that, I really just didn't see any reasonable logistics to make it happen....
And then there was earlier in the day, when one of the program managers I talk to from time to time (name omitted for what should become obvious reasons, presently) had an interesting request that I did not know quite what to do with.
The exact words: "...in the future, you should space out your interesting posts (by interesting, read relevant to me). If I'm having a hard time actually caring about the post I should be caring about because theres another more interesting post, its a valid point!"
I had literally no idea what to do with this one, and was essentially rendered speechless.
Truly anyone who found every post to be useful would scare the living hell out of me (since I find each post somewhat interesting, how I feel about myself should be obvious here!). But since I wouldn't expect anyone to find every post interesting, the notion of spacing out the interesting posts is a fascinating one to contemplate though very hard to actually deliver.
My response was something very respectful along the lines of "I'll do my best" though the eye roll that went along with it probably negates the sincerity of the words.
As does this post? :-)
Which of course leads to a meta question -- would a program manager who wanted interesting posts to be metered find a post about that request to be interesting?
Infuriating, sure. But interesting?
This post brought to you by º (U+00ba, a.k.a. MASCULINE ORDINAL INDICATOR)
No, this post is not to do with the phenomenon sometimes referred to as 'beer goggles' in any way, shape, or form!
(by the way, if you search for that term on Google, would that make it become 'beer googles'?)
The other day Scott asked:
Hi,
Great blog!
Im working on a really specialised text editor that is used for text from all around the world. To do this we are using Uniscribe to convert text to glyphs etc etc. Pretty normal stuff. We do wierd stuff with the glyphs in a printer driver later on!
However today Im looking at Bengali, in particular Bengali (Bangladesh), and I found a wierdness in IE that you might be interested in.
I have been cut-and-pasting text from webpages into my editor to validate that Im working OK. I have found a issue that is in my editor and in notepad!
If you look at the webpage:
http://www.prothom-alo.com/index.news.details.php?nid=OTkzMw==
If I cut and paste the text into notepad it looses it character order and becomes junk, but whats more If I save the web page locally and reopen it in IE it turns to junk!
I can fiddle with the character order manually to sort things out again, but thats not the point really!
Anyway,
Keep up the good blog work!
Interesting, it does indeed contain text that looks good:
until you try to put it somewhere else (at which point you get lots of dotted circles and such). Very odd!
I went down the hall to talk to Simon Daniels.
Like many people such as Raymond Chen and even myself sometimes, Simon is cursed with the burden of knowing stuff. And the problem with knowing stuff is that people will just randomly want to ask you stuff....
Anyway, he immediately realized what was probably going on. He viewed the source, got the link to the CSS file that was being used, and looked at it:
/* Embeded Font */
<!-- /* $WEFT -- Created on 7/16/2007 -- */
@font-face {
    font-family: Bangsee Alpona;
    font-style: normal;
    font-weight: normal;
    src: url(http://www.prothom-alo.com/fonts/BANGSEE0.eot);
}
<!-- /* $WEFT -- Created on 7/17/2007 -- */
@font-face {
    font-family: Prothoma;
    font-style: normal;
    font-weight: 700;
    src: url(http://www.prothom-alo.com/fonts/PROTHOM0.eot);
}
-->
And of course .EOT files created by WEFT (Web Embedding Fonts Tool) actually have the site that the .EOT was generated for embedded in them, so changing the link to remove the "www" so that the link didn't work showed very different results:
(if you look very carefully you will see lots of dotted circles spread throughout)
In the end, proper font creation following the rules that have been established in OpenType (e.g. this one for Bengali) is crucial. If the fonts you use don't follow those rules then you have to encode the text to match the expectation of the fonts, and then you have strange behavior any time the font in question is not available to you.
Now in fairness to the Bangsee Alpona font, it may be a perfectly valid one at this point. Perhaps the version that was used to generate the .EOT files was from before the various changes within Unicode, and then later at Microsoft, to support the language properly -- and perhaps the editor for the content has some of the same problems -- so new content is created using this slightly different use of Unicode that is not the standard (thus creating text that will not always look right if you try to copy and paste it somewhere else that may not have the font):
১২ োসেੳটਹর োথেক রাজৈনিতক দেলর সেਔ অােলাচনা ੂরઔ
One of the reasons for the effort to provide a standard solution within Unicode is to keep under control the multiple contradictory methods of getting the rendering done, which is clearly what happens here....
This post brought to you by অ (U+0985, a.k.a. BENGALI LETTER A)
So I was chatting with Goldie the other day and I think just after or maybe it was just before I made some ridiculous stretch of a joke about Anatevka (forgetting momentarily that she did not go by Golde; her nom de plume was Goldie) she asked me if there was a test case I knew off the top of my head where collation results changed between XP and Server 2003.
Interestingly, this is a question I have been waiting years for someone to ask, ever since I first pieced together the change that happened! :-)
You see, prior to Server 2003, there was no version support. You know, those functions I mentioned in posts like this one (IsNLSDefinedString and GetNLSVersion).
As a part of the Server 2003 update, a bunch of code points got removed from the table. I'll list a bunch of them and you tell me if you see a pattern:
0x1000 32 2 2 2 ;Tibetan Ka
0x1001 32 3 2 2 ;Tibetan Kha
0x1002 32 4 2 2 ;Tibetan Ga
0x1003 32 5 2 2 ;Tibetan Nga
0x1004 32 6 2 2 ;Tibetan Ca
0x1005 32 7 2 2 ;Tibetan Cha
0x1006 32 8 2 2 ;Tibetan Ja
0x1007 32 9 2 2 ;Tibetan Nya
0x1008 32 10 2 2 ;Tibetan Reversed Ta
0x1009 32 11 2 2 ;Tibetan Reversed Tha
0x100a 32 12 2 2 ;Tibetan Reversed Da
0x100b 32 13 2 2 ;Tibetan Reversed Na
0x100c 32 14 2 2 ;Tibetan Ta
0x100d 32 15 2 2 ;Tibetan Tha
0x100e 32 16 2 2 ;Tibetan Da
0x100f 32 17 2 2 ;Tibetan Na
0x1010 32 18 2 2 ;Tibetan Pa
0x1011 32 19 2 2 ;Tibetan Pha
0x1012 32 20 2 2 ;Tibetan Ba
0x1013 32 21 2 2 ;Tibetan Ma
0x1014 32 22 2 2 ;Tibetan Tsa
0x1015 32 23 2 2 ;Tibetan Tsha
0x1016 32 24 2 2 ;Tibetan Dza
0x1017 32 25 2 2 ;Tibetan Wa
0x1018 32 26 2 2 ;Tibetan Zha
0x1019 32 27 2 2 ;Tibetan Za
0x101a 32 28 2 2 ;Tibetan Aa
0x101b 32 29 2 2 ;Tibetan Ya
0x101c 32 30 2 2 ;Tibetan Ra
0x101d 32 31 2 2 ;Tibetan La
0x101e 32 32 2 2 ;Tibetan Sha
0x101f 32 33 2 2 ;Tibetan Reversed Sha
0x1020 32 34 2 2 ;Tibetan Sa
0x1021 32 35 2 2 ;Tibetan Ha
0x1022 32 36 2 2 ;Tibetan A
0x1026 1 0 3 0 ;Tibetan Vowel Sign I
0x1027 1 0 4 0 ;Tibetan Vowel Sign Short I
0x1028 1 0 5 0 ;Tibetan Vowel Sign U
0x1029 1 0 6 0 ;Tibetan Vowel Sign E
0x102a 1 0 7 0 ;Tibetan Vowel Sign O
0x102b 32 37 2 2 ;Tibetan Chuchenyige
0x102c 32 38 2 2 ;Tibetan Visarga
0x102e 1 0 8 0 ;Tibetan Anusvara
0x102f 32 39 2 2 ;Tibetan Right Brace
0x1030 1 0 9 0 ;Tibetan Under Ring
0x1031 32 40 2 2 ;Tibetan Ditto
0x1033 32 41 2 2 ;Tibetan Single Ornament
0x1034 32 42 2 2 ;Tibetan Shad
0x1035 32 43 2 2 ;Tibetan Tseg
0x1036 1 0 10 0 ;Tibetan Candrabindu
0x1037 1 0 11 0 ;Tibetan Candrabindu With Ornament
0x1038 32 44 2 2 ;Tibetan Comma
0x1039 32 45 2 2 ;Tibetan Rinchanphungshad
0x103a 32 46 2 2 ;Tibetan Rgyanshad
0x103b 1 0 12 0 ;Tibetan Honorific Under Ring
0x103c 32 47 2 2 ;Tibetan Left Brace
0x103d 1 0 13 2 ;Tibetan Vowel Sign Ai
0x103e 1 0 14 2 ;Tibetan Vowel Sign Au
0x1040 12 16 70 2 ;Tibetan Digit Zero
0x1041 12 47 70 2 ;Tibetan Digit One
0x1042 12 66 70 2 ;Tibetan Digit Two
0x1043 12 84 70 2 ;Tibetan Digit Three
0x1044 12 102 70 2 ;Tibetan Digit Four
0x1045 12 121 70 2 ;Tibetan Digit Five
0x1046 12 140 70 2 ;Tibetan Digit Six
0x1047 12 158 70 2 ;Tibetan Digit Seven
0x1048 12 176 70 2 ;Tibetan Digit Eight
0x1049 12 194 70 2 ;Tibetan Digit Nine
0x104a 32 48 2 2 ;Tibetan Double Shad
0x104b 1 0 15 0 ;Tibetan Virama
0x104c 1 0 16 0 ;Tibetan Lenition Mark
The problem here? The data is all wrong!
This version of Tibetan, first described in Unicode Technical Report #2, was removed in Unicode 1.1 when the ISO 10646 merger happened, and then Tibetan was added back in Unicode 2.0 in an entirely different place.
If you look at DerivedAge.txt, you will see that the new Tibetan was added in July 1996.
But Windows had been carrying data around from Unicode 1.0 since the very beginning of its 32-bit life, possibly as far back as NT 3.5 or even NT 3.1 (I am almost curious enough to go try and find out which, actually!).
In Server 2003, it was decided that this incredibly invalid data had to be removed.
For one thing, it is just really bad to start a formal versioning functionality with crap like that in there.
And for another, this space that was left empty after the 1.1 merge was actually filled as of Unicode 3.0 in 1999 -- with the Myanmar script. And even though Windows did not add weights for it yet (we did not do so until Vista), keeping known bad data seemed like a pretty bad idea...
So, all of the above code points had weight in Windows from the early 32-bit days until XP, and then again in Vista (and were essentially weightless in the years between).
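If you want to see for yourself what those slots hold today, here is a small sketch in Python. It parses rows in the format of the listing above and asks the Unicode data bundled with the Python build what each code point is currently named. The function name `parse_row` is made up, the four numbers are kept as an opaque tuple since the post does not label the columns, and the names reported depend on the Unicode version your Python ships with.

```python
import unicodedata

# Two rows copied from the listing above.
ROWS = """\
0x1000 32 2 2 2 ;Tibetan Ka
0x104b 1 0 15 0 ;Tibetan Virama
"""


def parse_row(line: str):
    """Hypothetical helper: split a row into (code point, weights, old name)."""
    fields, _, comment = line.partition(";")
    code, *weights = fields.split()
    return int(code, 16), tuple(int(w) for w in weights), comment.strip()


for line in ROWS.splitlines():
    cp, weights, old_name = parse_row(line)
    # Current name according to this Python build's Unicode database.
    current = unicodedata.name(chr(cp), "<unassigned>")
    print(f"U+{cp:04X} was '{old_name}' {weights}, is now {current}")
```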
And of course the snapshots in Jet 4.0, ACE (the version of Jet that ships with Access >= 2007), SQL Server 7.0, 2000, and 2005 all have these somewhat bogus code points as well....
Oops for them (plus we can be snotty and superior about it now that it is fixed in Windows!)
When you talk to old timers about the 1.1 merge between Unicode and ISO 10646, you have trouble getting a straight answer -- it is like that bit from The Number of the Beast:
I've given up trying to find out what happened in 1965: "The Year They Hanged the Lawyers." When I asked a librarian for a book on that year and decade, he wanted to know why I needed access to records in locked vaults. I left without giving my name. There is free speech -- but some subjects are not discussed....
So that is all I can say about the old U+1000 TIBETAN LETTER KA which died in Unicode in the early 1990s only to rise from its ashes in 1996 at U+0f40 with U+1000 being assigned to MYANMAR LETTER KA in 1999. The same character lived on at Microsoft until 2003, only to be reborn along with its Myanmar cousin in Vista....
This post brought to you by ཀ and က (U+0f40 and U+1000, a.k.a. TIBETAN LETTER KA and MYANMAR LETTER KA)
The other day Lynn asked:
Michael,
I'm not sure which group you are working in, but I am hoping you can forward this message to someone who might be able to look at this.
We got a message from a customer about the EU Expansion Font Update v.1.02 for XP. He loaded it on his machine and created the attached table to test out the additional characters. In the bottom table, the Italic and Bold columns are printing a U with grave accent instead of a Bulgarian ѝ (Cyrillic Capital letter I with grave, U+045D) for Arial, Times and Trebuchet. Verdana is correct. If I select the symbol for the Cyrillic I with grave out of the symbol table for any of these three fonts, the little pop-up window shows a U with grave accent and if I insert it into the document, a U with grave accent is displayed. This happens for both the upper and lower case characters (U+040D, U+045D).
I am hoping there might be a fix for this, or that one is forthcoming.
Thanks for any help.
I forwarded it on to Judy and Simon since they know lots more about fonts than I.
Judy verified that this was an expected difference, kind of the Bulgarian version of the Serbian difference in italic forms I talked about in this post.
Here is an example where you can see this other form:
Of course Tahoma's italic support is calculated by a bit of GDI slanting, so its results are just an algorithmic thing -- the form that looks more "u-ish" is the appropriate one for italic lowercase U+045d.
And then Simon pointed out some Latin-based examples of the issue, which are much easier for people who do not know the Cyrillic script to fathom (I took it as a screenshot in case you don't have all the fonts in question):
Bravo for the small differences -- kind of argues for a Tahoma Italic to get done here, doesn't it? :-)
This post brought to you by ѝ (U+045d, a.k.a. CYRILLIC SMALL LETTER I WITH GRAVE)
(title inspired by that old children's song, I remember the Sesame Street version best)
It was not that long ago that my car (a black 1995 Saab 900 convertible) was parked at the airport in long term parking, and I was somewhere else. San Jose, I think.
While I was there, someone decided to break into the car.
Of course there was an alarm, so they had to be careful.
I was gone for most of a week so they had time to be industrious.
Their first attempt was to remove the lock on the passenger side. This does not let the door open (and even if the door did open the alarm would go off, so it would not have helped).
Undeterred, their second attempt was to cut a hole (well, actually two holes) in the soft top.
Wait, I have some art for this:
You can see the two rips in the soft top. If you really work at it you could maybe pull something out but the only thing there was to pull out was the GPS unit on the floor mostly under the seat (hard to see, even harder to reach), so it would have been difficult to get it out, I guess. Strike two. A particularly annoying failure, for reasons that will become clear shortly.
These thieves were quite determined, though. In the end they broke the driver's side window. And they got the GPS unit.
It was surprising to me that the alarm did not go off. Interestingly it did go off when I unlocked the door by reaching in and lifting it, but the cop mentioned to me that it is possible to shatter a window with vibration but without motion, so it would just be my bad luck. I think I should probably look into a better alarm system. :-(
In any case, insurance coverage was apparently not as full as it might have been....
So far, the window and the door lock are covered but the soft top is not due to some kind of exclusion. Though since I had to pay a higher rate for loss/damage/theft for a convertible, it seems like something of an unfair exclusion (though how often can one out-argue an insurance company?).
I will withhold the name of the insurance company for now while I wait for the appeal to finish up; more on this later....
If they won't cover it, then I have to worry about the price of the repair -- the cost of the soft top (about $3000, quoted by the dealer) and the labor to replace it (17 hours!!!) is fairly steep. So I may hold off on the full repair until I decide to sell the car. Or maybe find some auto repair school that wants to take in the repair as a project.
(ironically if someone steals it, then the whole car is covered, including the soft top. I can always just hope somebody steals it some day..... sigh).
For some strange reason, the roof does not leak -- but that may not be the case forever as the top goes up and down.
In the end I guess I'll just have to get it replaced whether the insurance company pays or not (one way or another)....
This post brought to you by ⼧ (U+2f27, a.k.a. KANGXI RADICAL ROOF)
A colleague and friend of mine, a former v- at Microsoft who went full time, was talking to me just after I had gone full time.
He predicted that I wouldn't last two years.
(He himself had already left within a year of when I started)
I asked him why he thought I would only last two years, and he told me that I'd realize that the people who didn't want me there would make their feelings clear enough for me to realize there was no future for me.
Intrigued, I turned the question around and asked him why he thought I'd last as long as two years, to which he replied that I am stubborn, idealistic, and cynical -- a combination rarely found in nature but then (as he further pointed out) I did not spend a whole lot of time in nature, either.
I pointed out that I had been a boy scout in my youth (until roughly the time I discovered the existence of girls, in fact!). To which he said the scouts thing just proved the point about being stubborn, and the thing about girls proved the idealism.
"What about the cynicism?" I asked him, but he had a response for that too - the fact that I am not in a relationship and seem to shun opportunities when they arise.
I started to respond but then I realized it was quite possible he had thought this out and was going to be able to outflank any counter-argument I could come up with on the spot. And since no one wins an argument when they think of the comeback a month later, I realized conceding would be more sensible (perhaps a subtle disproof of his theory about me, but I knew enough not to run myself into that trap since pointing it out negates the effect of the concession).
"We'll just have to see where we stand in a couple of years," I said.
Today actually marks my fifth year, though. :-)
I'm not in the office today since it's Sunday and won't bring any special candy tomorrow (I have two huge bowls of candy sitting in my office now, I suppose my friend would say that every day that I have not left is reason to celebrate?).
I asked him about the fact that his prediction turned out to not be prophetic, and of course he had a parry to the thrust of this argument too -- the Longhorn/Vista ship schedule threw off the timing.
"We'll just have to see where we stand in a couple of years," I said again.
"Exactly."
So here I am, still sorting it all out. I'll be in late tomorrow (waiting for a repairman) but if you are on campus in Redmond and feel like popping by and grabbing a piece of candy during the afternoon, celebrating my stubborn/idealistic/cynical nature, and just generally saying hi then please feel free to do so. :-)
This post brought to you by 𐒥 (U+104a5, a.k.a. OSMANYA DIGIT FIVE)
(Inspired by the alternate title from Oh Kannada... (ಕನ್ನಡ) and the South Park movie!)
One of those interesting issues related to rendering Indic properly came up the other day, in this case with Kannada....
The string in question, first:
ಅಹ್ಮ್ದ್ ಷರೀಫ್
If you are running on an OS that does the rendering correctly, it will not look identical to this other string:
ಅಹ್ಮ್ದ್ ಷರೀಫ್
Or this third string:
ಅಹ್ಮ್ದ್ ಷರೀಫ್
The customer was in this case seeing that third string visually for all three using some fonts, but not others, and in some technologies, but not others. And it was never working right in .NET 1.1 using GDI+ and its Graphics.DrawString method.
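When strings like these look the same on screen, dumping their code points is a quick way to see whether they really are the same. The sketch below is in Python and uses made-up sample data: a HA + VIRAMA + MA sequence with and without a ZERO WIDTH NON-JOINER. That the customer's strings differed in this particular way is an assumption for the sake of illustration, not something established above, and `describe` is a hypothetical helper name.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def describe(text: str) -> list:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Hypothetical variants of the same cluster; not the customer's actual data.
plain = "\u0cb9\u0ccd\u0cae"                 # HA, VIRAMA, MA
with_zwnj = "\u0cb9\u0ccd" + ZWNJ + "\u0cae"  # HA, VIRAMA, ZWNJ, MA

for label, sample in (("plain", plain), ("with ZWNJ", with_zwnj)):
    print(label)
    for entry in describe(sample):
        print("  " + entry)
```

Two strings that produce different dumps are different strings, whatever a given font or rendering engine chooses to show for them.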
Now as you might have guessed, we are dealing with the combination of several different issues here, including:
These issues are ones that will improve over time as the older implementations that do not have the right rendering story are replaced by those that do. Though I can't help wondering whether it would have been so bad to update all of the supported technologies (including GDI+) so that customers could see text correctly without depending on technology shifts....
This post brought to you by ್ (U+0ccd, a.k.a. KANNADA SIGN VIRAMA)
I admit I am no fan of either the MPAA or its ratings system.
But some interesting issues in language are raised by its criteria.
For example, the use of one of the harsher sexually derived words (e.g. fuck) even a few times can lead to a movie being given a PG-13 rating, while using it only once can lead to an R rating if it is used in a sexual sense.
The distinction, while obvious and rather easily defined, can at times be problematic, though.
Take for example the movie Crimson Tide, in which the word is used 28 times, mostly in a non-sexual sense, but certainly enough times to assure an R rating.
Some are obviously sexual (and more than a little offensive), like the first occurrence:
Yeah, horses are fascinating animals. Dumb as fence posts but very intuitive. In that way, they're not too different from high school girls. They might not have a brain in their head... but they do know all the boys want to fuck 'em. Don't have to be able to read Ulysses to know where they're comin' from.
While most of the others are not, like the second (which can be considered offensive to some for entirely different reasons):
Somebody asked me if we should have bombed Japan... a simple, "Yes, by all means, sir. Drop that fucker. Twice."
Or the third:
When you got somethin' to say to me, you say it in private. And if privacy doesn't permit itself, then you bite your fucking tongue.
Now clearly these last two are not meant in a sexual sense, and so it goes for most of them.
But then there is a third sort of category, like the thirteenth use, which comes just after the non-sexual twelfth:
Sonar/Conn - let me know when our range to that Akula is open to 1,000 yards! <<Conn/Sonar>> Aye, sir! Damn it! Let's just shoot this fucker! What's 1,000 yards for? 'Cause it takes 1,000 yards for the torpedoes to arm! Jesus! Who'd you fuck to get on this ship?
Now this is a sexual sense, kind of. But not really -- it is obvious hyperbole and not a serious reference to sex. The point is that someone so incompetent as to not understand such a basic issue of submarine warfare could only have made it onto a submarine as payment for sexual favors (perhaps "Who do you have naked pictures of to get on this ship?" could have been used instead to get a similar meaning across, and it would have been a more theoretically feasible idea in terms of getting assigned to a sub while incompetent). But in any case it is really not talking about a truly sexual context (or if it is, it is a less serious example than that first one).
Kind of like the twentieth use just after the non-sexual nineteenth:
Weps, we've been ordered to launch. Now why in the world would we do that if they weren't prepared to launch at us? We don't know that for sure. That's the whole point. That's why he wants time to confirm the message. That's the whole fucking point is we don't have time! Radchenko is fueling his birds. Now why do you think he's doing that? Why? You don't put on a condom unless you gonna fuck!
Again, it is sexual, sort of. But really only as metaphor -- trying to explain that fueling missiles without arming them would make as much sense as putting on a condom without actually having sex. It is certainly a different degree of sexuality, if it is truly going to be treated as sexual at all.
Obviously with 28 uses of the word it was going to get an R rating anyway. But if a movie had just one or two examples of this "sexual, but not" kind of reference, I wonder whether it would be PG-13 or R?
This is a basic problem of having the ratings decision be based on not just the word itself but also on both the semantic context of whether it is being used sexually and the pragmatic context of whether the sexual use really is about sex.
Perhaps it is a distinction without a difference to some, but the uses seem different to me....
This post brought to you by 姘 (U+59d8, a CJK ideograph that may or may not be of some relevance)
When I wrote Getting the language (and more!) of an LCID-less keyboard, which admittedly covered a lot of ground, I realized there were a lot of other points that would have to be clarified.
Like how MUI (Multilingual User Interface) fits in.
I mean, it is clear from looking at the registry that something is going on:
That bit with the Custom Language Display Name and the Layout Display Name and their SHLoadIndirectString style strings is fairly obvious.
(I still have to talk more about SHLoadIndirectString; I'll do that another day)
And it leads people like regular reader Ivan Petrov to wonder and even ask:
Hi Michael :-)
I've the following question:
How can someone USE, let's say something like the MUI technology, for the Description text when the custom Keyboard Layout is installed?
I mean when some user is using English User interface (MUI) to see the English Description text and if some user (on the same machine) is using a Bulgarian User interface, to see a Bulgarian Description text. All this at the Language bar and in the Text Services and Input Languages window in the Installed services under the Keyboard tree as localized node!
Regards,
Ivan.
Now of those two MUI-friendly strings, the Layout Display Name actually was added in Windows XP and is used to support localized keyboard layout names in every user interface language in Windows. All of the strings are in input.dll and the localizers can get to them.
But for custom keyboard layouts, obviously one cannot add strings to the input.dll file that ships in Windows. So we talked about it and decided that the resources of the layout DLL itself would work just fine. Starting with MSKLC 1.4, we automatically add the language name at string resource 1100 and the layout name at string resource 1000.
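To make those resource IDs concrete, here is a small Python sketch of the "@path,-id" shape that SHLoadIndirectString style strings take, pointed at a layout DLL's own resources. The helper names and the DLL name `valley.dll` are hypothetical, and the exact path a given install writes to the registry is an assumption on my part; only the two resource IDs come from the text above.

```python
import re

LAYOUT_NAME_ID = 1000    # layout name string resource, per the text above
LANGUAGE_NAME_ID = 1100  # custom language name string resource, per the text above

def make_indirect_string(dll_path, resource_id):
    """Build an '@<path>,-<id>' reference to a string resource in a DLL."""
    return f"@{dll_path},-{resource_id}"

_INDIRECT = re.compile(r"^@(?P<path>.+),-(?P<id>\d+)$")

def parse_indirect_string(value):
    """Split an '@<path>,-<id>' reference into (path, id); None if it is not one."""
    match = _INDIRECT.match(value)
    if match is None:
        return None
    return match.group("path"), int(match.group("id"))

# ASSUMPTION: 'valley.dll' is a made-up name for a custom layout DLL.
layout_ref = make_indirect_string(r"%SystemRoot%\system32\valley.dll", LAYOUT_NAME_ID)
language_ref = make_indirect_string(r"%SystemRoot%\system32\valley.dll", LANGUAGE_NAME_ID)

print(layout_ref)
print(parse_indirect_string(language_ref))
```

The sketch only builds and picks apart the reference text. Actually resolving it to a localized string is the job of SHLoadIndirectString on Windows.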
Of course there is no user interface within MSKLC to let you specify the various translations of those two strings, which would seem to defeat the purpose.
But let's take a closer look at the .KLC file from those adventures the other day, near the bottom of the file:
DESCRIPTIONS

0409	Like Totally Fer Shure

LANGUAGENAMES

0409	Valley Girl (California)
And there you have it. For any language you wish to add a translation for, and this is for either or both strings, you can add them here. MSKLC will not let you edit the name directly but if they are there then it will build the keyboard layout DLL containing them. Quite happily, in fact....
You are LCID (technically LANGID) bound since all of the resources are contained in the one DLL and there is no way to do multilingual resource tagging in one file. Perhaps in a future version this would change to including the various .mui files in the language name directories. And then custom languages might fare better (of course for the time being MUI does not work well with custom locales, so MSKLC has some time before anyone needs to worry about getting that bit right). :-)
It is funny, the feature idea within MSKLC has been suggested for years but it never really got very far, as people struggled over what to make that UI look like -- some big grid where you choose the target language and put in the translation? Or would you give the DLL to some localization company and have them translate? Probably once they are separate DLLs, sure. But for now maybe some UI would have been nice? :-)
Ah well, no worries. If you want to put in some different translations of the custom language name (ignored unless it is in fact a custom language) and/or the keyboard layout name, adding them to the file is easy enough by just putting in the LANGID and the name, one line to each you add.
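If hand-editing is not appealing, the same thing can be scripted. What follows is a minimal Python sketch, assuming the two sections look the way they do in the snippet above (a keyword line followed by "LANGID, tab, name" entries). The function name `add_translation` and the placeholder translation text are invented for illustration; this is not anything MSKLC itself provides.

```python
import re

# An entry line is four hex digits (the LANGID) followed by whitespace and a name.
_ENTRY = re.compile(r"^[0-9A-Fa-f]{4}\s")

def add_translation(klc_text, section, langid, name):
    """Add 'LANGID<TAB>name' after the last existing entry of the given section."""
    lines = klc_text.splitlines()
    if section not in lines:
        raise ValueError(f"section {section!r} not found")
    start = lines.index(section)
    last_entry = start
    for i in range(start + 1, len(lines)):
        stripped = lines[i].strip()
        if stripped and not _ENTRY.match(stripped):
            break  # reached the next keyword, so the section is over
        if stripped:
            last_entry = i
    lines.insert(last_entry + 1, f"{langid:04x}\t{name}")
    return "\n".join(lines) + "\n"

sample = (
    "DESCRIPTIONS\n\n0409\tLike Totally Fer Shure\n"
    "LANGUAGENAMES\n\n0409\tValley Girl (California)\n"
)

# ASSUMPTION: placeholder text stands in for a real Bulgarian (0x0402) translation.
updated = add_translation(sample, "DESCRIPTIONS", 0x0402, "(Bulgarian layout name here)")
print(updated)
```

One caveat if you point this at a real file: as far as I know .KLC files are saved as UTF-16, so read and write them with that encoding rather than the platform default.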
This post brought to you by ࿏ (U+0fcf, a.k.a. TIBETAN SIGN RDEL NAG GSUM)
So I went and saw a movie last night with colleague/comrade/friend Melanie.
We went to Lincoln Square Cinemas armed with two recommendations (Death at a Funeral and Superbad) but we ended up seeing Becoming Jane, instead....
This would not have been my first choice, to tell you the truth.
Not due to any feelings against Jane Austen, mind you -- while not a genuine Austenite (Janeian? Not sure what the authentic term is here), I loved Sense and Sensibility and Emma and the others and read them all on my own after being "forced" to read Pride and Prejudice in grade school (it was not the sort of book one could actually admit to enjoying at that point, so I didn't). But Jane taught me irony, something we all have and experience but so few people recognize, and that was quite a gift, if I do say so myself.
But I was skeptical about the romance between her and Lefroy, and I was doubtful about Anne Hathaway in the role (though I loved her in The Devil Wears Prada), and I was petrified that a "Hollywood ending" would be bolted on to the story leaving us with more of a "Becoming What Jane Would Be Like Had She Married" rather than a more truthful "Becoming Jane."
But I'll be honest, it seduced me.
Perhaps there was no actual even almost relationship between them -- in reality I think they had less than a month for it to happen so in the end it is unlikely they did. But the movie made me believe it and I had no trouble suspending disbelief given the chemistry between Hathaway and McAvoy (though there were several scenes that would have benefited from a Steadicam!). After I got home I had to check dates for Lefroy's daughter Jane and see if things lined up as well as they did in the movie -- they did.
Plus the preview of The Jane Austen Book Club coming this fall has also tempted me, I'll probably see it too (the book was wonderful).
I am truly glad we saw this movie.
Of course there was a price to be paid -- I am not the sort of person who can be moved that much toward romance (even ultimately unrequited) and was up most of the night reading a Thomas Gifford novel to swing my sense, sensibilities, pride, and prejudices back closer to where I usually keep them. There is romance there too, mind you -- but with the added notion of conspiracy and a more cynical edge upon which I can tune my own moral compass....
But that is just me. Normal people can see the movie and meet the Jane before the Jane who wrote the books they knew so well, and enjoy the thought that she did indeed have a chance to feel the stirrings about which she wrote so very well.
This post brought to you by ൠ (U+0d60, a.k.a. MALAYALAM LETTER VOCALIC RR)