Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
I had somebody ask me what I thought was "best in show" at this last IUC that I was at (and whether I thought it was my presentation).
No, I explained, my talk was interesting and there are several other interesting issues that could get a lot of benefit from the same approach due to the complex nature of the problems -- for example all of the following and the relationship to Unicode and internationalization (only some of which I'd be qualified to present but all of which I'd love to see):
and more. There is something about the "dense with information but slightly irreverent" look at a problem that I think can be very appealing. :-)
But I did not claim it was best in show.
For that I pointed out We're "World-Ready"… What does this really mean?, a talk given by Loïc Dufresne de Virel, Michael Manca, and Cory Whitney (all from Intel). The description of the talk:
Proper internationalization is routinely listed in most software requirement documents, but most development and validation teams are in the dark when the time comes to implementing and testing this basic requirement. Based on real software bugs investigated by Intel's localization team, this session presents the typical internationalization issues that developers encounter every day, but often struggle to properly address in a proactive fashion, prior to an actual localization attempt. Regional settings, language selection, encodings, and UI design are among the topics that will go under the microscope in a very practical way, exploring the probable causes of those issues, along with possible solutions and best known methods implemented at Intel.
Now at its heart this presentation was a bug postmortem of a handpicked set of internationalization/localizability bugs from a myriad of causes and in a large number of different technologies. I even came back with some bugs to get reported in Microsoft....
Such a simple concept, and one that anyone working in the area could imagine that they themselves could do.
They might not be as funny or as entertaining (I know I probably wouldn't be!), but the concept is one anyone can really get their heads around. And everyone was right there in this postmortem, sharing the same joys and pains that we feel ourselves as we fix or postpone the fix of the bugs.
The enthusiasm and energy in the room was undeniable, and the only real complaint I had was that it ended too soon since I wanted more of it!
Perhaps the things that I think are among the best are the things that I could possibly do but I think someone else can do better? :-)
This blog brought to you by ဿ (U+103f, aka MYANMAR LETTER GREAT SA)
It was yesterday in It is the [unexpected] gratitude from people you respect that makes a [stubborn] Bulldog feels best! where I mentioned that I won $20 in the trivia contest.
Without explaining what I meant.
Basically it was part of the evening's excitement, here is the poster about it, which I swiped from the bulletin board as everyone was packing up:
Notice the thing about the trivia questions and cash prizes at the bottom? :-)
The two questions I had the answers to were:
1) What are the countries that are or have been members of Unicode?
The answer is India, Pakistan, and Tamil Nadu (all three are currently members).
Not too impressive, though the reason I knew this one was that I had actually had contact with each over the last few years at various times on Unicode issues. Though since Tamil Nadu is not an actual country it was the one I named last (after they did not declare my answer right after I named the first two and just kept looking expectantly)....
2) Who is the person behind Sarasvati?
I should explain that this was preceded by a question of who is Sarasvati -- Hindu goddess of knowledge and the arts (similar to but not the same as one of the Greek muses, though in addition to the job differences Sarasvati's status was somewhat higher since she was Brahma's lover/consort), and also the secret person who helps keep The Unicode List from getting out of hand. Any time someone stepped over the line too far, The Effulgent One would come and save us all from ourselves.
I raised my hand for that question too, since I knew both answers, but someone else was called on. That person did not answer the question correctly, but then the above question came in as the follow-up.
I knew this one because the person behind The Effulgent One had, in a past situation of extreme tension, not been as careful about IP addresses and the machines being worked from as is done today, and unintentionally allowed me to reverse engineer the identity.
This answer will not be posted here, out of respect for the Goddess. Witnesses to the revelation whereby I made half of the $20 for my response should likewise respect this. :-)
Anyway, as the poster indicates, part of the evening's entertainment was a song/poetry contest, and given my foolhardy nature about such things I was considering 'Unicodized' versions of one of the following two songs:
I'll let you imagine the lyrics I had come up with for both of them for yourselves.
Of the two, I thought the first was the coolest though the most embarrassing for me to actually perform since I lack Tal's vocal range; the second, with a strong singer handling the woman's part in the song, would have had a much better chance of not being off key or out of range.
Though the issue became moot (and I instead stood mute) after (in the words of Joey Tribbiani) it became a moo point.
Basically, after Jim DeLaHunt sang his Unicode Doggerel ("I am the very model of a modern text encoding scheme"), which was hilariously funny (especially lines 14-16, line 18, and line 22), I knew I could not top it, so I decided not to try -- some acts you just don't try to top!
Perhaps I'll share some or all of the lyrics of the above two at some point as they were a lot of fun to do.
Next, I'll unpack the Bulldog Award, take some pictures of it, and tell the tale of my adventure getting it home....
This blog brought to you by ꖒ (U+a592, aka VAI SYLLABLE MOO)
The other day, reader Harmony7, in response to my blog Unbloggableable: an inconceivableable term that makes me a little uncomfortableable, commented:
This might be a bit off-topic, but I always thought the word "undoable" was confusing. The dictionary says "undoable" means something unattainable: "An undoable plan" might mean you don't have the necessary money or people. However, especially in the world of computers and UI, it is often used to mean something that can be undone: "An undoable edit control" might be an edit control with an undo function. Maybe things of these sorts kind of... evolve after a while.
Now I agree completely here, Harmony7 -- this was very off-topic.
But on the other hand it is an interesting new topic, which I will say something about now.
This is the interesting difference between two separate etymological paths for the same word -- undoable as either undo-able or un-doable!
Now un-doable is the one Harmony7 is thinking about in the dictionary entry.
If one can do something, it is doable.
And if one cannot do something, then we add the "un-" prefix -- calling it undoable because by being not doable, it is un-doable.
Now with the UI feature, one is thinking of the undo feature -- the memory a computer program has of things you had the computer do.
Especially in terms of those item(s) on that list that one can undo. Because if you can undo them then they are undoable, because they are undo-able.
The fact that the same prefix and suffix are being used is unfortunate, as there are no methods or markers to know the difference between
un((do)able)
and
((un)do)able
which is troublesome, since they both spell out the same word.
Oh well, at least someone had the awareness to make sure that an action a program performs that cannot be undone is described as can't undo rather than unundoable! :-)
Because unundoable would be a doubleplusungood way to refer to things!
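For the programmers in the audience, here is a tiny Python sketch of the two bracketings. The tuple-based tree and the helper names (`spell`, `bracket`) are my own illustration and not any standard linguistic notation; the only point is that two different structures flatten to the same spelling.

```python
# Each node is either a morpheme (a str) or a tuple of child nodes.
NOT_DOABLE = ("un", ("do", "able"))      # un((do)able): cannot be done
CAN_BE_UNDONE = (("un", "do"), "able")   # ((un)do)able: can be reversed


def spell(node):
    """Flatten a morpheme tree into the written word."""
    if isinstance(node, str):
        return node
    return "".join(spell(child) for child in node)


def bracket(node):
    """Render the grouping explicitly so the two readings can be told apart."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(bracket(child) for child in node) + ")"


if __name__ == "__main__":
    for tree in (NOT_DOABLE, CAN_BE_UNDONE):
        print(spell(tree), bracket(tree))
```

Once the tree is flattened the grouping is gone, which is exactly why the written word alone cannot tell you which meaning was intended.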
This blog brought to you by ឌ (U+178c, aka KHMER LETTER DO)
It turns out everybody knew.
Rick McGowan had sent me mail a few days ago asking if I would definitely be at the Internationalization and Unicode Conference. I said yes to do my talk. I mean of course I did, I promised I would do the Behind the Proposed Change to Tamil in Unicode talk, and I'm not the sort to break promises....
He persisted, asking if I was attending the whole conference.
Now it was my vacation time, but on a Mon-Wed in San Jose I doubted I'd have much else to do, so I said I'd try. :-)
Then on Monday night I went out to dinner with some conference friends (I remember a conversation at dinner where I explained that for me it was always the unexpected gratitude from people you respected that felt best, apropos of something we were talking about but I thought no further about it), and then later stayed up to do some work. Yes technically I was on vacation, but it was interesting and I was making progress.
In bed by about 5:30am, it hadn't occurred to me that I had pulled an all-nighter until I looked at the clock!
I got up at about 8:30am Tuesday morning, checked email, and then got in the shower.
No hurry -- I knew that the conference was starting at 9:00am, but I had been up all night and I'd be down before the keynote was over.
Who was gonna notice?
Famous last words!
Just after 9:00am, Magda Danish calls my room, and asks me if I am coming downstairs.
"Sure," I explained, "soon."
"Could you come down right away? We need you."
I agreed, and as I was getting dressed I worried a bit.
Was I in trouble?
Well, I knew I was from a mail I had gotten 12 hours and 12 minutes prior. But that was for other stuff, not from folks working for Unicode!
I put on my İ şéè đêäđ ķéÿš shirt -- hey, I am on vacation, remember?
Time to face the music.
No sense putting off a trip to the dentist.
And then after running out of cliches on facing the inevitable, I decide to head downstairs.
I scoot toward the door, slowing down as I neared, and Magda is frantically gesturing to me to come in.
I now have no idea what is going on, whatsoever.
Then I hear Rick talking into a microphone, and a minute later up comes a picture of me, taken last month at a UTC party at Murray's house, wearing the same shirt, and suddenly I know what is going on.
I am the winner of the Unicode Bulldog Award!
As described all of those years ago:
On April 2nd 1997, Rick McGowan suggested that the consortium sponsor an occasional award for "outstanding personal contributions to the philosophy and dissemination of the Unicode Standard".
In May 1997, Ken Whistler came up with the term "Bulldog" in reference to a remark made by Thomas Huxley to Henry Fairfield Osborn, in the mid 1870's:
"You know I have to take care of him [Darwin] -- in fact, I have always been Darwin's bull dog."
Three months later, at the Eleventh International Unicode Conference, Mark Davis introduced the award for the first time. He said:
There are many people whose dedication and perseverance has helped to bring the future of Unicode even closer. To recognize such achievement, we have created a new Unicode award, to be given to those tenacious champions of Unicode who have produced solid achievements in promoting its use around the globe. This award is called the Bulldog Award; once these guys bite, they never let go!
Rick then said some very nice things and I accepted the award that I think I was too dazed to remember (I'll wait till they update the Bulldog page to find out for sure -- I now know why Oscar winners forget stuff in their speeches!) and then Magda helped me take it to pack up in the box.
Throughout the day people congratulated me and I spent the day feeling pretty pleased. In the words of Frank Burns, it's nice to be nice to the nice.
Though obviously I was mistaken (could not have been more wrong, in fact!) about no one noticing if I was a few minutes late coming down, it was fun joking that the reason I was late was that I had to let them take the picture of me first! :-)
All day long people were asking me if I knew beforehand, and I told various shortenings of the above -- I had no idea.
At the Unicode trivia quiz I won $20, and afterward Rick admitted to me that there was pretty fast consensus about the idea on the Unicode Officers List. And then at dinner I found out from Peter Constable that he had also known for a few days and he seemed slightly amused that I had no idea.
I guess stubborn and clueless aren't mutually exclusive terms. :-)
But it really is always the unexpected gratitude from people you respect that feels best.
And I'm really looking forward to the Behind the Proposed Change to Tamil in Unicode talk tomorrow -- a tale of Unicode and language and politics and technical issues and linguistics and romance and anger and a job offer and a [veiled] death threat, it promises to be what I think will be 50 of the more exciting minutes of the conference!
Thanks to everyone who has ever come to me asking about a Unicode question or issue or problem or bug. You're the ones who inspire that stubborn streak in me....
This blog brought to you by 𐂍 (U+1008d, aka LINEAR B IDEOGRAM B109M BULL)
Kamal asks:
Is there way to participate in the discussion of Tamil (Sri Lanka). I mean the regional settings. I would like to participate and contribute if I can. Please direct me. Thank you. Truly, Kamal
Generally speaking, there are not huge public discussions on data being added to future versions of Windows.
Data has been suggested in the past from some sources but the triage process has thus far not been able to justify the steps to adding it officially.
Though in the meantime I suggested the programmatic way to add it (in Where are the other Tamils?). And as that blog and this one point out, you can also use Custom Locales via Microsoft Locale Builder to get the locales added! :-)
This post brought to you by ᗪ (U+15ea, a.k.a. CANADIAN SYLLABICS CARRIER PE)
Sorry for the pause there, I have been busy. :-)
George Carlin once pointed out that the real problem with driving was the other cars. Were it not for them, everything would just be so much better.
Perhaps we can learn something from him on our current problems.
Now as that last blog indicated, we are in a real problem because of all of these other processes that might be viciously trying to use or update or remove the fonts we want to do something with. And as that old Sesame Street song went, you're the most important person in the world to you, so you can hardly be held accountable for the strange things that others do, the miserable cretins.
But how do we go about insulating ourselves from these annoying other people and applications who have the vicious nerve to commit the unpardonable and mortal sin of getting in our freaking way?
Interestingly, there are several insulation techniques that we can use here, all of which have been covered either in this blog series or in earlier blogs:
First of all, we have (as discussed in the Private fonts: for members only blog) private font loading. Through the use of private font loading, you can insulate yourself from any changes that anyone tries to make to a font, since you are working from your own private copy. Perhaps the font file is something in your own application directory, perhaps it is somewhere else. It might even be (as in the sample code in that blog on private fonts) embedded within one of the application's own binaries.
Now this is something we can even do with fonts that are contained in the Fonts folder -- thus allowing us to work from the initial list the user has access to but then insulating ourselves from the changes these other people might try to make. What better way to make sure that we look out for number one and stop worrying about what other, lesser applications are doing?
Of course one must have the permission to ship the font if one wants to cart it around, otherwise the only legal recourse is to keep to things on the machine and not try and persist it for later sessions.
If you do this, one might theoretically be in violation of a font license when privately loading a font that is then removed. But since your load within a session won't even survive a logoff, I doubt (even though I am not a lawyer) that a cease and desist warning would really mean anything. If they are so worried about it, then why would they not have insisted on a reboot? That is the same situation that could have existed if you had never gone the private font route, although in this case your application wouldn't fail prior to the reboot.
Again, I'm not a lawyer, but I would not be afraid to act as an expert witness if you don't do nefarious things with someone else's file other than use it privately within a session. :-)
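To make the private loading idea concrete, here is a minimal Python sketch using ctypes. It is only one possible route and not necessarily what the sample in the private fonts blog does: it assumes the Win32 `AddFontResourceExW` and `RemoveFontResourceExW` functions with the `FR_PRIVATE` flag, and the helper names (`snapshot_font`, `load_private`, `unload_private`) are made up for illustration.

```python
import ctypes
import shutil
import sys
from pathlib import Path

FR_PRIVATE = 0x10  # from wingdi.h: the font is visible only to the calling process


def snapshot_font(source_path, session_dir):
    """Copy a font file into a per-session directory and return the copy's path."""
    target = Path(session_dir) / Path(source_path).name
    shutil.copy2(source_path, target)
    return target


def _gdi32_function(name):
    """Look up a gdi32 font function (Windows only) and declare its signature."""
    if sys.platform != "win32":
        raise OSError(name + " is only available on Windows")
    func = getattr(ctypes.WinDLL("gdi32"), name)
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_ulong, ctypes.c_void_p]
    func.restype = ctypes.c_int
    return func


def load_private(font_path):
    """Load a font for this process only; returns the number of fonts added."""
    added = _gdi32_function("AddFontResourceExW")(str(font_path), FR_PRIVATE, None)
    if added == 0:
        raise OSError("could not load font: " + str(font_path))
    return added


def unload_private(font_path):
    """Release a font that was loaded with load_private (same flags required)."""
    return bool(_gdi32_function("RemoveFontResourceExW")(str(font_path), FR_PRIVATE, None))
```

The idea would be to call `snapshot_font` on the file from the Fonts folder, then `load_private` on the copy, so that whatever anyone else does to the original during the session does not reach you. Calling `unload_private` before deleting the copy seems prudent.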
Second of all, we have (as discussed in the Rhymes with Amharic blog series) font subsetting and embedding. Through the use of these features all of the benefits of private font loading can be realized, with the additional benefit of not having to cart the entire actual font around -- which can be very helpful when it is a big font.
You also get to have many other benefits, such as better document/report/application fidelity between machines or versions when those other machines or versions might have their own copies of those same font faces that are either mildly or incredibly different.
Now there are interesting nuances here one must keep in mind, and I was reminded last week that I should be writing about some of the more interesting nuances that exist here, to keep those nuances from acting as the computer application version of booby traps or land mines.
One thing you can do is use subsetting and embedding the same way I suggested using private fonts above -- only use them within a given session. This would really work similarly to private fonts but would allow you to remain independent of changes to fonts (updates or removal) within a given session.
Another Carlinism -- you know how lots of people tend to grab bread from inside the loaf rather than the bread on the ends, and how what they are saying here is "let my family eat the rotten bread. I'm looking out for numero uno!"
Well, it's like that. Grab the piece of the font in the middle that you want and let the other applications fend for themselves. It's not like they seem to give a crap about you, so why not be selfish? :-)
Third of all, there is (as discussed in other parts of this very series) the literal separation between how things are handled within a single session and how they are handled between sessions.
When loading an updated font, you can simply:
and then you never have to worry about the problems with files updated out from under other files like I mentioned earlier.
You can use similar procedural tricks to take advantage of the complexities of font installation/load/removal in order to make that complexity work for you.
All you have to do is care mostly about yourself, and then perhaps as a secondary concern (once your needs are taken care of) desire to not bother others.
Fourth of all, you can consider that this blog will be published to splog sites all over creation, so there may be another application doing the same things with its own fonts.
If everyone is able to act selfishly then they really can all still work together, since they are so busy insulating themselves from others that you know how to make sure to avoid bad assumptions about what you should be doing.
Knowledge is power -- and the knowledge that all of the smart people might be also trying to be selfish makes it even easier to insulate yourself from their problems.
It is really the naive people you have to be most worried about since they will be making all of those unrealistic assumptions about what might change, and screwing themselves up thereby.
Did you ever think you could trust the selfish people more than the unselfish ones? :-)
This blog brought to you by ֆ (U+0586, aka ARMENIAN SMALL LETTER FEH)
Previous blogs in this series of blogs on this Blog:
By the end of this series, all of my regular readers or even cursory followers know that I'll be talking you into all kinds of corner cases.
Here in the beginning, I am going to start with the obvious.
There is one reasonably simple flaw in the whole desire to "move from UCS-2 to UTF-16", one that really should come out now.
The premise is complete and utter crap.
Yes, that's right.
For the vast majority of all operations, for the great bulk of possible things that computer programs do with text, support of Unicode is all one needs.
The issue is largely that people stare into the abyss as they think beyond the BMP of Unicode, forgetting, in their rush to believe there is a huge work item to consider, that this is not a battle -- that UCS-2 vs. UTF-16 is not quite Kramer vs. Kramer.
For most of what happens, the project was done before it started and you were freaking out for nothing.
Asking the question (as was done for example back in 2005) Is SQL Server really supporting UTF-16? really misses the point that for most operations, it is.
All that UTF-16 adds to the whole situation is a single example of a problem that exists in UCS-2 and actually in UTF-8 and UTF-32, as well as in UTF-16.
The problem is that question Raymond Chen raised 20 months ago in his blog What('s) a character!.
It is that most times a character is a storage character, and that occasionally it is a linguistic character. Forget about the base character combined with a buttload of diacritics scenario, there are plenty of valid scenarios too. But none of them are the default case, or the most common scenarios.
To be honest, after considering situations like Vowel DISharmony?, aka The case of the missing dot, anyone who thinks that the biggest problem here is a split surrogate pair (which would show that square box something like an unknown character) and not the linguistic characters potentially stripped of their actual meaning and validity, likely needs a vacation.
They're important, sure. But they are not world stoppers (or if they are, then they were even back when the application claimed to support Unicode yet clipped diacritics just as indiscriminately and probably more often).
Thus the first and most important point is that we need to upgrade the question into something more meaningful, something that covers what we are actually trying to accomplish.
Next time I'll start talking about the cases where one ought to care, and how to make sure it is a productive kind of caring.
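A small Python sketch may help show the storage character vs. linguistic character distinction. The helper names are mine, and `rough_linguistic_chars` is a deliberately crude stand-in (it just skips code points in the Mark categories) rather than real grapheme cluster segmentation.

```python
import unicodedata


def utf16_units(text):
    """Number of UTF-16 code units (the 'storage characters') in text."""
    return len(text.encode("utf-16-le")) // 2


def rough_linguistic_chars(text):
    """Crude count of user-perceived characters: skip combining marks."""
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


def truncate_utf16(text, max_units):
    """Naively clip text to max_units UTF-16 code units, as careless code might."""
    raw = text.encode("utf-16-le")[: 2 * max_units]
    return raw.decode("utf-16-le", errors="replace")


if __name__ == "__main__":
    accented = "e\u0301"        # BMP only: 'e' followed by COMBINING ACUTE ACCENT
    deseret = "\U00010400"      # supplementary: DESERET CAPITAL LETTER LONG I
    for sample in (accented, deseret):
        print(utf16_units(sample), rough_linguistic_chars(sample))
    print(repr(truncate_utf16(accented, 1)))   # accent silently dropped
    print(repr(truncate_utf16(deseret, 1)))    # split pair becomes U+FFFD
```

Both samples take two storage characters and both are one linguistic character, yet only one of them involves anything outside the BMP. Clipping the first one gives you valid text that quietly means something else, which is arguably the nastier of the two failures.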
This post brought to you by ı (U+0131, a.k.a. LATIN SMALL LETTER DOTLESS I)
They say the difference between wisdom and intelligence is subtle -- it is intelligent to know about the risks of cigarettes and lung cancer, while it is wise to not smoke. So maybe one way of defining them is calling intelligence "theoretical smarts" and wisdom "applied smarts" or something like that?
I was thinking about this the other day, actually.
It was late last week.
Before a training session.
One that I was the presenter for.
Before we started, I ran across a familiar face -- Dave Sell.
Now Dave and I go way back, all the way to when I was working on the wizard to wrap the MAPI IISAM in Access for import and link operations.
He was the guy behind several of the key IISAMs in Jet that so many people depend on -- text, HTML, Excel, MAPI. But that wasn't the part that impressed me the most about his work.
What impressed me most, the part that really made me want to aspire to something, was his work on an OLEDB IISAM.
Now forget about most of the content in What does DAO have that ADO/ADOx/JRO do not? for a moment -- the king of the Jet IISAMs was writing the IISAM that would be able to level the playing field and implement something on top of OLEDB that was made for Jet, something that ADO/ADOX/JRO all together never really tried to be, and so unsurprisingly was not.
Now writing against a single IISAM using the same syntax would always be easier than writing against the individual ones. In a very real way, he was making a lot of his former work less necessary, less important, less interesting -- and we are talking about functionality that a lot of line-of-business applications have been depending on for more years than SQL Server has actually even been a usable product! Maybe it would happen, maybe not (since it isn't up to the engine to decide how people will use it), but the potential was there.
As it turns out, Access did not go that way, so although the IISAM is there none of the existing UI or wizards were changed to use it -- so the existing body of work is still there. But that wasn't something he knew would happen, going in.
I always wondered if I would be able to be as willing to take a significant chunk of work that I did and actively work on a project with the potential to obsolete it.
Just because it was the right thing to do, as a project. As how a component could best work in a new world.
I'll be honest, I really don't know if I would or not -- would I subconsciously try to find some way of preserving what was?
But being able to do this is certainly something to aspire to.
One of the reasons I am still at Microsoft -- not just some of the smarter people.
But also some of the wiser ones, too. :-)
This blog brought to you by ㊣ (U+32a3, aka CIRCLED IDEOGRAPH CORRECT)
One to file under the "I thought I was blind all those years until I suddenly realized my hat was two sizes too large" category. :-)
In the past I described in blogs like Some blog boggles and a few disclaimers my use of the term moronic wingnut and how some people really didn't like the term.
It turns out that at least one of the reasons might be the Safirishly suggested connection between the term wingnut and the extreme right wing side of politics.
Now although I did repost the Wanker-wingnut continuum, where the term wingnut is clearly used in this way, my mind was on something else entirely when I started using the term.
(By the way, it looks like The Poor Man is back, though all of the old content is gone. Though still worth the read if that is your taste!)
For me it dated back to those RPS days I mentioned before.
It was a shift where someone in the trucks was doing something particularly stupid.
Not the ordinary kind of stupid one might expect from the type of person who voluntarily agrees to unload packages inside tandem trailers getting up to high temperatures for four hours for wages that really did not make it worth the time.
I mean really stupid stuff.
Like sending hazardous packages up the belt which led to a downward incline where packages would slide down to people sorting them out by destination.
This was NEVER supposed to be done, since if such a package turned out to have a leak, then some kind of hazardous substance would be leaking down on one of the sorters, which would be very dangerous.
Anyway, one day someone kept doing this, and Jason (the guy wearing a tie and supervising the shift), yelled down to the guy who was doing this and who thought it was such a hoot "Will ya cut it out, ya wingnut?!?" and for me, in this warehouse from the early '90's, the use of the word wingnut was born.
In a completely non-partisan way.
Anyway, the other day someone suggested to me that this kind of a read of the term wingnut might be why someone took the term moronic wingnut as being so offensive. You know, if they thought that I was dragging down Republicans or something.
Just for the record, I swear that my usage pre-dates the bloggers phenomenon William Safire pointed out and that it is non-partisan and really an equal opportunity kind of an opportunity. Regardless of age, race, gender, political status, sexual orientation, sexual preference, hair color, eye color, or any other openly or covertly distinguishing feature.
It is an equal opportunity slam for when people misquote me. :-)
This post brought to you by + (U+002b, a.k.a. PLUS SIGN)
The other day, the question came in again.
And when I say the question, I mean THE question.
You know, the What Unicode version do you support? question.
Well, technically it was a slight variation, more of a What version of Unicode should we support? but clearly the same question is being asked. Someone running a program connected to (well, running on) Windows wants to know what version of Unicode good programs connected to (well, running on) Windows support.
Unfortunately, not every question that is reasonable to ask is necessarily one that has a reasonable answer.
Sure, to start with there is everything I pointed out in the What Unicode version do you support? blog back in the end of 2005.
But there is a bigger issue here.
The issue is the fundamental difference between:
and to make matters worse the question is implicitly trying to take specific formal versions of the latter and trying to understand how they fit into the former.
Wanna know how it fits?
I'll tell you.
Sloppily.
Unicode adds things for two reasons -- proposals for new scripts that are generally brought in as the proposals mature and algorithms and descriptive processes put in for a whole host of possible reasons.
Now obviously market forces can enter into the equation since strategic scripts can be fast-tracked and important algorithms can be pushed by companies that need solutions to deal with the market pressures they face. But these amount to nudges, to pushes to triage how quickly some things are looked at. These are the tactics of Unicode, not the strategy, which is to get things in when they are mature and ready to be added to the standard in a formal version.
Now comparing that to the planning process by which Microsoft or any company chooses what languages to add locale or rendering or font or collation or formatting or parsing or word breaking support for isn't like comparing apples and oranges.
It's like comparing apples and earmuffs, or other similarly different things.
The question is unfair in another way, too.
You see, a Unicode version is a complex cluster of characters and properties and algorithms that are released on a specific date.
Kind of like how a Windows or an OS X version is a complex cluster of applications and features and yes languages and fonts and so on released on a specific date.
Why is it reasonable to expect that the summary name of one (e.g. Unicode 4.1.0) would ever map to the other (e.g. Windows 5.1, aka Whistler, aka Windows XP) exactly? When in addition to all of the above differences you can't even look at dates/schedules and see connections?
Perhaps the answer is to provide the data for what each version of the product supports, so that the program connected to (well, running on) Windows knows what is available and therefore what has a potential of actually working.
Then the program connected to (well, running on) Windows can do its own planning to decide what subset or complete set (or occasionally superset) it wishes to support.
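To make the "provide the data" idea a little more concrete, here is a minimal sketch. It uses the character database bundled with a Python runtime purely as a stand-in for "the platform"; it says nothing about what Windows itself exposes, and the helper names `is_assigned` and `unsupported` are made up for illustration.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this Python runtime.
# (A stand-in for "the platform"; this is not a Windows API.)
print(unicodedata.unidata_version)

def is_assigned(ch):
    """True if this runtime's database assigns ch (general category is not Cn)."""
    return unicodedata.category(ch) != "Cn"

def unsupported(text):
    """Characters in text that this runtime's database has no assignment for."""
    return [ch for ch in text if not is_assigned(ch)]
```

The point is the shape of the thing: the program asks what data is actually there, and then plans around the answer, rather than assuming a version number tells the whole story.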
Though in the end, each program connected to (well, running on) Windows will always start with the very same question.
What version of Unicode should we support?
This blog brought to you by ? (U+003f, aka QUESTION MARK)
I admit I was tempted to really do up the Destiny's Child lyrics completely just like the title, but I decided if I ain't going to write the IME below that it would be kind of obnoxious to put that much time into a Yankovikian tribute like that....
Less than half a day ago, conradoplg asked in the Suggestion Box:
In one of your blogs you explain a method to type random Unicode points with an unicode IME. But this is driving me nuts for some time now. Is there a way to enter random Unicode points by their *names*? I can't possibly know the hex code of all the characters! What I'm talking about is some kind of IME where you would type a character name and it would show a list of candidates where you could choose. Is there such a thing? Or more importantly, could one (who doesn't work for Microsoft :) ) implement it? Is it possible to write an IME? Thanks!
That just inspired me, so I pushed a bit of the rotation forward for it. :-)
Now there is no built-in way to do this, no.
But the Unicode Character Database, and more specifically the ~1.1 MB UnicodeData.txt, is available.
Writing an IME can be really difficult. But there is that simple Table Driven Text Service mechanism I document at length in this series.
I don't know if its underlying code has ever been tested with strings as long as some of the official character names in the UCD, and who knows what the performance would be like, but perhaps it would be worth generating the input method based on the names to see if it would work.
Worth a try at least, maybe. :-)
Now I don't know how useful it would be in practice given what so many of the names actually look like, but as long as we are keeping this all theoretical....
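Keeping it theoretical, the "type a name, get a list of candidates" part can at least be sketched outside of any IME. What follows is only an illustration of the lookup, assuming the character names bundled with Python's `unicodedata` module stand in for UnicodeData.txt; the `candidates` function and its `limit` parameter are hypothetical, and none of this touches the Table Driven Text Service.

```python
import sys
import unicodedata

def candidates(fragment, limit=20):
    """Return (char, name) pairs whose name contains every word typed in fragment.

    Matching is a plain substring test against the official character name,
    in code point order, stopping once limit candidates have been found.
    """
    words = fragment.upper().split()
    found = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name and all(word in name for word in words):
            found.append((chr(cp), name))
            if len(found) >= limit:
                break
    return found

# When the exact name is known, the lookup is direct.
print(unicodedata.lookup("QUESTION MARK"))
print(candidates("question mark", limit=5))
```

A brute-force scan like this walks every code point on each query, which hints at the performance question above; anything real would want an index built once from the names.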
This blog brought to you by ﯹ (U+fbf9, aka ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM, what I think off the top of my head is the longest name in the UCD)
So the report that came in (about the Windows Keyboard Layouts discussed previously) was an interesting one:
For the Korean keyboard, I think the KOR and ENG buttons are reversed.
Now this is one of those interesting philosophical issues, where one has to decide if toggle keys on a keyboard that changes its display should display the current state of the keyboard or the state that one is moving to by hitting the toggle. Thus the current design:
Now one could argue that this is confusing, this use of the button text as showing the "current status" rather than showing the status when you press it.
It is how both the Japanese and Korean IME keyboard layouts work, with ToolTips that provide additional information when you hover:
where Kor is a reasonable abbreviation for Korean and Eng is a reasonable abbreviation for English
where ひら is a reasonable abbreviation for Hiragana (ひらがな) in Hiragana and カタ is a reasonable abbreviation for Katakana (カタカナ) in Katakana.
where 半 really does mean "half" and 全 really does mean "all" or "full".
There is a pattern here.....
Clearly this is not just a tiny mistake in a single layout -- this is an explicit, conscious design choice.
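The two labelling conventions being weighed here can be put side by side in a tiny sketch. The mode names come from the Korean layout above; the function names are made up for illustration and are not taken from any actual implementation.

```python
MODES = ["KOR", "ENG"]

def other(mode):
    """The mode that pressing the toggle key switches to."""
    return MODES[1 - MODES.index(mode)]

def label_showing_current(mode):
    """Convention in the layouts above: the key cap names the state you are in."""
    return mode

def label_showing_target(mode):
    """The alternative: the key cap names the state a press would take you to."""
    return other(mode)
```

Both functions are equally easy to write, which is rather the point: the choice between them is about what the reader of the key cap expects, not about what is hard to build.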
I can't compare against a soft keyboard like the Optimus since I couldn't get it to show the Japanese layout.
Other keys like the CAPS LOCK key do not change to indicate their status other than to look pressed and not pressed -- in their own way they follow the same design since their display indicates the state they are in, not the state that pressing them makes.
And this design choice has its uses -- like when they are put on websites (like this one) or in appendices in books (like mine, or DIS v.2) and they are in no way dynamic -- they become self-documenting.
But in the end I think this one really could be argued either way, really.
There is no right answer.
It's all a matter of perspective.... :-)
This blog brought to you by 半 (U+534a, a CJK UNIFIED IDEOGRAPH)
Over in the Microsoft VOLT Users Community, Pavanaja asked:
I heard that dynamic fonts - EOTs created by WEFT - don't work in IE8. Is that true? Link - http://channel9.msdn.com/forums/Coffeehouse/397944-IE8-beta-and-WEFT/ - Pavanaja [I posted this in WEFT Forum, but no one replied. Hence posting it here]
And colleague Sergey responded:
This thread is well outdated. EOT files did not work in new layout engine in IE8 Beta 1. But we just released Beta 2, which supports EOT fine. Thanks, Sergey
Now font embedding is something I have covered before, like in that Rhymes with Amharic series.
Now the fix for the problem is very good news, obviously. :-)
Though there are always troubling aspects when stuff that was working stops doing so for any length of time.
The real problem is that neither the answer nor anything else gives insight into an actual RCA (root cause analysis) that would explain how/why it broke, in order to help understand the nature of the problem.
This insight can help guide trust (or mistrust!) in a technology in the future, so it can be a very powerful thing.
Lacking that understanding, one cannot do much more than guess what is going on behind the scenes.
To be honest it is why most of the blogs I write are also blogs I would probably read were I not the author, since I try to bring some of those motivations into perspective when I know them.
I also like blogs like that Engineering in Windows 7 one since they can give insights into the way problems are approached -- and then with that knowledge and other clues you can assess the story behind the product, and whether/how much to trust it going forward.
Of course this gets back to that bug -- I have absolutely no insight whatsoever into the bug, how it was broken, how it was found, how hard the fix was, or the steps taken afterward to make sure the break couldn't happen again.
Without that information, how can I decide what it means for the technology? Should I throw out the whole incident for insufficient evidence, or should I treat it as a clue and take the fact that it was fixed, apparently not too long after it was reported, and move on?
Transparency. And Trust.
Getting insight into the people making such decisions can help one decide how to feel when one doesn't have all the information.
Now I trust Sergey, so in this case I do have some implicit trust based on that.
But I know that not everything is up to him every time. And I don't have nearly enough of a feeling about the IE team overall to know how this all fits.
Something to keep an eye on....
This blog brought to you by B (U+0042, aka LATIN CAPITAL LETTER B)
Martin asks:
Hello Mr. Michael! I've a problem with Arabic (FARSI) unicode. I searched all the web many days - without success. I also posted to a newsgroup (microsoft.public.de.vc) and there they couldn't help me by answering my question, but the linked me to you. So I was visiting your side, and I think .. They are right! "If anybody can help you, then M. Kaplan!" So I hope you can, and you feel like helping me - of course ok if not. My problem: I've an access db (mdb). There is one field with arabic names. When I fetch the data, I get this in 'normal' unicode (U+0600 – U+06FF) but I need the presentation forms unicode (U+FB50 – U+FDFF and/or U+FE70 – U+FEFF) depending on the position where it is (init / medial / final / isolated). Hope you understand what I mean - my english isn't the best, also not my articulation.. It's possible to get it? Or maybe you can tell me how e.g. notepad is handle this. I mean, when I write arabic text in it, and I save the file, open binary then I can see that every char is in the 'normal' unicode. But it is displaying the string correct, all the chars in correct contextual form. How does it know which form it has to display?! Or is it calculating this by itself? (something like "if the last char = middle, then ......"? hmm .. Very complicated .. OK, I don't want to spam you with so much text. Maybe you can have a look at this and maybe you can give just a short hint or something like this .. Would be very great! Thanks A LOT (!!) in advance! With Regards, Martin
This question is one that I have talked about various aspects of before, in blogs like
The simple fact is that the model used in Unicode is to not force a person to have up to four different ways to display every letter, and to not require them to choose which one to use each time.
Because like I point out in that very first blog, this is not such a great way to do things.
Now as to how this model works, well for that one might want to dig into OpenType, specifically Developing OpenType Fonts for Arabic Script.
Going the other way (into the compat. zone) isn't a good idea, and not just because there is no good way to do it -- but also because anyone who tries will find many bugs, since Arabic will often have even more forms than those four in order to have appropriate connections happen between certain letters -- the kind of thing that the compatibility zone approach just can't handle....
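The one-way nature of this is easy to see with the character data itself. The sketch below, using Python's `unicodedata` module (my choice of tool for illustration, not anything Notepad or Access does), takes the four presentation forms of BEH and shows that each one carries a compatibility mapping back to the single stored letter; nothing in the data maps the other way, because picking the form depends on the neighbors and is the job of the font and shaping engine.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the form that is actually stored in text

# The four presentation forms of BEH: isolated, final, initial, medial.
FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for form in FORMS:
    # Each decomposition is tagged with the position and points at U+0628.
    print(unicodedata.name(form), "->", unicodedata.decomposition(form))

# Compatibility normalization folds every presentation form to the base letter.
folded = [unicodedata.normalize("NFKC", form) for form in FORMS]
print(all(ch == BEH for ch in folded))
```

So text in the "normal" range is the thing to keep in the database, and the contextual forms are something the display layer works out on the fly.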
This blog brought to you by ﹰ (U+fe70, aka ARABIC FATHATAN ISOLATED FORM)
So the question that Tom asked was:
Hi Michael, your blog is really amazing. I have a question: I only have a character, like "A", and I need to find out the VK for the given keyboard layout. I have been struggeling with this topic for months without a solution. I have even made a look-up table which I feel is really weird for such a problem. Can you help? Kind regards
Well, the easiest place to start is with Michael's 3rd Keyboard Law for Developers, which is that Not every keyboard contains every character.
Though some might suggest VkKeyScan or VkKeyScanEx, I'll just remind people in a series of blogs such as:
to relay how I feel about those functions.
The real answer is that there isn't a real answer here except maybe doing all the work in the Getting all you can out of a keyboard layout series until and unless you find the letter you are looking for....
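In other words, the lookup is a reverse scan over everything the layout can produce. Here is a minimal sketch of just that scan, assuming the layout has already been dumped into a table; `TOY_LAYOUT` is a hypothetical hand-written fragment, not real layout data, and building the real table is the hard part that the series covers.

```python
# Hypothetical fragment of a dumped layout: (virtual key, shift state) -> character.
# A real table would come from walking every key and shift state of the layout.
TOY_LAYOUT = {
    (0x41, "base"): "a",
    (0x41, "shift"): "A",
    (0x31, "base"): "1",
    (0x31, "shift"): "!",
}

def find_keys(layout, ch):
    """Return every (vk, shift_state) that produces ch; empty if the layout lacks it."""
    return [key for key, produced in layout.items() if produced == ch]

print(find_keys(TOY_LAYOUT, "A"))
print(find_keys(TOY_LAYOUT, "\u00df"))  # not on this toy layout, so nothing comes back
```

Note the empty result for a character the layout does not have, which is the 3rd Keyboard Law showing up in practice, and that a character could in principle come back with more than one key.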
Should this be easier? Well, some people seem to think so. Though to be honest nothing is stored to do lookups this way, so the only way for some function to be added that makes this easy is for that function to do all the hard stuff in its implementation. And getting a new function added to Win32 really requires a compelling scenario, believe me¹. But in this case, I unfortunately just don't see the huge compelling scenario....
1 - I'll mention at some point the new function I petitioned for in the next version of Windows and the excitement there!
This blog brought to you by A (U+0041, aka LATIN CAPITAL LETTER A)