Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
(I suspect that there will be nothing technical in this post)
I am going to ramble a bit so if that sort of thing bothers you then go ahead and skip this post. :-)
My Italian is less than perfect, though I believe that the title of this post roughly translates to "When god wishes to punish us, our prayers are answered."
I like to keep this in mind sometimes. It keeps me from being tempted to pray for anything....
It was probably a little under three decades ago that I first "got" MS, and it has been a little over a decade and a half since the first stab at a diagnosis took place. The way I have felt about the disease has certainly changed quite a bit over that time, but some things have stayed pretty consistent.
For example, it generally does not depress me. Maybe it is just my cynical nature (one of my really good friends in high school used to point out what a cynical bastard I was, even back then), but as diseases go there are lots that can be a lot worse. I have had friends pass away from accidents, a lover pass away from a brain tumor, and I have a couple of friends now who have terminal illnesses. So I figure who the frak am I to get depressed about my lot in life, right?
And also, there is the fact that I don't mind new symptoms. They have often been interesting, have usually been weird, and have occasionally been fun. Although I never had a twin, I understand a bit about the whole "language of twins" thing when I am talking to someone else with M.S. about this thing that not even a good neurologist can always grok as well as someone who is going through it.
But there are other things that have changed over the years.
Like exacerbations.
At first I hated them, since they were a bit more involved than some odd symptoms, and could sometimes hit me hard. They were a psychological reminder that the occasional weird feelings were not like some sort of high without drugs, they were a disease.
Then there was a period of time when my ex called them exasturbations, and I never had the heart to correct her. For one thing, it is never good to make the person you are dating feel foolish. For another, having her tell me that exasturbation was a normal part of my life? It just made me smile each time, and (immature geek that I am) it never got old. I would sometimes not mind getting an exacerbation, because it meant I'd have to bite my tongue to keep from smiling when she looked at me, concerned, and asked if I was having an exasturbation....
I think I waited a few years before I actually said anything. I probably ought to grow up a bit, but I still laugh when I think about it.
As things stand now, I sit somewhere in between relapsing/remitting-secondary progressive and primary progressive M.S. I don't know whether to blame the lack of exacerbations on the fact that I am doing well four doses into a Novantrone course, or on the fact that I have finally moved into what the Novantrone docs like to call worsening M.S. (a stupid name, given that the one thing all forms of M.S. do is worsen to some degree).
Anyway, before each Novantrone dose the folks in the hospital always gave me a Zofran pill for nausea, though I hadn't actually ever felt nauseous. My neurologist even gave me a prescription for it which I had never used since I just didn't feel like throwing up.
This last time, though, I was feeling a little nauseous, so I was taking the Zofran. Three times a day. I always waited until I was feeling a little nauseous again. I actually had to refill the prescription when I was just a month away from having the pills expire.
I began wondering if one of the side effects of Zofran was nausea. I mean, it's not as funny as saliva causing stomach cancer (only when it is taken slowly over a long period of time!), but still it would be kind of ironic, in the Katie sense, if you know what I mean.
And I was feeling some disequilibrium again, my long-time M.S. symptom. I ended up calling my neurologist about the two problems, and she called me back pretty fast (I think it's because I don't complain much, so anything I call about must be serious?). We decided that they really were unlikely to be related. The nausea would hopefully go away soon. After I described the disequilibrium at greater length, we also decided that it might be an actual exacerbation.
Somehow it made me feel good. The knowledge that I was perhaps still in that secondary progressive category. Where the uncertainty of not getting one was somehow worse now. Where it was better to know the M.S. was out there, in the background somewhere.
I haven't been praying in the past to not get exacerbations, and I don't pray now to get them. But I think now about how if I had been doing so, and if she had been listening, and if she decided to answer the prayers, that it would really have been a punishment of sorts. I want it out of my hands so I can react to it without fear that my desires have actually influenced the course of the disease.
I had a friend a few years back whom I would visit in Orem, Utah. I once went with her to hear one of the quorum of the twelve apostles of the LDS church speak, and I remember him talking about how he used to find himself praying for little things like finding a good parking spot and then saying a prayer to give thanks afterwards. I remember my friend saying that she understood why she was having such a hard time getting her prayers answered -- God was too busy finding Jeffrey R. Holland his parking spaces!
Of course she was joking. But I remember thinking later about how disturbing it would be to feel like prayers were answered that immediately -- especially in light of that Italian proverb in the title of this post.
It is probably better for me that there is some distance, I think. Especially because anything I'd ask for in relation to my disease would almost certainly be worse for me -- there is never a time when an exacerbation is a good thing, truly there isn't. But if I never have another one again then technically I would have primary progressive M.S., which isn't all that good either from a morale standpoint. So either of them can be a punishment, and it might be good to keep the whole thing out of the realm of prayer, all things considered.
Perhaps this is not very logical. Sorry about that, it seemed much clearer in my head than it looks now that I am reading this post.
I'll post about something interesting tomorrow, I promise. :-)
Yesterday when I was talking about WYSIWYG font dropdowns and using the GetGlyphIndices function to determine if a font supported a character or set of characters, regular reader Mihai mentioned in a comment:
GetGlyphIndices does not handle chars outside BMP (true in XP SP2 and an older Vista build)
This is true, it doesn't. As a function it looks at each individual UTF-16 code unit and checks to see if it is in the font.
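To make "each individual UTF-16 code unit" concrete, here is a minimal Python sketch. It uses Python only so that it runs anywhere; it does not call GDI. It splits a string into the UTF-16 code units a WCHAR buffer would hold. A supplementary character such as U+20000 becomes a high surrogate and a low surrogate, and GetGlyphIndices would check each half on its own. The helper name utf16_code_units is hypothetical.

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Return the UTF-16 code units of text, as a WCHAR buffer would hold them."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for sample in ["A", "\u09ab", "\U00020000"]:
        print([hex(unit) for unit in utf16_code_units(sample)])
```

The last line printed is ['0xd840', '0xdc00']: two code units for one character, neither of which means anything to a font on its own.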
Now in theory you could create a font that puts surrogate code points in the ligature table that might work here, but (a) that won't help GetGlyphIndices since it will just globally say the high surrogate and low surrogate are there and not whether there is an actual ligature defined, and (b) it is not generally considered good typographic practice to create an "off the BMP" font that way.
(We don't mind it so much in keyboards, but that's a story for another day!)
There is actually a real solution that can handle things off the Basic Multilingual Plane -- the ScriptGetCMap function, which is described as being able to retrieve "...the glyph indexes of the Unicode characters in a string according to either the TrueType cmap table or the standard cmap table implemented for old style fonts."
And it will do the extra work with supplementary characters defined as surrogate pairs, and it is even easier than GetGlyphIndices in terms of determining whether the font supports the string since it will simply return S_FALSE if one or more of the code points were mapped to the default glyph.
There is one warning in the documentation which is theoretically troubling:
Note that some code points can be rendered by a combination of glyphs as well as by a single glyph — for example, 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if the font supports the capital E glyph and the acute glyph but not a single glyph for 00C9, ScriptGetCMap will show 00C9 is unsupported.
In practice, though, the situation is very uncommon. In general, any time a particular composite sequence is supported in a font, the precomposed character (if it exists) is also supported. That is just the way things usually work in fonts; it is much less common for the precomposed character to be missing.
If you suspect it may be happening anyway, the docs explain how to handle that situation via the ScriptShape function....
Now I was talking to Peter about this whole issue yesterday, and he pointed out that people could simply special case their code to use GetGlyphIndices for BMP cases and ScriptGetCMap for when things are off the BMP (in truth GDI handles nothing off the BMP anyway, so it is hardly an artificial split -- if you are using supplementary characters, you are using complex scripts as far as GDI is concerned).
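Peter's split boils down to one test: does the string contain anything off the BMP? The minimal Python sketch below makes that decision and returns only the name of the Win32 function to use. It does not call GetGlyphIndices or ScriptGetCMap, and both helper names are hypothetical.

```python
def is_off_bmp(text: str) -> bool:
    """True if any code point in text lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


def pick_coverage_api(text: str) -> str:
    """Label which call the BMP/non-BMP split would use for this string.

    This only returns a name; wiring it to the real Win32 functions is up to you.
    """
    return "ScriptGetCMap" if is_off_bmp(text) else "GetGlyphIndices"


if __name__ == "__main__":
    print(pick_coverage_api("Estrangelo Edessa"))  # plain BMP text
    print(pick_coverage_api("\U00020000"))         # a CJK Extension B character
```

The simpler alternative argued for next, always using Uniscribe when it is available, amounts to skipping the check entirely.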
Though in truth, if one is going to use Uniscribe here, using it all the time when it is available is probably better, or at least more consistent. And why not use Uniscribe in little ways explicitly if it is going to be used implicitly in bigger ways like rendering anyway? :-)
This post brought to you by ফ (U+09ab, a.k.a. BENGALI LETTER PHA)
So, the question that Bill asked was clear:
I have a WYSIWYG font dropdown in my toolbar. Some international fonts (like Estrangelo Edessa) show up as squares, apparently because they don’t support all Unicode ranges. I need to detect these fonts and display them in the WYSIWYG list using Tahoma (like Office does). What’s the best solution? My app is a managed app but I’m assuming I’ll need to use native interop to solve this problem. Thanks, Bill
The list he is referring to is like the one you see in Word's font dropdown:
Or even in Outlook's mail message dropdown:
So, the real question here is even simpler: "How can one determine if a font supports the characters in a string?"
The easiest way is via the GetGlyphIndices function, whose description begins:
The GetGlyphIndices function translates a string into an array of glyph indices. The function can be used to determine whether a glyph exists in a font.
So you can pass the name you were going to use to the function (making sure to pass the GGI_MARK_NONEXISTING_GLYPHS flag), and then look at the glyph indices for the 0xffff marker.
If it shows up for any of the characters, then you know you should use some other font.
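Here is a minimal Python sketch of that decision. The real glyph lookup is passed in as a function, standing in for a GetGlyphIndices call made with GGI_MARK_NONEXISTING_GLYPHS. make_fake_lookup, with its tiny made-up cmaps, is purely hypothetical test scaffolding, and the Tahoma fallback follows Bill's question.

```python
from typing import Callable, Dict, List

# GetGlyphIndices stores 0xFFFF for any character with no glyph in the font
# when it is called with the GGI_MARK_NONEXISTING_GLYPHS flag.
MISSING_GLYPH = 0xFFFF

# (font name, text) -> one glyph index per UTF-16 code unit of text
GlyphLookup = Callable[[str, str], List[int]]


def display_font_for(font_name: str, get_glyph_indices: GlyphLookup,
                     fallback: str = "Tahoma") -> str:
    """Choose the font used to draw font_name's own entry in a WYSIWYG list."""
    indices = get_glyph_indices(font_name, font_name)
    if any(index == MISSING_GLYPH for index in indices):
        # At least one letter of the name would show up as a square.
        return fallback
    return font_name


def make_fake_lookup(cmaps: Dict[str, Dict[str, int]]) -> GlyphLookup:
    """Hypothetical stand-in for GetGlyphIndices, driven by tiny per-font cmaps."""
    def lookup(font_name: str, text: str) -> List[int]:
        cmap = cmaps.get(font_name, {})
        return [cmap.get(ch, MISSING_GLYPH) for ch in text]
    return lookup


if __name__ == "__main__":
    lookup = make_fake_lookup({
        "Arial": {ch: i + 1 for i, ch in enumerate("Arial")},
        "Estrangelo Edessa": {"\u0728": 7},  # fake cmap: Syriac only, no Latin
    })
    for name in ["Arial", "Estrangelo Edessa"]:
        print(name, "->", display_font_for(name, lookup))
```

With the fake cmaps, Arial draws its own name and the Syriac-only entry falls back to Tahoma. In a real app, the lookup would wrap the native call through interop.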
Now the Office dropdown has a few more tricks up its sleeve that I scrolled the list to avoid showing, so I should probably mention them now....
If you look, some of the fonts have some additional stuff next to the names:
All of the Hebrew fonts show the first eight letters of the Hebrew alphabet (U+05d0 to U+05d7), and all of the symbol fonts show a sampling of their symbols. This sample is shown even when the font name itself falls back to another font (because the letters in the name are not supported).
Incidentally, Arabic fonts do the same sort of thing that the Hebrew fonts do....
Now it is easy enough to detect a symbol font, but figuring out that a font is specifically a Hebrew or an Arabic font (e.g. Miriam Fixed or Arabic Transparent) rather than just a font that contains Hebrew or Arabic (e.g. Arial Unicode MS) is a bit more complicated.
I have no idea how they actually do it -- perhaps they are calling the GetFontUnicodeRanges function, perhaps they are grabbing the FONTSIGNATURE-like info that is inside the font and using it to find out. Though of course this is the kind of feature that one could quickly go wild with in showing the characters of many different language-specific fonts, from Tibetan to Sinhalese.
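Purely as an illustration of the GetFontUnicodeRanges idea, here is a hypothetical heuristic in Python. It is not how Office does it, since that is unknown. The input mimics the (wcLow, cGlyphs) pairs of the WCRANGE entries in a GLYPHSET. A font counts as "Hebrew-specific" if it has all eight sample letters and not much beyond ASCII and the Hebrew block. The max_other threshold and both range lists in the example are made up.

```python
from typing import List, Tuple

# (wcLow, cGlyphs), like the WCRANGE entries GetFontUnicodeRanges returns
Range = Tuple[int, int]

HEBREW_SAMPLE = range(0x05D0, 0x05D8)  # the eight letters U+05D0..U+05D7


def covers(ranges: List[Range], code_point: int) -> bool:
    """True if code_point falls inside one of the ranges."""
    return any(low <= code_point < low + count for low, count in ranges)


def looks_like_hebrew_font(ranges: List[Range], max_other: int = 256) -> bool:
    """Hypothetical heuristic: every sample letter is present, and at most
    max_other code points fall outside ASCII and the Hebrew block."""
    if not all(covers(ranges, cp) for cp in HEBREW_SAMPLE):
        return False
    other = 0
    for low, count in ranges:
        for cp in range(low, low + count):
            if not (cp < 0x80 or 0x0590 <= cp <= 0x05FF):
                other += 1
    return other <= max_other


if __name__ == "__main__":
    hebrew_only = [(0x0020, 0x5F), (0x05B0, 0x14), (0x05D0, 0x1B)]
    pan_unicode = [(0x0020, 0x5F), (0x0100, 0x1F00)]
    print(looks_like_hebrew_font(hebrew_only), looks_like_hebrew_font(pan_unicode))
```

The second list covers Hebrew but thousands of other characters too, so it is rejected, much as Arial Unicode MS would be.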
Paradoxically, the only "boring" fonts on the list would be ones that support multiple scripts!
At some point, there are also performance concerns, plus there are people who find the list somewhat visually distracting (not to mention the list as it exists in Word has an obvious bias for the myriad of ways one might display the Latin script!)....
This post brought to you by ܨ (U+0728, a.k.a. SYRIAC LETTER SADHE) (a character that is well-supported by Estrangelo Edessa!)
Peter Constable asked some Unicode folks:
I’m just curious to know why 0f77 and 0f79 were given compatibility decompositions rather than canonical decompositions? (I don’t see any obvious reason why canonical decompositions would not have been feasible.)
(Yes, I know this can’t be changed – that’s not my objective.)
Peter Constable
And Ken Whistler stepped up with a good historical look at these two characters (which in my humble opinion deserves a more permanent location for others to see!):
    0F71;TIBETAN VOWEL SIGN AA;Mn;129;NSM;;;;;N;;;;;
    0F76;TIBETAN VOWEL SIGN VOCALIC R;Mn;0;NSM;0FB2 0F80;;;;N;;;;;
    0F77;TIBETAN VOWEL SIGN VOCALIC RR;Mn;0;NSM;<compat> 0FB2 0F81;;;;N;;;;;
    0F78;TIBETAN VOWEL SIGN VOCALIC L;Mn;0;NSM;0FB3 0F80;;;;N;;;;;
    0F79;TIBETAN VOWEL SIGN VOCALIC LL;Mn;0;NSM;<compat> 0FB3 0F81;;;;N;;;;;
    0F80;TIBETAN VOWEL SIGN REVERSED I;Mn;130;NSM;;;;;N;;;;;
    0F81;TIBETAN VOWEL SIGN REVERSED II;Mn;0;NSM;0F71 0F80;;;;N;;;;;
    0FB2;TIBETAN SUBJOINED LETTER RA;Mn;0;NSM;;;;;N;;*;;;
    0FB3;TIBETAN SUBJOINED LETTER LA;Mn;0;NSM;;;;;N;;;;;

                        NFD             NFC
    0F76            0FB2 0F80       0FB2 0F80
    0F77            0F77            0F77            <-- discouraged (strongly)
    0FB2 0F71 0F80  0FB2 0F71 0F80  0FB2 0F71 0F80  <-- preferred
    0F78            0FB3 0F80       0FB3 0F80
    0F79            0F79            0F79            <-- discouraged (strongly)
    0FB3 0F71 0F80  0FB3 0F71 0F80  0FB3 0F71 0F80  <-- preferred
    0F80            0F80            0F80
    0F81            0F71 0F80       0F71 0F80       <-- discouraged
    0F71 0F80       0F71 0F80       0F71 0F80       <-- preferred

Note that the preferred forms appear in both NFD and NFC, with the decomposed form for 0F81 resulting from the non-starter exclusion and the decomposed forms for 0F76 and 0F78 resulting from explicit addition to the script-specific composition exclusions.

If you gave 0F77 and 0F79 *canonical* decompositions, then:

    0F77 --> <0FB2, 0F81> --> <0FB2, 0F71, 0F80>
     0         0     0          0    129   130
    0F79 --> <0FB3, 0F81> --> <0FB3, 0F71, 0F80>
     0         0     0          0    129   130

                        NFD             NFC
    0F76            0FB2 0F80       0FB2 0F80
    0F77            0FB2 0F71 0F80  ????            <-- discouraged (strongly)
    0FB2 0F71 0F80  0FB2 0F71 0F80  0FB2 0F71 0F80  <-- preferred
    0F78            0FB3 0F80       0FB3 0F80
    0F79            0FB3 0F71 0F80  ????            <-- discouraged (strongly)
    0FB3 0F71 0F80  0FB3 0F71 0F80  0FB3 0F71 0F80  <-- preferred
    0F80            0F80            0F80
    0F81            0F71 0F80       0F71 0F80       <-- discouraged
    0F71 0F80       0F71 0F80       0F71 0F80       <-- preferred

Now you've made your life more difficult and normalization implementations maybe more complex. The decompositions <0FB2, 0F71, 0F80> have to be prevented from recomposing. They won't decompose partwise, because <0F71, 0F80> is blocked from recomposing, and <0FB2, 0F80> is also blocked from recomposing, but the sequence of 3 has, at least in principle, a target it should recompose to, unless blocked. Depending on how you set up your tables, you might or might not get this right, and in any case, you end up introducing the strongly discouraged characters as a source of valid sequences that you have to contend with in NFC and NFD, whereas under the current scheme you don't.

Also, this was all part of a very head-breaking set of problems for Tibetan when decompositions and canonical combining classes were being reviewed for the introduction of normalization in the first place.

In Unicode 2.0, 0F77 and 0F79 *were* given canonical decompositions, but they were *different* decompositions, to wit:

    0F77 = 0F76 + 0F71 = 0FB2 + 0F80 + 0F71
    0F79 = 0F78 + 0F71 = 0FB3 + 0F80 + 0F71

*and* they had funky fixed position class assignments, as well:

    0F77 = 0F76 + 0F71 = 0FB2 + 0F80 + 0F71   (not in canonical order)
    135    134    129    6      143    129
    0F79 = 0F78 + 0F71 = 0FB3 + 0F80 + 0F71   (not in canonical order)
    137    136    129    6      143    129

That was clearly hosed, as it broke all kinds of rules that we were trying to establish for normalization, including ensuring that all decomposition mappings produced sequences in canonical order and ensuring, as much as was possible, given the constraints in place, that the resulting sequences would follow the logic of the script *and* that NFC forms would decompose if that was what the users of the script preferred (hence the introduction of script-specific composition exclusions for several scripts, including Tibetan).

During that conversion from Unicode 2.0 to Unicode 3.0 with normalization, the UTC did the best it could with the mess for Tibetan. It was clear after the analysis that 0F77 and 0F79 should never have been encoded at all -- which was why they got those "strongly discouraged" labels -- but there was nothing to do about that mistake at that point. The compatibility decompositions were the best compromise to keep them from contaminating the normalization processing of the rest of the Tibetan vowels.

--Ken
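You can watch the current scheme Ken describes with Python's unicodedata module. This is a quick sketch; the exact Unicode version depends on your Python build, but these mappings have been stable since normalization was introduced. 0F77 keeps its <compat> decomposition, so NFD and NFC leave it alone, and only NFKD turns it into the preferred 0FB2 0F71 0F80. Meanwhile, 0F76 and 0F81 come out decomposed even in NFC.

```python
import unicodedata

TIBETAN_VOWELS = ["\u0f76", "\u0f77", "\u0f78", "\u0f79", "\u0f81"]


def hex_units(s: str) -> str:
    """Render a string as space-separated code points, UnicodeData.txt style."""
    return " ".join(f"{ord(ch):04X}" for ch in s)


def main() -> None:
    for ch in TIBETAN_VOWELS:
        row = [f"{ord(ch):04X}", repr(unicodedata.decomposition(ch))]
        for form in ("NFD", "NFC", "NFKD"):
            row.append(f"{form}={hex_units(unicodedata.normalize(form, ch))}")
        print("  ".join(row))


if __name__ == "__main__":
    main()
```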
Anyway, like I said, this seemed to me like good historical information to put out there, and certainly to help show that Every character has a story!
This post brought to you by ཷ and ཹ (U+0f77 and U+0f79, a.k.a. TIBETAN VOWEL SIGN VOCALIC RR and TIBETAN VOWEL SIGN VOCALIC LL)
Earlier today, when I talked about Keyboards under LUA, I was very focused on installing the keyboard layouts that MSKLC creates under Vista.
And I did talk a bit about what might be a Vista bug.
But I didn't talk at all about Microsoft Keyboard Layout Creator itself. And it too has a bug that affects Vista most prominently (but can affect earlier versions of Windows, too).
Let's say that you go to the MSKLC download page and download MSKLC.EXE. When you run the program, it pops up a nice dialog that clearly suggests what you might want to do:
Unfortunately, if you hit the Setup button, you will then get another dialog:
Huh? Doesn't Vista include the .NET Framework? What the hell is this dialog talking about, anyway?
The problem here, which is actually a bug in the MSKLC setup, is that it is looking for the 1.0 or 1.1 version of the .NET Framework specifically. It is not thinking that the 2.0 version is the right one.
In my defense, I was using the built-in stuff in Visual Studio setup projects to build the dependency. Maybe I forgot a setting or something. :-)
In any case, if you click Yes on that dialog, you will be taken to the SDKs, Redistributables, & Service Packs page of the .NET Framework Developer Center. This too is all built-in stuff from Visual Studio setups, which I have to admit is fairly cool.
Unfortunately, that link contains only 2.0 versions of the .NET Framework for download. And if that were good enough, then you wouldn't have had the problem in the first place!
Many people actually try installing that 2.0 download, only to be told that it is already installed. And now they are really confused....
(By the way, that seems like a bug in the page content to me; if Visual Studio 2003's setup projects create links to a download page, then that page ought to at least have a link to that version's .NET Framework!)
In any case, the original page that comes up after you download MSKLC includes a What Others are Downloading list. Today that list looks like:
Others who downloaded Microsoft Keyboard Layout Creator (MSKLC) Version 1.3.4073 also downloaded:
1. Microsoft .NET Framework Version 2.0 Redistributable Package (x86)
2. Microsoft .NET Framework Version 1.1 Redistributable Package
3. MDAC Utility: Component Checker
4. Microsoft .NET Framework 1.1 Service Pack 1
5. IntelliType Pro 5.5 Keyboard Software for Windows
Note item #2 on that list (the fact that it is #2 is because of that bug in the redist page content I mentioned above) and the fact that even if the 1.1 download were on the page there is no clear indication that it is what you need.
Hopefully this blog post will help turn it around a little bit? :-)
Interestingly enough, after you install the 1.1 .NET Framework and install MSKLC, you can go to Programs and Features (Add or Remove Programs before Vista) in the Control Panel and uninstall the 1.1 .NET Framework, and Microsoft Keyboard Layout Creator will still run just fine. It is just a silly setup-time restriction that has nothing to do with the actual functionality of MSKLC....
This post brought to you by ෛ (U+0ddb, a.k.a. SINHALA VOWEL SIGN KOMBU DEKA)
People have been noticing that attempting to install an MSKLC-generated keyboard layout on Vista causes some problems. Say you take your handy dandy custom Bulgarian keyboard created by MSKLC:
You try to double click on the Bulg.msi file, and you may or may not get the LUA dialog:
However, whether you get it or not, and no matter which option you choose if you do, the final result is unfortunately the same. :-(
Now it may or may not be a bug that, if you are running a build recent enough to show the LUA dialog, clicking "Allow" doesn't allow the install to proceed. We'll figure that out later.
For now, the easy workaround!
Just run it from an elevated command prompt, which you get by right-clicking the Command Prompt item on the Start Menu and choosing Run as administrator:
After saying Continue when that User Account Control dialog comes up:
you will then have a command prompt that you can run Bulg.msi from, and it will install just fine!
Don't you love happy endings?
Now LUA (Least-privileged User Account) or UAC (User Account Control) is serving an important purpose here: making sure that you are okay with the changes this installer tries to make to your system, namely a file in your system32 directory and registry keys under HKEY_LOCAL_MACHINE. So an MSKLC-generated keyboard layout install is a pretty invasive little bugger. The key is making sure that you give it the right permissions to get its work done. :-)
Now for the other question, the one I tabled earlier: should the install still fail if you say "Allow" to a LUA dialog? I assume this is a bug (the same thing happens if I run msiexec /i Bulg.msi from an unelevated command prompt). Probably someone ought to fix that. :-)
This post brought to you by K (U+004b, a.k.a. LATIN CAPITAL LETTER K)
You know the drill, if you have seen any of these other previous posts:
Microsoft has done it again, though. This time with Tswana!
Some more information about the language:
Number of speakers: 4 million
Name in the language itself: Setswana
Tswana is one of the 11 official languages of South Africa and is spoken mainly in the North West province, by about 3 million speakers. It is also the national language of Botswana, where 70% of the population (or 1 million) speak it.
Tswana, often called Setswana, has also been known by such different names as Beetjuans, Chuana (hence the name "Bechuanaland" for the British protectorate that became Botswana), Coana, Cuana, and Sechuana.
Classification:
Tswana belongs to the Bantu languages which in turn are part of the huge Niger-Congo language family. It is most closely related to Sesotho and Sesotho sa Leboa.
Script:
Tswana is written in the Latin alphabet.
Enjoy!
This post brought to you by T (U+0054, a.k.a. LATIN CAPITAL LETTER T)
So the question that Matt asked was something like this:
Why is it that calling GetUnicodeCategory on character '00ad' returns a category of DashPunctuation? I expected the Unicode 4.1 category of Format. Here is the code I used:
    using System;
    using System.Text;

    namespace ConsoleApplication2 {
        class Program {
            static void Main(string[] args) {
                TestChar(Convert.ToChar(Convert.ToInt32("00ad", 16)));
            }

            public static void TestChar(char testing) {
                Console.WriteLine("Categorys for char 0x" + Convert.ToInt32(testing).ToString("x4"));
                Console.WriteLine("\tUCD Category : " + Char.GetUnicodeCategory(testing));
                Console.WriteLine("\tNLS+ Category : IsPunctuation=" + Char.IsPunctuation(testing));
            }
        }
    }
Do you know what is going on here?
Indeed, the problem is sort of the root of the title of this post. The simple fact is that not all GetUnicodeCategory() methods are created equal!
There is char.GetUnicodeCategory that has been around all along, and starting with version 2.0 of the .NET Framework there is CharUnicodeInfo.GetUnicodeCategory, which I have talked about previously (though in fairness not with 100% accuracy!).
They are different for some sort of backcompat reason related to programs that are just assuming certain behavior. Let's see how different they are, with code like this:
    using System;
    using System.Globalization;

    namespace ConsoleApplication2 {
        class Program {
            static void Main(string[] args) {
                // An int counter lets the loop include U+FFFF and still terminate
                // (a ushort counter tested with <= 0xffff would wrap around forever).
                for (int ich = 0x0000; ich <= 0xffff; ich++) {
                    UnicodeCategory ucC = char.GetUnicodeCategory((char)ich);
                    UnicodeCategory ucCui = CharUnicodeInfo.GetUnicodeCategory((char)ich);
                    // Report only the code units where the two methods disagree.
                    if (ucC != ucCui) {
                        Console.WriteLine("{0}\t{1}\t{2}", ich.ToString("x4"), ucC, ucCui);
                    }
                }
            }
        }
    }
If you run it, it will, amazingly enough, return just that one character that is different:
00ad DashPunctuation Format
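For comparison outside .NET, Python's unicodedata module reads the UCD directly, so it sides with CharUnicodeInfo here. This is a small sketch. The mapping from the two-letter UCD codes to the .NET UnicodeCategory member names covers only the categories this example needs, and the Unicode version depends on the Python build.

```python
import unicodedata

# Only the UCD general categories this example needs, mapped to the names of
# the matching .NET System.Globalization.UnicodeCategory members.
DOTNET_NAMES = {"Cf": "Format", "Pd": "DashPunctuation"}


def ucd_category(ch: str) -> str:
    """Return the UCD general category of ch, using the .NET name when known."""
    code = unicodedata.category(ch)
    return DOTNET_NAMES.get(code, code)


if __name__ == "__main__":
    soft_hyphen = "\u00ad"
    print(f"{ord(soft_hyphen):04x}", unicodedata.category(soft_hyphen), ucd_category(soft_hyphen))
```

This prints "00ad Cf Format", which matches the CharUnicodeInfo column above.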
In the future, though, the plan is that CharUnicodeInfo.GetUnicodeCategory will be updated whenever Unicode is. char.GetUnicodeCategory will usually be updated too, though occasionally some application dependency may force it to stay unchanged for specific characters.
Kind of a whole new way to solve the consistency vs. correctness argument -- support both? :-)
This post brought to you by U+00ad, a.k.a. SOFT HYPHEN
A few days ago, Eric Rucker posted about the various limits in Access 2007 in this post, including my favorites:
Number of characters in a table name: 64
Number of characters in a field name: 64
Number of characters in a text field: 255
Number of characters in a validation rule: 2048
Number of characters in a validation message: 2048
Number of characters in a record (excluding Memo and OLE Object fields) when the UnicodeCompression property is set to Yes: 4000
Number of characters in a field property setting: 255
Number of characters in an object name: 64
Number of characters in a password: 14
Number of characters in a user name or group name: 20
Number of characters in a cell in the query design grid: 1024
Number of characters in a parameter in a parameterized query: 255
Number of characters in a SQL statement: Approx 64,000
Number of characters in a label: 2048
Number of characters in a text box: 65535
Number of characters in a SQL statement that serves as the Recordsource or Rowsource property of a form, report, or control: 32750
Number of characters in a condition: 255
Number of characters in a comment: 255
Number of characters in an Action Argument: 255
As to why these are my favorites?
Well, because each use of "Number of characters" actually ought to read "Number of UTF-16 code units" since e.g. the string
ểểểểểểểểểểểểểểểểểểểểểể
is actually 66 of those types of "characters" long even though any user would reasonably count it up as being of length 22.
Unfortunately, Access and Jet don't consider that object to be the same as one with this name:
ểểểểểểểểểểểểểểểểểểểểểể
(the first is in normalization form D, the second in form C).
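A short Python sketch makes the arithmetic concrete. It assumes, as this post argues, that Access's "characters" are UTF-16 code units. The same 22 user-visible letters are 22 code units in NFC and 66 in NFD. A binary comparison sees two different names, even though the strings are canonically equivalent.

```python
import unicodedata

# 22 copies of U+1EC3 LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
NAME = "\u1ec3" * 22


def utf16_length(s: str) -> int:
    """Length in UTF-16 code units (two bytes each in UTF-16LE)."""
    return len(s.encode("utf-16-le")) // 2


nfc = unicodedata.normalize("NFC", NAME)
nfd = unicodedata.normalize("NFD", NAME)  # each letter becomes e + U+0302 + U+0309

if __name__ == "__main__":
    print(utf16_length(nfc), utf16_length(nfd))
    print(nfc == nfd)                                # binary comparison: different
    print(unicodedata.normalize("NFC", nfd) == nfc)  # canonically equivalent
```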
Anyone want to calculate the number of possible unique object names one can create with just this one character? :-)
This post brought to you by ể (U+1ec3, a.k.a. LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE)
When I first mentioned yesterday that I was going to be blathering about the Fonts folder for a bit, regular reader Rosyna commented:
It'll be interesting. I know virtually nothing about the Windows fonts folder. Many people hate the Mac OS X's multiple fonts folders (System, "Local", Network, User, and Classic). It becomes hard to deal with everything, as having multiple duplicate fonts is very possible and is almost the case on every single Mac OS X box. The good thing though is that any font dragged into a Mac OS X font folder is available to all applications immediately.
I suddenly realized I wanted to start things a bit differently than what I was originally going to do.
(For those of you who hate the "Add Font..." dialog, wait for the next post -- I'll be getting to that shortly!)
So, we'll start with a simple question: What the hell is the Fonts folder?
The simple answer? It is a Shell namespace extension that causes any display of %WINDIR%\Fonts in Windows Explorer to have a special view, where typical filesystem actions in Explorer such as copy, move, and delete have special handlers.
It is by no means the only way to get fonts onto Windows, though.
If you look at the Platform SDK documentation for the AddFontResource function:
The AddFontResource function adds the font resource from the specified file to the system font table. The font can subsequently be used for text output by any application.
So all you have to do to get a font available to everyone is call this function!
Of course there are things to keep in mind in the remarks section:
Any application that adds or removes fonts from the system font table should notify other windows of the change by sending a WM_FONTCHANGE message to all top-level windows in the operating system. The application should send this message by calling the SendMessage function and setting the hwnd parameter to HWND_BROADCAST. When an application no longer needs a font resource that it loaded by calling the AddFontResource function, it must remove that resource by calling the RemoveFontResource function. This function installs the font only for the current session. When the system restarts, the font will not be present. To have the font installed even after restarting the system, the font must be listed in the registry.
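Wow, those three paragraphs are like three strong drinks, aren't they? They tell you that you need to broadcast WM_FONTCHANGE whenever you add or remove a font, that you must call RemoveFontResource once you no longer need the font, and that you have to do extra work if you want the font to still be there after a restart.
That extra work is to add the font's name and filename to the registry, in the following subkey: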
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts
And although most of the files listed there will have no path (in which case %WINDIR%\Fonts is assumed), this is not a requirement.
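The path rule is simple enough to sketch. The following is a minimal Python illustration, assuming the data of a value under that key has already been read; the helper name resolve_font_path and the default C:\Windows are hypothetical (real code would use the %WINDIR% environment variable). Entries that do carry a directory are returned unchanged, which is an assumption of this sketch rather than documented behavior.

```python
import ntpath  # Windows path rules, importable on any platform


def resolve_font_path(value_data: str, windir: str = r"C:\Windows") -> str:
    """Hypothetical helper: map the data of a value under
    HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Fonts to a full path.

    A bare filename is assumed to live in %WINDIR%\\Fonts; anything with a
    directory component is returned as-is (an assumption of this sketch).
    """
    if ntpath.dirname(value_data):
        return value_data
    return ntpath.join(windir, "Fonts", value_data)


print(resolve_font_path("arial.ttf"))          # C:\Windows\Fonts\arial.ttf
print(resolve_font_path(r"D:\MyFonts\x.ttf"))  # D:\MyFonts\x.ttf
```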
All that our Shell Extension Fonts folder does (well, it does more than this, but bear with me for a bit) is wrap all this up. In fact, it is GDI that makes an AddFontResource call on every font in that registry key when it does its own initialization, and it is the folder's handling code that adds and removes fonts from the list in the registry.
But all of the hints are here -- anyone who hates the Shell Fonts folder interface to these things can build their own code. They can even make it a Shell Extension wherever they like, and put in all the same kind of handling code, or better.
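As a starting point for such code, here is a minimal sketch of the per-session sequence the remarks describe, written in Python with ctypes purely for illustration: add the font with AddFontResourceW, broadcast WM_FONTCHANGE, and undo both with RemoveFontResourceW. It assumes Windows, and it deliberately does not touch the registry, so the font is gone after a restart. The function names are hypothetical.

```python
import ctypes
import sys

# Values from winuser.h.
HWND_BROADCAST = 0xFFFF
WM_FONTCHANGE = 0x001D


def _apis():
    """Return (gdi32, user32) with argument types set; Windows only."""
    if sys.platform != "win32":
        raise OSError("AddFontResource/RemoveFontResource require Windows")
    gdi32, user32 = ctypes.windll.gdi32, ctypes.windll.user32
    gdi32.AddFontResourceW.argtypes = [ctypes.c_wchar_p]
    gdi32.RemoveFontResourceW.argtypes = [ctypes.c_wchar_p]
    # HWND, UINT, WPARAM, LPARAM
    user32.SendMessageW.argtypes = [ctypes.c_void_p, ctypes.c_uint,
                                    ctypes.c_size_t, ctypes.c_ssize_t]
    return gdi32, user32


def add_font_for_session(path: str) -> int:
    """Add the fonts in `path` to the system font table for this session.

    Returns the number of fonts added (0 means failure).
    """
    gdi32, user32 = _apis()
    added = gdi32.AddFontResourceW(path)
    if added:
        # Tell every top-level window that the font table changed.
        user32.SendMessageW(HWND_BROADCAST, WM_FONTCHANGE, 0, 0)
    return added


def remove_font_for_session(path: str) -> bool:
    """Undo add_font_for_session once the font is no longer needed."""
    gdi32, user32 = _apis()
    removed = bool(gdi32.RemoveFontResourceW(path))
    if removed:
        user32.SendMessageW(HWND_BROADCAST, WM_FONTCHANGE, 0, 0)
    return removed
```

Making the font survive a restart would additionally require writing the name/filename value under the HKLM key above, which is the one step that needs admin-style privileges.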
Now the Shell code does a bit more here. For example, you can install a font programmatically by opening the Fonts folder in Explorer and simply copying the font file to the folder (more on why this is not the best way to do it later!). It also gives some unique views of fonts that are in the folder (more on these later, too), and it completely hides hidden files even if you have the "Hide protected operating system files" setting unchecked (which you would do as well if you saw what was in there!). And lots of other little things like that.
But clearly the widespread belief that fonts need to be in the Fonts folder (which requires admin privileges to write to) is a myth. The only action that truly requires admin-style privileges is writing to that registry key (since it is under HKLM), but just about everything else is a bit more open to people.
The myth is a tribute to what it means to be in the default install, and to how hard it is for a feature to get noticed if it is not available in some default, built-in fashion.
Imagine for example if the Fonts folder was actually further virtualized into supporting both the current folder in %WINDIR%\Fonts and some kind of per-user folder like under %USERPROFILE%\Fonts or something -- so that anyone could install fonts and you would either have the "just me" or "all users" choice, or if you lacked permissions it would be a "just me" install automatically.
For now one will have to keep imagining (there is no such feature in Vista, for example).
But I will be talking more about the Fonts folder in future posts, specifically to cover:
So stay tuned, there will be much more for font users, typographers, people who find my posts witty, Microsoft bashers, basically just about everyone....
This post brought to you by F (U+0046, a.k.a. LATIN CAPITAL LETTER F)
Eric Sassaman (the MVP lead who has helped to keep me and my team involved whenever the MVP Summit has been happening at Microsoft) sent me some mail the other day about a cool project (currently in beta) that Dr. Joseph M. Newcomer put up on his website.
(the site is very cool in general, and I have enjoyed this glimpse into a great mind of computer science and also of typography and other interests!)
You can check out The Locale Explorer here.
And every single developer, tester, and program manager on my team involved with NLS support on Windows will definitely want to take a look here -- this project can be thought of as a great example of a smart developer who has taken the time (through docs and actual usage) to understand the NLS API, to the point where any time he gets something wrong, we as a team should perhaps assume that it was actually we who got some explanation wrong and contributed to the problem....
And although, as he says in his disclaimer at the bottom of the page:
The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.
I'll have to say that I highly recommend taking a look at this fascinating deep view of NLS support on Windows from someone outside of Microsoft.
I've been wanting to do a post about the font folder for a while now, and Simon Daniels has given me some additional ideas that will stretch that into two or maybe three posts.
Just thought I'd give you all a heads up on what I was going to be posting about tomorrow. It will be firmly entrenched in the Windows fonts folder and installing fonts in general. :-)
Well even if they have to go across the border to find people who grow beets, they do have LIPs now.... :-)
Previous posts in the series:
And now you can add Luxembourgish to the list of Language Interface Packs!
A bit about Luxembourgish....
Number of speakers: ~300,000
Name in the language itself: Lëtzebuergisch
Luxembourgish is spoken in the small Western European country of Luxembourg, where it has been an official language since 1984 (together with French and German). Though it is so closely related to German (see "Classification") that German speakers should have no major problems understanding Luxembourgish, the grammatical differences between the two languages are considerable. Luxembourgish has also borrowed many words from French (from merci for "thank you" to Prabbeli, from parapluie, for "umbrella").
Interesting facts: Though Luxembourg is a founding member of the European Union, Luxembourgish is not an official language of the EU. In Luxembourg itself, laws are not published in Luxembourgish either.
Classification: Strictly speaking, Luxembourgish is linguistically a West Central German dialect, but due to its standing it can be considered a language in its own right (as the saying goes, "A language is a dialect with an army"). As a Germanic language, Luxembourgish belongs to the Indo-European language family.
Script: Luxembourgish is written in Latin script. There are four special characters: é, ä, ë and ü.
Enjoy! :-)
This post brought to you by é (U+00e9, a.k.a. LATIN SMALL LETTER E WITH ACUTE)
It was a very sad thing when Mike forwarded me the link the other day --
Lego to lay off 1,200, end U.S. production
Though Conan O'Brien had an amusing take on it tonight, pointing out that perhaps they would not be shutting down the US plant in Enfield, CT and moving it to Mexico had they not made the factory so easy to take apart and put back together again? :-)
As someone who grew up with Lego, I am less worried about the move than I am about the fact that Lego seems to be in trouble beyond what they will save with this particular relocation -- given the plan to get rid of another 900 employees in Denmark over the next few years.
I hope that Lego's plans to re-invent itself are successful. If anyone can rebuild themselves, then the coolest building blocks in the world have the best chance!
Overheard in microsoft.public.win32.programmer.international:
Hi Guys,

Sorry for posting this. I have gone through lots of postings here and unable to get an answer. Here is my problem.

I am using Windows XP Pro English.

I have a VB dll that passes strings to COM DLL (written in VC++). VC DLL converts the string to ANSI before writing to a file. The VC DLL has function called AddData that expects 2 parameters - One is the string to write and second parameter is length of the string.

Sample Code -

s="Write this to file"
MyFile.AddData(s,len(s))

Result

Contents of file - "Write this to file"

Well, everything was fine till the application was supposed to write chinese characters as well. I first tried to do the same on my machine and could not generated any chinese characters (only ???). I was then asked to configure my machine to display chinese characters (the way chinese guys are doing it). This is what I did -

1. Install East Asian language support.
2. Added Chinese (PRC) language to input/keyboard language (With US Keyborad layout being default)
3. Changed the non-unicode language from default to Chinese (PRC)

Sample code -

s="Write 测试this to file"
MyFile.AddData(s,len(s))

Result

Contents of file - "Write 测试this to fi"

The number of characters lost are same as number of Chinese characters in the string.

My Analysis -

Len returns the number of characters in a string and not the number bytes that will be written to the file.

Limitations

I cannot change the VC COM dll as we do not own it.

My questions -

1. Why is the string length not reported correctly?
2. How can I get the actual length of the string?

Cheers
Siva
The problem here is that strings in VB are Unicode, and the Len() function returns a count of UTF-16 code units in that Unicode string.
To work around this, if you want the length in bytes of the string in the default system code page, you need to try something different, such as:
s = "Write 测试this to file"
' LenB of the converted string = its length in bytes in the default system code page
Call MyFile.AddData(s, LenB(StrConv(s, vbFromUnicode)))
This will convert the string using the default system locale's code page and pass the length of the converted string, in bytes, to the AddData method....
Note the dependence on the default system locale, which will not be correct if the conversion being done in the VC++ COM DLL is not CP_ACP, the default system code page. In that case, you just need to make sure that the same code page is used in both the VB and the VC++ components....
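To make the arithmetic concrete, here is a small Python sketch (Python is used only for illustration; the VB and VC++ code stays as above) that models both sides. It assumes the system locale is Chinese (PRC), whose default code page is 936 (Python's "cp936"), and simulate_add_data is a hypothetical stand-in for the DLL that converts the string and then writes exactly the number of bytes it is told to.

```python
# Assumption: the default system locale is Chinese (PRC), so CP_ACP is 936.
ACP = "cp936"


def vb_len(s: str) -> int:
    """What VB's Len() reports: the number of UTF-16 code units."""
    return len(s.encode("utf-16-le")) // 2


def vb_lenb_ansi(s: str, codepage: str = ACP) -> int:
    """Analogue of LenB(StrConv(s, vbFromUnicode)): byte count in the code page."""
    return len(s.encode(codepage))


def simulate_add_data(s: str, length: int, codepage: str = ACP) -> str:
    """Hypothetical model of the DLL: convert to the code page, keep only
    `length` bytes. A cut inside a double-byte character would decode as
    U+FFFD here."""
    data = s.encode(codepage)[:length]
    return data.decode(codepage, errors="replace")


s = "Write 测试this to file"
print(vb_len(s), vb_lenb_ansi(s))  # 20 22
written = simulate_add_data(s, vb_len(s))
print(written.endswith("to fi"))   # True: two bytes short, so "le" is lost
```

Each of the two ideographs takes two bytes in code page 936, so passing Len(s) comes up short by exactly the number of Chinese characters, which matches what Siva observed.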
This post brought to you by 测 (U+6d4b, a CJK Ideograph)