Blog - Title

June, 2007

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Sometimes if the two wrongs are consistent, they may make a right!

    • 2 Comments

    John Caffrey, one of the awesome SDETs we have at the EDC (European Development Center) over in Dublin, sent mail the other day reporting a bug in the CultureAndRegionInfoBuilder class:

    Hi guys,
    We’ve noticed that the CalendarWeekRule setting being saved out to LDML by the CARIB is incorrect. i.e. here’s some code which repros the issue :-

                CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder("en-IE", CultureAndRegionModifiers.Replacement);

                carib.GregorianDateTimeFormat.CalendarWeekRule = CalendarWeekRule.FirstFourDayWeek;
                carib.Save("Test FirstFourDayWeek.ldml");
                // Bug: Saved LDML file contains <msLocale:weekRule type="firstFullWeek" />

                carib.GregorianDateTimeFormat.CalendarWeekRule = CalendarWeekRule.FirstFullWeek;
                carib.Save("Test FirstFullWeek.ldml");
                // Bug: Saved LDML file contains <msLocale:weekRule type="firstFourDayWeek" />

    Just wondering if this is a known issue? 

    Well, it wasn't a known issue, though it certainly is now that it has been reported!

    As it turns out, this is only a bug in the LDML creation code, not in the code of the piece of CultureAndRegionInfoBuilder that creates custom cultures. Though it appears to be "properly" misinterpreted in both read and write operations (i.e. in both CultureAndRegionInfoBuilder.Save and CultureAndRegionInfoBuilder.CreateFromLdml), which is the most likely reason that no one noticed. :-)

    It does lead to an interesting question as to whether it should be fixed or not -- since fixing it would break compatibility with all prior versions of CultureAndRegionInfoBuilder for this member, and LDML files written by prior versions would be misread by the versions that are fixed.
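
    To make the trade-off concrete, here is a minimal C++ sketch of the situation. It is not the CultureAndRegionInfoBuilder code: the enum, the function names, and the "firstDay" string are all made up for illustration, and only the two swapped type strings come from the bug report above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the .NET CalendarWeekRule values named in the post.
enum class WeekRule { FirstDay, FirstFullWeek, FirstFourDayWeek };

// Writer with the swap from the bug report: each rule is saved under the
// other rule's type string. ("firstDay" is an assumed string.)
std::string WriteSwapped(WeekRule rule) {
    switch (rule) {
        case WeekRule::FirstFullWeek:    return "firstFourDayWeek";  // swapped
        case WeekRule::FirstFourDayWeek: return "firstFullWeek";     // swapped
        default:                         return "firstDay";
    }
}

// Reader with the matching swap, so it undoes what the writer did.
WeekRule ReadSwapped(const std::string& type) {
    if (type == "firstFourDayWeek") return WeekRule::FirstFullWeek;
    if (type == "firstFullWeek")    return WeekRule::FirstFourDayWeek;
    assert(type == "firstDay");
    return WeekRule::FirstDay;
}

// A "fixed" reader that takes the type string at face value.
WeekRule ReadFixed(const std::string& type) {
    if (type == "firstFourDayWeek") return WeekRule::FirstFourDayWeek;
    if (type == "firstFullWeek")    return WeekRule::FirstFullWeek;
    assert(type == "firstDay");
    return WeekRule::FirstDay;
}
```

    With both sides swapped, ReadSwapped(WriteSwapped(rule)) gives back the rule you started with, which is why nobody noticed. Pair the old writer with ReadFixed and a file saved as FirstFourDayWeek comes back as FirstFullWeek, which is the compatibility break in question.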

    Maybe the answer now is to leave it alone, perhaps adding a comment to the XML to explain the "mistake" in the file so no one fixes it? :-)

    A good catch, though -- this could be really confusing for people looking at the LDML directly. Kudos to the EDC, as usual!

     

    This post brought to you by § (U+00a7, a.k.a. SECTION SIGN)

  • Sorting it all Out

    Beauty is in [the] Bones?

    • 9 Comments

    I was watching a Bones rerun the other night (The Truth in the Lye) and one of those funny language moments came up:

    Angela Montenegro: What you thought were teeth marks, Dr. Saroyan, turned out to be Chinese characters engraved along the side.
    Jack Hodgins: What do they say?
    Angela Montenegro: They say "what make foolish man think I speak Chinese?"
    Jack Hodgins: I thought you were half Chinese!
    Angela Montenegro: And I think you're half Swedish. Let's hear some Swedish!
    Camille Saroyan: What is it Angela, please.
    Angela Montenegro: It's a chopstick, only it's not the kind you eat with.
    Jack Hodgins: Like there is another kind?
    Angela Montenegro: Well you wouldn't comb your hair with a fork, would you?
    Jack Hodgins: My hair?
    Angela Montenegro: Alright, look, the one character I was able to translate, off the internet, is the word beauty. Then I realize that it's meant for hair.
    Camille Saroyan: Where you twist it in a bun and stick it in to hold it in place.

    The things we take for granted without realizing it -- I can't even count the number of times that I have asked people at work if they knew what some text said. I guess I've been lucky so far (though I suppose being the daughter of Billy F. Gibbons of ZZ Top, as Angela is in the series, might have been an interesting conversation point, too....) :-)

    Here is the chopstick in question:

     

    I even knew which character Angela found, though I have no idea how or where I learned that particular ideograph for beauty....

     

    This post brought to you by 美 (U+7f8e, a CJK Ideograph)

  • Sorting it all Out

    DST 2007 -- too big to judge

    • 4 Comments

    Based on The Dark Knight Returns, with apologies to Frank Miller (and to the people who really were working hard during the recent DST crisis):

    siao fan: you stand for everything i believe in, michael. i've always wanted to be the kind of guy you are (except maybe socially). i can't understand how you can think the DST thing was a good thing to have happened.

    michael kaplan: you'd just think i'm senile.
    i'm sure you've heard old fossils like me talk about pearl harbor.
    *koff* excuse me.
    fact is, we mostly lie about it. we make it sound like we all leaped to our feet and went after the axis on the spot.
    hell, we were scared. rumors were flying, we thought the japanese had taken california. we didn't even have an army. so there we were, lying in bed pulling the sheets over our heads--
    --and there was roosevelt, on the radio, strong and sure, taking fear and turning it into a fighting spirit.
    almost overnight we had our army.
    we won the war.
    since then, presidents have come and gone, each one seeming smaller, weaker ... the best of them like faint echoes of roosevelt ...
    jesus, i'm talking too much.

    siao fan: go on ...

    michael kaplan: a few years back, i was reading a news magazine -- a lot of people with a lot of evidence said that roosevelt knew pearl was going to be attacked --
    --and that he let it happen.
    wasn't proven. things like that never are. i couldn't stop thinking how horrible that would be ...
    ... and how pearl was what got us off our duffs in time to stop the axis.
    but a lot of innocent men died.
    but we won the war.
    it bounced back and forth in my head until i realized i couldn't judge it. it was too big.
    he was too big ...

    siao fan: i don't see what this has to do with the DST thing being good.

    michael kaplan: maybe you will.

    A few days ago I was in the audience at an interesting panel discussion moderated by M3 Sweatt, the Microsoft Director who among other things championed the DST efforts throughout Microsoft (ref: his DST-tagged blog posts!).

    The panel was really an interesting opportunity, for several reasons.

    For one thing, I got to visit with Beth Scott again (I think I last saw her like 6-7 years ago when I almost took a job in her org; my how time flies...).

    Oh, I also got to talk to M3 for a bit, too.

    But the panel itself was also interesting, as it really highlighted a lot of the lessons learned by groups throughout Microsoft (in Outlook, in WINSE, in Exchange, in .NET, in CRM, and so on).

    Many of the stories were entertaining, though appropriately self-deprecating, and people clearly felt quite humbled by the experience.

    One thing I am sure of -- no matter how many barrels of oil it may have saved us, it cost the country a lot more in the IT resources spent.

    I was disappointed by one thing, though -- the fact that no one pointed out one of the biggest lessons, in my mind -- that DST2007 has mirror versions of itself pretty much every year for as long as there have been computers with clocks and calendars supporting DST -- both inside and outside of the US.

    The entire IT industry, including Microsoft, was hit hard by a problem they have had early warnings about, for decades.

    And they never paid attention.

    And yet, it was DST 2007 that got everyone off their duffs to finally fix this bug.

    But it cost the industry a ton of money.

    But it mobilized huge resources to finally fix problems inflicted on the entire world for many years.

    But a lot of good people lost time. And money. And sleep.

    But we are addressing the real issues now.

    It's too big.

    DST is too big.

     

    This post brought to you by © (U+00a9, a.k.a. COPYRIGHT SIGN)

  • Sorting it all Out

    Planning a party for between 26 and 388 of your colleagues

    • 0 Comments

    One last heads-up to Microsoft full-time employees, everyone else may as well ignore....

    Going into this presentation on Friday at 4pm in 33/Kodiak at the 2007 Engineering Excellence & Trustworthy Computing Forum (the one I mentioned before), I don't have the first clue how many people are going to be there.

    I mean, on the one hand it is the last talk on the last day.

    And on the other hand there is a 50% chance of rain all day that lowers to 40% after the talk, which makes leaving early a bit less likely.

    And on the other other hand I really am only 100% positive about 85% of the 31 people who have specifically mentioned they would be there....

    And on the other other other hand the 225 people who were signed up as of Monday are people who signed up in a system that allows one to double/triple/quadruple book without mentioning to you that you are violating natural laws and that encourages people to book the whole week even if they aren't going to be there the entire time.

    And on the other other other other hand there is the fact that there are seven other talks going on at the same time, five of which do not require going outside into the 50% chance of rain.

    And on the other other other other other hand is the fact that I immediately follow one of Mark Russinovich's talks that saw 388 people sign up and therefore I may see some of the same gift as is seen by the TV shows that air after the Super Bowl. The gift of inertia.

    I have run out of hands at this point, and I figure somewhere between 26 and 388 people will show up. I'm glad I don't have to cater it!

    No worries, I'll do my best either way. :-)

    Anyway, I must run now. I decided to update two of my examples, replacing them with other ones that came this week....

     

    This post brought to you by ® (U+00ae, a.k.a. REGISTERED SIGN)

  • Sorting it all Out

    Tell yourself 10 times that you don't own that anymore

    • 0 Comments

    One of the issues that tends to pop up after something like Track change (a.k.a. A new job that has a few things in common with the old one) happens, when all of the following are true:

    • You still work for the same company
    • You are still in the same building (albeit a different hallway, finally!)
    • You still have some passion for your old areas
    • You continue to care to some extent about knowing that The Right Thing™ is being done in those areas
    • Everyone knows you have lots of opinions
    • You tend to help people out, all things being equal
    • Lots of people very new to your old areas are trying to ramp up

    is that you get put on a lot of planning mails, invited to a lot of meetings, and asked a lot of questions by people.

    You have to decide if you will (to avoid perhaps unintentional invitations to such activities) choose to take yourself off the old dev team alias (keeping in mind that your lead never took himself off of it even though he hasn't worked in the area in at least half a decade and one of your current team members who left the old team at the same time as you hasn't either).

    And you have to decide how much of your time will be spent above and beyond the usual transition assistance -- and how best to start cutting the cord on some of the interactions that feel more like people are somehow internally thinking you are still on the team.

    Some of those people may even read your blog and consider posts like this one to be a really chicken-shit, passive/aggressive way to encourage them to not be so dependent on you. But they aren't -- they are just you struggling with how to leave the other two thirds of your old job behind so that you can have every opportunity to succeed at your new job.

    Then you notice that a casual glance at current specs doesn't show anything in a random area you probably would have pushed for were you still there, and you realize that if you take the time to ask about it then there is no way to not look to reasonable people like a hypocrite who wants to stay at the same time as they go, anytime something interesting comes up.

    In the meantime, some of the engagements you are doing with other teams involve your old expertise, which is honestly not all that old. And it is not like you planned to rename your blog just yet, or stop answering questions in newsgroups or fora or aliases. Especially when the next engagement you work on may start from one of those random sources.

    So you breathe, you count back from ten in Greek, stop worrying about the one random question/invite/whatever that got you feeling all introspective and you tell yourself 10 times that you don't own that anymore.

    And you feel better again until the next time. :-)

     

    This post brought to you by ™ (U+2122, a.k.a. TRADE MARK SIGN)

  • Sorting it all Out

    Behold the [non-fa-IR default] PersianCalendar class

    • 4 Comments

    From the MSDN Feedback Center....

    The title is simple enough: Jalaali Calendar should be default calendar for fa-IR culture.

    Well, maybe.

    I mean, it makes sense to me.

    And the site is tracking it as a top issue, I guess since people agree with the idea, and all....

    (This is a calendar I have discussed before, in this post)

    But there is a problem here....

    The fa-IR locale exists in Windows, and the Jalaali Calendar does not, at this time.

    (Once it is added and this update is made, we can look forward to this other issue coming up again, for a new culture and calendar!)

    But for now, the data always starts from Windows, so until/unless the calendar is added to Windows (including a constant in winnls.h and so on!), it can't be the default in the culture....

     

    This post brought to you by پ (U+067e, a.k.a. ARABIC LETTER PEH)

  • Sorting it all Out

    The MB_PRECOMPOSED flag is stupid, and the MB_COMPOSITE ain't no genius either

    • 1 Comments

    The other day when I suggested that if Your VC++ files don't support Unicode identifiers? Drop a BOM on them!, John Bates commented:

    RC.exe can properly compile .rc files saved as UTF-16LE (strangely not UTF8-with-BOM though)...

     The reason UTF-8 is not supported here is not due to any brilliant technical issue, though.

    Basically, with the exception of UTF-16, code page support is via a simple command line switch, as described in Using RC (The RC Command Line):

    /c

         Defines a code page used by NLS conversion.

    Take this doc at its word -- this switch literally tells the Resource Compiler what code page value to feed to a MultiByteToWideChar call.

    (FYI, this is also why you cannot pass 1200 or 1201 for UTF-16 LE/BE -- because MultiByteToWideChar does not support these code pages!)

    A call which, unfortunately, is done with the MB_PRECOMPOSED flag. The flag that briefly came up in my post A few of the gotchas of MultiByteToWideChar.

    I say unfortunately because of that note in the MultiByteToWideChar topic:

    For the code pages listed below, dwFlags must be set to 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.

    50220       50227       57002 through 57011
    50221       50229       65000 (UTF-7)
    50222       52936       42 (Symbol)
    50225       54936

    Note: For UTF-8, dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.

    Aha, so UTF-8 is failing in the Resource Compiler because it is always including a flag that is documented as not working with UTF-8 (or a bunch of other code pages).
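
    The rule in that note is simple enough to write down. What follows is a rough sketch, not anything taken from the Resource Compiler: the helper names are made up, the code page list is copied from the documentation quoted above, and the two flag constants are redefined locally (with the values they have in the Windows SDK headers) so that it compiles without windows.h.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the Windows SDK values, so the sketch builds anywhere.
constexpr std::uint32_t kMbPrecomposed     = 0x00000001;  // MB_PRECOMPOSED
constexpr std::uint32_t kMbErrInvalidChars = 0x00000008;  // MB_ERR_INVALID_CHARS
constexpr unsigned kCpUtf8 = 65001;                       // CP_UTF8

// True for the code pages the documentation says must be called with dwFlags == 0.
bool RequiresZeroFlags(unsigned codePage) {
    switch (codePage) {
        case 42:    case 50220: case 50221: case 50222: case 50225:
        case 50227: case 50229: case 52936: case 54936: case 65000:
            return true;
        default:
            return codePage >= 57002 && codePage <= 57011;
    }
}

// Hypothetical helper: drops whichever requested flags the documentation
// says would make the call fail with ERROR_INVALID_FLAGS for this code page.
std::uint32_t FlagsAllowedFor(unsigned codePage, std::uint32_t requested) {
    std::uint32_t allowed = requested;
    if (RequiresZeroFlags(codePage)) {
        allowed = 0;
    } else if (codePage == kCpUtf8) {
        allowed = requested & kMbErrInvalidChars;  // UTF-8: 0 or MB_ERR_INVALID_CHARS only
    }
    assert((allowed & ~requested) == 0);  // the helper only ever removes flags
    return allowed;
}
```

    For the case in this post, FlagsAllowedFor(65001, kMbPrecomposed) comes back as 0, while an ordinary code page like 1252 keeps whatever it was given.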

    Now as the title of this post indicates, the MB_PRECOMPOSED flag is stupid.

    To do their work, MB_PRECOMPOSED and MB_COMPOSITE actually use the lame tables that FoldString used to support MAP_PRECOMPOSED and MAP_COMPOSITE prior to Vista (when it started using normalization). I call these tables lame since they are incomplete. But no one wanted to slow down MultiByteToWideChar by making it normalize text, and no one wanted to update this lame set of tables, so everything was left as is. They are just dumb flags to use, ever. You should just normalize if you want to get into Form C or Form D, and call it a day.

    Anyway, I am sure that the UTF-8 issue in the Resource Compiler will see itself fixed in some upcoming version, given how easy it is to either (a) never pass a stupid flag or, to at least do the minimal change, (b) never pass a stupid flag if it makes the function fail on a code page you do not want it to fail on.

    (In an ideal world it would also recognize the UTF-8 BOM just like it recognizes the UTF-16 BOM, but again the whole minimal change thing would probably have a lot of influence here!)
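
    Recognizing a BOM is about as small as encoding detection gets. A minimal sketch, with made-up names and no claim to be what any Microsoft tool actually does:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Looks only at the first two or three bytes of the buffer.
// Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE here;
// this sketch does not try to tell the two apart.
Bom DetectBom(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) return Bom::Utf8;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) return Bom::Utf16LE;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) return Bom::Utf16BE;
    return Bom::None;
}

// Convenience overload for bytes already held in a std::string.
Bom DetectBom(const std::string& bytes) {
    return DetectBom(reinterpret_cast<const unsigned char*>(bytes.data()), bytes.size());
}
```

    A caller would check the result and skip the BOM bytes (three for UTF-8, two for UTF-16) before converting the rest of the file.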

     

    This post is sponsored by U+feff (ZERO WIDTH NO-BREAK SPACE)

  • Sorting it all Out

    Sometimes you drop the BOM, and sometimes the BOM drops you!

    • 5 Comments

    Back when I posted Your VC++ files don't support Unicode identifiers? Drop a BOM on them!, I promised I'd say more about how Microsoft's Visual C++ was deciding what parts of Unicode could be used as valid identifiers.

    And if you'll recall, I talked about how

    ...one should not assume that the full support of Unicode Standard Annex #31 (Identifier and Pattern Syntax) is being implemented, but hopefully some not entirely incompatible subset is what would turn out to be available....

    Well, consider my hopes dashed at this point....

    I'll include the code comment above the validation function, though it is my heartfelt desire to see this code ripped out with extreme prejudice in some future version. :-)

    //
    // Validate Unicode Identifier
    //
    // Every symbol of an identifer can belong to one of the following two
    // disjoint sets of characters:
    //
    // 1. Characters from the basic character set. We don't have to enforce it here,
    // and we also know that such characters cannot be entered as UCNs;
    //
    // 2. Characters greater than 0x80 encoded either directly or as UCNs. The exact ranges
    // are specified by Annex E in C++ Standard or by a superset of it in clr. It is not
    // implemented yet, but most likely we will decide to go with the clr range.
    //

    Of course currently the code does not do anything in #2 (as the comment indicates); instead for #2 it just relies heavily on the CRT iswspace function, which of course relies heavily on the NLS GetStringTypeW function looking for those C1_SPACE characters (which of course means that the compilation behavior can be OS-dependent as new Unicode versions are supported).

    It also means there are a lot of very silly characters that could be chosen to be identifiers right now (including lots of undefined ones and lots of weird symbols).

    I mean, imagine code something like this:

    if (≤ < ≲) {
        ≥();
    }

    and so on....

    Like I said, future versions should really be a lot more reasonable in this regard. And they will if I have any say in the matter at all.

    So please don't get to enjoying this too much....

     

    This post brought to you by ≲ (U+2272, a.k.a. LESS-THAN OR EQUIVALENT TO)

  • Sorting it all Out

    Sometimes when you say 'the fix is in' you mean it in a good way

    • 10 Comments

    It was just a few months ago that I posted about a particularly bad bug in Vista in Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!).

    And of course our favorite Romanian Cristian Secară has been talking about this bug in the newsgroups recently, going so far as to suggest that people may want to avoid upgrading:

    This is not always a pure appication issue. For example most television subtitling formats are 8 bit. This is likely to be changed only in the broadcast industry if/when/with the introduction of the new XML-based EBU subtitling format, but even this is doubtful to be properly implemented because somewhere in the specifications they require backward conversion possibility to the old format (which actually is a misery, being based on ISO/IEC 6937).

    Because practically all subtitling formats are 8 bit (.sub, .ssa, .srt, etc.), applications don't feel the need to switch to Unicode just for themselves. Western countries don't use subtitling, so there is no push from the software industry here in the Unicode direction.

    So we must preserve some 2000/XP systems in the long term just to have proper subtitling on the screens. It is good to know ...

    Thankfully, things are not quite this dire. :-)

    And please note that subtitling has nothing to do with this bug, whatsoever, since the bug only impacts keyboard input in these applications!

    The fix for the actual problem is going into Vista client SP1 (and is in Server 2008 as well). And for those who really don't want to wait (I do not know the exact dates myself), it is available as a hotfix now.

    The article will be KB 936060 (the article is not yet public), but if you are running into this problem of some keyboard languages not working correctly in non-Unicode applications, you can talk to someone in Microsoft product support and get this hotfix (they can find it via the KB article in their tools), and since it is a bug you shouldn't be charged for the contact....

    Cristi raises an excellent point about the non-Unicodeness of specific components of applications that may themselves be in Unicode. I will need to talk about that issue a bit at some point. :-)

    But for now, if you need a fix to this problem, there is a way to get it. And soon everyone running Vista will be getting it.

    The fix is in, truly....

     

    This post brought to you by ț (U+021b, a.k.a. LATIN SMALL LETTER T WITH COMMA BELOW)

  • Sorting it all Out

    Marshaling your resistance

    • 2 Comments

    I was recently involved in one of those interesting email discussions where I had the opportunity to both help and hinder my reputation.

    And at the same time the conversation got to make something in the design of a programming language look like both a bug and an inconsistent (or maybe I should say impure) design.

    (It also hurt at least a little bit the reputation of a developer who gave me advice about the same issue months before, but their advice was not attributed so it only hurt their reputation with me, in this case!).

    Anyway, let me explain what I am talking about.

    There are some functions that accept or return buffers that are arrays of strings, and they are laid out like the following:

    string1\0string2\0string3\0string4\0\0

    This is the pattern you see in functions like GetProfileSection and WritePrivateProfileSection. It is not the most common kind of design pattern, but if you are using C/C++ it might be mildly convenient (though from C or C++ a va_list might be a bit easier to use).
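
    For anyone who has not had to walk one of these buffers, here is a small sketch of both directions in portable C++. The function names are hypothetical and nothing here calls the actual profile APIs; it only deals with the layout shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a buffer laid out as "a\0b\0c\0\0" into its strings.
// `size` is the buffer length in wchar_t units; parsing stops at the empty
// string that marks the end of the list, or at the end of the buffer.
std::vector<std::wstring> SplitMultiString(const wchar_t* buffer, std::size_t size) {
    assert(buffer != nullptr || size == 0);
    std::vector<std::wstring> result;
    std::size_t start = 0;
    while (start < size && buffer[start] != L'\0') {
        std::size_t end = start;
        while (end < size && buffer[end] != L'\0') ++end;
        result.emplace_back(buffer + start, end - start);
        start = end + 1;  // step over this string's terminator
    }
    return result;
}

// Builds the same layout: each item followed by a NUL, then one more NUL.
std::wstring JoinMultiString(const std::vector<std::wstring>& items) {
    std::wstring out;
    for (const auto& item : items) {
        assert(!item.empty());  // an empty item would read as the end of the list
        out += item;
        out.push_back(L'\0');
    }
    out.push_back(L'\0');
    return out;
}
```

    The one design wrinkle worth noticing is that the format cannot represent an empty string in the middle of the list, since an empty string is the terminator.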

    Now one problem that this pattern has is that it is much harder to use from languages like VB or C# (though truth be told a va_list is much harder to use in that case, so it is probably not so bad that this harder-to-use but still in theory available method exists!).

    Anyway, some other functions that use this pattern are SetThreadPreferredUILanguages and GetThreadPreferredUILanguages. I covered a way to call them from C# in the post Thinking about MUI is making me bipolar, and one of the people who had an earlier look at the post suggested that instead of using the following pinvoke declaration:

            [DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling=true, CallingConvention=CallingConvention.StdCall, SetLastError=true)]
            static extern bool GetThreadPreferredUILanguages(
              uint dwFlags, ref int pulNumLanguages, [MarshalAs(UnmanagedType.LPWStr)] string pwszLanguagesBuffer, ref int pcchLanguagesBuffer);

    that I might want to use StringBuilder instead (then converting the final results to a string using the StringBuilder.ToString(int,int) overload). But I thought about it and realized that this would perhaps allocate the string twice, and the code I had already worked (and I was much more worried about the actual bugs -- convincing the MUI folks that bugs existed -- by that point) so I let it go.

    Anyway, a few months later someone else was calling a similar function and they were having trouble. Suggestions ranged from using byte arrays and converting to strings via the Encoding class to using Marshal.PtrToStringUni, and I figured I'd trot out that blog entry with the [MarshalAs(UnmanagedType.LPWStr)] code that works and then also suggest the StringBuilder.ToString(int,int) solution for people bothered by the voodoo of treating a System.String as a buffer for a function, violating the whole immutability thing.

    After not too long someone turned around and proved that the StringBuilder method did not work -- that the marshaller for the StringBuilder appeared to be truncating after the first NULL long before the ToString() method was even involved. Insane!
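
    I have not looked at how the marshaller is implemented, so take this only as an illustration of the symptom in C++ terms: reading such a buffer as an ordinary NUL-terminated string loses everything after the first string, while reading it with an explicit length does not. The function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Treats the buffer as one C-style string, so copying stops at the first NUL.
std::wstring ReadAsCString(const wchar_t* buffer) {
    assert(buffer != nullptr);
    return std::wstring(buffer);
}

// Copies exactly `length` characters, embedded NULs included.
std::wstring ReadWithLength(const wchar_t* buffer, std::size_t length) {
    assert(buffer != nullptr);
    return std::wstring(buffer, length);
}
```

    Given a buffer holding two language names back to back, the first function returns only the first name; the second returns the whole list, which can then be split on the NULs.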

    Okay, so I was wrong about that. It isn't like I actually tried that code; I just had a random developer wonder why I chose the method I did when there was a more conventional one available.

    At least I had that answer that worked to fall back on. :-)

    Not to mention the fact the actual StringBuilder behavior here is hard to argue as anything other than a big fat bug that probably deserves consideration of a fix on its merits....

    Even if the method is not "pure" and would not exist in a perfect NGWS/.Net world, I could make a strong case that the truncation at NULL bug in StringBuilder wouldn't exist in a perfect world, either (and both the Encoding class and Marshal.PtrToStringUni solutions involved extra allocations and potential conversion bugs, which would also be pretty impure)....

     

    This post brought to you by Ǥ (U+01e4, a.k.a. LATIN CAPITAL LETTER G WITH STROKE)

  • Sorting it all Out

    Same time, bigger station

    • 1 Comments

    Just a heads-up to Microsoft full-time employees, everyone else may as well ignore.... 

    You may have read about the presentation I am doing at the 2007 Engineering Excellence & Trustworthy Computing Forum next week in this post.

    They just moved me to a bigger room (I was in 34/Quinalt and now I'm in 33/Kodiak), so I guess I got some people interested. I'll work hard to fight the good weather that day (well, at the moment there is a 30% chance of rain, so between that and the fact that I am in the Conference Center, I am liking my odds here).

    So remember, last day (June 29th), last talk (4pm).

    No holds barred, protected by NDA.

    No heads are expected to explode, but the ears of some components may burn...

     

    This post brought to you by ⧝ (U+29dd, a.k.a. TIE OVER INFINITY)

  • Sorting it all Out

    Your VC++ files don't support Unicode identifiers? Drop a BOM on them!

    • 15 Comments

    What with Unicode being the default way that things are compiled in Visual Studio, the fact that identifiers were limited to ANSI has really been a sore point for a lot of people.

    Including me. :-)

    Thankfully, VC++ Architect Mark Hall helped set me straight on a not-entirely-well documented feature in the latest version of Visual C++.

    If you don't like the lack of Unicode identifiers, then all you have to do is drop a BOM on your UTF-8 source!

    You can read more about it in the topic Unicode Support in the Compiler and Linker.

    And you do need the BOM for UTF-8 in this case (no matter how controversial that requirement may be). As one of the cool test leads over there pointed out:

    ...we don’t support UTF8 w/o BOM, since it’s pretty much indistinguishable from ANSI

    Note that you do not need a BOM for UTF-16LE or UTF-16BE.

    In theory the BOM probably shouldn't be needed for UTF-8 given how easy UTF-8 detection is, but since NLS didn't provide a function I can hardly blame them for not wanting to go off and write their own (note that it *should* in my opinion be a part of IsTextUnicode, as I have pointed out before). But I don't know when (or even if) that might be happening.

    Maybe I'll just post a little IsTextUtf8 function in the meantime. :-)
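
    To give an idea of how little is involved, here is a rough sketch of such a check. It is not the function promised above and not anything NLS ships: LooksLikeUtf8 is a made-up name, and all it does is test whether a byte buffer is well-formed UTF-8 according to the byte ranges in the Unicode Standard (no overlong forms, no surrogates, nothing above U+10FFFF).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when the buffer is well-formed UTF-8. Pure ASCII passes too,
// since ASCII is valid UTF-8.
bool LooksLikeUtf8(const unsigned char* p, std::size_t size) {
    assert(p != nullptr || size == 0);
    std::size_t i = 0;
    while (i < size) {
        const unsigned char lead = p[i];
        if (lead <= 0x7F) { ++i; continue; }   // single-byte (ASCII)
        std::size_t extra = 0;                 // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;    // allowed range of the first one
        if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
            if (lead == 0xE0) lo = 0xA0;       // rejects overlong forms
            if (lead == 0xED) hi = 0x9F;       // rejects UTF-16 surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
            if (lead == 0xF0) lo = 0x90;       // rejects overlong forms
            if (lead == 0xF4) hi = 0x8F;       // rejects values above U+10FFFF
        } else {
            return false;                      // 0x80..0xC1 and 0xF5..0xFF never lead
        }
        if (size - i <= extra) return false;   // sequence runs off the end
        if (p[i + 1] < lo || p[i + 1] > hi) return false;
        for (std::size_t k = 2; k <= extra; ++k) {
            if (p[i + k] < 0x80 || p[i + k] > 0xBF) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Convenience overload for bytes already held in a std::string.
bool LooksLikeUtf8(const std::string& bytes) {
    return LooksLikeUtf8(reinterpret_cast<const unsigned char*>(bytes.data()), bytes.size());
}
```

    Text in a typical ANSI code page that contains any non-ASCII characters will almost always fail this test, which is what makes detection without a BOM plausible. Text that is pure ASCII passes, but in that case the answer does not matter much.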

    The other problem would be that all of the important docs that refer to things like identifiers are still making ASCII assumptions in 7.1, 8.0, 8.5, and even current (preliminary) 9.0 docs.

    So one should not assume that the full support of Unicode Standard Annex #31 (Identifier and Pattern Syntax) is being implemented, but hopefully some not entirely incompatible subset is what would turn out to be available....

    I'll say more on this soon, both as soon as I find out what the answers are and as soon as I get the doc story to be updated (I assume the former will happen prior to the latter).

     

    This post is sponsored by U+feff (ZERO WIDTH NO-BREAK SPACE, a.k.a. "Da BOM")

  • Sorting it all Out

    Liaison may not be the best role any more (for me)

    • 2 Comments

    I haven't talked about Tamil for a while, and I think that is mainly an exercise of the stages of grief. Not talking about it is the denial phase, where I avoid acting like there is an issue by not mentioning it. But the problem is still there, so perhaps it is time to move to the next phase....

    The problem is that while I am theoretically the liaison between Unicode and INFITT (INternational Forum for Information Technology in Tamil), in practical terms that role no longer makes as much sense, since the overall participation and management of it has shifted from outside of Tamil Nadu to inside of it, and the general consensus of the people working in the group is to try to re-encode Tamil.

    Since I don't agree with that, I know there are people who feel I am not the best person to argue their concerns. And those people are correct -- I will always do my job as a representative and relay any message I am asked to, but it is hard to synthesize passion when you don't agree with the opinion.

    So it is probably best for me to resign as liaison. I'll stay within the working groups on both sides since I am still interested and knowledgeable, and just drop out of a role that I believe requires advocacy....

    I'll end the post with a bit from Daniels and Bright, near the end of Section 39 (Tamil Writing):

    Occasional proposals to change the individual symbols to purely alphabetic characters, by using vowel-initial allographs for all vowels, with consonant + puḷḷi for all consonants, have not been taken seriously; and they probably never will be, since the existing system represents Tamil syllables very well.

    (Note that this is from 1996, before the first proposal to Unicode for such a change was ever received.)

    Many people in Tamil Nadu do not feel this way, of course. And I suspect it will be many years before this is fully accepted.

    To be honest, the whole thing reminds me of analogous issues with Korean where people in government and scholars disagree (and sometimes scholars and other scholars disagree), leaving people in the middle with only the hope of software helping them represent text in the meantime....

    (I guess I have moved into acceptance at this point, or maybe it is just bargaining!)

     

    This post brought to you by ஃ (U+0b83, a.k.a. TAMIL SIGN VISARGA, a.k.a. aythem, a.k.a. āytam)

  • Sorting it all Out

    TTC indexes, the hard way...

    • 2 Comments

    I swear that I had already answered this question, but I couldn't find it anywhere. Hopefully it isn't a repeat. :-)

    Meema asked in the Suggestion Box:

    hi!

    I have been looking for a solution that doesn't require parsing the actual TTC file bytes. Is there any way of getting the ttc index of a font (given the facename and style or just hDC/LOGFONT)?

    Thanks. I appreciate any answer.

    Meema

    Unfortunately, there is not a way to do this.

    You have to work with the description of the OpenType file format (here) and its information about the TTC Header table, which will have pointers to the individual fonts inside the file (and from there you can get the names of each font and eventually work backward based on the index).
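
    As a starting point, here is a minimal sketch of the first step only: reading the TTC header out of a buffer holding the file's bytes and returning the offset of each font in index order. The function names are made up, it relies on the header layout as I read it in the OpenType spec (the 'ttcf' tag, two 16-bit version fields, a 32-bit font count, then one 32-bit offset per font, all big-endian), and it does not go on to read each font's name table, which is the part you would still need in order to match a face name to an index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a big-endian 32-bit value; OpenType stores multi-byte fields big-endian.
static std::uint32_t ReadU32BE(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Returns the file offset of each font's table directory, in TTC index order.
// Returns an empty vector if the buffer is not a TTC or looks truncated.
std::vector<std::uint32_t> ReadTtcFontOffsets(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<std::uint32_t> offsets;
    // Header: tag 'ttcf' (4 bytes), majorVersion (2), minorVersion (2), numFonts (4).
    if (size < 12 || ReadU32BE(data) != 0x74746366u) return offsets;
    const std::uint32_t numFonts = ReadU32BE(data + 8);
    if (numFonts > (size - 12) / 4) return offsets;  // offset array would not fit
    for (std::uint32_t n = 0; n < numFonts; ++n) {
        const std::uint32_t offset = ReadU32BE(data + 12 + 4 * std::size_t(n));
        if (offset >= size) return {};               // points outside the file
        offsets.push_back(offset);
    }
    return offsets;
}
```

    The position of an offset in the returned vector is the TTC index; the work that remains is to follow each offset to that font's name table and compare names.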

    This is obviously not as entirely trivial as one might like (using the best metric there is, the amount of samples that are out there, which is approximately none!).

    So I talked to Sergey about the possibility of putting together some examples for working with font files for problems like this -- real problems if you want to work with functions like TTEmbedFontFromFileA and CreateFontPackage that just assume you have this data. And lots of other random tidbits that one might want to get at....

    And I might also play a bit more in this space, like I did in Getting all of the localized names of a font. Because going in blind to specs like the OTF/TTF one to write a bit of code actually makes for interesting little projects to work on.... :-)

     

    This post brought to you by ശ (U+0d36, a.k.a. MALAYALAM LETTER SHA)

  • Sorting it all Out

    Oops! Somebody did broken it!

    • 3 Comments

    Sometimes, things get broken and nobody notices.

    Other times, things have been broken for years and nobody noticed that, either!

    As an example, back at the end of May, Arunaabh asked over in the microsoft.public.win32.programmer.international newsgroup:

    In my application I am converting a string in one charset to other charset. I am using GetCPInfoEx() API to fill the CPINFOEX structure corresponding to the codepage number I am providing. I am using the following lines of code :

    ...
    BOOL bNLS_Exist=GetCPInfoEx(nCodePage,0,&cpInfoEx);
    DWORD err=GetLastError();
    ...

    For nCodePage = 1147 which corresponds to charset 'ibm-1147' , GetCPInfoEx() returns me false even if it fills the CPINFOEX structure properly . The GetLastError() also returns zero. The corresponding nls file C_1147.nls is also present in the registry.

    For nCodePage = 20949 which corresponds to charset 'X_CP20949' , GetCPInfoEx() returns me false even if it fills the CPINFOEX structure properly .

    In this case GetLastError() returns with a value of '1814' which is "The specified resource name cannot be found in the image file" . The corresponding nls file C_20949.nls is also present in the registry.

    Please help me out for the abrupt behaviour of GetCPInfoEx() API for the above two cases.

    What can I say, Arunaabh is right.

    Code page 20949 appears to have been failing for years (I could not get it to work on XP or Server 2003 or Vista). The GetCPInfo() pieces all work, but the string is not there (which actually explains the 1814 error, since the resource couldn't be found).

    Now code page 1147 appears to be a more recent break -- it fails on Vista just as Arunaabh described, but on prior versions it used to work.

    Someone should probably get a bug or two in to add a string and to figure out why the function is failing to show a string that is there that it used to load....

     

    This post brought to you by ළ (U+0dc5, a.k.a. SINHALA LETTER MUURDHAJA LAYANNA)
