April, 2007

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Which one has the astigmatism?

    We've been talking about DPI a whole bunch, including not disabling and disabling the high DPI support in Vista. These two screen shots taken on a machine with 192 DPI will not show the final word (that comes in another post, soon!), but they will provide the penultimate word....

    Which one has the astigmatism? :-)

    I can't be the only person who is tired of that ACUVUE commercial....

     

    This post brought to you by ◉ (U+25c9, a.k.a. FISHEYE)

  • Sorting it all Out

    If you find that GetLocaleInfo is driving you crazy, it may not be the right function to use

    Aaron asks (via the Contact link):

    I apologize for this totally unsolicited email, but I'm starting to wonder if I'm crazy or not.  I'm using GetLocaleInfo to determine what language the user wants their UI strings displayed in.  I'm using LOCALE_SISO639LANGNAME and LOCALE_SISO3166CTRYNAME to get a string such as en-US. 

    The part where I'm confused is where GetLocaleInfo is grabbing that information from.  In the XP regions and languages control panel, there's a strangely worded option in the "Advanced" tab called "Language for non-Unicode programs."  When I set that to something like "French (france)", I still get en-US instead of fr-FR.

    The reason I'm doing this is because we have a custom localization scheme for our application, which runs on Windows 98 through Vista.  So we are not technically a Unicode application (we don't #define UNICODE), but we support Unicode in that we dynamically load the W version of every API we come across and prefer that to the A version (and our strings are encoded accordingly).  So does that option in the Advanced tab even apply to our application?  If it does, how would I get that information?

    How far gone is my misunderstanding of things?  ;-) 

    Thanks for your time!

    GetLocaleInfo does not grab its information from any setting in Regional Options. It grabs its info from its own internal database of information, based on the locale you pass it.

    The "Language for non-Unicode Programs" is also known as the "Default System Locale" and really is not a good setting upon which to base a localization strategy, for way too many reasons to enumerate fully (but the fact that the intent is to provide the locale to use for conversions between Unicode and "ANSI" ought to be reason enough on its own).

    If you really wanted to get the information from this setting via GetLocaleInfo, you could just pass LOCALE_SYSTEM_DEFAULT as the LCID. But like I said, you do not want to use that setting (you claimed to want to support Windows 98, Aaron -- this setting is not changeable in Windows 98).

    Now you could in theory use the "Standards and Formats" setting, also known as the default user locale (you would use LOCALE_USER_DEFAULT in that GetLocaleInfo call). It has the advantage of being settable on all platforms, if nothing else.

    Clearly though, it is not intended to drive the UI language, and thus if you made it drive UI language in your application you would be providing confusing UI to the user.

    But if you think about that locale list for a moment, the odds that it will match your list of UI languages for your application are probably pretty close to nil. So you do not miss much by not using that setting.
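    For anyone who wants to see what those two defaults actually report, here is a minimal pinvoke sketch. The constant values come from winnls.h; the LocaleTags class and its Combine and GetTag helpers are names made up for illustration, and the "language-COUNTRY" shape simply mirrors what Aaron described. It is meant for inspecting the settings, not for choosing a UI language.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class LocaleTags
{
    // Values from winnls.h.
    public const uint LOCALE_SYSTEM_DEFAULT = 0x0800;   // "Language for non-Unicode programs"
    public const uint LOCALE_USER_DEFAULT = 0x0400;     // "Standards and Formats"
    public const uint LOCALE_SISO639LANGNAME = 0x0059;
    public const uint LOCALE_SISO3166CTRYNAME = 0x005A;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetLocaleInfoW(uint Locale, uint LCType, StringBuilder lpLCData, int cchData);

    // Hypothetical helper: joins the two ISO names into a tag such as "en-US".
    public static string Combine(string language, string country)
    {
        return string.IsNullOrEmpty(country) ? language : language + "-" + country;
    }

    // Hypothetical helper: returns null when the call fails or when not running on Windows.
    public static string GetTag(uint lcid)
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT) return null;

        StringBuilder lang = new StringBuilder(16);
        StringBuilder ctry = new StringBuilder(16);
        if (GetLocaleInfoW(lcid, LOCALE_SISO639LANGNAME, lang, lang.Capacity) == 0) return null;
        if (GetLocaleInfoW(lcid, LOCALE_SISO3166CTRYNAME, ctry, ctry.Capacity) == 0) return null;
        return Combine(lang.ToString(), ctry.ToString());
    }
}
```

    Calling LocaleTags.GetTag(LocaleTags.LOCALE_SYSTEM_DEFAULT) and LocaleTags.GetTag(LocaleTags.LOCALE_USER_DEFAULT) side by side shows which Regional Options setting each LCID reflects.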

    Now I could claim that you should use the results of the user interface language functions provided by MUI, but to be honest even though it has the advantage of being an accurate setting, it really isn't likely to be able to match your UI languages of your application either.

    (Look on the right side of the page and expand the one that says Regional Options for more information on what each setting there is generally for.)

    And also, not every version of Windows supports the two-letter ISO codes (and LOCALE_SISO639LANGNAME and LOCALE_SISO3166CTRYNAME are often the three letter codes from which you cannot deterministically derive the two letter codes), not to mention the fact that from time to time some of them have been wrong. So using Win32 NLS API functions to call at runtime to get the language tags to use on any version of Windows from Win98 to Vista just seems like a bad idea.

    If you are providing a localized copy of your application then you can default to the UI language of the operating system and then you should honestly provide your own user interface to let them change it, based on the list of localized versions of your application that you support. The various lists that Windows provides aren't actually good ways to choose your UI language (beyond that possible idea of the initial one you might choose via GetUserDefaultUILanguage when it is available -- that function is included in almost every version you need other than Win98; it even is there on WinME).
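    A rough sketch of that selection logic, assuming the application keeps its own list of shipped UI languages: the user's saved choice wins, then the operating system's UI language, then a fixed fallback. UiLanguagePicker, Pick, and Match are hypothetical names, and stripping trailing subtags ("fr-CA" to "fr") is just one simple way to look for a less specific match.

```csharp
using System;
using System.Collections.Generic;

public static class UiLanguagePicker
{
    // userChoice: what the user picked in the application's own settings UI (may be null).
    // osUiLanguage: the OS UI language, for example CultureInfo.CurrentUICulture.Name.
    // supported: the localized versions this application actually ships.
    public static string Pick(string userChoice, string osUiLanguage,
                              IList<string> supported, string fallback)
    {
        return Match(userChoice, supported) ?? Match(osUiLanguage, supported) ?? fallback;
    }

    // Tries the full tag first, then progressively shorter ones ("fr-CA" -> "fr").
    private static string Match(string tag, IList<string> supported)
    {
        while (!string.IsNullOrEmpty(tag))
        {
            foreach (string s in supported)
            {
                if (string.Equals(s, tag, StringComparison.OrdinalIgnoreCase)) return s;
            }
            int cut = tag.LastIndexOf('-');
            tag = cut > 0 ? tag.Substring(0, cut) : null;
        }
        return null;
    }
}
```

    The point of the design is that the Windows lists only ever supply the initial guess; the list of languages to choose from is always the application's own.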

    Thus my guess in the title of this post, Aaron -- the reason GetLocaleInfo is driving you crazy is that it is really not the function your application should be using here. It is driving you crazy for the same reason that a pair of pliers would drive you crazy for fixing a hangnail....

     

    This post brought to you by ჯ (U+10ef, a.k.a. GEORGIAN LETTER JHAN)

  • Sorting it all Out

    Rhymes with Amharic #5 (a.k.a. [Sub]setting up this code where it can do the most good?)

    I may never be entirely used to working for Microsoft, as opposed to working with Microsoft products....

    Scott Hanselman suggested something yesterday in response to this whole series about font embedding in a managed application (prior posts here, here, here, and here), in bold red so that it would not be missed, in this post of his:

    I hope that folks tell Michael and Microsoft that this is a significant business scenario and encourage them to advance Michael's Sample Code into a full-fledge and supported feature in WinForms.

    This idea had never occurred to me!

    No need to convince me about the scenario, though. I recognized the number of times that this would be useful and it is one of the reasons I put the sample together (it is hard to pass up those "real world" requirements when they come up).

    I had been thinking about perhaps getting some samples integrated into the documentation eventually (after they were cleaned up a bit, of course!), but the idea of adding support for embedded fonts into WinForms directly to get proper support within managed applications is a really good idea for people to be thinking about, I think.

    Interestingly enough, I was sitting in a meeting on Friday where some people across the street working on .NET asked me for some product feedback ideas for next version telling me that this was a good time to bring up actionable ideas. They were BCL people, to be sure. But ideas are still ideas, right?

    My timing is never this good!

    Next step, talk to some people on the WinForms team.... :-)

     

    This post brought to you by ያ (U+12eb, a.k.a. ETHIOPIC SYLLABLE YAA)

  • Sorting it all Out

    Rhymes with Amharic #4 (a.k.a. we're all [sub]set so turning out the lights and going to [em]bed!)

    (see also parts 1, 2, and 3)

    OK, we are getting close to the end of this little mini-series....

    First there was a comment from Dennis E. Hamilton asking about the DPI in the screen shots of that first post:

    I notice two things here.  First, the impact of Cleartype is amazing.  Secondly, the Vista rendering seems fuzzy somehow and not as crisp as the XP SP2 Cleartype.  I realize the sizes are different, with different assumed resolutions, but the subjective experience at scale is important.  (I think I see this on my Vista-equipped Tablet PC too, so I really wonder ... )

    If you use the same DPI, how well does Vista match the XP Cleartype case?

    I decided to engage in a bit of experimental DPI viewing. I used the funniest string from Why that is positively Ethiopic! (፳፩፼፳፰፻፷፯፼፶፫፻፱) and took screen shots at 96, 120, 134, and 144 DPI (note that I did not change the sample application; I just pasted the string into the TextBox controls):

    I'll let you decide on your own about the quality (the code was unchanged so it was trying to use a 32pt size for the font in all four cases).... :-)
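    For reference, the arithmetic behind "32pt in all four cases" can be sketched like this. The sample's CreateFont call computes its height as MulDiv(points, LOGPIXELSY, 72); the FontHeight class below is a hypothetical managed stand-in for that calculation (valid for non-negative inputs only), not the Win32 function itself.

```csharp
using System;

public static class FontHeight
{
    // Managed stand-in for Win32 MulDiv with non-negative inputs:
    // number * numerator / denominator, rounded to the nearest integer.
    public static int MulDiv(int number, int numerator, int denominator)
    {
        return (int)(((long)number * numerator + denominator / 2) / denominator);
    }

    // A point is 1/72 of an inch, so pixels = points * dpi / 72.
    public static int PointsToPixels(int points, int dpi)
    {
        return MulDiv(points, dpi, 72);
    }
}
```

    With a 32pt font this works out to 43, 53, 60, and 64 pixels at 96, 120, 134, and 144 DPI respectively, which is why the same unchanged code draws visibly larger text in each successive screen shot.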

    Now the additional issues to keep in mind here for font embedding....

    First we'll take a look at Nyala's OpenType support from 10,000 feet:

    Notice that it does have some OpenType tables that provide support for Ethiopic, though the main reason to consider Ethiopic to be a complex script is for that undocumented sixth reason to be considered a complex script that I described in Font Linking vs. Font Fallback, #2.

    The technology provides the selected [subset of the] font and allows you to embed it in your application if the font's licensing restrictions allow it. But it does not give you updates to shaping engines and it does not give you pieces of the font that are excluded by subsetting decisions you might make. In the end, this means that any time the language you are trying to display is a complex script, the proper display might be limited by what the machine itself can support (for an example of this imagine some of the complex scripts added in Vista like Khmer or Sinhalese or Tibetan and try to imagine displaying them in Windows 2000!).

    XP SP2 will actually do very well here, much better than one might expect at first. But it turns out that the update to the Uniscribe shaping engines provided by the update I first described in Lions and tigers and bearsELKs, Oh my! included some of the (not completely finished but certainly in progress) updates that eventually made their way into Vista. So you may find you have better luck in XP SP2 than in most other downlevel platforms. But no matter what you will always have the constraint of the platform's support to contend with.

    This also applies to keyboards (you'll have to provide them via MSKLC or whatever), locale support (custom cultures, anyone?), or collation (currently no solution for this one, sorry -- so beware this problem!).

    You will probably always want to be using .NET Framework >= 2.0 so that you can use Uniscribe and not be limited to what GDI+ supports.

    And then when you are done, be sure to call TTDeleteEmbeddedFont as the sample does in its FormClosing event. And in your real world samples you probably should embed the font in your application's resources and then just use a MemoryStream rather than a FileStream to read it (though even if you treat it as a file like the sample does, there is not really anything else that you can do with the file anyway)....
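    A minimal sketch of the resource-based approach, assuming the embedded font binary was added to the project as an embedded resource. EmbeddedFontSource is a hypothetical helper and the resource name is whatever your project produces; none of this is in the sample.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class EmbeddedFontSource
{
    // Copies any stream into a MemoryStream positioned at the start.
    public static MemoryStream ToMemoryStream(Stream source)
    {
        MemoryStream ms = new MemoryStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        ms.Position = 0;
        return ms;
    }

    // resourceName is hypothetical, e.g. "MyApp.NyalaSIAO.bin".
    // Returns null when the assembly has no resource by that name.
    public static MemoryStream Open(Assembly assembly, string resourceName)
    {
        using (Stream s = assembly.GetManifestResourceStream(resourceName))
        {
            return s == null ? null : ToMemoryStream(s);
        }
    }
}
```

    To pass the result to TTLoadEmbeddedFont, the READEMBEDPROC delegate and the pinvoke declare would presumably need their stream parameter typed as Stream (or MemoryStream) rather than FileStream.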

    As a final note, let me once again remind people to follow the licensing restrictions of the font you want to use. Your font foundry will thank you for your attention to this particular detail!

     

    This post brought to you by ጵ (U+1335, a.k.a. ETHIOPIC SYLLABLE PHE)

  • Sorting it all Out

    Rhymes with Amharic #3 (a.k.a. Read and write a language w/o even getting out of my [em]bed? Kewl!)

    (see also the first part and the second part)

    We now have that binary chunk that needs to be loaded, so let's go ahead and load it!

    The core bit of the code for this should have been:

    if (File.Exists(FONTNAME)) {
        // We are reading in the embed file info if the file exists (we may have just created it!)
        TTLOAD ulStatusRead = 0;
        FileStream fsRead = new FileStream(FONTNAME, FileMode.Open);
        READEMBEDPROC rep = new READEMBEDPROC(this.ReadEmbedProc);
        TTLOADINFO ttli = new TTLOADINFO();

        ttli.usStructSize = Convert.ToUInt16(Marshal.SizeOf(ttli));
        ttli.usRefStrSize = 0;
        ttli.pusRefStr = IntPtr.Zero;
        ulPrivStatus = 0;

        rc = TTLoadEmbeddedFont(out this.m_hFontReference, TTLOAD.PRIVATE,
                                out ulPrivStatus,
                                LICENSE.EDITABLE, out ulStatusRead,
                                rep, fsRead,
                                "NyalaSIAO", "NyalaSIAO",
                                ttli);
        fsRead.Flush();
        fsRead.Close();

        this.tb1.Font = new Font("NyalaSIAO", siz);
        if (this.tb1.Font.Name != "NyalaSIAO") {
            // We had everything but embedding failed anyway.
            this.lbl1.Text = "Embedding failed, font is: " + this.tb1.Font.Name;
        }
    }

    And of course the ReadEmbedProc (note the similarities and more importantly the differences when comparing to the WriteEmbedProc mentioned earlier, a ripe potential source of copy/paste codewriting errors!):

    [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl, CharSet=CharSet.Unicode)]
    internal delegate uint READEMBEDPROC(FileStream lpvReadStream, IntPtr lpvBuffer, uint cbBuffer);

    internal uint ReadEmbedProc(FileStream lpvReadStream, IntPtr lpvBuffer, uint cbBuffer)
    {
        byte[] rgbyt = new byte[cbBuffer];
        lpvReadStream.Read(rgbyt, 0, (int)cbBuffer);
        Marshal.Copy(rgbyt, 0, lpvBuffer, (int)cbBuffer);
        return cbBuffer;
    }
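    One thing worth knowing about the ReadEmbedProc above: Stream.Read is allowed to return fewer bytes than were asked for, and the callback ignores that return value and always reports cbBuffer. For a small local file this is unlikely to bite, but a more defensive version could use a helper along these lines (StreamHelpers and ReadFully are hypothetical names, not part of the sample):

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Keeps reading until 'count' bytes are in 'buffer' or the stream ends.
    // Returns the number of bytes actually read.
    public static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break;   // end of stream
            total += n;
        }
        return total;
    }
}
```

    The callback would then copy only the bytes that ReadFully reports and return that count instead of cbBuffer.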

    However, in the end it actually proved to be a lot harder than it should have been due to the way that GDI+/WinForms handles the work of fonts, refusing to recognize any font that was not available at application boot time. So even though there is a font available in the process, GDI+ is unwilling to believe it.

    The next thing I tried here was to just create it the old fashioned way and stick it into the device context, but that also failed because after all this is not plain old GDI doing the work here, this is either GDI+ using its notion of the font to use or the WinForms concept of the font to send to Uniscribe (via TextRenderer). Doesn't anyone respect a device context any more? :-)

    The solution was to add this last bit of code after succeeding in the call to TTLoadEmbeddedFont:

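        IntPtr hdc = GetDC(this.tb1.Handle);

        this.m_hFontEmbedded = CreateFont(MulDiv(Convert.ToInt16(siz), GetDeviceCaps(hdc, LOGPIXELSY), 72),
                                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, "NyalaSIAO");
        if (this.m_hFontEmbedded != IntPtr.Zero) {
            // GetFontData reads the font currently selected into the DC, so select ours first.
            // Assumption: SelectObject and ReleaseDC are declared via pinvoke like the other GDI calls.
            IntPtr hFontOld = SelectObject(hdc, this.m_hFontEmbedded);
            // First call with a null buffer just asks for the size of the font data.
            uint cb = GetFontData(hdc, 0, 0, IntPtr.Zero, 0);
            if (cb != GDI_ERROR) {
                byte[] rgbyt = new byte[cb];
                GetFontData(hdc, 0, 0, rgbyt, cb);
                this.m_pfc = new PrivateFontCollection();
                IntPtr pbyt = Marshal.AllocCoTaskMem(rgbyt.Length);
                Marshal.Copy(rgbyt, 0, pbyt, rgbyt.Length);
                this.m_pfc.AddMemoryFont(pbyt, rgbyt.Length);
                Marshal.FreeCoTaskMem(pbyt);
                this.tb1.Font = new Font(this.m_pfc.Families[0], siz);
            }
            // Put the previous font back before handing the DC back to the window.
            SelectObject(hdc, hFontOld);
        }
        ReleaseDC(this.tb1.Handle, hdc);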
    }

    What this code does is load the font that TTLoadEmbeddedFont has (if you think about it) reconstituted into an actual font and then put it into a memory font via the PrivateFontCollection class, just like what happened in the code from Private fonts: for members only.

    When I tested with subsetted font binaries it worked as well, which means that TTLoadEmbeddedFont really is doing a good job here at making what it puts together look like a font. :-)

    Now some of the things this code is ignoring include the license info that TTLoadEmbeddedFont returns, as well as the return value. More importantly, right now the code assumes that the call succeeds and then just tries to use the results. That strategy is fine in the constrained situation here, but if it is expanded to other fonts then you might want to consider altering it, since the CreateFont call will succeed even if it does not recognize the font name, and you may specifically not like the font it gives you instead....

    Next up, some other interesting issues to consider about embedding, and what happens when you are done....

     

    This post brought to you by ዮ (U+12ee, a.k.a. ETHIOPIC SYLLABLE YO)

  • Sorting it all Out

    Rhymes with Amharic #2 (a.k.a. Before you embed, you have to build something to embed)

    (see here for the first part) 

    The first part of the code centers around a call to TTEmbedFont. It only runs on Vista and above (since no one else should have the font on their machine!):

    IntPtr hDC = CreateDC("DISPLAY", IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
    if (hDC != IntPtr.Zero) {
        IntPtr hFont = CreateFont(MulDiv(Convert.ToInt16(siz), GetDeviceCaps(hDC, LOGPIXELSY), 72),
                                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, "Nyala");
        if (hFont != IntPtr.Zero) {
            IntPtr hFontOld = SelectObject(hDC, hFont);
            if (hFontOld != IntPtr.Zero) {
                // We are writing out the embed file info for the font if the file doesn't exist.
                uint ulStatus = 0;
                FileStream fsWrite = new FileStream(FONTNAME, FileMode.CreateNew);
                WRITEEMBEDPROC wep = new WRITEEMBEDPROC(this.WriteEmbedProc);
                TTEMBEDINFO ttie = new TTEMBEDINFO();

                ttie.usStructSize = Convert.ToUInt16(Marshal.SizeOf(ttie));
                ttie.usRootStrSize = 0;
                ttie.pusRootStr = IntPtr.Zero;
                ulPrivStatus = 0;
                ulStatus = 0;
                rc = TTEmbedFont(hDC,
                                 TTEMBED.RAW | TTEMBED.TTCOMPRESSED,
                                 CHARSET.UNICODE,
                                 out ulPrivStatus,
                                 out ulStatus,
                                 wep,
                                 fsWrite,
                                 IntPtr.Zero,
                                 0,
                                 0,
                                 ttie);
                fsWrite.Flush();
                fsWrite.Close();
                if (rc != E.NONE) {
                    // Since creation of the file ultimately failed, delete whatever
                    // interim bits might have been written.
                    File.Delete(FONTNAME);
                }
                SelectObject(hDC, hFontOld);
            }
            DeleteObject(hFont);
        }
        DeleteDC(hDC);
    }

    You'll notice that I am passing the flags to include the raw font and not a subset of it. My initial reason for that was the suggestion from some people that the subsetting would not pick up any of the different forms of glyphs that might be available. But Sergey actually told me that the code is rather generous at including all of the alternate forms and glyphs that could potentially be derived from the ones that are specified, so in the situation where the text is static, subsetting the font may be worthwhile (and will certainly make for a smaller file!).

    If one were going to subset, one could put all the text in a string, change the IntPtr for pusCharCodeSet in the pinvoke declare to a string, and then pass it (after all, what else is a string but an array of ushort values?). :-)
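    As a variation on that idea, here is a small sketch that turns the text into a sorted, de-duplicated array of UTF-16 code units, for a declare that takes pusCharCodeSet as ushort[] instead of a string. SubsetCodes is a hypothetical helper and this is not how the sample does it; the array's Length would be the natural value for the character count parameter, and the subsetting flag (TTEMBED_SUBSET in t2embapi.h) would replace the raw flag in the call.

```csharp
using System;
using System.Collections.Generic;

public static class SubsetCodes
{
    // Returns each distinct UTF-16 code unit in 'text' once, in ascending order.
    public static ushort[] FromText(string text)
    {
        char[] chars = text.ToCharArray();
        Array.Sort(chars);

        List<ushort> codes = new List<ushort>();
        foreach (char ch in chars)
        {
            // After sorting, duplicates are adjacent, so compare with the last one kept.
            if (codes.Count == 0 || codes[codes.Count - 1] != ch)
            {
                codes.Add(ch);
            }
        }
        return codes.ToArray();
    }
}
```

    Note that this works on code units, so any supplementary characters would show up as surrogate halves; that does not matter for Ethiopic text like the strings used here, which is all in the BMP.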

    The key piece of code that does this part of the work is that WriteEmbedProc. To be honest, I am not entirely happy with it. You may see why if you look at it:

    [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl, CharSet=CharSet.Unicode)]
    internal delegate uint WRITEEMBEDPROC(FileStream lpvWriteStream, IntPtr lpvBuffer, uint cbBuffer);

    internal uint WriteEmbedProc(FileStream lpvWriteStream, IntPtr lpvBuffer, uint cbBuffer) {
        byte[] rgbyt = new byte[cbBuffer];
        Marshal.Copy(lpvBuffer, rgbyt, 0, (int)cbBuffer);
        lpvWriteStream.Write(rgbyt, 0, (int)cbBuffer);
        return cbBuffer;
    }

    Okay, so because I am using the .NET FileStream class to do the writing, I am forced to do that extra bit of copying into a byte array that I'd rather avoid. You know, just something to write from that lpvBuffer pointer directly to the file. But the actual hit is small (it is a small file, after all!), so I just kind of thought it would be worth earmarking as an area to potentially revisit if performance became a problem. In the meantime, it does get the job done....
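    If that area ever did need revisiting, one modest option (short of writing from the pointer directly) is to reuse a single fixed-size chunk buffer rather than allocating a new cbBuffer-sized array on every callback. The bytes are still copied once, so this only trims allocations. ChunkedWriter is a hypothetical name and the 4096-byte chunk size is arbitrary.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public sealed class ChunkedWriter
{
    // Reused across calls so that each callback does not allocate its own array.
    private readonly byte[] m_chunk = new byte[4096];

    // Copies cbBuffer bytes from unmanaged memory to the stream, one chunk at a time.
    public uint Write(Stream stream, IntPtr lpvBuffer, uint cbBuffer)
    {
        uint done = 0;
        while (done < cbBuffer)
        {
            int n = (int)Math.Min((uint)m_chunk.Length, cbBuffer - done);
            Marshal.Copy(new IntPtr(lpvBuffer.ToInt64() + done), m_chunk, 0, n);
            stream.Write(m_chunk, 0, n);
            done += (uint)n;
        }
        return cbBuffer;
    }
}
```

    A WriteEmbedProc built on this would keep one ChunkedWriter in a field and simply return its Write result.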

    I also chose not to get involved with the whole TTEMBEDINFO structure and its link checking, though people looking at the sample might see it as worthwhile to look into (this is why I bothered to define the struct rather than just making it an IntPtr and passing IntPtr.Zero in this case).

    Anyway, when everything is done you end up with a nice little binary file that can be used in your application that needs to display text that may not be available....

    In the next post I'll talk about the harder bit, which is actually loading that file....

     

    This post brought to you by ት (U+1275, a.k.a. ETHIOPIC SYLLABLE TE)

  • Sorting it all Out

    Rhymes with Amharic (a.k.a. How about a little breakfast embed, dear?)

    I have a lot of ideas for blog posts that are on my generic "to do" list.

    In fact, any time someone suggests a potential topic these days, I already had the topic on my list of things to cover some day....

    I was looking at my blog summary page a moment ago and I realized that this is going to be blog post #1708.

    Apropos of nothing, you might be thinking. But I'll explain why this was interesting to me.

    You see, I tend to think that there are a few core posts that I do which have a lot more to do with real influencing/assistance, like The jury will give this string no weight or the Converting a project to Unicode series or the Private fonts: for members only post or the Getting all you can out of a keyboard layout series.

    I have been building this one up in my mind for a while now -- in fact, since I first talked about it last year in Font embedding -- the intro: a sample that really shows how font embedding can work. I hadn't gotten to it yet, but it was on the list.

    Then a few days ago Scott Hanselman asked me:

    I’ve seen the Custom Culture stuff, but I’m wondering if anyone’s done a sample (and with what font) showing Amharic on Vista? I’d like to post about it and enable some Ethiopians.

    I had to remind him that we actually added Amharic as a locale to Vista (as I sometimes have to do with Scott!), and it did suggest to him something that really might be important:

    Hm…I’ll try making a WinForms app in Amharic…I’ll let you know. Since Vista [h]as am-ET I guess we don’t need it…although, it’d be nice to talk about how to write a WinForms app that is ONE SOURCE, TWO OS’s. Meaning, it would know what to do on XP vs. Vista. Can we copy the font over to XP?

    That “straddling” sample would be VERY valuable for those languages that were added in Vista.

    Now copying the font file for Nyala is indeed a violation of the EULA, even to another Windows box. But it suddenly occurred to me that this might be the perfect time to provide a font embedding sample!

    After a bunch of work between other work and meetings and email and such, the sample came together (and by the way, special thanks to Sergey Malkin and David Brown for their assistance here!).

    Warning: do not violate the license for any font file from Microsoft or any other source. You can use the licensing information in the Font Properties Extension to find out if you are allowed to do it!

    First a few gratuitous screen shots of the sample, on Vista (with higher DPI settings):

    and on Server 2003 (which does not have the font or the locale or anything, and with ClearType turned off):

    and on XP SP2 (again without the font or the locale or anything, and with ClearType turned on):

    Notice how the bottom TextBox control does not show the text on the platforms that do not have the font, while all of them can display the text in that one on top.

    And in fact if you used a custom culture to add am-ET, also known as Amharic (Ethiopia) or አማርኛ (ኢትዮጵያ), one can get even more of the support running on both platforms, just as Scott was hoping for!

    Ok, enough with the build-up, let's jump in....

    You can download the project from here. It basically relies on a few of the font embedding API functions:

    • TTEmbedFont -- given a device context containing a specific font that is legal to embed, creates the compressed binary file that can be embedded;
    • TTLoadEmbeddedFont -- given that compressed file, uncompresses it and turns it into a font that can be used within the process;
    • TTDeleteEmbeddedFont -- removes the embedded font's information when you are done with it.

    The sample was a bit more involved as it had to make use of the PrivateFontCollection class to load the font within WinForms, because the load is only valid within the process but GDI+ does not load any font that is not known to it at the time it has started up. Luckily, by using a technique similar to the one I used in Private fonts: for members only, you can load up the font that is ready to go in GDI/Uniscribe and cause it to be available to your managed application controls as well!

    The logic is:

    • On >= Vista, if the embedded font file is not there, it is created.
    • On all platforms, it tries to use the font, loading it up into a private name so it won't have trouble loading on platforms that contain the font.

    NOTE: The sample download does NOT include the binary file containing the embedded font file. To get that file you have to run the sample on Vista and it will create a ~150kb file named "NyalaSIAO.bin" in the same directory as the EXE. From there you can put the EXE and the .BIN file on the downlevel machine and display Amharic in your application to your heart's content, provided you are just using it in your application.

    In the real world you probably would not set up your application the way I did the sample -- you would probably embed the font as a resource like that post about private fonts did, and you'd likely only create it during development, not on the user's machine later. But it should be enough to get you started....

    I will talk more about the code soon (and the embedding support and what happens with it) in an upcoming post. :-)

    And I'll probably do an unmanaged sample too, at some point. Because I knew even 15 years ago that when someone at Microsoft talks about how easy something is to use, if they provide no samples for it, even after years pass, that we might well be full of crap and that it is hard.

     

    This post brought to you by ኢ (U+12a2, a.k.a. ETHIOPIC SYLLABLE GLOTTAL I)

  • Sorting it all Out

    I don't want you to go

    (Absolutely positively nothing technical, whatsoever)

    My grandmother said those words as I stood in the kitchen, about to head out to the car taking me to the airport.

    Of course I still had to go.

    I had been in Ohio a week, probably one of the longest vacations I had taken since I first started working for Microsoft full time.

    Vacations have perhaps gotten less glamorous than they used to be.

    I mean, back in the day it might have been Bangkok or Grand Cayman or Hong Kong or Hawaii or Singapore or Amsterdam or Taipei or Little Cayman or Tokyo.

    Suddenly it was Beachwood.

    And now I am heading back to Redmond on a 757.

    It was just last week that I realized that I have lived in Redmond longer than any other place I have in my life. I guess the short term contract worked out okay in the end....

    But I think back to my grandmother's words again -- I don't want you to go.

    Now maybe it is just the music I am playing at the same time as I am writing this post, and with that in mind you can probably discount everything that follows to some extent.

    But I have heard those words before. And to be blunt the people who said them were people who were important to me.

    Well, at least more important than the people who have said "I do want you to go" (or less formally "get the hell out"!).

    A few of these people were girlfriends, or lovers. Some of them were very good friends, people I relied on (and vice versa). One of them was just four years old. And don't think for a minute that the last one on the list was the easiest of the bunch.

    Yet each time, at the point where someone was saying the words, I was not going to stay.

    Worse, each time, I think the person saying it knew nothing was going to change just because they said something.

    So what is being expressed, exactly?

    Sadness? Anger? Frustration?

    A general sense of pathos about a universe that would conspire to move two people away from each other?

    Perhaps all of those things. And more.

    Or maybe I am underestimating everyone's intentions.

    It could be just what Kathleen Edwards was thinking about in Old Time Sake, or maybe even what William Thacker was thinking when he answered Anna Scott's request to stay a bit longer with Stay Forever.

    It may just be that in some cases they were actually hoping I would stay. For a day, for a week, for a month, forever.

    Maybe by leaving (the situation, the place) I really was letting someone down, dashing a mad hope that someone who would smuggle a cat into Ankara on the way to a Jethro Tull show for no other reason than he promised he would return the cat to its owner might bend the universe for a moment and delay or dash the plans for no other reason than someone said the words.

    Would it make a difference? Hard to say....

    Now as I re-read this entry that I may just delete rather than posting it, I can recall one time that I did heed the words.

    A time that she said I don't want you to go that I stopped what I was about to do and asked her if she meant it. And when she said she did I changed the plan of ending a relationship and turned back to her so that I could hold her and tell her that I was hers.

    Not that it made much difference, though -- that relationship was over too, eventually. In fact, it might have been easier had I not turned around.

    Maybe I just decided to stop heeding the words. Maybe now I just take them as a very sweet expression of sadness in a world that can't change on the basis of six words, even for a little bit. So I nod and say I wish I didn't have to and I still leave.

    Perhaps I am just a cynic now.

    But I'll tell you a secret, though. I don't believe it.

    Because I said the words to someone once, and that someone is still in my life.

    And they smile around me just often enough that I believe they are happy about it.

    In other words, they didn't go. And in the process of all that staying, they showed a strength of character for which I am grateful, of which I am jealous, and to which I aspire.

    I mean, I believed in life's rich tapestry even before Modern English was singing about it. And I believe in it now.

    If you look Farther Down (apologies to Matthew Sweet!), I am an optimist, no matter how cynical I may seem at times.

    So the next time it happens, maybe I'll be braver. Maybe I will change the itinerary or the plans or the direction in life. Whichever might be appropriate.

    (Unless the person saying it actually read this post, in which case I might have to disqualify the words; readership may have its privileges around here but I have to draw the line somewhere!)

    You may not have any idea what this post is about right now. But maybe some day you will.... :-)

     

    This post brought to you by ˺ (U+02fa, a.k.a. MODIFIER LETTER END HIGH TONE)

  • Sorting it all Out

    The three stages of grief^H^H^H^H^Hcollation

    Just as there are stages to grief, it seems there are stages to support of collation in both product and platform....

    STAGE #1: IGNORANCE (a.k.a. Denial)

    This first stage has one just going and doing as one pleases, adding language sorts and fixing bugs (and occasionally even removing sorts, e.g. Lithuanian Classic!). It is characterized by being a real problem for people trying to use the support for indexing of data where the indexes span versions, especially as real world requirements of collation start breaking those expectations and people in the stage find themselves arguing for retaining the wrong answer even if it makes their product appear wrong in the eyes of customers and partners (or where the laws of God and Unicode are flouted!).

    STAGE #2: THE BACKLASH (a.k.a. Anger/Bargaining)

    The fear created when people realize the consequences that the ignorance of Stage #1 has caused them leads to a swing of the pendulum that goes too far in the other direction. Suddenly every version has to be identical. This of course leads to a whole new set of problems like support for collating Hindi running on .NET 1.0/1.1 in Windows 98 even though the fonts are not there, or collation support in Jet or SQL that can be so far out of date that the bugs and limitations on use keep piling up to the ceiling.

    STAGE #3: MATURITY (a.k.a. Acceptance)

    This is the stage where one balances the need for stability with the need for handling the dynamic nature of human language, user expectations, and increased support across versions. Whether one simply accepts re-indexing as a reality or adds versioning schemes so results can be consistently reached even as issues are addressed in the future, there is some plan (or even multiple plans) in place.
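
    One way to picture the "versioning scheme" option is an index that remembers which collation version ordered it and gets rebuilt when that version changes. The sketch below is only an illustration under assumptions: SortedIndex, open_index, and the "v1"/"v2" version strings are hypothetical names, and the sort key functions stand in for whatever the platform's collation actually does.

```python
from dataclasses import dataclass, field


@dataclass
class SortedIndex:
    """Keys stored in collation order, tagged with the version that ordered them."""
    collation_version: str          # hypothetical version tag, e.g. "v1"
    keys: list = field(default_factory=list)


def open_index(index, current_version, sort_key):
    """Return an index valid for current_version, re-indexing if it is stale."""
    if index.collation_version == current_version:
        return index                # same version: the stored order can be trusted
    # Version changed: accept re-indexing as a reality and rebuild the order.
    return SortedIndex(current_version, sorted(index.keys, key=sort_key))


# "v1" ordered by code point; "v2" (an assumed later version) ignores case.
old = SortedIndex("v1", sorted(["c", "a", "B"]))
new = open_index(old, "v2", str.casefold)
print(old.keys, new.keys)
```

    The point of the tag is that a stale index is detected rather than silently searched with the wrong ordering.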

    Unfortunately, even as a platform reaches Stage #3 (which is where Windows has been starting with Windows Server 2003), many clients can be running with the assumptions of Stage #2 or even Stage #1. So there is a constant battle to try to bring people to that third stage....

    And that is why Sorting It All Out is here -- to try to bring people through this process intact! :-)

     

    This post brought to you by ≣ (U+2263, a.k.a. STRICTLY EQUIVALENT TO)

  • Sorting it all Out

    ESE is still not so easy for me!

    • 4 Comments

    Koushal asks via the contact link:

    Hi Michael,

    How you doin? I found your reply to a post regarding the steps to be followed and APIs to be used to get a list of tables in a .edb file using ESE APIs. You've written that you are not providing the sample source code because it was way long back when you had written and tested it and now, it may not work.

    But I'll be very much grateful to you if you can give me the source code. Presently, I dont have any source of reference so as to use the ESE APIs. I dont have the ESENT.H file and the ESENT.LIB file either. I didnt find the mentioned files in the Exchange SDK and I'm just referring to sample codes on the internet to get the constants and datatypes which any ESE programmer is supposed to collect from the ESENT.H file.

    So, can you please send to me your source code (no matter if it doesnt work) ? Also, if you could send me the header and lib files, it will help me in getting things going the right way. Presently, I have to load the ESE.DLL library and call GetProcAddress() for every function I have to use. If I had the LIB, I could had saved 1 function call with every step I had to perform and eventually lessen the size of my source code.

    Hoping to get your help at the soonest,

    Thanks in advance,

    Koushal

    I have actually checked with folks here (like Brett!) and although I was able to repro the problem of not having esent.h/esent.lib in some versions of the Platform SDK/Windows SDK, all versions of the Core SDK from Windows Server 2003 SP1 onward have the files -- so all you have to do is install the Core SDK and you will have them installed.

    Contrast that with code calling Jet Red, for which I cannot give any headers or lib files, which are not in the Platform SDK, and whose functions are exported without even having names. Trust me when I say that trying to learn about ESE through them is like trying to learn about driving by reading a comic book!

    The source code I had can occasionally give insights into using ESE, but more often it does not and occasionally it can be downright misleading (their functionality has never matched and their respective APIs have diverged over the years). And as I pointed out in this post, I won't usually know the answer. On top of everything, it is pretty off-topic here at SIAO!

    You may have better luck asking Brett or the Exchange Team Blog which has links to several other Exchange blogs....

  • Sorting it all Out

    The seven is not being crossed out

    • 0 Comments

    The question in the microsoft.public.word.international.features newsgroup was:

    In Europe the handwritten number seven has a short horizontal line through it.
    My question: Is there any Word 2007 or Vista system font that allows one to
    type such a European-style seven?

    In Brian Livingston's and Paul Thurrott's recently published book, Windows
    Vista Secrets, they used this European-style seven. I first wrote to both of
    them and have not received a reply.

    Perhaps I should also post this to the Vista newsgroup too. My thanks in
    advance if you can help me.

    Looking at all of the fonts I have installed on this machine, here are the ones with that short horizontal line through the seven:

    • Bradley Hand ITC
    • Forte
    • Guttman Haim
    • Guttman Haim-Condensed
    • Magneto
    • Script

    None of these fonts look like the ones used in a book, though....

    There do not seem to be many (and I am tempted to say *any*) fonts that will combine the number 7 with U+0335 to allow a line through the seven to look right....
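
    For anyone who wants to try that combination in their own fonts, here is a small sketch of what the sequence looks like at the code point level. The describe helper is just a hypothetical name for illustration; whether the result renders well depends entirely on the font.

```python
import unicodedata

seven = "7"
overlay = "\u0335"          # the combining character mentioned above
crossed = seven + overlay   # a two-code-point sequence, not a single character


def describe(text):
    """List each code point in text with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(describe(crossed))
# Normalization does not collapse the pair into one precomposed character.
print(unicodedata.normalize("NFC", crossed) == crossed)
```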

    Perhaps some fonts include it as an OpenType feature.

    The real answer might be to ask Brian and/or Paul? It may not get the font but at least one will know where the seven comes from!

     

    This post brought to you by 7 (U+0037, a.k.a. DIGIT SEVEN)

  • Sorting it all Out

    Can you really say international support is irreplaceable?

    • 0 Comments

    Well, here is my logic.

    The System.String class has many members that support the current culture by default, and other cultures/comparison types by parameter.

    You know, like string.Compare and so on.

    But there is one member that does not have this behavior, at all. Either by default or via parameter modification -- string.Replace.

    Which is really unfortunate given that the "find" support in string.IndexOf makes it so easy to expect that if you can find a string using some cultural/linguistic sensitivities that you'd be able to replace using those same cultural/linguistic sensitivities, right?

    Though the fact that this does not work may be in part due to an omission I mentioned previously in On being consistently consistent, while still managing to be dead wrong:

    ...in the case of FindNLSString there was a pcchFound parameter that would let the caller know that what was found was also of zero length so that a sensible and consistent check on that return of 0 would keep one from an AV. With the bonus that being that the check made sense in the non-corner cases, too. And not just for the sake of consistency but for the sake of returning correct results.

    Looking back to managed code, .NET doesn't have this feature, and there is in fact no easy way to emulate it in linguistically appropriate string comparisons; this was the reason that FindNLSString was added to Vista in the first place!

    In fact, it was in the initial design planning for FindNLSString, back when we were calling it plain old FindString, that Tarek (another dev on our team who I have mentioned before) pointed out that without the functionality that the pcchFound parameter provided, there was no good way to add replace logic via FindNLSString.

    (I find Tarek's cool contribution to FindNLSString to be quite ironic given that he owns a lot of the .NET Framework side of things, and the .NET Framework does not yet have this functionality. It causes lots of weirdness that keeps the System.String.Replace method from working consistently with System.String.Compare in many cases!)
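
    To make the need for a "found length" concrete, here is a rough sketch in Python rather than Win32 or .NET. It is not how FindNLSString works: the fold function (strip nonspacing marks, then casefold) is an assumed stand-in for real linguistic comparison, and find_folded and replace_folded are hypothetical names. What it does show is that the matched span in the source can be longer than the search string, so a replace has to be told how much was actually found.

```python
import unicodedata


def is_mark(ch):
    return unicodedata.category(ch) == "Mn"


def fold(text):
    """Crude stand-in for a linguistic comparison: drop marks, ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not is_mark(ch)).casefold()


def find_folded(source, pattern, start=0):
    """Return (index, found_length) of the first folded match, or None."""
    target = fold(pattern)
    if not target:
        return None
    for i in range(start, len(source)):
        if is_mark(source[i]):
            continue  # never begin a match in the middle of a cluster
        for end in range(i + 1, len(source) + 1):
            at_boundary = end == len(source) or not is_mark(source[end])
            if at_boundary and fold(source[i:end]) == target:
                return i, end - i
    return None


def replace_folded(source, pattern, replacement):
    """Replace every folded match, using the found length of each one."""
    out, pos = [], 0
    while True:
        hit = find_folded(source, pattern, pos)
        if hit is None:
            out.append(source[pos:])
            return "".join(out)
        index, found_length = hit
        out.append(source[pos:index])
        out.append(replacement)
        pos = index + found_length


text = "Re\u0301sume\u0301 and resume"      # first word uses combining accents
print(find_folded(text, "resume"))          # found length differs from len("resume")
print(replace_folded(text, "resume", "CV"))
print(text.replace("resume", "CV"))         # binary replace misses the accented word
```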

    So there it is....

    What with all the green bits/red bits silliness, it will be a while before this could be potentially addressed with another overload. And then even longer before a string.Replace overload that made use of the new functionality could be planned out.

    Since it is just doing a binary operation, clearly string.Replace does not support "international" the way all of the other methods do, by default or otherwise.

    And thus by the converse theory of logic international support does not cover Replace.

    Which could easily be reworded (by a blog author who was willing to go to great lengths to try to take advantage of a pun!) international support is irREPLACEable.

    How'd I do? :-)

    Ok, seriously. This would make a cool set of features in a future version, I think. Even if the string.Replace method couldn't be made consistent with the other string members involving collation support in the default case for fear of backward compatibility breaks....

     

    This post brought to you by я (U+044f, a.k.a. CYRILLIC SMALL LETTER YA)

  • Sorting it all Out

    Microsoft is not uncaron^H^Hing about the issue!

    • 0 Comments

    Sometimes, in order to get the best results in collation, one has to use constructs that from a linguistic or a Unicode general category standpoint might seem incorrect.

    A good example is when a character that is not on the list of Mn (Mark, Nonspacing) characters in Unicode is given only diacritic weight, such as ઽ (U+0abd, a.k.a. GUJARATI SIGN AVAGRAHA) that I discussed here and which is an Lo (Letter, Other). The expected result in proper linguistic support in collation is achieved, but only by violating the common sense expectations about how the character ought to be classified and used.
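
    The general category side of this is easy to check for yourself. The small sketch below (classify is a hypothetical helper name) only reports what the Unicode Character Database says; it says nothing about the weights any particular collation assigns.

```python
import unicodedata


def classify(ch):
    """Return the Unicode name and general category of a character."""
    return unicodedata.name(ch), unicodedata.category(ch)


avagraha = classify("\u0abd")   # the character discussed above
acute = classify("\u0301")      # an ordinary nonspacing mark, for contrast
print(avagraha)
print(acute)
```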

    Another example came from friend and former colleague Juraj who just emailed me a question the other day:

    My friend in Slovakia asked me: "How do I ignore diacritics when doing a query on a column? I tried COLLATE Slovak_CI_AI but it didn't work. When I did for example a query WHERE Column LIKE '%C%', entries with 'Č' were not selected. Then I found a post on a forum saying that this is not a bug, but a feature. I don't understand: how is such collation useful? Can you explain?"

    This is obviously not specific only to Slovak language (and not only to letter 'Č'), as I found through another blog post (in Czech):
    http://blog.vyvojar.cz/mafalt/archive/2006/10/31/_0C01ED00_m-n_E100_s-mohou-p_5901_ekvapit-collations_3F00_-II_2E00_.aspx

    Juraj is right -- both Czech and Slovak consider some of the letters such as U+010c (LATIN CAPITAL LETTER C WITH CARON) to not be treated as a base letter plus a diacritic but instead as a character with a unique alphabetic weight, an issue I discussed previously in this post and this one.

    But notice what is happening in both the example Juraj gave and in the Czech blog post he referenced -- people are not trying to order data, they are searching within data and are thus concerned with identity. And because of this, they consider the support that has been calibrated around the principle that ORDER and IDENTITY should be treated the same to be returning non-intuitive results.

    It actually takes me back to the issues I described in Hungarian is even more complicated than I thought, where collation algorithms such as both Microsoft's and Unicode's do not separate the kind of results produced by a CompareString function from those of a mythical EqualString function.

    In the specific IDENTITY case of search, however, one can make the case that they should sometimes be separate, on a per-language basis -- it is pretty clear, for example, that most Swedes still would not want å (U+00e5, a.k.a. LATIN SMALL LETTER A WITH RING ABOVE) to ever be treated like an ordinary a (U+0061, a.k.a. LATIN SMALL LETTER A).

    The difference between the two cases is obvious, of course -- if the letter sorts nowhere near its base (as in the case of Swedish where it sorts after z rather than near a), one would probably not want search to find anything other than the character one asked for. While in the cases such as Czech and Slovak (where the letter sorts after the base character but with a unique alphabetic weight), the "folding" within search is perhaps more expected.
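
    A toy sketch may help show the two ordering situations side by side. It is emphatically not real Swedish or Czech collation: the alphabets below are simplified assumptions (the Czech one leaves out the ch digraph and treats other accented letters as their base), and make_key is a hypothetical helper.

```python
import unicodedata

SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"        # å sorts after z
CZECH_TOY = "abcčdefghijklmnopqrřsštuvwxyzž"     # č is its own letter, right after c


def make_key(alphabet):
    """Build a sort key that ranks letters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        weights = []
        for ch in unicodedata.normalize("NFC", word).casefold():
            if ch not in rank:
                # Letters outside the toy alphabet fall back to their base letter.
                ch = unicodedata.normalize("NFD", ch)[0]
            weights.append(rank.get(ch, len(rank)))
        return weights

    return key


swedish = sorted(["zebra", "åska", "apa"], key=make_key(SWEDISH))
czech = sorted(["dům", "čaj", "cena", "cukr"], key=make_key(CZECH_TOY))
print(swedish)
print(czech)
```

    In the Swedish list the å word lands far from the a word, while in the Czech list the č word sits directly after the c words, which is the difference that makes "folding" in search feel more natural in one case than the other.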

    HOWEVER, with that said, a simple IGNORE_NONSPACE result is also not what users would expect. When searching for Č, they would in all likelihood still want the results with the CARON to be preferred, rather than having the characters with and without the CARON treated identically.

    This is not a feature that collation in Windows or .NET or SQL Server or the UCA in Unicode currently supports, and in fact the data are laid out in a way that makes it harder to do, requiring multiple passes over the data.

    This simple fact suggests possible solutions to the problem, none of which are easy but all of which are tractable. People interested in search may even want to be thinking about a whole new semantic around how to morph the meaning of the word to IGNORE to mean something more like PREFERBUTNOTREQUIRE.
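
    Here is a minimal sketch of what a PREFERBUTNOTREQUIRE style search could look like, written in Python purely for illustration. Nothing like this exists in the platforms mentioned above; search_prefer, exact_form, and folded_form are hypothetical names, and stripping nonspacing marks is an assumed approximation of ignoring diacritics. Note that each row is compared at two levels, which echoes the multiple passes mentioned earlier.

```python
import unicodedata


def exact_form(text):
    """Case-insensitive form that keeps diacritics."""
    return unicodedata.normalize("NFC", text).casefold()


def folded_form(text):
    """Case-insensitive form with nonspacing marks removed."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


def search_prefer(rows, needle):
    """Rows matching with diacritics come first, diacritic-blind matches after."""
    preferred, fallback = [], []
    for row in rows:
        if exact_form(needle) in exact_form(row):
            preferred.append(row)       # the caron (or its absence) matched
        elif folded_form(needle) in folded_form(row):
            fallback.append(row)        # matched only once diacritics are ignored
    return preferred + fallback


rows = ["Cena", "Čaj", "ocel", "oči"]
print(search_prefer(rows, "Č"))
print(search_prefer(rows, "C"))
```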

    But since someone from Microsoft has pointed the issue out, you clearly can't claim that Microsoft is unCARON about the problem!

    And as an aside I think it is hilarious that most spellcheckers recommend uncaring when they see the word uncaron, my favorite "pun-as-a-pseudo-back-formation" of the day!

    (this post dedicated to Petra and Oskar, the two people in the world who were special enough to draw away one of the five best collation testers of all time from their job in Redmond; no one will ever accuse Juraj of being unCARON!) 

     

    This post brought to you by č (U+010d, a.k.a. LATIN SMALL LETTER C WITH CARON)

  • Sorting it all Out

    WinNT keyboard file source?

    • 1 Comments

    Yesterday's Win9x keyboard file source? kind of anticipates this post, I suppose.

    But it is happier news, at least. :-)

    The Windows Driver Kit (WDK) has both information on keyboard layouts on NT-based platforms and the build-able source and header files for several samples. And unlike the Windows 98 DDK, it is available for download!

    But beyond that, you can use MSKLC 1.4's kbdutool.exe, which (when run with the /S command line parameter against a .KLC file) will emit the source files for the layout designed in the MSKLC user interface.

    The biggest advantage to this would of course be that you can build a much wider array of samples where you can look at the source (for features like keyboard "ligatures" and SGCAPS, just to give two examples).

    It is true that most people probably wouldn't care about this kind of thing -- and they can just use MSKLC directly. But those who would will probably love the chance to really see how these features are represented in the keyboard layout DLLs by looking at the source directly....

     

    This post brought to you by ꑄ (U+a444, a.k.a. YI SYLLABLE NJYT)

  • Sorting it all Out

    Win9x keyboard file source?

    • 9 Comments

    Thorsten Glaser asked over in the Suggestion Box:

    Hi,

    thanks for MSKLC, now I'm able to have the same keyboard layout on the BSD wscons (text mode) console, under X-Window and on Windows NT/2k/… – with a “meta” key that just adds 0x80 to the value of the character (e.g. maps Meta-d to ä), emulated with AltGr on NT and Mode_switch on X11, and a few funny characters I'm occasionally needing (…€„™“”•–), and Ÿ for the sake of  completeness.

    Now I've seen Janko's Keyboard Generator for Win9x, and I wonder if the format of the .KBD files is publically documented. If so, I could create the same (almost) layout with a hex-editor, which sometimes is capable of doing more than some random UI programme. (Even MSKLC wouldn't let me re-map AltGr-Tab at first.) Maybe it's just not possible, but even then, I'd be interested if there's some kind of docs for that format which I couldn't find (probably because it's been 12+ years since “Chicago” was new).

    Thanks in advance!

    The source, header files, samples, and build environment to build keyboard layouts on Win9x have been a part of the Windows 98 DDK even back all those years ago when it was the Windows 95 DDK.

    For documentation, the entire section of the documentation entitled Windows 95 Keyboard Driver is of particular use here, as is the subsection within entitled Keyboard Layouts.

    In fact, the only problem with this advice (which totally answers Thorsten's question!) is that the DDK no longer appears to be available for download (not entirely surprising since it is as old as it is and Windows 98 is no longer supported). So I hope Thorsten has a copy of the DDK installed somewhere, or knows someone who does....

    I may post more on this topic in the future, though I am probably more likely to talk about NT-based keyboards, all things considered. :-)

     

    This post brought to you by ꇙ (U+a1d9, a.k.a. YI SYLLABLE LYR)
