
MSLU

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    When my decisions come back to haunt me (and/or others!)

    • 28 Comments

    A little over two half decades ago, I made a particular technical decision for a project I was working on at Microsoft.

    I mention the reason over a half decade ago in a blog in this Blog o' mine.

    And a little under two days ago, a blog by Eric Lawrence brought it all home to roost.

    His blog on EricLaw's IEInternals titled Brain Dump: Shims, Detours, and other “magic” is a good read, and describes a fascinating bug involving IE10, a third party extension IE10 ships, and MSLU, the Microsoft Layer for Unicode.

    You can read the full blog (it's a good read!) but I'll quote the relevant portion here:

    I spent several hours pondering this question and aimlessly touring around in the debugger. I was whining about this scenario to a colleague, complaining about code so ancient that it was shipping with unicows.dll, when I realized that I’d never used this library myself, and in fact I’d never seen a toolbar use it before. When trying to explain what it did to the colleague, I decided that I’d probably stop hand-waving and pulled up unicows up on Wikipedia. And bam, there it was, plain as day: 

    By adding the UNICOWS.LIB to the link command-line [ ... ] the linker will resolve referenced symbols with the one provided by UNICOWS.LIB instead. When a wide-character function is called for the first time at runtime, the function stub in UNICOWS.LIB first receives control and [ ... ] if the OS natively supports the W version (i.e. Windows NT/2000/XP/2003), then the function stub updates the in-memory import table so that future calls will directly invoke the native W version without any more overhead.

    …and there’s the problem!

    When IE first loads a toolbar, the shims run against the module and wrap all calls to CreateWindow with a call to the compatibility wrapper function. But when IE loaded this toolbar, it didn’t find any calls to CreateWindow, because those calls had been pointed at a function inside unicows.dll instead of at the original function in user32.dll. As a result, the compatibility shim wasn’t applied, and the function call failed.

    Now, this wouldn’t have happened if unicows did its import-table fixup the “normal” way, using the GetProcAddress function. That's because the compatibility shims are applied to GetProcAddress as well, and the fixup would have been applied properly at the time that unicows did the update of the import table. However, for reasons lost to the mists of time, the implementers of unicows instead copied the source code of GetProcAddress from user32 into their own DLL, so the shims had no way to recognize it. While we could add a new shim to handle unicows.dll, the obscurity and low priority of this scenario mean that we instead decided to outreach to the vendor and request that they update their build process to remove the long-defunct support for Windows ‘9x.
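    The self-patching stub described in the quote can be modeled in a few lines of portable C++. This is only a toy sketch: import_slot, first_call_stub, native_create_window and shim_would_apply are hypothetical names, a plain function pointer stands in for an import-table entry, and none of it is the actual unicows or shim engine code.

```cpp
#include <string>

// Hypothetical stand-ins: none of these names come from unicows or Windows.
using WindowFn = std::string (*)(const std::string&);

// Stands in for the native W function exported by the OS.
std::string native_create_window(const std::string& title) {
    return "native:" + title;
}

std::string first_call_stub(const std::string& title);

// Models one import-table entry. It starts out pointing at the stub
// rather than at the native function.
WindowFn import_slot = first_call_stub;

// Runs on the first call only: it picks the implementation, overwrites
// the slot, and forwards the call.
std::string first_call_stub(const std::string& title) {
    import_slot = native_create_window;  // later calls bypass the stub
    return import_slot(title);
}

// Models a load-time shim: it can only wrap a slot that already points
// at the function it is looking for.
bool shim_would_apply(WindowFn slot) {
    return slot == native_create_window;
}
```

    In this model a shim that inspects import_slot before the first call sees only the stub, which mirrors the situation where no calls to CreateWindow were found when the toolbar was loaded.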

    Well, I'll object a little about the characterization that things that feel so recent to me are "lost to the mists of time". :-)

    Though I won't complain too much, since the issue in question caused him to be randomized so obnoxiously!

    The blog of mine that covers the issue is from point one of May of 2005's Why does MSLU wrap ________ ?:

    1) There is, for example, the GetProcAddress function. It takes a string, but never a Unicode string, on NT or otherwise. So why would it need to be wrapped?
     
    Well, it turns out that the GetMonitorInfo function, defined in multimon.h, is not just a simple prototype. There is a bunch of complex code in it that conditionally calls various APIs, including GetProcAddress, to get a function pointer to replace any call to GetMonitorInfo. Because of this, MSLU could not wrap the GetMonitorInfo function, because the wrapper would never be used. The only way to allow the wrapper to work was to wrap GetProcAddress and look for where someone was trying to retrieve the address of GetMonitorInfoA or GetMonitorInfoW!
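    The idea of wrapping the lookup function itself can be sketched in portable C++. Everything here is an assumption made for illustration: os_get_proc_address is a stand-in for the real GetProcAddress built on a std::map, the functions return made-up integers, and the actual MSLU code is certainly not this simple.

```cpp
#include <map>
#include <string>

using Proc = int (*)();

int os_get_monitor_info_w() { return 1; }     // stands in for the OS export
int layer_get_monitor_info_w() { return 2; }  // stands in for the layer's wrapper
int os_other_function() { return 3; }         // some unrelated export

// Stand-in for the real lookup: a plain name-to-function table.
Proc os_get_proc_address(const std::string& name) {
    static const std::map<std::string, Proc> exports = {
        {"GetMonitorInfoW", os_get_monitor_info_w},
        {"OtherFunction", os_other_function},
    };
    auto it = exports.find(name);
    return it == exports.end() ? nullptr : it->second;
}

// The wrapped lookup: intercept the one name of interest and pass every
// other request through untouched.
Proc wrapped_get_proc_address(const std::string& name) {
    if (name == "GetMonitorInfoW") {
        return layer_get_monitor_info_w;
    }
    return os_get_proc_address(name);
}
```

    Code that looks the function up by name at run time, the way the multimon.h logic does, then receives the wrapper without knowing anything changed.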

    This was back in the heady days when I had the DaveC like power to have influence on pretty much any function in multiple versions of Windows.

    Even if the versions were Windows 95, Windows 98, and Windows Me.

    I suppose there is a small procedural problem with trusting a troubled perfectionist such as myself to act as sole architect/program manager, principal developer, and only tester on a project.

    But my manager at the time had quite a knack for making me feel slightly foolish while asking questions that in retrospect seem quite reasonable like

    Don't we need someone with PM experience here?

    or

    Should I really be the only one in charge of testing code I wrote myself?

    while simultaneously making me feel like I could get the job done.

    So perhaps I can be forgiven this particular sin.

    Though really I think I owe Eric lunch one of these days to apologize.

    Eric -- sorry about that! Call me after I get back from Brisbane in a couple of weeks! :-)

  • Sorting it all Out

    LOAD_LIBRARY_AS_DATAFILE intends to be an underachiever

    • 0 Comments

    The recent question reminded me of something:

    Hi, 

    We recently switched over to using LOAD_LIBRARY_AS_DATAFILE for our INTL dlls, which excludes the DLL from the list of loaded modules.

    We tend to get a lot of Watson crashes of mismatched intl dll testing. Without having the intl DLLs in the list of modules, we can't tell the version anymore.

    Is there a workaround other than manually loading the version and sticking it in some global?

    It reminded me of how when you linked to unicows.lib to load/use the Microsoft Layer for Unicode, you'd suddenly "lose" all of your Unicode imports in the binary's official import table!

    Kind of a required feature of the MSLU loader, that -- to redirect all those Unicode calls! :-)

    In this case, LOAD_LIBRARY_AS_DATAFILE really is working as designed.

    One of the features of it is lower overhead, which includes both not running init code, and not showing up in the loaded dependent modules.

    If you need that info, you either have to back out the change, or load all of the information yourself.

    Other solutions such as loading them the old way under debug won't help if you want to look at Watson crashes, since customers will seldom be using a debug build of your product!

    Since you are loading the DLL as a datafile to get its string resources, you can definitely load up the version info.

    You can even write code to detect problems or even fix them -- and even find bugs earlier! :-)

  • Sorting it all Out

    "Now you know what it's like to live in my brain...."

    • 1 Comments

    Where were you at 10:54am on February 28, 2001?

    It was a Wednesday, if that helps you remember.

    Or, if you are more event oriented, it was the date and time that the "clocked at 6.8" Nisqually intraplate earthquake happened.

    I was at work, in Building 9 at Microsoft.

    I had come in early that day because I was excited about finding the fix for a bug in the Microsoft Layer for Unicode on Win9x Systems, which was going to be announced soon. In fact Cathy and I got the approval to not turn in our slides for our stodgy sounding Unicode on Downlevel Windows talk at the 18th Internationalization and Unicode Conference in Kowloon Bay (Hong Kong). We broadly hinted that it was something we couldn't talk about yet but they wouldn't regret sticking with us (our slides were technically late at that point, but since both of us were on the committee for the conference we had some ability to influence and plead to not be replaced by a backup talk).

    But alas, I digress.

    Anyway, I was working on this bug that I had found a fix for, and wanted to make sure it wouldn't cause performance problems.

    Going on with me personally, the Multiple Sclerosis was (really just in the few months prior) starting to have a more marked effect on my balance, as I had been moving from an "occasionally falls down" place to a more "depends on the cane to not fall down all the time" place. And I had moved from mere disequilibrium (where I wouldn't feel unsteady but the ground would suddenly come up at me) to a more overt feeling of unsteadiness that I could no longer ignore but was doing my best to not pay attention to.

    People in the hallway were making noise all of a sudden. So I grabbed my cane and got up to investigate.

    Everyone described what was happening with us all pretty much poking our heads out of our office doors, basically standing in our door frames because of some vague notion that this would be safer.

    We were a bunch of n00bs when it came to earthquakes, and all of my previous seismic experience (time spent in Japan and in Southern California) usually involved my being somewhat intoxicated and/or romantically entangled, so it wasn't like I had much to add anyway.

    I didn't feel anything different, though.

    My world had been going topsy turvy all the time now. Though I did have one comment on the matter that I made to the people looking out into the hallway:

    Now you know, now you know what it's like to live in my brain.

    We had no injuries; the epicenter was far away from us.

    And we did all get back to work after that, and much later when an "emergency procedure" manual showed up in all our offices, even the page explaining what to do in case of volcanic eruption (call reception, don't leave the building or touch lava) had a slight edge to it beyond the obvious humor, since if we could feel an earthquake did lava seem so very out of the question?

    Of course those manuals are gone, so we have no idea what to do In Case of Lava....

    But alas, I digress again.

    Anyway, there you go -- my experience of the Nisqually intraplate earthquake of 2001. It was good it happened late enough that people were around or I may not have even noticed the ~46 seconds that everyone in the Pacific Northwest got a little bit of what it was like to be me....

    You're welcome, of course.

  • Sorting it all Out

    I serve at the pleasure of the customer (except maybe when they annoy me?)

    • 2 Comments

    Over in the Suggestion Box, regular reader Yuhing Bao asked:

    What do you think about going to you directly vs. going through PSS?

    He was referring to a conversation going on in the comments of a blog of mine from over 4 years ago titled Is MSLU still supported?.

    I was pointing out how in the specific case of MSLU at that time, if people asked a question of PSS it would make its way to me eventually, and if people asked me directly then it would also make its way to me.

    I know that some people are more comfortable with one way, some with the other. They both work, so really I was slightly annoyed when the same request got to me via many different channels but I really did not judge one channel as better than the other.

    I mean, I serve at the pleasure of the customer (except perhaps when they annoy me?), so why should I judge them for which way they like to talk unless their way is every way? :-)


    Now this is going back half a decade ago, when the landscape in PSS was pretty different.

    These days, when I send notes to specific folks in PSS about writing a KB article on a particular topic, they tell me that someone else does those now. It is no longer PSS writing them based on specific cases where they provided assistance (and KB articles became a way that the help could get to others, whether by another PSS engineer finding it or a customer finding it directly).

    And I suddenly realized it had been a while since I had received email from a PSS engineer asking me about a customer question.

    Over the past six months I had probably had more interaction with the VP I mentioned in Are you Mr. Kaplan? (and he is now in customer and partner advocacy) than I had with anyone in PSS.

    I know product support still exists, though now the only trace I see of it is in PSS folk asking questions of some of the distribution lists I'm on.

    They get answers and therefore customers still get answers.

    So perhaps this change, this shift, this (dare I say it?) re-organization has not impacted Product Support Services in a negative way.

    Though I know I am not really contributing as much as I used to.

    So while I would once have said "ask any way you like!" about MSLU or topics I know about, maybe now I'd say "if you think it is something I know about and you want an answer from me, you should probably ask here, via the Suggestion Box." Because otherwise I may never see the question, and I am probably less likely to be a significant contributor to the response.

    If I don't know the answer I'll point you to PSS. And if you ask the question via multiple routes I am less likely to even know about it so perhaps the annoyance factor isn't a factor for me anymore.

    So I guess you can just ask whoever and how-many-whoevers you wish to, based on your own personal preferences. This may be the most efficient org structure yet built for customer satisfaction via support.

    Well, at least until the next re-org....

  • Sorting it all Out

    Bytes and Characters and bugs and W's

    • 2 Comments

    There are times that I am very happy for some of the eccentricities of the way I look at code.

    And the way they keep me from certain kinds of bugs.

    Like a few hours ago when I happened to spot Matthew Wilson's memset() Considered Harmful - especially to those who (think they) know what they're doing! blog.

    Just the title at first.

    And two things immediately came to mind.

    The first thing?

    I know the problem he ran into.

    Did you see it too? Before you clicked on the link, I mean. :-)

    The second thing that came to mind?

    Too bad he didn't use wmemset; it would have saved some time here!

    The truth is that one of the biggest sources of bugs if one moves a lot between Unicode and non-Unicode programming is byte/character count problems. In fact that is one of the great things about wmemset, the way it (and its ilk) takes that particular variable out of the equation even if one doesn't happen to have std::fill_n() at one's disposal. :-)
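    To make the byte/character distinction concrete, here is a minimal portable sketch. The helper names are hypothetical, and it illustrates the general class of bug rather than the specific one in Matthew's post: memset() takes a count of bytes, wmemset() takes a count of characters, so passing a character count to memset() touches only part of a wchar_t buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Bug: `chars` is a character count, but memset expects a byte count,
// so only the first `chars` bytes of the buffer are zeroed.
void ClearWithMemsetBug(wchar_t* buf, std::size_t chars) {
    std::memset(buf, 0, chars);
}

// Correct memset usage: convert the character count to bytes first.
void ClearWithMemset(wchar_t* buf, std::size_t chars) {
    std::memset(buf, 0, chars * sizeof(wchar_t));
}

// wmemset counts in characters, so no conversion is needed.
void ClearWithWmemset(wchar_t* buf, std::size_t chars) {
    std::wmemset(buf, L'\0', chars);
}

// Counts how many of the first `chars` elements are zero.
std::size_t CountCleared(const wchar_t* buf, std::size_t chars) {
    std::size_t cleared = 0;
    for (std::size_t i = 0; i < chars; ++i) {
        if (buf[i] == L'\0') ++cleared;
    }
    return cleared;
}
```

    How much of the buffer the buggy version clears depends on sizeof(wchar_t), which differs between platforms, but it is never the whole buffer.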

    My trial by fire for all of this was MSLU; the constant need on a per-function basis to be thinking about the Unicode and the non-Unicode kept me on my toes here and really drilled the issues into me to think very very carefully about buffer sizes. And also to feel smugly superior to all the Win9x code that tended to pop up with "bugs" occasionally related to non-Unicode buffers that were twice the size they needed to be except on the CJK versions where those were the expected buffers (and no they were not thinking ahead brilliantly, they were just messing up byte/character counts in WideCharToMultiByte calls!).

    Now there are some flaws in the docs for wmemset and its ilk that I just noticed, like the security warning that really is two different problems between memset() and wmemset() and therefore deserves a bit of wordsmithing beyond a generic pointer to warnings about avoiding buffer overruns.

    And the suggestion that the .Net Framework equivalent for memset() and wmemset() is System::Buffer::SetByte?

    That's a keeper, for sure. I mean what better way to introduce byte/character mismatches into the .Net world so elegantly than that method and a cast or two? :-)

    Now of course the viewpoint doesn't make me invulnerable to all bugs, it just makes a certain class of bug a lot less likely....

  • Sorting it all Out

    Keeping myself up at night figuring out how to delight someone by making that special part of my package smaller

    • 4 Comments

    Conventional wisdom tells us that size matters, and unconventional (or more accurately, inappropriate?) wisdom tends to concur.

    Most of the time it refers to the idea that bigger is better.

    But there are some times that it matters in the opposite way -- where the smaller something is, the happier people seem to be.

    Like the other day in an interesting email conversation where somebody noted:

    However, it's a stretch to say TCHARs belong in the 1990s.  The PC I bought new in 2000 came with Windows 98, and it lasted five years.

    In the real world (i.e., outside Microsoft), you don't always have the luxury of ignoring older platforms.  The last product I worked on before joining Microsoft still has a small but significant Win 9x user base.  Most of the growth opportunity was outside the US, and in those countries, older OSes were even more commonplace.

    We investigated Unicows, but since the product was download only, and since the users on the older machines tended to be dial-up rather than broadband, we couldn't convince product management to let us add Unicows to the download.

    When I say "the product was download only", I meant the ISV's product, not the MSLU redistributable.  Making the ISV product dependent on Unicows would mean including the redistributable in the download.

    Since most of the customers who were on the Windows 9x/Me platforms also had slow dial-up connections, the cost was considered too high.

    Now I am a big fan of MSLU (the Microsoft Layer for Unicode on Windows 95, 98, and Me Systems), and not just because I was the developer on the project.

    After all, as I mentioned in Why/how MSLU came to be, and more:

    But when she laid out the idea of a layer that would be able to let people write Unicode applications that would do incredible things on Windows 2000 while still being functional on Win9x, I was amazed. When she asked whether I would I be interested in doing this project if she could get people behind the idea of making it happen, I think I said Hell yes! or something equally coherent. It sounded like an amazing project to be involved with!

    Notice that I was amazed before I had any idea that she was trying to figure out if I'd be interested in the job. It was just amazing on its own, and would have been so even if I had nothing to do with it.

    There are countless such projects that I have mentioned in the past that I had nothing to do with other than as a fan. It is just my way!

    Anyway, I have a big personal connection to the project, since it has been interesting to me since the first time someone laid the idea out in front of me.

    It really gave me pause to think that the download size was such a hassle, though.

    The full download that the developer would need to bring on their machine that you would get from here is 225KB (the site erroneously claims it is 261KB, I think because an older build was). But that download includes the license, redist information, PDB file, and DLL -- which is way more than any product would ever need to ship since only the DLL is needed.

    The DLL is a whopping 252KB, compressible down to a mere 108KB with WinZip legacy compression and 94.3KB with the maximum optimized compression.

    This small size was not technically one of the design goals, but other goals involving minimal resources required and minimal working set increase helped to indirectly lead to the same thing.

    Now perhaps we are talking dial up connections to Windows 95 and such an addition is just unacceptable, but I have a hard time taking it quite that far as to think that a decision is being made based on whether it would cause the download to be 100KB larger.

    But clearly it was.

    Unbelievably, I found myself wracking my brain figuring out where to re-enlist in the source project, thinking about the code itself and wondering what I could do to make it smaller, how I could magically delight a customer who is likely way past that now and who would probably not be interested anyway.

    So there I am, tilting at windmills and trying to solve a problem that can't be solved at this point.

    It just so happens that this all occurred at 2 AM, a time when I probably should have been sleeping.

    And the punchline?

    How often is it that a man is keeping himself up at night wondering how to delight someone by making that special part of his package smaller? :-)


    This blog brought to you by ⪪ (U+2aaa, aka SMALLER THAN)

  • Sorting it all Out

    Tavultesoft is one of the company names mispronounced more often than Trigeminal

    • 13 Comments

    Regular readers may recall that I have mentioned Marc Durdin in the past, especially in posts like the recent The key to key messages is a key contribution, where I went on for a bit about the fact he impresses me professionally....

    I also enjoyed the Australian beer that he and Gary McMullan brought for me when I saw them last. I suppose that might be being happy with the two of them personally. :-)

    I vaguely recall that the night when Marc and Peter Constable were ordering Thai food in Thai was also fun. If he were not spending so much of his life down under he'd be cool to hang out with, I imagine.

    And Marc's father John Durdin has probably forgotten more about Lao than I might ever have the opportunity to know even if I moved to Laos tomorrow and spent the rest of my life there. And I am not just talking about his sorting efforts, which we wouldn't be as good as even if we were working properly. Note that this paragraph has nothing to do with anything, except to point out that his dad impresses me too!

    Anyway, I bring up Marc for a reason.

    The other day something very cool happened.

    Tavultesoft joined the Unicode Consortium as an Associate Member!

    Tavultesoft Pty Ltd.

    In their own words from their site:

    Tavultesoft is the developer of a market-leading keyboard mapping software, Keyman. Keyman brings a simple solution to the complexity of typing in a range of languages and scripts. It is the solution for languages that are either unsupported or only partly supported by the operating system. The Keyman product family includes keyboard design tools, Windows-based keyboard mapping, and web-based JavaScript keyboards. Keyman has attracted users from around the globe who both benefit from the software and the keyboard layouts available. Linguistic experts around the globe contribute their expertise and skills to develop keyboard layouts for both common languages and languages that otherwise would have no support on computers. Keyman is now in its 7th release since it was first developed in 1992.

    Now Keyman is a product that I think is cooler than MSKLC for several reasons, including the obvious such as the fact that it covers scenarios that MSKLC doesn't, such as Win9x.

    Though one of the things that really impresses me is that Marc Durdin of Tavultesoft actually dug in to all of the Text Services Framework interfaces and such and figured them out. Enough to produce a working product, and enough to be able to push back on Microsoft when they ran into bugs -- which more often than not were actual bugs and limitations in the Text Services Framework!

    The people who can dig in to complex components like this (the work of Rick Cameron of Crystal Decisions to support Uniscribe and also to support and encourage the extension of MSLU while it was in early beta under development and there was no information about it is another example) are impressive because the lack of samples and sometimes even documentation does not daunt them.

    They know that we are likely full of crap if we claim it's easy since we don't have samples out there, but they go in and figure out the hard stuff anyway.

    Plus the many things that Keyman can do -- that MSKLC and Text Based TSF TIPs can't -- make it fairly unique among such tools and required for sensible input methods in more languages than many experienced folks in this area can fathom....

    Anyway, enough gushing. Welcome, Tavultesoft, to Unicode, as an associate member!


    This blog brought to you by ຟ (U+0e9f, aka LAO LETTER FO SUNG)

  • Sorting it all Out

    It's the End[UpdateResource] of the world as we know it

    • 4 Comments

    It was late last week when Maksim asked a very interesting question via email to one of those large aliases at Microsoft:

    SUBJECT: EndUpdateResource failing after adding certain number of items with UpdateResource

    Hi,

    It appears that there is a bug (or undocumented behavior anyway) with BeginUpdate/Update/EndUpdateResource functions.

    When I am adding more than certain number of resources this way, EndUpdateResource returns with error ERROR_INVALID_DATA. The exact count of items is not always the same and varies depending on the length of resource names and resource types that I have.

    After running several experiments I have discovered that the problem occurs according to the following formula:

    (Cumulative Resource Names Length) + (Resources Count) * 25 + (Cumulative Resource Types Length) + (Resource Types Count) * 13 > 2040

    Can someone please say if there is a bug and if my assumed formula is correct? Or maybe there is some other workaround apart from doing EndUpdateResource after adding each resource.

    My source code is below, the dll where I updated resources is a simple dll without any code:

    #include <windows.h>
    #include <cstdlib>
    #include <iostream>
    #include <string>

    using namespace std;

    // Builds a resource name of exactly `length` characters: 'X' padding
    // followed by a random number, so that names are unlikely to collide.
    wstring MakeLongName(size_t length) {
          wchar_t buffer[65] = {};      // zero-initializes all 65 characters, not just 65 bytes
          _itow_s(rand(), buffer, 65, 10);
          wstring randomPart = buffer;
          wstring result(length - randomPart.length(), L'X');
          result.append(randomPart);
          return result;
    }

    int wmain() {
          // Work on a copy so that the original test DLL stays untouched.
          CopyFileW(L".\\testdll.dll", L".\\testdll1.dll", FALSE);

          // TRUE: delete any resources already present in the copy.
          HANDLE hLibrary = BeginUpdateResourceW(L".\\testdll1.dll", TRUE);
          if (hLibrary == NULL) {
                cout << "Failed to BeginUpdateResource. Error: " << GetLastError() << endl;
                return 1;
          }

          // Queue ten 100-byte resources of type "Y" with 230-character names.
          for (int i = 0; i < 10; i++) {
                BYTE data[100] = {};
                wstring longName = MakeLongName(230);

                if (!UpdateResourceW(hLibrary, L"Y", longName.c_str(), MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL), data, sizeof(data))) {
                      cout << "Failed to UpdateResource. Error: " << GetLastError() << endl;
                      EndUpdateResourceW(hLibrary, TRUE);   // TRUE: discard the pending changes
                      return 1;
                }
          }

          // FALSE: write the queued changes to the file. This is the call that fails.
          if (!EndUpdateResourceW(hLibrary, FALSE)) {
                cout << "Failed to EndUpdateResource. Error: " << GetLastError() << endl;
                return 1;
          }
          return 0;
    }

    I had not seen this come up before, but this is a function I have found interesting since all the way back when we implemented the resource updating functions in MSLU (described here).

    The answer to this particular riddle came from developer Paul:

    EndUpdateResource fails if it cannot extend the .rsrc section of your DLL. I’ve seen this happen if the .rsrc section isn’t the last section in the image – and that’s frequently the case (a few experiments show that .reloc usually follows .rsrc using the Microsoft linker). Annoyingly, LINK.EXE always seems to insert a .reloc section, even if you have a resource-only DLL. (The formula you discovered is an approximation for “the .rsrc section cannot be extended”.)
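    For anyone who wants to check a batch of updates against the formula from Maksim's mail before calling EndUpdateResource, it can be written down as a small portable helper. This is a sketch under assumptions: ResourceId, EstimateCost and LikelyToFail are hypothetical names, each distinct resource type is assumed to be counted once, and the 2040 threshold is only the empirical approximation from the mail, not a documented limit.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One queued update: the resource type and the resource name.
struct ResourceId {
    std::wstring type;
    std::wstring name;
};

// Threshold observed in the experiments above (an approximation).
constexpr std::size_t kObservedLimit = 2040;

// (names length) + (resource count) * 25 + (types length) + (type count) * 13
std::size_t EstimateCost(const std::vector<ResourceId>& resources) {
    std::set<std::wstring> types;  // assumption: each distinct type counts once
    std::size_t cost = 0;
    for (const ResourceId& r : resources) {
        cost += r.name.length() + 25;
        types.insert(r.type);
    }
    for (const std::wstring& t : types) {
        cost += t.length() + 13;
    }
    return cost;
}

bool LikelyToFail(const std::vector<ResourceId>& resources) {
    return EstimateCost(resources) > kObservedLimit;
}
```

    For the test program in the mail (ten resources of type "Y" with 230-character names) the estimate is 2300 + 250 + 1 + 13 = 2564, which is over the threshold and consistent with the failure reported.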

    Now as to whether this is a bug or by design....

    It really is by design.

    Twice.

    Now I am not going to dig into the format of PE files, since for that you can look at:

    to get the lowdown here.

    So for the first by design we'll look to the linker.

    When the Microsoft Linker (LINK.EXE) does its work it makes a lot of sense that it makes the .reloc section last rather than the .rsrc section, because the latter is more or less gunk that is already compiled by the Microsoft Resource Compiler (RC.EXE) and which it does not really need to modify -- it just has to align it, while the former is the section that it arguably has to do some of its hardest work in to have all of the relocation entries.

    Matt also has a less cynical reason he mentions in that second article:

    Working backwards from the end of the executable, if there is a .debug section in the OBJs, it's placed last in the executable. In the absence of a .debug section, the linker tries to put the .reloc section last because, in most cases, the Win32 loader won't need to read the relocation information. Cutting down the amount of the executable that needs to be read decreases the load time.

    Then for the second by design we'll look to the EndUpdateResource function and its cousins (BeginUpdateResource and UpdateResource), though really that first function I mentioned is the real bad boy here.

    While it does a bunch of work inside the .rsrc section, it doesn't start mucking around a whole bunch with the rest of the PE file. Reordering sections just falls a bit outside of its current beat, if you know what I mean.

    Paul had some thoughts about workarounds:

    If you have control over how “testdll1.dll” is created, you might be able to figure out how to manipulate the PE sections so that .rsrc always goes last. In my code, I was able to start with a hand-crafted resource-only PE file which had only a .rsrc section.

    Matt's first article gives some info on removing the .reloc section:

    If you do decide to remove relocations, there are three ways to do it. The easiest is to specify the /FIXED switch on the linker command line. Alternatively, you can run the REBASE program with the -f option on your executable. REBASE comes with the Win32 SDK. The third way to remove relocations is the new RemoveRelocations function in the Windows NT 4.0 IMAGEHLP.DLL. My sample code below shows how to use RemoveRelocations.

    Though to be honest this is something I try to avoid, especially with /FIXED, because I have seen multiple sources that suggest this to be a bad idea for two reasons:

    • If the file has to be relocated then it simply won't load, even if it's a resource-only DLL, unless you load it via LoadLibraryEx with the LOAD_LIBRARY_AS_DATAFILE type flags;
    • On debug builds, it seems that sometimes the Microsoft Linker still adds a .reloc section, even if you pass /FIXED, something I have not seen documented.

    Though your mileage may vary.

    And of course someone could write a tool to simply do the reordering of these two sections in the binary; the principal thing to worry about (and the easiest bit to mess up) is not aligning things properly, but that isn't too hard, so it might be worth just grabbing the source from Matt's PEDUMP (used in the last two articles on the list above) and the code to remove the .reloc section from the second one to use as a start and then working to just write the whole file out with these two sections reordered.
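    The alignment part of such a tool mostly comes down to rounding sizes and offsets up to the FileAlignment and SectionAlignment values from the PE optional header. A minimal sketch of that one step follows; AlignUp is a hypothetical helper name, and the 0x200 and 0x1000 values mentioned afterwards are common defaults assumed for illustration, not values read from any real file.

```cpp
#include <cassert>
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
std::uint32_t AlignUp(std::uint32_t value, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

    With a FileAlignment of 0x200, a section holding 0x2A4 bytes of data would take up AlignUp(0x2A4, 0x200) = 0x400 bytes in the file; with a SectionAlignment of 0x1000 the same section would span AlignUp(0x2A4, 0x1000) = 0x1000 bytes in memory.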

    Now if someone were to decide to fix it -- to unmark the by design flag on it -- whose job would it be?

    On the whole I'd say the fix should be in the EndUpdateResource function, for several reasons:

    • If my conjecture about the linker's operations is true, there is no need to make its work more complicated here;
    • There are very good reasons to not formally document or tie down the rules of image layout produced by the linker -- something that fixing this issue would do;
    • The potential performance benefit to putting the .reloc which is often not needed at the end and the .rsrc which is usually needed not at the end just makes sense;
    • The only people who might care about the section order are the people who call the EndUpdateResource function, so changing the rules for how everything is built when only a small number of people would need it would be less than ideal;
    • The limitation itself is clearly in the EndUpdateResource function, and there are real benefits to having bugs fixed where they are instead of architecting around them.

    Of course now we get to the really unfortunate aspect of all of this.

    In Windows, there are some components with specific owners, and others that are really considered to be very shared, with no specific owner who would be responsible for doing major updates.

    Many times that "no owner" status comes in code that has not required changes in a long time.

    Code of that sort often finds new owners via the "Chess move" theory of development -- i.e. "you touched it, you own it", but the resource updating functions (BeginUpdateResource, UpdateResource, and EndUpdateResource) have proven quite resilient to this, with people who modify them managing to avoid becoming owners except within the scope of their own changes.

    So finding someone to volunteer to own this particular change could prove to be a challenge (especially since one can fall back on the whole by design thing!).

     

    This blog brought to you by ㊮ (U+32ae, aka CIRCLED IDEOGRAPH RESOURCE)

  • Sorting it all Out

    They weren't on crack; they were just on a new, unknown track!

    • 1 Comments

    Recently, it has been interesting to note how the microsoft.public.platformsdk.mslayerforunicode newsgroup has been getting traffic, but traffic that has nothing whatsoever to do with MSLU, the Microsoft Layer for Unicode on Windows 95/98/Me Systems.

    And the other day I got a mail from C++ MVP Mike that explains what is going on here!

    His mail:

    Did this screen always say "Unicode" when describing the mslayerforunicode newsgroup?  (scroll down on tree on the left also)

    http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.platformsdk.mslayerforunicode&cat=en_US_5fcd6081-85f7-467c-9e45-91ff49835c3b&lang=en&cr=US

    may finally explain why we've seen non-MSLU based questions posted to this newsgroup in the past, and have unjustly scolded people for doing so :)  Or maybe the change was recent.  I'm just wondering if you remembered whether this always said "Unicode", on the tree and in the heading on the pane ("Discussions in Unicode")

    Well, they say a picture is worth 1000 words:

    Well, that explains it!

    I guess I'm old school, getting at newsgroups through NNTP style portals rather than HTML style ones. Which is why I never noticed this one.

    But it looks like this "simplification" layer is used with lots of the groups. It is just our bad luck that the one used in this particular group is so misleading....

    Or is this a call to repurpose the group? We have the bull by the horns, maybe we should just hang on and ride it?

    I mean, given the whole MSLU support question is playing itself out, and the lack of heavy activity on the international newsgroups in general.

    But then of course there is the name under NNTP (microsoft.public.platformsdk.mslayerforunicode) to contend with.

    Maybe no one uses NNTP anymore.

    What do you think?

     

    This post brought to you by ! (U+0021, a.k.a. EXCLAMATION MARK)

  • Sorting it all Out

    Is that character in the font or isn't it?

    • 4 Comments

    Regular reader Yaytay asks over in the Suggestion Box:

    How can I find out, from code, (reliably and completely:-) ) which fonts support a given character?

    I've tried using GetGlyphIndices, but there are still some fonts that return non-zero values for glyphs they don't have.

    My comment from your blog here:
    http://blogs.msdn.com/michkap/archive/2007/01/31/1563080.aspx

    The utility converts a regex into a list of unicode characters (based on their name) and then when you select one of those characters it displays that character in all installed fonts that support the character.

    It is based on the GetGlyphIndices function.

    Unfortunately some fonts return a non-zero value for a given character, but don't actually support it (displaying the default rectangle when used).

    Is there a more reliable way to determine support for a code-point in a font?

    When I've got this utility working more reliably I'll make it available if anyone wants it.

    Rather than using GetGlyphIndices, I always found myself using GetGlyphOutline, instead.

    The bias probably comes from the work I did in MSLU to support GetGlyphOutlineW on Win9x platforms, that I mentioned a few years back in Getting all of the localized names of a font, to date the blog that still gets the most U+fffd-filled spam comments....

    Though if GetGlyphIndices is mapping code points to the notdef glyph, then GetGlyphOutline might be as well. So this may or may not be a solution.

    Perhaps you could take the hint from Getting all of the localized names of a font and grab the CMAP directly.

    Or even better try by a different route -- via a ScriptGetCMap call? Though this could get expensive across all of Unicode, across all fonts.

    And since the PSDK topic for ScriptGetCMap spells out how to deal with the default glyph (aka the NOTDEF glyph), at least the people behind the function have given things some thought:

    This function can be used to determine the characters in a run that are supported by the selected font. The application can scan the retrieved glyph buffer, looking for the default glyph to determine characters that are not available. The application should determine the default glyph index for the selected font by calling ScriptGetFontProperties.

    The return value for this function indicates the presence of any missing glyphs.

    Some code points can be rendered by a combination of glyphs, as well as by a single glyph, for example, 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if the font supports the capital E glyph and the acute glyph, but not a single glyph for 00C9, ScriptGetCMap shows that 00C9 is unsupported. To determine the font support for a string that contains these kinds of code points, the application can call ScriptShape. If the function returns S_OK, the application should check the output for missing glyphs.

    Kind of gives a roadmap to how to think about the problem, and inspires some confidence that they are on the right track. :-)
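
    To make that roadmap a bit more concrete, here is a minimal sketch in C of the "scan the glyph buffer for the default glyph" step. The helper names (count_default_glyphs, count_unsupported_chars) and the 256-character limit are my own assumptions for illustration; the Uniscribe calls are only compiled on Windows. It will not help with a font whose cmap claims a glyph it does not really have, so treat it as a starting point rather than a complete answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the entries in a glyph buffer that equal the font's default
   (NOTDEF) glyph index, i.e. characters the font does not map. */
static size_t count_default_glyphs(const uint16_t *glyphs, size_t count,
                                   uint16_t default_glyph)
{
    size_t missing = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (glyphs[i] == default_glyph)
            missing++;
    }
    return missing;
}

#ifdef _WIN32
#include <windows.h>
#include <usp10.h>   /* link with usp10.lib */

/* Returns how many characters of text map to the default glyph in the
   font currently selected into hdc, or -1 on failure.
   Assumption: cch is at most 256, to keep the sketch short. */
static int count_unsupported_chars(HDC hdc, const WCHAR *text, int cch)
{
    SCRIPT_CACHE cache = NULL;
    SCRIPT_FONTPROPERTIES props;
    WORD glyphs[256];
    int result = -1;

    if (cch <= 0 || cch > 256)
        return -1;

    ZeroMemory(&props, sizeof(props));
    props.cBytes = sizeof(props);

    /* ScriptGetCMap returns S_FALSE when some glyphs are missing,
       which still counts as success here. */
    if (SUCCEEDED(ScriptGetFontProperties(hdc, &cache, &props)) &&
        SUCCEEDED(ScriptGetCMap(hdc, &cache, text, cch, 0, glyphs))) {
        result = (int)count_default_glyphs(glyphs, (size_t)cch,
                                           props.wgDefault);
    }
    ScriptFreeCache(&cache);
    return result;
}
#endif
```

    Running this across all of Unicode and all installed fonts would be as expensive as warned above, so a real utility would probably only check the characters the user actually selects.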

     

    This post brought to you by (U+fffd, a.k.a. REPLACEMENT CHARACTER)

  • Sorting it all Out

    The hazards of appropriate cleanup

    • 0 Comments

    There was a mail thread that happened recently on one of those "if you aren't a fulltime employee then why the hell are you here?" kind of aliases, one that I belong to because by following along the problems (and occasionally looking at the remotes) I became much better at debugging.

    Plus sometimes I even have unique information about a problem (since my areas of expertise occasionally come up!). A few times I even ended up with International Fundamentals consulting work, helping people out with issues that I noticed. It isn't really ambulance chasing, but it is being somewhat near the ambulances in case the patients happen to need my help. :-)

    Anyway, they were talking about one of those interesting kind of bugs that pop up from time to time where somebody was delay loading a library during process exit and of course this was leading to other interesting problems.

    That horror story Raymond covered in Quick overview of how processes exit on Windows XP? That came up at one point, kind of a reminder about all the things that happened on thread tear-down that you just didn't want to bother with on process tear-down. Because they take too much time and may deadlock (plus in the case of this bug, actually did).

    And then someone asked me if I had this problem with the MSLU (Microsoft Layer for Unicode on Win9x Systems). He knew that we did some unload work....

    Actually, we don't do much here.

    What we almost did, that is the interesting part....

    We added a function to UNICOWS.LIB that you can find if you spelunk through the symbols (which some folks have) named:

    ___FreeAllLibrariesInMsluLoader

    And a sister function exported from the UNICOWS.DLL called:

    __FreeAllLibrariesInMsluDll

    These functions take no parameters and need none, either -- the former calls FreeLibrary on everything it called LoadLibrary on in the code inside the .LIB that is compiled into the application, and the latter does the same with all of the libraries loaded by the DLL, and clears all the function pointers.

    Technically, these actions are useful in the case of MSLU being completely unloaded, which is why they exist. There was even some discussion at the time about putting it in the DLL_PROCESS_DETACH code in MSLU's DllMain, which at the time just had TLS cleanup code and such.

    But in the end, the decision was to leave the call out, since most people did not need function pointers private to MSLU to be so religiously unloaded -- given that once MSLU is unloaded no one could be calling them anyway.

    In fact, the main benefit of them is for when the thing that loaded MSLU may have leaked out function pointers to MSLU functions -- which means someone else may try to call the DLL. If it still happens to be in memory but its TLS information is gone, trying to call into it can cause it to use random memory information as if it was valid handles and function pointers. And in such cases it is much better for the code to crash immediately (which it will when all the pointers have been changed to NULL).
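
    The general pattern can be sketched in C roughly like this. To be clear, this is not the MSLU source: the LoadedLibrary bookkeeping struct and the free_all_libraries name are hypothetical, made up to illustrate "free everything that was loaded and NULL every cached pointer".

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE module_handle;
#define RELEASE_MODULE(h) FreeLibrary(h)
#else
/* Stand-in so the sketch also compiles off Windows. */
typedef void *module_handle;
#define RELEASE_MODULE(h) ((void)(h))
#endif

typedef void (*generic_fn)(void);

/* Hypothetical bookkeeping for one lazily loaded library: the module
   handle plus the function pointers that were resolved from it. */
typedef struct {
    module_handle module;
    generic_fn   *functions;       /* array of cached pointers */
    size_t        function_count;
} LoadedLibrary;

/* Release every library that was loaded and set every cached function
   pointer to NULL, so a stale call faults right away instead of
   running on garbage. */
static void free_all_libraries(LoadedLibrary *libs, size_t lib_count)
{
    size_t i, j;

    for (i = 0; i < lib_count; i++) {
        /* Clear the pointers first so nothing can use them while the
           module is going away. */
        for (j = 0; j < libs[i].function_count; j++)
            libs[i].functions[j] = NULL;

        if (libs[i].module != NULL) {
            RELEASE_MODULE(libs[i].module);
            libs[i].module = NULL;
        }
    }
}
```

    The point of the NULL assignments is the one made above: a call through a leaked pointer then fails immediately and obviously, rather than wandering off into whatever happens to be in memory.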

    A rare scenario, obviously. But one that is important every once in a great while.

    And people have found these functions in the past; there are even components and applications that use them due to their wholesome, cleanupy sound.

    So I figure they at least deserve a mention, now.... :-)

    This blog brought to you by ᠃ (U+1803, a.k.a. MONGOLIAN FULL STOP)

  • Sorting it all Out

    How do[es what] the common controls [call ]convert between ANSI and Unicode?

    • 0 Comments

    The other day, Raymond Chen blogged about How do the common controls convert between ANSI and Unicode?, in response to a question in his suggestion box:

    In the context of an ansi (not unicode) app: How do the common controls (listview for example) decide which code page to use when translating multibyte to widestring?

    I had to debug an ansi app that was displaying corrupt strings on a traditional chinese system because the dialog font was causing the listview to use a codepage other than the system ACP when translating multibyte to widechar.

    Although I would seldom if ever disagree with Raymond about anything that builds out of the Shell depot, in this particular case I know of two specific exceptions to the CP_ACP rule one generally sees, though the differences may have less of a direct relationship to the Shell/comctl32 code, meaning he might still be right within his domain. :-)

    The two other behaviors I have run across in various versions of the common controls:

    • Use of the thread code page (CP_THREAD_ACP)1.
    • Use of the code page associated with the font charset selected in a device context.

    I honestly don't know much about the first one, but I remember reports of bugs where changing the thread locale (which changes the thread code page) would change the behavior here, and particularly on the pre-6.0 controls there was a real ANSI-Plus thing going on here that tried to move beyond CP_ACP, so while I had no proof it was true I suspected it might be.

    The second one, I have more insight into since I had to debug it on a few occasions -- basically the text would not always be converted to Unicode at all; and the ANSI text is sent to GDI with a DC containing a font set to use a charset most associated with some other code page. GDI would then do its job to render and make choices there that it was kind of asked to, in a bizarre and not well understood sense.
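
    To show what "the code page associated with the font charset" means in practice, here is a small C sketch of the association. The table and the code_page_from_charset helper are hypothetical stand-ins for what TranslateCharsetInfo looks up; the charset numbers are the wingdi.h values, and real code would ask the API rather than carry its own table.

```c
#include <stddef.h>

/* A font charset value (as defined in wingdi.h) paired with the ANSI
   code page it is conventionally associated with. */
typedef struct {
    unsigned char charset;
    unsigned int  code_page;
} CharsetCodePage;

static const CharsetCodePage k_charset_table[] = {
    {   0, 1252 },  /* ANSI_CHARSET        */
    { 128,  932 },  /* SHIFTJIS_CHARSET    */
    { 129,  949 },  /* HANGUL_CHARSET      */
    { 134,  936 },  /* GB2312_CHARSET      */
    { 136,  950 },  /* CHINESEBIG5_CHARSET */
    { 161, 1253 },  /* GREEK_CHARSET       */
    { 162, 1254 },  /* TURKISH_CHARSET     */
    { 163, 1258 },  /* VIETNAMESE_CHARSET  */
    { 177, 1255 },  /* HEBREW_CHARSET      */
    { 178, 1256 },  /* ARABIC_CHARSET      */
    { 186, 1257 },  /* BALTIC_CHARSET      */
    { 204, 1251 },  /* RUSSIAN_CHARSET     */
    { 222,  874 },  /* THAI_CHARSET        */
    { 238, 1250 },  /* EASTEUROPE_CHARSET  */
};

/* Returns the code page tied to charset, or fallback_cp (for example
   the system ACP) when the charset is not in the table. */
static unsigned int code_page_from_charset(unsigned char charset,
                                           unsigned int fallback_cp)
{
    size_t i;
    size_t n = sizeof(k_charset_table) / sizeof(k_charset_table[0]);

    for (i = 0; i < n; i++) {
        if (k_charset_table[i].charset == charset)
            return k_charset_table[i].code_page;
    }
    return fallback_cp;
}
```

    So on a Traditional Chinese system (ACP 950), text drawn with a font whose charset is ANSI_CHARSET would end up being interpreted as code page 1252, which is the kind of mismatch behind the corrupt strings in the original question.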

    As a rule, any time GDI tries to get into NLS stuff, the results are predictable -- buggered, every time. Thus we have problems like the ones I pointed out in What the hell is wrong with TranslateCharsetInfo, anyway?. Between problems like that and the one discussed in Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!) and Sometimes when you say 'the fix is in' you mean it in a good way, one thing is clear: the GDI folks should consider taking a trip over to the NLS team and giving them all atomic wedgies.

    Just kidding, but you know what I mean.

    For the Common Controls, when I was doing MSLU work I ran across many cases where having the latest updates on Win9x would give a lot of GDI-influenced support of text where adding MSLU and a CP_ACP mechanism broke test applications until I changed the code to do something more like this to get the code page to convert with:

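    /* Returns the code page tied to the charset of the font selected into hdc.
       Falls back to g_acp, a global assumed here to hold the system ANSI code page. */
    UINT CpgFromHdc(HDC hdc) {
        int chs;
        CHARSETINFO csi;

        chs = GetTextCharset(hdc);
        /* With TCI_SRCCHARSET the charset value itself is passed in the
           pointer argument, not the address of a variable holding it. */
        if(TranslateCharsetInfo((DWORD *)(DWORD_PTR)chs, &csi, TCI_SRCCHARSET))
            return(csi.ciACP);
        else
            return(g_acp);
    }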

    So anyway, the CP_ACP rule should be the only rule, but there are way too many pieces of Windows that assume they know better what to use....

    On the other hand, so do I -- UNICODE! :-)

    1 - Now you know how I feel about this one if you've ever seen Nothing stinks worse than the thread locale, other than the thread code page. I think I was fairly unsubtle on my feelings.

    This blog brought to you by ਸ਼ (U+0a36, aka GURMUKHI LETTER SHA)

  • Sorting it all Out

    Fight the Future? (#10 of ??), aka Looks like I wasn't mistaken

    • 3 Comments

    Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
    Regular readers should keep in mind that all I said in The End? still applies; the allusion to the X-Files continues for people who understand such references....

    Over eight months ago when I blogged Help prove that I am wrong, please, I honestly did hope that I was wrong and that this new microsoft.public.windows.international newsgroup would be a place where people who were not sure where to post would feel comfortable doing so.

    Created at the behest of some folks on the MUI team who as far as I can tell haven't ever actually posted there (though perhaps they are monitoring), the group has to date never received any actual traffic. I think that at least for now it is safe to say that I was right and that this group is not necessary for the stated purpose of providing a new, less confusing alternative to the other internationalization-esque newsgroups, since no one has actually posted there.

    There have been many new threads started by what appear to be new folk in other groups (mostly in microsoft.public.win32.programmer.international), and very few posts anywhere were uncertain or confused about where to post; people just seemed to have moved past that issue without major quandaries....

    This suggests that

    • No group was needed, or
    • The name is just as non-intuitive as prior attempts, or
    • Both.

    Perhaps it will catch on eventually, and if so I'll recant at that time.

    But for now, it is clear that this group appears to be the latest in the grand tradition of unnecessary and poorly named "globalization-type" newsgroups....

     

    This post brought to you by ! (U+0021, a.k.a. EXCLAMATION MARK)

  • Sorting it all Out

    How to avoid that problem of never being 'up to date'

    • 0 Comments

    I am never the type of superior elitist code snob who feels above people who improve on the things that I do. Or have done.

    And even if something is no longer officially supported by Microsoft, I am not the sort to just ignore people who are still trying to work productively with code that I have produced in the past....

    Like the other day over in the microsoft.public.platformsdk.mslayerforunicode newsgroup, Igor Solodovnikov wrote:

    To build your project with MSLU it is recommended (http://msdn.microsoft.com/msdnmag/issues/01/10/MSLU/default.aspx) to include following in the beginning of project's link list:

    /nod:kernel32.lib /nod:advapi32.lib /nod:user32.lib /nod:gdi32.lib /nod:shell32.lib /nod:comdlg32.lib /nod:version.lib /nod:mpr.lib /nod:rasapi32.lib /nod:winmm.lib /nod:winspool.lib /nod:vfw32.lib /nod:secur32.lib /nod:oleacc.lib /nod:oledlg.lib /nod:sensapi.lib
    unicows.lib
    kernel32.lib advapi32.lib user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib
    <you own libraries>

    There is problem when you try to use MSLU in Visual Studio 2005: if you add recommended list of nod's and libraries to "Linker::Input::Additional dependencies" option in project's property pages then every time you will start debug session using "Debug->Start Debugging" or build your project using "Build->Build myprj" you will get something like this:

        ------ Build started: Project: myprj, Configuration: Debug Win32 ------
        Linking...
        Embedding manifest...
        Build Time 0:10
        Build log was saved at "file://c:\myprj\Debug\BuildLog.htm"
        myprj - 0 error(s), 0 warning(s)
        ========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========


    So i want to say that your project will never be in up-to-date state. Interestingly enough that this problem does not appear if you build problematic solution from command line using msbuild tool:

        msbuild mysolution.sln /p:Configuration=UDebug
     
    This means that building solution from command line is slightly differs from building from IDE.

    There is workaround for this problem:
    1. Include the following in "Linker::Input::Ignore Specific Library" option:
        kernel32.lib advapi32.lib user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib
    2. Include the following in "Linker::Input::Additional dependencies" option:
        unicows.lib
        kernel32.lib advapi32.lib user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib
        <you own libraries>


    Using such settings your project will be up-to-date when it should.

    That link at the beginning of what he wrote is to the article Cathy Wissink and I wrote in October 2001 (Develop Unicode Applications for Windows 9x Platforms with the Microsoft Layer for Unicode) and although it has obviously been many years and I can forget details in co-author situations as to who added what information, I am pretty sure the linker settings were part of what I contributed (with help from others behind the scenes, of course!).

    Anyway, like I said I have no problem with people suggesting enhancements to what I do.

    The only advantage that the original plan has over Igor's enhancement is that it is slightly easier to use one setting than two, but the benefit pales when compared to not forcing rebuilds even when no rebuild is needed. Therefore I am happy to agree that this is a better idea.

    So if you are using MSLU and you build it in Visual Studio, feel free to avail yourself of this advice!

     

    This post brought to you by ｕ (U+ff55, aka FULLWIDTH LATIN SMALL LETTER U)

  • Sorting it all Out

    The 9.0 instructions for building MFC and the CRT to use MSLU?

    • 1 Comments

    Now if you look over on the side of the blog, you will see an expandable group entitled Rebuilding MFC and the CRT with MSLU, with links under it for MFC/CRT 6.0, MFC/CRT 7.0, MFC/CRT 7.1, and MFC/CRT 8.0.

    Now that Visual Studio 2008 (aka "Orcas") has been released to manufacturing, people might consider it perfectly reasonable to be looking for the 9.0 instructions as well.

    Helpful MVP Mike (the one behind all of the previous versions of the instructions, with help from Ted W.), sent me a piece of mail about this not too long ago, actually!

    The mail read:

    For a brief moment, I had a thought: maybe I should do a 9.0 version of these instructions

    http://blogs.msdn.com/michkap/articles/478235.aspx

    then I quickly remembered that 9.0 (Visual C++ 2008) doesn't even support Windows 9x.   Something that doesn't even run on 9x under ANSI, should I try to get working under Unicode? No, methinks there is no demand for this one.  Haha, oh well, it was fun while it lasted. 

    You know what? He's right.

    Here is in fact a list of all of the related technologies that really no longer support Win9x:

    • Windows itself (ref)
    • MSLU itself (ref)
    • The 9.0 Visual C++ tools, including MFC and the CRT (ref)

    Given that it is not supported to put the tools on Win9x and that Win9x itself is no longer supported, building instructions for the 9.0 version does seem slightly unrealistic.

    They did show the easiest way to deal with a request they did not want to have to work on (in this case integrating MSLU support into the CRT and MFC) -- when in doubt, just stall until the request no longer makes sense!

    The plus side of all of this is that the need for non-Unicode support has been given a swift kick in the ass and it might be time to turn up the heat on better conversion of projects to Unicode in future versions of Visual Studio.... :-)

     

    All of the characters in Unicode have taken off for Grand Cayman for the Christmas holiday weekend
    (they are staying at the Marriott Grand Cayman Beach Hotel in case you are there and are curious at all the characters hanging out by the pool!)
