• The Old New Thing

    Restoring symbols to a stack trace originally generated without symbols

    Has this ever happened to you?

    litware!Ordinal3+0x6042
    litware!DllInstall+0x4c90
    litware!DllInstall+0x4b9e
    contoso!DllGetClassObject+0x93c3
    contoso!DllGetClassObject+0x97a9
    contoso!DllGetClassObject+0x967c
    contoso!DllGetClassObject+0x94d7
    contoso!DllGetClassObject+0x25ce
    contoso!DllGetClassObject+0x2f7b
    contoso!DllGetClassObject+0xad55
    contoso!DllGetClassObject+0xaec7
    contoso!DllGetClassObject+0xadf7
    contoso!DllGetClassObject+0x3c00
    contoso!DllGetClassObject+0x3b2a
    contoso!DllGetClassObject+0x462b
    USER32!UserCallWinProcCheckWow+0x13a
    USER32!DispatchMessageWorker+0x1a7
    contoso!DllCanUnloadNow+0x19b6
    contoso!DllGetClassObject+0xeaf2
    contoso+0x1d6c
    litware!LitImportReportProfile+0x11c4
    litware!LitImportReportProfile+0x1897
    litware!LitImportReportProfile+0x1a3b
    KERNEL32!BaseThreadInitThunk+0x18
    ntdll!RtlUserThreadStart+0x1d
    

    Ugh. A stack trace taken without working symbols. (There's no way that Dll­Get­Class­Object is a deeply recursive 60KB function. Just by casual inspection, you know that the symbols are wrong.)

    To see how to fix this, you just have to understand what the debugger does when it has no symbols to work from: It uses the symbols from the exported function table. For every address it wants to resolve, it looks for the nearest exported function whose address is less than or equal to the target value.

    For example, suppose CONTOSO.DLL has the following exported symbols:

    Symbol Offset
    Dll­Get­Class­Object 0x5132
    Dll­Can­Unload­Now 0xFB0B

    Look at it this way: The debugger is given the following information about your module: (Diagram not to scale.)

     
      Dll­Get­Class­Object Dll­Can­Unload­Now

    It needs to assign a function to every byte in the module. In the absence of any better information, it does it like this:

    ??? (module start up to 0x5132) | DllGetClassObject (0x5132 up to 0xFB0B) | DllCanUnloadNow (0xFB0B to end of module)

    In words, it assumes that every function begins at the location specified by the export table, and it ends one byte before the start of the next function. The debugger is trying to make the best of a bad situation.

    Suppose your DLL was loaded at 0x10000000, and the debugger needs to generate a symbolic name for the address 0x1000E4F5.

    First, it converts the address into a relative virtual address by subtracting the DLL base address, leaving 0xE4F5.

    Next, it looks to see what function "contains" that address. From the algorithm described above, the debugger concludes that the address 0xE4F5 is "part of" the DllGetClassObject function, which begins at 0x5132. The offset into the function is therefore 0xE4F5 - 0x5132 = 0x93C3, and it is reported in the debugger as contoso!DllGetClassObject+0x93c3.

    Repeat this exercise for each address that the debugger needs to resolve, and you get the stack trace above.
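The lookup described above can be sketched in a few lines of portable C++. This is only a model of the behavior, not the debugger's actual code: the helper name ResolveWithExports is made up for illustration, and the export table is the two-entry example from this article.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Export table from the example above: RVA -> exported name.
const std::map<std::uint32_t, std::string> kContosoExports = {
    {0x5132, "DllGetClassObject"},
    {0xFB0B, "DllCanUnloadNow"},
};

// Hypothetical helper: name an address the way a debugger with only export
// symbols would, using the nearest export at or below the address.
std::string ResolveWithExports(
    const std::string& module, std::uint32_t base,
    const std::map<std::uint32_t, std::string>& exports,
    std::uint32_t address)
{
    const std::uint32_t rva = address - base;  // relative virtual address
    char buf[32];
    auto it = exports.upper_bound(rva);        // first export strictly above rva
    if (it == exports.begin()) {
        // The address comes before the first export: report module+offset.
        std::snprintf(buf, sizeof(buf), "+0x%x", static_cast<unsigned>(rva));
        return module + buf;
    }
    --it;                                      // nearest export at or below rva
    std::snprintf(buf, sizeof(buf), "+0x%x",
                  static_cast<unsigned>(rva - it->first));
    return module + "!" + it->second + buf;
}
```

Calling ResolveWithExports("contoso", 0x10000000, kContosoExports, 0x1000E4F5) reproduces the contoso!DllGetClassObject+0x93c3 entry from the trace, and an address below the first export comes out as a bare module+offset.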

    Fine, now that you know how the bad symbols were generated, how do you fix it?

    You fix it by undoing what the debugger did, and then redoing it with better symbols.

    You need to find the better symbols. This is not too difficult if you still have a matching binary and symbol file, because you can just load up the binary into the debugger in the style of a dump file. Like Doron, you can then let the debugger do the hard work.

    C:\> ntsd -z contoso.dll
    
    ModLoad: 10000000 10030000   contoso.dll
    

    Now you just ask the debugger, "Could you disassemble this function for me?" You give it the broken symbol+offset above. The debugger looks up the symbol, applies the offset, and then looks up the correct symbol when disassembling.

    0:000> u contoso!DllGetClassObject+0x93c3
    contoso!CReportViewer::ActivateReport+0xe9:
    1000e4f5 eb05            jmp     contoso!CReportViewer::ActivateReport+0xf0
    

    Repeat for each broken symbol in the stack trace, and you have yourself a repaired stack trace.
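If you had to do the same repair by hand, the "undo, then redo" step might look like the sketch below. The function names are invented for illustration. The one entry in the "full" symbol table is not taken from a real PDB: its RVA is back-computed from the debugger output above (0xE4F5 - 0xE9 = 0xE40C).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

using SymbolTable = std::map<std::uint32_t, std::string>;  // RVA -> name

// Formats "name+0xoffset" using the nearest symbol at or below rva.
// Assumes rva is not below the first symbol in the table.
std::string Nearest(const SymbolTable& symbols, std::uint32_t rva)
{
    auto it = symbols.upper_bound(rva);
    assert(it != symbols.begin());
    --it;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "+0x%x",
                  static_cast<unsigned>(rva - it->first));
    return it->second + buf;
}

// Undo: convert "export+offset" back into an RVA.
// Redo: resolve that RVA against the better symbol table.
// Returns an empty string if the export name is not in the export table.
std::string Repair(const SymbolTable& exports, const SymbolTable& full,
                   const std::string& exportName, std::uint32_t offset)
{
    for (const auto& entry : exports) {
        if (entry.second == exportName) {
            return Nearest(full, entry.first + offset);
        }
    }
    return std::string();
}
```

With the export table from the example and a full table containing {0xE40C, "CReportViewer::ActivateReport"}, Repair(exports, full, "DllGetClassObject", 0x93c3) yields CReportViewer::ActivateReport+0xe9. In practice, letting the debugger do this for you is much less work.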

    litware!Ordinal3+0x6042 ← oops
    litware!CViewFrame::SetInitialKeyboardFocus+0x58
    litware!CViewFrame::ActivateViewInFrame+0xf2
    contoso!CReportViewer::ActivateReport+0xe9
    contoso!CReportViewer::LoadReport+0x12c
    contoso!CReportViewer::OnConnectionCreated+0x13f
    contoso!CViewer::OnConnectionEvent+0x7f
    contoso!CConnectionManager::OnConnectionCreated+0x85
    contoso!CReportFactory::BeginCreateConnection+0x87
    contoso!CReportViewer::CreateConnectionForReport+0x20d
    contoso!CViewer::CreateNewConnection+0x87
    contoso!CReportViewer::CreateNewReport+0x213
    contoso!CViewer::OnChangeView+0xec
    contoso!CReportViewer::WndProc+0x9a7
    contoso!CView::s_WndProc+0xf1
    USER32!UserCallWinProcCheckWow+0x13a
    USER32!DispatchMessageWorker+0x1a7
    contoso!CViewer::MessageLoop+0x24e
    contoso!CViewReportTask::RunViewer+0x12
    contoso+0x1d6c ← oops
    litware!CThreadTask::Run+0x40
    litware!CThread::ThreadProc+0xe5
    litware!CThread::s_ThreadProc+0x42
    KERNEL32!BaseThreadInitThunk+0x18
    ntdll!RtlUserThreadStart+0x1d
    

    Oops, our trick doesn't work for that first entry in the stack trace, the one with Ordinal3. What's up with that? There is no function called Ordinal3!

    If your module exports functions by ordinal without a name, then the debugger doesn't know what name to print for the function (since the name was stripped from the module), so it just prints the ordinal number. You will have to go back to your DLL's DEF file to convert the ordinal back to a function name. Or you can dump the exports from the DLL to see what functions match up with what ordinals. (Of course, for that trick to work, you need to have a matching PDB file in the symbol search path.)

    In our example, suppose litware.dll ordinal 3 corresponds to the function Lit­Debug­Report­Profile. We would then ask the debugger

    0:001> u litware!LitDebugReportProfile+0x6042
    litware!CViewFrame::FindInitialFocusControl+0x66:
    1000084f5 33db            xor     ebx,ebx
    

    Okay, that takes care of our first oops. What about the second one?

    In the second case, the address the debugger was asked to generate a symbol for came before the first symbol in the module. In our diagram above, it was in the area marked with question marks. The debugger has absolutely nothing to work with, so it just reports the address as an offset relative to the start of the module.

    To resolve this symbol, you take the offset and add it to the base of the module as it was loaded into the debugger, which was reported in the ModLoad output:

    ModLoad: 10000000 10030000   contoso.dll
    

    If that output scrolled off the screen, you can ask the debugger to show it again with the help of the lmm command.

    0:001> lmm contoso*
    start    end        module name
    10000000 10030000   contoso    (export symbols)       contoso.dll
    

    Once you have the base address, you add the offset back and ask the debugger what's there:

    0:001> u 0x10000000+0x1d6c
    contoso!CViewReportTask::Run+0x102:
    10001d6c 50              push    eax
    

    Okay, now that we patched up all our oopses, we have the full stack trace with symbols:

    litware!CViewFrame::FindInitialFocusControl+0x66
    litware!CViewFrame::SetInitialKeyboardFocus+0x58
    litware!CViewFrame::ActivateViewInFrame+0xf2
    contoso!CReportViewer::ActivateReport+0xe9
    contoso!CReportViewer::LoadReport+0x12c
    contoso!CReportViewer::OnConnectionCreated+0x13f
    contoso!CViewer::OnConnectionEvent+0x7f
    contoso!CConnectionManager::OnConnectionCreated+0x85
    contoso!CReportFactory::BeginCreateConnection+0x87
    contoso!CReportViewer::CreateConnectionForReport+0x20d
    contoso!CViewer::CreateNewConnection+0x87
    contoso!CReportViewer::CreateNewReport+0x213
    contoso!CViewer::OnChangeView+0xec
    contoso!CReportViewer::WndProc+0x9a7
    contoso!CView::s_WndProc+0xf1
    USER32!UserCallWinProcCheckWow+0x13a
    USER32!DispatchMessageWorker+0x1a7
    contoso!CViewer::MessageLoop+0x24e
    contoso!CViewReportTask::RunViewer+0x12
    contoso!CViewReportTask::Run+0x102
    litware!CThreadTask::Run+0x40
    litware!CThread::ThreadProc+0xe5
    litware!CThread::s_ThreadProc+0x42
    KERNEL32!BaseThreadInitThunk+0x18
    ntdll!RtlUserThreadStart+0x1d
    

    Now the fun actually starts: Figuring out why there was a break in CView­Frame::Find­Initial­Focus­Control. Happy debugging!

    Bonus tip: By default, ntsd does not include line numbers when resolving symbols. Type .lines to toggle line number support.

    The relationship between module resources and resource-derived objects in 32-bit Windows

    Last time, we saw how 16-bit Windows converted resources attached to an EXE or DLL file (which I called module resources for lack of a better term) to user interface resources. As a refresher:

    16-bit Resources
    Resource type Operation Result
    Icon Load­Icon, etc. Reference
    Cursor Load­Cursor, etc. Reference
    Accelerator LoadAccelerators, etc. Reference
    Dialog Create­Dialog, etc. Copy
    Menu Load­Menu, etc. Copy
    Bitmap Load­Bitmap, etc. Copy
    String Load­String Copy
    String Find­Resource Reference

    During the conversion from 16-bit Windows to 32-bit Windows, some of these rules changed. Specifically, icons, cursors, and accelerator tables are no longer references to the resource. Instead, the resource is treated as a template from which the actual user interface resource is constructed.

    32-bit Resources
    Resource type Operation Result
    Icon Load­Icon, etc. Copy*
    Cursor Load­Cursor, etc. Copy*
    Accelerator LoadAccelerators, etc. Copy*
    Dialog Create­Dialog, etc. Copy
    Menu Load­Menu, etc. Copy
    Bitmap Load­Bitmap, etc. Copy
    String Load­String Copy
    String Find­Resource Reference

    Uh-oh, what's up with those asterisks?

    Let's start with accelerator tables. In order to simulate the reference semantics of 16-bit accelerator tables, the copy is cached with a reference count, so that if you ask for the same accelerator table 1000 times, the first request creates a new accelerator table, and the other 999 requests just increment the reference count and return the same handle back. The result is that the window manager emulates reference semantics, but with an initial copy. When the reference count on an accelerator table drops to zero, then the resource is freed.

    Icons and cursors are the same, only weirder.

    If you pass the LR_SHARED flag, then the window manager simulates reference semantics by creating a copy of the icon or cursor the first time it is requested, and all subsequent requests with the LR_SHARED flag return the same handle back again.¹ The Load­Cursor and Load­Icon functions are just wrappers around Load­Image that pass LR_SHARED, so applications written to the old 16-bit API still work the 16-bit way. (Even today, a lot of applications rely on the old 16-bit behavior.)

    If you don't pass the LR_SHARED flag, then you get a brand new copy of the icon or cursor. Since the only way to get this behavior is to call the new-for-Win32 function Load­Image, there is no compatibility issue.

    Based on the above discussion, we can flesh out the table a bit more:

    32-bit Resources
    Resource type Operation Result
    Icon LoadIcon or LoadImage with LR_SHARED Cached copy
    Icon LoadImage without LR_SHARED Copy
    Cursor LoadCursor or LoadImage with LR_SHARED Cached copy
    Cursor LoadImage without LR_SHARED Copy
    Accelerator LoadAccelerators, etc. Cached copy
    Dialog CreateDialog, etc. Copy
    Menu LoadMenu, etc. Copy
    Bitmap LoadBitmap, etc. Copy
    String LoadString Copy
    String FindResource Reference

    Another way of looking at the above table is to break it into two tables, one for operations that had a 16-bit equivalent, and one for operations that are unique to Win32:

    32-bit Resource Creation Operations with 16-bit Equivalents
    Resource type Operation Result
    Icon Load­Icon Simulated reference
    Cursor Load­Cursor Simulated reference
    Accelerator LoadAccelerators, etc. Simulated reference
    Dialog Create­Dialog, etc. Copy
    Menu Load­Menu, etc. Copy
    Bitmap Load­Bitmap, etc. Copy
    String Load­String Copy
    String Find­Resource Reference


    32-bit Resource Creation Operations Without 16-bit Equivalents
    Resource type Operation Result
    Icon LoadImage with LR_SHARED Simulated reference
    Icon LoadImage without LR_SHARED Copy
    Cursor LoadImage with LR_SHARED Simulated reference
    Cursor LoadImage without LR_SHARED Copy

    Now we can answer an old question: "Do icons created from resources depend on the underlying resource?"

    The answer is no, at least not in 32-bit Windows. The bits are extracted from the module resource data and converted into an icon object, and if you passed the LR_SHARED flag, it is added to the cache of previously-created icons.

    ¹ Update: If you read carefully, you'll realize that LR_SHARED stores the results in a cache and pays no attention to the size. The cache is keyed only by the resource module and ID; the size is ignored. This is why MSDN says "Do not use LR_SHARED for images that have nonstandard sizes."

    Suppose you load a resource with LR_SHARED and a nonstandard size. If you are the first person to load that resource, then the nonstandard size gets loaded and put into the cache. The next person who asks for an LR_SHARED copy of that resource will get the nonstandard-sized resource from the cache regardless of what size they actually wanted.

    Conversely, suppose a standard-size resource is already in the cache. You pass LR_SHARED and a nonstandard size. The cache returns you the original standard-size resource, ignoring the size you requested.

    To avoid this craziness, the rule is that any request for cached resources must use the standard size.
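To see why the size gets lost, here is a toy model of a cache keyed only by module and resource ID. It is a sketch for illustration, not how the window manager is actually implemented; ToyIcon and ToySharedIconCache are invented names, and modules and IDs are plain integers.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Toy stand-in for an icon object: it only remembers the size it was
// created at.
struct ToyIcon {
    int width;
    int height;
};

// Hypothetical model of a shared-image cache. The key is (module, id);
// the requested size is deliberately not part of the key.
class ToySharedIconCache {
public:
    // Returns the cached icon for (module, id) if there is one. Otherwise
    // creates one at the requested size and remembers it.
    const ToyIcon& LoadShared(int module, int id, int cx, int cy)
    {
        const auto key = std::make_pair(module, id);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, ToyIcon{cx, cy}).first;
        }
        return it->second;  // on a cache hit, cx and cy are ignored
    }

private:
    std::map<std::pair<int, int>, ToyIcon> cache_;
};
```

If the first caller asks for resource 100 at 48×48 and the second caller asks for the same resource at 32×32, both get the same 48×48 object back. Whoever gets there first decides the size for everybody.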

    This requirement wasn't a problem in 16-bit Windows because 16-bit Windows had no way of requesting a resource at a nonstandard size. And since LR_SHARED is a new flag introduced in 32-bit Windows, all code which uses it can be expected to understand the Win32 rules.

    The financial acumen of sea turtles

    I dreamed that I was attending some sort of "how to be awesome" seminar where the presenter said, among other things, that a sea turtle, when left to thrive undisturbed, amasses $1 million in personal wealth within one year.

    Why does my radio button group selection get reset each time my window regains activation?

    A customer reported (all incomplete information and red herrings preserved):

    We have an issue related to two radio buttons in a window. The code programmatically checks the second button by sending the BM_SET­CHECK message. We observe that if the user clicks somewhere else on the screen (so that our application loses focus), and then clicks on the taskbar icon to return to our application, the first radio button spontaneously gets selected.

    We watched all the messages in Spy++, and it appears that the radio button is receiving a WM_SETFOCUS followed by a BM_SETCHECK.

    Is this by design? If not, what should I be looking for in my code that is causing this erroneous selection change to occur?

    The incomplete information is that the customer didn't say how they created those radio buttons.

    The red herring is that the customer said that they had a problem with their window. This suggested that they were doing a custom window implementation (because if they were using the standard dialog implementation, they would have said dialog).

    But from the symptoms, it's clear that what's most likely happening is that the radio button is created as a BS_AUTO­RADIO­BUTTON. And automatic radio buttons select themselves automatically (hence the name) when they receive focus.

    That explains the message sequence of WM_SETFOCUS followed by a BM_SETCHECK: The automatic radio button receives focus, and in response it checks itself.

    Therefore, the next level of investigation is why the first radio button is getting focus when the window is activated.

    If the application window is a custom window, then the place to look is their window's activation and focus code, to see why focus is going to the first radio button instead of the second one. Perhaps it is putting focus on the first radio button temporarily, and then later realizes, "Oh wait, I really meant to put it on the second radio button." The fix would be to get rid of the temporary focus change and go straight to the second radio button.

    If the application window is a standard dialog, then we saw last time that the dialog manager restores focus to the window that had focus last, and that you could mimic the same behavior in your own code.

    It turns out that the customer was indeed using a standard dialog, in which case the problem is that they put the dialog into an inconsistent state: They checked the second radio button but left focus on the first radio button. This is a configuration that exists nowhere in nature, and therefore when the dialog manager tries to recreate it (given its lack of specialized knowledge about specific controls), it can't.

    The fix is to put focus on the second radio button as well as setting the check box. In fact, you can accomplish both by setting the focus to the second radio button (noting that there is a special process for setting focus in a dialog box) since you already are using automatic radio buttons.

    Here's a program that demonstrates the problem:

    // scratch.rc
    
    1 DIALOGEX 32, 32, 160, 38
    STYLE DS_MODALFRAME | DS_SHELLFONT | WS_POPUP | WS_VISIBLE |
          WS_CAPTION | WS_SYSMENU
    CAPTION "Test"
    FONT 9, "MS Shell Dlg"
    BEGIN
    CONTROL "First", 100, "Button",
            WS_GROUP | WS_TABSTOP | BS_AUTORADIOBUTTON, 4,  4, 152, 13
    CONTROL "Second", 101, "Button",BS_AUTORADIOBUTTON, 4, 20, 152, 13
    END
    
    // scratch.cpp
    
    #include <windows.h>
    #include <windowsx.h>
    
    INT_PTR CALLBACK DlgProc(
        HWND hdlg, UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
     switch (uMsg) {
     case WM_INITDIALOG:
      SetFocus(GetDlgItem(hdlg, 100));
      CheckRadioButton(hdlg, 100, 101, 101);
      return FALSE;
     case WM_COMMAND:
      switch (GET_WM_COMMAND_ID(wParam, lParam)) {
      case 100:
      case 101:
        CheckRadioButton(hdlg, 100, 101,
                         GET_WM_COMMAND_ID(wParam, lParam));
        break;
      case IDCANCEL: EndDialog(hdlg, 0); break;
      }
     }
     return FALSE;
    }
    
    int WINAPI WinMain(HINSTANCE hinst, HINSTANCE hinstPrev,
                       LPSTR lpCmdLine, int nShowCmd)
    {
     DialogBox(hinst, MAKEINTRESOURCE(1), nullptr, DlgProc);
     return 0;
    }
    

    Observe that we set focus to the first button but check the second button. When the dialog regains focus, the first button gets focus back and fires a WM_COMMAND because it thinks it was clicked on, and in response the dialog box moves the selection to the first button.

    The fix here is actually pretty simple: Let the dialog manager handle the initial focus. Just delete the Set­Focus call and return TRUE, which means, "Hey, dialog manager, you do the focus thing, don't worry about me."

    Another fix is to remove the code that updates the radio buttons in response to the WM_COMMAND message. (I.e., get rid of the entire case 100 and case 101 handlers.) Again, just let the dialog manager do the usual thing, and everything will work out just fine.

    It's great when you can fix a bug by deleting code.

    How do I programmatically create folders like My Pictures if they were manually deleted?

    A corporate customer had a problem with employees accidentally deleting folders like Videos and Pictures and was looking for a way to restore them, short of blowing away the entire user profile and starting over. They found some techniques on the Internet, but those techniques don't always work consistently or completely. What is the recommended way of recreating these missing folders?

    It turns out that the customer was asking a question that I answered many years ago, but looking at it from the other side.

    To recreate a folder, call SHGet­Folder­Path with the flag CSIDL_FLAG_CREATE, or call SHGet­Special­Folder­Path and pass fCreate = TRUE.

    If you are targeting Windows Vista or higher, the known-folder equivalent is calling SHGet­Known­Folder­Path, SHGet­Known­Folder­ID­List, or SHGet­Known­Folder­Item with the KF_FLAG_CREATE flag.

    (There is a CSIDL-to-KF conversion table in MSDN.)

    If I duplicate a handle, can I keep using the duplicate after closing the original?

    A customer asked whether it was okay to use a duplicated handle even after the original handle was closed.

    Yes. That's sort of why you would duplicate it.

    Duplicating a handle creates a second handle which refers to the same underlying object as the original. Once that's done, the two handles are completely equivalent. There's no way to know which was the original and which is the duplicate. Either handle can be used to access the underlying object, and the underlying object is not torn down until all handles to it have been closed.

    One tricky bit here is that since you have two ways to refer to the same thing, changes made to the object via one handle will be reflected when observed through the other handle. That's because the changes you're making are to the object itself, not to the handle. For example, if you duplicate the handle to an event, then you can set the event via either handle.

    That may all sound obvious, but one thing to watch out for is the case of file handles: The current file position is a property of the file object, not the handle. Say you duplicate a file handle and give the original to one component and the duplicate to another. Now, when either component reads from or writes to the file, it's going to change the current position of the file object, and consequently may confuse the other component (who may not have expected the current position to be changing). Also, if the underlying file is a synchronous file handle, the file operations on the underlying file will be synchronized. If one component starts a read, the other component won't be able to access the file object until that read completes.

    If you want to create a second handle to a file that has its own file pointer and is not synchronized against the first file handle, you can use the Re­Open­File function to create a second file object with its own synchronization and its own file position, but which refers to the same underlying file.
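One way to keep the three layers straight (handle, file object, underlying file) is with a toy model in portable C++. This is only an analogy built on std::shared_ptr, not how the kernel works, and every name in it is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins, not real kernel structures.
struct ToyFile {                    // the underlying file
    std::string contents;
};

struct ToyFileObject {              // what a handle refers to
    std::shared_ptr<ToyFile> file;
    std::size_t position = 0;       // current file position lives here
};

using ToyHandle = std::shared_ptr<ToyFileObject>;

// Models opening a file: a new file object with its own position.
ToyHandle ToyOpen(std::shared_ptr<ToyFile> file)
{
    auto object = std::make_shared<ToyFileObject>();
    object->file = std::move(file);
    return object;
}

// Models duplicating a handle: a second reference to the same file object.
ToyHandle ToyDuplicate(const ToyHandle& handle) { return handle; }

// Models reopening: a new file object for the same underlying file.
ToyHandle ToyReopen(const ToyHandle& handle) { return ToyOpen(handle->file); }

// Reads up to count characters at the current position and advances it.
std::string ToyRead(const ToyHandle& handle, std::size_t count)
{
    std::string out = handle->file->contents.substr(handle->position, count);
    handle->position += out.size();
    return out;
}
```

In this model, a read through the original moves the position seen by the duplicate, the reopened handle starts at its own position zero, and resetting the original leaves the duplicate fully usable because the file object survives until the last reference goes away.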

    (Don't forget to get your sharing modes right! The second file object's access and sharing modes must be compatible with access and sharing modes of the original file object. Otherwise the call will fail with a sharing violation.)

    Marshaling won't get in your way if it isn't needed

    I left an exercise at the end of last week's article: "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?"

    COM subscribes to the principle that if no marshaling is needed, then an interface pointer points directly at the object with no COM code in between.

    If the current thread is running in a single-threaded apartment, and it creates a COM object with thread affinity (also known as an "apartment-model object"; yes, the name is confusing), then the thread gets a pointer directly to the object. When you call p->Query­Interface(), you are calling directly into the Query­Interface implementation provided by the object.

    This principle has its pluses and minuses.

    People concerned with high performance pretty much insist that COM stay out of the way and get involved only when necessary. They consider it a plus that if there is no marshaling involved, then all pointers are direct pointers, and calls go straight to the target object without a single instruction of COM-provided code getting in the way.

    One downside of this is that every object is responsible for its own compatibility hacks. If there are bugs in the implementation of IUnknown::Query­Interface, then each object is on its own for working around them. There is no opportunity for the system to enforce correct behavior because there is no system code running. Each object becomes responsible for its own enforcement.

    Therefore, the answer to "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?" is "The marshaler is involved only sometimes."

    If the object being called belongs to the same apartment as the thread that is calling into it, then there is no marshaler, and the call goes directly to the object. Since there is no marshaler, the marshaler isn't around to enforce marshaling rules. It's up to the object to enforce marshaling rules, and if the object chooses not to, then you get into the cases where a method call works when the object is unmarshaled and fails when the object is marshaled.

    Using thread pool cleanup groups to clean up many things at once

    Today's Little Program demonstrates thread pool cleanup groups. When you associate a thread pool item with a cleanup group, you can perform bulk operations on the group. That can save you a lot of bookkeeping.

    Remember that Little Programs do little to no error checking.

    #include <windows.h>
    #include <stdio.h> // horrors! Mixing stdio and C++!
    
    VOID
    CALLBACK
    Callback(
        PTP_CALLBACK_INSTANCE Instance,
        PVOID                 /* Parameter */,
        PTP_TIMER             /* Timer */
        )
    {
        // Say what time the callback ran.
        printf("%p at %d\n", Instance, GetTickCount());
    }
    
    int
    __cdecl
    main(int, char**)
    {
        // Create an environment that we use for our timers.
        TP_CALLBACK_ENVIRON environ;
        InitializeThreadpoolEnvironment(&environ);
    
        // Create a thread pool cleanup group and associate it
        // with the environment.
        auto cleanupGroup = CreateThreadpoolCleanupGroup();
        SetThreadpoolCallbackCleanupGroup(&environ,
                                          cleanupGroup,
                                          nullptr);
    
        // Say what time we started
        printf("Start: %d\n", GetTickCount());
    
        // Ask for a one-second delay
        LARGE_INTEGER dueTime;
        dueTime.QuadPart = -10000LL * 1000; // one second
        // HighPart is signed; cast it to avoid a narrowing conversion.
        FILETIME ftDue = { dueTime.LowPart,
                           static_cast<DWORD>(dueTime.HighPart) };
    
        // Create ten timers to run after one second.
        for (int i = 0; i < 10; i++) {
            auto timer = CreateThreadpoolTimer(Callback,
                                               nullptr,
                                               &environ);
            SetThreadpoolTimer(timer, &ftDue, 0, 500);
        }
    
        // Wait a while - the timers will run.
        Sleep(1500);
    
        // Clean up the group.
        CloseThreadpoolCleanupGroupMembers(cleanupGroup,
                                           FALSE,
                                           nullptr);
    
        // Close the group.
        CloseThreadpoolCleanupGroup(cleanupGroup);
    }
    

    There is some trickiness in building the FILETIME structure to specify that we want to run after a one-second delay. First, the value is negative to indicate a relative timeout. Second, we cannot treat the FILETIME as an __int64, so we use a LARGE_INTEGER as an intermediary.
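The arithmetic behind that relative due time can be checked without any Windows headers. The sketch below uses a made-up ToyFileTime structure as a portable stand-in for FILETIME (two unsigned 32-bit halves), and the helper name RelativeDueTime is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for FILETIME: two unsigned 32-bit halves.
struct ToyFileTime {
    std::uint32_t low;
    std::uint32_t high;
};

// Converts a delay in milliseconds into a relative due time: a negative
// count of 100-nanosecond units, split into low and high halves.
ToyFileTime RelativeDueTime(std::int64_t milliseconds)
{
    const std::int64_t ticks = -10000LL * milliseconds;  // negative = relative
    const auto bits = static_cast<std::uint64_t>(ticks);
    return { static_cast<std::uint32_t>(bits & 0xFFFFFFFFu),
             static_cast<std::uint32_t>(bits >> 32) };
}
```

For a one-second delay, the 64-bit value is -10,000,000, which splits into a low half of 0xFF676980 and a high half of 0xFFFFFFFF. The high half being all ones is the sign extension that marks the time as relative.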

    When we create the ten timers, we associate them with the environment, which is in turn associated with the cleanup group. This puts all the timers into the cleanup group, which is a good thing, because we didn't save the timer handles!

    When it's time to clean up the timers, we use Close­Thread­pool­Cleanup­Group­Members, which does the work of closing each individual timer in the cleanup group. This saves us the trouble of having to remember all the timers ourselves and manually closing each one.

    For our next trick, comment out the Sleep(1500); and run the program again. This time, the timers don't run at all. That's because we closed them before they reached their due time. We let the cleanup group do the bookkeeping for us.

    If you wonder why a function can't be found, one thing to check is whether the function exists in the first place

    One of my colleagues was frustrated trying to get some code to build. "Is there something strange about linking variadic functions? Because I keep getting an unresolved external error for the function, but if I move the function definition to the declaration point, then everything works fine."

    // blahblah.h
    
    ... other declarations ...
    
    void LogWidget(Widget* widget, const char* format, ...);
    
    ...
    
    // widgetstuff.cpp
    ...
    #include "blahblah.h"
    ...
    
    // some code that calls LogWidget
    void foo(Widget* widget)
    {
     LogWidget(widget, "starting foo");
     ...
    }
    
    // and then near the end of the file
    
    void LogWidget(Widget* widget, const char* format, ...)
    {
        ... implementation ...
    }
    
    ...
    

    "With the above code, the linker complains that Log­Widget cannot be found. But if I move the implementation of Log­Widget to the top of the file, then everything builds fine."

    // widgetstuff.cpp
    ...
    #include "blahblah.h"
    ...
    
    // move the code up here
    void LogWidget(Widget* widget, const char* format, ...)
    {
        ... implementation ...
    }
    
    // some code that calls LogWidget
    void foo(Widget* widget)
    {
     LogWidget(widget, "starting foo");
     ...
    }
    
    ...
    

    "I tried putting an explicit calling convention in the declaration, I tried using extern "C", nothing seems to help."

    We looked at the resulting object file and observed that in the case where the error occurred, there was an external reference to Log­Widget but no definition. I asked, "Is the definition of the function #ifdef'd out by mistake? You can use this technique to find out."

    That was indeed the problem. The definition of the function was inside some sort of #ifdef that prevented it from being compiled.

    Sometimes, the reason a function cannot be found is that it doesn't exist in the first place.

    More notes on calculating constants in SSE registers

    A few weeks ago I noted some tricks for creating special bit patterns in all lanes, but I forgot to cover the case where you treat the 128-bit register as one giant lane: Setting all of the least significant N bits or all of the most significant N bits.

    This is a variation of the trick for setting a bit pattern in all lanes, but the catch is that the pslldq instruction shifts by bytes, not bits.

    We'll assume that N is not a multiple of eight, because if it were a multiple of eight, then the pslldq or psrldq instruction does the trick (after using pcmpeqd to fill the register with ones).

    One case is if N ≤ 64. This is relatively easy because we can build the value by first building the desired value in both 64-bit lanes, and then finishing with a big pslldq or psrldq to clear the lane we don't like.

    ; set the bottom N bits, where N ≤ 64
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    ; unsigned shift right 64 − N bits
    psrlq   xmm0, 64 - N ; 0000 0000 0FFF FFFF 0000 0000 0FFF FFFF
    ; unsigned shift right 64 bits
    psrldq  xmm0, 8 ; 0000 0000 0000 0000 0000 0000 0FFF FFFF

    ; set the top N bits, where N ≤ 64
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    ; unsigned shift left 64 − N bits
    psllq   xmm0, 64 - N ; FFFF FFF0 0000 0000 FFFF FFF0 0000 0000
    ; unsigned shift left 64 bits
    pslldq  xmm0, 8 ; FFFF FFF0 0000 0000 0000 0000 0000 0000

    If N ≥ 80, then we shift zeroes into both the top and bottom halves, but then use a shuffle to patch up the half that needs to stay all-ones.

    ; set the bottom N bits, where N ≥ 80
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    ; unsigned shift right 128 − N bits
    psrlq   xmm0, 128 - N ; 0000 0000 0FFF FFFF 0000 0000 0FFF FFFF
    ; shuffle: copy the low word across the bottom half
    pshuflw xmm0, _MM_SHUFFLE(0, 0, 0, 0) ; 0000 0000 0FFF FFFF FFFF FFFF FFFF FFFF

    ; set the top N bits, where N ≥ 80
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    ; unsigned shift left 128 − N bits
    psllq   xmm0, 128 - N ; FFFF FFF0 0000 0000 FFFF FFF0 0000 0000
    ; shuffle: copy the high word across the top half
    pshufhw xmm0, _MM_SHUFFLE(3, 3, 3, 3) ; FFFF FFFF FFFF FFFF FFFF FFF0 0000 0000

    We have N ≥ 80, which means that 128 - N ≤ 48, which means that there are at least 16 bits of ones left in low-order bits after we shift right. We then use a 4×16-bit shuffle to copy those known-all-ones 16 bits into the other lanes of the lower half. (A similar argument applies to setting the top bits.)
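If you'd rather check the N ≥ 80 recipes from C++ than step through assembly, here is a minimal sketch using the SSE2 intrinsics that correspond to those instructions. It assumes an x86 target with SSE2. The helper names (Halves, ToHalves, AllOnes, BottomBits, TopBits) are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// The two 64-bit halves of a 128-bit register, for easy inspection.
struct Halves {
    std::uint64_t low;
    std::uint64_t high;
};

inline Halves ToHalves(__m128i value)
{
    std::uint64_t parts[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(parts), value);
    return { parts[0], parts[1] };
}

// Equivalent of "pcmpeqd xmm0, xmm0": comparing a register with itself
// sets every bit.
inline __m128i AllOnes()
{
    const __m128i zero = _mm_setzero_si128();
    return _mm_cmpeq_epi32(zero, zero);
}

// Sets the bottom N bits, for 80 <= N <= 128.
template <int N>
__m128i BottomBits()
{
    static_assert(N >= 80 && N <= 128, "this recipe needs 80 <= N <= 128");
    const __m128i x = _mm_srli_epi64(AllOnes(), 128 - N);    // psrlq
    return _mm_shufflelo_epi16(x, _MM_SHUFFLE(0, 0, 0, 0));  // pshuflw
}

// Sets the top N bits, for 80 <= N <= 128.
template <int N>
__m128i TopBits()
{
    static_assert(N >= 80 && N <= 128, "this recipe needs 80 <= N <= 128");
    const __m128i x = _mm_slli_epi64(AllOnes(), 128 - N);    // psllq
    return _mm_shufflehi_epi16(x, _MM_SHUFFLE(3, 3, 3, 3));  // pshufhw
}
```

With N = 92, ToHalves(BottomBits<92>()) gives a high half of 0x000000000FFFFFFF and a low half of all ones, which matches the register picture in the listing above. Making N a template parameter keeps the shift count a compile-time constant, just like the immediate operand in the assembly.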

    This leaves 64 < N < 80. That uses a different trick:

    ; set the bottom N bits, where 64 ≤ N ≤ 88
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    ; unsigned shift right 40 bits
    psrldq  xmm0, 5 ; 0000 0000 00FF FFFF FFFF FFFF FFFF FFFF
    ; signed shift right 88 − N bits (per 32-bit lane)
    psrad  xmm0, 88 - N ; 0000 0000 0000 00FF FFFF FFFF FFFF FFFF

    The sneaky trick here is that we use a signed shift in order to preserve the bottom half. Unfortunately, there is no corresponding left shift that shifts in ones, so the best I can come up with is four instructions:

    ; set the top N bits, where 64 ≤ N ≤ 96
    pcmpeqd xmm0, xmm0 ; FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
    ; unsigned shift left 96 − N bits
    psllq   xmm0, 96 - N ; FFFF FFFF FFF0 0000 FFFF FFFF FFF0 0000
    ; shuffle
    pshufd  xmm0, _MM_SHUFFLE(3, 3, 1, 0) ; FFFF FFFF FFFF FFFF FFFF FFFF FFF0 0000
    ; unsigned shift left 32 bits
    pslldq  xmm0, 4 ; FFFF FFFF FFFF FFFF FFF0 0000 0000 0000

    We view the 128-bit register as four 32-bit lanes and split the shift into two steps. First, we fill Lane 0 with the value we ultimately want in Lane 1, then we patch up the damage we did to Lane 2, and then we shift the 128-bit value left 32 places to slide the value into position and zero-fill Lane 0.

    Note that a lot of the ranges of N overlap, so you often have a choice of solutions. There are other three-instruction solutions I didn't bother presenting here. The only one I couldn't find a three-instruction solution for was setting the top N bits where 64 < N < 80.

    If you find a three-instruction solution for this last case, share it in the comments.
