• The Old New Thing

    The wonderful world of shell bind context strings

    • 6 Comments

    Some time ago, we saw how the IBindCtx parameter to IShell­Folder::Parse­Display­Name can be used to modify how the parse takes place. More generally, the IBindCtx parameter to a function is a catch-all miscellaneous options parameter.

    The interesting part of the bind context is all the stuff that has been added to it via the IBindCtx::Register­Object­Param method. You can attach arbitrary objects to the bind context, using a string to identify each one.

    Some bind context parameters are like Boolean parameters that simply change some default behavior of the operation. For these operations, the object that is associated with the bind context string is not important; the important thing is that there is something associated with it. Traditionally, you just connect a dummy object that implements just IUnknown.

    In the most general case, the object associated with a bind context string implements some operation-specific interface. For example, the STR_BIND_DELEGATE_CREATE_OBJECT bind context string expects you to associate an object that implements the ICreate­Object interface, because the whole point of the STR_BIND_DELEGATE_CREATE_OBJECT bind context string is to say, "Hey, I want to create objects in a nonstandard way," so you need to tell it what that nonstandard way is.

    At the other extreme, you may have a chunk of data that you want to associate with the bind context string. Since bind contexts want to associate objects, you need to wrap the data inside a COM object. We saw this earlier when we had to create an object that implemented the IFile­System­Bind­Data interface in order to babysit a WIN32_FIND_DATA structure.

    Rather than having to create a separate interface for each data type (hello, IObject­With­Folder­Enum­Mode), and rather than going to the opposite extreme and just using IStream to pass arbitrary unstructured data, the shell folks decided to take a middle-ground approach: Use a common interface that still has a modicum of type safety, namely, IProperty­Bag. Another nice thing about this approach is that there are a lot of pre-existing helper functions for property bags and property variants. Also, you need to attach only one object instead of a whole bunch of tiny little ones.

    Under this new regime (which took hold in Windows 8), the bind context has an associated property bag, and you put your data in that property bag.

    In pictures:

    Registered directly on the IBindCtx via IBindCtx::Register­Object­Param:
      Boolean parameter       → dummy object implementing IUnknown
      Interface parameter     → object implementing a custom interface
      STR_PROPERTY­BAG_PARAM  → IPropertyBag (the property bag)

    Written into that property bag via IProperty­Bag::Write:
      DWORD parameter         → VT_UI4
      String parameter        → VT_BSTR
      Boolean parameter       → VT_BOOL

    If you want a Boolean-style parameter to be true, then set it in the bind context with a dummy object that implements IUnknown. If you want a Boolean-style parameter to be false, then omit it from the bind context entirely.

    To set an interface-style parameter, set it in the bind context with an object that implements the desired interface.

    To set a property bag-based parameter, set it in the property bag with the appropriate variant type.
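
    Here is a minimal sketch of the first (Boolean-style) case, using STR_PARSE_PREFER_FOLDER_BROWSING from the table below. The CDummyUnknown class and the ParsePreferringFolderBrowsing function are illustrative names, not shell-provided helpers; the property bag case becomes easier with the helper functions promised at the end of this article.

     #include <windows.h>
     #include <shlobj.h>
     #include <new>

     // A do-nothing object; its only job is to exist.
     class CDummyUnknown : public IUnknown
     {
      LONG m_cRef = 1;
     public:
      IFACEMETHODIMP QueryInterface(REFIID riid, void **ppv)
      {
       if (riid == IID_IUnknown) { *ppv = static_cast<IUnknown*>(this); AddRef(); return S_OK; }
       *ppv = nullptr;
       return E_NOINTERFACE;
      }
      IFACEMETHODIMP_(ULONG) AddRef() { return InterlockedIncrement(&m_cRef); }
      IFACEMETHODIMP_(ULONG) Release()
      {
       ULONG cRef = InterlockedDecrement(&m_cRef);
       if (cRef == 0) delete this;
       return cRef;
      }
     };

     HRESULT ParsePreferringFolderBrowsing(PCWSTR pszName, IShellItem **ppsi)
     {
      *ppsi = nullptr;

      IBindCtx *pbc;
      HRESULT hr = CreateBindCtx(0, &pbc);
      if (FAILED(hr)) return hr;

      // Boolean-style parameter: the mere presence of an object
      // registered under this string turns the behavior on.
      IUnknown *punkDummy = new(std::nothrow) CDummyUnknown();
      hr = punkDummy ? S_OK : E_OUTOFMEMORY;
      if (SUCCEEDED(hr)) {
       hr = pbc->RegisterObjectParam(
               const_cast<LPOLESTR>(STR_PARSE_PREFER_FOLDER_BROWSING), punkDummy);
       punkDummy->Release();
      }

      if (SUCCEEDED(hr)) {
       hr = SHCreateItemFromParsingName(pszName, pbc, IID_PPV_ARGS(ppsi));
      }

      pbc->Release();
      return hr;
     }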

    Here are the bind context strings defined up through Windows 8.1 and the way you set them into the bind context.

    Bind context string | Model | Operation
    STR_AVOID_DRIVE_RESTRICTION_POLICY | Boolean | Binding
    STR_BIND_DELEGATE_CREATE_OBJECT | Interface ICreateObject | Binding
    STR_BIND_FOLDER_ENUM_MODE | Interface IObjectWith­FolderEnumMode | Parsing
    STR_BIND_FOLDERS_READ_ONLY | Boolean | Parsing
    STR_BIND_FORCE_FOLDER_SHORTCUT_RESOLVE | Boolean | Binding
    STR_DONT_PARSE_RELATIVE | Boolean | Parsing
    STR_DONT_RESOLVE_LINK | Boolean | Binding
    STR_ENUM_ITEMS_FLAGS | Property bag: VT_UI4 | Binding for enumeration
    STR_FILE_SYS_FIND_DATA | Interface IFileSystemBindData or IFileSystemBindData2 | Parsing
    STR_FILE_SYS_BIND_DATA_WIN7_FORMAT | Boolean | Parsing
    STR_GET_ASYNC_HANDLER | Boolean | GetUIObjectOf
    STR_GPS_BEST­EFFORT | Boolean | Binding for IProperty­Store
    STR_GPS_DELAY­CREATION | Boolean | Binding for IProperty­Store
    STR_GPS_FAST­PROPERTIES­ONLY | Boolean | Binding for IProperty­Store
    STR_GPS_HANDLER­PROPERTIES­ONLY | Boolean | Binding for IProperty­Store
    STR_GPS_NO_OPLOCK | Boolean | Binding for IProperty­Store
    STR_GPS_OPEN­SLOW­ITEM | Boolean | Binding for IProperty­Store
    STR_IFILTER_FORCE_TEXT_FILTER_FALLBACK | Boolean | Binding for IFilter
    STR_IFILTER_LOAD_DEFINED_FILTER | Boolean | Binding for IFilter
    STR_INTERNAL_NAVIGATE | Boolean | Loading history
    STR_INTERNET­FOLDER_PARSE_ONLY_URLMON_BINDABLE | Boolean | Parsing
    STR_ITEM_CACHE_CONTEXT | Interface IBindCtx | Parsing and initializing
    STR_NO_VALIDATE_FILE­NAME_CHARS | Boolean | Parsing
    STR_PARSE_ALLOW_INTERNET_SHELL_FOLDERS | Boolean | Parsing
    STR_PARSE_AND_CREATE_ITEM | Interface IParse­And­Create­Item | Parsing
    STR_PARSE_DONT_REQUIRE_VALIDATED_URLS | Boolean | Parsing
    STR_PARSE_EXPLICIT_ASSOCIATION_SUCCESSFUL | Property bag: VT_BOOL | Parsing
    STR_PARSE_PARTIAL_IDLIST | Interface IShell­Item | Parsing
    STR_PARSE_PREFER_FOLDER_BROWSING | Boolean | Parsing
    STR_PARSE_PREFER_WEB_BROWSING | Boolean | Parsing
    STR_PARSE_PROPERTY­STORE | Interface IProperty­Bag | Parsing
    STR_PARSE_SHELL_PROTOCOL_TO_FILE_OBJECTS | Boolean | Parsing
    STR_PARSE_SHOW_NET_DIAGNOSTICS_UI | Boolean | Parsing
    STR_PARSE_SKIP_NET_CACHE | Boolean | Parsing
    STR_PARSE_TRANSLATE_ALIASES | Boolean | Parsing
    STR_PARSE_WITH_EXPLICIT_ASSOCAPP | Property bag: VT_BSTR | Parsing
    STR_PARSE_WITH_EXPLICIT_PROGID | Property bag: VT_BSTR | Parsing
    STR_PARSE_WITH_PROPERTIES | Interface IProperty­Store | Parsing
    STR_PROPERTYBAG_PARAM | Interface IProperty­Bag | Holds property bag parameters
    STR_SKIP_BINDING_CLSID | Interface IPersist | Parsing and binding

    There are some oddities in the above table.

    • All of the STR_GPS_* values would be more conveniently expressed as a single VT_UI4 property bag-based value. (Exercise: Why isn't it?)
    • The STR_ITEM_CACHE_CONTEXT bind context parameter is itself another bind context! The idea here is that you, the caller, are enabling caching during the parse, and the inner bind context acts as the cache.
    • The STR_PARSE_EXPLICIT_ASSOCIATION_SUCCESSFUL value is unusual in that it is something set by the parser and passed back to the caller.
    • As we have been discussing, STR_PROPERTY­BAG_PARAM is a bind context string that doesn't mean anything on its own. Rather, it provides a property bag into which more parameters can be stored.

    Next time, I'll write some helper functions to make all this slightly more manageable.

  • The Old New Thing

    Why does my synchronous overlapped ReadFile return FALSE when the end of the file is reached?

    • 16 Comments

    A customer reported that the behavior of Read­File was not what they were expecting.

    We have a synchronous file handle (not created with FILE_FLAG_OVERLAPPED), but we issue reads against it with an OVERLAPPED structure. We find that when we read past the end of the file, the Read­File returns FALSE even though the documentation says it should return TRUE.

    They were kind enough to include a simple program that demonstrates the problem.

    #include <windows.h>
    
    int __cdecl wmain(int, wchar_t **)
    {
     // Create a zero-length file. This succeeds.
     HANDLE h = CreateFileW(L"test", GENERIC_READ | GENERIC_WRITE,
                   0, nullptr, CREATE_ALWAYS,
                   FILE_ATTRIBUTE_NORMAL, nullptr);
    
     // Read past EOF.
     char buffer[10];
     DWORD cb;
     OVERLAPPED o = { 0 };
     ReadFile(h, buffer, 10, &cb, &o); // returns FALSE
     GetLastError(); // returns ERROR_HANDLE_EOF
    
     return 0;
    }
    

    The customer quoted this section from the documentation for Read­File:

    Considerations for working with synchronous file handles:

    • If lpOverlapped is NULL, the read operation starts at the current file position and Read­File does not return until the operation is complete, and the system updates the file pointer before Read­File returns.
    • If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and Read­File does not return until the read operation is complete. The system updates the OVERLAPPED offset before Read­File returns.
    • When a synchronous read operation reaches the end of a file, Read­File returns TRUE and sets *lpNumberOfBytesRead to zero.

    and then added

    According to the third bullet point, the Read­File should return TRUE, but in practice it returns FALSE and the error code is ERROR_HANDLE_EOF.

    The problem is that there are two concepts in play here, and they confusingly both use the word synchronous.

    • A synchronous file handle is a handle opened without FILE_FLAG_OVERLAPPED. All I/O to a synchronous file handle is serialized and synchronous.
    • A synchronous I/O operation is an I/O issued with lpOverlapped == NULL.

    The sample program issues an asynchronous read against a synchronous handle. The third bullet point applies only to synchronous reads.

    To reduce confusion, the documentation would have been clearer if it hadn't switched terminology midstream.

    • If lpOverlapped is NULL, the read operation starts at the current file position and Read­File does not return until the operation is complete, and the system updates the file pointer before Read­File returns.
    • If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and Read­File does not return until the read operation is complete. The system updates the OVERLAPPED offset before Read­File returns.
    • If lpOverlapped is NULL and the read operation reaches the end of a file, Read­File returns TRUE and sets *lpNumberOfBytesRead to zero.

    We asked what the customer was doing that caused them to trip over this confusion in the documentation.

    The customer's original code opened a file (synchronously) and read from it (synchronously). The customer is parallelizing the computation in a way that will read that single file from multiple threads. A single file pointer is therefore not suitable, because different threads will want to read from different positions.

    One idea would be to have each thread call Create­File so that each handle has its own file position. Unfortunately, this won't work for the customer because the sharing mode on the file handle denies read sharing.

    The solution they came up with was to open the file synchronously (without FILE_FLAG_OVERLAPPED) but to read asynchronously (by using an OVERLAPPED structure). The OVERLAPPED structure lets you specify where you want to read from, so multiple threads can issue reads against the file position they want.
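
    In code, the hybrid pattern boils down to something like this sketch. (ReadFileAt is an illustrative name, not an API; because the handle was opened without FILE_FLAG_OVERLAPPED, the call still runs synchronously.)

     BOOL ReadFileAt(HANDLE h, ULONGLONG offset,
                     void *buffer, DWORD cb, DWORD *pcbRead)
     {
      OVERLAPPED o = { 0 };
      o.Offset = static_cast<DWORD>(offset);
      o.OffsetHigh = static_cast<DWORD>(offset >> 32);

      if (ReadFile(h, buffer, cb, pcbRead, &o)) return TRUE;

      // With a synchronous handle and an OVERLAPPED structure, reading past
      // EOF fails with ERROR_HANDLE_EOF instead of returning TRUE with zero bytes.
      if (GetLastError() == ERROR_HANDLE_EOF) {
       *pcbRead = 0;
       return TRUE; // treat end-of-file as a successful zero-byte read
      }
      return FALSE;
     }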

    This solution works, but the customer is concerned because this hybrid model is not well-documented in MSDN. They found a blog entry that discusses it, but even that blog entry does not discuss what happens in the multithreaded case. In particular, they are seeing that the end-of-file behavior acts according to asynchronous rather than synchronous rules.

    Any advice you have on how we can pursue this model would be appreciated. Another concern is that since we do not set the hEvent in the OVERLAPPED structure, the file handle itself is used as the signal that I/O has completed, and this will cause problems if multiple I/O's are active simultaneously.

    The problem is that the customer confused the two senses of synchronous, one when applied to files and one when applied to I/O operations. Since they opened a synchronous file handle, all I/O operations are serialized and execute synchronously. Passing an OVERLAPPED structure issues an asynchronous I/O, but since the underlying handle is synchronous, the I/O is serialized and synchronous. The customer's code therefore is not actually performing I/O asynchronously; its requests for asynchronous I/O are overridden by the fact that the underlying handle is synchronous.

    The hybrid model doesn't actually realize any gains of asynchronous I/O. The use of the OVERLAPPED structure merely provides the convenience of combining the seek and read operations into a single call. Since the benefit is rather meager, the hybrid model is not commonly used, and consequently it is not covered in depth in the documentation. (The facts are still there, but there is relatively little discussion and elaboration.)

    Based on this feedback, the customer considered switching to using an asynchronous file handle and setting the hEvent in the OVERLAPPED structure so that each thread can wait for its specific I/O to complete. In the end, however, they decided to stick with the hybrid model because switching to an asynchronous handle was too disruptive to their code base. They are satisfied with the OVERLAPPED technique that lets them perform the equivalent of an atomic Set­File­Pointer + Read­File (even if the I/O is synchronous and serialized).
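
    For reference, the asynchronous alternative they considered looks something like this sketch (illustrative only): open the handle with FILE_FLAG_OVERLAPPED and give each request its own event, so each thread waits for its own I/O.

     BOOL ReadFileAtAsync(HANDLE h, ULONGLONG offset,
                          void *buffer, DWORD cb, DWORD *pcbRead)
     {
      OVERLAPPED o = { 0 };
      o.Offset = static_cast<DWORD>(offset);
      o.OffsetHigh = static_cast<DWORD>(offset >> 32);
      o.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr); // per-request event
      if (!o.hEvent) return FALSE;

      BOOL ok = ReadFile(h, buffer, cb, nullptr, &o);
      if (ok || GetLastError() == ERROR_IO_PENDING) {
       // Waits (if necessary) for this specific I/O; other threads' I/O is unaffected.
       ok = GetOverlappedResult(h, &o, pcbRead, TRUE);
      }

      CloseHandle(o.hEvent);
      return ok;
     }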

  • The Old New Thing

    Why does the copy dialog give me the incorrect total size of the files being copied?

    • 31 Comments

    If you try to copy a bunch of files to a drive that doesn't have enough available space, you get an error message like this:

    Interrupted Action

    There is not enough space on Removable Disk (D:). You need an additional 1.50 GB to copy these files.

    ▭  Removable Disk (D:)
    Space free: 2.50 GB
    Total size: 14.9 GB
    Try again Cancel

    "But wait," you say. "I'm only copying 5GB of data. Why does it say Total size: 14.9 GB?"

    This is a case of information being presented out of context and resulting in mass confusion.

    Suppose you saw the information like this:

    Computer
    ◢ Hard Disk Drives (1)   
     
    ▭  Windows (C:)
    Space free: 31.5 GB
    Total size: 118 GB
    ◢ Drives with Removable Storage (1)   
     
    ▭  Removable Disk (D:)
    Space free: 2.50 GB
    Total size: 14.9 GB

    In this presentation, it is clear that Total size refers to the total size of the drive itself.

    So the original dialog is not saying that the total size of the data being copied is 14.9 GB. It's trying to say that the total size of the removable disk is 14.9 GB.

    Mind you, the presentation is very confusing since the information about the removable disk is presented without any introductory text. It's just plopped there on the dialog without so much as a hello.

    I'm not sure how I would fix this. Maybe reordering the text elements would help.

    Interrupted Action

    There is not enough space on Removable Disk (D:).

    ▭  Removable Disk (D:)
    Space free: 2.50 GB
    Total size: 14.9 GB

    You need an additional 1.50 GB to copy these files.

    Try again Cancel

    However, the design of the dialog may not allow the information tile to be inserted into the middle of the paragraph. It might be restricted to a layout where you can have text, followed by an information tile, followed by buttons. In that case, maybe it could go

    Interrupted Action

    You need an additional 1.50 GB to copy these files. There is not enough space on Removable Disk (D:).

    ▭  Removable Disk (D:)
    Space free: 2.50 GB
    Total size: 14.9 GB
    Try again Cancel

    But like I said, I'm not sure about this.

  • The Old New Thing

    Why does Outlook use a semicolon to separate multiple recipients by default?

    • 35 Comments

    Microsoft Outlook by default uses a semicolon to separate multiple recipients. You can change this to a comma, but why is the semicolon the default?

    Microsoft Outlook was originally positioned as a business product, and many businesses complained that the use of a comma as a separator created havoc because they have a policy of setting names in the address book as "Last, First".

    In 2000, the Outlook folks tried to change the default, but the outcry from corporations made them go back to having the semicolon be the default separator.

    Besides, there are a lot of people who have commas in their names, such as Martin Luther King, Jr.

  • The Old New Thing

    Why does a single integer assignment statement consume all of my CPU?

    • 21 Comments

    Here's a C++ class inspired by actual events. (Yes, the certificate on that Web site is broken.) It is somebody's attempt to create a generic value type, similar to VARIANT.

    class Value
    {
    public:
      Value() : m_type(V_UNDEFINED) { }
    
     Type GetType() const { return m_type; }
     void SetType(Type type) { m_type = type; }
    
     int32_t GetInt32() const
     {
      assert(GetType() == V_INT32);
      return *reinterpret_cast<const int32_t *>(m_data);
     }
    
     void SetInt32(int32_t value)
     {
      assert(GetType() == V_INT32);
      *reinterpret_cast<int32_t *>(m_data) = value;
     }
    
     // GetChar, SetChar, GetInt64, SetInt64, etc.
    
    private:
     char m_data[sizeof(int64_t)];
     char m_type;
    };
    
    ...
    
    Value CalculateTheValue()
    {
     int32_t total;
     // ... a bunch of computation ...
    
     Value result;
     result.SetType(V_INT32);
     result.SetInt32(total);
     return result;
    }
    

    Profiling showed that over 80% of the time spent by Calculate­The­Value was inside the Set­Int32 method call, in particular on the line

      *reinterpret_cast<int32_t *>(m_data) = value;
    

    Why does it take so much time to store an integer to memory, dwarfing the actual computation to calculate that integer?

    Alignment.

    Observe that the underlying data for the Value class is declared as a bunch of chars. Since a char is just a byte, it has no alignment restrictions. On the other hand, data types like int32_t typically do have alignment restrictions. For example, accessing a 32-bit value is usually more efficient if the value is stored in memory starting at a multiple of 4.

    How much more efficient depends on the processor and the data type.

    On processors that allow unaligned memory access, the penalty can be zero, or only 10%, or maybe 100%.

    Many processor architectures are less forgiving of misaligned data access and raise an alignment exception if you break the rules. When such an exception occurs, the operating system might choose to terminate the application. Or the operating system may choose to emulate the instruction and fix up the misaligned access. The program runs much slower, but at least it still runs. (In Windows, the decision how to respond to the alignment exception depends on whether the process asked for alignment faults to be forgiven. See SEM_NO­ALIGNMENT­FAULT­EXCEPT.)

    It appears that the original program is in the last case: An alignment exception occurred, and the operating system handled it by emulating the misaligned access, manually transferring the 32-bit value one byte at a time into m_data[0] through m_data[3], then resuming execution of the original program.

    Dispatching the exception, parsing out the faulting instruction, emulating it, then resuming execution. That is all very slow. Probably several thousand instruction cycles. This can easily dwarf the actual computation performed by Calculate­The­Value.

    Okay, but why is the result variable unaligned?

    Since, as we noted a while back, the way the Value class is defined requires only byte alignment, the compiler is not constrained to align it in any particular way. If there were an int16_t local variable in the Calculate­The­Value function, the compiler might choose to arrange its stack frame like this:

    • Start at an aligned address X.
    • Put int32_t total at X+0 through X+3.
    • Put int16_t whatever at X+4 through X+5.
    • Put Value result at X+6 through X+14.

    Since X is a multiple of 4, X+6 is not a multiple of 4, so the m_data member is misaligned and incurs an alignment fault at every access.

    What's more, since the Value class has an odd number of total bytes, if you create an array of Values, you are guaranteed that three quarters of the elements will be misaligned.

    The solution is to fix the declaration of the Value class so that the alignment requirements are made visible to the compiler. Instead of jamming all the data into a byte blob, use a discriminated union. That is, after all, what you are trying to emulate in the first place.

    class Value
    {
    public:
      Value() : m_type(V_UNDEFINED) { }
    
     Type GetType() const { return m_type; }
     void SetType(Type type) { m_type = type; }
    
     int32_t GetInt32() const
     {
      assert(GetType() == V_INT32);
      return m_data.m_int32;
     }
    
     void SetInt32(int32_t value)
     {
      assert(GetType() == V_INT32);
      m_data.m_int32 = value;
     }
    
     // GetChar, SetChar, GetInt64, SetInt64, etc.
    
    private:
     union
     {
      char    m_char;
      int32_t m_int32;
      int64_t m_int64;
      // etc.
     } m_data;
     char m_type;
    };
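
    As a small extra safeguard (an illustrative addition, not part of the original fix), you can have the compiler verify the assumption at build time:

     // The original byte-blob version fails this assertion, because its
     // alignment requirement is only 1.
     static_assert(alignof(Value) >= alignof(int64_t),
                   "Value must be at least as strictly aligned as its widest member");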
    

    Exercise: One guess as to the cause of the problem is that the assignment statement is incurring paging. Explain why this is almost certainly not the reason.

    Bonus chatter: I'm ignoring RVO here. If you are smart enough to understand RVO, you should also be smart enough to see that RVO does not affect the underlying analysis. It just shifts the address calculation to the caller.

  • The Old New Thing

    Why does CreateFile take a long time on a volume handle?

    • 32 Comments

    A customer reported that on Windows XP, a call to Create­File was taking a really, really long time if it was performed immediately after a large file copy. They were kind enough to include a demonstration program:

    #include <windows.h>
    
    int main(int argc, char **argv)
    {
     HANDLE h = CreateFile("\\\\.\\D:",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_WRITE | FILE_SHARE_READ,
                           NULL,
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL,
                           NULL);
     Sleep(5000);
     return 0;
    }
    

    If this program is run on its own, the Create­File completes quickly. But if you copy 1.7GB of data immediately before running the program, then Create­File takes longer. The customer would like to know the reason for this issue and whether there is a way to avoid it.

    The reason is that you just copied a lot of data, so there is a lot of dirty data in the disk cache that is waiting to get flushed out. And when you create the volume handle, Windows needs to flush out all that data so that the volume handle sees a consistent view of the volume. Flushing out 1.7GB of data can take a while.

    There is no way to avoid this problem because the speed of data transfer to the drive is limited by the drive hardware. It will take N seconds to transfer 1.7GB of data, so the time between the start of the file copy operation and the successful opening of the volume handle will be N seconds. If you want the Create­File to go faster, you could do a Flush­File­Buffers on the file being copied so that the cost of writing the data gets charged to the copy operation rather than the Create­File, but that's just creative accounting. You didn't actually make any money; you just moved it around.
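
    For illustration, the creative-accounting version might look like this sketch. (It assumes you can reopen the freshly copied file for write access; the path is made up.)

     // Charge the flush to the copy step instead of to the volume open.
     HANDLE hDest = CreateFileW(L"D:\\freshly-copied-file.dat", GENERIC_WRITE,
                                0, nullptr, OPEN_EXISTING,
                                FILE_ATTRIBUTE_NORMAL, nullptr);
     if (hDest != INVALID_HANDLE_VALUE) {
      FlushFileBuffers(hDest); // the N seconds are spent here instead
      CloseHandle(hDest);
     }

     // The volume open now has much less dirty data left to flush.
     HANDLE hVolume = CreateFileW(L"\\\\.\\D:",
                                  GENERIC_READ | GENERIC_WRITE,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  nullptr, OPEN_EXISTING, 0, nullptr);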

    Now, a lot of programs open a volume handle but don't actually read from it or write to it, such as the sample program above. Therefore, newer versions of Windows (I don't know exactly whether it was Windows Vista or Windows 7) defer the flush until somebody actually tries to use the handle for reading or writing. So at least for the sample program above, the Create­File will complete quickly. However, the first read or write operation will be slow.

    Again, the total time doesn't change. All that changes is where the cost of the flush is incurred.

  • The Old New Thing

    Where can I find the standard asynchronous stream?

    • 8 Comments

    In the documentation for XmlLite, one of the features called out is that XmlLite is a non-blocking parser. If the input stream returns E_PENDING, then XmlLite propagates that status to its caller, and a subsequent request to XmlLite to parse will resume where it left off.

    That documentation calls out two scenarios in which this can happen, the second of which is

    2. The input Stream is a standard asynchronous stream. The E_PENDING HRESULT may be raised when the data is temporarily unavailable on the network. In this case, you need to try again later in a callback or after some interval of time.

    A customer was kind of confused by this explanation. "Where do I get a standard asynchronous stream so I can use it in scenario 2?"

    The documentation here is trying to be helpful by expanding on the original statement that XmlLite is a non-blocking parser and providing examples of how you can take advantage of this non-blocking behavior. The normative statement is the one that says, "XmlLite propagates the E_PENDING from the input stream to its caller, and a subsequent request to read data from the XmlLite parser will resume where it left off." The rest is informational, but it seems that the informational text was more confusing than helpful.

    The informational text is trying to say, "Here are some examples where you can take advantage of this behavior." The first scenario is an example where you provided an IStream that returns E_PENDING when it wants to force the XmlLite parser to stop parsing. You might do this, for example, if you have out-of-band data in your XML stream. The stream would return E_PENDING when it encounters the out-of-band data, and this causes the XmlLite parser to stop parsing and return E_PENDING. You can then process the out-of-band data, and then when you are ready to resume parsing, you reissue the call that returned E_PENDING so the parser can resume where it left off.

    The second scenario is an example where you provided an IStream that returns E_PENDING to indicate that there is more data in the stream, but it is not available right now. For example, the stream may be the result of a streaming download, and the next chunk of the download hasn't arrived yet. Instead of blocking the read, the stream returns E_PENDING to say, "There is more data, but I can't provide it right now. Go do something else for a while." The download stream presumably has some way of notifying when the next download chunk is ready. Your program can subscribe to that notification, and when it is received, you can resume parsing with XmlLite.
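
    In code, the resume-after-E_PENDING pattern looks roughly like this sketch. (WaitUntilMoreDataIsAvailable is a hypothetical stand-in for whatever notification mechanism your stream provides.)

     #include <windows.h>
     #include <xmllite.h>

     void WaitUntilMoreDataIsAvailable(); // hypothetical: block until the stream has more data

     HRESULT ParseNonBlocking(IStream *pStream)
     {
      IXmlReader *pReader;
      HRESULT hr = CreateXmlReader(IID_PPV_ARGS(&pReader), nullptr);
      if (FAILED(hr)) return hr;

      hr = pReader->SetInput(pStream);
      if (SUCCEEDED(hr)) {
       XmlNodeType nodeType;
       while ((hr = pReader->Read(&nodeType)) != S_FALSE) { // S_FALSE means end of input
        if (hr == E_PENDING) {
         WaitUntilMoreDataIsAvailable();
         continue; // reissue the call; the parser resumes where it left off
        }
        if (FAILED(hr)) break;
        // process the node here
       }
       if (hr == S_FALSE) hr = S_OK;
      }

      pReader->Release();
      return hr;
     }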

    The adjective "standard" here in the phrase "a standard asynchronous stream" does not refer to a specific reference implementation. It's using the word "standard" in the sense of "regularly and widely used, seen, or accepted; not unusual or special." (This was subtly implied by the use of the indefinite rather than the definite article, but that use of the indefinite could be interpreted to mean "an instance of the standard asynchronous stream".) In other words, the opening sentence is saying, "The input Stream is any asynchronous stream that behaves in the usual manner."

    By analogy, consider the sentence "This service is available from a standard touch-tone phone." This doesn't mean "There is a specific model of touch-tone phone that is the standard touch-tone phone, and you must use that one." It's just saying "Any touch-tone phone (that conforms to the standard) will work."

  • The Old New Thing

    Microspeak: landing (redux)

    • 9 Comments

    In a meeting, my colleague Martyn Lovell said, "The plan is shifting and hasn't landed anywhere yet."

    This was generally understood to mean "The plan is shifting and the issue is not yet settled."

    I don't know if this is true Microspeak, or Martyn was just making up a little metaphor on the fly. But I filed it away anyway because of the interesting collision with another Microspeak use of the word landing.

  • The Old New Thing

    Limiting the bottom byte of an XMM register and clearing the other bytes

    • 9 Comments

    Suppose you have a value in an XMM register and you want to limit the bottom byte to a particular value and set all the other bytes to zero. (Yes, I needed to do this.)

    One way to do this is to apply the two steps in sequence:

    ; value to truncate/limit is in xmm0
    
    ; First, zero out the top 15 bytes
        pslldq  xmm0, 15
        psrldq  xmm0, 15
    
    ; Now limit the bottom byte to N
        mov     al, N
        movd    xmm1, eax
        pminub  xmm0, xmm1
    

    But you can do it all in one step by realizing that min(x, 0) = 0 for all unsigned values x.

    ; value to truncate/limit is in xmm0
        mov     eax, N
        movd    xmm1, eax
        pminub  xmm0, xmm1
    

    In pictures:

    xmm0 xmm1 xmm0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    ? min 0 = 0
    x min N = min(x, N)

    In intrinsics:

     #include <emmintrin.h> // SSE2 intrinsics
     #include <stdint.h>

     __m128i min_low_byte_and_set_upper_bytes_to_zero(__m128i x, uint8_t N)
     {
      // pminub is an unsigned byte minimum, so use _mm_min_epu8 (not the signed _mm_min_epi8).
      return _mm_min_epu8(x, _mm_cvtsi32_si128(N));
     }
    
  • The Old New Thing

    Finding the leaked object reference by scanning memory: Example

    • 16 Comments

    An assertion failure was hit in some code.

        // There should be no additional references to the object at this point
        assert(m_cRef == 1);
    

    But the reference count was 2. That's not good. Where is that extra reference and who took it?

    This was not code I was at all familiar with, so I went back to first principles: Let's hope that whoever took the extra reference is still holding a pointer to the object somewhere in memory (rather than having taken the reference and then discarded the pointer). And let's hope that the memory holding that pointer hasn't been paged out. (Because debugging is an exercise in optimism.)

    1: kd> s 0 0fffffff 00 86 ec 00
    04effacc  00 86 ec 00 c0 85 ec 00-00 00 00 00 00 00 00 00  ................ // us
    0532c318  00 86 ec 00 28 05 00 00-80 6d 32 05 03 00 00 00  ....(....m2..... // rogue
    

    The first hit is the reference to the object from the code raising the assertion. The second hit is the interesting one. That's probably the rogue reference. But who is it?

    1: kd> ln 532c318
    1: kd>
    

    It does not report as belonging to any module, so it's not a global variable.

    Is it a reference from a stack variable? If so, then a stack trace of the thread with the active reference may tell us who is holding the reference and why.

    1: kd> !process -1 4
    PROCESS 907ef980  SessionId: 2  Cid: 06cc    Peb: 7f4df000  ParentCid: 0298
        DirBase: 9e983000  ObjectTable: a576f560  HandleCount: 330.
        Image: contoso.exe
    
            THREAD 8e840080  Cid 06cc.0b78  Teb: 7f4de000 Win32Thread: 9d04b3e0 WAIT
            THREAD 91e24080  Cid 06cc.08d8  Teb: 7f4dd000 Win32Thread: 00000000 WAIT
            THREAD 8e9a3580  Cid 06cc.09f8  Teb: 7f4dc000 Win32Thread: 9d102cc8 WAIT
            THREAD 8e2be080  Cid 06cc.0878  Teb: 7f4db000 Win32Thread: 9d129978 WAIT
            THREAD 82c08080  Cid 06cc.0480  Teb: 7f4da000 Win32Thread: 00000000 WAIT
            THREAD 90552400  Cid 06cc.0f5c  Teb: 7f4d9000 Win32Thread: 9d129628 WAIT
            THREAD 912c9080  Cid 06cc.02ec  Teb: 7f4d8000 Win32Thread: 00000000 WAIT
            THREAD 8e9e8680  Cid 06cc.0130  Teb: 7f4d7000 Win32Thread: 9d129cc8 READY on processor 0
            THREAD 914b8b80  Cid 06cc.02e8  Teb: 7f4d6000 Win32Thread: 9d12d568 WAIT
            THREAD 9054ab00  Cid 06cc.0294  Teb: 7f4d5000 Win32Thread: 9d12fac0 WAIT
            THREAD 909a2b80  Cid 06cc.0b54  Teb: 7f4d4000 Win32Thread: 00000000 WAIT
            THREAD 90866b80  Cid 06cc.0784  Teb: 7f4d3000 Win32Thread: 93dbb4e0 RUNNING on processor 1
            THREAD 90cfcb80  Cid 06cc.08c4  Teb: 7f3af000 Win32Thread: 93de0cc8 WAIT
            THREAD 90c39a00  Cid 06cc.0914  Teb: 7f3ae000 Win32Thread: 00000000 WAIT
            THREAD 90629480  Cid 06cc.0bc8  Teb: 7f3ad000 Win32Thread: 00000000 WAIT
    

    Now I have to dump the stack boundaries to see whether the address in question lies within the stack range.

    1: kd> dd 7f4de000 l3
    7f4de000  ffffffff 00de0000 00dd0000
    1: kd> dd 7f4dd000 l3
    7f4dd000  ffffffff 01070000 01060000
    ...
    1: kd> dd 7f4d7000 l3
    7f4d7000  ffffffff 04e00000 04df0000 // our stack
    ...
    

    The rogue reference did not land in any of the stack ranges, so it's probably on the heap. Fortunately, since it's on the heap, it's probably part of some larger object. And let's hope (see: optimism) that it's an object with virtual methods.

    0532c298  73617453
    0532c29c  74654d68
    0532c2a0  74616461
    0532c2a4  446e4961
    0532c2a8  00007865
    0532c2ac  00000000
    0532c2b0  76726553 USER32!_NULL_IMPORT_DESCRIPTOR  (USER32+0xb6553)
    0532c2b4  44497265
    0532c2b8  45646e49
    0532c2bc  41745378 contoso!CMumble::CMumble+0x4c
    0532c2c0  00006873
    0532c2c4  00000000
    0532c2c8  4e616843
    0532c2cc  79546567
    0532c2d0  4e496570
    0532c2d4  00786564
    0532c2d8  2856662a
    0532c2dc  080a9b87
    0532c2e0  00f59fa0
    0532c2e4  05326538
    0532c2e8  00000000
    0532c2ec  00000000
    0532c2f0  0000029c
    0532c2f4  00000001
    0532c2f8  00000230
    0532c2fc  fdfdfdfd
    0532c300  45ea1370 contoso!CFrumble::`vftable'
    0532c304  45ea134c contoso!CFrumble::`vftable'
    0532c308  00000000
    0532c30c  05b9a040
    0532c310  00000002
    0532c314  00000001
    0532c318  00ec8600
    

    Hooray, there is a vtable a few bytes before the pointer, and the contents of the memory do appear to match a CFrumble object, so I think we found our culprit.

    I was able to hand off the next stage of the investigation (why is a Frumble being created with a reference to the object?) to another team member with more expertise with Frumbles.

    (In case anybody cared, the conclusion was that this was a variation of a known bug.)
