• The Old New Thing

    Common gotchas when writing your own p/invoke


    If you're looking to get into some p/invoke action, you'd be well-served to check out the pinvoke wiki to see if somebody else has already done it. If what you need isn't there, you may end up forced to write your own, and here are some gotchas I've seen people run into:

    • C++ bool and the Win32 BOOL and BOOLEAN types are not the same as C# bool (aka System.Boolean). In Win32, BOOL is a 4-byte type, and BOOLEAN is a 1-byte type. [See also MadQ's remarks about VARIANT_BOOL.] Meanwhile, C++ bool is not standardized by Win32, so its size varies based on your compiler, but most compilers use a 1-byte value. And then C# is even weirder: The bool is a 1-byte type, but it marshals as a 4-byte type by default.
    • Win32 char is not the same as C# char (aka System.Char). In C#, char is a Unicode character (two bytes), whereas in C/C++ under Win32 it is an ANSI character (one byte).
    • Win32 long is not the same as C# long (aka System.Int64). In C#, long is 64-bit value, whereas in C/C++ under Win32 it is a 32-bit value.
    • If memory is allocated and freed across the interop boundary, make sure both sides are using the same allocator. It is my understanding that the CLR uses CoTaskMemAlloc/CoTaskMemFree by default. If your Win32 function doesn't use CoTaskMemAlloc, you'll have to teach the CLR which allocator you really want.
    • When laying out structures, you have to watch out for alignment.
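
    To make these type-size gotchas concrete, here is a sketch of what corrected declarations might look like. (Beep and MessageBox are just convenient real functions to illustrate the points; this is not a complete treatment of either API.)

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    static class NativeMethods
    {
        // Win32 BOOL is 4 bytes, and C# bool marshals as 4 bytes by default,
        // so a plain bool works for BOOL-returning functions like Beep.
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool Beep(uint dwFreq, uint dwDuration);

        // For a 1-byte BOOLEAN (or a typical C++ bool), say so explicitly
        // with [return: MarshalAs(UnmanagedType.U1)] on the declaration.

        // Win32 long is 32 bits, so it maps to C# int, not C# long,
        // and CharSet selects the ANSI or Unicode version of the function.
        [DllImport("user32.dll", CharSet = CharSet.Unicode)]
        public static extern int MessageBox(IntPtr hWnd, string text,
                                            string caption, uint type);
    }
    ```

    And if a function returns memory that you must free, check its documentation for the matching allocator; Marshal.FreeCoTaskMem is the one that pairs with the CLR's default CoTaskMemAlloc assumption.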

    That last one is particularly gnarly on 64-bit systems, where alignment requirements are less forgiving than on x86. The structure declarations on pinvoke.net tend to ignore 64-bit issues. For example, the declaration of the INPUT structure (as of this writing—it's a wiki so it's probably changed by the time you read this) reads as follows:

    [StructLayout(LayoutKind.Explicit)] struct INPUT {
      [FieldOffset(0)] int type;
      [FieldOffset(4)] MOUSEINPUT mi;
      [FieldOffset(4)] KEYBDINPUT ki;
      [FieldOffset(4)] HARDWAREINPUT hi;
    }

    This structure layout is correct for 32-bit Windows, but it's incorrect for 64-bit Windows.

    Let's take a look at that MOUSEINPUT structure, for starters.

    typedef struct tagMOUSEINPUT {
        LONG      dx;
        LONG      dy;
        DWORD     mouseData;
        DWORD     dwFlags;
        DWORD     time;
        ULONG_PTR dwExtraInfo;
    } MOUSEINPUT, *PMOUSEINPUT;

    In 64-bit Windows, the LONG and DWORD members are four bytes each, but dwExtraInfo is a ULONG_PTR, which is eight bytes on a 64-bit machine. Since Windows assumes /Zp8 packing, dwExtraInfo must be aligned on an 8-byte boundary, which forces four bytes of padding to be inserted after the time member to get dwExtraInfo to align properly. And in order for all this to work, the MOUSEINPUT structure itself must be 8-byte aligned.

    Now let's look at that INPUT structure again. Since the MOUSEINPUT comes after the type, there also needs to be padding between the type and the MOUSEINPUT to get the MOUSEINPUT back to an 8-byte boundary. In other words, the offset of mi in the INPUT structure is 8 on 64-bit Windows, not 4.

    Here's how I would've written it:

    // This generates the anonymous union
    [StructLayout(LayoutKind.Explicit)] struct INPUT_UNION {
      [FieldOffset(0)] MOUSEINPUT mi;
      [FieldOffset(0)] KEYBDINPUT ki;
      [FieldOffset(0)] HARDWAREINPUT hi;
    }

    [StructLayout(LayoutKind.Sequential)] struct INPUT {
      int type;
      INPUT_UNION u;
    }

    I introduce a helper structure to represent the anonymous union that is the second half of the Win32 INPUT structure. By doing it this way, I let somebody else worry about the alignment, and it'll be correct for both 32-bit and 64-bit Windows.

    static public void Main() {
      Console.WriteLine(Marshal.OffsetOf(typeof(INPUT), "u"));
    }

    On a 32-bit system, this prints 4, and on a 64-bit system, it prints 8. The downside is that you have to type an extra u. when you access the mi, ki or hi members.

    INPUT i;
    i.u.mi.dx = 0;

    (I haven't checked what the PInvoke Interop Assistant comes up with for the INPUT structure.)

  • The Old New Thing

    Data breakpoints are based on the linear address, not the physical address


    When you ask the debugger to set a read or write breakpoint, the breakpoint fires only when the memory is read from or written to through the address you specify. If the same memory is mapped at another address and modified through that other address, your breakpoint won't see it.

    For example, if you have multiple views on the same data, then modifications to that data via alternate addresses will not trigger the breakpoint.

    The hardware breakpoint status is part of the processor context, which is maintained on a per-thread basis. Each thread maintains its own virtualized hardware breakpoint status. You don't notice this in practice because debuggers are kind enough to replicate the breakpoint state across all threads in a process so that the breakpoint fires regardless of which thread triggers it. But that replication typically doesn't extend beyond the process you're debugging; the debugger doesn't bother replicating your breakpoints into other processes! This means that if you set a write breakpoint on a block of shared memory, and the write occurs in some other process, your breakpoint won't fire since it's not your process that wrote to it.

    When you call into kernel mode, there is another context switch, this time between user mode and kernel mode, and the kernel mode context of course doesn't have your data breakpoint. Which is a good thing, because if that data breakpoint fired in kernel mode, how is your user-mode debugger expected to make any sense of it? The breakpoint fired when executing code that user mode doesn't have permission to access, and it may have fired while the kernel mode code owned an important critical section or spinlock, a critical section the debugger itself may very well need. Imagine if the memory were accessed by the keyboard driver. Oops, now your keyboard processing has been suspended. Even worse, what if the memory were accessed by a hardware interrupt handler? Hardware interrupt handlers can't even access paged memory, much less allow user-mode code to run.

    This "program being debugged takes a lock that the debugger itself needs" issue isn't usually a problem when a user-mode debugger debugs a user-mode process, because the locks held by a user-mode process typically affect only that process. If a process takes a critical section, sure that may deadlock the process, but the debugger is not part of the process, so it doesn't care.

    Of course, the "debugger is its own world" principle falls apart if the debugger is foolish enough to require a lock that the program being debugged also uses. Debugger authors therefore must be careful to avoid these sorts of cross-process dependencies. (News flash: Writing a debugger is hard.) You can still run into trouble if the program being debugged has done something with global consequences like create a fullscreen topmost window (thereby covering the debugger) or installed a global keyboard hook (thereby interfering with typing). If you've tried debugging a system service, you may have run into this sort of cross-process deadlock. For example, if you debug the service that is responsible for the networking client, and the debugger tries to access the network (for example, to load symbols), you've created a deadlock since the debugger needs to access the network, which it can't do because the networking service is stopped in the debugger.

    Hardware debugging breakpoints are a very convenient tool for chasing down bugs, but you have to understand their limitations.

    Additional reading: Data breakpoint oddities.

  • The Old New Thing

    Actually, FlagsAttribute can't do more; that's why it's an attribute


    A few years ago, Abhinaba wondered why FlagsAttribute didn't also alter the way enumeration values are auto-assigned.

    Because attributes don't change the language. They are instructions to the runtime environment or (in rarer cases) to the compiler. An attribute can instruct the runtime environment to treat the function or class in a particular way. For example, you can use an attribute to tell the runtime environment that you want the program entry point to run in a single-threaded apartment, to tell the runtime environment how to look up your p/invoke function, or to tell the compiler to suppress a particular class of warnings.

    But changing how values for enumerations are assigned, well that actually changes the language. An attribute can't change the operator precedence tables. An attribute can't change the way overloaded functions are resolved. An attribute can't change the statement block tokens from curly braces to square braces. An attribute can't change the IL that gets generated. The code still compiles to the same IL; the attribute just controls the execution environment, such as how the JIT compiler chooses to lay out a structure in memory.

    Attribute or not, enumerations follow the same rule for automatic assignment: An enumeration symbol receives the value one greater than the previous enumeration symbol.
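
    A short sketch of the rule in action; note that adding [Flags] changes only runtime behavior such as ToString formatting, never the auto-assigned values:

    ```csharp
    using System;

    [Flags]
    enum Plain { A, B, C }  // still auto-assigned 0, 1, 2, despite [Flags]

    [Flags]
    enum Proper             // for real flags you must assign powers of two yourself
    {
        None = 0,
        Read = 1,
        Write = 2,
        Execute = 4,
    }

    class FlagsDemo
    {
        static void Main()
        {
            Console.WriteLine((int)Plain.C);               // 2, not 4
            Console.WriteLine(Proper.Read | Proper.Write); // Read, Write
        }
    }
    ```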

  • The Old New Thing

    2008 Q1 link clearance: Microsoft blogger edition

  • The Old New Thing

    2007 year-end link clearance


    A few random links that I've collected over the last six months.

    And then the obligatory plug for my column in TechNet Magazine, which, despite the fact that Microsoft's name is on the magazine cover, does not establish the official Microsoft position on anything.

  • The Old New Thing

    Don't be helpless: You can put things together, it doesn't have to be a single command


    Humans are distinguished among all animal species by their advanced development of and heavy reliance on tools. Don't betray your ancestors. Use those tools you have.

    For example, during the debugging of a thread pool problem, it looked like somebody did a PostThreadMessage to a thread pool thread and left the message unprocessed after the thread pool function returned. Who could it have been? Well, one idea was to see if there were any DLLs in the system which called both QueueUserWorkItem and PostThreadMessage.

    I did a little legwork and contributed the following analysis to the mail thread:

    Of all the DLLs loaded into the process, the following call PostThreadMessage:

    SHLWAPI.dll 77D72436 221 PostThreadMessageA
    SHELL32.dll 77D78596 222 PostThreadMessageW
    ole32.dll 77D78596 222 PostThreadMessageW
    ... (list trimmed; you get the idea) ...

    Of those DLLs, these also call QueueUserWorkItem:

    ... (list trimmed; you get the idea) ...

    Astounded, somebody wanted to know how I came up with that list.

    Nothing magic. You have the tools, you have a brain, so connect the dots.

    The lm debugger command lists all the DLLs loaded into the process. Copy the output from the debugger window and paste it into a text file. Now write a little script that takes each line of the text file and does a link /dump /imports on the corresponding DLL. I happen to prefer perl for this sort of thing, but you can use a boring batch file if you like.

    for /f %i in (dlls.txt) do ^
    @echo %i & link /dump /imports %i | findstr PostThreadMessage

    Scrape the results off the screen, prune out the misses, and there you have it.

    "I tried that, but the result wasn't in the same format as what you posted."

    Well, yeah. There's no law that says that I can't manually reformat the data before presenting it in an email message. Since there were only a dozen hits, it's not worth writing a script to do that type of data munging. Typing "backspace, home, up-arrow" twelve times is a lot faster than writing a script to take the output of the above batch file and turn it into the output I used in the email message.

    Another boring batch file filters the list to those DLLs that also call QueueUserWorkItem. Writing it (or a script in your favorite language) is left as an exercise.

    No rocket science here. Just taking a bunch of tools and putting them together to solve a problem. That's what your brain is for, after all.

  • The Old New Thing

    More Start menu fine-tuning: Choosing a better representative for a frequently-run program


    If you paid really close attention to the way a representative shortcut is selected for a program, you may have noticed a problem with it. Here's the rule again:

    If there are multiple shortcuts to the same program, then the most-frequently-used shortcut is selected as the one to appear on the front page of the Start menu.

    Suppose there are two shortcuts to Notepad on the All Programs section of the Start menu: one is the standard Notepad shortcut that comes with Windows, and the other is a shortcut whose command line is notepad.exe "C:\Program Files\LitWare Inc\Release Notes.txt". Now suppose the user opens a text document on the desktop. Notepad runs, it "earns a point", and suppose that this gives Notepad enough points to appear on the front page of the Start menu. Which Notepad shortcut do we show?

    According to the rule stated above, we will choose either the standard Notepad shortcut or the LitWare Release Notes shortcut, depending on which one you've run most frequently. If it's the latter, then you'll have the puzzling result that opening a text document on the desktop causes the LitWare Release Notes shortcut to show up on the front page of the Start menu. It's perfectly logical and completely baffling at the same time.

    In Windows Vista, another tweak was added to the algorithm by which a shortcut is chosen to represent a program on the front page of the Start menu: If the user hasn't run any of a program's shortcuts from the Start menu, a shortcut that doesn't have any command line parameters is preferred over one that does.

    This tweak causes the Start menu to favor the standard Notepad shortcut over the LitWare Release Notes shortcut. It also means that, for example, a shortcut to Litware.exe is preferred over a shortcut of the form Litware.exe -update.

    Note: I was not present at the Windows Vista Start menu design meetings, so I have no insight into the rationale behind its design. Sorry.

  • The Old New Thing

    Sensor development kits were flying off the shelves


    After the Sensor and Location Platform PDC presentation, people were stopping by the booth and grabbing sensor hardware and development kits like they were candy. Then again, to geeks, this stuff is candy.

    (And technically, they weren't flying off shelves. They were flying out of bins. Well, and technically they weren't flying either.)

    Other notes from the last day of the 2008 PDC:

    • PDC content honcho Mike Swanson announced that PDC 2008 sessions and keynotes are now up for download, and will be available indefinitely. Available in Silverlight for live streaming, or in iTunes format, WMV, high-quality WMV (this one includes video of the speakers), and Zune for offline viewing. In response, Nigel Parker created the PDC session firehose.
    • Sometimes, when you make the sign too big, nobody can see it. Someone came up to me asking where the Azure booth was. I pointed at the insanely huge banner. "Probably somewhere over there."
    • The reason why room 406A is on the third floor but you press 3 to get there: The room numbers at the convention center are numbered based on room clusters. The 100's are in one cluster, the 400's are in another cluster, etc. The "4" in 406 doesn't mean fourth floor; it means fourth cluster. And why is the button for the third floor labeled 3? Because Level 1 is a split level which occupies two floors. Therefore, in the elevator, there are two buttons to get to Level 1, depending on whether you want the upper half or lower half. Cluster 4 is on Level 2, so the button for that is 3.
    • A video of carrots. If you had come to my PDC talk, you'd understand. (The carrot joke elicited a mild chuckle from the main room, but I'm told that in the overflow room it totally killed. I'm not sure what that says about people in the overflow room. It really wasn't that great a joke, people.)
    • At Ask the Experts, I started out at one of the Windows 7 tables, and we speculated as to what the most obscure technical question would be. One of my colleagues suggested that it would be a BitLocker question. And what would you know, the very first person to come to our table had a BitLocker question. Later in the evening, I moved to another Windows 7 table and posed the same question. At the second table, we figured that the most obscure technical question would be about the dispatcher spinlock. About fifteen minutes later, somebody stopped by with a question about the dispatcher spinlock. I decided to stop playing this game. It was too much like a Twilight Zone episode.
    • One attendee was shy about speaking English (not being a native speaker) and came up with a clever solution: The attendee walked up to our table, said, "I have some questions, can you read them?" and then opened a laptop computer where a Word document awaited with the four questions already typed up in a large font so everybody could read them. We answered the questions, and everybody seemed pleasantly surprised with how well this approach worked.
    • At the end of the conference the Channel 9 folks held a giveaway for their giant beanbags. Like anybody would be able to take a giant beanbag home on the airplane. (Second prize: Two beanbags!)
    • James Senior has the PDC by the numbers. I have trouble believing the number of bananas consumed, though. Twenty-five bananas in four days? Perhaps we should start calling James Monkey Boy.
  • The Old New Thing

    How can I find all objects of a particular type?


    More than one customer has asked a question like this:

    I'm looking for a way to search for all instances of a particular type at runtime. My goal is to invoke a particular method on each of those instances. Note that I did not create these objects myself or have any other access to them. Is this possible?

    Imagine what the world would be like if it were possible.

    For starters, just imagine the fun you could have if you could call typeof(SecureString).GetInstances(). Vegas road trip!

    More generally, it breaks the semantics of AppDomain boundaries, since grabbing all instances of a type lets you get objects from another AppDomain, which fundamentally violates the point of AppDomains. (Okay, you could repair this by saying that the GetInstances method only returns objects from the current AppDomain.)

    This imaginary GetInstances method might return objects which are awaiting finalization, which violates one of the fundamental assumptions of a finalizer, namely that there are no references to the object: If there were, then it wouldn't be finalized! (Okay, you could repair this by saying that the GetInstances method does not return objects which are awaiting finalization.)

    On top of that, you break the syncRoot pattern.

    class Sample {
      private object syncRoot = new object();
      public void Method() {
        lock (syncRoot) { ... }
      }
    }
    If it were possible to get all objects of a particular class, then anybody could just reach in and grab your private syncRoot and call Monitor.Enter() on it. Congratulations, the private synchronization object you created is now a public one that anybody can screw with, defeating the whole purpose of having a private syncRoot. You can no longer reason about your syncRoot because you are no longer in full control of it. (Yes, this can already be done with reflection, but at least when reflecting, you know that you're grabbing somebody's private field called syncRoot, so you already recognize that you're doing something dubious. Whereas with GetInstances, you don't know what each of the returned objects is being used for. Heck, you don't even know if it's being used! It might just be garbage lying around waiting to be collected.)

    More generally, code is often written on the expectation that an object that you never give out a reference to is not accessible to others. Consider the following code fragment:

    using (StreamWriter sr = new StreamWriter(fileName)) {
      sr.WriteLine("Hello");
    }

    If it were possible to get all objects of a particular class, you may find that your customers report that they are getting an ObjectDisposedException on the call to WriteLine. How is that possible? The disposal doesn't happen until the close-brace, right? Is there a bug in the CLR where it's disposing an object too soon?

    Nope, what happened is that some other thread did exactly what the customer was asking for a way to do: It grabbed all existing StreamWriter instances and invoked StreamWriter.Close on them. It did this immediately after you constructed the StreamWriter and before you did your sr.WriteLine(). Result: When your sr.WriteLine() executes, it finds that the stream was already closed, and therefore the write fails.

    More generally, consider the graffiti you could inject into all output files by doing

    foreach (StreamWriter sr in typeof(StreamWriter).GetInstances()) {
      sr.Write("Kilroy was here!");
    }

    or even crazier

    foreach (StringBuilder sb in typeof(StringBuilder).GetInstances()) {
      sb.Insert(0, "DROP TABLE users; --");
    }

    Now no StringBuilder is safe—the contents of any StringBuilder can be corrupted at any time!

    If you could obtain all instances of a type, the fundamental logic behind computer programming breaks down. It effectively becomes impossible to reason about code because anything could happen to your objects at any time.

    If you need to be able to get all instances of a class, you need to add that functionality to the class itself. (GCHandle or WeakReference will come in handy here.) Of course, if you do this, then you clearly opted into the "anything can happen to your object at any time outside your control" model and presumably your code operates accordingly. You made your bed; now you get to lie in it.
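
    As a rough sketch of what that opt-in might look like (Tracked and its members are hypothetical names I made up, not a framework API), the class can register each new instance in a static list of weak references, so the registry itself doesn't keep the objects alive:

    ```csharp
    using System;
    using System.Collections.Generic;

    class Tracked
    {
        // Weak references let tracked objects still be garbage-collected.
        static readonly List<WeakReference> instances = new List<WeakReference>();

        public Tracked()
        {
            lock (instances) { instances.Add(new WeakReference(this)); }
        }

        public static List<Tracked> GetInstances()
        {
            var alive = new List<Tracked>();
            lock (instances)
            {
                instances.RemoveAll(wr => !wr.IsAlive); // prune collected entries
                foreach (var wr in instances)
                {
                    if (wr.Target is Tracked t) alive.Add(t);
                }
            }
            return alive;
        }
    }
    ```

    Anybody using such a class has, of course, accepted that its instances can be reached and poked at from anywhere.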

    (And I haven't even touched on thread safety.)

    Bonus reading: Questionable value of SyncRoot on Collections.

  • The Old New Thing

    What are these strange cmp [ecx], ecx instructions doing in my C# code?


    When you debug through some managed code at the assembly level, you'll find a whole lot of seemingly pointless instructions that perform a comparison but ignore the result. What's the point of comparing two values if you don't care what the result is?

    In C++, invoking an instance method on a NULL pointer results in undefined behavior. In other words, if you do it, the compiler is allowed to do anything it wants. And what most compilers do is, um, nothing. They don't take any special steps if the this pointer is NULL; they just generate code on the assumption that it isn't. In practice, this often means that everything seems to run just fine until you access a member variable or call a virtual function, and then you crash.

    The C# language, by comparison, is quite explicit about what happens if you invoke an instance method on a null object reference:

    The value of E is checked to be valid. If the value of E is null, a System.NullReferenceException is thrown and no further steps are executed.

    The null reference exception must be thrown before the method can be called. That's what the strange cmp [ecx], ecx comparison is for.¹ The compiler doesn't actually care what the result of the comparison is; it just wants to raise an exception if ecx is null. If ecx is null, the attempt to dereference it (in order to perform the comparison) will raise an access violation, which the runtime inspects and turns into a NullReferenceException.

    The test is usually against the ecx register since the CLR happens to use² the fastcall calling convention, which for instance methods passes the this pointer in the ecx register. The pointer the compiler wants to test is going to wind up in the ecx register sooner or later,³ so it's not surprising that the test, when it happens, is made against the ecx register.
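
    The guaranteed ordering is easy to observe from C#: the exception is raised before the method body ever runs.

    ```csharp
    using System;

    class C
    {
        public void M() => Console.WriteLine("M ran");
    }

    class NullCheckDemo
    {
        static void Main()
        {
            C c = null;
            try
            {
                c.M(); // the null check fires here, before M is ever entered
            }
            catch (NullReferenceException)
            {
                Console.WriteLine("NullReferenceException before M ran");
            }
        }
    }
    ```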

    Nitpicker's Corner

    ¹Although this statement is written as if it were a fact, it is actually my interpretation based on observation and thinking about how language features are implemented. It is not an official position of the CLR team nor Microsoft Corporation, and that interpretation may ultimately prove incorrect.

    ²"Happens to use" means that this is an implementation detail, not a contractual guarantee.¹

    ³Unless the call is optimized. For example, the function might be inlined.
