• The Old New Thing

    You just have to accept that the file system can change

    • 31 Comments

    A customer who is writing some sort of code library wants to know how they should implement a function that determines whether a file exists. The usual way of doing this is by calling GetFileAttributes, but what they've found is that sometimes GetFileAttributes will report that a file exists, but when they get around to accessing the file, they get the error ERROR_DELETE_PENDING.

    The lesser question is what ERROR_DELETE_PENDING means. It means that somebody opened the file with FILE_SHARE_DELETE sharing, meaning that they don't mind if somebody deletes the file while they have it open. If the file is indeed deleted, then it goes into "delete pending" mode, at which point the file deletion physically occurs when the last handle is closed. But while it's in the "delete pending" state, you can't do much with it. The file is in limbo.

    You just have to be prepared for this sort of thing to happen. In a pre-emptively multi-tasking operating system, the file system can change at any time. If you want to prevent something from changing in the file system, you have to open a handle that denies whatever operation you want to prevent from happening. (For example, you can prevent a file from being deleted by opening it and not specifying FILE_SHARE_DELETE in your sharing mode.)
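
    For example, a minimal sketch of that deny-deletion technique might look like this (the helper name is illustrative and error handling is omitted). As long as the returned handle stays open, attempts to delete the file fail with a sharing violation:

    #include <windows.h>

    // Sketch: hold the file open while denying deletion. Omitting
    // FILE_SHARE_DELETE from the sharing mode means that nobody else
    // can delete the file until this handle is closed.
    HANDLE PreventDeletion(LPCWSTR pszPath)
    {
     return CreateFileW(pszPath, GENERIC_READ,
                        FILE_SHARE_READ | FILE_SHARE_WRITE, // no FILE_SHARE_DELETE
                        NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    }
    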

    The customer wanted to know how their "Does the file exist?" library function should behave. Should it try to open the file to see if it is in delete-pending state? If so, what should the function return? Should it say that the file exists? That it doesn't exist? Should they have their function return one of three values (Exists, Doesn't Exist, and Is In Funky Delete State) instead of a boolean?

    The answer is that any work you do to try to protect users from this weird state is not going to solve the problem because the file system can change at any time. If a program calls "Does the file exist?" and the file does exist, you will return true, and then during the execution of your return statement, your thread gets pre-empted and somebody else comes in and puts the file into the delete-pending state. Now what? Your library didn't protect the program from anything. It can still get the delete-pending error.

    Trying to do something to avoid the delete-pending state doesn't accomplish anything since the file can get into that state after you've returned to the caller saying "It's all clear." In one of my messages, I wrote that it's like fixing a race condition by writing

    // check several times to try to avoid race condition where
    // g_fReady is set before g_Value is set
    if (g_fReady && g_fReady && g_fReady && g_fReady && g_fReady &&
        g_fReady && g_fReady && g_fReady && g_fReady && g_fReady &&
        g_fReady && g_fReady && g_fReady) { return g_Value; }
    

    The compiler folks saw this message and got a good chuckle out of it. One of them facetiously suggested that they add code to the compiler to detect this coding style and not optimize it away.

  • The Old New Thing

    If you're going to reformat source code, please don't do anything else at the same time

    • 36 Comments

    I spend a good amount of my time doing source code archaeology, and one thing that really muddles the historical record is people who start with a small source code change which turns into large-scale source code reformatting.

    I don't care how you format your source code. It's your source code. And your team might decide to change styles at some point. For example, your original style guide may have been designed for the classic version of the C language, and you want to switch to a style guide designed for C++ and its new // single-line comments. Your new style guide may choose to use spaces instead of tabs for indentation, or it may dictate that opening braces go on the next line rather than hanging out at the end of the previous line, or you may have a new convention for names of member variables. Maybe your XML style guidelines changed from using

    <element attribute1="value1" attribute2="value2" />
    
    to
    <element
             attribute1="value1"
             attribute2="value2"
    />
    

    Whatever your changes are, go nuts. All I ask is that you restrict them to "layout-only" check-ins. In other words, if you want to do some source code reformatting and change some code, please split it up into two check-ins, one that does the reformatting and the other that changes the code.

    Otherwise, I end up staring at a diff of 1500 changed lines of source code, 1498 of which are just reformatting, and 2 of which actually changed something. Finding those two lines is not fun.

    And the next time somebody is asked to do some code archaeology, say to determine exactly when a particular change in behavior occurred, or to give an assessment of how risky it would be to port a change, the person who is invited to undertake the investigation may not be me. It may very well be you.

  • The Old New Thing

    Why can't I declare a type that derives from a generic type parameter?

    • 18 Comments

    A lot of questions about C# generics come from the starting point that they are just a cutesy C# name for C++ templates. But while the two may look similar in the source code, they are actually quite different.

    C++ templates are macros on steroids. No code gets generated when a template is "compiled"; the compiler merely hangs onto the source code, and when you finally instantiate it, the actual type is inserted and code generation takes place.

    // C++ template
    template<class T>
    class Abel
    {
    public:
     int DoBloober(T t, int i) { return t.Bloober(i); }
    };
    

    This is a perfectly legal (if strange) C++ template class. But when the compiler encounters this template, there are a whole bunch of things left unknown. What is the return type of T::Bloober? Can it be converted to an int? Is T::Bloober a static method? An instance method? A virtual instance method? A method on a virtual base class? What is the calling convention? Does T::Bloober take an int argument? Or maybe it's a double? Even stranger, it might accept a Canoe which gets converted from an int by a converting constructor. Or maybe it's a function that takes two parameters, but the second parameter has a default value.

    Nobody knows the answers to these questions, not even the compiler. It's only when you decide to instantiate the template

    Abel<Baker> abel;
    

    that these burning questions can be answered, overloaded operators can be resolved, conversion operators can be hunted down, parameters can get pushed on the stack in the correct order, and the correct type of call instruction can be generated.

    In fact, the compiler doesn't even care whether or not Baker has a Bloober method, as long as you never call Abel<Baker>::DoBloober!

    void f()
    {
     Abel<int> a; // no error!
    }
    
    void g()
    {
     Abel<int> a;
     a.DoBloober(0, 1); // error here
    }
    
    Only if you actually call the method does the compiler start looking for how it can generate code for the DoBloober method.

    C# generics aren't like that.

    Unlike C++, where a non-instantiated template exists only in the imaginary world of potential code that could exist but doesn't, a C# generic results in code being generated, but with placeholders where the type parameter should be inserted.

    This is why you can use generics implemented in another assembly, even without the source code to that generic. This is why a generic can be recompiled without having to recompile all the assemblies that use that generic. The code for the generic is generated when the generic is compiled. By comparison, no code is generated for C++ templates until the template is instantiated.

    What this means for C# generics is that if you want to do something with your type parameter, it has to be something that the compiler can figure out how to do without knowing what T is. Let's look at the example that generated today's question.

    class Foo<T>
    {
     class Bar : T
     { ... }
    }
    

    This is flagged as an error by the compiler:

    error CS0689: Cannot derive from 'T' because it is a type parameter
    

    Deriving from a generic type parameter is explicitly forbidden by section 25.1.1 of the C# language specification. Consider:

    class Foo<T>
    {
     class Bar : T
     {
       public void FlubberMe()
       {
         Flubber(0);
       }
     }
    }
    

    The compiler doesn't have enough information to generate the IL for the FlubberMe method. One possibility is

    ldarg.0        // "this"
    ldc.i4.0    // integer 0 - is this right?
    call T.Flubber // is this the right type of call?
    

    The line ldc.i4.0 is a guess. If the method T.Flubber were actually void Flubber(long l), then the line would have to be ldc.i4.0; conv.i8 to load an 8-byte integer onto the stack instead of a 4-byte integer. Or perhaps it's void Flubber(object o), in which case the zero needs to be boxed.

    And what about that call instruction? Should it be a call or callvirt?

    And what if the method returned a value, say, string Flubber(int i)? Now the compiler also has to generate code to discard the return value from the top of the stack.

    Since the source code for a generic is not included in the assembly, all these questions have to be answered at the time the generic is compiled. Besides, you can write a generic in Managed C++ and use it from VB.NET. Even saving the source code won't be much help if the generic was implemented in a language you don't have the compiler for!
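
    As an aside, the supported way to hand the compiler the information it needs is a generic constraint. Here is a minimal sketch (the interface is invented for illustration) of how a constraint rescues the FlubberMe example: because T is pinned down, all the burning questions have answers at compile time.

    interface IFlubberable
    {
     void Flubber(int i);
    }

    class Foo<T> where T : IFlubberable
    {
     public void FlubberMe(T t)
     {
      // The constraint guarantees that T has a void Flubber(int)
      // method, so the compiler knows which method to call and
      // that the argument is a 4-byte integer.
      t.Flubber(0);
     }
    }
    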

  • The Old New Thing

    Why do structures get tag names even if there is a typedef?

    • 17 Comments

    As we noted last time, structure tags are different from the typedef name as a historical artifact of earlier versions of the C language. But what about just leaving out the name entirely?

    typedef struct {
     ...
    } XYZ;
    

    One problem with this approach is that it becomes impossible to make a forward reference to this structure because it has no name. For example, if you wanted to write a prototype for a function that took one of these structures, and you could not be sure that the header file containing the XYZ type definition had already been included, you could still refer to the structure by its tag name.

    // in header file A
    typedef struct tagXYZ {
     ...
    } XYZ;
    
    // in header file B
    BOOL InitializeFromXYZ(const struct tagXYZ *pxyz);
    

    The two header files can be included in either order because header file B uses a forward reference to the XYZ structure. Naturally, you would hope that people would include header file A before header file B, but there can be cases where it is not practical. (For example, header file A may contain definitions that conflict with something else that the program needs, or header file A may change its behavior based on what has already been #define'd, and you don't want to include it before the application has a chance to set up those #defines.)

    But a more important reason to avoid anonymous types is that it creates problems for MIDL.

    Okay, it doesn't actually create problems for MIDL. MIDL handles it just fine, but the way MIDL handles it creates problems for you. When you create an anonymous type in MIDL, such as the anonymous structure above or an anonymous enumeration like this:

    typedef enum { ... } XYZ;
    

    MIDL auto-generates a name for you. For example, the above enumeration might end up in the generated header file as

    typedef enum __MIDL___MIDL_itf_scratch_0000_0001
    {
        ...
    } XYZ;
    

    The kicker is that the auto-generated name changes if you change the IDL file. And since typedefs are just shorthand for the underlying type (rather than a type in and of themselves), the name saved in the PDB is the unwieldy __MIDL___MIDL_itf_scratch_0000_0001. Try typing that into the debugger, yuck.

    Furthermore, having the name change from build to build means that you have to make sure code libraries are all built from exactly the same header file versions, even if the changes are ostensibly compatible. For example, suppose you compile a library with a particular version of the header file, and then you add a structure to the MIDL file which has no effect on the functions and structures that the library used. But still, since you changed the MIDL file, this changes the auto-generated symbol names. Now you compile a program with the new header file and link against the library. Result: A whole bunch of errors, because the library, say, exports a function that expects its first parameter to be a __MIDL___MIDL_itf_scratch_0000_0001 (because the library was built from the older MIDL-generated header file), but your program imports a function that expects its first parameter to be a __MIDL___MIDL_itf_scratch_0001_0002 (because you compiled with the newer MIDL-generated header file).

    What's more, when you update the header file, your source control system will recognize hundreds of changes, since the MIDL compiler generated a whole different set of names which no longer match the names from the previous version of the header file, even though you didn't change the structure! This isn't fatal, but it makes digging through source code history more of an ordeal since the "real changes" are buried amidst hundreds of lines of meaningless changes.
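
    The simple defense against all of this is the subject of this article: give the type a tag, so MIDL has a stable name to carry through to the generated header (the tag name here is illustrative):

    typedef enum tagXYZ
    {
        ...
    } XYZ;
    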

    Now, this particular rule of thumb is not universally adhered to in Windows header files, in large part, I believe, simply because people aren't aware of the potential for mischief. But maybe now that I've written it up, people might start paying closer attention.

  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 2)

    • 49 Comments

    Now that you have the basic idea behind iterators under your belt, you can already answer some questions on iterator usage. Here's a scenario based on actual events:

    I have an iterator that is rather long and complicated, so I'd like to refactor it. For illustrative purposes, let's say that the enumerator counts from 1 to 100 twice. (In real life, of course, the iterator will not be this simple.)

    IEnumerable<int> CountTo100Twice()
    {
     int i;
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    As we learned in Programming 101, we can pull common code into a subroutine and call the subroutine. But when I do this, I get a compiler error:

    IEnumerable<int> CountTo100Twice()
    {
     CountTo100();
     CountTo100();
    }
    
    void CountTo100()
    {
     int i;
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    What am I doing wrong? How can I move the "count to 100" into a subroutine and call it twice from the CountTo100Twice function?

    As we saw last time, iterators are not coroutines. The technique above would have worked great had we built iterators out of, say, fibers instead of building them out of state machines. As state machines, all yield return statements must occur at the "top level". So how do you iterate with the help of subroutines?

    You make the subroutine its own iterator and suck the results out from the main function:

    IEnumerable<int> CountTo100Twice()
    {
     foreach (int i in CountTo100()) yield return i;
     foreach (int i in CountTo100()) yield return i;
    }
    
    IEnumerable<int> CountTo100()
    {
     for (int i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    Exercise: Consider the following fragment:

     foreach (int i in CountTo100Twice()) {
      ...
     }
    

    Explain what happens on the 150th call to MoveNext() in the above loop. Discuss its consequences for recursive enumerators (such as tree traversal).

  • The Old New Thing

    Why do we even have the DefWindowProc function?

    • 8 Comments

    Some time ago, I looked at two ways of reimplementing the dialog procedure (method 1, method 2). Commenter "8" wondered why we have a DefWindowProc function at all. Couldn't window procedures have followed the dialog box model, where they simply return FALSE to indicate that they want default processing to occur? Then there would be no need to export the DefWindowProc function.

    This overlooks one key pattern for derived classes: Using the base class as a subroutine. That pattern is what prompted people to ask for dialog procedures that acted like window procedures. If you use the "Return FALSE to get default behavior" pattern, window procedures would go something like this:

    BOOL DialogLikeWndProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
     switch (uMsg) {
     ... handle messages and return TRUE ...
     }
     // We didn't have any special processing; do the default thing
     return FALSE;
    }
    

    Similarly, subclassing in this hypothetical world would go like this:

    BOOL DialogLikeSubclass(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
     switch (uMsg) {
     ... handle messages and return TRUE ...
     }
     // We didn't have any special processing; let the base class try
     return CallDialogLikeWindowProc(PrevDialogLikeWndProc, hwnd, uMsg, wParam, lParam);
    }
    

    This works as long as what you want to do is override the base class behavior entirely. But what if you just want to augment it? Calling the previous window procedure is analogous to calling the base class implementation from a derived class, and doing so is quite common in object-oriented programming, where you want the derived class to behave "mostly" like the base class. Consider, for example, the case where we want to allow the user to drag a window by grabbing anywhere in the client area:

    LRESULT CALLBACK CaptionDragWndProc(
        HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
     LRESULT lres;
    
     switch (uMsg) {
     case WM_NCHITTEST:
      lres = DefWindowProc(hwnd, uMsg, wParam, lParam);
      if (lres == HTCLIENT) lres = HTCAPTION;
      return lres;
     ...
     }
     return DefWindowProc(hwnd, uMsg, wParam, lParam);
    }
    

    We want our hit-testing to behave just like normal, with the only exception that clicks in the client area should be treated as clicks on the caption. With the DefWindowProc model, we can do this by calling DefWindowProc to do the default processing, and then modifying the result on the back end. If we had used the dialog-box-like model, there would have been no way to call the "default handler" as a subroutine in order to make it do the heavy lifting. We would be forced to do all the work or none of it.

    Another avenue that an explicit DefWindowProc function opens up is modifying messages before they reach the default handler. For example, suppose you have a read-only edit control, but you want it to look like a normal edit control instead of getting the static look. You can do this by modifying the message that you pass to DefWindowProc:

    ...
     case WM_CTLCOLORSTATIC:
      if (GET_WM_CTLCOLOR_HWND(wParam, lParam) == m_hwndEdit)
      {
       // give it the "edit" look
       return DefWindowProc(hwnd, WM_CTLCOLOREDIT, wParam, lParam);
      }
      ...
    

    Another common operation is changing one color attribute of an edit control while leaving the others intact. For this, you can use DefWindowProc as a subroutine and then tweak the one attribute you want to customize.

     case WM_CTLCOLORSTATIC:
      if (GET_WM_CTLCOLOR_HWND(wParam, lParam) == m_hwndDanger)
      {
       // Start with the default color attributes
       LRESULT lres = DefWindowProc(hwnd, uMsg, wParam, lParam);
       // Change text color to red; leave everything else the same
       SetTextColor(GET_WM_CTLCOLOR_HDC(wParam, lParam), RGB(255,0,0));
       return lres;
      }
      ...
    

    Getting these types of operations to work with the dialog box model would be a significantly trickier undertaking.

  • The Old New Thing

    Why does a new user get stuff on their Start menu right off the bat?

    • 34 Comments

    In the initial designs for the Start menu, the list of most-frequently-used programs on the Start menu would be completely empty the first time you opened it. This was perfectly logical, since you hadn't run any programs at all yet, so nothing was frequently-used because nothing had been used at all! Perfectly logical and completely stupid-looking.

    Imagine the disappointment of people who just bought a computer. They unpack it, plug everything in, turn it on, everything looks great. Then they open the Start menu to start using their computer and they get... a blank white space. "Ha-ha! This computer can't do anything! You should have bought a Mac!"

    (In usability-speak, this is known as "the cliff": You're setting up a new computer, everything looks like it's going great, and then... you're staring at a blank screen and have no idea what to do next. The learning curve has turned into a precipice.)

    The original design attempted to make this initially-blank Start menu less stark by adding text that said, roughly, "Hey, sure, this space is blank right now, but as you run programs, they will show up here, trust me."

    Great work there. Now it's not stupid any more. Now it's stupid and ugly.

    It took a few months to figure out how to solve this problem, and ultimately we decided upon what you see in Windows XP: For brand-new users, we create some "artificial points" so that the initial Start menu has a sampling of fun programs on it. The number of artificial points is carefully chosen so that they are enough to get the programs onto the Start menu's front page, but not so many that they overwhelm the "real" points earned by the programs users run themselves. (I believe the values were chosen so that a user needs to run a program only twice to get it onto the front page on the first day.)

    Note that these "artificial points" are not given if the user was upgraded from Windows 2000. In that case, the points that the Windows 2000 Start menu used for Intellimenus are used to seed the Windows XP point system. That way, the front page of the Start menu for an upgraded user already reflects the programs that the user ran most often on Windows 2000.

    In the initial release of Windows XP, the "artificial points" were assigned so that the first three of the six slots on the most-frequently-used programs list were chosen by Windows and the last three by the computer manufacturer. If your copy of Windows XP was purchased at retail instead of preinstalled by the computer manufacturer, or if the computer manufacturer declined to take advantage of the three slots offered to it (something that never happened in practice), then Windows took two of the three slots that had been offered to the computer manufacturer, leaving the last slot blank. That way, the very first program you ran showed up on your Start menu immediately.

    In Windows XP Service Pack 1, the assignment of the six slots changed slightly. Two of the slots were assigned by Windows, one by the United States Department of Justice, and the last three by the computer manufacturer. (Again, if you bought your copy of Windows XP at retail, then two of the computer manufacturer slots were assigned by Windows and the last was left blank.)

  • The Old New Thing

    Clean-up functions can't fail because, well, how do you clean up from a failed clean-up?

    • 29 Comments

    Commenter Matt asks how you're supposed to handle failures in functions like fclose or CloseHandle. Obviously, you can't. If a clean-up function fails, there's not much you can do because, well, how do you clean up from a failed clean-up?

    These clean-up functions fall into the category of "Must not fail for reasons beyond the program's control." If a program tries to close a file and it gets an error back, what can it do? Practically speaking, nothing. The only way a clean-up function can fail is if the program fundamentally screws up, say by attempting to close something that was never open or otherwise passing an invalid parameter. It's not like a program can try to close the handle again (or worse, go into a loop closing the handle repeatedly until it finally closes).

    Remember this when writing your own clean-up functions. Assuming the parameters are valid, a clean-up function must succeed.

    (I will now ruin my rhetorical flourish by yammering about the fclose function because if I don't, people will bring it up in the comments anyway. The fclose function does extra work before closing, and that extra work may indeed run into problems, but the important thing is that when fclose returns, the stream is well and truly closed.)
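
    To make that concrete, here is a minimal sketch of how a caller observes such an error (the function name is illustrative). Whatever fclose returns, the stream is closed and must not be touched again:

    #include <stdio.h>

    void SaveAndClose(FILE *fp)
    {
     // fclose flushes any buffered output before closing, and that
     // flush can fail (disk full, for example), in which case fclose
     // returns EOF. Either way, the stream is closed afterward:
     // report the error if you like, but don't use fp again, and
     // don't retry the close.
     if (fclose(fp) == EOF) {
      perror("fclose");
     }
    }
    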

    Addendum: Once again I wish to emphasize that while it may be possible for functions like fclose to run into errors while they are closing the stream, the point is that the result of the call to fclose is always a closed stream. I think most of the comments are losing sight of the point of my article and rat-holing on the various ways fclose can run into errors. That's not the same as failing. It never fails; it always succeeds, but possibly with errors.

  • The Old New Thing

    The old-fashioned theory on how processes exit

    • 27 Comments

    Life was simpler back in the old days.

    Back in the old days, processes were believed to be in control of their threads. You can see this in the "old fashioned" way of exiting a process, namely by exiting all the threads. This method works only if the process knows about all the threads running in it and can get each one to clean up when it's time for the process to exit.

    In other words, the old-fashioned theory was that when a process wanted to exit, it would do something like this (a minimal Win32 sketch follows the list):

    • Tell all its threads that the show is over.
    • Wait for each of those threads to finish up.
    • Exit the main thread (which therefore exits the process).
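
    Here's that sketch, assuming the process tracks its own worker threads in an array and signals them with a manual-reset event (all names are illustrative):

    #include <windows.h>

    HANDLE g_hShutdownEvent;                    // manual-reset "show is over" event
    HANDLE g_rghThreads[MAXIMUM_WAIT_OBJECTS];  // the worker threads we know about
    DWORD  g_cThreads;

    void ExitProcessOldSchool(DWORD dwExitCode)
    {
     // Tell all the threads that the show is over.
     SetEvent(g_hShutdownEvent);

     // Wait for each of those threads to finish up.
     WaitForMultipleObjects(g_cThreads, g_rghThreads, TRUE, INFINITE);

     // Exit the main thread. Since it is the last thread standing,
     // the process exits with it.
     ExitThread(dwExitCode);
    }
    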

    Of course, that was before the introduction of programming constructs that created threads the main program didn't know about and therefore had no control over. Things like the thread pool, RPC worker threads, and DLLs that create worker threads (something still not well understood even today).

    The world today is very different. Next time, we'll look at how this simple view of processes and threads affects the design of how processes exit.

    Still, you learned enough today to be able to solve this person's problem.

  • The Old New Thing

    Is DEP on or off on Windows XP Service Pack 2?

    • 11 Comments

    Last time, we traced an IP_ON_HEAP failure to a shell extension that used an older version of ATL which was not DEP-friendly. But that led to a follow-up question:

    Why aren't we seeing this same crash in the main program as in the shell extension? That program uses the same version of ATL, but it doesn't crash.

    The reason is given in this chart. Notice that the default configuration is OptIn, which means that DEP is off for all processes by default, but is on for all Windows system components. That same part of the page describes how you can change to OptOut so that the default is to turn on DEP for all processes except for the ones you put on the exception list. There's more information in this excerpt from the "Changes to Functionality in Microsoft Windows XP Service Pack 2" document.
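
    For reference, on 32-bit systems the policy is selected by the /noexecute switch in boot.ini; a representative entry (the disk path and edition will vary) that switches the default to OptOut looks like this:

    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=OptOut /fastdetect
    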

    The program that comes with the shell extension is not part of Windows, so DEP is disabled by default. But Explorer is part of Windows, so DEP is enabled for Explorer by default. That's why only Explorer encounters this problem.

    (This little saga does illustrate the double-edged sword of extensibility. If you make your system extensible, you allow other people to add features to it. On the other hand, you also allow other people to add bugs to it.)

    The saga of the DEP exception is not over, however, because it turns out I've been lying to you. More information tomorrow.
