November, 2010

  • The Old New Thing

    But who's going to set up their own email server?


    Many many years ago, back in the days when Microsoft's email address had exclamation points, an internal tool was developed to permit Microsoft employees to view and update their Benefits information from the comfort of their very own offices. Welcome to the paperless office!

    One of my friends noticed an odd sentence in the instructions for using the tool: "Before running the program, make sure you are logged onto your email server."

    "That's strange," my friend thought. "Why does it matter that you're logged onto your email server? This tool doesn't use email."

    Since my friend happened at the time to be a tester for Microsoft's email product, he tried a little experiment. He created a brand new email server on one of his test machines and created an account on it called billg. He then signed onto that email server and then ran the tool.

    Welcome, Bill Gates. Here are your current Benefits selections...

    "Uh-oh," my friend thought. "This is a pretty bad security hole." The tool apparently performed authentication by asking your email server, "Hey, who are you logged in as?" The answer that came back was assumed to be an accurate representation of the user who is running the tool. The back-end server itself was not secured at all; it relied on the client application to do the security checks.

    My friend sent email to the vice president of Human Resources informing him of this problem. "You need to shut down this tool immediately. I have found a security hole that allows anybody to see anybody else's Benefits information."

    The response from the vice president of Human Resources was calm and reassuring. "My developers tell me that the tool is secure. Just enjoy the convenience of updating your Benefits information electronically."

    Frustrated by this, my friend decided to create another account on his test email server, namely one corresponding to the vice president of Human Resources. He then sent the vice president another email message.

    "Please reconsider your previous decision. Your base salary is $xxx and your wife's name is Yyyy. Would you like me to remind you one week before your son's tenth birthday? It's coming up next month."

    A reply was quickly received. "We're looking into this."

    Shortly thereafter, the tool was taken offline "for maintenance."

    Bonus reading: JenK shares her experience with the same incident.
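    The hole in this story has a familiar shape: the server believes whatever identity the client reports. Here is a minimal sketch of that pattern next to a safer one, assuming a toy in-memory server. Every name in it (BenefitsServer, Request, the token strings) is made up for illustration; none of it is taken from the actual tool.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical request: the client states who it is and presents a session token.
struct Request {
    std::string claimedUser;   // whatever the client says; trivially forged
    std::string sessionToken;  // assumed to be issued by the server after a real login
};

class BenefitsServer {
public:
    void addRecord(const std::string& user, const std::string& record) {
        records_[user] = record;
    }

    // Stand-in for a real login step; credential checking is not modeled here.
    void issueSession(const std::string& token, const std::string& user) {
        sessions_[token] = user;
    }

    // The flawed pattern: the client's claim is taken at face value.
    std::optional<std::string> lookupTrustingClient(const Request& r) const {
        return recordFor(r.claimedUser);
    }

    // A safer pattern: identity comes only from state the server itself holds.
    std::optional<std::string> lookupVerified(const Request& r) const {
        auto session = sessions_.find(r.sessionToken);
        if (session == sessions_.end()) return std::nullopt;  // unknown token: refuse
        return recordFor(session->second);
    }

private:
    std::optional<std::string> recordFor(const std::string& user) const {
        auto it = records_.find(user);
        if (it == records_.end()) return std::nullopt;
        return it->second;
    }

    std::map<std::string, std::string> records_;
    std::map<std::string, std::string> sessions_;
};
```

    With the first lookup, a request that merely claims to be billg gets billg's record. With the second, the claim is ignored and the answer depends only on the session the server issued.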

  • The Old New Thing

    Consequences of using variables declared __declspec(thread)


    As a prerequisite, I am going to assume that you understand how TLS works, and in particular how __declspec(thread) variables work. There's a quite thorough treatise on the subject by Ken Johnson (better known as Skywing), who comments quite frequently on this site. The series starts here and continues for a total of 8 installments, ending here. That last page also has a table of contents so you can skip over the parts you already know to get to the parts you don't know.

    Now that you've read Ken's articles...

    No, wait, I know you didn't read them and you're just skimming past them in the hopes that you will be able to fake your way through the rest of this article without having read the prerequisites. Well, okay, but don't be surprised when I get frustrated if you ask a question that is answered in the prerequisites.

    Anyway, as you learned from Part 5 of Ken's series, the __declspec(thread) model, as originally envisioned, assumed that all DLLs which use the feature would be present at process startup, so that all the _tls_index values can be computed and the total sizes of each module's TLS data can be calculated before any threads get created. (Well, okay, the initial thread already got created, but that's okay; we'll set up that thread's TLS before we execute any application code.)

    If you loaded a __declspec(thread)-dependent module dynamically, bad things happened. For one, TLS data was not set up for any pre-existing threads, since those threads were initialized before your module got loaded. Windows doesn't have a time machine where it can go back in time to when those threads were initialized and pre-reserve space for the TLS variables your new module needed. Nope, your module is just out of luck with respect to those pre-existing threads, and if it tries to use __declspec(thread) variables, it'll find that its TLS slot never got initialized, and there's no data there to access.

    Unfortunately, there's an even worse problem, which Ken quite ably elaborates on in Part 6: The _tls_index variable inside the module arrived after the train left the station. All those TLS indices were assigned at process initialization. When it loads dynamically, the _tls_index variable just sits there, and nobody bothers to initialize it, leaving it at its default value of zero. (Too bad the compiler didn't initialize it to TLS_OUT_OF_INDEXES.) As a result, the module thinks that its TLS variables are at slot zero in the TLS array, leading to what Ken characterizes as "one of the absolute worst possible kinds of problems to debug": Two modules both think they are the rightful owners of the same data, each with a different concept of what that data is supposed to be. It'd be like if there was a bug in HeapAlloc where it returned the same pointer to two separate callers. Each caller would use the memory, cheerfully believing that the values the code writes to the memory will be there when it comes back.
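    To make the slot-zero collision concrete, here is a toy model. It is only a sketch: ToyThread, ToyModule, and the helper functions are invented for illustration and bear no resemblance to the real loader data structures, which Ken's series describes. The one thing it preserves is the failure mode: a module whose index was never assigned reads and writes slot zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for one thread's TLS array: slot i holds the variables of the
// module whose index is i.
struct ToyThread {
    std::vector<std::vector<int>> slots;
};

// Toy stand-in for a module that uses thread variables.
struct ToyModule {
    std::size_t tlsIndex = 0;  // plays the role of _tls_index; zero until assigned
    std::size_t varCount = 0;  // how many int-sized thread variables it declares
};

// What happens for modules present at process startup: each gets its own slot.
void assignSlotAtStartup(ToyThread& thread, ToyModule& module) {
    module.tlsIndex = thread.slots.size();
    thread.slots.emplace_back(module.varCount, 0);
}

// How a module reaches its n-th thread variable: slot [tlsIndex], then offset n.
int& threadVar(ToyThread& thread, const ToyModule& module, std::size_t n) {
    return thread.slots.at(module.tlsIndex).at(n);
}
```

    If the EXE goes through assignSlotAtStartup and a second module never does, both end up with index zero, and a write through one shows up as a changed value in the other.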

    What truly frightens me is that there's at least one person who considers this horrific data corruption bug a feature. Webcyote calls this bug "sharing all variables between the EXE and the DLL" and complained that fixing the bug breaks programs that "depend on the old behavior". That's like saying "We found that if we use this exact pattern of memory allocations, we can trick HeapAlloc into allocating the same memory twice, so we will have our EXE allocate some memory, then perform the magic sequence of allocations, and then load the DLL, and then the DLL will call HeapAlloc to allocate some memory, and it will get the same pointer back, and now the EXE and DLL can share memory."


    Mind you, this crazy "EXE and DLL sharing thread variables" trick is extremely fragile. You have to intentionally delay loading the DLL until after process startup. (If you load it as part of an explicit dependency, then you don't trigger the bug and the DLL gets its own set of variables as intended.) And then you have to make sure that the EXE and DLL declare exactly the same variables in exactly the same order and link the OBJ files in exactly the right sequence, so that all the offsets match. Oh, and you have to make sure your DLL is loaded only into the EXE with which it is in cahoots. If you load it into any other EXE, it will start corrupting that EXE's thread variables. (Or, if the EXE doesn't use thread variables, it'll corrupt some other random DLL's thread variables.)

    If the feature had been intended to be used in this insane way, they would have been called "shared variables" instead of "thread variables". No wait, they would have been called "thread variables that sometimes end up shared under conditions outside your DLL's control."

    I wonder if Webcyote also drives a manual transmission and just slams the gear stick into position without using the clutch. Yes, you can do it if you are really careful and get everything to align just right, but if you mess up, your transmission explodes and spews parts all over the road.

    Don't abuse a bug in the loader. If you want shared variables, then create shared variables. Don't create per-thread variables and then intentionally trigger a bug that causes them to overlay each other by mistake. That's such a crazy idea that it probably never occurred to anyone that somebody would actually build a system that relies on it!

    Exercise: A customer ran into a problem with the "inadvertently sharing variables between the EXE and the DLL" bug. Here is the message from the customer liaison:

    My customer has a DLL that uses static thread local storage (__declspec(thread)), and he wants to use this DLL from his C# program. Unfortunately, he is running into the limitation when running on Windows XP that DLLs which use static thread local storage crash when they try to access their thread variables. The customer cannot modify the DLL. What do you recommend?

    Update: Commenter shf gives the most complete answer.

  • The Old New Thing

    The curse of the current directory


    The current directory is both a convenience and a curse. It's a convenience because it saves you a lot of typing and enables the use of relative paths. It's a curse because of everything else.

    The root cause of this curse is that the Windows NT family of operating systems keeps open a handle to the process's current directory. (Pre-emptive Yuhong Bao comment: The Windows 95 series of operating systems, on the other hand, did not keep the current directory open, which had its own set of problems not relevant to this discussion.)

    The primary consequence of this curse is that you can't delete a directory if it is the current directory of a running process. I see people stumble upon this all the time without realizing it.

    I am trying to delete a directory X, but when I try, I get the error message "The process cannot access the file because it is being used by another process." After some hunting around, I found that directory X is being held open by someapp.exe. Why the heck is someapp.exe holding my directory open, and how do I get it to stop?

    The value of someapp.exe changes over time, but the underlying problem is the same. And when this happens, people tend to blame someapp.exe for stupidly holding a directory open.

    Most of the time, someapp.exe is just a victim of the curse of the current directory.

    First, let's take the case where someapp.exe is explorer.exe. Why is the current directory of Explorer set to this directory?

    Well, one reason might be another curse of the current directory, namely, that the current directory is a process-wide setting. If a shell extension decided to call SetCurrentDirectory, then that changes the current directory for all of Explorer. And if that shell extension doesn't bother to call SetCurrentDirectory a second time to reset the current directory to what it was, then the current directory gets stuck at the new directory, and Explorer has now been conned into changing its current directory permanently to your directory.

    Mind you, the shell extension might have tried to do the right thing by setting the current directory back to its original location, but the attempt might have failed:

    GetCurrentDirectory(Old) // returns C:\Previous
    SetCurrentDirectory(New) // changes to C:\Victim
    .. do stuff ..
    SetCurrentDirectory(Old) // changes to C:\Previous - fails?

    That second call to SetCurrentDirectory can fail if, while the shell extension is busy doing stuff, the directory C:\Previous is deleted. Now the shell extension can't change the directory back, so it's left stuck at C:\Victim, and now you can't delete C:\Victim because it is Explorer's new current directory.

    (The preferred behavior, by the way, is for the shell extension not to call SetCurrentDirectory in the first place. Just operate on full paths. Since the current directory is a process-wide setting, you can't be sure that some other thread hasn't called SetCurrentDirectory out from under you.)
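    Here is one way "just operate on full paths" might look, sketched in portable C++17 with std::filesystem rather than the Win32 calls a shell extension would really use. The helper name resolveAgainst is hypothetical. The point is that the base directory is passed around explicitly, so the process-wide current directory is never read or changed.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Build a full path from an explicit base directory instead of relying on
// (or changing) the process-wide current directory.
// This is purely lexical: it does not touch the file system.
fs::path resolveAgainst(const fs::path& baseDir, const fs::path& name) {
    if (name.is_absolute()) {
        return name.lexically_normal();  // already a full path; just tidy it
    }
    return (baseDir / name).lexically_normal();
}
```

    A caller would then open resolveAgainst(victimDir, "notes.txt") directly, and no other thread's idea of the current directory is disturbed.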

    Mind you, making the current directory a per-thread concept doesn't solve this problem completely, because the current directory for the thread (if such a thing existed) would still have a handle open until the thread exited. But if the current directory had been a per-thread concept, and if the thread were associated with an Explorer window, then closing that window would at least encourage that thread to exit and let you unstick the directory. That is, unless you did a TerminateThread, in which case the handle would be leaked and your attempt to release the handle only ensures that it never happens. (Note to technology hypochondriacs: This paragraph was a hypothetical and consequently will be completely ineffective at solving your problem.)

    The story isn't over yet, but I'll need to digress for a bit in order to lay the groundwork for the next stage of the curse.

    Bonus chatter: Hello, people. "The story isn't over yet." Please don't try to guess the next chapter in the story.

  • The Old New Thing

    Why does the Win32 Time service require the date to be correct before it will set the time?


    Public Service Announcement: Daylight Saving Time ends in most parts of the United States this weekend.

    Andy points out that if you attempt to synchronize your clock when the date is set incorrectly, the operation fails with the error message "An error occurred while Windows was synchronizing with For security reasons, Windows cannot synchronize with the server because your date does not match. Please fix the date and try again." He wonders what the security risk is.

    First of all, for people who are trying to solve the problem, the solution is to follow the steps in the error message. Set your date to the correct date, then try again. If that doesn't help, also set the time to something close to the correct time. Once your time gets close, the time server can nudge it the rest of the way.

    Back to the original question: What is the security risk being defended against, here?

    At first glance, you might think that the server is attempting to defend itself against a client whose time is set incorrectly, but actually the potential attack is in reverse: Your computer is protecting itself against a rogue time server.

    The Kerberos authentication protocol relies heavily on all participants agreeing on what time it is (with some slop tolerance). If somebody manages to fool the client into synchronizing its time against a rogue server (for example, by using a DNS poisoning attack), the attacker can use that invalid date (typically a backdate) as a foothold for the next level of attacks.

    The default configuration for the Windows Time service is to reject attempts to change the clock by more than 15 hours. You can change the configuration settings by following the instructions in this KB article (which happens also to have been the source material for most of this article).
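    The sanity check boils down to comparing the size of the proposed correction against a limit. The following is only a sketch of that idea, not the Windows Time service's code: the function name and parameters are hypothetical, and the 15-hour figure is simply the default quoted above.

```cpp
#include <cassert>
#include <chrono>

// Decide whether a proposed clock correction is small enough to trust.
// offset: how far the server says our clock is off (negative means set it back).
// maxCorrection: the largest change we are willing to accept in either direction.
bool shouldAcceptCorrection(std::chrono::seconds offset,
                            std::chrono::seconds maxCorrection) {
    const std::chrono::seconds magnitude =
        offset < std::chrono::seconds::zero() ? -offset : offset;
    return magnitude <= maxCorrection;
}
```

    With a 15-hour limit, a nudge of a couple of hours is accepted, while a rogue server that tries to backdate the clock by a year is refused.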

  • The Old New Thing

    Why does the common file dialog change the current directory?


    When you change folders in a common file dialog, the common file dialog calls SetCurrentDirectory to match the directory you are viewing. (Don't make me bring back the Nitpicker's Corner.)

    Okay, the first reaction to this is, "What? I didn't know it did that!" This is the other shoe dropping in the story of the curse of the current directory.

    Now the question is, "Why does it do this?"

    Actually, you know the answer to this already. Many programs require that the current directory match the directory containing the document being opened.

    Now, it turns out, there's a way for you to say, "No, I'm not one of those lame-o programs. I can handle the current directory being different from the document directory. Don't change the current directory when using a common file dialog." You do this by passing the OFN_NOCHANGEDIR flag. (If your program uses the IFileDialog interface, then NOCHANGEDIR is always enabled. Hooray for progress.)

    But now that you know about this second curse, you can actually use it as a counter-curse against the first one.

    If you determine that a program is holding a directory open, and you suspect that it is the victim of the curse of the current directory, you can go to that program and open a common file dialog. (For example, Save As.) From that dialog, navigate to some other directory you don't plan on removing, say, the root of the drive, or your desktop. Then cancel the dialog.

    Since the common file dialog changes the current directory, you have effectively injected a SetCurrentDirectory call into the target process, thereby changing it from the directory you want to remove. Note, however, that this trick works only if the application in question omits the OFN_NOCHANGEDIR flag when it calls GetSaveFileName.

    In Explorer, you can easily call up a common file dialog by typing Win+R then clicking Browse, and in versions of Windows up through Windows XP, Explorer didn't pass the OFN_NOCHANGEDIR flag.

  • The Old New Thing

    Your debugging code can be a security vulnerability: Loading optional debugging DLLs without a full path


    Remember, the bad guys don't care that your feature exists just for debugging purposes. If it's there, they will attack it.

    Consider the following code:

    DOCLOADINGPROC g_pfnOnDocLoading;
    void LoadDebuggingHooks()
    {
     HMODULE hmodDebug = LoadLibrary(TEXT("DebugHooks.dll"));
     if (!hmodDebug) return;
     g_pfnOnDocLoading = (DOCLOADINGPROC)
                   GetProcAddress(hmodDebug, "OnDocLoading");
    }

    HRESULT LoadDocument(...)
    {
     ...
     if (g_pfnOnDocLoading) {
       // let the debugging hook replace the stream
       // (pstmDoc is an assumed name for the document's IStream pointer)
       g_pfnOnDocLoading(&pstmDoc);
     }
    }

    When you need to debug the program, you can install the DebugHooks.dll DLL into the application directory. The code above looks for that DLL and if present, gets some function pointers from it. For illustrative purposes, I've included one debugging hook. The idea of this example (and it's just an example, so let's not argue about whether it's a good example) is that when we're about to load a document, we call the OnDocLoading function, telling it about the document that is about to be loaded. The OnDocLoading function wraps the IStream inside another object so that the contents of the document can be logged byte-by-byte as it is loaded, in an attempt to narrow down exactly where document loading fails. Or it can be used for testing purposes to inject I/O errors into the document loading path to confirm that the program behaves properly under those conditions. Use your imagination.

    But this debugging code is also a security vulnerability.

    Recall that the library search path searches directories in the following order:

    1. The directory containing the application EXE.
    2. The system32 directory.
    3. The system directory.
    4. The Windows directory.
    5. The current directory.
    6. The PATH.

    When debugging your program, you install DebugHooks.dll into the application directory so that it is found in step 1. But when your program isn't being debugged, the search in step 1 fails, and the search continues in the other directories. The DLL is not found in steps 2 through 4, and then we reach step 5: The current directory.

    And now you're pwned.

    Your application typically does not have direct control over the current directory. The user can run your program from any directory, and that directory ends up as your current directory. And then your LoadLibrary call searches the current directory, and if a bad guy put a rogue DLL in the current directory, your program just became the victim of code injection.
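    The walk through the search order can be simulated without touching a real loader. What follows is a toy model under stated assumptions: findLibrary and findInAppDirOnly are hypothetical helpers, the "does this file exist" check is a callback you supply, and C:\App is a made-up application directory. It shows why a DLL that is normally absent falls through to the current directory, and how looking only at a full path built from the application directory avoids that.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using ExistsFn = std::function<bool(const std::string&)>;

// Toy model of a search by bare name: try each directory in order and
// return the first candidate that exists.
std::optional<std::string> findLibrary(const std::string& name,
                                       const std::vector<std::string>& searchDirs,
                                       const ExistsFn& exists) {
    for (const std::string& dir : searchDirs) {
        const std::string candidate = dir + "\\" + name;
        if (exists(candidate)) return candidate;
    }
    return std::nullopt;
}

// The safer shape: build one full path from the application directory and
// look nowhere else. If the debugging DLL is absent, nothing is loaded.
std::optional<std::string> findInAppDirOnly(const std::string& name,
                                            const std::string& appDir,
                                            const ExistsFn& exists) {
    const std::string candidate = appDir + "\\" + name;
    if (exists(candidate)) return candidate;
    return std::nullopt;
}
```

    Feed findLibrary a directory list that ends with the document's folder and an exists callback that recognizes only a rogue copy there, and the rogue copy is what comes back. findInAppDirOnly returns nothing in the same situation.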

    This is made particularly dangerous when your application is associated with a file type, because the user can run your application just by double-clicking an associated document.

    When you double-click a document, Explorer sets the current directory of the document handler application to the directory that contains the document being opened. This is necessary for applications which look around in the current directory for supporting files. For example, consider a hypothetical application LitWare Writer associated with *.LIT files. A LitWare Writer document ABC.LIT file is really just the representative for a family of files, ABC.LIT (the main document), ABC.LTC (the document index and table of contents), ABC.LDC (the custom spell check dictionary for the document), ABC.LLT (the custom document layout template), and so on. When you open the document C:\PROPOSAL\ABC.LIT, LitWare Writer looks for the other parts of your document in the current directory, rather than in C:\PROPOSAL. To help these applications find their files, Explorer specifies to the CreateProcess function that it should set the initial current directory of LitWare Writer to C:\PROPOSAL.

    Now, you might argue that programs like LitWare Writer (which look for the ancillary files of a multi-file document in the current directory instead of the directory containing the primary file of the multi-file document) are poorly-written, and I would agree with you, but Windows needs to work even with poorly-written programs. (Pre-emptive snarky comment: Windows is itself a poorly-written program.) There are a lot of poorly-written programs out there, some of them industry leaders in their market (see above pre-emptive snarky comment) and if Windows stopped accommodating them, people would say it was the fault of Windows and not the programs.

    I can even see in my mind's eye the bug report that resulted in this behavior being added to the MS-DOS Executive:

    "This program has worked just fine in MS-DOS, but in Windows, it doesn't work. Stupid Windows."

    Customers tend not to be happy with the reply, "Actually, that program has simply been lucky for the past X years. The authors of the program never considered the case where the document being opened is not in the current directory. And it got away with it, because the way you opened the document was to use the chdir command to move to the directory that contained your document, and then to type LWW ABC.LIT. If you had ever done LWW C:\PROPOSAL\ABC.LIT you would have run into the same problem. The behavior is by design."

    The response to "The behavior is by design" is usually "Well, a design that prevents me from getting my work done is a crappy design." or a much terser "No it's not, it's a bug." (Don't believe me? Just read Slashdot.)

    So to make these programs work in spite of themselves, the MS-DOS Executive sets the current directory of the program being launched to the directory containing the document itself. This was not an unreasonable decision because it gets the program working again, and it's not like the program cared about the current directory it inherited from the MS-DOS Executive, since it had no control over that either!

    But it means that if you launched a program by double-clicking an associated document, then unless that program takes steps to change its current directory, it will have the document's containing folder as its current directory, which prevents you from deleting that directory.

    Bonus chatter: I wrote this series of entries nearly two years ago, and even then, I didn't consider this to be anything particularly groundbreaking, but apparently some people rediscovered it a few months ago and are falling all over themselves to claim credit for having found it first. It's like a new generation of teenagers who think they invented sex. For the record, here is some official guidance. (And just to be clear, that's official guidance on the current directory attack, not official guidance on sex.)

    History chatter: Why is the current directory even considered at all? The answer goes back to CP/M. In CP/M, there was no PATH. Everything executed from the current directory. The rest is a chain of backward compatibility.

  • The Old New Thing

    Why doesn't the End Task button end my task immediately?


    Commenter littleguru asks, "Why does the End Now button not kill the process immediately?"

    When you click the End Now button, the process really does end now, but not before a brief message from our sponsor.

    When you kill a hung application, Windows Error Reporting steps in to record the state of the hung application so it can be submitted to the mother ship (with your permission). If you are running Windows XP or Windows Vista, you can briefly see a process called dumprep.exe or WerFault.exe, respectively; these are the guys who are doing the data collection.

    After being uploaded to Microsoft, these failure reports are studied to determine why the application stopped responding and what could be done to fix it. I've been asked to do quite a few of these analyses myself, and sometimes it's something pretty mundane (an application sends a cross-thread message while holding a critical section, and the thread can't receive the message because it's stuck waiting for the critical section that the sender is holding—classic deadlock), and sometimes it's something pretty weird (application has a bug if the number of sound output devices is not equal to one). Whatever the reason, I write up my analysis, and the people who are in charge of such things make arrangements for the information to be sent back to the vendors who wrote the application (assuming the vendors are registered with Winqual).

    If you don't want Windows Error Reporting to collect application crash and hang reports, you can disable it from the Group Policy Editor under Windows Error Reporting. Of course, if you do this, then you don't get to vote on which program crashes and failures Microsoft should work on fixing.

    Note: This entry is an experiment: I mentioned Windows Error Reporting and WHQL. If people complain about digital certificate authorities, that'll just confirm my bias against returning to those old debugging stories.

    Update: Experimental results obtained. No more stories involving Windows Error Reporting and WHQL.

  • The Old New Thing

    How full does a hard drive have to get before Explorer will start getting concerned?


    The answer depends on which "hard drive almost full" warning you're talking about.

    Note that these boundaries are merely the current implementation (up until Windows 7). Future versions of Windows reserve the right to change the thresholds. The information provided is for entertainment purposes only.

    The thermometer under the drive icon in My Computer uses a very simple algorithm: A drive is drawn in the warning state when it is 90% full.

    The low disk space warning balloon is more complicated. The simplified version is that it warns of low disk space on drives bigger than about 3GB when free disk space drops below 200MB. The warnings become more urgent when free disk space drops below 80MB, 50MB, and finally 1MB. (For drives smaller than 3GB, the rules are different, but nobody—to within experimental error—has hard drives that small anyway, so it's pretty much dead code now.)

    These thresholds cannot be customized, but at least you can turn off the low disk space balloons.
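    For entertainment purposes only, the rules above can be written down as a pair of functions. This is a sketch, not Explorer's code: the function and enumerator names are invented, megabytes and gigabytes are assumed to be the binary kind, "about 3GB" is treated as exactly 3GB, and the small-drive rules are left out because they aren't given here.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMB = 1024ull * 1024ull;  // assumption: binary megabytes
constexpr std::uint64_t kGB = 1024ull * kMB;

enum class Balloon {
    None,
    Below200MB,  // first warning
    Below80MB,   // more urgent
    Below50MB,   // more urgent still
    Below1MB,    // most urgent
    SmallDriveRulesNotCovered
};

// Thermometer under the drive icon: warning state at 90% full.
// (The multiplication is fine for any drive smaller than about 1.8 exabytes.)
bool thermometerWarns(std::uint64_t totalBytes, std::uint64_t freeBytes) {
    if (totalBytes == 0 || freeBytes > totalBytes) return false;
    const std::uint64_t usedBytes = totalBytes - freeBytes;
    return usedBytes * 10 >= totalBytes * 9;
}

// Low disk space balloon, simplified version for drives bigger than about 3GB.
Balloon balloonLevel(std::uint64_t totalBytes, std::uint64_t freeBytes) {
    if (totalBytes <= 3 * kGB) return Balloon::SmallDriveRulesNotCovered;
    if (freeBytes < 1 * kMB) return Balloon::Below1MB;
    if (freeBytes < 50 * kMB) return Balloon::Below50MB;
    if (freeBytes < 80 * kMB) return Balloon::Below80MB;
    if (freeBytes < 200 * kMB) return Balloon::Below200MB;
    return Balloon::None;
}
```

    Note that the two checks are independent: a large drive can show the warning thermometer long before any balloon appears, since 10% of a big drive is far more than 200MB.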

  • The Old New Thing

    If you measure something, people will change their behavior to address the measurement and not the thing the measurement is intended to measure


    We all know that once you start measuring something, people will change the way they behave. We hope that the change is for the better, but that's not always the case, and that's especially true if you are using the metrics as a proxy for something else: People will manipulate the metric without necessarily affecting the thing that your metric is trying to measure.

    I was reminded of this topic when I read a story in The Daily WTF of a manager who equated number of checkins with productivity.

    One metric that nearly all software products use to gauge productivity and product progress is the number of bugs outstanding and the number of bugs fixed. Of course, not all bugs are created equal: Some are trivial to fix; others are substantial. But if you believe that the difficulty distribution of bugs, while not uniform, is at least unbiased, then the number of bugs is roughly proportional to the amount of work. The bug count is just a rough guide, of course. Everybody works together, with programmers promising not to manipulate the metrics, and managers promising not to misinterpret them.

    At least that's how it's supposed to work.

    (All that text up to this point is useless. When you're telling a story, you have to include a lot of useless text in order to motivate or set the scene for the actual story that comes up next or just to make the story sound like an actual story instead of just a sequence of events. What amazes me is that so many people seem to focus on the "literary throat-clearing" and miss the actual story!)

    A friend of mine told me about a project from many years (and jobs) ago. Things were pretty hectic, people were working late, it was a stressful time. The bug statistics were gathered by an automated process that ran at 4am, and every day, management would use those statistics as one factor in assessing the state of the project.

    My friend was wrapping up another late night at the office after polishing off a few bugs, and as a final gesture, re-ran the bug query to enjoy the satisfaction of seeing the number of bugs go down.

    Except it went up.

    What happened is that another member of the project was also working late, and that other member had a slightly different routine for wrapping up at the end of the day: Run the query and look at the number next to your name. If it is higher than you would like, then take some of your bugs and transfer them to the other members of the team. Choose a victim, add a comment like "I think this is a problem with the XYZ module" (where XYZ is the module the victim is responsible for), and reassign the bug to your victim. It helps if you choose victims who already have a lot of bugs, so they might not even notice that you slipped them another one.

    By following this simple nightly routine, you get management off your case for having too many outstanding bugs. In fact, they might even praise you for your diligence, since you never seem to be behind on your work.

    Of course, management looks at these manipulated numbers and gets a false impression of the state of the project. But if you're not one of those "team player" types, then that won't matter to you.

    And if that describes you, then I don't want you working on my project.

  • The Old New Thing

    One possible reason why ShellExecute returns SE_ERR_ACCESSDENIED and ShellExecuteEx returns ERROR_ACCESS_DENIED


    (The strangely-phrased subject line is for search engine optimization.)

    A customer reported that when they called ShellExecute, the function sometimes failed with SE_ERR_ACCESSDENIED, depending on what they were trying to execute. (If they had tried ShellExecuteEx they would have gotten the error ERROR_ACCESS_DENIED.)

    After a good amount of back-and-forth examining file type registrations, a member of the development team had psychic insight to ask, "Are you calling it from an MTA?"

    "Yes," the customer replied. "ShellExecute is being called from a dedicated MTA thread. Would that cause the failure?"

    Why yes, as a matter of fact, and it's called out in the documentation for ShellExecute.

    Because ShellExecute can delegate execution to Shell extensions (data sources, context menu handlers, verb implementations) that are activated using Component Object Model (COM), COM should be initialized before ShellExecute is called. Some Shell extensions require the COM single-threaded apartment (STA) type.

    As a general rule, shell functions require STA. Recall that MTA implies no user interface. If you try to use an apartment-threaded object from your MTA thread, a marshaller is required, and if no such marshaller exists, the call fails.

    This also explains why the failure occurs only for certain file types: If handling the file type happens not to involve creating a COM object, then the MTA/STA mismatch situation never occurs.
