Larry Osterman's WebLog

Confessions of an Old Fogey

What's wrong with this code, part 6

  • Comments 45

Today, let’s look at a trace log writer.  It’s the kind of thing that you’d find in many applications; it simply does a printf and writes its output to a log file.  To allow maximum flexibility, the code re-opens the file every time the application writes to the log.  But there’s still something wrong with this code.

This “what’s wrong with this code” is a little different.  The code in question isn’t incorrect as far as I know, but it still has a problem.  The challenge is to understand the circumstances in which it doesn’t work.

/*++
 * LogMessage
 *      Trace output messages to log file.
 *
 * Inputs:
 *      FormatString - printf format string for logging.
 *
 * Returns:
 *      Nothing
 *     
 *--*/
void LogMessage(LPCSTR FormatString, ...)
#define LAST_NAMED_ARGUMENT FormatString
{
    CHAR outputBuffer[4096];
    LPSTR outputString = outputBuffer;
    size_t bytesRemaining = sizeof(outputBuffer);
    ULONG bytesWritten;
    bool traceLockHeld = false;
    HANDLE traceLogHandle = NULL;
    va_list parmPtr;                    // Pointer to stack parms.
    EnterCriticalSection(&g_TraceLock);
    traceLockHeld = TRUE;
    //
    // Open the trace log file.
    //
    traceLogHandle = CreateFile(TRACELOG_FILE_NAME, FILE_APPEND_DATA, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (traceLogHandle == INVALID_HANDLE_VALUE)
    {
        goto Exit;
    }
    //
    // printf the information requested by the caller onto the buffer
    //
    va_start(parmPtr, FormatString);
    StringCbVPrintfEx(outputString, bytesRemaining, &outputString, &bytesRemaining, 0, FormatString, parmPtr);
    va_end(parmPtr);
    //
    // Actually write the bytes.
    //
    DWORD lengthToWrite = static_cast<DWORD>(sizeof(outputBuffer) - bytesRemaining);
    if (!WriteFile(traceLogHandle, outputBuffer, lengthToWrite, &bytesWritten, NULL))
    {
        goto Exit;
    }
    if (bytesWritten != lengthToWrite)
    {
        goto Exit;
    }
Exit:
    if (traceLogHandle)
    {
        CloseHandle(traceLogHandle);
    }
    if (traceLockHeld)
    {
        LeaveCriticalSection(&g_TraceLock);
        traceLockHeld = FALSE;
    }
}

One hint: The circumstance I’m thinking of has absolutely nothing to do with out of disk space issues. 

As always, answers and kudos tomorrow.

  • Skywing:

    Yes, I misread the code. My head somehow made the variable global, since the lock was global; otherwise the variable didn't make much sense...

    My bad.
  • Well, I believe you mix two different issues together. Who would find it useful to check the validity of the memory a pointer points to, as long as it doesn't point to NULL, which is what most of the access violations I ever encounter are? People tend to go to extremes in both directions, checking nothing or checking everything; the way to go is probably in the middle.

    But anyhow, yes, those are the troubles you might get into, and eventually it will lead you to a crash that would have happened earlier anyway. With proper diagnostic functions built into your exception handling it will not be hard to find, so I will still disagree with you: it is useful, and it is possible to do it right.

    C++ exception handling on Windows does catch hardware (SEH) exceptions such as access violations (if you want it to). I will not vouch that it won't screw up the OS handling of that exception; if you want to be safe doing SEH, go with try/finally to achieve resource guarding.

    However, for the thread stack guard page it is indeed a bit tricky, as the guard bit of the page is cleared the first time it is triggered.

    But then again, if you catch exceptions without handling them, or at least without logic to figure out whether you want to catch them or not, then you might get into trouble. But then again, you are already in trouble, as the application is about to crash; what to do, what to do. For the thread stack issue, I personally would either relocate the stack (if I expect the code running under the try/catch block to do something like that) or merely abort that thread. That is, handle the exception silently by ending the thread and cleaning up its resources, letting the rest of the process continue running, and maybe restart the thread (sure, you might lose some memory and other resources).

    Let's take a real world example, consider that you have an application that loads plugin dlls for running different kinds of telephony protocols (DTMF, pulse, tones, SIA etc). The application loading them provides an interface and a framework to answer the ISDN calls and send tones or what have you.

    Also consider that this is a 24/7 system and that the customer can write their own modules to plug into this system. Just because you have one misbehaving plugin (each plugin runs in its own thread) does not mean you want to crash the system. All you want to do is recover from the problem, kill the thread (if you deem the problem serious enough), start a new one, log extensively, live with the fact that you lost some resources (perhaps a handle or two and some memory), and let the customer reboot the system at their convenience. Report the problem to the writer of the module for proper analysis.

    Also imagine that this happens in the middle of the night with no immediate support on site and that this is actually a 911 alarm central.

    You could argue that things should be tested better; you could argue that it is good to find faults immediately. I will only agree to the extent that you should find the fault as early as possible, but on a live site it is better to function, however poorly, than not to function at all.

    If one thread goes bazooka, who cares, as long as the 400 other ongoing alarm calls can continue?

    Later, when you want to diagnose this odd behavior (let's say it is odd, like in the post above), all you do is run your system in your _lab_ in your debugger, catching all exceptions before the application, run through the "traffic" scenario (you have logs, right?) and wait for the exception to happen. Or merely check your exception log to see if you had any odd exceptions occurring that could take you to this fault.

    Now instead imagine that you didn't catch that exception. This is an alarm dispatcher central, remember; the sender of the alarm that caused the system to crash will surely try again. It must get through, so it will repeat and rinse the system all night, the 400 other calls per minute will not get handled, and suddenly a lot of people didn't get through to 911 with life-threatening issues. I do not believe they will settle for the explanation "It was easier for us to diagnose this way".

    I agree with you: use exceptions with care, and try to know what you are doing. But please do make sure that your application does not crash. You can use any insane amount of ASSERTs and aborts you want in a debug build, but your retail build should let the user of the program make the decision to restart it or not. He or she doesn't care that it is easier to find faults unless he or she can get immediate support.

    Take Visual Studio 2003, for instance: it crashes so many times in a day's work that it is annoying, due to, for instance, failing plugins or odd behaviour from a source control provider. I as a user will blame the Studio, even if it is the plug-in or SCC provider that causes it to crash.

    Internet Explorer used to do the same: if a plugin failed, it would crash, which of course in return gave IE the reputation of crashing.

    Much like the BSOD reputation of Windows, which mostly comes from badly written device drivers rather than Windows itself (however, I will agree that this one is a few magnitudes harder to guard against =).

    I hope you get my point, however: unlikely events do occur; be paranoid, and know your safe passage out in case of an emergency. To me it is as silly as not knowing the fire escape routes and instead, in case of a fire, just dropping down and dying on the spot (with a note saying "This couldn't happen"). Or not having some extra fuel and a spare tire in the trunk of your car; sure, it might not help, but it might just take you those few extra miles to get to a proper maintenance place.

    In the case of this Log function, it is important to remember that you are only logging; logging should never cause the system to crash... Even if you think it is stupid to let a strangely formatted printf string get through because someone used a %s instead of a %d, it is always better to analyse that in the lab than on customer sites...

    Those are my pennies.
  • Niclas, you bring up enough interesting points that I want to take them in turn in a separate post, so I'm going to defer responding to that part until somewhat later.
  • Although this was not an intended answer, in my experience it is one: when CreateFile fails it can return either 0 or -1. I don't recall which of these values is INVALID_HANDLE_VALUE, but it doesn't matter; a test for failure has to do <= 0 instead of == INVALID_HANDLE_VALUE.

    Another unintended answer: Any handle might happen to be the straw that breaks the camel's back, yielding a BSOD.
    http://support.microsoft.com/?scid=kb;ja;195857
    (According to the article MS only reproduced the problem when the handles are for registry keys, but the description implies that the problem can happen with any kind of handle.)
  • IIRC, that bug is only present on Win9x, and only then when opening handles to drivers.
  • I agree with you, Larry, these are very interesting topics; sadly, they usually become a religious war, especially parameter validation.

    I wish there were a simple rule above all. My simple rule is: never crash. But then again, I wouldn't want to take that to the extreme either; sometimes you have no option but to restart the application.

    There are pros and cons with everything, and I eagerly await your answer, as it is always fun to get insight into more intricate problems =). Besides, you have a ton of interesting points.


    I would also like to add that, of course, even if the application mentioned in my example were to reboot all night until support personnel got there (usually within the hour, depending on how sleepy they are), it would redirect all calls to a backup system if it noticed that it was thrashing on a certain protocol. The backup system is usually a low-capacity hardware-based receiver.

    What we wanted to prevent was one protocol group-punishing the rest of the system =). And basically we know that even if we try to write fault-free code we will make mistakes, so it is safer to also make the system fault tolerant.
  • Regarding exceptions, I totally agree with you Larry, that an API like this should not swallow unexpected exceptions.

    Deciding on life-or-death issues for the whole process in the face of unexpected exceptions is the responsibility of a higher-level function.

    Not something that a low-level function like LogMessage should do.

    But I think that to be perfectly safe you should still use __try/__finally to avoid leaking open handles and held critical sections.
    Each function should guarantee that it performs its own cleanup properly even under unexpected circumstances.

    You could rewrite the function like this:

    void LogMessage(LPCSTR FormatString, ...)
    #define LAST_NAMED_ARGUMENT FormatString
    {
        CHAR outputBuffer[4096];
        LPSTR outputString = outputBuffer;
        size_t bytesRemaining = sizeof(outputBuffer);
        ULONG bytesWritten;
        HANDLE traceLogHandle = INVALID_HANDLE_VALUE;
        va_list parmPtr; // Pointer to stack parms.
        EnterCriticalSection(&g_TraceLock);
        __try {
            //
            // Open the trace log file.
            //
            traceLogHandle = CreateFile(TRACELOG_FILE_NAME, FILE_APPEND_DATA, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
            if (traceLogHandle == INVALID_HANDLE_VALUE) {
                // return inside a __try block passes control to the __finally block
                return;
            }
            //
            // printf the information requested by the caller onto the buffer
            //
            va_start(parmPtr, FormatString);
            StringCbVPrintfEx(outputString, bytesRemaining, &outputString, &bytesRemaining, 0, FormatString, parmPtr);
            va_end(parmPtr);
            //
            // Actually write the bytes.
            //
            DWORD lengthToWrite = static_cast<DWORD>(sizeof(outputBuffer) - bytesRemaining);
            if (!WriteFile(traceLogHandle, outputBuffer, lengthToWrite, &bytesWritten, NULL)) {
                // return inside a __try block passes control to the __finally block
                return;
            }
        }
        __finally {
            if (traceLogHandle != INVALID_HANDLE_VALUE) {
                CloseHandle(traceLogHandle);
            }
            LeaveCriticalSection(&g_TraceLock);
        }
    }


    Imagine that the user of LogMessage tries to log a message with a badly formatted printf string, perhaps using %s instead of %d.

    Your original code would have left the critical section held, which would mean that no other thread in the process could log anything.
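
    The same cleanup guarantee can be sketched portably in standard C++, where RAII plays the role of __finally. This is only an illustrative sketch, not the Win32 code from the post: g_traceLock, FormatOrThrow, and LogMessagePortable are hypothetical names, and the throwing formatter merely simulates a bad format string blowing up mid-log.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>

// Portable stand-in for g_TraceLock (assumption: a std::mutex rather
// than a Win32 CRITICAL_SECTION).
std::mutex g_traceLock;

// Hypothetical formatter standing in for the StringCbVPrintfEx step;
// it throws to simulate a crash caused by a bad format string.
std::string FormatOrThrow(bool misbehave) {
    if (misbehave) {
        throw std::runtime_error("bad format string");
    }
    return "hello";
}

// The lock_guard destructor releases the mutex on every exit path,
// including exceptions -- the same guarantee __finally gives the SEH code.
std::string LogMessagePortable(bool misbehave) {
    std::lock_guard<std::mutex> guard(g_traceLock);
    return FormatOrThrow(misbehave);  // may throw; guard still unlocks
}
```

    If the formatter throws, the guard's destructor still runs, so no other thread is left blocked on the trace lock.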

  • Just to elaborate a little on my standpoint regarding exception handling:

    I think that Niclas is right that there are definitely processes that have a valid reason to try to keep running even in the face of unexpected exceptions.

    The correct way to do that is to put a try/catch around the block of code that could cause unexpected exceptions. It could be around a call to third-party code, or perhaps all the code in a worker thread could be encapsulated in a try/catch.

    The high-level exception handler should of course log as much information as it can about the exception, and then it could try to continue.

    The problem is that unless the lower-level code is written in an exception-safe way (using __finally, for example), the process might still hang because of issues like abandoned critical sections and open files.

    It is the responsibility of lower-level functions to use __finally to perform any necessary cleanup in the face of unexpected exceptions. Low-level functions should almost never catch unexpected exceptions; they should only catch exceptions that they explicitly know they might cause during normal operation. Imagine, for example, that CreateFile threw an exception if the file was open in another process. In that case, LogMessage should handle the FileAlreadyOpenException and ignore it. It should absolutely not swallow any other exceptions.

    It is the responsibility of higher-level functions to use try/catch to decide if the process should be kept alive or not.
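
    That division of responsibility can be sketched portably in standard C++. This is a hypothetical sketch: RunPlugin and PluginWorker are illustrative names, not APIs from the post.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical third-party entry point (assumption): code we do not
// control, which may throw anything.
void RunPlugin(bool faulty) {
    if (faulty) {
        throw std::runtime_error("plugin misbehaved");
    }
}

// High-level wrapper around the untrusted call: this is the one place
// that decides to record the failure and keep the process alive, instead
// of letting one bad plugin take down all the other worker threads.
std::string PluginWorker(bool faulty) {
    try {
        RunPlugin(faulty);
        return "ok";
    } catch (const std::exception& e) {
        return std::string("recovered: ") + e.what();  // log and continue
    }
}
```

    In a real system each plugin thread would be wrapped this way, while low-level helpers like LogMessage only guarantee their own cleanup and let exceptions propagate.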

  • buffer overflow?!?!?!
  • "Because OutputDebugString only outputs stuff when there's a debugger attached."

    This isn't true. And since I don't remember the details of how to get at this without a debugger, I will just say: go to www.sysinternals.com and look at DebugView. It doesn't attach to a process; it just intercepts the messages.


    I actually wrote my own version of debugview before discovering the tool, perhaps I will dedicate some blog space to how to achieve this.


    Evan
    http://EvanFreeman.blogspot.com

  • Evan, there needs to be SOME application running to intercept the debug trace calls. They don't get queued up somewhere waiting for them to be written.

    So if the user isn't running the trace monitoring tool, they don't see the output.

    Which means that we lose an opportunity to post-mortem a failure.

  • I will just add another thought to this, since I never really pointed it out.

    I do agree that lower level functions should not do exception handling, it is the responsibility of the higher level logic.

    However, in this case the log function is indeed acquiring a lock, which it must not ever forget to release, or it will block all threads in the entire process. That is why this is an exception to the rule. And APIs that handle internal resources must also make sure to supervise and care for them.

    Which is why I proposed using SEH with try/finally, but it can also be achieved with C++ exception handling, which is more portable.

    And about the performance of a logger, you have a few options; they all have pros and cons.

    One way is to merely queue the log call and let another low-priority thread handle the actual logging. However, if you do that, you might not get the logs during a failure (if you don't do your exception handling properly).

    My own thought, though, is that you might as well do all the work right away, since logging should be for an exception or failure case, which means we most likely have time to log.


    There are, however, instances when you do want a separate thread doing the actual logging: for instance, if you are logging from a message pump to a window that the same message pump is handling. If you log anything before the message pump starts, you will hang (if you are using synchronous signalling).
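
    A queued logger of the kind described above might look like this portable C++ sketch. The names AsyncLog, Post, and Drain are illustrative, and a vector of strings stands in for the log file.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal sketch of a queued logger: callers only enqueue, and a
// background thread drains the queue. The sink vector stands in for
// the actual file writes.
class AsyncLog {
public:
    explicit AsyncLog(std::vector<std::string>& sink)
        : sink_(sink), worker_(&AsyncLog::Drain, this) {}

    ~AsyncLog() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains any remaining messages first
    }

    // Called by any thread; cheap, never touches the "file".
    void Post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    // Low-priority worker loop: wait for work, write it out, repeat.
    void Drain() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [&] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                sink_.push_back(std::move(q_.front()));  // stands in for WriteFile
                q_.pop();
            }
            if (done_) {
                return;
            }
        }
    }

    std::vector<std::string>& sink_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

    The destructor sets the done flag and joins, so queued messages are flushed before shutdown; as the comment notes, if the process dies abruptly instead, anything still in the queue is lost.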
  • 9/10/2004 12:14 AM Niclas Lindgren

    > since logging should be an exception or
    > failure case,

    Not really. In the Windows world, remember that even Windows 95 would write boot logs reporting successful driver start-ups, because at the point where the log ended the user might be able to guess that the failure came after that point. In the modern Windows world, it is often prudent to log every successful login, and maybe successful access to files and registry keys etc. In the Internet world, consider that it is often prudent to log every packet that goes in or out.