November, 2012

  • The Old New Thing

    The debugger lied to you because the CPU was still juggling data in the air

    • 28 Comments

    A colleague was studying a very strange failure, which I've simplified for expository purposes.

    The component in question has the following basic shape, ignoring error checking:

    // This is a multithreaded object
    class Foo
    {
    public:
     void BeginUpdate();
     void EndUpdate();
    
     // These methods can be called at any time
     int GetSomething(int x);
    
     // These methods can be called only between
     // BeginUpdate/EndUpdate.
     void UpdateSomething(int x);
    
    private:
     Foo() : m_cUpdateClients(0), m_pUpdater(nullptr) { ... }
    
     LONG m_cUpdateClients;
    
     Updater *m_pUpdater;
    };
    

    There are two parts to the Foo object: one part that is essential to the object's task, and another part that is needed only when updating. The parts related to updating are expensive, so the Foo object sets them up only when an update is active. You indicate that an update is active by calling Begin­Update, and you indicate that you are finished updating by calling End­Update.

    // Warning: the following code is wrong
    void Foo::BeginUpdate()
    {
     LONG cClients = InterlockedIncrement(&m_cUpdateClients);
     if (cClients == 1) {
      // remember, error checking has been elided
      m_pUpdater = new Updater();
     }
     // else, we are already initialized for updating,
     // so nothing to do
    }
    
    void Foo::EndUpdate()
    {
     LONG cClients = InterlockedDecrement(&m_cUpdateClients);
     if (cClients == 0) {
      // last update client has disconnected
      delete m_pUpdater;
      m_pUpdater = nullptr;
     }
    }
    

    There are a few race conditions here, and one of them manifested itself in a crash. (If two threads call Begin­Update at the same time, one of them will increment the client count to 1 and the other will increment it to 2. The one which increments it to 1 will get to work initializing m_pUpdater, whereas the second one will run ahead on the assumption that the updater is fully-initialized.)

    What we saw in the crash dump was that Update­Something tried to use m_pUpdater and crashed on a null pointer. What made the crash dump strange was that if you actually looked at the Foo object in memory, the m_pUpdater was non-null!

        mov ecx, [esi+8] ; load m_pUpdater
        mov eax, [ecx]   ; load vtable -- crash here
    

    If you actually looked at the memory pointed-to by ESI+8, the value there was not null, yet in the register dump, ECX was zero.

    Was the CPU hallucinating? The value in memory is nonzero. The CPU loaded a value from memory. But the value it read was zero.

    The CPU wasn't hallucinating. The value it read from memory was in fact zero. The reason why you saw the nonzero value in memory was that in the time it took the null pointer exception to be raised, then caught by the debugger, the other thread managed to finish calling new Updater(), store the result back into memory, and then return back to its caller and proceed as if everything were just fine. Thus, when the debugger went to capture the memory dump, it captured a non-zero value in the dump, and the code which updated m_pUpdater was long gone.

    This type of race condition is more likely to manifest on multi-core machines, because on those types of machines, the two CPUs can have different views of memory. The thread doing the initialization can update m_pUpdater in memory, and other CPUs may not find out about it until some time later. The updated value was still in flight when the crash occurred. Before the debugger can get around to capturing the m_pUpdater member in the crash dump, the in-flight value lands, and what you see in the crash dump does not match what the crashing CPU saw.

  • The Old New Thing

    Various ways of performing an operation asynchronously after a delay

    • 23 Comments

    Okay, if you have a UI thread that pumps messages, then the easiest way to perform an operation after a delay is to set a timer. But let's say you don't have a UI thread that you can count on.

    One method is to burn a thread:

    #define ACTIONDELAY (30 * 60 * 1000) // 30 minutes, say
    
    DWORD CALLBACK ActionAfterDelayProc(void *)
    {
     Sleep(ACTIONDELAY);
     Action();
     return 0;
    }
    
    BOOL PerformActionAfterDelay()
    {
     DWORD dwThreadId;
     HANDLE hThread = CreateThread(NULL, 0, ActionAfterDelayProc,
                                   NULL, 0, &dwThreadId);
     BOOL fSuccess = hThread != NULL;
     if (hThread) {
      CloseHandle(hThread);
     }
     return fSuccess;
    }
    

    Less expensive is to borrow a thread from the thread pool:

    BOOL PerformActionAfterDelay()
    {
     return QueueUserWorkItem(ActionAfterDelayProc, NULL,
                              WT_EXECUTELONGFUNCTION);
    }
    

    But both of these methods hold a thread hostage for the duration of the delay. Better would be to consume a thread only when the action is in progress. For that, you can use a thread pool timer:

    void CALLBACK ActionAfterDelayProc(void *lpParameter, BOOLEAN)
    {
     HANDLE *phTimer = static_cast<HANDLE *>(lpParameter);
     Action();
     DeleteTimerQueueTimer(NULL, *phTimer, NULL);
     delete phTimer;
    }
    
    BOOL PerformActionAfterDelay()
    {
     BOOL fSuccess = FALSE;
     HANDLE *phTimer = new(std::nothrow) HANDLE;
     if (phTimer != NULL) {
      if (CreateTimerQueueTimer(
         phTimer, NULL, ActionAfterDelayProc, phTimer,
         ACTIONDELAY, 0, WT_EXECUTEONLYONCE)) {
       fSuccess = TRUE;
      }
     }
     if (!fSuccess) {
      delete phTimer;
     }
     return fSuccess;
    }
    

    The timer queue timer technique is complicated by the fact that we want the timer to self-cancel, so it needs to know its handle, but we don't know the handle until after we've scheduled it, at which point it's too late to pass the handle as a parameter. In other words, we'd ideally like to create the timer, and then once we get the handle, go back in time and pass the handle as the parameter to Create­Timer­Queue­Timer. Since the Microsoft Research people haven't yet perfected their time machine, we solve this problem by passing the handle by address: The Create­Timer­Queue­Timer function fills the address with the timer handle, so that the callback function can read it back out.

    In practice, this additional work is no additional work at all, because you're already passing some data to the callback function, probably an object or at least a pointer to a structure. You can stash the timer handle inside that object. In our case, our object is just the handle itself. If you prefer to be more explicit:

    struct ACTIONINFO
    {
     HANDLE hTimer;
    };
    
    void CALLBACK ActionAfterDelayProc(void *lpParameter, BOOLEAN)
    {
     ACTIONINFO *pinfo = static_cast<ACTIONINFO *>(lpParameter);
     Action();
     DeleteTimerQueueTimer(NULL, pinfo->hTimer, NULL);
     delete pinfo;
    }
    
    BOOL PerformActionAfterDelay()
    {
     BOOL fSuccess = FALSE;
     ACTIONINFO *pinfo = new(std::nothrow) ACTIONINFO;
     if (pinfo != NULL) {
      if (CreateTimerQueueTimer(
         &pinfo->hTimer, NULL, ActionAfterDelayProc, pinfo,
         ACTIONDELAY, 0, WT_EXECUTEONLYONCE)) {
       fSuccess = TRUE;
      }
     }
     if (!fSuccess) {
      delete pinfo;
     }
     return fSuccess;
    }
    

    The threadpool functions were redesigned in Windows Vista to allow for greater reliability and predictability. For example, the operations of creating a timer and setting it into action are separated so that you can preallocate your timer objects (inactive) at a convenient time. Setting the timer itself cannot fail (assuming valid parameters). This makes it easier to handle error conditions since all the errors happen when you preallocate the timers, and you can deal with the problem up front, rather than proceeding ahead for a while and then realizing, "Oops, I wanted to set that timer but I couldn't. Now how do I report the error and unwind all the work that I've done so far?" (There are other new features, like cleanup groups that let you clean up multiple objects with a single call, and being able to associate an execution environment with a library, so that the DLL is not unloaded while it still has active thread pool objects.)

    The result is, however, a bit more typing, since there are now two steps, creating and setting. On the other hand, the new threadpool callback is explicitly passed the PTP_TIMER, so we don't have to play any weird time-travel games to get the handle to the callback, like we did with Create­Timer­Queue­Timer.

    void CALLBACK ActionAfterDelayProc(
        PTP_CALLBACK_INSTANCE, PVOID, PTP_TIMER Timer)
    {
     Action();
     CloseThreadpoolTimer(Timer);
    }
    
    BOOL PerformActionAfterDelay()
    {
     BOOL fSuccess = FALSE;
     PTP_TIMER Timer = CreateThreadpoolTimer(
                          ActionAfterDelayProc, NULL, NULL);
     if (Timer) {
      LONGLONG llDelay = -ACTIONDELAY * 10000LL;
      FILETIME ftDueTime = { (DWORD)llDelay, (DWORD)(llDelay >> 32) };
      SetThreadpoolTimer(Timer, &ftDueTime, 0, 0); // never fails!
      fSuccess = TRUE;
     }
     return fSuccess;
    }
    

    Anyway, that's a bit of a whirlwind tour of some of the ways of arranging for code to run after a delay.

  • The Old New Thing

    Why are there both FIND and FINDSTR programs, with unrelated feature sets?

    • 35 Comments

    Jonathan wonders why we have both find and findstr, and furthermore, why the two programs have unrelated features. The find program supports UTF-16, which findstr doesn't; on the other hand, the findstr program supports regular expressions, which find does not.

    The reason why their feature sets are unrelated is that the two programs are unrelated.

    The find program came first. As I noted in the article, the find program dates back to 1982. When it was ported to Windows NT, Unicode support was added. But nobody bothered to add any features to it. It was intended to be a straight port of the old MS-DOS program.

    Meanwhile, one of my colleagues over on the MS-DOS team missed having a grep program, so he wrote his own. Developers often write these little tools to make their lives easier. This was purely a side project, not an official part of any version of MS-DOS or Windows. When he moved to the Windows 95 team, he brought his little box of tools with him, and he ported some of them to Win32 in his spare time because, well, that's what programmers do. (This was back in the days when programmers loved to program anything in their spare time.)

    And that's where things stood for a long time. The official find program just searched for fixed strings, but could do so in Unicode. Meanwhile, my colleague's little side project supported regular expressions but not Unicode.

    And then one day, the Windows 2000 Resource Kit team said, "Hey, that's a pretty cool program you've got there. Mind if we include it in the Resource Kit?"

    "Sure, why not," my colleague replied. "It's useful to me, maybe it'll be useful to somebody else."

    So in it went, under the name qgrep.

    Next, the Windows Resource Kit folks said, "You know, it's kind of annoying that you have to go install the Resource Kit just to get these useful tools. Wouldn't it be great if we put the most useful ones in the core Windows product?" I don't know what sort of cajoling was necessary, but they convinced the Windows team to add a handful of Resource Kit programs to Windows. Along the way, qgrep somehow changed its name to findstr. (Other Resource Kit programs kept their names, like where and diskraid.)

    So there you have it. You can think of the find and findstr programs as examples of parallel evolution.

  • The Old New Thing

    Security vulnerability reports as a way to establish your l33t kr3|)z

    • 29 Comments

    There is an entire subculture of l33t l4x0rs who occasionally pop into our world, and as such have to adapt their communication style to match their audience. Sometimes the adaptation is incomplete.

    I have appended a file exploit.pl which exploits a vulnerability
    in XYZ version N.M.  The result is a denial of service.
    The perl script generates a file, which if double-clicked,
    results in a crash in XYZ.
    
     S00PrA\/\/e$Um

     #!/usr/bin/perl
    
    system('cls');
    system('color c');
    system('title XYZ DOS Exploit');
    print('
    ----------------------------------------------------
    ****************************************************
    *              __                      $           *
    *   --        |  |     __             $$$          *
    *  |     - -  |__|    |  |           $     | |     *
    *   --  | | | |       |__| \  /\  /   $$$  | |     *
    *     |  - -  |   r   |  |  \/  \/ e     $  -  m   *
    *   --                |  |            $$$          *
    *                                      $           *
    ****************************************************
    ----------------------------------------------------
    ');
    
    sleep 2;
    system('cls');
    print('
    ----------------------------------------------------
    ****************************************************
    *                                      $           *
    *   --                |  |            $$$          *
    *     |  - -  |   L   |__|  /\  /\ 6     $  -  w   *
    *   --  | | | |__     |  | /  \/  \   $$$  | |     *
    *  |     - -  |  |    |__|           $     | |     *
    *   --        |__|                    $$$          *
    *                                      $           *
    ****************************************************
    ----------------------------------------------------
    
    The exploit!
    ');
    sleep 2;
    
    $theexploit = "\0";
    
    open(file, ">exploit.xyz");
    print(file $theexploit);
    
    system('cls');
    print('
    ----------------------------------------------------
    ****************************************************
    *              __                      $           *
    *   --        |  |     __             $$$          *
    *  |     - -  |__|    |  |           $     | |     *
    *   --  | | | |       |__| \  /\  /   $$$  | |     *
    *     |  - -  |   r   |  |  \/  \/ e     $  -  m   *
    *   --                |  |            $$$          *
    *                                      $           *
    ****************************************************
    ----------------------------------------------------
    
    DONE!
    
    Double-click exploit.xyz in XYZ and KABLOOEEYYY!
    ');
    
    sleep 3;
    
    system('cls');
    print('
    ----------------------------------------------------
    ****************************************************
    *              __                      $           *
    *   --        |  |     __             $$$          *
    *  |     - -  |__|    |  |           $     | |     *
    *   --  | | | |       |__| \  /\  /   $$$  | |     *
    *     |  - -  |   r   |  |  \/  \/ e     $  -  m   *
    *   --                |  |            $$$          *
    *                                      $           *
    ****************************************************
    ----------------------------------------------------
    
    CONSTRUCTED BY S00PrA\/\/e$Um
    
    Special thanks to: XploYtr & T3rM!NaT3R.
    ');
    

    You may have trouble finding the exploit buried in that perl script, because the perl script consists almost entirely of graffiti and posturing and chest-thumping. (You may also have noticed a bug.) Here is the script with all the fluff removed:

    $theexploit = "\0";
    
    open(file, ">exploit.xyz");
    print(file $theexploit);
    

    This could've been conveyed in a simple sentence: "Create a one-byte file consisting of a single null byte." But if you did that, then you wouldn't get your chance to put your name up in lights on the screen of a Microsoft security researcher!

    (For the record, the issue being reported was not only known, a patch for it had already been issued at the time the report came in. The crash is simply a self-inflicted denial of service with no security consequences. There isn't even any data loss because XYZ can open only one file at a time, so by the time it crashes, all your previous work must already have been saved.)

  • The Old New Thing

    Puzzling triple rainbow clearly identifies location of pot of gold

    • 11 Comments

    I noted to some friends that the weather forecast for Seattle two weekends ago called for rain on Friday, rain on Saturday, and rain on Sunday. But at least on Monday, the forecast was not for rain.

    It was for heavy rain.

    One of the consequences of Seattle's annual Rain Festival (runs from January 1 to December 31) is that we get plenty of potential for rainbows. A friend of mine was lucky enough to capture a photo of a puzzling triple rainbow this past weekend. The primary and secondary rainbows we all know about, but what's that vertical rainbow shooting straight up into the sky? (And observe that the landing point of the rainbow is clearly in front of a house and trees, so go get your pot of gold.)

    It turns out that the mysterious third rainbow is a reflection rainbow. Reflection rainbows occur when light bounces off a body of water before being refracted by rain droplets. The body of water acts like a mirror and creates a virtual light source, which results in a rainbow that is off-center from the primary.

    Science!

    (I find it interesting that there are some rainbow phenomena that science is still trying to understand. Only a few months ago did researchers figure out how twinned rainbows are formed.)

    Bonus reading: Seattle weather celebrity (yes, we have weather celebrities here) Cliff Mass digs into the triple-rainbow phenomenon, augmenting his analysis with Doppler radar, because that's how he rolls.

  • The Old New Thing

    How does the window manager decide where to place a newly-created window?

    • 16 Comments

    Amit wonders how Windows chooses where to place a newly-opened window on a multiple-monitor system and gives as an example an application whose monitor choice appears inconsistent.

    The easy part is if the application specifies where it wants the window to be. In that case, the window is placed at the requested location. How the application chooses those coordinates is up to the application.

    On the other hand, if the application passes CW_USE­DEFAULT, this means that the application is saying, "I have no opinion where the window should go. Please pick a place for me."

    If this is the first top-level window created by the application with CW_USE­DEFAULT as its position, and the STARTF_USE­POSITION flag is set in the STARTUP­INFO, then the window is placed at the position provided in the dwX and dwY members.

    Officially, that's all you're going to see in the documentation. Past this point is all implementation detail. I'm providing it here to satisfy your curiosity, but please don't write code that relies on it. (This is, I realize, a meaningless request, but I must go through the motions of making it anyway.)

    Okay, now let's dive into the various levels of automatic window positioning the window manager performs. Remember, these algorithms are not contractual and can change at any time. (In fact, they have changed in the past.) Just to make it harder to rely on this algorithm, I will not tell you which operating system implements the algorithm described below.

    From now on, assume that the application has specified CW_USE­DEFAULT as its position. Also assume that the window is a top-level window.

    First we have to choose a monitor.

    • If the window was created with an owner, then the window goes onto the monitor associated with the owner window. This tends to keep related windows together on the same monitor.
    • Else, if the process was created by the Shell­Execute­Ex function, and the SEE_MASK_HMONITOR flag was passed in the SHELL­EXECUTE­INFO structure, then the window goes onto the specified monitor.
    • Else, the window goes on the primary monitor.

    Next, we have to choose a location on that monitor.

    • If this is the first time we need to choose a default location on a monitor, or if the previous default location is too close to the bottom right corner of the monitor, then act as if the previous default location for the monitor was the upper left corner of the monitor.
    • The next default location on a monitor is offset from the previous default location, diagonally down and to the right.
      • The vertical offset is chosen so that the top edge of the new window lines up against the bottom of the previous window's caption.
      • The horizontal offset is chosen so that the left edge of the new window lines up against the right edge of the caption icon of the previous window.

    The effect of this algorithm is that if you open a bunch of default-positioned windows on a monitor, they line up in a pretty cascade marching down and to the right, until the cascade goes too far, and then they return to the upper left and resume cascading.

    Finally, after choosing a monitor and a location on the monitor, the selected location is adjusted (if possible) so that the window does not span monitors.

    And that's it, the default-window-positioning algorithm, as it existed in an unspecified version of Windows. Remember, this algorithm has been tweaked in the past, and it will get tweaked more in the future, so don't rely on it.

  • The Old New Thing

    The Hater's Guide to the Williams-Sonoma Catalog

    • 7 Comments

    Today is the traditional start of the holiday shopping season in the United States. If you are thinking of getting something from Williams-Sonoma, Drew Margary has selected a few items of note for your consideration (NSFW: language).

    (It cracks me up that the model is using the batter dispenser which "measures out uniform circles" to fill a square waffle-maker.)

  • The Old New Thing

    When studying performance, you need to watch out not only for performance degradation, but also unexpected performance improvement

    • 5 Comments

    In addition to specific performance tests run by individual feature teams, Windows has a suite of automated performance tests operated by the performance team, and the results are collated across a lot of metrics. When a number is out of the ordinary, the test results are flagged for further investigation.

    The obvious thing that the performance metrics look for is a sudden drop in performance. If an operation that used to consume 500KB of memory now consumes 750KB of memory, then you need to investigate why you're using so much memory all of a sudden. The reasons for the increase might be obvious, like "Oh, rats, there's a memory leak." Or they might be indirect, like "We changed our caching algorithm." Or "In order to address condition X, we added a call to function Y, but it turns out that function Y allocates a lot of memory." Or it could be something really hard to dig up. "We changed the timing of the scenario, so a window gets shown before it is populated with data, resulting in two render passes instead of one: the first pass renders an empty window, and the second renders the window with the data present." Chasing down these elusive performance regressions can be quite time consuming, but it's part of the job.

    (End of exposition.)

    The non-obvious thing is that the performance metrics also look for sudden improvements in performance. Maybe the memory usage plummeted, or the throughput doubled. Generally speaking, a sudden improvement in performance has one of two sources.

    The first is the one you like: The explained improvement. Maybe the memory usage went down because you found a bug in your cache management policy where it hung onto stale data too long. Maybe the throughput improved because you found an optimization that let you avoid some expensive computations in a common case. Maybe you found and fixed the timing issue that was resulting in wasted render passes. (And then there are two types of explained improvements: the expected explained improvement, where something improved because you specifically targeted the improvement, and the unexpected explained improvement, where something improved as an understood side-effect of some other work.)

    The second is the one that you don't like: The unexplained improvement. The memory usage activity went down a lot, but you don't remember making any changes that affect your program's memory usage profile. Things got better but you don't know why.

    The danger here is that the performance gain may be the result of a bug. Maybe the scenario completed with half the I/O activity because the storage system is ignoring flush requests. Or it completed 15% faster because the cache is returning false cache hits.

    At the end of the day, when you finally understand what happened, you can then make an informed decision as to what to do about it. Maybe you can declare it an acceptable degradation and revise the performance baseline. ("Yes, we use more memory to render, but that's because we're using a higher-quality effects engine, and we consider the additional memory usage to be an acceptable trade-off for higher quality output.") Maybe you will look for an alternate algorithm that is less demanding on memory usage, or bypass calling function Y if it doesn't appear that condition X is in effect. Maybe you can offset the performance degradation by improving other parts of the component. Maybe the sudden performance improvement is a bug, or maybe it's an expected gain due to optimizations.

    But until you know why your performance profile changed, you won't know whether the change was good or bad.

    After all, if you don't know why your performance improved, how do you know that it won't degrade just as mysteriously? Today, you're celebrating that your memory usage dropped from 200MB to 180MB. Two weeks from now, when the mysterious condition reverts itself, you'll be trying to figure out why your memory usage jumped from 180MB to 200MB.

  • The Old New Thing

    The resource compiler will helpfully add window styles for you, but if you're building a dialog template yourself, you don't get that help

    • 6 Comments

    A customer was having trouble with nested dialogs. They were doing something very similar to a property sheet, with a main frame dialog, and then a collection of child dialogs that take turns appearing inside the frame dialog based on what the user is doing. The customer found that if they created the child dialogs with the Create­Dialog­Param function, everything worked great, but if they built the template at run-time, keyboard navigation wasn't working right. Specifically, one of their child dialogs contained an edit control, and while you could put focus on it with the mouse, it was not possible to tab to the control. On the other hand, a resource template didn't have this problem. Tabbing in and out worked just fine.

    There is logically no difference between a resource-based dialog template and a memory-based one, because the resource-based one is implemented in terms of the memory-based one.

    The real problem is that the memory-based template you created differs somehow from the one the resource compiler generated.

    One way to identify this discrepancy is simply to do a memcmp of the two dialog templates, the resource-based one and the memory-based one, and see where they differ. After all, if you want to know why your template doesn't match the one generated by the resource compiler, you can just ask the resource compiler to generate the template and then compare the two versions.

    Instead of explaining this, I decided to invoke my psychic powers.

    My psychic powers tell me that you neglected to provide the WS_TAB­STOP style to the edit control when you created your in-memory template. (You probably didn't realize that you needed to do this, because the resource compiler adds that style by default.)

    When you use the resource compiler to generate a dialog template, it sets a bunch of styles by default, depending on the type of control. For example, EDIT­TEXT says "If you do not specify a style, the default style is ES_LEFT | WS_BORDER | WS_TABSTOP."

    Not mentioned is that the default style is in addition to the defaults for all controls: WS_CHILD | WS_VISIBLE.

    If you want to turn off one of the default styles for a control, you do so with the NOT keyword. For example, if you write

       EDITTEXT IDC_AWESOME, 10, 10, 100, 100, ES_MULTILINE | NOT WS_VISIBLE
    

    the resource compiler starts with the default style of

    dwStyle = WS_CHILD | WS_VISIBLE | ES_LEFT | WS_BORDER | WS_TABSTOP;
    

    then it adds ES_MULTI­LINE:

    dwStyle |= ES_MULTILINE;
    // dwStyle value is now
    // WS_CHILD | WS_VISIBLE | ES_LEFT | WS_BORDER | WS_TABSTOP | ES_MULTILINE
    

    and then it removes WS_VISIBLE:

    dwStyle &= ~WS_VISIBLE;
    // dwStyle value is now
    // WS_CHILD | ES_LEFT | WS_BORDER | WS_TABSTOP | ES_MULTILINE
    

    which is the final style applied to the control.

    The resource compiler is trying to help you out by pre-setting the styles that you probably want, but if you don't realize that those defaults are in place, you may not realize that you need to provide them yourself when you don't use the resource compiler. Maybe it was being too helpful and ended up resulting in helplessness.

    The customer was kind enough to write back.

    Thanks! That did the trick.

    For completeness, the default dialog style is WS_POPUP­WINDOW = WS_POPUP | WS_BORDER | WS_SYS­MENU. If you have a custom font, then you also get DS_SET­FONT, and if you have a caption, then you get WS_CAPTION.

  • The Old New Thing

    It rather involved being on the other side of this airtight hatchway: Silently enabling features

    • 41 Comments

    A security vulnerability report arrived which went roughly like this:

    When you programmatically enable the XYZ feature, the user receives no visual alert that it is enabled. As a result, malware can enable this feature and use it as part of an attempt to turn the machine into a botnet zombie. The XYZ feature should notify the user when it is enabled, so that the presence of malware is more easily detected.

    Okay, first of all, before we get to the security part of this issue, let's look at the user interface design. The proposed change is that, when the XYZ feature is enabled programmatically, the user receive a notification "XYZ is now enabled."

    You know what most users are going to do when they get that notification?

    Ignore it.

    There are two cases where XYZ can be programmatically enabled. The user may have enabled it themselves by, say, checking a checkbox, and the code that handles the checkbox turns around and programmatically enables the XYZ feature. In this case, the notification is an annoyance like my three-year-old niece who narrates every single thing she does. The user goes to the XYZ control panel, enables XYZ, and in response to the XYZ control panel enabling XYZ, the user gets a notification balloon that says "XYZ is now enabled."

    Well DUH.

    The other case is that the user did not enable it themselves, in which case the balloon is an annoyance because it says something that the user doesn't care about and probably doesn't even understand.

    "The tech tech tech is now tech tech tech."

    Displaying a notification doesn't really help. Either the user expects it, in which case it's an annoyance, or the user doesn't expect it, in which case they most likely won't understand it either, so it's still just an annoyance. (And taking no action leaves the feature enabled.)

    Okay, now let's look at the security aspect of this report. Enabling the XYZ feature requires administrator privileges, so any malware which successfully turns on the XYZ feature has already pwned your machine. It's already on the other side of the airtight hatchway.

    Displaying a warning when your machine is pwned doesn't accomplish anything: Since the malware already has complete control of the machine, it can patch out the code that displays the notification balloon. In other words, the only case in which the user actually sees the XYZ notification is when the user was expecting it to be turned on anyway, at which point you're just being a chatty Cathy.

    Exercise: "You can get rid of the notification in the case where the user enabled the feature manually by adding a fSuppressWarnings parameter to the Enable­XYZ function, and having the code that handles the checkbox pass fSuppressWarnings = TRUE. That leaves only the second case, which is exactly the case where we want the user to be annoyed." Discuss.
