February, 2005

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Concurrency, Part 8 - Concurrency for scalability

    • 18 Comments
    Last time, I talked a bit about why you'd use concurrency in your application.  Today I'd like to talk a smidge about using concurrency to achieve scalability.

    The first and most important thing to understand when you decide to start improving the MP scalability of your application is that you're about to embark on a long and arduous journey.  MP scalability is not a task for the faint of heart - as I mentioned in the first of these articles, the reality is that high end scalability is very similar to quantum physics.  Once you start trying to introduce MP scalability into your application, most of the rules you thought you understood about programming change.  As I mentioned earlier, there are relatively few people at Microsoft who really understand this problem space (I'm not one of them), so I can only talk in general principles about this (yay, I used the right principle! :)).

    Another thing to realize is that the vast majority of applications don't need to worry about scalability.  If your application is interacting with the user, for example, then worrying about scalability is almost always a total waste of time - the reality is that your application spends most of its time interacting with a user - and users are SLOW relative to computers.  Your scalability is generally limited by the speed of the user, and users are really slow - even the fastest typists can only produce 9-10 characters per second, and mice only generate hundreds of inputs per second.  This is not to say that concurrency isn't an issue for your application - you might very well choose to use multiple threads, but those threads are typically used for offloading work from your main user interface processing thread.

    When would you want to worry about scalability?  Well, if you're writing a server of some kind, it's going to be really important.  The number of users your server can support is limited by your server's bottlenecks, and you need to understand where those bottlenecks are before you can start worrying about scalability.  For example, an email server is typically disk bound (most people don't realize this, but email is inherently a database processing problem, and as such it's usually disk bound).  But since it's disk bound, you can often address that by throwing more disks at the problem.

    I'm only going to talk about CPU bound scalability issues, since the other types of scalability bottlenecks are off-topic.  But in general, depending on the design of your application, you can have bottlenecks in almost every type of resource you consume - disk bandwidth, virtual address space, CPU resources, etc.

    But if your application is CPU bound (for example, a database server that spends most of its time traversing indexes in the database), then the obvious solution is to throw more CPUs at the problem.

    And that's when the problem gets strange.

    The key to MP scalability is the realization that a single processor core can only do one thing at a time.  And if your CPU is spending time doing anything that's not directly related to your application, it's hurting your scalability.

    Why is this important?  Well, for two reasons.  The first is that, optimally, your application would only have one runnable thread per CPU core at any time.  Any more threads than that means that the operating system is going to context switch between them.  And if the operating system is context switching, then it's spending CPU cycles saving your state, finding the next runnable thread, and restoring that thread's state.  This is time that could be spent in your application performing its calculations; instead, it's wasted switching to another thread in your application.

    The other reason has to do with your thread quantum.  The NT scheduler schedules threads based on a concept called a quantum - the minimum amount of time that a thread will run.  Typically a quantum ranges from 20ms to 120ms in length (it varies depending on the OS configuration).  The NT scheduler is fairly deterministic: the OS won't context switch away from a thread unless one of two events occurs - first, a higher priority thread becomes runnable, or second, the thread relinquishes the CPU (by making a system call that would block).  So, in general, if your thread doesn't block, it is somewhat immune to context switches for the length of its quantum (this is a rough approximation, but it works for this discussion).

    So if you're trying to manage your scalability, it's critical to ensure that you get the most out of your CPU.  And that means that you never want to let the NT scheduler context switch away from your thread for any reason other than your quantum expiring.

    Now it turns out that that's a heck of a lot harder than it sounds.  For example, it means that you can never do any synchronous file I/O (since that blocks your thread).  It means that you can never, ever use Win32 critical sections or call WaitForSingleObject, since either might block.

    It means that you need to take all the things I talked about in the last 7 posts in the series and throw most of them away, except for the first principle: If your data is never accessed on more than one thread, then you don't have to worry about concurrency.  The challenge is to make that happen.  Because in most non-trivial servers, you WILL have shared data structures - for instance, the index on a table in the database is a shared structure.

    Also, there are times that you ARE going to block before your quantum has expired (for example, when you need to bring a page from the database into cache) - but you don't want to let the OS pick the next thread to run, YOU want to pick the next thread to run.

    I talked about one of the solutions in an earlier post: Fibers.  Fibers provide a lightweight mechanism for an application to do its own context switching - that allows you to keep your thread "hot" at all times, avoiding the OS context switch mechanism.

    One other major trick in the "don't block" toolkit is the set of Interlocked Win32 primitives.  The interlocked primitives use the CPU's locking mechanisms to guarantee cross-core synchronization of data, which allows you to avoid acquiring a critical section.

    Tomorrow, I'll spend a bit of time cataloging the different mechanisms for achieving scalability that Win32 provides.

    Edit: Corrected typo in subject

    Edit2: quanta->quantum, where appropriate.  Thanks Drew

  • Larry Osterman's WebLog

    Concurrency, part 7 - Why would you ever want to use concurrency in your application?

    • 14 Comments
    So I've spent a great deal of time talking about concurrency issues, but one thing I've avoided mentioning until now is when you need to worry about concurrency.

    The first (and most common) case where concurrency matters is when your code lives in a DLL.  If your code is in a DLL, then you've got to worry about concurrency (unless you have some external contract that guarantees your code is only called on a single thread, like being a COM apartment model object).  This is an inescapable rule: if you're in a DLL, you have no control over how your code is called, and you must assume the worst case scenario.  And there are times in a DLL when it's necessary to do work on separate threads - as I mentioned before, you can't use COM objects in a DLL, which means that all COM calls in a DLL have to be done on another thread (unless your code is living in a COM DLL, in which case you can safely assume that your caller's called CoInitialize for you).

    But if you're writing an application, you still might want to use multiple threads.  The biggest reason is for convenience.  There are times when it's just useful to be able to kick off a chunk of work onto another thread.  This is especially true for applications that interact with the user.  Even though those applications spend 99% of their time idle, it's critically important that the application be immediately responsive to the user.  So any tasks that could conceivably take a long time (like opening a file) should be performed on a thread that's not interacting with the user.  That's why my copy of FrontPage has 7 threads running.  In fact, on my machine, the only application with only one thread is cmd.exe - all the other processes have at least two threads.  Outlook has 37 threads on my machine right now :).

    And, of course, the third reason for writing for concurrency is for performance.  If your machine has only one CPU core, then adding multiple threads won't actually improve your performance (actually they'll hurt your performance due to the time spent waiting for the system to switch from one thread to another), but on a machine with more than one processor, if you have multiple CPU-bound operations that can be overlapped, then running them on separate threads can dramatically improve your scalability.

    But if you take the latter tack (adding multiple threads to resolve performance bottlenecks), it's critical to realize that you're potentially walking into a minefield.  All the stuff I've talked about so far is pretty straightforward, and applies to all sorts of applications.  But when you start trying to adapt your code for high performance computing, it opens up a whole new world of potential bottlenecks and issues.

    And that's what I'll spend some time talking about next - a rough primer on concurrency for scalability, and some of the issues associated with it.

  • Larry Osterman's WebLog

    Concurrency - An aside

    • 13 Comments

    Sorry, I've got a massive headache today, and really don't feel up to writing the next item in the concurrency series, I should have something for Thursday (tomorrow's a day off for me :)).  Meanwhile, I wrote this up a bit earlier during the series:

    My discussion about critical sections the other day got me to thinking about OS/2.  One of the books on my bookshelf that I don't read that often is Gordon Letwin's "Inside OS/2".  At the time, it was groundbreaking - an OS design manual from the designer of the operating system.  In many ways, it was the inspiration for Helen Custer's Inside Windows NT.

    At some point I'll write about Gordon's book; it's actually a fascinating look at what Microsoft considered the state of the art in operating system development back in the late 1980s.

    But one of the things I found (with 17 years of hindsight) was the OS/2 notion of "critical sections" as embodied by the DosCritSecEnter and DosCritSecLeave APIs.  Unlike the Win32 concept of critical sections (where a critical section prevents more than one thread at a time from being inside a particular critical section), the OS/2 concept was radically different.

    According to "Inside OS/2", when you entered a critical section, all the other threads in your process were suspended.  That's right - in OS/2, exclusive access to data structures was enforced by preventing any other thread in your app from being scheduled.

    Now, in reality, this isn't really what happened - instead you'd use the critical section to protect another data structure that would in turn control access to the resource.  In effect, the OS/2 critical section is the user-mode equivalent of an NT spin lock on a uniprocessor machine.

    But I sometimes find it fascinating that the OS/2 designers implemented these as their first-class synchronization primitives.

     

  • Larry Osterman's WebLog

    Concurrency, part 6 - Reference counting is hard :)

    • 9 Comments
    The other day, I talked about using reference counting as a way of ensuring object lifetime when enumerating a list containing objects.

    Reference counting is an incredibly useful, but incredibly dangerous technique.  If you use it right, it can be a huge help; if you get it wrong, it's a nightmare.

    For example, what's wrong with this code:

    class Thing1 : public RefCountedObject
    {
        CriticalSection _Thing1Lock;
        class Thing2 *_Thing2;
    public:
        void Lock() { EnterCriticalSection(&_Thing1Lock); }
        void Unlock() { LeaveCriticalSection(&_Thing1Lock); }

        void SetThing2(Thing2 *thing2)
        {
            Lock();
            _Thing2 = thing2;
            _Thing2->AddRef();
            Unlock();
        }
    };

    class Thing2 : public RefCountedObject
    {
        CriticalSection _Thing2Lock;
        class Thing1 *_Thing1;
    public:
        void Lock() { EnterCriticalSection(&_Thing2Lock); }
        void Unlock() { LeaveCriticalSection(&_Thing2Lock); }

        void SetThing1(Thing1 *thing1)
        {
            Lock();
            _Thing1 = thing1;
            _Thing1->AddRef();
            Unlock();
        }
    };

    If you call thing2->SetThing1 and thing1->SetThing2, you've just introduced a circular dependency - neither thing1 nor thing2 will ever be released.  And, while this example is utterly silly, there are lots of ways you can end up with circular references.

    I think a real world example fits in quite nicely here.  We've got a notification class for some of our stuff that had a deadlock in it.  The original class looked like:

    class NotificationManager: public IUnknown
    {
        struct NotificationBlock
        {
            CALLBACKFUNCTION _Function;
            LPVOID _Context;
            NotificationBlock(CALLBACKFUNCTION Function, LPVOID Context) :
                _Function(Function),
                _Context(Context)
            {
            }
            bool operator==(const NotificationBlock &other) const
            {
                return _Function == other._Function && _Context == other._Context;
            }
        };
        CAtlList<NotificationBlock> _NotificationList;
        CRITICAL_SECTION _NotificationListLock;

        void RegisterForNotifications(CALLBACKFUNCTION Function, LPVOID Context)
        {
            NotificationBlock block(Function, Context);

            EnterCriticalSection(&_NotificationListLock);
            _NotificationList.AddTail(block);
            LeaveCriticalSection(&_NotificationListLock);
        }
        void UnregisterNotifications(CALLBACKFUNCTION Function, LPVOID Context)
        {
            NotificationBlock block(Function, Context);
            EnterCriticalSection(&_NotificationListLock);
            POSITION pos = _NotificationList.Find(block);
            _NotificationList.RemoveAt(pos);
            LeaveCriticalSection(&_NotificationListLock);
        }
        void GenerateNotification(LPVOID NotificationData)
        {
            EnterCriticalSection(&_NotificationListLock);
            POSITION pos = _NotificationList.GetHeadPosition();
            while (pos != NULL)
            {
                NotificationBlock block = _NotificationList.GetNext(pos);
                block._Function(block._Context, NotificationData);
            }
            LeaveCriticalSection(&_NotificationListLock);
        }
    };

    Unfortunately, because the callback function was called with the list locked, there was a huge potential for deadlock (and in fact, one was observed).

    My first thought was to change GenerateNotification to copy the blocks into a temporary list:

        void GenerateNotification(LPVOID NotificationData)
        {
            CAtlList<NotificationBlock> temporaryList;
            EnterCriticalSection(&_NotificationListLock);
            POSITION pos = _NotificationList.GetHeadPosition();
            while (pos != NULL)
            {
                NotificationBlock block = _NotificationList.GetNext(pos);
                temporaryList.AddTail(block);
            }
            LeaveCriticalSection(&_NotificationListLock);
            while (!temporaryList.IsEmpty())
            {
                NotificationBlock block = temporaryList.RemoveHead();
                block._Function(block._Context, NotificationData);
            }
        }

    The problem with this is that it's possible for someone to have removed the block from the list as soon as the lock was released.  So I needed to have a way of protecting the blocks.  Well, the fourth principle comes into play: I reference counted the blocks.

    Now the code looks like this:

    class Thing1 : public INotification
    {
        CriticalSection _Thing1Lock;
        class NotificationManager *_MyContainer;
    public:
        void Lock() { EnterCriticalSection(&_Thing1Lock); }
        void Unlock() { LeaveCriticalSection(&_Thing1Lock); }

        void FinalConstruct(NotificationManager *MyContainer)
        {
            _MyContainer = MyContainer;
            _MyContainer->AddRef();
            _MyContainer->RegisterForNotifications(this);
        }
        void FinalRelease()
        {
            _MyContainer->UnregisterNotifications(this);
            _MyContainer->Release();
        }
    };

    class NotificationManager: public IUnknown
    {
        CAtlList<INotification *> _NotificationList;
        CRITICAL_SECTION _NotificationListLock;

        void RegisterForNotifications(INotification *Notify)
        {
            EnterCriticalSection(&_NotificationListLock);
            _NotificationList.AddTail(Notify);
            LeaveCriticalSection(&_NotificationListLock);
        }
        void UnregisterNotifications(INotification *Notify)
        {
            EnterCriticalSection(&_NotificationListLock);
            POSITION pos = _NotificationList.Find(Notify);
            _NotificationList.RemoveAt(pos);
            LeaveCriticalSection(&_NotificationListLock);
        }
        void GenerateNotification(LPVOID NotificationData)
        {
            CAtlList<INotification *> temporaryList;
            EnterCriticalSection(&_NotificationListLock);
            POSITION pos = _NotificationList.GetHeadPosition();
            while (pos != NULL)
            {
                INotification *notify = _NotificationList.GetNext(pos);
                notify->AddRef();
                temporaryList.AddTail(notify);
            }
            LeaveCriticalSection(&_NotificationListLock);
            while (!temporaryList.IsEmpty())
            {
                INotification *notify = temporaryList.RemoveHead();
                notify->OnNotification(NotificationData);
                notify->Release();
            }
        }
    };

    Unfortunately, this introduces a circular reference.  Since thing1 is added to the list in its FinalConstruct, the call to UnregisterNotifications never happens - the notification list keeps a reference to thing1, so its reference count never drops to zero.  Unless there's an external mechanism for removing thing1 from the container, neither object will ever be freed.

    The easiest way of solving this is to introduce a tiny "delegator" object between thing1 and the notification manager.  The notification manager references the delegator, not the Thing1 object, and thus the lifetime of Thing1 is unaffected by the notification manager.

    The good news is that managed environments, like the CLR, make most of this reference counting stuff pretty much unnecessary, which is a very good thing.

    Oh, and the fifth principle: "Reference counting is hard if you're not really careful".

    So much for the framework aspects of concurrency.  Believe it or not, I'm almost done :)  Tomorrow, I want to talk about opportunities for concurrency.

     

    Edit: One of these days, I'll remember that your principal is your friend.

  • Larry Osterman's WebLog

    Concurrency, part 5 - What happens when you can't avoid lock misordering?

    • 14 Comments
    Yesterday, I talked about simple deadlocks.  In general, deadlocks are caused by violating lock ordering.  But what do you do if you can't avoid violating lock ordering?

    It does happen - in my experience, the most common place where this can occur is when you have a list of objects, where the list is protected by a different lock from the objects.

    So you have (yeah, I know it's a stupid example, but...):

    class Thing1
    {
        CriticalSection _Thing1Lock;
        Thing1Container *_MyContainer;
    public:
        void Lock() { EnterCriticalSection(&_Thing1Lock); }
        void Unlock() { LeaveCriticalSection(&_Thing1Lock); }

        void Thing1Method1()
        {
            Lock();
            ...
            Unlock();
        }
        ~Thing1()
        {
            Lock();
            _MyContainer->RemoveThing1FromList(this);
            Unlock();
        }
    };

    class Thing1Container
    {
        CAtlList<Thing1 *> _Thing1List;
        CRITICAL_SECTION _Thing1ListLock;

        void EnumerateThing1()
        {
            POSITION pos;
            EnterCriticalSection(&_Thing1ListLock);
            pos = _Thing1List.GetHeadPosition();
            while (pos != NULL)
            {
                Thing1 *thing1 = _Thing1List.GetNext(pos);
                thing1->DoSomeWork();
            }
            LeaveCriticalSection(&_Thing1ListLock);
        }
        void AddThing1ToList(Thing1 *Thing1)
        {
            EnterCriticalSection(&_Thing1ListLock);
            _Thing1List.AddTail(Thing1);
            LeaveCriticalSection(&_Thing1ListLock);
        }
        void RemoveThing1FromList(Thing1 *Thing1)
        {
            EnterCriticalSection(&_Thing1ListLock);
            POSITION pos = _Thing1List.Find(Thing1);
            _Thing1List.RemoveAt(pos);
            LeaveCriticalSection(&_Thing1ListLock);
        }
    };

    If the Thing1 class also has its own lock, or if the call to thing1->DoSomeWork() may take a "long" time (for some value of long that depends on your application), then holding the lock across the call to DoSomeWork can be a major issue.

    So the obvious change is to release the lock across the call to thing1->DoSomeWork().

        void EnumerateThing1()
        {
            POSITION pos;
            EnterCriticalSection(&_Thing1ListLock);
            pos = _Thing1List.GetHeadPosition();
            while (pos != NULL)
            {
                Thing1 *thing1 = _Thing1List.GetNext(pos);
                LeaveCriticalSection(&_Thing1ListLock);
                thing1->DoSomeWork();
                EnterCriticalSection(&_Thing1ListLock);
            }
            LeaveCriticalSection(&_Thing1ListLock);
        }
     

    Unfortunately, now you have an even bigger problem.  The instant you left the critical section, another thread could have come along and destroyed thing1.  Normally, the call to destroy thing1 would have blocked on the _Thing1ListLock, but once you exited the critical section, the other thread could run and destroy the object.

    The immediate solution to this problem is to add reference counting to thing1. 

    class Thing1
    {
        CriticalSection _Thing1Lock;
        LONG _ReferenceCount;
        Thing1Container *_MyContainer;
    public:
        LONG AddRef() { return InterlockedIncrement(&_ReferenceCount); }
        LONG Release()
        {
            LONG lRet = InterlockedDecrement(&_ReferenceCount);
            if (lRet == 0)
            {
                delete this;
            }
            return lRet;   
        }
        void Lock() { EnterCriticalSection(&_Thing1Lock); }
        void Unlock() { LeaveCriticalSection(&_Thing1Lock); }

        void Thing1Method1()
        {
            Lock();
            ...
            Unlock();
        }
        ~Thing1()
        {
            Lock();
            _MyContainer->RemoveThing1FromList(this);
            Unlock();
        }
    };

    Next, you want to change EnumerateThing1 to:

        void EnumerateThing1()
        {
            POSITION pos;
            EnterCriticalSection(&_Thing1ListLock);
            pos = _Thing1List.GetHeadPosition();
            while (pos != NULL)
            {
                Thing1 *thing1 = _Thing1List.GetNext(pos);
                thing1->AddRef();
                LeaveCriticalSection(&_Thing1ListLock);
                thing1->DoSomeWork();
                EnterCriticalSection(&_Thing1ListLock);
                thing1->Release();
            }
            LeaveCriticalSection(&_Thing1ListLock);
        }

    You also need to change AddThing1ToList and RemoveThing1FromList:

        void AddThing1ToList(Thing1 *Thing1)
        {
            EnterCriticalSection(&_Thing1ListLock);
            Thing1->AddRef();
            _Thing1List.AddTail(Thing1);
            LeaveCriticalSection(&_Thing1ListLock);
        }
        void RemoveThing1FromList(Thing1 *Thing1)
        {
            EnterCriticalSection(&_Thing1ListLock);
            POSITION pos = _Thing1List.Find(Thing1);
            _Thing1List.RemoveAt(pos);
            Thing1->Release();
            LeaveCriticalSection(&_Thing1ListLock);
        }

    Note that I'm using InterlockedIncrement and InterlockedDecrement for the reference counting.  That's because the interlocked functions don't require an additional lock - they atomically increment or decrement the variable using low level CPU instructions.  Also, when adding reference counting to an object, don't forget to make the destructor of that object private (or protected, if it's virtual) - that way you can't accidentally delete the object directly.

    I was also careful to add a reference to Thing1 when it was put on the list.  This guarantees that thing1 won't be destroyed when we call thing1->Release() in EnumerateThing1 - otherwise we could have a lock inversion.

    Now there's still a problem with the enumeration function above - it's related to the use of the "pos" variable.  If another thread calls AddThing1ToList (or, even worse, RemoveThing1FromList) after the lock is released, then the "pos" variable is invalidated!  It doesn't even have to be a remove of the current item in the enumeration - the "pos" variable in EnumerateThing1 is only valid as long as no thread has modified the list.  So you've just introduced a data corruption bug into your application by releasing the lock, and you need to be careful to ensure that your list doesn't get corrupted.

    There are several ways of handling that - my current favorite is to have a temporary list and stage the thing1's in it:

        void EnumerateThing1()
        {
            POSITION pos;
            CAtlList<Thing1 *> callbackList;
            EnterCriticalSection(&_Thing1ListLock);
            pos = _Thing1List.GetHeadPosition();
            while (pos != NULL)
            {
                Thing1 *thing1 = _Thing1List.GetNext(pos);
                thing1->AddRef();
                callbackList.AddTail(thing1);
            }
            LeaveCriticalSection(&_Thing1ListLock);
            while (!callbackList.IsEmpty())
            {
                Thing1 *thing1 = callbackList.RemoveHead();
                thing1->DoSomeWork();
                thing1->Release();
            }
        }

    This solution applies my first principle: I ensured that the data was never accessed on more than one thread.  There are other solutions; this is just the easiest.  It has its own issues, since it builds a temporary copy of the list (and if the list is long, that may involve a lot of memory).  Btw, this solution is essentially the one that C# uses with the foreach construct - a temporary, immutable copy of the list is made and the caller iterates over that temporary copy of the list (CLR purists will disagree with me on this one, and they're right, but at a 20,000 foot level, that's essentially what happens).

    By now, my fourth principle of concurrency should be pretty clear: "Don't call into other objects with locks being held, and make sure that you reference count all your objects".

    In general, callbacks can be VERY tricky, and usually require that the semantics of the callback function be clearly documented.  For example, the waveOutProc function passed in to waveOutOpen has the following caveat:

    Applications should not call any system-defined functions from inside a callback function, except for EnterCriticalSection, LeaveCriticalSection, midiOutLongMsg, midiOutShortMsg, OutputDebugString, PostMessage, PostThreadMessage, SetEvent, timeGetSystemTime, timeGetTime, timeKillEvent, and timeSetEvent. Calling other wave functions will cause deadlock.

    This caveat exists because winmm doesn't release some internal locks across callback functions and thus can deadlock easily.

    Tomorrow, I'll talk a bit about some of the other issues associated with reference counting - there are some really subtle bugs that can happen if you're not careful when using reference counts.

    Edit: Don't call DoSomeWork in the loop :)

  • Larry Osterman's WebLog

    Concurrency, Part 4 - Deadlocks

    • 14 Comments
    Yesterday I talked about critical sections and their use in protecting application data.

    But critical sections are like duct-tape - they hold applications together but they also have their dark side.  A couple of commenters have touched on this, but the biggest issue with critical sections is "deadlocks".

    The simplest way of generating a deadlock is what is called a "lock order inversion".  Basically, you have two objects, call them "Thing1" and "Thing2", each protected by its own critical section.

    So what happens when you have:

    class Thing1
    {
        CriticalSection _Thing1Lock;
        class Thing2 *_Thing2;
    public:
        void Lock() { EnterCriticalSection(&_Thing1Lock); }
        void Unlock() { LeaveCriticalSection(&_Thing1Lock); }

        void Thing1Method1()
        {
            Lock();
            ...
            _Thing2->Thing2Method2();
            ...
            Unlock();
        }
        void Thing1Method2()
        {
            Lock();
            ...
            Unlock();
        }
    };

    class Thing2
    {
        CriticalSection _Thing2Lock;
        class Thing1 *_Thing1;
    public:
        void Lock() { EnterCriticalSection(&_Thing2Lock); }
        void Unlock() { LeaveCriticalSection(&_Thing2Lock); }

        void Thing2Method1()
        {
            Lock();
            ...
            _Thing1->Thing1Method2();
            ...
            Unlock();
        }
        void Thing2Method2()
        {
            Lock();
            ...
            Unlock();
        }
    };

    So what happens when Thread1 calls:

            thing1->Thing1Method1();

    And Thread2 calls:

            thing2->Thing2Method1();

    ?

    Thread 1                               Thread 2
    thing1->Thing1Method1()                thing2->Thing2Method1()
    EnterCriticalSection(&_Thing1Lock)     EnterCriticalSection(&_Thing2Lock)
    ...                                    ...
    _thing2->Thing2Method2()               _thing1->Thing1Method2()
    EnterCriticalSection(&_Thing2Lock)     EnterCriticalSection(&_Thing1Lock)

    At this point, you've deadlocked the process.  Thread 1 is blocked waiting for Thread 2 to release the _Thing2Lock.  Thread 2 is blocked waiting for Thread 1 to release the _Thing1Lock.

    And there's no way of getting out of this situation, you're totally stuck.

    So how do you avoid deadlocks?  Well, this is where the whole "concurrency discussion" starts getting tricky.

    The simplest solution is, once again, to avoid the problem.  Define a strict ordering for your locks and stick with it.

    So this would change Thing2::Thing2Method1() to be:
        void Thing2Method1()
        {
            _Thing1->Lock();
            Lock();
            ...
            _Thing1->Thing1Method2();
            ...
            Unlock();
            _Thing1->Unlock();
        }
     

    And now, the flow of control is:

    Thread 1                               Thread 2
    thing1->Thing1Method1()                thing2->Thing2Method1()
    EnterCriticalSection(&_Thing1Lock)     EnterCriticalSection(&_Thing1Lock) [blocks]
    ...
    _thing2->Thing2Method2()
    EnterCriticalSection(&_Thing2Lock)
    _Thing1->Thing1Method2()
    EnterCriticalSection(&_Thing1Lock)
    ...
    LeaveCriticalSection(&_Thing1Lock)
    ...
    LeaveCriticalSection(&_Thing2Lock)
    LeaveCriticalSection(&_Thing1Lock)
                                           [_Thing1Lock acquired - resumes]
                                           EnterCriticalSection(&_Thing2Lock)
                                           ...
                                           _Thing1->Thing1Method2()
                                           EnterCriticalSection(&_Thing1Lock)
                                           ...
                                           LeaveCriticalSection(&_Thing1Lock)
                                           ...
                                           LeaveCriticalSection(&_Thing2Lock)
                                           LeaveCriticalSection(&_Thing1Lock)

    Voila, we've avoided the deadlock.

    So that's the third principle of concurrent programming: Know your lock order and never, ever violate it.

    But what happens if you can't avoid violating the lock order?  It does happen, especially if your code calls out to other components.

    Tomorrow, I'll talk about some ways of avoiding the problem.

     

    Edit: Fixed typo in Thing2::Lock and Unlock, thanks ac.

     

  • Larry Osterman's WebLog

    Concurrency, Part 3 - What if you can't avoid the issue?

    • 12 Comments
    Yesterday, I talked about how you deal with concurrency issues by simply avoiding the concurrency issues - essentially by ensuring that your data is only ever accessed by a single thread.

    But, of course, there are times when this is unavoidable.  For example, if your code is located in a DLL, there's no way of knowing how your caller's going to be calling your code.

    If you're a COM component, then you can resolve the issue by marking your component apartment threaded and then checking to see if the thread on which you're being called is the thread on which you were first instantiated (the IWinHttpRequest object does this, for example).  Problem solved :)  Of course, this pushes the problem off to your caller - they're stuck with dealing with the issues of marshalling your calls to the right thread, etc. 

    Of course this isn't scalable, and it's really unfriendly, so we need a more friendly solution (and one that allows for a more scalable solution).  Since concurrent programming is all about protecting your data from being accessed by multiple threads, really all you need to do is to ensure that only one thread in your process can access your shared (global) data at a time.

    Windows provides a relatively simple mechanism for serializing code execution, EnterCriticalSection and LeaveCriticalSection (yeah, I know you already know that :)).

    Again, if you're interested in protecting your data, simply initialize a critical section in your startup, call EnterCriticalSection on every entry, and LeaveCriticalSection on exit.  Problem solved, you won't have to worry about concurrency issues.  Again, you're not going to be scalable, but at least you're not forcing your clients to work overtime.  An example of a component that does this is Exchange's MAPI client DLL (or at least it used to do this when I worked on it, it might have changed).

    But this still isn't scalable.  So the next step is to identify the fields you want to protect and implement a critical section around them.  For example, by default, each Win32 heap has a critical section associated with it.  When you call HeapAlloc(), the heap logic enters the critical section, performs the allocation and leaves the critical section.  Similarly, HeapFree() enters the critical section, performs the free, and leaves the critical section.

    For a huge number of scenarios, that's sufficient - you simply identify the data that's going to be protected, wrap it in a critical section, enter the critical section before you access the data, leave the critical section when you're done, and you're good to go.

    But there's a caveat to wrapping your data structures with critical sections.  It often doesn't work if you have more than one type of data structure being protected.  In fact, if your code is reasonably sophisticated, you've got a potential problem.

    And that's Larry's second principle of concurrency: "Critical sections can be your best friend (unless they're your worst enemy)".    I'll talk about why this is tomorrow.

     

    Edit: Added a second principle (I knew I forgot something :))

     

  • Larry Osterman's WebLog

    Concurrency, Part 2 - Avoiding the problem

    • 27 Comments
    Yesterday's article on concurrency discussed the basic concepts of concurrency.  Now I'd like to start talking about how you deal with concurrency...

    The first, and most important thing to realize about concurrent programming is that it's all about two things: your data and your threads.  If you only have one thread, then you don't have to worry about concurrency issues.  If you have more than one thread, then you only have to worry about concurrency issues if more than one thread can simultaneously access that data.  And that's my first principle of concurrent programming:  If your data is never accessed on more than one thread, then you don't have to worry about concurrency.   Again, the guys who get concurrency are cringing with this principle- the reality is (of course) more complicated than that, I'll get back to why it's more complicated later (I need to introduce some more concepts beforehand).

    In Win32, in general, there are three ways that you can guarantee that your thread is the only one accessing your data.

    The first is your stack.  On Win32, the data on your stack is owned by the thread (this might not be true for other architectures, I don't know :().  Unless you explicitly pass pointers to your stack to another thread, then you don't have to worry about other threads messing with your stack data, so you don't need to worry about protecting the data.

    The second way of ensuring that only one thread can access your data is to use ThreadLocalStorage, or TLS.  The idea behind TLS is that when your process starts, it allocates a "slot" in TLS.  That allocation returns you an index into a table, and you can stick whatever value you want to into that table.  When your thread starts up, you can allocate a block of memory, stick it into the table, and then, later on during the execution of the thread, you can go back and query the value of that block.  The block remains per-thread, and can be accessed without protecting the data.  This allows you to maintain per-thread context blocks which can be used to hold state that's more global than the stack.  Btw, the C runtime library allows you to declare variables in TLS by simply decorating them with __declspec(thread) - there are some caveats about using this, but the facility is available...

    The third way of ensuring that only one thread can access your data is simply to be careful in how you write your code.  As an example, in my last "What's wrong with this code" article, I purposely allocated the FileCopyBlock structures in one thread, put them on a queue and executed them in worker threads.  As a result, I didn't have to protect the FileCopyBlock fields - since only one thread could ever access the data at a time, they didn't need to be protected.  Now more than one thread accessed the data (the block was constructed on the main thread and destructed on the worker threads).  But at any given time, the blocks weren't accessed by more than one thread.  This principle can be applied in a number of different ways - my example was quite simple, but it wouldn't be difficult to imagine a FSM where the state was kept in a block that was enqueued and dequeued based on state transitions - the block would only ever be accessed by one thread at a time and thus wouldn't have to be protected.

     

    It turns out that you can write some fairly sophisticated multithreaded code without ever having to worry about synchronizing your shared data - just by being careful and setting up your data structures appropriately, you can do pretty amazing things.

    But, of course, there are times that you can't avoid having more than one thread accessing your data.  Tomorrow, I'll talk about some of the ways around that problem.

    Edit: Principal->Principle (thanks Mike :))

  • Larry Osterman's WebLog

    Concurrency, part one.

    • 22 Comments
    Last week, I mentioned I was going to start a series of articles on concurrency, here goes.  I do feel a need to give a caveat: I'm sort-of winging it here - I know where I want to go on this one, but I'm not sure how I'm going to get there :)  Please bear with me, because this is a relatively complicated topic.  Also remember that I'm trying to start this from the ground up - the reality is that a huge number of the issues that show up can't be addressed unless you've covered the fundamentals.

    The interesting thing about concurrent programming is that it can be as nightmarishly difficult as you want to make it, or as easy as you want to make it.  The reality is that the more you push the envelope of concurrent programming the harder it is.

    On the other hand, with a few relatively simple steps, you can achieve a "reasonable" level of concurrency in your application without too much pain.

    At this point, the people who really get high performance, highly scalable computing are cringing.  And they're right to be cringing, high performance scalable concurrent computing is REALLY hard.  There are only a couple of dozen people at Microsoft who know enough to do it and get it right consistently (and I'm absolutely NOT among them).  Getting high performance concurrent computing (the kind of thing that is needed to get your server application scaling evenly to more than 10 or so CPU cores) is HARD.  It literally ranks among the hardest things we know how to do in computing.  There are reams and reams of PhD theses that have been written about how to do it by people way smarter than I am.

    So you can think of this series as being a series about "good enough" concurrent programming - it's not designed for high-end systems, it's about making the stuff that runs day-to-day work well enough.

    So, with all that said, on with the show.

     

    First off, what IS concurrent programming?  It's a big phrase, and it's not clear what's meant by it.  In a nutshell, concurrent programming is about trying to make your application more scalable by making your application do more things at once.  As I mentioned in my post about Herb Sutter's Free Lunch article on CPU speed limitations in Dr. Dobbs, concurrency is going to be more and more important in computing in the future.

    How can your application do more than one thing at once?  Well, the reality is that if your computer has only one CPU core, then it can't.  But most machines sold these days come with more than one CPU core (heck, the $700 machine I bought my daughter for Christmas came with a HT P4 CPU).  So the reality is that MP machines are more mainstream than you might think.  And they're going to become more and more mainstream as time goes on.

    But even with that, your machine can only do as many things at once as it has CPU cores.  The key thing to realize is that just because your machine can only do one thing at a time doesn't mean it's actually DOING something most of the time - in fact, 99% of the time, your CPU is sitting idle.  Even if your application does a lot of stuff, it's STILL idle much of the time - every time you read from a file, you block your process waiting on the disk to return your file's data; every time you allocate virtual memory, you potentially wait until the memory management system flushes dirty pages to disk and maps those pages into your address space.  And all the time that your application is idle, you could potentially be using your CPU for other things.  And if you do have a machine with more than one core, then your machine CAN do more than one thing at once.

    Since your machine is idle most of the time, it's useful to identify ways of taking advantage of that idle time - if you can do stuff during the time your app is blocked, that's stuff that doesn't have to slow down the main thread of your application.  Simple examples of this are the spelling check (or background printing) in a word processing application - checking the spelling of words in a document (or formatting the document for printing) doesn't need to be done in the main thread, so it can be offloaded into a worker thread that performs the actual work. 

    In Win32, there are two major systems for implementing concurrent programming, Threads and Fibers.  I won't touch on Fibers, because I don't have enough experience with Fiber programming to write authoritatively on the issues of programming with Fibers, but I can talk a bit about concurrent programming with threads.  And there's a wealth of content related to concurrent programming with threads so...

    Ok, so that's what concurrent programming is, why does everyone keep saying that it's so hard?

    The big deal with concurrent programming is that it introduces a huge set of issues that you don't normally see.  First off, you need to ensure that your application can actually take advantage of concurrency - I can't count the number of times that someone's naively added multiple threads to their application and made their application (or system) slow down.  The other problem is that multithreaded applications introduce a class of failures that aren't seen in single threaded applications.

    The basic problem is that many (most) operations are so-called read/alter/write operations.  For example, most (if not all) arithmetic operations in C are R/A/W operations.  In C, b = b + c reads the value of b from memory, reads the value of c from memory, adds those two values, and then writes the result back to memory at location b.  If multiple threads can access any of those memory locations, then the results are, as they say, "undetermined".  The language doesn't help you, because neither the C nor the C++ language makes any guarantees about the behavior of operations in the presence of multiple threads.

    So if you're not really careful, you can get truly strange and interesting data corruption issues - even simple operations like incrementing a variable can misbehave in strange and mysterious ways.

     

    Ok, having said all that, tomorrow I'll start discussing some relatively simple rules for writing concurrent applications.

  • Larry Osterman's WebLog

    Larry gets taken to task on concurrency

    • 23 Comments
    In yesterday's blog post, I was taken to task, first by Bob Frankston, and then by Raymond Chen, about comments I made, and I want to talk a bit about their comments.

    Here's the relevant sections:

    From Bob Frankston (yeah, the Bob Frankston, I'm honored):

    I accidentally glanced at the comments and noticed the comment about 64 bit operations not being atomic. I guess I've been around too long and don't see any reason to assume any such operation is atomic. If the CLI doesn't make this promise then you can't assume it for any operation.
    With HT processors we're going to see a lot of subtle bugs appear and they'll all be blamed on mysterious viruses or maybe ghosts left in the CPU from processes that died horrible deaths.
    Seriously the current processors are the result of years of single threaded programming. Maybe we should make atomicity a well-defined part of the architecture instead of just assuming it because the program seems to work.

    I responded:

    Your point is actually a huge issue. On our currently supported 32bit processors, 32bit aligned memory operations are guaranteed (by the processor) to be atomic. However, in general, you're totally right - on 64bit processors this is explicitly NOT true - on those machines the rules on write ordering have been significantly relaxed, which can cause all sorts of issues.
    [...]
    Your comment about the ordering issue is why I suggested using the InterlockedExchangeAdd API - it asserts a memory barrier after the add to ensure that all (logical and physical) processors have a consistent view of the data.
    Btw, the biggest likely item to fail in the presence of misordered writes is the Double Checked Lock Pattern - here's a great discussion of why it fails on modern processors: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
    The problem is that there is a lot of existent code that uses it that will start breaking in really subtle ways.

    And then Raymond chimed in (slightly edited):

    "The problem with this is that 64bit math isn't atomic on 32bit processors."
    Not even 32-bit math is atomic on 32-bit processors. Given
    static int i = 0;
    // Thread A
    i += 2;
    // Thread B
    i += 3;
    the final result might be i=2, or it could be i=3 or it could be i=5. If you want atomic arithmetic you must use an Interlocked function.

    Of course, Bob and Raymond are totally right. 

    And this shows just how hard understanding concurrency issues is.  In fact, both Raymond and I are correct in our assertions (he more than I, because his claim is more conservative than mine).  This is because he and I are talking about different aspects of the concurrency issue.  My comments (about 64bit operations being atomic) have to do with the fact that a 32bit processor has no way of accessing a 64bit value atomically.  What that means is that when a 32bit processor writes a 64bit value, it does it in two memory operations.

    And that means that between the two writes, another thread can come in and modify the value.  And that leads to corruptions.  This can't happen with aligned 32bit values on current 32bit processors, because they guarantee write ordering and write atomicity for those accesses.

    Raymond's comments have to do with memory accesses on machines with more than one CPU core.  His scenario can't happen on the currently supported single cored 32bit architectures (it can on some of the 64bit architectures due to relaxed write ordering issues).  But if you've got more than one cpu core (either a HT machine or two distinct CPUs), then you can absolutely see it happen.

    The reason for this is that most arithmetic (and logical) operations are read-alter-write operations.  They read a value from memory, perform some operation on it, and then write the value back.  If some other thread can come in during the read-alter-write operation, then any of the results that Raymond called out can occur.  For instance, in his example above, thread B reads the value of i before thread A updates it, adds 3 to 0, and writes 3 back. 

    Having said that, Raymond's scenario can fail even on single cored processors.  You see, the reason his code doesn't fail on single cored processors is because he's adding a constant to a memory value.  And the processor has an instruction for adding constant values to memory addresses:
            add [i], 3

    But it doesn't take very many changes to the code to make it break on single cored machines:

    static int i = 0;
    int j = 2;
    int k = 3;
    // Thread A
    i += j;
    // Thread B
    i += k;
     

    Then Raymond's scenario will fail on a single processor machine - the reason for this is that performing the read-alter-write operation on two memory variables requires that the read-alter-write sequence be explicitly coded:

            mov  eax, [j]
            add  [i], eax
     

    If there's more than one thread modifying j, then BAM!™ we've got a problem.

    So, when I wrote my original comment, I was thinking about the memory write case - relying on the fact that on 32bit processors, a natively aligned 32bit memory read is guaranteed to be atomic (and for 64bit processors, a 64bit aligned memory read is similarly guaranteed to be atomic).  On currently supported 32bit processors, however, a 64bit memory read is NOT guaranteed to be atomic.

    Please note that I'm calling out reads explicitly - memory writes are currently guaranteed to be atomic on the current crop of 32bit processors, but that is not the case for all the currently supported 64bit processors - they allow write misordering - see the linked paper about the double checked lock pattern above for more details of why this is a bad thing.

    I'm going to spend a bit of time over the next few days talking about concurrency and the issues associated with concurrency.  The good news is that with a relatively small set of common sense rules, most of the issues I talked about become irrelevant (unless you're looking to squeeze your scalability to the absolute highest levels).

  • Larry Osterman's WebLog

    OT: The Woodinville Family Preschool hits 23 this month

    • 0 Comments

    Valorie just pointed out that the preschool that both our kids attended, Woodinville Family Preschool just hit 23 years old this month.

    The WFP is a cooperative pre-school located in a residential neighborhood off of Woodinville.  It's run by Cecile Mielenz as an adjunct of Shoreline Community College.

    I can't say too much about WFP - the program there isn't about providing a break for parents, instead the program is really about teaching parents how to be better parents, and about preparing kids for entry into school.

    In many ways, the parent education is just as important (or more important) as the kid education - many (most) of us don't naturally have the skills to deal with kids when we have them and programs like this help.

     

    If you have toddler age kids in the Woodinville, WA area, I strongly suggest you check them out.

     

  • Larry Osterman's WebLog

    What's wrong with this code, part 9 (File Copy) - the answers

    • 11 Comments
    Wow, where to begin on this one.

    Let's start with the easy one: the simple coding error.

    Here it is:

        //
        // Account for data read.
        //
        dataBlock->dataLengthWritten->QuadPart += dataBlock->readLength;
     

    The problem with this is that 64bit math isn't atomic on 32bit processors.  Which means that if you tried to copy files over 4G in size, the two halves of the dataLengthWritten field could be updated non-atomically - another thread could jump in between the two 32bit writes - causing measurement errors.

    To fix that, you need to use the InterlockedExchangeAdd64() function which will perform the operation atomically.

    Now for the complicated coding error.  That one shows up here:

       if (!SetFilePointerEx(dataBlock->sourceHandle, dataBlock->fileDataOffset, NULL, FILE_BEGIN))
        {
            printf("Unable to set current file position in source to %i64d: %d", dataBlock->fileDataOffset, GetLastError());
            exit(1);
        }
        DWORD readLength;
        if (!ReadFile(dataBlock->sourceHandle, dataBuffer, dataBlock->readLength, &readLength, NULL))
        {
            printf("Unable to read data from file: %d\n", GetLastError());
            exit(1);
        }
     

    The problem here is that the file wasn't opened for overlapped I/O.  If you don't specify FILE_FLAG_OVERLAPPED on the call to CreateFile, then the OS will synchronize access to the file.  It does that by acquiring a lock across each file operation.  But the key thing is that this lock is only held across individual operations.  So if there were two worker threads running, their calls to SetFilePointerEx would reset each other.  In other words, you'd have:

    Thread 1                  Thread 2
    Set file pointer to 0
                              Set file pointer to 20
    Read from file
                              Read from file

    In general, if you don't specify FILE_FLAG_OVERLAPPED, then the OS makes no guarantees about what happens with file I/O if you attempt to do the I/O on more than one thread. 

    So the first thread would think it's reading at 0, but would in fact be reading at 20.  To fix this, the file would have to be opened with the FILE_FLAG_OVERLAPPED flag.  The copy routine would have to remove the calls to SetFilePointerEx and instead replace the ReadFile with:

        DWORD readLength = 0;
        OVERLAPPED readOverlapped = {0};    // zero the Internal fields before use
        readOverlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        readOverlapped.Offset = dataBlock->fileDataOffset.LowPart;
        readOverlapped.OffsetHigh = dataBlock->fileDataOffset.HighPart;
        //
        // Pass &readLength so it's filled in even if the read completes synchronously.
        //
        if (!ReadFile(dataBlock->sourceHandle, dataBuffer, dataBlock->readLength, &readLength, &readOverlapped))
        {
            DWORD error = GetLastError();
            if (error == ERROR_IO_PENDING)
            {
                if (!GetOverlappedResult(dataBlock->sourceHandle, &readOverlapped, &readLength, TRUE))
                {
                    printf("Unable to read data from file: %d\n", GetLastError());
                    exit(1);
                }
            }
            else
            {
                printf("Unable to read data from file: %d\n", GetLastError());
                exit(1);
            }
        }
        CloseHandle(readOverlapped.hEvent);
     

    And this leads neatly to the design errors.  As I mentioned, there are at least three of them.

    The first is fairly large - the reality is that this code doesn't actually overlap reads and writes - instead it replaces them with synchronous operations.  Which doesn't really generate a lot of parallelism - the code is still spending most of its time blocked waiting on I/O to complete.  Instead, the code should be reworked based on a finite state machine (or recoded to use ReadFIleEx with a callback function) to force the code to proceed in a more orderly fashion.

    The second design error is that the code effectively reads (and writes) randomly through the file.  While QueueUserWorkItem queues requests in order, if there's more than one worker thread running, the order in which the operations run is effectively random.  This means that the cache manager will thrash mightily trying to pre-read data.  This can be mitigated by specifying FILE_FLAG_RANDOM_ACCESS on the CreateFile.  But the code STILL has problems - it would be nice if the OS could read ahead of the writes, and doing random I/O doesn't help.  In reality, using unbuffered I/O (and limiting the I/O to the native cluster size on disk) is probably a better choice - in other words, taking on the job of caching the file data yourself.  And it's the I/O ordering issue that leads to the third design failure.

    The third error is that the code extends the file by doing asynchronous writes.  If you ever write beyond the current "valid data length" of the file, the filesystem starts a lazywriter thread backfilling the intervening file data with zeros.  If you later come in and write that data (which will happen with this design), you end up writing twice as much data.  And that leads to a fourth design error.

    The fourth design failure is that the code isn't recognizing the physical limitations of hard disks - a hard disk typically has only one spindle and one actuator arm (with multiple heads).  It can only do one thing at a time, which means that if your source and destination are on the same drive, overlapping the reads and writes in this manner is going to thrash the hard drive's heads - which will hurt your performance.  You'd be better off reading as much of the data from the source as possible at once and then writing it out to the destination.

     

    Oh, and as to why my testing didn't show any problems?  That's because QueueUserWorkItem is quite conservative about spinning up new threads - in all my testing, I never saw more than one thread running - so my code didn't actually speed up anything because it was never run in parallel.  And that's a fifth design error (although if you'd made the entire process fully asynchronous, it wouldn't need ANY worker threads).

    Kudos:

    B.Y. nailed almost all the issues in the first comment.

    The first issue (synchronization of the byte count) was also caught by Skywing.

    Stuart Ballard caught the fact that the random writes had the potential of causing fragmentation issues.

    Harald totally nailed the perf issue covered by random I/O to the disks (the 4th issue above).

    Later on, B.Y. nailed the concept of using a read queue and a write queue for the I/O.

    Not bugs:

    Harald pointed out that alternate data streams aren't copied, that's by design (see the original article).

    Capt. Jean-Luc Pikachu asked about the operator new not throwing.  By default (unless you #include <new>), the C runtime operator new doesn't throw; instead it returns NULL.

    Mea Culpas:

    B.Y. also caught:

    • A missed error check on the QueueUserWorkItem() call
    • A potential exception if an allocation error occurs inside the loop queuing the items

    Mike Dimmick also pointed out issues with the process heap critical section (used under the covers by operator new).  This is a very real scalability issue, and since this particular problem is all about scalability it is relevant.  He also picked up on some of the worker thread queue issues that made up the fifth error above.

     

    The reason I started writing this "what's wrong with this code" post was to come up with an example that shows the issue of doing I/O to a synchronous file handle on more than one thread; it was only after writing the code that I started realizing what a truly extraordinary bug farm could be found in just the simple task of copying a file as quickly as possible.  When I started writing the answer, I thought I'd identified only two design issues; by the time I'd finished, I had five of them.

    The simple answer is: Use CopyFile and let the OS do the heavy lifting for you :)

  • Larry Osterman's WebLog

    What's wrong with this code, part 9 - File Copy

    • 31 Comments

    Yes, it's time for another "What's wrong with this code"...

    Today's example of bad coding involves an attempt to speed up file copy operations by overlapping the read and write operations to the file.  The theory is that the system can be reading from one part of the source file while it's writing to another part of the destination.

    To do this, the author of the code (ok, it was me, constructing a contrived example - this isn't The Daily WTF) tried (quite badly it turns out) to perform the file I/O by queueing the work to worker threads.

    struct FileCopyBlock
    {
        LARGE_INTEGER fileDataOffset;
        DWORD readLength;
        HANDLE sourceHandle;
        HANDLE destinationHandle;
        PLARGE_INTEGER dataLengthWritten;
    };
    const DWORD readChunkSize = 0x10000;

    DWORD WINAPI CopyFileDataChunk(LPVOID Context)
    {
        FileCopyBlock *dataBlock = (FileCopyBlock *)Context;
        BYTE *dataBuffer = new BYTE[dataBlock->readLength];
        if (dataBuffer == NULL)
        {
            printf("Unable to allocate data block for copy\n");
            exit(1);
        }
        if (!SetFilePointerEx(dataBlock->sourceHandle, dataBlock->fileDataOffset, NULL, FILE_BEGIN))
        {
            printf("Unable to set current file position in source to %i64d: %d", dataBlock->fileDataOffset, GetLastError());
            exit(1);
        }
        DWORD readLength;
        if (!ReadFile(dataBlock->sourceHandle, dataBuffer, dataBlock->readLength, &readLength, NULL))
        {
            printf("Unable to read data from file: %d\n", GetLastError());
            exit(1);
        }
        if (readLength != dataBlock->readLength)
        {
            printf("Unable to read all from file: %d\n", GetLastError());
            exit(1);
        }
        if (!SetFilePointerEx(dataBlock->destinationHandle, dataBlock->fileDataOffset, NULL, FILE_BEGIN))
        {
            printf("Unable to set current file position in destination to %i64d: %d\n", dataBlock->fileDataOffset, GetLastError());
            exit(1);
        }
        DWORD writeLength;
        if (!WriteFile(dataBlock->destinationHandle, dataBuffer, dataBlock->readLength, &writeLength, NULL))
        {
            printf("Unable to read data from file: %d\n", GetLastError());
            exit(1);
        }
        if (writeLength != dataBlock->readLength)
        {
            printf("Unable to write all from file - out of disk space?\n");
            exit(1);
        }
        //
        // Account for data read.
        //
        dataBlock->dataLengthWritten->QuadPart += dataBlock->readLength;
        //
        // Free our buffers
        //
        delete [] dataBuffer;
        delete dataBlock;
        return 0;
    }

    void MyCopyFile(LPCTSTR SourceFile, LPCTSTR DestinationFile)
    {
        HANDLE sourceHandle;
        HANDLE destinationHandle;

        sourceHandle = CreateFile(SourceFile, FILE_READ_DATA, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (sourceHandle == INVALID_HANDLE_VALUE)
        {
            printf("Error opening source: %d\n", GetLastError());
            return;
        }
        destinationHandle = CreateFile(DestinationFile, FILE_WRITE_DATA, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, sourceHandle);
        if (destinationHandle == INVALID_HANDLE_VALUE)
        {
            DWORD error = GetLastError();
            CloseHandle(sourceHandle);
            printf("Error opening destination: %d\n", error);
            return;
        }
        LARGE_INTEGER fileSize;
        if (!GetFileSizeEx(sourceHandle, &fileSize))
        {
            printf("Unable to get file size: %d\n", GetLastError());
            CloseHandle(sourceHandle);
            CloseHandle(destinationHandle);
            return;
        }
        //
        // Break the copy up into 64K chunks, queuing each chunk to a worker thread.
        //
        LARGE_INTEGER currentOffset;
        LARGE_INTEGER fileSizeRemaining;
        LARGE_INTEGER fileSizeCopied;
        fileSizeRemaining.QuadPart = fileSize.QuadPart;
        fileSizeCopied.QuadPart = 0;
        currentOffset.QuadPart = 0;
        while (fileSizeRemaining.QuadPart != 0)
        {
            FileCopyBlock *copyBlock = new FileCopyBlock;
            if (copyBlock == NULL)
            {   
                printf("Unable to allocate copy block\n");
                CloseHandle(sourceHandle);
                CloseHandle(destinationHandle);
                return;
            }
            if (fileSizeRemaining.QuadPart < readChunkSize)
            {
                copyBlock->readLength = fileSizeRemaining.LowPart;
            }
            else
            {
                copyBlock->readLength = readChunkSize;
            }
            copyBlock->fileDataOffset = currentOffset;
            copyBlock->sourceHandle = sourceHandle;
            copyBlock->destinationHandle = destinationHandle;
            copyBlock->dataLengthWritten = &fileSizeCopied;
            currentOffset.QuadPart += copyBlock->readLength;
            fileSizeRemaining.QuadPart -= copyBlock->readLength;

            QueueUserWorkItem(CopyFileDataChunk, copyBlock, WT_EXECUTEINIOTHREAD);
        }

        while (fileSizeCopied.QuadPart != fileSize.QuadPart)
        {
            Sleep(100);
        }
        CloseHandle(sourceHandle);
        CloseHandle(destinationHandle);
    }

    Clearly this isn't a real world example.  For starters, there is essentially no error recovery - this is by design and is not a bug - adding in real code to handle the errors would make this example even longer than it currently is.  So errors are simply handled with exit(1).

    In addition, the code to handle the end of the copy (the sleep-for-100ms loop) is indescribably cheesy - it's horribly inefficient and defeats the primary purpose of making the I/O overlapped - if you time this file copy routine, it will always take a multiple of 100ms because of the sleep, which wastes a lot of time on small files.
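    For comparison, here's a sketch of what the completion handling could look like without the polling loop.  This is hypothetical illustrative code, not necessarily the fix I have in mind - it uses portable (C++11) standard library primitives rather than the Win32 thread pool used above (on Windows proper you'd more likely use an event handle and an interlocked counter):

    ```cpp
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    // Tracks outstanding copy chunks.  Each worker calls ChunkFinished() when
    // its chunk completes; the copier blocks in WaitForAllChunks() and wakes
    // as soon as the last chunk finishes, instead of sleeping in a loop.
    class CompletionTracker
    {
        std::mutex lock_;
        std::condition_variable allDone_;
        std::size_t outstanding_;
    public:
        explicit CompletionTracker(std::size_t chunkCount) : outstanding_(chunkCount) {}
        void ChunkFinished()
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (--outstanding_ == 0)
            {
                allDone_.notify_all();
            }
        }
        void WaitForAllChunks()
        {
            std::unique_lock<std::mutex> guard(lock_);
            allDone_.wait(guard, [this] { return outstanding_ == 0; });
        }
    };
    ```

    With something like this, MyCopyFile would construct the tracker with the number of chunks it was going to queue, and call WaitForAllChunks() in place of the Sleep(100) loop - so copying a small file no longer costs a minimum of 100ms.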

    The code also doesn't handle ACLs and alternate data streams (although it should handle file attributes and EAs).  This is also not a bug.  And the copy always creates the destination even if it already exists; again, that's by design.

    But even with all these caveats, there are still at least five major bugs in this code.  Three of them are serious design flaws, one is a complicated implementation issue, and the last is a simple implementation issue.

    The fascinating thing about this is that even with these flaws, the code works just fine in many cases - I was unable to induce the failures using my test harness, even though the errors are quite fundamental.

    As always, tomorrow will bring the answers and kudos for those who found the problems (and those who found problems I missed while testing this).

  • Larry Osterman's WebLog

    Faster Syscall Trap redux

    • 12 Comments
    Raymond has a funny historical article about how Windows made system calls on 386 processors.

    What he left out was the 286 version of this story.  Microsoft and Intel had a similar meeting to the one that Raymond described with the 386, but in that case, one of the Microsoft requests was for the ability to switch from protected mode back into real mode.

    You see, the 286 had the ability to switch from real mode (no virtual memory) to protected mode (virtual memory), but not back.  The theory was that you'd never want to go back to real mode, that would be "silly".

    But of course, that doesn't deal with the issue of compatibility.  OS/2 supported one real mode application running in the system, in the "DOS Box".  The DOS box was essentially just another task, it got time sliced like other processes (ok, it really didn't, but conceptually that's what happened), so the system did a LOT of switching between real mode and protected mode.

    It was critical that we be able to switch from protected mode back to real mode (when switching to the DOS box).  The problem is that the only documented way of doing this was to write to the keyboard controller device (which controls WAY more than just the keyboard on a PC).  Unfortunately, the keyboard controller was REALLY slow, and this mechanism took literally milliseconds.

    So Microsoft went crazy trying to find a fast way of switching back to real mode.

    Eventually they found it.

    Their solution:

        LIDT -1
        INT 1

    What did this do?  Well, LIDT -1 sets the interrupt descriptor table to an invalid physical address.  The system tried to execute the INT 1 instruction, which caused it to fault the IDT into memory (a fault).  Well, the IDT couldn't be found, so that raised a not present fault (a double fault).  The not present fault tried to fault in the not present fault handler, which failed (a triple fault).

    The 286 processor couldn't handle faults more than 3 deep, so it gave up the ghost and reset itself.  Which caused the system ROM to start executing, and we simply set the real mode start address (which was kept in real memory) and poof! we had transferred from protected mode to real mode in microseconds (not milliseconds).

    I actually found a write-up of this technique on the web here.  Interestingly enough, the article on the web credits Intel for this technique; they may be right - I remember it being developed in-house, but I may be wrong.  After I unpack my office, I'll check my 286 reference manual.

    Much later: I just checked my 286 reference manual (a rather well thumbed first edition), and I found the reference that was discussed in that article.  The comment accompanying the code is "Setup null IDT to force shutdown on any protection error or interrupt".  That's it, that's the only hint that Intel came up with the triple faulting technique to force a restart in real mode.  Personally, I'm not surprised that the IBM engineers didn't pick up on this.

     

  • Larry Osterman's WebLog

    Book Review: Find the Bug by Adam Barr

    • 13 Comments
    Several months ago, I mentioned Adam Barr's book "Find the Bug".  I pre-ordered it from Amazon, but have only now finally finished reading it, so it took me somewhat longer to get this book review out :)

    This is the first book review I've done since middle school, so please bear with the somewhat free-form style :)

     

    First off, I liked the book, and I'd heartily recommend it to groups that are doing code reviews as a primer.

    In "Find the Bug", Adam presents 50 vaguely real world examples of code bugs that are likely to be found only by debugging or careful code inspection.  He spreads the bugs across five different languages (C, Java, Python, Perl, and x86 assembly), providing ten different bugs in each.

    The book is organized into five different sections, one for each language.  Each section contains a brief primer on the language, basically enough information for a reader unfamiliar with the language to be able to determine the bug even if they don't know the language (for instance, I don't know Python, but was able to find most of the bugs in the Python section after reading the language overview).

    For each bug, Adam follows a basic template.  He starts with a brief description of the purpose of the function (sorting entries, searching for values, playing LIFE, playing go fish, etc), the code for the function, a series of questions about the code, and a series of hints to aid in the process of determining the bug.  Finally, he explains the origin of the bug and presents the corrections needed for the bug.

    Adam does a good job of presenting a fairly broad set of potential bugs, and given the diverse languages that he presents, there are opportunities for people to develop their code reviewing skills in a series of different languages.

    Adam has a good writing style (as shown in his previous book Proudly Serving my Corporate Masters), and it carries forward in this book - the examples are well written and read easily.

    One thing I did not like about the book was the bug classification system.  Adam categorizes all the bugs in the book according to a taxonomy of bugs based roughly on the taxonomy that Donald Knuth used for TeX.  Personally, I found the taxonomy rather distracting.  Especially since the taxonomy of some of the bugs wasn't clear - for instance, bug 7.2 (assembly language, multiply two numbers using shifts) could be either F.location or A.logic depending on how you wanted to classify them. 

    It's also clear that Adam's not an assembly language programmer.  I was cringing reading some of the assembly language sections (especially the multiply using shifts example) because the code looked "wrong" to my eyes.  Essentially, the root of the problem was that Adam didn't use some of the "standard" tricks that a seasoned assembly language programmer would use.  For instance he had a loop where he used a TEST ECX, ECX/JNZ <destination> - an assembly language programmer would use JECXZ <temporary label ahead of the jmp>/JMP <destination> instead (because JECXZ doesn't change the flags register).  He also zeroed registers using MOV EAX, 0 instead of XOR EAX, EAX (which results in slightly smaller code).  These are total nits however :).

    Find the Bug's not a good general purpose textbook, but, as I said, it's a really good primer to help hone code reviewing techniques.  It's going to stay on the bookshelf.

    I give it a thumbs up :)

     

    Oh, and I've got to point out the Kanji Backspace problem (sorry Adam, I had to point it out).  This is problem 3.10 (C language, Kanji Backspace).  Adam's blogged about this one, but the upshot is that we had a disagreement about the Kanji Backspace algorithm as presented in his book - the algorithm does work, but the code as presented doesn't correctly identify Kanji lead byte characters.  The algorithm specified also only works for DBCS character sets like Kanji - it doesn't work for general purpose multi-byte character sets.

     

    Details:

    Title: Find the Bug
    Author: Adam Barr
    ISBN: 0-321-22391-8

     

    Full Disclosure: Adam's a friend of mine, we've known each other for at least 12 years now.

     

    Edit: Removed nonexistant JECXNZ instruction, thanks Benny.

     

  • Larry Osterman's WebLog

    Somewhat Offtopic: For parents/interested parties in the Seattle area.

    • 1 Comments

    The Northshore School District (where Valorie works) is currently investigating ways of improving the quality of education in NSD.

    From what Valorie's told me, they seem to be willing to put just about any options on the table, ranging from extending the current PACE program to all 12 grades, to establishing an IB program that runs from Junior High through graduation, to establishing magnet schools for science and technology, fine arts, etc.  The list of options they're considering is quite dizzying.

    Anyway, the NSD has authored a survey to help gather feedback from NSD parents and the community to help determine the direction in which they want to go, and everyone is invited to participate.

    The survey can be found here: http://www.zoomerang.com/survey.zgi?p=WEB223ZSQCNU2M

     

  • Larry Osterman's WebLog

    Riffing on Raymond - FindFirst/FindNext

    • 16 Comments

    As I mentioned, I've been Riffing on Raymond a lot - Yesterdays post from Raymond got me to thinking about FindFirst and FindNext in MS-DOS.

    As Raymond pointed out:

    That's because the MS-DOS file enumeration functions maintained all their state in the find structure. The FAT file system was simple enough that the necessary search state fit in the reserved bytes and no external locking was necessary to accomplish the enumeration. (If you did something strange like delete a directory while there was an active enumeration on it, then the enumeration would start returning garbage. It was considered the program's responsibility not to do that. Life is a lot easier when you are a single-tasking system.)

    The interesting thing about the fact that MS-DOS kept its state in the reserved bytes of the find structure was that there were a bunch of apps that figured this out.  And then they realized that they could suspend and resume their searches by simply saving away the 21 reserved bytes at the start of the structure and later stuffing them back into a fresh FindFirst structure.

    So a program would do a depth first traversal of the tree, and at each level of the tree, instead of saving the entire 43 byte FindFirst structure, it could save 22 bytes per level of the hierarchy by just saving the first 21 bytes of the structure.  In fact, some programs were even more clever - they realized that they could save just the part of the reserved data that they thought was important (something like 8 bytes/level).

    And that's just what they did...
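    To make the trick concrete, here's a sketch of what those apps were doing.  The field layout below is purely illustrative - the real FindFirst block's format was undocumented and the reserved area was opaque to applications - but the idea was exactly this: squirrel away only the reserved state per directory level, then splice it back into a find structure to resume the enumeration:

    ```cpp
    #include <cstdint>
    #include <cstring>

    // Illustrative layout of the 43-byte MS-DOS FindFirst block (the real
    // layout was undocumented; only the sizes here match the article).
    #pragma pack(push, 1)
    struct FindFirstBlock
    {
        std::uint8_t  reserved[21];  // DOS's private search state
        std::uint8_t  attribute;     // attributes of the found file
        std::uint16_t writeTime;     // packed DOS time
        std::uint16_t writeDate;     // packed DOS date
        std::uint32_t fileSize;      // size in bytes
        char          fileName[13];  // 8.3 name, NUL terminated
    };
    #pragma pack(pop)

    // What the clever apps saved per directory level: just the reserved state.
    struct SavedSearchState
    {
        std::uint8_t reservedState[21];
    };

    void SuspendSearch(SavedSearchState *saved, const FindFirstBlock *block)
    {
        std::memcpy(saved->reservedState, block->reserved, sizeof(block->reserved));
    }

    void ResumeSearch(FindFirstBlock *block, const SavedSearchState *saved)
    {
        // Splice the saved state back in, then call FindNext as if the
        // enumeration had never been interrupted.
        std::memcpy(block->reserved, saved->reservedState, sizeof(block->reserved));
    }
    ```

    Saving 21 bytes instead of the whole 43-byte block saves 22 bytes per level of the directory hierarchy - which mattered a great deal when your program had to fit in 640K.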

    Needless to say, that caused kittens when the structures used for search had to change - these apps looked into the internal data structures and assumed they knew what they did...

     

  • Larry Osterman's WebLog

    OMG, I've been slashdotted!

    • 12 Comments

    Wow.  I came in and yesterday's (and Wednesday's) posts were on the front page of /.

    I know that because I came in and found 40+ messages awaiting moderation.  My first thought was that it was comment spam.  I looked at the first of them, realized it wasn't, so I figured someone had picked up the article.  Wandered over to slashdot, and what do you know, I'm on the front page :)

    I'm wading through the comments right now, but I do want to point any slashdot readers to my comment policy.  It's on the front page, but people might not click on it.

    The bottom line is that I try to be REALLY tolerant of comments.  I've only moderated three or four non-spam comments in 200+ posts and 2500-ish comments.

    But I do request that the discussion remain civil, "Microsoft is teh suk" posts will be moderated.

    The other thing that will be moderated is foul language.  I don't tolerate it in my blog's comments; this is a family blog, and I feel pretty strongly about this issue.

     

  • Larry Osterman's WebLog

    Tipping Points

    • 84 Comments

    One of my birthday presents was the book "The Tipping Point" by Malcolm Gladwell.

    In it, he talks about how epidemics and other flash occurrences happen - situations that are stable until some small thing changes, and suddenly the world has changed overnight.

    I've been thinking a lot about yesterday's blog post, and I realized that not only is it a story about one of the coolest developers I've ever met, it also describes a tipping point for the entire computer industry.

    Sometimes, it's fun to play the "what if" game, so...

    What if David Weise hadn't gotten Windows applications running in protected mode?  Now, keep in mind, this is just my rampant speculation, not what would have happened.  Think of it kinda like the Marvel Comics "What if..." series (What would have happened if Spiderman had rescued Gwen Stacy, etc [note: the deep link may not work, you may have to navigate directly]).

    "What If David Weise hadn't gotten Windows applications running in protected mode..."[1]

    Well, if Windows 3.0 hadn't had windows apps running in protected mode, then it likely would have not been successful.  That means that instead of revitalizing interest in Microsoft in the MS-DOS series of operating systems, Microsoft would have continued working on OS/2.  Even though working under the JDA was painful for both Microsoft and IBM, it was the best game in town.

    By 1993, Microsoft and IBM would have debuted OS/2 2.0, which would have supported 32bit applications and had MVDM support built-in.

    Somewhere over the next couple of years, the Windows NT kernel would have come out as the bigger, more secure brother of OS/2, it would have kept the workplace shell that IBM wrote (instead of the Windows 3.1 Task Manager).

    Windows 95 would never have existed, since the MS-DOS line would have withered and died off.  Instead, OS/2 would be the 32bit operating system for lower end machines.  And instead of Microsoft driving the UI story for the platform, IBM would have owned it.

    By 2001, most PC class machines would have OS/2 running on them (probably OS/2 2.5) with multimedia support.  NT OS/2 would also be available for business and office class machines.  With IBM's guidance, instead of the PCI bus becoming dominant, the MCA was the dominant bus form factor.  The nickname for the PC architecture wasn't "Wintel", instead it was "Intos" (OS2tel was just too awkward to say).  IBM, Microsoft and Intel all worked to drive the hardware platform, and, since IBM was the biggest vendor of PC class hardware, they had a lot to say in the decisions.

    And interestingly enough, when IBM came to the realization that they could make more money selling consulting services than selling hardware, instead of moving to Linux, they stuck with OS/2 - they had a significant ownership stake in the platform, and they'd be pushing it as hard as they could.

    From Microsoft's perspective, the big change would be that instead of Microsoft driving the industry, IBM (as Microsoft's largest OEM, and development partner in OS/2) would be the driving force (at least as far as consumers were concerned).  UI decisions would be made by IBM's engineers, not Microsoft's.

    In my mind, the biggest effect of such a change would be on Linux.  Deprived of the sponsorship of a major enterprise vendor (the other enterprise players followed IBM's lead and went with OS/2), Linux remained primarily an 'interesting' alternative to Solaris, AIX, and the other *nix based operating systems sold by hardware vendors.  Instead, AIX and Solaris became the major players in the *nix OS space, and flourished as alternatives.

     

    Anyway, it's all just silly speculation, about what might have happened if the industry hadn't tipped, so take it all with a healthy pinch of salt.

    [1] I'm assuming that all other aspects of the industry remain the same: The internet tidal wave hit in the mid 90s, computers kept getting faster as they always had, etc. - this may not be a valid set of assumptions, but it's my fantasy.  I'm also not touching on what effects the DoJ would have had on the situation.

  • Larry Osterman's WebLog

    Farewell to one of the great ones

    • 48 Comments
    Yesterday was the last day at Microsoft for David Weise.  I've written about David (in passing) in the past, but never in detail.

    David started at Microsoft in 1986, when Microsoft acquired Dynamical Systems Research.  Before founding DSR, he was a member of the ORIGINAL MIT blackjack team - not the latecomers that you see in all the movies, but the original team, back in the 1970s.  According to Daniel Weise (David's twin brother), they ran it like an investment company - MIT people could invest money in the blackjack team, and the blackjack team would divide their winnings up among them.  Apparently RMS was one of the original investors; during David's going away party, Daniel joked that the FSF was founded on David's blackjack winnings :)

    After leaving Princeton with a PhD in molecular biophysics, David, Chuck Whitmer, Nathan and Cameron Myhrvold, and a few others founded DSR to create a "Topview" clone.  For those new to the industry, Topview was a text based multitasking shell that IBM created that was going to totally revolutionize the PC industry - it would wrest control of the platform from Microsoft and allow IBM to maintain its rightful place as leader of the PC industry.  Unfortunately for IBM, it was an utter flop.

    And, as Daniel pointed out, it was unfortunate for DSR.  Even though their product was twice as fast as IBM's and 2/3rds the size, when you base your business model on being a clone of a flop, you've got a problem.

    Fortunately, at the time, Microsoft was also worried about Topview, and they were looking for a company that understood the Topview environment so that if it was successful, Microsoft would have the ability to integrate Topview support into Windows.

    Finding DSR may have been one of the best acquisitions that Microsoft ever made.  Not only did they find the future CTO (and founder of Microsoft Research) Nathan Myhrvold, but they also hired David Weise.

    You see, the DSR guys were wizards, and David was a wizard's wizard.  He looks at programs and makes them smaller and faster.  It's absolutely magical to watch him at his work.

    I (and others) believe that David is single handedly responsible for making Microsoft over a billion dollars.  He's also (IMHO) the person who is most responsible for the success of Windows 3.0.

    Everywhere David worked, he dramatically improved the quality of the product.  He worked on the OS/2 graphics drivers and they got faster and smarter.  He (and Chuck) figured out tricks that even the designers of the hardware didn't realize could be done.

    And eventually, David found himself in the Windows group with Aaron Reynolds, and Ralph Lipe (and several others).

    David's job was to move the graphics drivers in Windows into protected mode on 286 and better processors (to free up precious memory below 640K for Windows applications).  He (and Chuck) had already figured out how to get normal Windows applications to use expanded memory for their code and data, but now he was tackling a harder problem - the protected mode environment is subtler than expanded memory - if you touched memory that wasn't yours, you'd crash.

    David succeeded (of course).  But David, being David, didn't stop with the graphics drivers.

    He (along with Murray Sargent, creator of the SST debugger) also figured out how to get normal Windows applications running in protected mode.

    Which totally and utterly and irrevocably blew apart the 640K memory barrier.

    I remember wandering over to the Windows group over in Building 3 to talk to Aaron Reynolds about something to do with the MS-DOS redirector (I was working on DOS Lan Manager at the time).  I ran into David, and he called me into his office "Hey, look at what I've got working!".

    He showed me existing windows apps running in protected mode on the 286.  UNMODIFIED Windows 1.0 applications running in protected mode.

    He then ran me around the rest of the group, and they showed me the other stuff they were working on.  Ralph had written a new driver architecture called VxD.  Aaron had done something astonishing (I'm not sure what).  They had display drivers that could display 256 color bitmaps on the screen (the best OS/2 could do at the time was 16 colors).

    My jaw was dropping lower and lower as I moved from office to office.  "Oh my goodness, you can't let Steve see this, he's going to pitch a fit" (those aren't quite the words I used, but this is a family blog).

    You see, at this time, Microsoft's systems division was 100% focused on OS/2 1.1.  All of the efforts of the systems division were totally invested in OS/2 development.  We had invested literally tens of millions of dollars in OS/2, because we knew that it was the future for Microsoft.  OS/2 could run only a single DOS application at a time, and it had only just recently gotten a GUI (in 1989).  It didn't have support for many printers (only about 5, all made by IBM, and (I believe) the HP Laserjet).

    And here was this little skunkworks project in Building 3 that was sitting on what was clearly the most explosive product Microsoft had ever produced.  It was blindingly obvious, even at that early date - Windows 3.0 ran multiple DOS applications in virtual x86 machines.  It ran Windows applications in protected mode, breaking the 640K memory barrier.  It had a device driver model that allowed for development of true 32bit device drivers.  It supported modern displays with color depths greater than had been available on PC operating systems.

    There was just no comparison between the two platforms - if they had to compete head-to-head, Windows 3.0 would win hands down.

    Btw, David had discussed it with Steve (I just learned that yesterday).  As David put it, he realized that this was potentially an issue, so he went to Steve, and told him about it.  Steve asked Dave to let him know when he'd made progress.  That night, David was up until 5AM working on the code, he got it working, and he'd left it running on his machine.  He left a note on SteveB's office door saying that he should stop by David's office.  When David got in the next day (at around 8AM), he saw that his machine had crashed, so he knew that Steve had come by and seen it.

    He went to Steve's office, and they had a chat.  Steve's only comment was that David should tell his manager and his manager's manager so that they'd not be surprised at the product review that was going to happen later that day.  At the product review, Steve and Bill greenlighted the Windows 3.0 project, and the rest was history.  My tour was apparently a couple of days after that - it was finally ok to let people know what the Windows 3.0 team was doing.

    The rest was history.  At its release, Windows 3.0 was the most successful software project in history, selling more than 10 million copies a month, and it's directly responsible for Microsoft being where it is today.

    And, as I mentioned above, David is responsible for most of that success - if Windows 3.0 hadn't run Windows apps in protected mode, then it wouldn't have been the unmitigated success it was.

    David's spent the last several years working in linguistics - speech generation, etc.  He was made a distinguished engineer back in 2002, in recognition of his contribution to the industry. The title of Distinguished Engineer is the title to which all Microsoft developers aspire, it is literally the pinnacle of a developers career at Microsoft when they're named DE - other DE's include Dave Cutler, Butler Lampson, Jim Gray, Anders Hejlsberg.  This is unbelievably rarified company - these are the people who have literally changed the world.

    And David absolutely belongs in their company.

    David's leaving to learn more about the state of molecular biology today, he wants to finally be able to use his PhD, the field has changed so much since he left it, and it's amazing what's happening in it these days.

    As I said as I was leaving his goodbye party:

    "Congratulations, good luck, and, from the bottom of my heart, thank you".

    Bonne Chance David, I wish you all the best.  When you get your Nobel Prize, I'll be able to say "I knew him back when he worked at Microsoft".

     

    Edit: Corrected David's PhD info based on Peter Woit's blog post here.  Sorry David, and thanks Peter.

    Edit2: Grey->Gray :)  Thanks Jay

  • Larry Osterman's WebLog

    We've RI'ed!!!

    • 55 Comments
    We've RI'ed!

    ??  What on earth is he talking about ??

    An RI is a "Reverse Integration".  The NT source system is built as a series of branches off of a main tree, and there are two sets of operations that occur - when a change is made to the trunk, the changes are "forward integrated" to the branches.  New feature development goes on in the branches, and when a feature is ready for "prime time", the work is "reverse integrated" back into the main tree, and those changes are subsequently forward integrated into the various other branches.

    The primary reason for this structure is to ensure that the trunk always has a high level of quality - the branches may be of varying quality levels, but the main trunk always remains defect free.

    Well, yesterday afternoon, our feature RI'ed into the main multimedia branch; this is the first step towards having our code in the main Windows product (which should happen fairly soon).

    When a feature is RI'ed into any of the main Windows branches, the code has to go through a series of what are called "Quality Gates".  The quality gates are in place to ensure a consistent level of engineering quality across the product - among other things, they ensure that the feature has up-to-date test and development specifications, that it has an accurate and complete threat model, and that the tests for the feature achieve a certain level of code coverage.  There are a bunch of other gates beyond these, but they're related to internal processes that aren't relevant.

    The quality gates may seem like a huge amount of bureaucracy to go through, and they can be difficult, but their purpose is really worthwhile - the quality gates are what ensures that no code is checked into the trunk that doesn't meet the quality bar for being a part of Windows.

    Our team's been working on this feature (no, I can't say what it is, yet :() for over three years, it's been a truly heroic effort on the part of everyone involved, but especially on the part of the group's development leads, Noel Cross and Alper Selcuk, who were at work at 2AM every day for most of the past three weeks ensuring that all the I's were dotted and the T's were crossed.

    This is SO cool.

    Edit: Cut&Paste error led to typo in Noel's name

     

Page 1 of 1 (21 items)