
Hi. This is Tom Ball. I am a Principal Researcher at Microsoft Research, where I manage the Software Reliability Research group in the Research in Software Engineering area.

On behalf of the CHESS team, I am happy to announce our first DevLabs pre-release of the CHESS tools (build 0.1.30106.5) for finding subtle concurrency errors in multithreaded single-process Windows and .NET programs.

CHESS is specifically designed for concurrency unit-testing and requires that you provide a set of test functions, each testing a particular concurrency scenario in the program. CHESS exhaustively enumerates all thread schedules of a test function by systematically inserting preemptions (unplanned interruptions of a thread) at various points in a program’s execution.

CHESS is realized as a test host for Visual Studio Team System 2008, as well as a set of command-line tools for analyzing .NET and unmanaged code. CHESS also includes Concurrency Explorer, a simple graphical user interface for exploring error traces of concurrent programs.

This post gives a glimpse of the Visual Studio Team System 2008 integration. Later posts will describe the command-line tools. You can find out more about CHESS at our home page and MSDN forum.

Exploring Thread Schedules with CHESS

We provide a test host ([HostType("Chess")]) for Visual Studio Team System 2008 that runs managed unit tests under the control of CHESS. Let’s take a quick look at a test of a bank Account class that is supposed to be thread-safe (you can find the code for Account at the end of this post):

    [TestClass]
    public class TestBank
    {
        [TestMethod]
        public void WithdrawAndDepositConcurrently()
        {
            var account = new Account(10);
            var child = new Thread(
               o => { (o as Account).Withdraw(2); }
               );
            child.Start(account);
            account.Deposit(1);
            child.Join();

            Assert.AreEqual<int>(9, account.Read());
        }
    }

The attributes [TestClass] and [TestMethod] tell Visual Studio that the class TestBank and method WithdrawAndDepositConcurrently are test code. The body of the method creates an Account instance with $10. It then creates a child thread that will withdraw $2 from the account. The main thread starts the child thread and concurrently deposits $1 in the same account and then waits for the child thread to complete. Of course, regardless of the thread schedule, we expect the account to contain $9 at the end, as asserted in the final statement. We ran this test and got the following output:

[Screenshot: Visual Studio test results showing that the test passed]

Should we be satisfied that the test passed? Our answer is “definitely not!” The test has no control over which thread schedule executes, so a single passing run says little about the other schedules. To test the code with CHESS, we simply add the HostType attribute to the WithdrawAndDepositConcurrently method, as shown below:

        [TestMethod]
        [HostType("Chess")]
        public void WithdrawAndDepositConcurrently()
        {
            // ... test body unchanged ...
        }

Running the test again, we see the following:

[Screenshot: test results showing that the test failed when run under CHESS]

We ended up with $8 instead of $9 - not good! If we run the test again, we get exactly the same result. This is because CHESS explores thread schedules in a deterministic order. That is, CHESS does not randomly perturb the thread scheduling but instead systematically explores the thread schedules.
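To make “systematic, deterministic exploration” concrete, here is a minimal toy sketch. It is not how CHESS is implemented, and every name in it (ToyState, ToyExplorer, ToyDemo) is hypothetical. It assumes the deposit is a single atomic step and the withdrawal is a read step followed by a separate write step, which anticipates the Account code at the end of this post. The sketch enumerates every interleaving of those steps in a fixed order and replays each one:

```csharp
using System;
using System.Collections.Generic;

// Toy model only; this is not how CHESS is implemented.
public sealed class ToyState
{
    public int Balance = 10; // shared account balance
    public int Temp;         // the withdrawing thread's local copy
}

public static class ToyExplorer
{
    // Thread 0 (main): Deposit(1), modeled as one atomic step.
    // Thread 1 (child): Withdraw(2), modeled as a read followed by a write.
    private static readonly Action<ToyState>[][] Threads =
    {
        new Action<ToyState>[] { s => s.Balance += 1 },
        new Action<ToyState>[] { s => s.Temp = s.Balance, s => s.Balance = s.Temp - 2 },
    };

    // Yields every interleaving of the two threads, always in the same order,
    // keeping each thread's own steps in program order.
    public static IEnumerable<List<int>> Schedules(int left0, int left1)
    {
        if (left0 == 0 && left1 == 0)
        {
            yield return new List<int>();
            yield break;
        }
        if (left0 > 0)
            foreach (var rest in Schedules(left0 - 1, left1))
            {
                rest.Insert(0, 0); // thread 0 runs first, then the rest
                yield return rest;
            }
        if (left1 > 0)
            foreach (var rest in Schedules(left0, left1 - 1))
            {
                rest.Insert(0, 1); // thread 1 runs first, then the rest
                yield return rest;
            }
    }

    // Replays one schedule from a fresh state and returns the final balance.
    public static int Run(List<int> schedule)
    {
        var state = new ToyState();
        var next = new int[Threads.Length]; // next step index per thread
        foreach (int t in schedule)
            Threads[t][next[t]++](state);
        return state.Balance;
    }

    // Final balance for each schedule, in enumeration order.
    public static List<int> FinalBalances()
    {
        var results = new List<int>();
        foreach (var schedule in Schedules(Threads[0].Length, Threads[1].Length))
            results.Add(Run(schedule));
        return results;
    }
}

public static class ToyDemo
{
    public static void Main()
    {
        // Prints:
        // 0,1,1 -> 9
        // 1,0,1 -> 8
        // 1,1,0 -> 9
        foreach (var schedule in ToyExplorer.Schedules(1, 2))
            Console.WriteLine(string.Join(",", schedule) + " -> " + ToyExplorer.Run(schedule));
    }
}
```

Only one of the three schedules in this toy model produces $8, and because the enumeration order is fixed, every run of the sketch reaches it at the same position. A single ordinary test run exercises just one of these schedules, chosen by the operating system.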

Reproducing a Buggy Thread Schedule with CHESS

For long-running tests, the number of schedules that CHESS explores can be enormous: CHESS may explore thousands, if not tens of thousands, of schedules before finding an error. When CHESS does find an error, it records an ASCII representation of the thread schedule that led to the bug. With this schedule, you can use CHESS to reproduce the bug immediately, without waiting through the many bug-free schedules CHESS explored before finding it. To access the CHESS repro, we double-click on the test in the above pane to see:

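[Screenshot: detailed test result with the “Error Message”, “Standard Console Output”, and “Standard Console Error” sections]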

The “Error Message” section details the nature of the error. CHESS uses the “Standard Console Output” section to give you information about the test and how to reproduce the error. The “Standard Console Error” section contains a set of attributes that will help you reproduce the error with CHESS. We click on the link to copy this section’s content to the clipboard and then paste it before the method WithdrawAndDepositConcurrently, so the code looks like:

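[Screenshot: the test method preceded by the pasted CHESS attributes, with the ChessScheduleString inside a region]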

Note that the ChessScheduleString is in a region so you can hide it; it is not intended to be human-readable. The string contains the schedule of events in the thread schedule that caused the assertion violation. There are two new attributes to direct CHESS. The first (“ChessMode”) tells CHESS to reproduce the execution directed by the CHESS schedule string. (The other mode of CHESS is the default exploration mode in which CHESS enumerates the thread schedules.)

Debugging with CHESS

Now we will run the test under the control of the debugger, with CHESS controlling the schedule, to find the source of the assertion violation. When debugging, the second attribute (“ChessBreak”) is active. This directive tells CHESS to break before each thread preemption (recall that a preemption is an unplanned interruption of a thread, that is, an unexpected context switch). CHESS has a vocabulary of concurrency primitives that you can use when debugging. As shown below, the first breakpoint takes place just before the main thread is about to acquire a lock on the Account in order to perform the deposit:

[Screenshot: debugger stopped just before the main thread acquires the lock on the Account]

This is the spot of the first preemption, which transfers control from the main thread to the child thread. We now hit F10 to jump to the next preemption, which takes place in the child thread:

[Screenshot: debugger stopped in the child thread, after the read into “temp” and before the lock]

The second breakpoint takes place just after the child thread has read the value of the Account into the local variable “temp” (which has value 10) but just before the child thread is about to acquire a lock on the Account in order to perform the withdrawal. The error in the code is immediately obvious, as the comment explains: there is a window of time after the read of the Account’s balance but before the withdrawal in which another thread can interrupt the child thread.

We press F10 a few times to see that control returns to the main thread, which performs the deposit (raising the balance to 11 dollars):

[Screenshot: the main thread performing the deposit]

The main thread then blocks in the call child.Join(), waiting for the child thread to continue:

[Screenshot: the main thread blocked in child.Join()]

We press F10 a few more times until the assignment statement of the child thread is highlighted, as shown below. Hovering over the variable “balance”, we see that the current balance is 11, reflecting the deposit of the parent thread:

[Screenshot: hovering over “balance” in the child thread shows the value 11]

Hovering over the local variable “temp”, we see that its value is the old/stale value of balance (10):

[Screenshot: hovering over “temp” shows the stale value 10]

Oops! Running to completion, we will witness that the assertion fails (because Withdraw will subtract 2 from 10 to get 8). The complete code of the buggy Account class is:

    public class Account
    {
        private int balance;

        public Account(int amount)
        {
            balance = amount;
        }

        public void Withdraw(int amount)
        {
            int temp = Read();
            // oops, temp could become stale if we are
            // preempted here
            lock (this)
            {
                balance = temp - amount;
            }
        }

        public int Read()
        {
            int temp;
            lock (this)
            {
                temp = balance;
            }
            return temp;
        }

        public void Deposit(int amount)
        {
            lock (this)
            {
                balance = balance + amount;
            }
        }
    }
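One possible fix, offered as a sketch rather than as the CHESS team’s own correction, is to do the read and the write of Withdraw under a single lock so that no other thread can run between them. The class name FixedAccount, the field name gate, and the small demo class are hypothetical. Using a private lock object instead of lock (this) is a separate style choice and is not required for the fix:

```csharp
using System;
using System.Threading;

// Hypothetical fixed variant of the Account class above.
public class FixedAccount
{
    // Private lock object, so outside code cannot lock on the same monitor.
    private readonly object gate = new object();
    private int balance;

    public FixedAccount(int amount)
    {
        balance = amount;
    }

    public void Withdraw(int amount)
    {
        // Read and write happen under one lock, so the value
        // being updated can no longer go stale in between.
        lock (gate)
        {
            balance = balance - amount;
        }
    }

    public int Read()
    {
        lock (gate)
        {
            return balance;
        }
    }

    public void Deposit(int amount)
    {
        lock (gate)
        {
            balance = balance + amount;
        }
    }
}

public static class FixedAccountDemo
{
    public static void Main()
    {
        // Same scenario as the test in this post, without the test framework.
        var account = new FixedAccount(10);
        var child = new Thread(() => account.Withdraw(2));
        child.Start();
        account.Deposit(1);
        child.Join();
        Console.WriteLine(account.Read()); // prints 9
    }
}
```

With this change, Withdraw and Deposit each update the balance in one critical section, so both orders of the two operations end at $9.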

Don’t Stress, Test with CHESS!

Please download CHESS, try it out on your code and send us comments via our forum. Enjoy!

- Tom Ball for the CHESS Team