My friend Tim Dodd found this presentation back when we worked together at ISS somewhere around '96-'97. It's by John Ousterhout, who worked at Sun Microsystems Laboratories – the deck is dated 9/28/95. We found it hilarious, because we worked with a lot of highly threaded code on a daily basis, and didn't think we were doing anything really special. The really funny thing is the slide where the author claims that threads are "too hard for most programmers to use", and "Even for experts, development is painful." The diagram has a range from casual programmers to wizards, with thread programmers at the high end of the wizard range. So we must be wizards! Tim really is a wizard, though he's far too modest to label himself as such.

I recently ran into a bug in my code that made me think that maybe thread programming is for wizards after all. Here's the problem: if you're dealing with a named pipe asynchronously, one of the best ways to do it is with I/O completion ports. They're really efficient, and I used them in the sample service I put into Writing Secure Code for Vista. In the piece of code I was working on, there could be at most one read pending against the pipe, but possibly a lot of writes. If the client goes away without warning, you get a message posted to the port for each pending operation, and you want them all to clear before you reset the pipe for a new client. I'd put in a simple reference counter class using InterlockedIncrement and InterlockedDecrement, and for sanity, I asserted that the ref count should never go below 0. Of course, I hit the assert… The code looked about like this:

if( WriteFile(…. ))

So what happened?

As it turns out, once you go down into WriteFile, somewhere down in kernel mode (likely in NtWriteFile), the write happens and causes the completion message to get posted to the port. Your other thread, which these days has a whole processor of its own, then comes out of its wait and is able to call m_WriteCount.Release() before the thread that issued the write can even get back up the call stack and call Add(). Ouch.
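To make the ordering concrete, here's a minimal single-threaded sketch of that interleaving. It is not the original code: PendingCounter, FakeWrite and the two IssueWrite helpers are hypothetical names, std::atomic stands in for InterlockedIncrement and InterlockedDecrement, and FakeWrite simply pretends the completion was handled before the call returned, which is the worst case described above. The second helper shows one common way to close the window (count the write before issuing it, and undo the count if nothing was queued); treat it as an assumption about a reasonable fix rather than what actually shipped.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the post's reference counter class.
// std::atomic plays the role of InterlockedIncrement/InterlockedDecrement.
class PendingCounter {
public:
    void Add() { ++count_; }
    void Release() {
        // Record the underflow instead of asserting, so the demo can report it.
        if (--count_ < 0) underflowed_ = true;
    }
    long Value() const { return count_.load(); }
    bool Underflowed() const { return underflowed_.load(); }
private:
    std::atomic<long> count_{0};
    std::atomic<bool> underflowed_{false};
};

// Hypothetical stand-in for WriteFile on a pipe bound to a completion port.
// It models the worst-case interleaving: the completion handler has
// already run Release() by the time the call returns.
bool FakeWrite(PendingCounter& pending, bool succeed) {
    if (!succeed) return false;  // nothing was queued, so no completion fires
    pending.Release();           // "the other thread" handling the completion
    return true;
}

// Order described in the post: count the write only after the call returns.
bool IssueWriteCountAfter(PendingCounter& pending, bool succeed) {
    if (!FakeWrite(pending, succeed)) return false;
    pending.Add();               // too late: Release() may already have run
    return true;
}

// Reordered (an assumed fix): count first, undo if nothing was queued.
bool IssueWriteCountFirst(PendingCounter& pending, bool succeed) {
    pending.Add();
    if (!FakeWrite(pending, succeed)) {
        pending.Release();
        return false;
    }
    return true;
}

// Returns true if the count never went below zero and ends at zero.
bool RunWorstCase(bool countFirst, int writes) {
    assert(writes > 0);
    PendingCounter pending;
    for (int i = 0; i < writes; ++i) {
        bool succeed = (i % 7 != 0);  // sprinkle in some failed calls
        if (countFirst) IssueWriteCountFirst(pending, succeed);
        else            IssueWriteCountAfter(pending, succeed);
    }
    return !pending.Underflowed() && pending.Value() == 0;
}
```

Calling RunWorstCase(false, 1000) reports the underflow on the first successful write, while RunWorstCase(true, 1000) keeps the count at or above zero throughout. In real overlapped code the failure branch would also need to treat ERROR_IO_PENDING as "still pending" rather than as a failure, which this sketch leaves out.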

So while I/O completion ports are really cool, and highly efficient, the mixture of these and modern multi-core systems adds up to some serious race condition potential. It also calls out the need for test rigs. The test rig was able to force this failure reliably (though it still took well over 10k iterations), whereas the app I'd bolt this code into might not hit it easily, if ever, and getting repro steps would be a ton of fun.

 Oh - found a link to the presentation -