For better or worse, the Windows UI model ties a window to a particular thread, which has led to a programming paradigm where work is divided between "UI threads" and "I/O threads". To keep your application responsive, it's critically important not to perform any blocking operations on your UI thread and to do them on the "I/O threads" instead.
One thing that people don't always realize is that even asynchronous APIs block. This isn't surprising - a single processor core can only do one thing at a time (to be pedantic, processor cores can and do perform more than one operation at a time, but the C (or C++) language is defined to run on an abstract machine that enforces various strict ordering semantics, so the C (or C++) compiler will do what is necessary to ensure that the language's ordering semantics are met).
So what does an "async" API really do, given that most APIs are written in languages that don't contain native concurrency support? Well, usually it packages up the parameters to the API and queues them to a worker thread (this is what the CLR does for many of the "async" CLR operations - they're not really asynchronous, they're just synchronous calls made on some other thread).
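To make the "package it up and queue it" idea concrete, here is a minimal sketch in standard C++. It is only an illustration of the general pattern, not how the CLR or any Windows API is actually implemented, and the names WorkerQueue and submit are hypothetical ones invented for this example.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: a single worker thread that drains a queue of jobs.
class WorkerQueue {
public:
    WorkerQueue() : worker_([this] { run(); }) {}

    ~WorkerQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();  // jobs still queued are run before the worker exits
    }

    // The "async" entry point: package the call, queue it, and return.
    // Everything in this function still runs synchronously on the caller.
    template <typename Fn>
    auto submit(Fn fn) -> std::future<decltype(fn())> {
        using Result = decltype(fn());
        auto task =
            std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        ready_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock,
                            [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // the plain synchronous call, made on the worker thread
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

A caller would write something like `auto f = queue.submit([] { return 6 * 7; });` and later call `f.get()` for the result. Note that the allocation, the lock and the notify inside submit all happen on the calling thread, which is exactly the synchronous part that this post is about.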
For some asynchronous APIs (like ReadFile and WriteFile) you CAN implement real asynchronous semantics - under the covers, the ReadFile API adds a read request to a worker queue and starts the I/O associated with reading the data from disk. When the hardware interrupt occurs indicating that the read is complete, the I/O subsystem removes the read request from the worker queue and completes it.
The critical thing to realize is that even for the APIs that do support real asynchronous activity there's STILL synchronous processing going on - you still need to package up the parameters for the operation and add them to a queue somewhere, and that can stall the processor. For most operations it doesn't matter - the time to queue the parameters is sufficiently small that you can perform it on the UI thread.
And sometimes it isn't. It turns out that my favorite API, PlaySound, is a great example of this. PlaySound provides asynchronous behavior with the SND_ASYNC flag, but it does a fair amount of work before dispatching the call to a worker thread. Unfortunately, some of the processing done in the application thread can take many milliseconds (especially if this is the first call to winmm.dll).
I originally wrote down the operations that were performed on the application's thread, but then I realized that doing so would cement the behavior for all time, and I don't want to do that. So the following will have to suffice:
In general, PlaySound does the processing necessary to determine the filename (or WAV image) in the application thread and posts the real work (rendering the sound) to a worker thread. That processing is likely to involve synchronous I/Os and registry reads. It may involve searching the path looking for a filename. For SND_RESOURCE, it will also involve reading the resource data from the specified module.
Because of this processing, it's possible for the PlaySound(..., SND_ASYNC) operation to take several hundred milliseconds (and we've seen it take as long as several seconds if the current directory is located on an unreliable network). As a result, even the SND_ASYNC version of the PlaySound API should be avoided on UI threads.
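One way to follow that advice is to make the whole PlaySound call from a thread of your own. The sketch below is only an illustration under some assumptions: play_sound_blocking and play_sound_off_ui_thread are hypothetical helper names, the sound is assumed to be a file on disk, and the non-Windows branch is a stub that exists only so the example compiles elsewhere. On Windows it needs to link against winmm.lib.

```cpp
#include <string>
#include <thread>
#include <utility>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // declares PlaySoundW; link with winmm.lib
#endif

// Hypothetical helper: plays a WAV file and returns when it has finished.
// Returns false on failure, and always false in the non-Windows stub.
bool play_sound_blocking(const std::wstring& path) {
#ifdef _WIN32
    // SND_SYNC is fine here because this is meant to run on a worker
    // thread, so both the lookup work and the rendering stay off the UI.
    return PlaySoundW(path.c_str(), nullptr, SND_FILENAME | SND_SYNC) != FALSE;
#else
    (void)path;  // stub: nothing to play on other platforms
    return false;
#endif
}

// Hypothetical helper: starts the sound on a new thread and hands the
// thread back so the caller can join() it later or detach() it.
std::thread play_sound_off_ui_thread(std::wstring path) {
    return std::thread(
        [p = std::move(path)] { play_sound_blocking(p); });
}
```

A UI thread would call `play_sound_off_ui_thread(L"chime.wav").detach();` (the filename is made up) and carry on pumping messages. Creating a thread per sound is the simplest thing that works for a sketch; an application that plays sounds often would more likely hand the call to an existing worker thread.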
 I bet most of you didn't know that the C language definition strictly defines an abstract machine on which the language operates.
 Yes, I know about the OpenMP extensions to C/C++, they don't change this scenario.
 I know that this is a grotesque simplification of the actual process.
 For those that are now scoffing: "What a piece of junk - why on earth would you even bother doing the SND_ASYNC if you're not going to really be asynchronous", I'll counter that the actual rendering of the audio samples for many sounds takes several seconds. The SND_ASYNC flag moves all the actual audio rendering off the application's thread to a worker thread, so it can result in a significant improvement in performance.
There's lots of "exceptions" like this, in and outside the Windows SDK. In .NET, WebRequest.BeginGetResponse is another example. Occasionally this method will block as the DNS lookup is not performed asynchronously.
While people can complain about these oversights, it's still caveat emptor (unless there's a Latin word for programmer).
Nitpick (apologies): Re your point about C++'s abstract machine (I know too little about C99 to comment on it) -- it's single-threaded, and as far as I'm aware defines no semantics for multi-threaded code. The C++ standardisation committee are working on addressing this -- see the papers on concurrency under http://www.open-std.org/jtc1/sc22/wg21/docs/papers/
Not a real nitpick, it's valid. Someone else pointed this out as well. The C/C++ abstract machine actually is mute about concurrency, which means that it's implementation defined.
> the C language definition strictly defines an abstract machine
> on which the language operates
It only defines a few characteristics of the abstract machine. Most characteristics are still left unspecified.
For example when a programmer starts using an unfamiliar machine, traditionally they started by investigating the word size, byte size, whether the thing was 2's complement or 1's complement[*], whether the hardware was aware of some human language character encoding, etc. Of course in modern times most of these are irrelevant since everyone knows that the word size is 16[**], the byte size is 8, the thing is 2's complement, and the human language character encoding is the same as your own language except if your own language isn't English. The C standard doesn't define any of these. The C standard still allows a word size of 36 and byte size of 9 like the second machine that K&R mentioned. The C standard still halfheartedly tries to allow 1's complement but the stdio section kind of breaks it.
[* Or in antiquity, signed magnitude, or BCD, or other. By the way the C standard still allows signed magnitude too, but not BCD.]
[** Even in x64 Windows, WORD is still 16 bits.]
> The C/C++ abstract machine actually is mute about concurrency,
> which means that it's implementation defined.
Not from the standard's point of view. Implementations can freely define whatever extensions they wish, but the standard doesn't require extensions. The standard only states a few things as being implementation defined. Concurrency is not among them.
I think it would be possible for an implementation to abuse the combination of volatile and const in order to build a concurrency model that almost, but not quite, unlikes a violation of the C standard. But for practical purposes, implementations of concurrency pretty much ignore the C standard. Actually for practical purposes implementations have to ignore some simpler parts of the C standard too.
Way off topic I know, but is there any further information about this C/C++ abstract machine I can read up on?
s/, unlikes/, entirely unlikes/
If Doug Adams were still with us he'd ignore me for that.
While I'm not a Windows programmer by any means, isn't this entire issue a matter of how you define what you mean by _ASYNC? You're basically saying that there are a certain set of operations that the function call will do synchronously, and a certain set that aren't. Wouldn't the problem be solved quite easily* by performing the registry and file-space operations asynchronously?
* Yes, this is yet again an oversimplification since the ever-present backwards compatibility giant looms in the background.
- First time poster Sash.
PS: Great work with your articles! Although I'm not a Windows programmer, I'm a big fan of software design in general, and your work is almost always enlightening.
> but is there any further information about this C/C++ abstract machine I can read up on?
1. Get yourself a copy of the standards (C99 and C++03) ... were $18 for a PDF each, maybe a little more now. Start at www.ansi.org.
2. Hang about on the standardisation newsgroups (comp.std.c and comp.std.c++).