I got an email from someone today, paraphrased below:

Q: When I set the sampling frequency to 48 kHz, and ask Windows what the audio period is, I get exactly 10 milliseconds. When I set it to 44.1 kHz, I get very slightly over 10 milliseconds: 10.1587 milliseconds, to be precise. Why?
A: Alignment.

A while back I talked a bit about the WASAPI exclusive-mode alignment dance. Some audio drivers have a requirement that they deal in buffer sizes which are multiples of a certain byte size - for example, a common alignment restriction for HD Audio hardware is 128 bytes.

A more general audio requirement is that buffer sizes be a multiple of the size of a PCM audio frame.
For example, suppose the audio format of a stream is stereo 16-bit integer. A single PCM audio frame will be 2 * 2 = 4 bytes. The first two bytes will be the 16-bit signed integer with the sample value for the left channel; the last two bytes will be the right channel.
As another example, suppose the audio format of a stream is 5.1 32-bit floating point. A single PCM audio frame will be 6 * 4 = 24 bytes. Each of the six channels is a four-byte IEEE floating-point value; the channel order in Windows will be {Left, Right, Center, Low-Frequency Effects, Side Left, Side Right}.
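
To make the two examples concrete, here's a small Python sketch. The helper name frame_size_bytes is made up for illustration, and the sample values are arbitrary; the only thing it shows is that one frame holds one sample per channel, packed back to back in little-endian order.

```python
import struct


def frame_size_bytes(channels: int, bits_per_sample: int) -> int:
    """Bytes in one PCM frame: one sample for each channel."""
    if bits_per_sample % 8 != 0:
        raise ValueError("this sketch assumes whole-byte samples")
    return channels * (bits_per_sample // 8)


# Stereo 16-bit integer: left sample, then right sample (little-endian).
stereo_frame = struct.pack("<2h", -1000, 1000)

# 5.1 32-bit float: six little-endian IEEE floats, in the channel order above.
surround_frame = struct.pack("<6f", 0.1, -0.1, 0.0, 0.5, 0.25, -0.25)

print(frame_size_bytes(2, 16), len(stereo_frame))    # 4 4
print(frame_size_bytes(6, 32), len(surround_frame))  # 24 24
```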

The audio engine tries to run at as close to a 10 millisecond cadence as possible, subject to the two restrictions above. Given a "desired minimum interval" in milliseconds, a streaming format, and an "alignment requirement" in bytes, you can calculate the closest achievable interval (without going under the desired interval) as follows:

Note: this only works for uncompressed formats.
aligned_buffer(desired_milliseconds, format, alignment_bytes)
    desired_frames = nearest_integer(desired_milliseconds / 1000.0 * format.nSamplesPerSec)
    alignment_frames = least_common_multiple(alignment_bytes, format.nBlockAlign) / format.nBlockAlign
    actual_frames = ceiling(desired_frames / alignment_frames) * alignment_frames
    actual_milliseconds = actual_frames / format.nSamplesPerSec * 1000.0
    return (actual_frames, actual_milliseconds)
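
If you want to play with the numbers yourself, here's a rough Python translation of the pseudocode. PcmFormat is a hypothetical stand-in for the handful of WAVEFORMATEX fields the calculation needs, not a real Windows type, and the sketch assumes uncompressed PCM with whole-byte samples.

```python
from dataclasses import dataclass
from math import gcd


@dataclass
class PcmFormat:
    """Hypothetical stand-in for the relevant WAVEFORMATEX fields."""
    samples_per_sec: int  # nSamplesPerSec
    channels: int         # nChannels
    bits_per_sample: int  # wBitsPerSample, assumed to be a multiple of 8

    @property
    def block_align(self) -> int:
        # nBlockAlign: bytes per PCM frame
        return self.channels * self.bits_per_sample // 8


def aligned_buffer(desired_ms: float, fmt: PcmFormat, alignment_bytes: int):
    """Return (actual_frames, actual_ms) for the closest aligned buffer."""
    desired_frames = round(desired_ms / 1000.0 * fmt.samples_per_sec)

    # Smallest byte count that is a multiple of both the driver alignment
    # and the frame size, expressed in frames.
    lcm_bytes = alignment_bytes * fmt.block_align // gcd(alignment_bytes, fmt.block_align)
    alignment_frames = lcm_bytes // fmt.block_align

    # Integer ceiling division, then back up to a whole number of aligned chunks.
    chunks = -(-desired_frames // alignment_frames)
    actual_frames = chunks * alignment_frames
    actual_ms = actual_frames / fmt.samples_per_sec * 1000.0
    return actual_frames, actual_ms


frames, ms = aligned_buffer(10, PcmFormat(44100, 2, 16), 128)
print(frames, round(ms, 4))  # 448 10.1587
```

The ceiling is done with integer arithmetic so that an already-aligned frame count (like 480 at 48 kHz) is not bumped up to the next multiple by floating-point noise.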

Here's a table of the actual buffer size (in frames and milliseconds), given a few typical inputs:

| Desired (milliseconds) | Format | Alignment (bytes) | Desired (frames) | Alignment (frames) | Actual (frames) | Actual (milliseconds) |
|---|---|---|---|---|---|---|
| 10 | 44.1 kHz stereo 16-bit integer | 128 | 441 | 32 | 448 | 10.16 |
| 10 | 48 kHz stereo 16-bit integer | 128 | 480 | 32 | 480 | 10 |
| 10 | 44.1 kHz 5.1 16-bit integer | 128 | 441 | 32 | 448 | 10.16 |
| 10 | 48 kHz 5.1 16-bit integer | 128 | 480 | 32 | 480 | 10 |
| 10 | 44.1 kHz 5.1 24-bit integer | 128 | 441 | 64 | 448 | 10.16 |
| 10 | 48 kHz 5.1 24-bit integer | 128 | 480 | 64 | 512 | 10.67 |

So to be really precise, the 44.1 kHz buffer is 448 frames long, which makes the period 448 / 44100 seconds = 640/63 milliseconds = 10.158730 milliseconds.
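
As a quick check of that fraction, here's a tiny Python sketch using exact rational arithmetic. The variable names are just for illustration; the 448 frames and 44100 Hz come from the 44.1 kHz rows of the table.

```python
from fractions import Fraction

actual_frames = 448   # aligned buffer size at 44.1 kHz
sample_rate = 44100   # frames per second

# Exact period in milliseconds; Fraction reduces to lowest terms.
period_ms = Fraction(actual_frames * 1000, sample_rate)

print(period_ms)                   # 640/63
print(f"{float(period_ms):.6f}")   # 10.158730
```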