I chatted in the past about how audio device alignment requirements impact the buffer size and the WASAPI alignment dance.

There are three alignment requirements on audio buffers:

  1. The buffer size must be a multiple of WAVEFORMATEX.nBlockAlign. This allows individual audio frames to be copied around without worrying about them being cut in half and then having to glue them together at the end.
  2. KSPROPERTY_RTAUDIO_BUFFER must be a multiple of a page - that is, 4096 bytes. This allows multiply mapping the buffer into consecutive pages, which in turn simplifies memory copies where the buffer is the source or the destination. KSPROPERTY_RTAUDIO_BUFFER is for timer-driven streaming; there is an event-driven analog, KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION, which has no corresponding alignment requirement.
  3. HD Audio buffer allocations must be a multiple of 256 bytes. For timer-driven buffers, this applies to the whole buffer. For event-driven buffers, this applies to the sum of the "ping" and "pong" buffers, so the individual "ping" or "pong" buffer must be a multiple of 128 bytes.

Consider a 5.1 16-bit 48 kHz stream playing to HD Audio hardware via KSPROPERTY_RTAUDIO_BUFFER. Where multiple alignment requirements apply, the effective alignment requirement is the least common multiple of all the applicable requirements.

From the nBlockAlign requirement, the buffer must be a multiple of (6 * 16) / 8 bytes = 12 bytes.

From the KSPROPERTY_RTAUDIO_BUFFER requirement, the buffer must be a multiple of PAGE_SIZE = 4096 bytes.

From the HD Audio requirement, the buffer must be a multiple of 256 bytes (this is timer-driven, so we do not divide by 2.)

In all, then, the buffer must be a multiple of LCM(12, 4096, 256) = 12288 bytes.

Since WAVEFORMATEX.nAvgBytesPerSec = ((6 * 16) / 8) * 48000 = 576000 byte/sec, this corresponds to 12288 / 576000 * 1000 = 21.333 milliseconds.