XAudio2 is Microsoft’s cross-platform game audio API.  XACT, which you may be familiar with from XNA Game Studio, is built on top of XAudio2.

There are some key concepts in XAudio2 that we need to cover before I can get into more interesting tips and techniques for working with the API.

The most important concept is the Voice.  Voices in XAudio2 come in three flavors: Source, Submix, and Mastering.

The Mastering Voice (IXAudio2MasteringVoice) represents the final audio output of the API: this is where your audio leaves for your sound card, receiver, TV, or speaker system.  There can be one, and only one, mastering voice active in your title at any time.  Since this voice corresponds directly to the audio hardware, it only makes sense to have just one.  There’s little to do with the mastering voice other than creating it, which makes it the final node of your audio graph (more on what a graph is soon).  The mastering voice does need to know how many output channels (speakers) to drive; you specify this when you create it, or leave it at the default and let XAudio2 pick a channel count for the output device.

A Source Voice (IXAudio2SourceVoice) represents an actual sound in your game.  This sound can be mono, stereo, 5.1, or anything in between.  There are generally many source voices present in your title at one time.  In the simplest possible case (at least the simplest one that produces audible output), there is only one source voice.  In the extreme case, you could have thousands of voices, if your title is heavy on audio content and you create voices in a 1:1 ratio with your audio content files.  My next post will talk about a different (and probably better) solution, which reuses voices for multiple pieces of audio.

A Submix Voice (IXAudio2SubmixVoice) sits between the source and mastering voices.  There can be multiple submix voices, or none at all.  I saved these voices for last, since they are conceptually the most complex.  Submix voices serve several purposes, depending on your needs.  You can route one or more source voices into the same submix voice, and the output of a submix voice can go to other submix voices, to the mastering voice, or to both.  Submix voices can be used to group related pieces of audio content.  For instance, you could create a submix voice that represents ‘music’ in your game (as opposed to sound effects or dialogue).  You could then provide an option to change the volume of all ‘music’ audio at once.  The implementation of this might be a single call to the SetVolume() method on the ‘music’ submix voice.  I’ll cover submix voices in more detail in a future post.
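
To make the ‘music’ volume idea a little more concrete, here is a minimal sketch in plain C++.  It is a toy model, not XAudio2 code: ToyVoice and EffectiveGain are names I made up for illustration, and the only thing borrowed from the real API is the method name SetVolume().  It assumes each voice has exactly one output, which is simpler than what XAudio2 allows.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-in for a voice (hypothetical, NOT the XAudio2 interface):
// just enough state to show how a volume set on a submix voice affects
// every voice routed into it.
struct ToyVoice {
    std::string name;
    ToyVoice* output;      // next voice downstream; nullptr for the mastering voice
    float volume = 1.0f;   // linear gain, 1.0 = unchanged

    ToyVoice(std::string n, ToyVoice* out) : name(std::move(n)), output(out) {}

    void SetVolume(float v) {
        assert(v >= 0.0f);  // restriction of this toy model only
        volume = v;
    }
};

// Gain applied to a voice's signal by the time it leaves the mastering
// voice: the product of every volume along the downstream path.
float EffectiveGain(const ToyVoice& voice) {
    float gain = 1.0f;
    for (const ToyVoice* v = &voice; v != nullptr; v = v->output) {
        gain *= v->volume;
    }
    return gain;
}
```

With a ‘music’ submix and an ‘sfx’ submix both feeding the mastering voice, calling SetVolume(0.5f) on ‘music’ halves the effective gain of every source routed through it, while sources routed through ‘sfx’ are untouched.  None of the individual source voices had to be modified.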

The last topic for this post is the audio graph.  Conceptually, this graph represents the path your audio data takes to get from disk, through your audio pipeline, and out to the speakers.  Generally, you build this graph at load time, and leave it alone for the life of a level (or even the entire game session).  You create the graph by specifying the output of each voice when you create that voice.  For example, you can create a graph where three source voices feed into a single submix voice, which then feeds the mastering voice, while two other source voices feed directly into the mastering voice.  The set of voices in your title, and the connections between them, form the audio processing graph of your game.
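
Here is a rough sketch of that example graph, again as a toy model in plain C++ rather than real XAudio2 calls.  ToyAudioGraph, GraphVoice, and their methods are hypothetical names of my own.  The one detail meant to mirror the real API is that a voice created without an explicit output is routed to the mastering voice.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model of an audio graph; none of these are XAudio2 types.
enum class VoiceKind { Source, Submix, Mastering };

struct GraphVoice {
    VoiceKind kind;
    std::string name;
    const GraphVoice* output;  // fixed at creation; nullptr only for the mastering voice
};

class ToyAudioGraph {
public:
    // The single mastering voice is created together with the graph.
    ToyAudioGraph() : master_(Add(VoiceKind::Mastering, "master", nullptr)) {}

    const GraphVoice* Master() const { return master_; }

    // Leaving `output` as nullptr routes the new voice to the mastering voice.
    const GraphVoice* CreateSubmix(std::string name, const GraphVoice* output = nullptr) {
        return Add(VoiceKind::Submix, std::move(name), output ? output : master_);
    }
    const GraphVoice* CreateSource(std::string name, const GraphVoice* output = nullptr) {
        return Add(VoiceKind::Source, std::move(name), output ? output : master_);
    }

    // Number of voices that feed directly into `target`.
    std::size_t InputCount(const GraphVoice* target) const {
        std::size_t count = 0;
        for (const auto& voice : voices_) {
            if (voice->output == target) ++count;
        }
        return count;
    }

private:
    const GraphVoice* Add(VoiceKind kind, std::string name, const GraphVoice* output) {
        // Source voices only produce audio, so nothing may be routed into one.
        assert(output == nullptr || output->kind != VoiceKind::Source);
        voices_.push_back(
            std::make_unique<GraphVoice>(GraphVoice{kind, std::move(name), output}));
        return voices_.back().get();
    }

    std::vector<std::unique_ptr<GraphVoice>> voices_;  // declared first: used by Add()
    const GraphVoice* master_;
};
```

Building the graph from the paragraph above means one CreateSubmix() call, three CreateSource() calls that pass the submix as their output, and two CreateSource() calls with no output argument.  Afterwards the submix reports three inputs, and so does the mastering voice: the submix plus the two direct sources.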

Hopefully this overview of XAudio2 is helpful.  There seems to be a dearth of information regarding game audio on the internet, so I hope to share some of what I learn with the rest of you.