There are a couple of new ways to create a SoundEffect in XNA 4.0 – a new factory method that loads wave files and a couple of new constructors that take raw PCM audio data.

SoundEffect.FromStream

public static SoundEffect FromStream(Stream stream)

This method takes a plain vanilla Stream to a standard .wav file with PCM audio data. Yep, wav files can contain audio in a lot of different formats – PCM, ADPCM, MP3, WMA, whatever – but most wav files in the wild are already in PCM format, and you can always fire up your favorite audio editor to check or convert the format. Audacity and Wavosaur are both easy-to-use, free editors.

Here are some useful details -

  • Format constraints - PCM, 8 or 16-bit, 8KHz to 48KHz, mono or stereo.
  • Reads the loop region from the file. If there isn’t one, the whole file will loop – only if you play it looped, of course.
  • We don’t mess with the seek position of the stream, so make sure it’s set to the right place when you pass it in (and reset it when this method returns if you’re going to use the stream elsewhere in your code).
  • If you’re adding wave files to your content project make sure you set “Build Action” to “Content” and “Copy to Output Directory” to “Copy if Newer”. This will ensure that it doesn’t get compiled to an xnb file and that the file gets deployed with your project.
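The seek-position bullet above can be sketched like this. This is a minimal sketch, not part of the XNA API itself – the file path and variable names are illustrative, and it assumes the stream you pass in is seekable (a FileStream over a wav file is):

```csharp
// Sketch of the seek-position caveat: FromStream reads from wherever the
// stream currently is, and doesn't rewind it when it's done.
using (Stream stream = File.OpenRead(@"Content\Sample.wav"))
{
    long originalPosition = stream.Position;   // remember where we started
    SoundEffect effect = SoundEffect.FromStream(stream);
    stream.Position = originalPosition;        // reset it ourselves if we plan
                                               // to reuse the stream elsewhere
    effect.Play();
}
```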

SoundEffect Constructors

public SoundEffect(byte[] buffer, int sampleRate, AudioChannels channels)
public SoundEffect (byte[] buffer, int offset, int count, int sampleRate, AudioChannels channels, int loopStart, int loopLength)

These constructors are handy for things like creating procedural sound effects, reading from your own custom wave bank format, etc.

  • Format for the audio data is restricted to PCM, 16-bit, 8KHz to 48KHz, mono or stereo.
  • Offset and count are in bytes – offset is where the audio data starts within the buffer and count is the length of audio data.
  • The length of the audio data for playback must be block aligned for the format. This means that -
    • For the first simpler constructor the size of the entire buffer must be block aligned.
    • For the second meatier overload the offset and count values must be block aligned. 
  • Loop start and length are in samples. If you don’t specify any (that is, pass in 0 for both) we’ll loop the entire buffer.
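Since the examples later in this post all pass 0 for the loop arguments, here’s a hedged sketch of the meatier overload with an explicit loop region. The 44.1KHz rate and the FillBuffer helper are illustrative (FillBuffer is the same placeholder used in the examples below):

```csharp
// Sketch: a one-second 44.1KHz mono sound that loops only its second half.
// Remember: loopStart and loopLength are in samples, not bytes.
int sampleRate = 44100;
byte[] buffer = new byte[SoundEffect.GetSampleSizeInBytes(
    TimeSpan.FromSeconds(1), sampleRate, AudioChannels.Mono)];
FillBuffer(buffer);   // illustrative helper that generates the audio data

int loopStart = sampleRate / 2;    // begin the loop at the 0.5s mark (in samples)
int loopLength = sampleRate / 2;   // loop the remaining half second

SoundEffect sound = new SoundEffect(
    buffer, 0, buffer.Length, sampleRate, AudioChannels.Mono, loopStart, loopLength);

// The loop region only matters for looped playback:
SoundEffectInstance instance = sound.CreateInstance();
instance.IsLooped = true;
instance.Play();
```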

The most common questions we get at this point are -

1. If the format is restricted to 16-bit, why do these methods take byte[] and not short[]?

When we first designed this API our first instinct was to do exactly that – any method that took raw audio buffers took short[] rather than byte[]. When we looked a little deeper, though, we realized this wouldn’t address some important scenarios. For example, things got really clumsy whenever we wanted these APIs to play nicely with streams. And when we thought about extensibility, especially around formats we may want to support in the future, the special implicit meaning given to short[] (“array of 16-bit samples”) seemed odd and anti-.NET. So after considering all the options we settled on byte[]: it plays nicely with existing .NET APIs that take raw buffers, it follows an established and familiar pattern, and it gives us better flexibility for supporting other formats in the future. Within the audio rendering stack everything eventually turns into arrays of bytes anyway. Yes, there are scenarios where this forces you to do some extra conversion work to and from byte[] (especially true when writing DSP code), but that’s something we could easily optimize in a future release – and taking short[] wouldn’t really have solved it anyway.

2. What exactly is block alignment and how do I calculate it?

In a single sentence, the block alignment value is the number of bytes in an atomic unit (aka block) of audio for a particular format. For PCM formats the formula is extremely simple: “Block Alignment = Bytes per Sample * Number of Channels”. For example, the block alignment value for mono 16-bit PCM is 2, and for stereo 16-bit PCM it’s 4. We also have a couple of handy helpers that can calculate block aligned values for you – GetSampleSizeInBytes converts from a time duration to a block aligned byte count, and GetSampleDuration converts back.
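The formula above needs nothing from XNA, so here it is as plain arithmetic – a worked example rather than library code (the 12000-byte buffer length is just an illustrative number):

```csharp
// Block Alignment = Bytes per Sample * Number of Channels
int bytesPerSample = 16 / 8;               // 16-bit PCM -> 2 bytes per sample
int monoBlockAlign = bytesPerSample * 1;   // mono 16-bit PCM   -> 2
int stereoBlockAlign = bytesPerSample * 2; // stereo 16-bit PCM -> 4

// A buffer (or offset/count) is block aligned when it's an exact multiple
// of the block alignment value:
bool isAligned = (12000 % stereoBlockAlign) == 0;  // 12000 bytes = 3000 stereo sample frames
```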

3. Why are the loop arguments expressed in samples? Why not in bytes like the offset and count?

Most (all?) audio editors that allow setting loop points in audio files do so in samples, and in most cases when someone wants to loop a sub-region within a sound they use an audio editor to define it in the content itself. Even compressed formats like XMA on Xbox read the loop values in samples. Ultimately, formats that support sample-accurate looping (i.e. smooth loops that play back exactly as authored and don’t pop or glitch) decode to PCM and use these loop sample values during playback (the loop values for non-PCM formats can have their own alignment restrictions, but that’s a discussion for a different post). In any case, for PCM formats converting from a byte value to a sample value is trivial: “Sample Offset = Byte Offset / Block Alignment”. So we optimized for the most common case, and the loop arguments ended up being in sample units.
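The conversion is simple enough to show as plain arithmetic. The 19200-byte offset below is just an illustrative number, not anything from the API:

```csharp
// Sample Offset = Byte Offset / Block Alignment (and back again).
// Stereo 16-bit PCM, so the block alignment is 4.
int blockAlign = 4;
int byteOffset = 19200;                        // e.g. where a loop starts, in bytes
int sampleOffset = byteOffset / blockAlign;    // -> 4800, the value you'd pass as loopStart
int backToBytes = sampleOffset * blockAlign;   // -> 19200, round-trips exactly
```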

Pseudocode is worth a thousand words

Example 1 – Play back a wave file from the content folder. In this case “Sample.wav” is a sound file that’s included in the content project with its Build Action set to “Content”.

SoundEffect soundFromFile = SoundEffect.FromStream(TitleContainer.OpenStream(@"Content\Sample.wav"));
soundFromFile.Play();

Example 2 – Create and play back a one-second 48KHz stereo sound.

// Allocate a buffer to hold 1 second worth of stereo data
byte[] buffer = new byte[SoundEffect.GetSampleSizeInBytes(TimeSpan.FromSeconds(1), 48000, AudioChannels.Stereo)];
// Method fills the passed buffer with audio data – left as an exercise to the reader (I’ve always wanted to write that!) :)
FillBuffer(buffer);
// Create the SoundEffect from the buffer
SoundEffect sound = new SoundEffect(buffer, 48000, AudioChannels.Stereo);
// Play a "fire and forget" instance.
sound.Play();

Example 3 – Play back 500ms starting at a 200ms offset within the buffer.

int bufferLength = SoundEffect.GetSampleSizeInBytes(TimeSpan.FromSeconds(1), 48000, AudioChannels.Stereo);
int offset = SoundEffect.GetSampleSizeInBytes(TimeSpan.FromMilliseconds(200), 48000, AudioChannels.Stereo);
int count = SoundEffect.GetSampleSizeInBytes(TimeSpan.FromMilliseconds(500), 48000, AudioChannels.Stereo);
byte[] buffer = new byte[bufferLength];
FillBuffer(buffer);
SoundEffect sound = new SoundEffect(buffer, offset, count, 48000, AudioChannels.Stereo, 0, 0); 
sound.Play();