
Below is a presentation I just did for our Microsoft S2B Fall Webcast Series.

You can download the presentation and reuse it if you like. Please give credit where credit is due :)

Stock photography from sxc.hu

Check this out:

image

Maybe some more tweaks left, but it’s pretty much done.

Notable tweaks:

  • Each generator is a WPF usercontrol. These synths are added dynamically at runtime. (You could add ten if you wanted, but three is a pretty standard number).
  • Audio format changed from 2 channels to 1. It’s just less complicated.
  • Some minor architectural cleanup.

I’ll be going over everything in this app in an upcoming Coding4Fun article. Stay tuned.

 

Now playing: Dire Straits – Sultans of Swing

All RIGHT! Today is 9/17 and my Zune HD is out for delivery. Naturally I’ve already started making a game for the thing. Well, actually, I’ve started writing the blog post – I’ll keep the post going as I build the game.

Because I don’t have a lot of brainpower left for creativity, I’m going to make the first app anyone builds for a device with an accelerometer, which is the one where you move the ball through a maze and try to avoid going down a hole. Now there are not many kid-friendly titles that you can make with the words “balls” and “hole,” so I am going to tentatively call this game Inertia.

The first thing I’m going to do is make the game project in Visual C# 2008 Express.

Prerequisites (install in this order)

 

Creating The Zune HD Game Project

  1. Open Visual C# Express 2008
  2. File –> New Project –> Zune Game (3.1) (as shown)
    image

OK, so that was really easy. Now I’m going to create some content.

Making Content

I’ll make a ball, and a background. My thought process at this point is how am I going to make the maze? And the answer to that is to make a tile-based game. Tiles are squares, and the Zune HD’s screen resolution is 480x272; both dimensions are evenly divisible by 16 (480 / 16 = 30 and 272 / 16 = 17). So with 16x16 tiles, my tile grid will be 30 tiles by 17 tiles. Kind of an odd number but OK.

First I’m going to use a graphic design tool to make a background of some sort. Maybe something cool and icy. I’ll make a new folder called Textures under the Content folder and save it as background.png.

background

Next, I’ll make a ball. This should be really easy. It has to fit in any given tile at any time, so it has to be less than 16 by 16. Let’s make it 14x14 just so it stays an even number.

 image

Now I am going to make a single block which I will use as my walls. I will use transparency to make this less ugly in the game.

block

Remember that we have a grid of 30 by 17 tiles. So let’s mock up a quick level, which we will make into a readable format later. I printed off my grid of 30x17 and penciled in a quick level. The X’s represent holes. The S and E are start and end tiles. I will actually rotate this 90 degrees so that it’s easier to draw on the screen and it overlays the background nicely.

image image

Next I am going to put this into a text format that can be parsed by a content processor. I will use certain characters to denote elements for the tiles:

  • _ : blank space
  • X : hole
  • * : block
  • S : start
  • E : end

The result looks like this:

image

I saved it as Level1.dat under my content folder in a subfolder called Levels.
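To make the tile format concrete, here is a minimal sketch of how that text could be turned into a grid at runtime. It is not the content processor this series will build later: the TileType enum, the LevelParser class, and the assumption that the file holds one row of tiles per line are all made up for illustration.

```csharp
using System;

// Hypothetical tile kinds, one per character in the list above.
public enum TileType { Blank, Hole, Block, Start, End }

public static class LevelParser
{
    // Maps one character from the level text to a tile kind.
    public static TileType ParseTile(char c)
    {
        switch (c)
        {
            case '_': return TileType.Blank;
            case 'X': return TileType.Hole;
            case '*': return TileType.Block;
            case 'S': return TileType.Start;
            case 'E': return TileType.End;
            default: throw new FormatException("Unknown tile character: " + c);
        }
    }

    // Parses level text (assumed: one row of tiles per line) into a [row, column] grid.
    public static TileType[,] Parse(string levelText)
    {
        string[] rows = levelText.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        if (rows.Length == 0)
            throw new FormatException("Level is empty.");

        int columns = rows[0].Length;
        var grid = new TileType[rows.Length, columns];

        for (int r = 0; r < rows.Length; r++)
        {
            // Every row must be the same width or the grid would be ragged.
            if (rows[r].Length != columns)
                throw new FormatException("Row " + r + " has a different length.");

            for (int c = 0; c < columns; c++)
                grid[r, c] = ParseTile(rows[r][c]);
        }
        return grid;
    }
}
```

Calling LevelParser.Parse(File.ReadAllText(path)) on a level file would give you a grid you can loop over when drawing blocks and holes.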

In the next post we’ll start bringing a lot of this content together. But before we do that I want to do one last thing: create a Windows copy of the project so we don’t have to deploy to Zune all the time.

Creating a Windows Copy of the Game

The purpose of the Windows copy is really to verify layout and other things that are not device dependent. Obviously we can’t very well test playability on a PC, but this will definitely come in handy.

  • Right-click the Inertia Zune project (not the solution) and choose Create Copy of Project for Windows.
  • Go to Game1.cs and find the constructor for Game1. Add the code on lines 6-8. This will force the Windows game to be the same dimensions as the Zune game.

       1: public Game1()
       2: {
       3:     graphics = new GraphicsDeviceManager(this);
       4:     Content.RootDirectory = "Content";
       5:  
       6:     graphics.PreferredBackBufferHeight = 480;
       7:     graphics.PreferredBackBufferWidth = 272;
       8:     graphics.ApplyChanges();
       9:  
      10:     // Frame rate is 30 fps by default for Zune.
      11:     TargetElapsedTime = TimeSpan.FromSeconds(1 / 30.0);
      12: }

 

That’s it for now – next post, we’ll make a little more progress on this game.

It’s been a little over 2 and a half years since I joined Microsoft as the Academic Developer Evangelist for the Southeast region. In that time I’ve certainly learned a lot and had a tremendous amount of fun in this job, which I still think is one of the best jobs at Microsoft.

That’s why I’m pleased to announce that I’m continuing in my role as ADE, but moving to the Seattle area to cover the greater Pacific Northwest, Alaska and Hawaii.

So it looks like I’ll be going from this:

image

to this:

image 

This is an exciting move for a number of reasons. First, I love the Pacific Northwest and I have lived in the South all my life. Time for a change. Secondly, about 95% of my in-laws either live in Hawaii or somewhere on the west coast… with another kiddo on the way it will be tremendously helpful to have them close by. And third, I’ve really missed being able to walk into people’s offices, go to lunch with coworkers and do all the usual corporate stuff.

I feel like I have so much to learn and a lot of new challenges to address – I’m incredibly excited.

As for this blog, which has admittedly become a little stale in recent months – I haven’t decided what to do with it yet. I’ll have more news on that as I go through the planning stages of transitioning across the country.

I know a lot of you out there are Seattle area residents and hopefully one of these days we’ll get to meet.

That’s it for now – I’ll post again as I have more information!

In the last article, we saw how to synthesize a sine wave at 440Hz and save it to a working WAV file. Next, we’ll expand on that application and learn how to implement some other common waveforms.

image

First of all, I copied WaveFun from the last article as a foundation to work from and called it WaveFun2. The UI provides a few more configuration options, as shown here to the right.

>> DOWNLOAD THE DELICIOUS CODE HERE (21 KB) <<

You can now specify where you want the file to go, as well as the frequency and volume of the waveform. The code has been lightly refactored to support feeding these values from the UI. I’m not going to go into great detail on these changes in this post, but I will show you how to generate the waves.

You can pick from five total waveforms: Sine, Square, Sawtooth, Triangle and White Noise. These are very common waveforms used in synthesizing music.

Let’s learn more about these different waveforms.

For the sine and square waves, we are using a variable t that represents the angular frequency (basically, the “tone” frequency in radians, adjusted for the sample rate).

Sine

We have already done some work around the sine wave. It is the smoothest sounding tone signal we can produce.

The Shape

A sine wave looks like this:

image

The Equation

Sample = Amplitude * sin(t*i)

where t is the angular frequency and i is the sample index (our unit of time).

The Algorithm

Generating just one 16-bit channel of a sine wave (mono) is very easy.

for (int i = 0; i < numSamples; i++)
{
    data.shortArray[i] = Convert.ToInt16(amplitude * Math.Sin(t * i));
}
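For reference, here is a self-contained sketch of the same mono loop, including how t can be worked out. The SineSketch class and its parameter names are made up for this example; the WaveFun2 download wires these values up from the UI instead.

```csharp
using System;

public static class SineSketch
{
    // Returns sampleCount mono 16-bit samples of a sine wave.
    public static short[] Generate(double frequency, int sampleRate, int amplitude, int sampleCount)
    {
        // Radians the wave advances per sample: 2 * pi * frequency / sample rate.
        double t = (Math.PI * 2 * frequency) / sampleRate;

        var samples = new short[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Same equation as above: Sample = Amplitude * sin(t * i)
            samples[i] = Convert.ToInt16(amplitude * Math.Sin(t * i));
        }
        return samples;
    }
}
```

SineSketch.Generate(440, 44100, 32760, 44100) would give one second of Concert A at full volume.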
 

If you want to generate two identical, properly aligned channels of sine data, you have to write the same value twice in a row because channel data is interleaved. Further examples will show this in multichannel format, like below. Note that i advances by one whole frame (wChannels samples) on each pass, so every slot in the array is written exactly once.

for (int i = 0; i < numSamples - 1; i += format.wChannels)
{
    for (int channel = 0; channel < format.wChannels; channel++)
    {
        data.shortArray[i + channel] = Convert.ToInt16(amplitude * Math.Sin(t * i));
    }
}

Square

The square wave is closely related to the sine wave, although it is not in sine form. The square wave produces a very harsh tone due to the abrupt rises and falloffs in the waveform:

image

The Equation

There are a number of ways to generate square waves, and many of them generate imperfect square waves (especially electronics). Since we are using a very precise program with nice things like for loops, we can generate a square wave that is absolutely perfect.

The equation we will be using is:

Sample = Amplitude * sgn(sin(t * i))

In this case the sgn function just tells us whether the value of the sine function is positive, negative, or zero.

The Algorithm

To generate two channels of a square wave:

for (int i = 0; i < numSamples - 1; i += format.wChannels)
{
    for (int channel = 0; channel < format.wChannels; channel++)
    {
        data.shortArray[i + channel] = Convert.ToInt16(amplitude * Math.Sign(Math.Sin(t * i)));
    }
}

Sawtooth

The sawtooth wave has tonal qualities somewhere between a sine and a square wave, almost like a saxophone. It ramps up in a linear fashion and then falls off.

image

The Equation

A “proper” sawtooth wave is created with additive synthesis, which we’ll get into later. Multiple sine waves are added together to create harmonics, until eventually the wave takes the shape of the sawtooth (check this nice animation from wikipedia).

Adding up a limited number of harmonics like this gives you a band-limited sawtooth, and doing it efficiently usually involves a fast Fourier transform, which we won’t get into just yet because it’s kinda complicated, although all kinds of filters and effects are calculated using FFT algorithms. Instead, we’ll be calculating it as if we were plotting a graph, which means we get “infinite” harmonics. This isn’t really a good thing: harmonics above half the sample rate can’t be represented and fold back into the signal as aliasing, so it will sound less smooth than a band-limited sawtooth, but whatever.
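If you’re curious what the additive version looks like, here is a rough sketch that simply adds the sine harmonics in a loop (no FFT involved). It is not part of the WaveFun2 download; the AdditiveSawSketch class and its parameters are assumptions for illustration. It uses the standard Fourier series for a rising sawtooth, (2/π) · Σ (−1)^(k+1) · sin(k·x) / k, and stops at the last harmonic that fits at or under half the sample rate.

```csharp
using System;

public static class AdditiveSawSketch
{
    // Builds a mono sawtooth by summing sine harmonics (brute force, no FFT).
    public static short[] Generate(double frequency, int sampleRate, int amplitude, int sampleCount)
    {
        // Highest harmonic number whose frequency is at or below sampleRate / 2.
        int harmonics = (int)Math.Floor((sampleRate / 2.0) / frequency);

        // Radians the fundamental advances per sample.
        double t = (Math.PI * 2 * frequency) / sampleRate;

        var samples = new short[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            double sum = 0;
            for (int k = 1; k <= harmonics; k++)
            {
                // Harmonic k has amplitude 1/k and alternating sign.
                double sign = (k % 2 == 1) ? 1.0 : -1.0;
                sum += sign * Math.Sin(k * t * i) / k;
            }

            // Scale to roughly -1..1, then clamp the small overshoot near each jump.
            double value = (2.0 / Math.PI) * sum;
            value = Math.Max(-1.0, Math.Min(1.0, value));

            samples[i] = Convert.ToInt16(amplitude * value);
        }
        return samples;
    }
}
```

This does far more work per sample than the procedural version that follows, which is one reason to start with the simple approach.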

The equation for a sawtooth wave is sometimes expressed in this way:

y(t) = t – floor(t)

However, we’ll be generating the wave procedurally.

The Algorithm

The algorithm we will be using is a lot like the one we’d use just to plot this wave on a chart.

We have to determine the “step” of the y-coordinate such that we get a nice diagonal line that goes from minimum amplitude to maximum amplitude over one wavelength, and this is based on the frequency of the wave. The “step” is the difference in amplitude between adjacent samples.

image

   1: // Determine the number of frames (one sample per channel) per wavelength
   2: int samplesPerWavelength = Convert.ToInt32(format.dwSamplesPerSec / frequency);
   3:  
   4: // Determine the amplitude step for consecutive frames
   5: short ampStep = Convert.ToInt16((amplitude * 2) / samplesPerWavelength);
   6:  
   7: // Temporary sample value, added to as we go through the loop
   8: short tempSample = (short)-amplitude;
   9:  
  10: // Total number of samples written so we know when to stop
  11: int totalSamplesWritten = 0;
  12:  
  13: while (totalSamplesWritten < numSamples)
  14: {
  15:     tempSample = (short)-amplitude;
  16:  
  17:     for (uint i = 0; i < samplesPerWavelength && totalSamplesWritten < numSamples; i++)
  18:     {
  19:         // Advance once per frame so every channel gets the same value
  20:         tempSample += ampStep;
  21:  
  22:         for (int channel = 0; channel < format.wChannels && totalSamplesWritten < numSamples; channel++)
  23:         {
  24:             data.shortArray[totalSamplesWritten] = tempSample;
  25:             totalSamplesWritten++;
  26:         }
  27:     }
  28: }


On line 2, we calculate the number of frames in 1 “tooth.”

On line 5, we calculate the amplitude step by taking the total amplitude range (amplitude * 2) and dividing it by the number of frames per saw tooth.

On line 8, we declare a temporary sample to use. This sample has the amplitude step added to it until we get to maximum amplitude.

We put this in a while loop, because our sample data might end in the middle of a sawtooth wave and we don’t want to go out of bounds.

On line 15, we reset the sample to the minimum amplitude (this happens before each saw tooth is calculated).

We use the same structure – two for loops – for this algorithm. We also add checks to make sure we’re not at the end of the sample data yet.

On line 20 we increment the temporary sample value by the amplitude step, once per frame, and on line 24 we assign it to the data array for each channel. Stepping once per frame keeps the channels identical and keeps the ramp inside the range of a 16-bit sample.

Triangle

The triangle wave is like the sawtooth wave, but instead of having a sharp falloff between wavelengths, the amplitude rises and falls in a smooth linear fashion.

The triangle wave produces a slightly more coarse tone than the sine wave.

image

The Formula

… again, uses a fast Fourier transform to synthesize from a sine wave using odd harmonics. So let’s just create one the easy way!

The Algorithm

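The algorithm we use is similar to that of the sawtooth wave shown above. All we are doing is changing the sign of the ampStep variable whenever the next step would take the sample (absolute value) past the specified amplitude. So when the wave reaches the max and min, the step changes its sign so the wave goes the other way. Since the wave has to travel the full amplitude range twice per wavelength (once up, once down), the step is amplitude * 4 divided by the number of frames per wavelength.

// Frames (one sample per channel) per wavelength
int framesPerWavelength = Convert.ToInt32(format.dwSamplesPerSec / frequency);

// The wave covers the full amplitude range twice per wavelength
short ampStep = Convert.ToInt16((amplitude * 4) / framesPerWavelength);
short tempSample = (short)-amplitude;

for (int i = 0; i + format.wChannels <= numSamples; i += format.wChannels)
{
    // Negate ampStep whenever the next step would cross the amplitude boundary
    if (Math.Abs(tempSample + ampStep) > amplitude)
        ampStep = (short)-ampStep;

    tempSample += ampStep;

    // Write the same value to every channel in the frame
    for (int channel = 0; channel < format.wChannels; channel++)
    {
        data.shortArray[i + channel] = tempSample;
    }
}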

White Noise

Generating white noise is probably the easiest of them all. White noise consists of totally random samples. Interestingly, white noise is sometimes used to generate random numbers.

image

The Equation

… There isn’t one.

The Algorithm

All you need to do is randomize every sample from –amplitude to amplitude. We don’t care about the number of channels in most cases so we just fill every sample with a new random number.

Random rnd = new Random();
short randomValue = 0;

for (int i = 0; i < numSamples; i++)
{
    randomValue = Convert.ToInt16(rnd.Next(-amplitude, amplitude));
    data.shortArray[i] = randomValue;
}

Conclusion

These are just some examples of waves that you can use to create interesting sounds. Next, we are going to learn to combine them to create more complex waveforms.

Currently Playing: Black Label Society – 1919 Eternal – Demise of Sanity

If you’ve been following this series, you’re probably thinking, “Finally! He is going to show us some code!”

Well, I hate to disappoint you. So I’ll go ahead and show some code.

We’ve already discussed how audio is represented and what the WAV format looks like. The time has come to put these concepts into practice.

WaveFun! Wave Generator

image

The app we will be building is really not all that flashy. It just generates a simple waveform with 1 second of audio and plays it back for you. Nothing configurable or anything like that. Trust me, though, the flashy stuff is coming.

>> DOWNLOAD THE TASTY CODE HERE (18.8 KB) <<

If one giant button on a form isn’t the pinnacle of UI design, I have no idea what to do in this world.

Anyway, this is what the structure of the app looks like:

image

Chunk Wrappers (Chunks.cs)

(oooh, delicious!)

The first thing we care about is Chunks.cs, which contains wrappers for the header and two chunks that we learned about in the last article.

Let’s look at the code for the WaveHeader wrapper class. Note the data types we use instead of just “int.” The strings will be converted to character arrays later when we write the file. If you don’t convert them, BinaryWriter puts a length prefix in front of each string, which ruins your file. dwFileLength is initialized to zero, but is determined later (retroactively) after we have written the stream and we know how long the file is.

public class WaveHeader
{
    public string sGroupID; // RIFF
    public uint dwFileLength; // total file length minus 8 (the RIFF tag and this length field)
    public string sRiffType; // always WAVE
 
    /// <summary>
    /// Initializes a WaveHeader object with the default values.
    /// </summary>
    public WaveHeader()
    {
        dwFileLength = 0;
        sGroupID = "RIFF";
        sRiffType = "WAVE";
    }
}

Next up is the code for the Format chunk wrapper class. Again, note that the datatypes are consistent with the wave file format spec. Also note that we can explicitly set the chunk size in the constructor to 16 bytes, because the size of this chunk never changes (just add up the number of bytes taken up by each field, you get 16).

public class WaveFormatChunk
{
    public string sChunkID;         // Four bytes: "fmt "
    public uint dwChunkSize;        // Length of the rest of this chunk in bytes (16)
    public ushort wFormatTag;       // 1 (MS PCM)
    public ushort wChannels;        // Number of channels
    public uint dwSamplesPerSec;    // Frequency of the audio in Hz... 44100
    public uint dwAvgBytesPerSec;   // for estimating RAM allocation
    public ushort wBlockAlign;      // sample frame size, in bytes
    public ushort wBitsPerSample;    // bits per sample
 
    /// <summary>
    /// Initializes a format chunk with the following properties:
    /// Sample rate: 44100 Hz
    /// Channels: Stereo
    /// Bit depth: 16-bit
    /// </summary>
    public WaveFormatChunk()
    {
        sChunkID = "fmt ";
        dwChunkSize = 16;
        wFormatTag = 1;
        wChannels = 2;
        dwSamplesPerSec = 44100;
        wBitsPerSample = 16;
        wBlockAlign = (ushort)(wChannels * (wBitsPerSample / 8));
        dwAvgBytesPerSec = dwSamplesPerSec * wBlockAlign;            
    }
}

Finally, let’s have a look at the wrapper for the Data chunk. Here, we use an array of shorts because we have 16-bit samples as specified in the format block. If you want to change to 8-bit audio, use an array of bytes. If you want to use 32-bit audio, use an array of ints (or an array of floats, in which case wFormatTag has to be 3 for IEEE float instead of 1 for PCM). dwChunkSize is initialized to zero and is determined after the wave data is generated, when we know how long the array is and what the bit depth is.

public class WaveDataChunk
{
    public string sChunkID;     // "data"
    public uint dwChunkSize;    // Length of the sample data in bytes
    public short[] shortArray;  // 16-bit audio
 
    /// <summary>
    /// Initializes a new data chunk with default values.
    /// </summary>
    public WaveDataChunk()
    {
        shortArray = new short[0];
        dwChunkSize = 0;
        sChunkID = "data";
    }   
}

Now we have all the tools we need to assemble a wave file!

The Wave Generator (WaveGenerator.cs)

This class does two things. It has a constructor, which instantiates all these chunks and then uses a very simple algorithm to generate sample data for a sine wave oscillating at 440Hz, and a Save method, which writes the chunks out to a file. The 440Hz sine wave results in an audible pitch known as Concert A.

In this file, we have an enum called WaveExampleType, which is used to identify what kind of wave we want to create. Feel free to create your own and modify the “big switch statement” to add different sound wave options.

public enum WaveExampleType
{
    ExampleSineWave = 0
}

The WaveGenerator class only has three members, and they are all chunks.

public class WaveGenerator
{
    // Header, Format, Data chunks
    WaveHeader header;
    WaveFormatChunk format;
    WaveDataChunk data;

    /// <snip>
}


The constructor of the WaveGenerator class takes in an argument of type WaveExampleType, which we switch on to determine what kind of wave to generate. Lots of stuff happens in the constructor, so I’ll use line numbers here to refer to after the jump.

   1: public WaveGenerator(WaveExampleType type)
   2: {          
   3:     // Init chunks
   4:     header = new WaveHeader();
   5:     format = new WaveFormatChunk();
   6:     data = new WaveDataChunk();            
   7:  
   8:     // Fill the data array with sample data
   9:     switch (type)
  10:     {
  11:         case WaveExampleType.ExampleSineWave:
  12:  
  13:             // Number of samples in 1 second of audio = sample rate * channels
  14:             uint numSamples = format.dwSamplesPerSec * format.wChannels;
  15:             
  16:             // Initialize the 16-bit array
  17:             data.shortArray = new short[numSamples];
  18:  
  19:             int amplitude = 32760;  // Max amplitude for 16-bit audio
  20:             double freq = 440.0f;   // Concert A: 440Hz
  21:  
  22:             // The "angle" used in the function, adjusted for the number of channels and sample rate.
  23:             // This is how far (in radians) the wave advances per slot in the array.
  24:             double t = (Math.PI * 2 * freq) / (format.dwSamplesPerSec * format.wChannels);
  25:  
  26:             for (uint i = 0; i < numSamples - 1; i += format.wChannels)
  27:             {
  28:                 // Fill with a simple sine wave at max amplitude
  29:                 for (int channel = 0; channel < format.wChannels; channel++)
  30:                 {
  31:                     data.shortArray[i + channel] = Convert.ToInt16(amplitude * Math.Sin(t * i));
  32:                 }                        
  33:             }
  34:  
  35:             // Calculate data chunk size in bytes
  36:             data.dwChunkSize = (uint)(data.shortArray.Length * (format.wBitsPerSample / 8));
  37:  
  38:             break;
  39:     }          
  40: }

Lines 4-6 instantiate the chunks.

On line 9, we switch on the wave type. This gives us an opportunity to try different things without breaking stuff that works, which I encourage you to do.

On line 14, we calculate the size of the data array. This is calculated by multiplying the sample rate and channel count together. In our case, we have 44100 samples per second, 1 second of audio and 2 channels of data, giving us an array of length 88,200.

Line 19 specifies an important value: 32760 is the max amplitude for 16-bit audio. I discussed this in the second article. As an aside, the samples will range from -32760 to 32760; the negative values are provided by the fact that the sine function’s output ranges from -1.0 to 1.0. For other nonperiodic functions you may have to specify -32760 as your lower bound instead of zero – we’ll see this in action in a future article.

Line 20 specifies the frequency of the sound. 440Hz is concert A. You can use any other pitch you want – check out this awesome table for a handy reference.

On line 24, we are doing a little fun trig. See this article if you want to understand the math, otherwise just use this formula and love it.

Line 26 is where the magic happens! The structure of this nested for loop can change. It works for 1 or 2 channels – anything beyond that and you would need to change the condition in the topmost loop (i < numSamples – 1) lest you get an index out of range exception.

It’s important to note how multichannel data is written. For WAV files, data is written in an interleaved manner. The sample at each time point is written to all the channels first before advancing to the next time. So shortArray[0] would be the sample in channel 1, and shortArray[1] would be the exact same sample in channel 2. That’s why we have a nested loop.
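To see the interleaving on its own, here is a small sketch that takes one array per channel and weaves them into a single frame-ordered array. The InterleaveSketch class is made up for this example and is not part of WaveFun, which writes straight into shortArray instead.

```csharp
using System;

public static class InterleaveSketch
{
    // Weaves per-channel sample arrays into one frame-ordered array:
    // [ch0 frame0, ch1 frame0, ch0 frame1, ch1 frame1, ...]
    public static short[] Interleave(short[][] channels)
    {
        int channelCount = channels.Length;
        int frameCount = channels[0].Length;

        // All channels must be the same overall length.
        foreach (short[] channel in channels)
        {
            if (channel.Length != frameCount)
                throw new ArgumentException("All channels must have the same length.");
        }

        var interleaved = new short[frameCount * channelCount];
        for (int frame = 0; frame < frameCount; frame++)
        {
            for (int channel = 0; channel < channelCount; channel++)
            {
                // Frame n starts at index n * channelCount.
                interleaved[frame * channelCount + channel] = channels[channel][frame];
            }
        }
        return interleaved;
    }
}
```

With a left channel of { 1, 2, 3 } and a right channel of { 10, 20, 30 }, the interleaved array comes out as { 1, 10, 2, 20, 3, 30 }.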

On line 31, we use Math.Sin to generate the sample data based on the “angle” (t) and the current time (i). This value is written once for each channel before “i” is incremented.

On line 36, we set the chunk size of the data chunk. Most other chunks know how to do this themselves, but because the chunks are independent, the data chunk does not know what the bit depth is (it’s stored in the format chunk). So we set that value directly. The reason we need the bit depth is that the chunk size is stored in bytes, and each 16-bit sample takes two bytes. Therefore we are setting the data chunk size to the array length times the number of bytes in a sample (2).

At this point, all of our chunks have the correct values and we are ready to write the chunks to a stream. This is where the Save method comes in.

Again, I’ll use line numbers to refer to the Save method below.

   1: public void Save(string filePath)
   2: {
   3:     // Create a file (it always overwrites)
   4:     FileStream fileStream = new FileStream(filePath, FileMode.Create);   
   5:  
   6:     // Use BinaryWriter to write the bytes to the file
   7:     BinaryWriter writer = new BinaryWriter(fileStream);
   8:  
   9:     // Write the header
  10:     writer.Write(header.sGroupID.ToCharArray());
  11:     writer.Write(header.dwFileLength);
  12:     writer.Write(header.sRiffType.ToCharArray());
  13:  
  14:     // Write the format chunk
  15:     writer.Write(format.sChunkID.ToCharArray());
  16:     writer.Write(format.dwChunkSize);
  17:     writer.Write(format.wFormatTag);
  18:     writer.Write(format.wChannels);
  19:     writer.Write(format.dwSamplesPerSec);
  20:     writer.Write(format.dwAvgBytesPerSec);
  21:     writer.Write(format.wBlockAlign);
  22:     writer.Write(format.wBitsPerSample);
  23:  
  24:     // Write the data chunk
  25:     writer.Write(data.sChunkID.ToCharArray());
  26:     writer.Write(data.dwChunkSize);
  27:     foreach (short dataPoint in data.shortArray)
  28:     {
  29:         writer.Write(dataPoint);
  30:     }
  31:  
  32:     writer.Seek(4, SeekOrigin.Begin);
  33:     uint filesize = (uint)writer.BaseStream.Length;
  34:     writer.Write(filesize - 8);
  35:     
  36:     // Clean up
  37:     writer.Close();
  38:     fileStream.Close();            
  39: }

Save takes one argument – a file path. Lines 4-7 set up our file stream and binary writer associated with that stream. The order in which values are written is EXTREMELY IMPORTANT!

Lines 10-12 write the header chunk to the stream. We use the .ToCharArray method on the strings to convert them to actual character / byte arrays. If you don’t do this, BinaryWriter writes each string with a length prefix in front of it, and your header gets messed up.

Lines 15-22 write the format chunk.

Lines 25 and 26 write the first two parts of the data chunk (its ID and its size), and the foreach loop writes out every value of the data array.

Now we know exactly how long the file is, so we have to go back and specify the file length as the second value in the file. The first 4 bytes of the file are taken up with “RIFF” so we seek to byte 4 and write out the total length of the stream that we’ve written, minus 8 (as noted by the spec; we don’t count the RIFF tag or the length field itself).
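Here is that seek-back trick in isolation, as a sketch against a MemoryStream. The RiffLengthSketch class is made up for this example, and the payload is just placeholder bytes standing in for the format and data chunks, so the result is not a playable WAV.

```csharp
using System;
using System.IO;
using System.Text;

public static class RiffLengthSketch
{
    // Writes a 12-byte RIFF/WAVE header plus payload, then patches the length field.
    public static byte[] WriteWithLength(byte[] payload)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(Encoding.ASCII.GetBytes("RIFF")); // bytes 0-3
            writer.Write((uint)0);                         // bytes 4-7: placeholder length
            writer.Write(Encoding.ASCII.GetBytes("WAVE")); // bytes 8-11
            writer.Write(payload);                         // stand-in for the chunks

            // Go back to byte 4 and store the total length minus 8.
            writer.Seek(4, SeekOrigin.Begin);
            writer.Write((uint)(stream.Length - 8));
            writer.Flush();

            return stream.ToArray();
        }
    }
}
```

Overwriting the 4 bytes at offset 4 does not change the stream length, so the value written is based on the final size.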

Lastly, we close the streams. Our file is written! And it looks like this:

image

Zoom in a bit to see the awesome sine waviness:

image

All that’s left are the 5 lines of code that initialize the WaveGenerator object, save the file and play it back to you.

Putting it All Together – Main.cs

Let’s look at Main.cs, the codebehind for our main winform.

   1: using System;
   2: using System.Windows.Forms;
   3: using System.Media;
   4:  
   5: namespace WaveFun
   6: {
   7:     public partial class frmMain : Form
   8:     {
   9:         public frmMain()
  10:         {
  11:             InitializeComponent();
  12:         }
  13:  
  14:         private void btnGenerateWave_Click(object sender, EventArgs e)
  15:         {
  16:             string filePath = @"C:\Users\Dan\Desktop\test2.wav";
  17:             WaveGenerator wave = new WaveGenerator(WaveExampleType.ExampleSineWave);
  18:             wave.Save(filePath);            
  19:  
  20:             SoundPlayer player = new SoundPlayer(filePath);               
  21:             player.Play();
  22:         }
  23:     }
  24: }

On line 3, we reference System.Media. We need this namespace to play back our wave file.

Line 14 is the event handler for the Click event of the only huge button on the form.

On line 16, we define the location of the file to be written. IT IS VERY IMPORTANT THAT YOU CHANGE THIS TO A LOCATION THAT WORKS ON YOUR BOX.

Line 17 initializes the wave generator with a sine wave, and line 18 saves it to the location you defined.

Lines 20 and 21 use System.Media.SoundPlayer to play back the wave that we saved.

All Done!

Press F5 to run your program and bask in the glory of a very loud 440Hz sine wave.

Next Steps: If you are a math Jedi, you can experiment with the following code from WaveGenerator.cs:

double t = (Math.PI * 2 * freq) / (format.dwSamplesPerSec * format.wChannels);

for (uint i = 0; i < numSamples - 1; i += format.wChannels)
{
    // Fill with a simple sine wave at max amplitude
    for (int channel = 0; channel < format.wChannels; channel++)
    {
        data.shortArray[i + channel] = Convert.ToInt16(amplitude * Math.Sin(t * i));
    }
}

Just remember it’s two-channel audio, so you have to write each channel in the frame first before writing the next frame.

In the next article, we’ll look at some algorithms to generate other types of waves.

Currently Playing: Lamb of God – Wrath – Set to Fail

The WAV format is arguably the most basic of sound formats out there. It was developed by Microsoft and IBM, and it is rather loosely defined. As a result, there are a lot of WAV files out there that theoretically should not work, but somehow do.

Even if you do not plan to work with WAV data directly, future entries in this series will use the DirectSound library which has direct analogs to elements of the wave file, so it’s important to understand what all of this means.

WAV files are usually stored uncompressed, which means that they can get quite large, but they cannot exceed 4 gigabytes due to the fact that the file size header field is a 32-bit unsigned integer (32 bit file length means a maximum of 4 gigs).

WAV files are stored in a binary format. They are comprised of chunks, where each chunk tells you something about the data in the file.

Here are a couple of quick links that describe the WAV file format in detail. If you are doing work with the WAV format, you should bookmark these:

Let’s dig in a little deeper to how WAV works.

Chunks

A chunk is used to represent certain metadata (or actual data) regarding the file. Each chunk serves a specific purpose and is structured very specifically (order matters).

Note: There are a lot of available chunks that you can use to accomplish different things, but not all of them are required to be in a wave file. To limit the amount of stuff you have to absorb, we’ll only be looking at the header and the two chunks you need to create a functioning WAV file.

Also, every chunk (including the file header) starts with a four-character (four-byte) ID that tells you what’s coming. In the header, this field is called sGroupID. We’ll see more about this… now.

Header

While not really a chunk according to the WAV spec, the header is what comes first in every WAV file. Here is what the header looks like:

Field Name | Size (bytes) | C# Data Type | Value | Description
sGroupID | 4 | char[4] | “RIFF” | For WAV files, this value is always RIFF. RIFF stands for Resource Interchange File Format, and is not limited to WAV audio – RIFFs can also hold AVI video.
dwFileLength | 4 | uint | varies | The total file size, in bytes, minus 8 (to ignore the text RIFF and this length field).
sRiffType | 4 | char[4] | “WAVE” | For WAV files, this value is always WAVE.
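As a quick sanity check on that layout, here is a sketch that reads those three fields back with a BinaryReader. The RiffHeaderReader class is made up for this example; it assumes the stream is positioned at the start of the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class RiffHeaderReader
{
    // Reads the 12-byte header and returns the stored dwFileLength value.
    public static uint ReadFileLength(Stream stream)
    {
        var reader = new BinaryReader(stream);

        string groupId = Encoding.ASCII.GetString(reader.ReadBytes(4));  // sGroupID
        uint fileLength = reader.ReadUInt32();                           // dwFileLength
        string riffType = Encoding.ASCII.GetString(reader.ReadBytes(4)); // sRiffType

        if (groupId != "RIFF" || riffType != "WAVE")
            throw new InvalidDataException("Not a RIFF/WAVE header.");

        return fileLength;
    }
}
```

For a well-formed file, the value returned should equal the size of the file on disk minus 8.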

 

Format Chunk

The Format chunk is the metadata chunk. It specifies many of the things we talked about in part 1 of this series, such as the sample rate, bit depth, number of channels, and more.

Before we look at the format chunk structure, let’s run through some definitions in gory detail.

  • Sample – A single, scalar value representing the amplitude of the sound wave in one channel of audio data.
  • Channel – An independent waveform in the audio data. The number of channels is important: one channel is “Mono,” two channels is “Stereo” – there are different waves for the left and right speakers. 5.1 surround sound has 6 channels: 5 full-range channels plus 1 for the lowest sounds, which is usually sent to a subwoofer. Again, each channel holds audio data that is independent of all the other channels, although all channels will be the same overall length.
  • Frame – A frame is like a sample, but in multichannel format – it is a snapshot of all the channels at a specific data point.
  • Sampling Rate / Sample Rate – The number of samples (or frames) that exist for each second of data. This field is represented in Hz, or “per second.” For example, CD-quality audio has 44,100 samples per second. A higher sampling rate means higher fidelity audio.
  • Bit Depth / Bits per Sample – The number of bits available for one sample. Common bit depths are 8-bit, 16-bit and 32-bit. A sample is almost always represented by a native data type, such as byte, short, or int. A higher bit depth means each sample can be more precise, resulting in higher fidelity audio.
  • Block Align – This is the number of bytes in a frame. This is calculated by multiplying the number of channels by the number of bytes (not bits) in a sample. To get the number of bytes per sample, we divide the bit depth by 8 (assuming a byte is 8 bits). The resulting formula to calculate block align looks like blockAlign = nChannels * (bitsPerSample / 8). For 16-bit stereo format, this gives you 2 channels * 2 bytes = 4 bytes.
  • Average Bytes per Second – Used mainly to allocate memory, this measurement is equal to sampling rate * block align.
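To make the last two definitions concrete, here is a minimal sketch of the block align and average bytes per second formulas. The class and method names (WavMath, BlockAlign, AverageBytesPerSecond) are made up for illustration and are not part of any library.

```csharp
using System;

// Hypothetical helper class; the names are illustrative only.
public static class WavMath
{
    // Bytes in one frame: number of channels * bytes per sample.
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * (bitsPerSample / 8);
    }

    // Bytes of audio data per second: sampling rate * block align.
    public static int AverageBytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * BlockAlign(channels, bitsPerSample);
    }
}
```

For 16-bit stereo at 44,100 Hz, BlockAlign(2, 16) gives 4 bytes per frame and AverageBytesPerSecond(44100, 2, 16) gives 176,400 bytes per second.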

Now that we know what all these things mean (or at least, you can scroll up and read them when you need to), let’s dive into the format chunk’s structure.

Field Name | Size (bytes) | C# Data Type | Value | Description
sGroupID | 4 | char[4] | “fmt “ | Indicates the format chunk is defined below. Note the single space at the end to fill out the 4 bytes required here.
dwChunkSize | 4 | uint | varies | The length of the rest of this chunk, in bytes (not including sGroupID or dwChunkSize).
wFormatTag | 2 | ushort | 1 | For the WAV files we are making, this value is always 1, which indicates uncompressed PCM format. (Other values indicate other encodings, such as 3 for IEEE float samples.)
wChannels | 2 | ushort | varies | Indicates the number of channels in the audio. 1 for mono, 2 for stereo, etc.
dwSamplesPerSec | 4 | uint | varies | The sampling rate for the audio (e.g. 44100, 8000, 96000, depending on what you want).
dwAvgBytesPerSec | 4 | uint | sampleRate * blockAlign | The number of bytes of audio data per second. Used to estimate how much memory is needed to play the file.
wBlockAlign | 2 | ushort | wChannels * (dwBitsPerSample / 8) | The number of bytes in a multichannel audio frame.
dwBitsPerSample | 2 | ushort | varies | The bit depth (bits per sample) of the audio. Usually 8, 16, or 32. In a standard PCM format chunk this field is stored as a 16-bit value.

As I mentioned, this chunk gives you pretty much everything you need to specify the wave format. When we look at DirectSound later, we will see the same fields to describe the wave format.
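One way to double-check the layout is to read a format chunk back out of raw bytes. The sketch below assumes the standard 16-byte PCM layout, in which bits per sample is stored as a 16-bit value, and uses BinaryReader, which reads little-endian values. FormatInfo and FormatChunkReader are hypothetical names invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical container for the parsed fields (illustrative names).
public sealed class FormatInfo
{
    public ushort FormatTag;
    public ushort Channels;
    public uint SamplesPerSec;
    public uint AvgBytesPerSec;
    public ushort BlockAlign;
    public ushort BitsPerSample;
}

public static class FormatChunkReader
{
    // Parses a format chunk, starting at its sGroupID, from a byte array.
    public static FormatInfo Parse(byte[] chunk)
    {
        using (BinaryReader reader = new BinaryReader(new MemoryStream(chunk)))
        {
            // sGroupID: four ASCII characters, including the trailing space.
            string groupId = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (groupId != "fmt ")
            {
                throw new InvalidDataException("Not a format chunk.");
            }

            // dwChunkSize: the PCM fields below add up to 16 bytes.
            uint chunkSize = reader.ReadUInt32();
            if (chunkSize < 16)
            {
                throw new InvalidDataException("Format chunk is too short.");
            }

            FormatInfo info = new FormatInfo();
            info.FormatTag = reader.ReadUInt16();
            info.Channels = reader.ReadUInt16();
            info.SamplesPerSec = reader.ReadUInt32();
            info.AvgBytesPerSec = reader.ReadUInt32();
            info.BlockAlign = reader.ReadUInt16();
            info.BitsPerSample = reader.ReadUInt16();
            return info;
        }
    }
}
```

A handy sanity check after parsing is that BlockAlign equals Channels * (BitsPerSample / 8) and that AvgBytesPerSec equals SamplesPerSec * BlockAlign.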

Now, let’s look at the data chunk, my favorite chunk of them all!

Data Chunk

The data chunk is really simple. It’s got the sGroupID, the length of the data and the data itself. Depending on your chosen bit depth, the data type of the array will vary.

Field Name | Size | C# Data Type | Value | Description
sGroupID | 4 bytes | char[4] | “data” | Indicates the data chunk is coming next.
dwChunkSize | 4 bytes | uint | varies | The length, in bytes, of the sample data array below.
sampleData | Number of elements in the sample data: dwSamplesPerSec * wChannels * duration of audio in seconds | byte[] (8-bit audio), short[] (16-bit audio), float[] (32-bit audio) | sample data | All sample data is stored here.

For the sample data, it’s important to note that for 16-bit and 32-bit audio, each element of the data array is a signed value – it can be negative, positive or zero. The range of each of these elements is important too.

Since each sample represents amplitude, we have to consider what our minimum and maximum amplitude are given the data type we’ve chosen. A signed 16-bit value can hold -32768 to 32767 (2’s complement leaves one fewer positive value than negative), but in this series we keep 16-bit samples between -32760 and 32760, a symmetric range that stays safely inside those limits. We don’t worry about this with 32-bit data because it is represented as a proper float from -1.0f to 1.0f. 8-bit data is the exception to the signed rule: WAV stores 8-bit samples as unsigned bytes, which matches the C# byte type, with 128 as the zero (silence) level.

Here are the value ranges for each data type:

  • 8-bit audio: 0 to 255 (unsigned, centered on 128)
  • 16-bit audio: -32760 to 32760
  • 32-bit audio: -1.0f to 1.0f
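As a quick illustration of these ranges, here is a sketch that scales a normalized sample (a value from -1.0 to 1.0, like the 32-bit float format) into the 16-bit working range used in this series. SampleScaling and its members are hypothetical names, and clamping out-of-range input is my own assumption about how you would want to handle it.

```csharp
using System;

// Hypothetical helper; not part of any library.
public static class SampleScaling
{
    // The 16-bit peak amplitude used in this series.
    public const short MaxAmplitude16 = 32760;

    // Converts a normalized sample (-1.0 to 1.0) to a 16-bit sample.
    public static short ToInt16(double normalized)
    {
        // Clamp first so out-of-range input cannot overflow a short.
        double clamped = Math.Max(-1.0, Math.Min(1.0, normalized));
        return (short)Math.Round(clamped * MaxAmplitude16);
    }
}
```

With this, ToInt16(1.0) is 32760, ToInt16(-1.0) is -32760, and ToInt16(0.0) is 0.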

Making a Wave File

Writing the wave file is as easy as constructing the Header, Format Chunk and Data Chunk and writing them in binary fashion to a file in that order.

There are a couple of caveats. Firstly, determining the dwChunkSize for the format and data chunks can be weird because you have to sum up the byte count for each field and then use that result as your chunk size. You can do this by using the .NET Framework’s BitConverter class to convert each of the fields to a byte array and then retrieve its length, summing up the total byte count for all the counted elements in the chunk. If you get the chunk size wrong, the wave file won’t work because whatever is trying to play it has an incorrect number of bytes to read for that chunk.

The other thing you have to calculate AFTER generating the chunks is the dwFileLength field in the header. This value is equal to the total size in bytes of the entire file minus 8. The total includes the fields that are ignored when calculating chunk size, such as each chunk’s sGroupID and dwChunkSize. The reason we subtract 8 is that the format dictates that we don’t count the 8 bytes held by the header’s own sGroupID (“RIFF”) and dwFileLength fields; the 4 bytes of the WAVE marker are counted. There are some other ways to do this, which we’ll explore in the next article.

The best way to accomplish all these things effectively is to implement a structure or class for the header and the two chunks, which have all the fields defined properly with the data types shown above. Put methods on the chunk class that will calculate that chunk’s dwChunkSize as well as the total size of the chunk including sGroupID and dwChunkSize. Then, add the total size of the data and format chunks, then add 4 (the size of sRiffType, the only header field that counts toward dwFileLength). Assign that value to dwFileLength and you are golden.
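Until the real example code shows up in the next post, here is a rough sketch of the size bookkeeping described above. It is not the implementation from the next article. It assumes 16-bit PCM, writes everything to memory with BinaryWriter (which writes little-endian values), and uses a hypothetical WavBuilder class invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class WavBuilder
{
    // Builds a complete 16-bit PCM WAV file in memory.
    // "samples" holds interleaved frames (for stereo: left, right, left, right...).
    public static byte[] Build(short[] samples, ushort channels, uint sampleRate)
    {
        const ushort bitsPerSample = 16;
        ushort blockAlign = (ushort)(channels * (bitsPerSample / 8));
        uint avgBytesPerSec = sampleRate * blockAlign;

        uint formatChunkSize = 16;                       // fields after dwChunkSize
        uint dataChunkSize = (uint)(samples.Length * 2); // 2 bytes per 16-bit sample

        // 4 bytes for "WAVE", plus each chunk's 8-byte preamble and its contents.
        uint fileLength = 4 + (8 + formatChunkSize) + (8 + dataChunkSize);

        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            // Header
            writer.Write(Encoding.ASCII.GetBytes("RIFF"));
            writer.Write(fileLength);
            writer.Write(Encoding.ASCII.GetBytes("WAVE"));

            // Format chunk
            writer.Write(Encoding.ASCII.GetBytes("fmt "));
            writer.Write(formatChunkSize);
            writer.Write((ushort)1); // wFormatTag: PCM
            writer.Write(channels);
            writer.Write(sampleRate);
            writer.Write(avgBytesPerSec);
            writer.Write(blockAlign);
            writer.Write(bitsPerSample);

            // Data chunk
            writer.Write(Encoding.ASCII.GetBytes("data"));
            writer.Write(dataChunkSize);
            foreach (short sample in samples)
            {
                writer.Write(sample);
            }

            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

Passing the result to File.WriteAllBytes should give you a playable file. Note that the returned array is always 8 bytes longer than the dwFileLength value written into the header.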

That’s It, For Now…

What, no example code? I know, I know. That’s what the next blog post is for, so stay tuned. In the next post, you’ll learn how to use all this knowledge to write a C# program that synthesizes audio, generates the samples, and writes the results to a working WAV file that you can listen to!

Currently Playing: Silversun Pickups – Swoon – The Royal We

The Black Art of Audio Programming

I feel like I’ve got a pretty good handle on most aspects of programming – algorithms, databases, business logic, etc. One area of programming that has always baffled me is audio. I know what sound waves look like, but I never understood how that pretty graph in your audio editor somehow represents a tone, or a song, or what have you.

I was recently presented with a challenge that caused me to dig a little deeper in this area and find out more about how digital sound works.

How Sound Works

Sound in the physical world is pretty straightforward. You are in some kind of medium that can transmit waves, like open air or even water. (Sound cannot exist in places where there is no medium, like a vacuum, i.e. outer space, because there is no air or anything to disturb). When something happens that makes a sound, a longitudinal wave emanates from that place in all directions. This wave causes changes in the pressure in that medium.

So, if I clap in a room full of air, my clap causes these waves of pressure to occur. When those waves hit my eardrums, my brain interprets them as sound.

Check out this article for a more detailed explanation of how sound works in the physical world.

Digital Audio

When you look at an audio file in an audio editor, you see a graph known as a waveform. This represents the entire sound wave for that chunk of audio.

The screenshot below should be a familiar sight. It is part of the waveform for some random song on my hard drive:

image

Note there are two waveforms here. This is because the file has two channels of audio, making it a stereo recording. Mono recordings only have one channel.

If you were to play all of this audio, you would hear part of a song. However, let’s zoom in considerably:

image

Wait, what’s this? It’s just a connected series of points! And if you were to play this tiny, tiny segment of the waveform, you would most likely not hear anything at all, save for perhaps a click or two. What is going on here??

Anatomy of Digital Audio

Each data point you see in that second screenshot is called a sample. The sample is simply the amplitude of the wave at that minuscule point in time.

There are literally thousands of these things in a single second of audio. The number of samples per second in an audio file is known as the sampling rate and usually falls somewhere around 44,100 samples per second (written as 44,100 Hz or 44.1 kilohertz). A higher sampling rate means more data points per second, and consequently, higher fidelity audio. Other common sampling rates are 48 kHz, 96 kHz, 192 kHz, 8 kHz and anywhere in between. If you are recording audio at 44.1 kHz, you will have 44,100 separate data points for each second of recorded audio, and that is a LOT of data points. This huge abundance of data is very close to capturing continuous data.

So because we zoomed in so far on that second screenshot, the points you see cover only a tiny, tiny fraction of a second, which is too short to be audible anyway. The datapoints themselves are not what make the sound, but rather the overall change in these thousands of datapoints over a much longer time.

Similarly, if your audio was just a single oscillation of a sine wave, you would not hear anything. You would have to hear many of them together in very rapid succession to hear anything reminiscent of a tone.

This brings me to the topic of frequency.

Fundamental Frequency

Let’s have a look at a sine wave:

image

Look how many oscillations we have here, in only 0.05 seconds of audio!!

The number of oscillations of the wave per second is called the fundamental frequency of the tone. If the sine wave oscillates 440 times in a second, you will hear a pitch widely recognized as ‘Concert A (440 Hz)’. In this same screenshot, if we were looking at a 200Hz wave, the oscillations would be spaced farther apart and the tone would be lower. If we were looking at a 1200Hz wave, the oscillations would be closer together and the tone would be higher.

Each time you see a crest in that wave, it is a high-amplitude, high-air-pressure moment. The faster these high pressure moments come at your ears, the higher the perceived pitch will be. But the valleys are important too, because without the reduction in pressure, there is no change in pressure at all for your eardrums to notice. Because the sine wave is the smoothest curve we can generate, the resulting tone is also the mellowest-sounding to the ears (whereas a square wave is rather harsh, and a sawtooth wave is somewhere in between).
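To tie frequency and sampling rate together, here is a small sketch that computes the samples of a sine wave at a given frequency. The next article covers synthesis properly; this is only an illustration, and ToneGenerator is a hypothetical name. The samples are normalized to the range -1.0 to 1.0.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ToneGenerator
{
    // Returns normalized samples (-1.0 to 1.0) of a sine wave.
    public static double[] Sine(double frequencyHz, int sampleRate, double seconds)
    {
        // Total number of samples: sampling rate * duration.
        int count = (int)Math.Round(sampleRate * seconds);
        double[] samples = new double[count];

        for (int i = 0; i < count; i++)
        {
            // Time, in seconds, at which sample i occurs.
            double t = (double)i / sampleRate;

            // One full oscillation covers 2 * PI radians, so the wave
            // completes frequencyHz oscillations every second.
            samples[i] = Math.Sin(2.0 * Math.PI * frequencyHz * t);
        }

        return samples;
    }
}
```

Calling Sine(440, 44100, 1.0) produces 44,100 samples containing 440 complete oscillations. Lower the frequency to 200 and the same number of samples holds fewer, wider oscillations.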

The video below shows how the faster you hear a particular sound repeated, the higher the pitch of the resulting tone becomes.

Explanation of Frequency in Digital Audio from Dan Waters on Vimeo.

Bit Depth

The amplitude of the wave at a given point is usually measured in decibels. However, the range and potential resolution of amplitude is impacted by how fine the measurement of an individual sample is. In an 8-bit waveform, you have 8 bits to define a sample. In 16-bit audio, you have 16 bits to define that same sample, so a much more detailed representation of sound is possible. The higher the bit depth, the better the sound quality (and the larger the resulting file size).
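Here is one way to see what the extra bits buy you. The sketch below counts how many distinct values a given bit depth can represent, and rounds a normalized amplitude to the nearest representable step. BitDepth is a hypothetical name, and treating every bit depth as a symmetric signed range is a simplifying assumption made just for this comparison.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class BitDepth
{
    // Number of distinct values a sample can take: 2^bits.
    public static long Levels(int bits)
    {
        return 1L << bits;
    }

    // Rounds a normalized amplitude (-1.0 to 1.0) to the nearest value
    // that the given bit depth can represent, then converts it back.
    public static double Quantize(double value, int bits)
    {
        // Largest positive step: 127 for 8 bits, 32767 for 16 bits.
        double scale = Math.Pow(2, bits - 1) - 1;
        return Math.Round(value * scale) / scale;
    }
}
```

Levels(8) is 256 while Levels(16) is 65,536, so a 16-bit sample can land much closer to the true amplitude than an 8-bit one.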

More Than Sine Waves

Obviously, songs are more interesting than just a single sine wave. But when you break down the final mix of a song, it’s still just a wave – it causes your speakers to create very minute and rapid disturbances in the air based on changes in amplitude of the wave.

Multiple Tones

So, let’s take a look at a two-channel mix of a middle C sine wave (261.63 Hz) and the A above middle C (440 Hz). Together, you hear a very sad-sounding A-minor dyad. Notice the difference in the period of the two waves. Because this is two-channel audio, you hear two sine waves at different frequencies, giving you a dual tone effect.

image

Of course, there aren’t many songs out there that just have one voice in each channel. The instruments are all mixed together and deliberately placed in the left or right side of the mix, and the mixing process takes care of how the audio is weighted across the two stereo channels.

You can add these two waves together just by taking the average of both sine waves at each point. The resulting stereo waveform looks like this, and you hear both frequencies played at the same time in both channels:

image
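The averaging step can be sketched in a few lines. Mixer is a hypothetical name, and stopping at the shorter of the two inputs is my own assumption for inputs of unequal length.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class Mixer
{
    // Mixes two waves by averaging them point by point.
    public static double[] Average(double[] first, double[] second)
    {
        // Only mix the region where both waves have samples.
        int count = Math.Min(first.Length, second.Length);
        double[] mixed = new double[count];

        for (int i = 0; i < count; i++)
        {
            // Averaging keeps the result in the same range as the inputs.
            mixed[i] = (first[i] + second[i]) / 2.0;
        }

        return mixed;
    }
}
```

Averaging, rather than simply summing, is what keeps the mixed wave from exceeding the amplitude range of the original waves.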

Conclusion

Now you know the basics of how sound waves work. In the next article, we will explore how to synthesize a sine wave using C# into a WAV file.

Currently Playing: Spin Doctors - Pocket Full Of Kryptonite - Off My Line

HomeGroup is a feature in Windows 7 that allows you to very easily share libraries with other computers on your home network. So flippin’ easy, in fact, that I made a video.

I had never tried setting it up before, so there were a couple of things I learned on the fly:

  • Your Network Location must be set to Home.
  • Domain-joined machines can join a homegroup but cannot share files. They can access files on other computers.
  • Homegroups are searchable and it’s fast.

That’s pretty much it. With all the PCs in my house I can be very confident that I can find whatever I need from whatever machine I choose.

I have been quietly working on a new XNA book to support the upcoming release of XNA Game Studio 3.1 and one of its undocumented features. Most people don’t know that you can actually deploy to other devices besides the Xbox 360, Zune and PC if you have the right driver widgets installed and with a few registry hacks.

As a result I think folks will really enjoy what I have come up with.

9781430218617

That’s right – XNA games on the iPhone! I am probably jumping the gun on this one (sorry XNA Team) but the potential is just too great.

In this book you will learn:

  • How to get your shiny black thing connected to XNA Game Studio on a PC (obviously this will not work on a Mac)
  • How to take advantage of the iPhone’s many hardware features such as GPS, vibration, and gyroscopes to develop top-grossing titles
  • How to deploy your game to Xbox Live Community Games as well as the App Store
  • How to live in harmony with Mac users if you’re a PC user
  • How to deal with colors other than white if you’re a Mac user

The book is still in the works and numerous legal details are still being sorted out, but the planned print date for this book is Sept 9, 2009. (or, Nein! Nein! Nein!)

This is my presentation at the ITT Super Career Fair, 3.26.2009.

Students don’t just do homework, they multitask. They listen to internet radio. They get up to get soda. They browse the net.

Well, at least I did.

Here’s how Windows 7 can make your multitasking experience a whole lot easier.

It runs great!

I received my first author copies of the book in the mail today and boy am I pleased with how it came out.

lebook

Check it out in stores on Mar 23 or on Amazon.com: http://www.tinyurl.com/zunebook.

All of the sample code is available on the book’s web site: http://www.apress.com/book/view/9781430218616

Have you seen how Internet Explorer 8’s new Web Slices feature allows you to monitor an eBay auction directly from your Favorites toolbar in real time? How about the accelerator that lets you highlight any text on a page and immediately perform an eBay search on that text?

Check out this recent video I made to see how it works.
