The WAV format is arguably the most basic of sound formats out there. It was developed by Microsoft and IBM, and it is rather loosely defined. As a result, there are a lot of WAV files out there that theoretically should not work, but somehow do.
Even if you do not plan to work with WAV data directly, future entries in this series will use the DirectSound library which has direct analogs to elements of the wave file, so it’s important to understand what all of this means.
WAV files are usually stored uncompressed, which means they can get quite large. They cannot, however, exceed 4 gigabytes, because the file size header field is a 32-bit unsigned integer (2^32 bytes is a maximum of about 4 gigs).
WAV files are stored in a binary format. They are made up of chunks, where each chunk tells you something about the data in the file.
Here are a couple of quick links that describe the WAV file format in detail. If you are doing work with the WAV format, you should bookmark these:
Let’s dig in a little deeper to how WAV works.
A chunk is used to represent certain metadata (or actual data) regarding the file. Each chunk serves a specific purpose and is structured very specifically (order matters).
Note: There are a lot of available chunks that you can use to accomplish different things, but not all of them are required to be in a wave file. To limit the amount of stuff you have to absorb, we’ll only be looking at the header and the two chunks you need to create a functioning WAV file.
Also, every chunk (including the file header) starts with four characters (one byte each), called the sGroupID, that identify what's coming. We’ll see more about this… now.
While not really a chunk according to the WAV spec, the header is what comes first in every WAV file. Here is what the header looks like:
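As a sketch, the header can be modeled as a C# class like the one below. The field names follow the sGroupID/dwFileLength naming convention used throughout this article; the class layout itself is illustrative, not official.

```csharp
// Hypothetical sketch of the WAV file header as a C# class.
public class WaveHeader
{
    public string sGroupID = "RIFF";  // four characters: always "RIFF"
    public uint dwFileLength = 0;     // total file size in bytes minus 8; filled in last
    public string sRiffType = "WAVE"; // four characters: always "WAVE"
}
```

The dwFileLength field is left at zero here because, as we'll see below, it can only be computed after the other chunks have been built.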
The Format chunk is the metadata chunk. It specifies many of the things we talked about in part 1 of this series, such as the sample rate, bit depth, number of channels, and more.
Before we look at the format chunk structure, let’s run through some definitions in gory detail.
Now that we know what all these things mean (or at least, you can scroll up and read them when you need to), let’s dive into the format chunk’s structure.
As I mentioned, this chunk gives you pretty much everything you need to specify the wave format. When we look at DirectSound later, we will see the same fields to describe the wave format.
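Here is a sketch of the format chunk as a C# class. The defaults below — uncompressed PCM, stereo, 44.1 kHz, 16-bit — are my own illustrative choices, not requirements of the spec; the derived fields are computed in the constructor.

```csharp
// Hypothetical sketch of the WAV format chunk as a C# class.
public class WaveFormatChunk
{
    public string sGroupID = "fmt ";     // four characters; note the trailing space
    public uint dwChunkSize = 16;        // byte count of the fields below this one
    public ushort wFormatTag = 1;        // 1 = uncompressed PCM
    public ushort wChannels = 2;         // 1 = mono, 2 = stereo
    public uint dwSamplesPerSec = 44100; // sample rate in Hz
    public uint dwAvgBytesPerSec;        // dwSamplesPerSec * wBlockAlign
    public ushort wBlockAlign;           // bytes per sample frame, all channels
    public ushort wBitsPerSample = 16;   // bit depth

    public WaveFormatChunk()
    {
        // Derive the two dependent fields from the choices above.
        wBlockAlign = (ushort)(wChannels * (wBitsPerSample / 8));
        dwAvgBytesPerSec = dwSamplesPerSec * wBlockAlign;
    }
}
```

Note that wBlockAlign and dwAvgBytesPerSec are pure functions of the other fields, so recomputing them whenever the channel count, bit depth, or sample rate changes keeps the chunk internally consistent.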
Now, let’s look at the data chunk, my favorite chunk of them all!
The data chunk is really simple. It’s got the sGroupID, the length of the data and the data itself. Depending on your chosen bit depth, the data type of the array will vary.
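A minimal sketch of the data chunk for 16-bit samples might look like this (field names again follow this article's convention; for 8-bit data you would use a byte[], and for 32-bit a float[]):

```csharp
// Hypothetical sketch of the WAV data chunk for 16-bit samples.
public class WaveDataChunk
{
    public string sGroupID = "data"; // four characters: always "data"
    public uint dwChunkSize;         // number of bytes of sample data
    public short[] shortArray;       // the samples themselves, at 16-bit depth

    public WaveDataChunk(short[] samples)
    {
        shortArray = samples;
        // Each 16-bit sample occupies 2 bytes.
        dwChunkSize = (uint)(samples.Length * sizeof(short));
    }
}
```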
For the sample data, it’s important to note that 16-bit and 32-bit samples are signed values – they can be negative, positive or zero. The range of each of these elements is important too.
Since each sample represents amplitude, we have to consider what our minimum and maximum amplitude is for the data type we’ve chosen. Sixteen-bit samples are two’s complement signed integers, so their range is -32768 to 32767 – the 2^16 possible values are split between the negative and non-negative halves, which is why the positive limit is one less than the negative one. Eight-bit data is the odd one out: the format stores it as unsigned bytes from 0 to 255, with silence sitting at the midpoint of 128. We don’t worry about any of this with 32-bit data, because it is represented as a proper float from -1.0f to 1.0f.
Here are the value ranges for each data type:

- 8-bit: unsigned byte, 0 to 255 (silence at 128)
- 16-bit: signed short, -32768 to 32767
- 32-bit: float, -1.0f to 1.0f
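To make those ranges concrete, here is a small sketch of scaling a single normalized amplitude (a value between -1.0 and 1.0) into each storage type; the starting amplitude of 0.5 is just an example.

```csharp
double amplitude = 0.5; // a normalized sample, somewhere in -1.0 to 1.0

// 8-bit: unsigned, centered on 128, so scale and shift.
byte sample8 = (byte)(128 + amplitude * 127);

// 16-bit: signed, so scale directly by the maximum positive value.
short sample16 = (short)(amplitude * short.MaxValue);

// 32-bit: float data is already in the -1.0f to 1.0f range.
float sample32 = (float)amplitude;
```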
Writing the wave file is as easy as constructing the Header, Format Chunk and Data Chunk and writing them in binary fashion to a file in that order.
There are a couple of caveats. First, determining the dwChunkSize for the format and data chunks can be awkward, because you have to sum the byte count of every counted field and use that total as your chunk size. One way to do this in .NET is to convert each field to a byte array with the BitConverter class, retrieve each array’s length, and sum the totals for all the counted elements in the chunk. If you get the chunk size wrong, the wave file won’t work, because whatever is trying to play it will read an incorrect number of bytes for that chunk.
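As a sketch, computing the format chunk’s dwChunkSize with BitConverter might look like the following (the field values are illustrative):

```csharp
using System;

// Illustrative format chunk field values: PCM, stereo, 44.1 kHz, 16-bit.
ushort wFormatTag = 1, wChannels = 2, wBlockAlign = 4, wBitsPerSample = 16;
uint dwSamplesPerSec = 44100, dwAvgBytesPerSec = 176400;

// Sum the byte length of every counted field. The sGroupID and
// dwChunkSize fields themselves are excluded from the count.
uint dwChunkSize = (uint)(
    BitConverter.GetBytes(wFormatTag).Length +
    BitConverter.GetBytes(wChannels).Length +
    BitConverter.GetBytes(dwSamplesPerSec).Length +
    BitConverter.GetBytes(dwAvgBytesPerSec).Length +
    BitConverter.GetBytes(wBlockAlign).Length +
    BitConverter.GetBytes(wBitsPerSample).Length);
// 2 + 2 + 4 + 4 + 2 + 2 = 16 bytes
```

You could of course hard-code 16 for a plain PCM format chunk, but summing the converted fields keeps the size correct if the structure ever changes.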
The other thing you have to calculate AFTER generating the chunks is the dwFileLength field in the header. This value is equal to the total size in bytes of the entire file minus 8, and here we do count the fields that are ignored when calculating chunk size, such as sGroupID and dwChunkSize. The 8 bytes we subtract are the header’s four-byte RIFF marker and the four-byte dwFileLength field itself, which the format dictates are not included in the count. There are some other ways to do this, which we’ll explore in the next article.
The best way to accomplish all of this cleanly is to implement a structure or class for the header and each of the two chunks, with all the fields defined using the data types shown above. Put methods on each chunk class that calculate its dwChunkSize as well as the total size of the chunk including sGroupID and dwChunkSize. Then add together the total sizes of the data and format chunks, plus 4 for the header’s sRiffType field (the “WAVE” marker), the only header field included in the count. Assign that value to dwFileLength and you are golden.
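A sketch of that arithmetic, assuming a standard 16-byte PCM format chunk payload and a hypothetical second of 16-bit mono audio at 44.1 kHz:

```csharp
// Hypothetical sample data size: 1 second of 16-bit mono at 44.1 kHz.
uint dataByteCount = 44100 * 2;

// Each chunk's total size includes its sGroupID (4 bytes) and
// dwChunkSize (4 bytes) on top of its payload.
uint formatChunkTotal = 4 + 4 + 16;
uint dataChunkTotal = 4 + 4 + dataByteCount;

// Add 4 for the header's sRiffType field ("WAVE"), the only header
// field included in the count.
uint dwFileLength = formatChunkTotal + dataChunkTotal + 4;
// 24 + 88208 + 4 = 88236, i.e. the total file size (88244) minus 8
```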
What, no example code? I know, I know. That’s what the next blog post is for, so stay tuned. In the next post, you’ll learn how to use all this knowledge to write a C# program that synthesizes audio, generates the samples, and writes the results to a working WAV file that you can listen to!
Currently Playing: Silversun Pickups – Swoon – The Royal We