Desafinado, Part Four: Rolling Your Own WAV Files


We’ve established why just about every piano in the world -- in fact, every concert-pitched musical instrument in the world -- is slightly out of tune. No one actually plays perfect fifths; every fifth interval is slightly flat. Why don't we hear the difference? Is the difference even perceptible?

It is very hard to hear unless you compare and contrast. So let's do that. Here's a little C# program I just whipped up. This program creates a WAV file that first plays two seconds of E a perfect fifth above concert A=220Hz, and then two seconds of E a slightly flattened fifth above A.

Can you hear the difference? I can't hear the difference between the two E's at all.
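For reference, the two E frequencies being compared can be worked out directly. The sketch below assumes twelve-tone equal temperament, where a fifth spans seven of the twelve equal semitones in an octave and so has the ratio 2^(7/12). The class and method names are made up for illustration.

```csharp
using System;

static class Fifths
{
    // Just (perfect) fifth: a 3:2 frequency ratio.
    public const double Perfect = 1.5;

    // Equal-tempered fifth: seven of the twelve equal semitones in an octave.
    public static double Tempered()
    {
        return Math.Pow(2.0, 7.0 / 12.0);
    }

    // Size of the interval between two ratios, in cents (1200 cents per octave).
    public static double Cents(double ratioA, double ratioB)
    {
        return 1200.0 * Math.Log(ratioA / ratioB) / Math.Log(2.0);
    }

    static void Main()
    {
        double a = 220.0;
        Console.WriteLine("Perfect E:  {0:F3} Hz", a * Perfect);
        Console.WriteLine("Tempered E: {0:F3} Hz", a * Tempered());
        Console.WriteLine("Difference: {0:F2} cents", Cents(Perfect, Tempered()));
    }
}
```

The two E's differ by less than 0.4 Hz, or about two cents, which goes some way toward explaining why they are so hard to tell apart in isolation.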

However, you can REALLY hear the difference in the next section. The file then plays two seconds of the flattened E together with a "perfect B" a perfect fifth above it, and then two seconds of the same E together with a "concert B" a slightly flattened fifth above it.

Now it is obvious -- with such clean, perfect waves you can clearly hear it when the pair goes out of tune. You get a sort of ringing "wah wah wah" effect as the waves go in and out of sync with each other. The number of wahs per second (piano tuners call them beats) tells you how close to a perfect fifth the notes are -- the slower the beats, the more in tune. Experienced piano tuners can easily hear when the number of beats per second is just right for the piano to be exactly out of tune enough to be evenly tempered.
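The beat rate can be estimated ahead of time. A common rule of thumb among tuners, assumed here rather than taken from this article, is that a fifth beats where the third harmonic of the lower note nearly coincides with the second harmonic of the upper note, which gives a rate of |3 * low - 2 * high| beats per second. The names below are illustrative.

```csharp
using System;

static class Beats
{
    // Beat rate of a fifth under the rule of thumb above: third harmonic of
    // the lower note against the second harmonic of the upper note.
    public static double FifthBeatRate(double lowHz, double highHz)
    {
        return Math.Abs(3.0 * lowHz - 2.0 * highHz);
    }

    static void Main()
    {
        double concert = Math.Pow(2.0, 7.0 / 12.0); // equal-tempered fifth
        double e = 220.0 * concert;                  // the flattened E above A=220Hz

        Console.WriteLine("E + perfect B: {0:F3} beats/s", FifthBeatRate(e, e * 1.5));
        Console.WriteLine("E + concert B: {0:F3} beats/s", FifthBeatRate(e, e * concert));
    }
}
```

Under that assumption the perfect pair does not beat at all, while the tempered pair beats a little more than once per second.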

This code could use some explanation.

The basic WAV file format follows the Resource Interchange File Format (RIFF) specification, a little-endian relative of the older Interchange File Format (IFF). A RIFF file consists of a series of "chunks", where chunks can contain other chunks. Each chunk starts with an eight-byte header: a four-byte tag identifying the chunk, followed by four bytes giving the size of the chunk (not counting the eight-byte header). The header is followed by the given number of bytes of data in a chunk-specific format. A WAV file consists of one main chunk called RIFF that contains three things: the string "WAVE", a "format" chunk that describes the sample rate, etc., and a "data" chunk that contains the sampled waveform.
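To make the chunk layout concrete, here is a minimal sketch of a chunk lister. It assumes a seekable stream positioned at the start of a well-formed RIFF file, and it relies on the RIFF rule that chunk data is padded to an even length. ChunkLister and its method are hypothetical names, not part of any library.

```csharp
using System;
using System.IO;
using System.Text;

static class ChunkLister
{
    // Prints the tag and size of each chunk inside a RIFF file.
    // Assumes the stream is seekable and positioned at byte 0.
    public static void List(Stream input, TextWriter output)
    {
        BinaryReader reader = new BinaryReader(input);

        string riff = Encoding.ASCII.GetString(reader.ReadBytes(4)); // "RIFF"
        int riffSize = reader.ReadInt32();                           // bytes after this field
        string form = Encoding.ASCII.GetString(reader.ReadBytes(4)); // "WAVE"
        output.WriteLine("{0} {1} {2}", riff, riffSize, form);

        long end = 8 + (long)riffSize;
        while (input.Position + 8 <= end && input.Position + 8 <= input.Length)
        {
            string tag = Encoding.ASCII.GetString(reader.ReadBytes(4));
            int size = reader.ReadInt32();
            output.WriteLine("  '{0}' {1} bytes", tag, size);
            // Skip the payload, plus one pad byte if the size is odd.
            input.Seek(size + (size & 1), SeekOrigin.Current);
        }
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "test.wav";
        using (FileStream stream = File.OpenRead(path))
        {
            List(stream, Console.Out);
        }
    }
}
```

Run against the test.wav produced by the program in this article, it should report a 'fmt ' chunk of 16 bytes followed by a 'data' chunk holding the samples.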

We won't mess around with any advanced WAV file features like cue points or playlists or compression. We'll just dump out some data and play it with the WAV file player of your choice. We'll use CD quality audio -- 44100 samples per second, 16 bits per sample. (Unlike a CD, we'll do this in mono, not stereo.)
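Those parameters fix how big the file will be. As a rough sketch, assuming eight seconds of audio in total (four two-second segments) and the simple three-part layout described above, the sizes work out as follows. WavSizes is a hypothetical helper written only for this arithmetic.

```csharp
using System;

static class WavSizes
{
    // Bytes in one frame: one sample for every channel.
    public static int FrameSize(int channels, int bitsPerSample)
    {
        return channels * ((bitsPerSample + 7) / 8);
    }

    // Bytes of sample data for a clip of the given length.
    public static int DataBytes(int samplesPerSecond, int frameSize, int seconds)
    {
        return samplesPerSecond * seconds * frameSize;
    }

    static void Main()
    {
        int frame = FrameSize(1, 16);          // mono, 16 bits
        int data = DataBytes(44100, frame, 8); // eight seconds
        // 12 bytes for "RIFF", its size and "WAVE"; 8 + 16 for the format
        // chunk; 8 for the data chunk header.
        int headerBytes = 12 + (8 + 16) + 8;

        Console.WriteLine("frame size: {0} bytes", frame);
        Console.WriteLine("byte rate:  {0} bytes/s", 44100 * frame);
        Console.WriteLine("data chunk: {0} bytes", data);
        Console.WriteLine("whole file: {0} bytes", headerBytes + data);
    }
}
```

Stereo would double the frame size, and with it the byte rate and the data chunk.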

namespace Wave
{
   using System;
   using System.IO;
   class MainClass {
      public static void Main() {
         FileStream stream = new FileStream("test.wav", FileMode.Create);
         BinaryWriter writer = new BinaryWriter(stream);
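         // Chunk tags are four ASCII characters packed little-endian into an int.
         int RIFF = 0x46464952;   // "RIFF"
         int WAVE = 0x45564157;   // "WAVE"
         int formatChunkSize = 16;
         int headerSize = 8;      // four-byte tag plus four-byte size
         int format = 0x20746D66; // "fmt "
         short formatType = 1;    // 1 = uncompressed PCM
         short tracks = 1;        // number of channels; 1 = mono
         int samplesPerSecond = 44100;
         short bitsPerSample = 16;
         short frameSize = (short)(tracks * ((bitsPerSample + 7)/8)); // bytes per sample across all channels
         int bytesPerSecond = samplesPerSecond * frameSize;
         int waveSize = 4;        // the "WAVE" tag itself
         int data = 0x61746164;   // "data"
         int samples = 88200 * 4; // four segments of two seconds each
         int dataChunkSize = samples * frameSize;
         // Size of everything that follows the RIFF tag and this size field.
         int fileSize = waveSize + headerSize + formatChunkSize + headerSize + dataChunkSize;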
         writer.Write(RIFF);
         writer.Write(fileSize);
         writer.Write(WAVE);
         writer.Write(format);
         writer.Write(formatChunkSize);
         writer.Write(formatType);
         writer.Write(tracks); 
         writer.Write(samplesPerSecond);
         writer.Write(bytesPerSecond);
         writer.Write(frameSize);
         writer.Write(bitsPerSample); 
         writer.Write(data);
         writer.Write(dataChunkSize);
         double aNatural = 220.0;
         double ampl = 10000;
         double perfect = 1.5;         // perfect fifth, 3:2
         double concert = 1.498307077; // equal-tempered fifth, 2^(7/12)
         // Segment 1: E a perfect fifth above A.
         double freq = aNatural * perfect;
         for (int i = 0; i < samples / 4; i++) {
            double t = (double)i / (double)samplesPerSecond;
            short s = (short)(ampl * (Math.Sin(t * freq * 2.0 * Math.PI)));
            writer.Write(s);
         }
         // Segment 2: E a flattened fifth above A.
         freq = aNatural * concert;
         for (int i = 0; i < samples / 4; i++) {
            double t = (double)i / (double)samplesPerSecond;
            short s = (short)(ampl * (Math.Sin(t * freq * 2.0 * Math.PI)));
            writer.Write(s);
         }
         // Segment 3: the flattened E together with the B a perfect fifth above it.
         for (int i = 0; i < samples / 4; i++) {
            double t = (double)i / (double)samplesPerSecond;
            short s = (short)(ampl * (Math.Sin(t * freq * 2.0 * Math.PI) + Math.Sin(t * freq * perfect * 2.0 * Math.PI)));
            writer.Write(s);
         }
         // Segment 4: the flattened E together with the B a flattened fifth above it.
         for (int i = 0; i < samples / 4; i++) {
            double t = (double)i / (double)samplesPerSecond;
            short s = (short)(ampl * (Math.Sin(t * freq * 2.0 * Math.PI) + Math.Sin(t * freq * concert * 2.0 * Math.PI)));
            writer.Write(s);
         }
         writer.Close();
         stream.Close();
      }
   }
}

Compile this guy up, run it, and listen to test.wav. Pretty cool, eh?
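A note on the magic numbers in that program: constants such as 0x46464952 are just the four ASCII characters of a chunk tag packed into an int so that BinaryWriter, which writes integers low byte first, emits them in reading order. The sketch below shows the packing; FourCC is a hypothetical helper name, not something from the framework.

```csharp
using System;
using System.Text;

static class FourCC
{
    // Packs four ASCII characters into an int whose little-endian bytes
    // spell the tag when written to a file.
    public static int FromString(string tag)
    {
        if (tag == null || tag.Length != 4)
            throw new ArgumentException("A chunk tag is exactly four characters.");
        byte[] bytes = Encoding.ASCII.GetBytes(tag);
        return bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    }

    static void Main()
    {
        foreach (string tag in new string[] { "RIFF", "WAVE", "fmt ", "data" })
        {
            Console.WriteLine("'{0}' = 0x{1:X8}", tag, FromString(tag));
        }
    }
}
```

Note the trailing space in "fmt ": tags are always four bytes, so shorter names are padded.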

Next time, we'll wrap up with one of the most interesting psychological effects you can get in music -- a tone that goes down, but never hits bottom.

  • Awesome set of articles. I've been learning how to play the piano, and these articles relating to music theory have been great. Thank you!

    Steven J. Ackerman, Consultant
    ACS, Sarasota, Florida
    http://www.acscontrol.com
    http://spaces.msn.com/members/sjackerman
  • It's a nit, but the field you're calling "tracks" is conventionally known as the "channel count" - you're producing mono content above, if you wanted to produce stereo, you'd set "tracks" to 2 and generate twice as many samples in the for loops (one for the left channel value and one for the right channel value).

    You can see this structure in the sdk in the file mmsystem.h, it's known as a "waveformatex"
  • What I don't understand is how the WAV decoder knows whether channels=4 means quadrophonic or left-center-right-surround.
  • This topic covers something I've been wondering about for years. It's an amazing universe.

    Here's the same code in VBScript. It also works fine in NS Basic/Desktop.

    Option Explicit
    Dim fso, f
    Set fso=createObject("Scripting.FileSystemObject")
    Set f=fso.CreateTextFile("e:\test.wav", True)

    Dim RIFF: RIFF = &h46464952
    Dim WAVE: WAVE = &h45564157
    Dim formatChunkSize: formatChunkSize = 16
    Dim headerSize: headerSize = 8
    Dim format: format = &h20746D66
    Dim formatType: formatType = 1
    Dim tracks: tracks = 1
    Dim samplesPerSecond: samplesPerSecond = 44100
    Dim bitsPerSample: bitsPerSample = 16
    Dim frameSize: frameSize = int(tracks * ((bitsPerSample + 7)/8))
    Dim bytesPerSecond: bytesPerSecond = samplesPerSecond * frameSize
    Dim waveSize: waveSize = 4
    Dim data: data = &h61746164
    Dim samples: samples = 88200 * 4
    Dim dataChunkSize: dataChunkSize = samples * frameSize
    Dim fileSize: fileSize = waveSize + headerSize + formatChunkSize + headerSize + dataChunkSize
    WriteOut RIFF, 4
    WriteOut fileSize, 4
    WriteOut WAVE, 4
    WriteOut format, 4
    WriteOut formatChunkSize, 4
    WriteOut formatType, 2
    WriteOut tracks, 2
    WriteOut samplesPerSecond, 4
    WriteOut bytesPerSecond, 4
    WriteOut frameSize, 2
    WriteOut bitsPerSample, 2
    WriteOut data, 4
    WriteOut dataChunkSize, 4
    Dim aNatural: aNatural = 220.0
    Dim ampl: ampl = 10000
    Dim perfect: perfect = 1.5
    Dim concert: concert = 1.498307077
    Dim freq: freq = aNatural * perfect
    Dim PI: PI = 4 * atn(1)
    Dim i, t, s
    ' Each loop writes exactly samples / 4 samples (For is inclusive in VBScript)
    For i = 0 to samples / 4 - 1
        t = i / samplesPerSecond
        s = int(ampl * (Sin(t * freq * 2.0 * PI)))
        WriteOut s, 2
    Next
    freq = aNatural * concert
    For i = 0 to samples / 4 - 1
        t = i / samplesPerSecond
        s = int(ampl * (Sin(t * freq * 2.0 * PI)))
        WriteOut s, 2
    Next
    For i = 0 to samples / 4 - 1
        t = i / samplesPerSecond
        s = int(ampl * (Sin(t * freq * 2.0 * PI) + Sin(t * freq * perfect * 2.0 * PI)))
        WriteOut s, 2
    Next
    For i = 0 to samples / 4 - 1
        t = i / samplesPerSecond
        s = int(ampl * (Sin(t * freq * 2.0 * PI) + Sin(t * freq * concert * 2.0 * PI)))
        WriteOut s, 2
    Next
    f.Close
    MsgBox "Done"

    Sub WriteOut(byval x,n) 'write out an n-byte integer, low byte first
    Dim i,c
    If x<0 Then x=(&hFFFFFFFF+x)+1
    For i=1 to n
    c=abs(x mod 256)
    x=int(x/256)
    f.Write chr(c)
    Next
    end sub
  • If you embed your waveformatex in a waveformatextensible, you can explain what the channels should map to by setting the dwChannelMask.
  • [I've removed an earlier incorrect comment -- I was misremembering the problem]

    Though this solution works for writing out a binary file, I don't recommend it. We designed the FSO to work on text files, not binary files.

    I will write a blog describing some of the potential issues in reading and writing binary files with the text stream at some point. Stay tuned!
  • Eric, could you post the resulting WAV files for this and the next post in the series, since some of us haven't moved to .Net yet!

    Cheers! :)
  • Although posting the WAVs is probably still a good idea, I've found the Java applets posted above and played with those instead.
  • Hi Eric,

    Nice article. Some years later here, but I thought I'd mention this article showing how the various WAV headers look for a specific 5.1 WAV file:

     http://www.jensign.com/multichannel/multichannelformat.html

    More audio stuff here also:

     http://www.jensign.com/audio.html
