Larry Osterman's WebLog

Confessions of an Old Fogey
How long is a WAV file?

One question that kept on coming up during my earlier post was "How long is it going to take to play a .WAV file?".

It turns out that this isn't actually a hard question to answer.  The answer is embedded in the .WAV file if you know where to look.  Just for grins, I spent a few minutes and whipped up a function that will parse a WAV file and return the length of the file in milliseconds.

Remember that a .WAV file is a RIFF file which contains a "WAVE" chunk; the "WAVE" chunk in turn contains two chunks called "fmt " and "data".  The "fmt " chunk contains a WAVEFORMATEX structure that describes the format of the samples, and the "data" chunk holds the samples themselves.  The code below is roughly based on the "ReversePlay" sample (but I didn't learn about that sample until after I'd written this code :)).

I'm using the built-in multimedia I/O functions, which have the added benefit of being able to parse RIFF files without my having to come up with a ton of code.

#define FOURCC_WAVE mmioFOURCC('W', 'A', 'V', 'E')
#define FOURCC_FMT mmioFOURCC('f', 'm', 't', ' ')
#define FOURCC_DATA mmioFOURCC('d', 'a', 't', 'a')

DWORD CalculateWaveLength(LPTSTR FileName)
{
    MMIOINFO mmioinfo = {0};
    MMCKINFO mmckinfoRIFF = {0};
    MMCKINFO mmckinfoFMT = {0};
    MMCKINFO mmckinfoDATA = {0};
    MMRESULT mmr;
    WAVEFORMATEXTENSIBLE waveFormat = {0};
    HMMIO mmh = mmioOpen(FileName, &mmioinfo, MMIO_DENYNONE | MMIO_READ);
    if (mmh == NULL)
    {
        printf("Unable to open %s: %x\n", FileName, mmioinfo.wErrorRet);
        exit(1);
    }

    mmr = mmioDescend(mmh, &mmckinfoRIFF, NULL, 0);
    if (mmr != MMSYSERR_NOERROR || mmckinfoRIFF.ckid != FOURCC_RIFF)
    {
        printf("Unable to find RIFF section in .WAV file, possible file format error: %x\n", mmr);
        exit(1);
    }
    if (mmckinfoRIFF.fccType != FOURCC_WAVE)
    {
        printf("RIFF file %s is not a WAVE file, possible file format error\n", FileName);
        exit(1);
    }

    // It's a wave file, read the format tag.
    mmckinfoFMT.ckid = FOURCC_FMT;
    mmr = mmioDescend(mmh, &mmckinfoFMT, &mmckinfoRIFF, MMIO_FINDCHUNK);
    if (mmr != MMSYSERR_NOERROR)
    {
        printf("Unable to find FMT section in RIFF file, possible file format error: %x\n", mmr);
        exit(1);
    }
    // The format tag fits into a WAVEFORMAT, so read it in.
    if (mmckinfoFMT.cksize >= sizeof( WAVEFORMAT ))
    {
        // Read the requested size (limit the read to the existing buffer though).
        LONG readLength = mmckinfoFMT.cksize;
        if (mmckinfoFMT.cksize >= sizeof(waveFormat))
        {
            readLength = sizeof(waveFormat);
        }
        if (readLength != mmioRead(mmh, (char *)&waveFormat, readLength))
        {
            printf("Read error reading WAVE format from file\n");
            exit(1);
        }
    }
    if (waveFormat.Format.wFormatTag != WAVE_FORMAT_PCM)
    {
        printf("WAVE file %s is not a PCM format file, it's a %d format file\n", FileName, waveFormat.Format.wFormatTag);
        exit(1);
    }
    // Pop back up a level
    mmr = mmioAscend(mmh, &mmckinfoFMT, 0);
    if (mmr != MMSYSERR_NOERROR)
    {
        printf("Unable to pop up in RIFF file, possible file format error: %x\n", mmr);
        exit(1);
    }

    // Now read the data section.
    mmckinfoDATA.ckid = FOURCC_DATA;
    mmr = mmioDescend(mmh, &mmckinfoDATA, &mmckinfoRIFF, MMIO_FINDCHUNK);
    if (mmr != MMSYSERR_NOERROR)
    {
        printf("Unable to find DATA section in RIFF file, possible file format error: %x\n", mmr);
        exit(1);
    }
    // Close the handle, we're done.
    mmr = mmioClose(mmh, 0);
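    //
    // We now have all the info we need to calculate the file length. Cast to 64 bits
    // before multiplying, otherwise cksize * 1000 is computed in 32 bits and can overflow.
    //
    if (waveFormat.Format.nAvgBytesPerSec == 0)
    {
        printf("WAVE file reports 0 bytes per second, possible file format error\n");
        exit(1);
    }
    LONGLONG fileLengthinMS = (LONGLONG)mmckinfoDATA.cksize * 1000;
    fileLengthinMS /= waveFormat.Format.nAvgBytesPerSec;
    return (DWORD)fileLengthinMS;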
}

Essentially, this function opens the specified WAV file, finds the RIFF chunk at the beginning, checks that it is a WAVE chunk, and descends into it.  It locates the "fmt " chunk within the WAVE chunk and reads it into a structure on the stack (making sure that it doesn't overflow the buffer).  It then pops up a level and finds the "data" chunk.  It doesn't bother to read the data chunk; the only thing it needs is the chunk's length, which is the number of bytes occupied by the samples in the WAV file.
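For readers who want to see the same walk without the multimedia I/O functions, here is a minimal sketch in portable C that searches a RIFF/WAVE image already loaded into memory.  It is not the code from this post: `read_le32` and `find_wave_chunk` are hypothetical names made up for illustration, and the sketch assumes a plain RIFF file with the "fmt " and "data" chunks directly under the WAVE form (no nested LIST handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value; RIFF stores all sizes little-endian. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: find a chunk by its four-character id inside a
 * RIFF/WAVE image held in memory. On success it returns the offset of the
 * chunk's payload and stores the payload size in *size. It returns 0 when
 * the image is not RIFF/WAVE, is truncated, or has no such chunk.
 */
static size_t find_wave_chunk(const uint8_t *buf, size_t len,
                              const char id[4], uint32_t *size)
{
    assert(buf != NULL && id != NULL && size != NULL);

    /* 12-byte header: "RIFF", 32-bit size, then the form type "WAVE". */
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;

    size_t pos = 12;
    while (len - pos >= 8) {               /* room for an 8-byte chunk header */
        uint32_t cksize = read_le32(buf + pos + 4);
        size_t payload = pos + 8;

        if (cksize > len - payload)
            return 0;                      /* chunk claims more bytes than exist */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size = cksize;
            return payload;
        }
        /* Chunks are word-aligned: an odd-sized payload is followed by a pad byte. */
        size_t step = (size_t)cksize + (cksize & 1u);
        if (step > len - payload)
            return 0;                      /* pad byte missing, nothing left to scan */
        pos = payload + step;
    }
    return 0;
}
```

Calling `find_wave_chunk(image, length, "data", &size)` plays the role of the second mmioDescend call above: only the size comes back, and the samples themselves are never touched.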

Once we have the format of the data, and the number of bytes in the data chunk, it's trivial to figure out how long the sample will take to play.
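The arithmetic is just the data chunk size times 1000, divided by nAvgBytesPerSec.  The sketch below isolates that step in portable C, under the assumption of a constant-bitrate format; `wave_duration_ms` and `wave_duration_ms_32bit` are hypothetical names for illustration only.  The second function repeats the calculation in 32 bits purely to show why the multiply has to be widened first: the 32-bit product wraps once the data chunk exceeds roughly 4.29 MB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: playing time in milliseconds of data_bytes bytes of
 * constant-bitrate audio. avg_bytes_per_sec corresponds to nAvgBytesPerSec.
 */
static uint64_t wave_duration_ms(uint32_t data_bytes, uint32_t avg_bytes_per_sec)
{
    assert(avg_bytes_per_sec != 0);  /* zero would mean a malformed "fmt " chunk */
    /* Widen before multiplying so the product is computed in 64 bits. */
    return (uint64_t)data_bytes * 1000u / avg_bytes_per_sec;
}

/* The same arithmetic in 32 bits, shown only to illustrate the wrap-around. */
static uint32_t wave_duration_ms_32bit(uint32_t data_bytes, uint32_t avg_bytes_per_sec)
{
    assert(avg_bytes_per_sec != 0);
    return data_bytes * 1000u / avg_bytes_per_sec;  /* product wraps modulo 2^32 */
}
```

As a worked example with made-up numbers: 16-bit stereo PCM at 44.1 kHz is 176,400 bytes per second, so a 352,800-byte data chunk gives 2000 ms.  A 30-second clip in the same format (5,292,000 bytes) gives 30000 ms with the 64-bit version but 5652 ms with the 32-bit one.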

Btw, please note that this only looks for WAVE_FORMAT_PCM samples - there are other constant bitrate formats that could be supported but I wanted to hard code this to just PCM samples (it IS just a sample program). 

 

To verify that my calculation is correct, I took my function and dropped it into a tiny test harness:

int _tmain(int argc, _TCHAR* argv[])
{
    if (argc != 2)
    {
        printf("Usage: WaveLength <.WAV file name>\n");
        exit(1);
    }
    DWORD waveLengthInMilliseconds = CalculateWaveLength(argv[1]);
    printf("File %S is %d milliseconds long\n", argv[1], waveLengthInMilliseconds);
    DWORD soundStartTime = GetTickCount();
    PlaySound(argv[1], NULL, SND_SYNC);
    DWORD soundStopTime = GetTickCount();
    printf("Playing %S took %d milliseconds actually\n", argv[1], soundStopTime - soundStartTime);
    return 0;
}

 If I run this on some of the Vista sounds, I get:

C:\Users\larryo\Documents\Visual Studio 2005\Projects\WaveLength>debug\WaveLength.exe "c:\Windows\Media\Windows Exclamation.wav"
File c:\Windows\Media\Windows Exclamation.wav is 2020 milliseconds long
Playing c:\Windows\Media\Windows Exclamation.wav took 2281 milliseconds actually

The difference between the actual time and the calculated time is the overhead of the PlaySound API itself.  You can see this by trying it on other .WAV files - there appears to be about 200ms of overhead (on my dev machine) associated with building the audio graph and tearing it down.

  • Hey this is great! One more question though. How long does it take to run an empty loop? I'll just do an empty loop until enough time goes by for the sound to play.

    :)

  • I guess this would be good information to have if you need to display an estimate to the user, but I hope nobody uses it to determine when to free the sample buffer after an asynchronous call.

  • A bit of a minor nitpick, but you're mixing your CHARs, TCHARs, and WCHARs, as Michael Kaplan might say.

    You should explicitly make that wmain instead of main, especially given the use of %S.  Otherwise, somebody's going to have a nasty run-off-the-end-of-the-string-buffer crashy-surprise if they build your sample without /DUNICODE and printf("...%S...") runs off the end of argv[1], since it's looking for a "\0\0" and not a "\0" (Unicode string, not ANSI string).

  • I'll add that non-WAVE_FORMAT_PCM .wav files have a "fact" chunk which contains the length of the stream in frames.  Divide this by the number of frames per second (nSamplesPerSec in the format section) and you have the length of the file in seconds.

  • > The difference between the actual time and the calculated

    > time is the overhead of the PlaySound API itself.

    Sure.  It still means that it's not easy for the calling application to really know when to free the memory.  The caller still has to do polling.

    In Windows 95 OSR2 the difference was even longer.  Sure it was due to a bug and Microsoft developed a fix internally, but Microsoft didn't allow existing customers to get the fix.

    Wednesday, January 10, 2007 7:28 PM by Skywing

    > A bit of a minor nitpick, but you're mixing your CHARs,

    > TCHARs, and WCHARs,

    I thought it wasn't socially acceptable to notice that?  I was thermonucleated a few days ago for noticing things like that.

    > as Michael Kaplan might say.

    Oops.  Does that mean a(nother) thermonuclear civil war will start inside Microsoft?

    > especially given the use of %S

    By the way does that mean that %S works in ordinary Windows versions?  I didn't test it.  I only found it broken in Windows CE and had to make a bunch of calls to MultiByteToWideChar as a workaround.

  • Skywing: He could also have just used _tprintf with an _T("") format arg. But yeah, I prefer to just explicitly use the WCHAR version of things as well, rather than the TCHAR version. Much less error-prone.

  • Shouldn't the cast to LONGLONG take place before the multiply by 1000?

  • Phaeron: Hmm, I think you might be right.  I never remember the order of promotion (whether the multiply happens before the promotion or after).

  • Explicit C-style casts in C and C++ have higher precedence than binary mathematical operators, but lower precedence than unary operations like function calls, the . or -> operators, and the ++ and -- operators. C++ casts (e.g. static_cast<type>(x)) and the use of the function-style cast (e.g. LONGLONG(x)) have equal precedence with function calls.

    Promotion in the C and C++ standard means changing a datatype to a larger compatible datatype in order that both operands of a binary operator are the same type. In the case of:

    mmckinfoDATA.cksize * 1000

    mmckinfoDATA.cksize is a DWORD (unsigned long) and 1000 is a literal of type int. Therefore 1000 is promoted to an unsigned long and the calculation will be done in 32-bit. If you want 64-bit accuracy you must cast one side of the multiplication to a 64-bit operand.

  • Norman: %S always means 'the opposite of the version of printf() you're calling'. That is, if you call sprintf() it means interpret the argument as a string of WCHAR, while if you call swprintf() it means interpret the argument as a string of CHAR.

    If you want the argument type to be _invariant_ over the use of the UNICODE macro, use %hs for strings that are always CHAR, and %ls for strings that are always WCHAR.

    I never had a problem with %S on Windows CE when I was using it correctly, but I haven't honestly found much use for it.

  • TCHAR is pretty much dead nowadays anyway.  The Win9x product line is toast, and the number of Win9x users has fallen dramatically in recent years.  I see no reason to go through the headache of TCHAR, _T(), and prepending _tcs* to all of my API calls.  Unicode is the way of the future, and the future is now, as they say.

    It is unfortunate that VS2005 still defaults to TCHAR even today, for the new project wizards (targeting Win32 x86/x64/ia64), given the level of completely unsupported-ness about all of the Microsoft Win9x platforms.  It does default to /DUNICODE, which is a saving grace, but you have all of the ugly (and now superfluous) TCHAR goo there by default.

  • Thursday, January 11, 2007 9:28 AM by Mike Dimmick

    > Norman: %S always means 'the opposite of the version of

    > printf() you're calling'.

    My question to Skywing wasn't whether it means that, my question was whether it works in ordinary (non-CE) versions of Windows, where I didn't try it.

    In Windows CE it doesn't work.  I had coded a bunch of StringCchPrintf( ... %S ... ) calls, and then had to change them.  Calling MultiByteToWideChar and then using %s worked.

    In one or two of Microsoft's public newsgroups, someone (not a Microsoft employee) pointed out that the MSDN page for the format codes usable by StringCchPrintf doesn't actually say that %S should work, so in a way Windows CE's implementation isn't completely broken.  No one pointed it out, but I notice now that the way %s makes wide strings work means that in a way Windows CE's implementation is broken (MSDN says %s will take single-byte strings).

    In comparison to that, in maybe the same newsgroup(s) in a thread shortly before that one, a Microsoft employee asserted that ANSI strings are always one byte per character and asserted that %S works even in Windows CE.  Two false assertions in one posting.

    > I never had a problem with %S on Windows CE when I was

    > using it correctly,

    Well sure, I'm sure it is possible to use StringCchPrintf correctly even on some foreign language versions of Windows CE.  It might even work correctly because there are rumours that Microsoft sometimes tests one foreign language version of some Windows products.  But in Japanese Windows CE it's broken.  I called gethostname() and the %S format converted the hostname to the null string instead of to the Unicode version of the hostname.  I had to call WideCharToMultiByte and then use the %s format in order to get the hostname into a displayable message.

    > but I haven't honestly found much use for it.

    Sure, I also honestly only needed it about a dozen times or so, of which one was for the result of gethostname.

    Thursday, January 11, 2007 12:14 PM by Skywing

    > The Win9x product line is toast

    In some ways yes, in some ways no.  There are a lot of countries where an average salary allows the purchase of a used computer for a price equivalent to around 10,000 yen, and Windows XP (including pirated versions) won't run on them.

  • Could these 261 ms come in part from the audio latency introduced by the buffersize of the audio card?

  • > Could these 261 ms come in part from the audio latency introduced by the buffersize of the audio card?

    No, that latency would be introduced downstream and wouldn't affect the return of the PlaySound call.

  • If you continue to use TCHAR and attendant warts, you're insulated from the future transition to UTF-32. :-)
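Following up on the "fact" chunk suggestion in the comments above: for non-PCM files the length comes from the frame count stored in that chunk rather than from the size of the data chunk.  A minimal sketch in portable C, assuming the first four bytes of the "fact" payload hold the frame count as a little-endian 32-bit value and that `samples_per_sec` is nSamplesPerSec from the "fmt " chunk.  `fact_chunk_duration_ms` is a hypothetical name, not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: playing time in milliseconds computed from the payload
 * of a "fact" chunk. Returns -1 if the payload is too short to hold the frame
 * count or if the sample rate is zero.
 */
static int64_t fact_chunk_duration_ms(const uint8_t *fact, size_t fact_size,
                                      uint32_t samples_per_sec)
{
    assert(fact != NULL);
    if (fact_size < 4 || samples_per_sec == 0)
        return -1;

    /* Frame count: little-endian 32-bit value at the start of the payload. */
    uint32_t frames = (uint32_t)fact[0] | ((uint32_t)fact[1] << 8) |
                      ((uint32_t)fact[2] << 16) | ((uint32_t)fact[3] << 24);

    /* Widen before multiplying so frames * 1000 cannot wrap in 32 bits. */
    return (int64_t)((uint64_t)frames * 1000u / samples_per_sec);
}
```

With made-up numbers, a "fact" chunk reporting 88,200 frames at 44,100 frames per second works out to 2000 ms.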
