Extracting Metadata from Windows Media files


In this article, I'll describe how to use the Windows Media Format SDK to access the metadata embedded in Windows Media files for cataloguing purposes. Also included are two managed classes written in C# that vastly simplify the usage of this SDK.

Download MediaCatalog 1.0 (35KB)

Introduction

Over the last year, I've been gradually filling a spare hard disk with rips of all my CDs. It's fantastic to be able to play any CD from my catalogue so easily, and it means I can hide the CDs themselves away from my young daughter's sticky fingers! The problem is that as my digital collection has accumulated, it's getting harder to see what I've got. I've painstakingly tagged all my CDs with metadata, but Windows doesn't currently provide any easy mechanism to sort or manipulate that metadata. So I thought I'd follow Duncan Mackenzie's example and hack together a media cataloguing application.

The trouble is that it's difficult to extract the metadata from a Windows Media file. The Windows Media Player SDK provides a nice interop library you can use to embed Windows Media Player in your managed application and drive it programmatically, but I definitely wanted to avoid driving GUI controls, given the number of files I want to catalogue. Instead, I fired up MSDN Library and discovered the Windows Media Format SDK, a low-level API into the file format itself.

This SDK isn't easy to program against from managed code, however - it's pretty grungy COM interop. Fortunately, with the aid of MSDN, Adam Nathan's .NET and COM book and a quick look at some pretty dodgy samples, I was able to build a fairly clean managed wrapper that provides a straightforward interface into the SDK. Ironically, I haven't finished writing the graphical front-end catalogue application that generated the itch in the first place, but I thought the managed library was interesting enough in its own right to share.

Using MediaCatalog

I've divided up the managed library into a high-level API and a low-level API. The low-level API is a class that allows you to open a media file, examine the attributes by index or name, and enumerate through them using a foreach loop. The high-level API builds on that class and provides methods to allow recursive or non-recursive iteration through a directory structure, creating a strongly-typed DataSet object that contains all the most common attributes in the audio files it finds. You could bind the output to a Windows Forms DataGrid, for example, and indeed the sample test harness included with the code does exactly that.

Low-Level API

To access the low-level API, you instantiate an object of type MetadataEditor, passing the constructor the filename of the media file you're interested in. You can then either enumerate through the object using a foreach statement, or query it by name or using an indexer. The object supports an int-based indexer or alternatively an enum-based indexer that simplifies access using common attributes. The following C# code sample demonstrates each of these choices.

   using (MetadataEditor md = new MetadataEditor("britney.wma"))
   {
      // Enumerate through each of the attributes in the file
      foreach(Attribute attr in md)
      {
         Console.Write(attr.Name);
         Console.Write(": ");
         Console.WriteLine(attr.Value.ToString());
      }
      
      // Set author to the author (ID3 TPE1 frame, the lead performer) of the media file
      string author = md.GetAttributeByName("ID3/TPE1");
      // Set d to the duration of the media file (e.g. 3m 45s)
      TimeSpan d = md[MediaMetadata.Duration];
   }

Remember that since this object uses unmanaged resources, it's important to call the MetadataEditor.Dispose() method when you've finished using it in order to close the underlying resources. Alternatively wrap it inside a using statement as demonstrated above.
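If you are wrapping unmanaged resources of your own, the pattern that the using statement relies on can be sketched roughly as follows. This is a minimal illustration and not the MetadataEditor source: NativeHandleWrapper, IsOpen and Read are hypothetical names, and the "resource" is simulated by a flag so that the sample runs anywhere.

```csharp
using System;

// Hypothetical stand-in for a class that owns an unmanaged resource
// (for example a COM object or a file handle). Not part of MediaCatalog.
public sealed class NativeHandleWrapper : IDisposable
{
    private bool disposed;

    // True until Dispose() has been called.
    public bool IsOpen
    {
        get { return !disposed; }
    }

    // Any member that touches the resource should refuse to run after disposal.
    public string Read()
    {
        if (disposed)
            throw new ObjectDisposedException("NativeHandleWrapper");
        return "data";
    }

    // Called automatically at the end of a using block.
    public void Dispose()
    {
        if (disposed)
            return; // calling Dispose() twice is harmless

        // A real wrapper would release its COM object or close its handle here.
        disposed = true;
    }
}
```

With a class shaped like this, `using (NativeHandleWrapper w = new NativeHandleWrapper()) { ... }` guarantees that Dispose() runs even if the body throws, which is the behaviour you want when opening many media files in a row.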

High-Level API

This API contains three main methods that can be used to extract album information across multiple directories if necessary:

  • RetrieveTrackInfo: Retrieves structured property information for the given media file. Returns a TrackInfo object containing commonly-used fields.
  • RetrieveSingleDirectoryInfo: Retrieves media information for a single directory. Returns a MediaData object (a strongly-typed DataSet).
  • RetrieveRecursiveDirectoryInfo: Recursively trawls through a directory structure for media files, using them to build a DataSet of media metadata. Returns a MediaData object.

As a quick example, the following C# code snippet binds a Windows Forms DataGrid to the output of RetrieveRecursiveDirectoryInfo:

   MediaDataManager mdm = new MediaDataManager();
   musicData = mdm.RetrieveRecursiveDirectoryInfo(@"\\timserver\music");
   mediaInfo.DataSource = musicData;
   mediaInfo.DataMember = "Track";
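To give a feel for the difference between the single-directory and recursive variants, here is a small sketch of the file discovery step only. It is an assumption about how such a search could be written, not the library's actual implementation: MediaFileFinder and FindWmaFiles are hypothetical names, and the sample looks for .wma files alone.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of MediaCatalog.
public static class MediaFileFinder
{
    // Returns the paths of the .wma files under root. When recursive is true,
    // subdirectories are searched as well; otherwise only root itself.
    public static List<string> FindWmaFiles(string root, bool recursive)
    {
        SearchOption option = recursive
            ? SearchOption.AllDirectories
            : SearchOption.TopDirectoryOnly;

        List<string> result = new List<string>();
        foreach (string path in Directory.GetFiles(root, "*.wma", option))
        {
            // Re-check the extension, since wildcard matching can be looser
            // than an exact suffix test on some platforms.
            if (string.Equals(Path.GetExtension(path), ".wma",
                              StringComparison.OrdinalIgnoreCase))
            {
                result.Add(path);
            }
        }
        return result;
    }
}
```

Each path returned could then be handed to the low-level API to read its attributes. Note that Directory.GetFiles throws if any subdirectory is inaccessible, so a production version would probably walk the tree by hand and skip folders it cannot read.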

The MediaDataManager object also exposes an event that can be used to track progress (particularly useful during a long recursive directory search). Use the following syntax to enable it:

   mdm.TrackAdded += new MediaDataManager.TrackAddedEventHandler(mdm_TrackAdded);
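The line above only shows the subscription, so here is a self-contained sketch of the delegate-and-event pattern it relies on. Everything in it is hypothetical: CatalogScanner stands in for MediaDataManager, and the handler signature (object sender, TrackAddedEventArgs e) is an assumption, so check the downloaded source for the real parameters of TrackAddedEventHandler.

```csharp
using System;

// Hypothetical event payload; the real library's event arguments may differ.
public class TrackAddedEventArgs : EventArgs
{
    public readonly string FileName;

    public TrackAddedEventArgs(string fileName)
    {
        FileName = fileName;
    }
}

// Hypothetical stand-in for a scanner that reports progress through an event.
public class CatalogScanner
{
    // Nested delegate type, mirroring the MediaDataManager.TrackAddedEventHandler naming.
    public delegate void TrackAddedEventHandler(object sender, TrackAddedEventArgs e);

    public event TrackAddedEventHandler TrackAdded;

    // Raises TrackAdded once for every file name it is given.
    public void Scan(string[] fileNames)
    {
        foreach (string name in fileNames)
        {
            // Copy to a local so an unsubscribe on another thread
            // cannot null the delegate between the check and the call.
            TrackAddedEventHandler handler = TrackAdded;
            if (handler != null)
                handler(this, new TrackAddedEventArgs(name));
        }
    }
}
```

A handler such as `void OnTrackAdded(object sender, TrackAddedEventArgs e)` could update a progress label or counter. In a Windows Forms application, remember that the handler runs on whichever thread raised the event.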

Things To Do

The wrappers aren't complete by any means, and I'd love to hear your suggestions of how they might be improved (or even some code!). Several things on my own personal list:

  • Improve the intuitiveness of some of the class names
  • Add setters for the attributes to allow metadata to be modified
  • Add greater flexibility to the recursive searches to allow them to execute on a background thread
  • Write a decent cataloguing engine that takes advantage of the tools!
Attachment: mediacatalog10.zip
  • Hello.
    I have a very large list of links to WMV streams, however, the files are very large and there's great number of them, and I need a way to extract meta information from them without full download, sort of like loading Media Player and it gets meta information in the first few K.
    Do you have any idea how to implement this?
    The key assumption here is that we don't have full files. They are not downloaded.
  • Great article but the link to the Windows Media Format SDK doesn't work. I arrived here b/c I'm looking to download that SDK b/c according to Microsoft it includes a sample file for accessing and editing WM file metadata. However, I can't find the SDK to download. Has it been renamed and bundled with the Windows Media 9 Series SDK?
  • The WMFSDK can be found at http://msdn.microsoft.com/windowsmedia/downloads/default.aspx
  • Hmm, when trying to compile this I get errors saying that the type MediaData can't be found. What is this?
  • hi,
    I want to extract the timeline of a .wma file; that is, I want to get the current position of the playing file in ms or ns. How can I do that?
  • hi, your code was quite interesting, but i didn't manage to retrieve WM/VideoHeight and WM/VideoWidth Attributes from various wmv files. always caught on ASF_E_NOTFOUND. Obviously the video resolution isn't stored as an attribute? ... any comment on this issue would be great.
  • I was having trouble with this code throwing an exception if the process was repeated during the same execution session. It turns out that the MetadataEditor object was not closing the file handle properly. I added a function called CloseStream inside the MetadataEditor class and called it in the last line of the Using block that created the MetadataEditor. It could be added to the Dispose() function instead but this serves my purpose. Here is the code:

    public void CloseStream()
    {
        ((IWMMetadataEditor2) header).Flush();
        ((IWMMetadataEditor2) header).Close();
    }
  • Hmm.... I'm still getting errors when reading a second media file. I added the CloseStream function and I'm calling it before the end of my using block, as well. :-(
  • hi,
    a great article.
    I would appreciate it if you could help me in getting attributes of other files like *.mpeg, *.mov, *.rm.
    Any pointers to these would be very helpful.

    Regards,
    Sama
  • getting attributes of other files like *.mpeg, *.mov, *.rm.

    can be done with DirectX SDK, very simple... much easier to use than Windows Media Format SDK.


    to retrieve WM/VideoHeight and WM/VideoWidth Attributes from various wmv files. can be done through accessing the video WMVIDEOINFOHEADER
    and you'll have everything you want from a wmv. and that's in the WMF SDK as well.
  • Very good article
  • Interesting. I never knew i could do this.

    Is there any complete software for this ?
  • There's a bug in the COM wrapper vtable definition for IWMMetadataEditor2. You have Flush and Close in the wrong order and you left off OpenEx as well. Here is the correct code (sorry if the formatting doesn't look good):

    public interface IWMMetadataEditor2
    {
        // HRESULT Open(const WCHAR* pwszFilename);
        void Open([In, MarshalAs(UnmanagedType.LPWStr)] string pwszFilename);

        // HRESULT Close();
        void Close();

        // HRESULT Flush();
        void Flush();

        // HRESULT OpenEx(const WCHAR* pwszFilename, DWORD dwDesiredAccess, DWORD dwShareMode);
        void OpenEx([In, MarshalAs(UnmanagedType.LPWStr)] string pwszFilename, [In] uint dwDesiredAccess, [In] uint dwShareMode);
    }
  • Hi
    I need something that will help me get the lyrics of the song as I play it so that I can do karaoke to any song. Is there any metadata that could be used for this?

