I've been scratching an itch - of the coding variety.
For some while now, I've wanted to catalogue all the Windows Media files that I've (legally) ripped over the last year or so, but documentation on MSDN or the broader Internet was scant. So I've hacked together a rather nice little managed wrapper that makes it really easy to get at the metadata in any WMA file. You can use the wrapper to pull a single attribute out of a file, or to recursively trawl through a directory structure and build a strongly-typed dataset containing all the populated attributes from every file.
I think it's rather cool, even if I say so myself! Rather than drone on at greater length here, I've put an article online which describes the code and provides a download location for the sample code. Have fun with it and let me know if you put it to any interesting uses.
The one where Tim sells out to the marketing droids...
OK, so this is more of a sales pitch than I would normally include in my blog, but I've heard a couple of people over the last week say that they thought TechEd was primarily aimed at IT Pros rather than developers, and I wanted to correct that perception. In Europe, at least, TechEd actually has a slight bias towards developers (mostly because there's another, slightly smaller event called IT Forum that's pure IT Pro territory).
I'm responsible for the content of TechEd Europe this year - perhaps the most exciting and daunting challenge I've had since joining Microsoft, and I'm hugely keen to ensure that it's the best event we can put on for developers, architects and IT Pros. I'd love to hear your ideas about how we could make the event more worthwhile. Whether you've attended in previous years and think we're missing something big, or whether you've never attended because it didn't seem to meet your needs, I'd be very interested to read your feedback.
Anyway, having declared my vested interest, here's my non-marketing marketing pitch :-)
Hope to see some of you there...
In this article, I'll describe how to use the Windows Media Format SDK to access the metadata embedded in Windows Media files for cataloguing purposes. Also included are two managed classes written in C# that vastly simplify the use of this SDK.
Download MediaCatalog 1.0 (35KB)
Introduction
Over the last year, I've been gradually filling a spare hard disk with rips of all my CDs. It's fantastic to be able to play any CD from my catalogue so easily, and it means I can hide the CDs themselves away from my young daughter's sticky fingers! The problem is that as my digital collection has accumulated, it's getting harder to see what I've got. I've painstakingly tagged all my CDs with metadata, but Windows doesn't currently provide any easy mechanism to sort or manipulate that metadata. So I thought I'd follow Duncan Mackenzie's example and hack together a media cataloguing application.
The trouble is that it's difficult to extract the metadata from a Windows Media file. The Windows Media Player SDK provides a nice interop library you can use to embed Windows Media Player in your managed application and drive it programmatically, but I definitely wanted to avoid driving GUI controls, given the number of files I want to catalogue. Instead, I fired up MSDN Library and discovered the Windows Media Format SDK, a low-level API into the file format itself.
This SDK isn't easy to program against from managed code, however - it's pretty grungy COM interop. Fortunately, with the aid of MSDN, Adam Nathan's .NET and COM book and a quick look at some pretty dodgy samples, I was able to build a fairly clean managed wrapper that provides a straightforward interface into the SDK. Ironically, I haven't finished writing the graphical front-end catalogue application that generated the itch in the first place, but I thought the managed library was interesting enough in its own right to share.
Using MediaCatalog
I've divided up the managed library into a high-level API and a low-level API. The low-level API is a class that allows you to open a media file, examine the attributes by index or name, and enumerate through them using a foreach loop. The high-level API builds on that class and provides methods to allow recursive or non-recursive iteration through a directory structure, creating a strongly-typed DataSet object that contains all the most common attributes in the audio files it finds. You could bind the output to a Windows Forms DataGrid, for example, and indeed the sample test harness included with the code does exactly that.
Low-Level API
To access the low-level API, you instantiate an object of type MetadataEditor, passing the constructor the filename of the media file you're interested in. You can then either enumerate through the object using a foreach statement, or query it by name or using an indexer. The object supports an int-based indexer or alternatively an enum-based indexer that simplifies access using common attributes. The following C# code sample demonstrates each of these choices.
using (MetadataEditor md = new MetadataEditor("britney.wma"))
{
    // Enumerate through each of the attributes in the file
    foreach (Attribute attr in md)
    {
        Console.Write(attr.Name);
        Console.Write(": ");
        Console.WriteLine(attr.Value.ToString());
    }

    // Set author to the author of the media file, looked up by attribute name
    string author = md.GetAttributeByName("ID3/TPE1");

    // Set d to the duration of the media file (e.g. 3m 45s)
    TimeSpan d = md[MediaMetadata.Duration];
}
Remember that since this object uses unmanaged resources, it's important to call the MetadataEditor.Dispose() method when you've finished using it in order to close the underlying resources. Alternatively wrap it inside a using statement as demonstrated above.
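If you haven't met the pattern before, here is a minimal sketch of how a wrapper can release an unmanaged resource deterministically. NativeBuffer is a hypothetical stand-in that owns a block of unmanaged memory; it is not the MediaCatalog code, which holds Windows Media Format SDK interfaces instead.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a wrapper that owns an unmanaged resource.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public NativeBuffer(int size)
    {
        handle = Marshal.AllocHGlobal(size);
    }

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    // Releases the memory straight away; safe to call more than once.
    public void Dispose()
    {
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;
        }
        GC.SuppressFinalize(this);
    }

    // Safety net in case the caller forgets to call Dispose().
    ~NativeBuffer()
    {
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
        }
    }
}

public static class DisposeDemo
{
    // Returns true if leaving the using block released the resource.
    public static bool Run()
    {
        NativeBuffer buffer = new NativeBuffer(256);
        using (buffer)
        {
            // Work with the buffer here.
        }
        return buffer.IsDisposed;
    }
}
```

The using statement calls Dispose() even if the body throws, which is why it is usually preferable to calling Dispose() by hand.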
High-Level API
This API contains three main methods that can be used to extract album information across multiple directories if necessary:
As a quick example, the following C# code snippet binds a Windows Forms DataGrid to the output of RetrieveRecursiveDirectoryInfo:
MediaDataManager mdm = new MediaDataManager();
musicData = mdm.RetrieveRecursiveDirectoryInfo(@"\\timserver\music");
mediaInfo.DataSource = musicData;
mediaInfo.DataMember = "Track";
The MediaDataManager object also exposes an event that can be used to track progress (particularly useful during a long recursive directory search). Use the following syntax to enable it:
mdm.TrackAdded += new MediaDataManager.TrackAddedEventHandler(mdm_TrackAdded);
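To show the shape of the pattern end to end, here is a self-contained sketch. TrackScanner, TrackAddedEventArgs and their members are hypothetical stand-ins invented for illustration; the real signature of MediaDataManager.TrackAddedEventHandler is defined in the download and may well differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event data: the file that has just been catalogued.
public class TrackAddedEventArgs : EventArgs
{
    public readonly string FileName;

    public TrackAddedEventArgs(string fileName)
    {
        FileName = fileName;
    }
}

// Hypothetical stand-in for MediaDataManager; only the event pattern matters.
public class TrackScanner
{
    // Nested delegate type, echoing the MediaDataManager.TrackAddedEventHandler naming.
    public delegate void TrackAddedEventHandler(object sender, TrackAddedEventArgs e);

    public event TrackAddedEventHandler TrackAdded;

    public void Scan(IEnumerable<string> files)
    {
        foreach (string file in files)
        {
            // Copy to a local so an unsubscribe on another thread cannot null it mid-call.
            TrackAddedEventHandler handler = TrackAdded;
            if (handler != null)
            {
                handler(this, new TrackAddedEventArgs(file));
            }
        }
    }
}

public static class ProgressDemo
{
    // Counts tracks by listening to the event, as a progress display might.
    public static int CountTracks(IEnumerable<string> files)
    {
        int count = 0;
        TrackScanner scanner = new TrackScanner();
        scanner.TrackAdded += delegate(object sender, TrackAddedEventArgs e) { count++; };
        scanner.Scan(files);
        return count;
    }
}
```

In a Windows Forms application the handler would typically update a status bar or progress indicator rather than a counter.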
Things To Do
The wrappers aren't complete by any means, and I'd love to hear your suggestions of how they might be improved (or even some code!). Several things on my own personal list:
I've just come across a nasty bug in some sample code (from us, I'm ashamed to say), that highlights the pitfalls of passing string buffers between managed and unmanaged code.
To go back a step or two, I've been trying to create a small application to pull metadata out of Windows Media files so that I can catalogue my music collection. (Incidentally, there are several supported ways to achieve this, including the Windows Media Player SDK and the Windows Media Format SDK.) I'd come across this little function that iterated through all the metadata attributes in a file and dumped them to the console. But for some reason, the function only seemed to be printing the attribute names and not the associated values. The statement looked something like this:
Console.WriteLine("* {0, 3} {1, 25} {2, 3} {3, 3} {4, 7} {5}", wIndex, pwszName, wStream, wLangID, pwszType, pwszValue);
According to the debugger, I was seeing the contents of wIndex and pwszName, but none of the other parameters. Stranger still, when I preceded the Console.WriteLine call with a similar call to MessageBox.Show, the function printed all the parameters. Needless to say, when you get into the kind of debugging situation where you're seeing truly unexpected results, you often disappear down a blind alley trying to solve a problem that doesn't exist. In my case, I started testing the hypothesis that it was a timing issue that the message box display eradicated; I wasted several hours experimenting with wait loops and searching through the documentation for references to file status that with hindsight couldn't have fixed the problem.
Suddenly it came to me in a flash: the debugger was showing the value of pwszName as "Duration\0". Of course! There was a null-termination character at the end of the string that shouldn't have been there. It wasn't that the call to Console.WriteLine didn't contain the right parameters - it was simply seeing the \0 and terminating the string at that point. MessageBox.Show obviously deals with this differently.
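The underlying point is that a .NET string carries its own length, so '\0' is just another character as far as managed code is concerned. The following sketch shows this in pure managed code; NullCharDemo and UpToFirstNull are names made up for this illustration and are not part of the sample.

```csharp
using System;

public static class NullCharDemo
{
    // Returns the text before the first NUL, which is how a C-style API reads the buffer.
    public static string UpToFirstNull(string s)
    {
        int i = s.IndexOf('\0');
        return i < 0 ? s : s.Substring(0, i);
    }

    public static void Run()
    {
        string raw = "Duration\0";
        Console.WriteLine(raw.Length);                        // 9: the NUL is counted
        Console.WriteLine(raw == "Duration");                 // False
        Console.WriteLine(UpToFirstNull(raw) == "Duration");  // True
    }
}
```

Anything that hands the string to null-terminated native code stops at the first '\0', while anything that respects the managed length carries on past it.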
So how had pwszName got created like this? Looking back at the sample code that generated the values, I saw something like the following:
string pwszName = null;
ushort wNameLen = 0;

// First call: no buffer supplied, so only the required lengths are returned
HeaderInfo3.GetAttributeByIndex(
    wAttribIndex, ref wStreamNum, pwszName, ref wNameLen,
    out wAttribType, pbAttribValue, ref wAttribValueLen );

// Second call: fill a string pre-populated with wNameLen null characters
pwszName = new String( (char)0, wNameLen );
HeaderInfo3.GetAttributeByIndex(
    wAttribIndex, ref wStreamNum, pwszName, ref wNameLen,
    out wAttribType, pbAttribValue, ref wAttribValueLen );
It's pretty clear from this piece of code what's wrong: the creator (presumably a C++ programmer judging by the code style) has called the function once to determine the length of the retrieved string and then called it a second time to fill a pre-populated string. They forgot to trim the final null value(s), with a statement such as the following:
pwszName = pwszName.TrimEnd('\0');
Even this is not a great way of handling string buffers. A far better approach is to use the System.Text.StringBuilder class - a mutable string type that can be passed wherever an API function expects a writable string buffer. Rather than trimming the returned string, I rewrote the API declaration to use a StringBuilder rather than a fixed-length string and changed the sample code accordingly:
StringBuilder pwszName = null;
ushort wNameLen = 0;

// First call: no buffer supplied, so only the required lengths are returned
HeaderInfo3.GetAttributeByIndex(
    wAttribIndex, ref wStreamNum, pwszName, ref wNameLen,
    out wAttribType, pbAttribValue, ref wAttribValueLen );

// Second call: pass a StringBuilder with enough capacity to receive the name
pwszName = new StringBuilder(wNameLen);
HeaderInfo3.GetAttributeByIndex(
    wAttribIndex, ref wStreamNum, pwszName, ref wNameLen,
    out wAttribType, pbAttribValue, ref wAttribValueLen );
The moral of the story: whenever you need to pass a string buffer to a Windows API call, use StringBuilder. (Of course, string is just fine if the unmanaged function doesn't modify its contents.) And if you're wondering why a string is being prematurely truncated, make sure you check for rogue null-termination characters!
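The same two-call pattern works with a plain Win32 function, which makes it easy to try outside the Windows Media Format SDK. This is a sketch using GetTempPath from kernel32.dll; the TempPathDemo class and GetTempPathViaWin32 method are names invented for the example, and it simply returns null when not running on Windows.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class TempPathDemo
{
    // Win32 GetTempPath: writes the path into lpBuffer and returns a character count.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetTempPath(uint nBufferLength, StringBuilder lpBuffer);

    // Returns the temp directory on Windows, or null elsewhere or on failure.
    public static string GetTempPathViaWin32()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return null;
        }

        // First call: no buffer, so the return value is the buffer size required.
        uint required = GetTempPath(0, null);
        if (required == 0)
        {
            return null;
        }

        // Second call: the marshaler copies characters back only up to the terminator.
        StringBuilder buffer = new StringBuilder((int)required);
        uint written = GetTempPath(required, buffer);
        if (written == 0 || written > required)
        {
            return null;
        }

        return buffer.ToString();
    }
}
```

Because the marshaler stops copying at the terminator, the resulting string needs no trimming, which is the whole attraction over a pre-populated string.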
I love this site - I've been wanting to create something similar for a couple of years:
Which countries in the world have you visited? Here's the link you can use to create your own "visited country" map. (Thanks to Paul Bartlett for the link.)