What better motivation to get back into blogging than a challenge from fellow Microsoftie Dare:

The WinFS folks and Longhorn evangelists will probably keep focusing on what I have termed “bad scenarios” because they demo well but I suspect that there'll be difficulty getting traction with them in the real world.

 

I’m willing to try my hand at coming up with some not-bad scenarios, or maybe even some good scenarios, but first I want to repeat some of the principles I covered in my older blog posts.  Or at least I hope I covered them there ;)

 

The flurry of WinFS “metacrap” posts seems to have started with Simon Fell asking “Why will tagging 100 photos with 'Wedding' make things magically better than having the photo's in a 'Wedding' directory?”, which led to Scoble posting about his dream scenarios.  Dare responded with several posts of his own: the first, quoted above, responding to Scoble's scenarios, and then a follow-up asserting that “Effectively tagging the content so it can be categorized in a way you can do interesting things with it search-wise is unfeasible”.  He also linked to Cory Doctorow’s 2001 write-up on why meta-data is more often meta-crap.

 

I agree in principle with some of these criticisms of big, huge dream scenarios.  We are a long way from the day when any file on any website is filled with valuable meta-data in a schema that every PC can understand.  It may well be that we never get there.

 

But that’s okay, because my dream scenarios for WinFS aren’t quite that grandiose.  I’ll be satisfied if WinFS helps me find, relate and act on my information, in a way that makes sense to me.  Bonus points if it helps me find, relate and act on information created by the people with whom I interact most closely: my family, friends and co-workers.

 

I think there are plenty of compelling benefits that open up with even a slight amount of meta-data on the files I work with every day.  What’s more, I think there are ways to mitigate the concerns Cory and others raise about people generally being lying, lazy, stupid self-deceivers.  (* Applies to meta-data only; no promise about helping lying, lazy, stupid, self-deceiving politicians.)

 

My first assertion is that meta-data on a local, individual scale is interesting enough without asking all these questions about how it will scale to the entire Internet (although they’re good questions that we should address over time.)  If I can organize my own personal information in a better way than what I get today with the filesystem, that's a win.

 

My second assertion is that in many cases, people actually are creating accurate meta-data today.  At Microsoft, for example, every slide deck from PDC is named something like “DATA201 Clark.ppt”, including the session title and speaker’s last name.  Not only that, they are all stored in a folder named, IIRC, “2003-10-27 PDC”.  There’s some interesting, accurate meta-data.  Another example: almost every feature specification at Microsoft uses some sort of template, at the top of which are a bunch of fields like Feature Name, Program Manager, Tester, Developer, Milestone, Review Status, etc.  These fields all tend to be filled out accurately (or at least they start out accurate, and then decay over the years during which the product is actually built.)  The decay is a problem to address, but hey, the meta-data is there.  Every photo on my hard drive has useful metadata built in, whether it’s the timestamp in the EXIF header, the filename (I use the XP photo wizard to get names like “Winter2003Holidays01.jpg”), or the folder path that leads to them (“\photos\family\Thanksgiving in LA”).  My Money2004 file is filled with great meta-data, most of which I didn't even have to enter, about credit card charges and checks I've written.  My calendar in Outlook also is filled with accurate meta-data, including the time, location and subject of almost everything I do (and in many cases, it also has the list of other people who participated in the activity or meeting.)
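To make that a bit more concrete, here is a rough sketch (in Python, nothing to do with any actual WinFS API) of how the meta-data hiding in those deck names and folder names could be pulled back out.  The patterns and the `deck_metadata` helper are my own guesses based on the two examples above, and the `C:\decks` prefix in the usage line is made up.

```python
import re
from datetime import date
from pathlib import PureWindowsPath

# Hypothetical patterns, guessed from the two examples in this post:
#   file name   "DATA201 Clark.ppt"  -> session + speaker's last name
#   folder name "2003-10-27 PDC"     -> date + event
DECK_RE = re.compile(r"^(?P<session>[A-Z]+\d+) (?P<speaker>\w+)$")
FOLDER_RE = re.compile(r"^(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2}) (?P<event>.+)$")


def deck_metadata(path: str) -> dict:
    """Recover whatever meta-data is encoded in a deck's file and folder names."""
    p = PureWindowsPath(path)
    meta = {}

    # The file name without its extension carries the session and speaker.
    deck = DECK_RE.match(p.stem)
    if deck:
        meta["session"] = deck["session"]
        meta["speaker"] = deck["speaker"]

    # The immediate parent folder carries the date and the event name.
    folder = FOLDER_RE.match(p.parent.name)
    if folder:
        meta["date"] = date(int(folder["y"]), int(folder["m"]), int(folder["d"]))
        meta["event"] = folder["event"]

    return meta


print(deck_metadata(r"C:\decks\2003-10-27 PDC\DATA201 Clark.ppt"))
```

Anything that doesn't follow the naming convention simply comes back with no fields, which is about what you'd expect from meta-data that lives only in people's naming habits.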

 

See?  Plenty of handy meta-data, but today it’s pretty much inaccessible to any kind of centralized search and organization tool.  It’s mostly not in headers or OLE doc props; it’s encoded into the filesystem and file streams, because today that’s pretty much all we’ve got: files and folders.  Sure, we’ve tried to come up with better alternatives, but they haven’t shown any real benefits.  In Eric Newton’s response to Dare’s metacrap posting, he notes that “people didnt use office's meta data because frankly it wasmt on the beaten path. and frankly most people just simply arent organized and dont care to be organized, until they want to find something.”

 

My third assertion is that even where explicit meta-data isn’t just lying around in the filesystem, meta-data can in some cases be inferred.  In comments to Scoble’s post, Richard Talent writes “The real win of WinFS will be that there are multiple contributors of that metadata: for instance, my address book knows how to spell names of people I know, why should my photo software require me to duplicate that effort?”  Another commenter, Malach, says “your PIM has ‘22 December, Aspen, Skiiing’ and you take a lot of photos in that day, then whatever handles the meta data management side of things should be able to put one and one together and ask you if they equal two”.  I think this is the right direction.
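Malach's idea is simple enough to sketch.  What follows is only an illustration in Python: the `Appointment` and `Photo` classes and the `suggest_tags` function are hypothetical stand-ins I made up, not anything from Outlook or WinFS, and the year and times in the sample data are invented to match the "22 December, Aspen, Skiing" entry.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Appointment:
    """Hypothetical stand-in for a calendar entry."""
    start: datetime
    end: datetime
    subject: str
    location: str


@dataclass
class Photo:
    """Hypothetical stand-in for a photo and its EXIF timestamp."""
    filename: str
    taken: datetime


def suggest_tags(photo, calendar):
    """Return (subject, location) pairs for appointments that were
    under way when the photo was taken.  These are suggestions to
    show the user for confirmation, not tags to apply silently."""
    return [
        (appt.subject, appt.location)
        for appt in calendar
        if appt.start <= photo.taken <= appt.end
    ]


# Sample data; the year and times are assumptions for illustration only.
calendar = [
    Appointment(datetime(2003, 12, 22, 9, 0), datetime(2003, 12, 22, 17, 0),
                "Skiing", "Aspen"),
]
photo = Photo("IMG_0001.jpg", datetime(2003, 12, 22, 11, 30))

for subject, location in suggest_tags(photo, calendar):
    print(f"Tag {photo.filename} with '{subject}' / '{location}'?")
```

Note that the function only proposes matches.  Asking the user "do these equal two?" is the part that keeps inferred meta-data from turning into yet more metacrap.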

 

Okay, so we’re back to the same challenge from Simon and Dare.  Can we come up with scenarios that are so good that they motivate people to accurately capture meta-data into WinFS ahead of time?  And why would meta-data in WinFS be more useful than meta-data encoded into the file system (Simon’s example of the folder called Wedding)?

 

I’ll try to post a few compelling scenarios in the next week.  The foundation for them all, of course, is WinFS.  All my scenarios will take advantage of this new item storage functionality that defines a common place to store data and meta-data, a common, discoverable schema for meta-data, and a data-model that allows you to establish relationships between items.