Well, June is here so it must be time to make a post.  Let's talk about...  Hmmm... What should we talk about?  Hmmm....  How about WinFS!

You know WinFS... It's one of the three main pillars of Longhorn Client (Avalon, Indigo, and WinFS).  Or maybe you don't know much about WinFS.  I haven't seen any WinFS bloggers on this site yet and maybe you missed PDC.  So perhaps a short WinFS tutorial is in order. 

Consider the world of today.  The files on our hard drives are basically binary blobs.  We open the file with a Win32 CreateFile() call and then index into the byte stream.  That works just fine because your code knows that the jpg image starts just past the header at byte 186.  That's pretty much all you get from the file system: A stream of bytes.  Sad really, when you think about it.
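To make the "stream of bytes" point concrete, here is a minimal sketch in Python.  The 186-byte header is just the illustrative number from above (not a real JPEG constant), and the file layout and the read_payload() helper are made up for the demo.

```python
import os
import tempfile

HEADER_SIZE = 186  # illustrative offset from the post, not a real JPEG constant


def read_payload(path, offset=HEADER_SIZE):
    """Open the file as raw bytes and skip `offset` bytes of header."""
    with open(path, "rb") as f:
        f.seek(offset)   # the file system only understands byte positions
        return f.read()  # everything after the header is "the image"


# Build a toy file: 186 bytes of made-up header followed by the payload.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * HEADER_SIZE + b"IMAGE-BYTES")

payload = read_payload(demo_path)
os.remove(demo_path)
```

Notice that nothing in the file system tells the caller where the header ends.  That knowledge lives entirely in the calling code.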

It’s unreasonable to think that people could really write meaningful, rich applications with only this “byte indexing” scheme that the file system provides.  So application developers have worked around this file system shortcoming for years by providing “APIs” to their file formats.  Basically, the application developer creates a DLL which has some method called “GetImage()” for jpgs.  Then an ISV who wants to get at the image in the jpg file just calls the “GetImage()” method in the DLL, and the DLL encapsulates the fact that you need to index 186 bytes into the file to find the image.
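Here is roughly what that kind of wrapper looks like, sketched in Python rather than as a real DLL.  ToyImageFile and its get_image() method are hypothetical stand-ins for a vendor's "GetImage()" library, and the file format is invented for the demo.

```python
import os
import tempfile


class ToyImageFile:
    """Hypothetical vendor library for a made-up image format."""

    _HEADER_SIZE = 186  # layout detail the caller never has to know

    def __init__(self, path):
        self._path = path

    def get_image(self):
        """Return the image bytes, hiding the header offset from callers."""
        with open(self._path, "rb") as f:
            f.seek(self._HEADER_SIZE)
            return f.read()


def make_demo_file():
    """Write a toy file: 186 header bytes, then the image bytes."""
    fd, path = tempfile.mkstemp(suffix=".toy")
    with os.fdopen(fd, "wb") as f:
        f.write(b"H" * 186 + b"PIXELS")
    return path


path = make_demo_file()
image = ToyImageFile(path).get_image()  # the caller never mentions byte 186
os.remove(path)
```

The caller's code is cleaner, but it is now tied to this one library for this one format.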

Well, that's certainly an improvement over a stream of bytes.  But it is an imperfect solution at best.  Each ISV builds these “File Access APIs” in a different way.  If your application wants to read Adobe Acrobat files, then you have to hand-code knowledge of the Adobe Acrobat file access API into your code.  But what if you wanted to build an application which worked with many different file types in a rich way?  For example, what if you wanted to build a shell application something like Explorer.exe that could do rich queries across the 100,000 heterogeneous files you've got stored on your new 250 gig hard drive?

To accomplish this, you would need a standard mechanism for discovering the content in your files.  What would that look like?  Well, let’s start by thinking about how it might work in the world of SQL.  Say I've got an ERP database with 2,000 different tables.  I have tables for Customers, Vendors, Orders, Invoices, etc... 

I could fairly easily build a general purpose application which allowed me to browse over any of the tables in a generic, yet rich way.  My application could discover the list of tables with an sp_ command.  (And to be fair, the file system would let me get a list of files by traversing the directory structure).  But my SQL application could go way beyond the file system application.  My SQL application could call another sp_ command and get the column list for each table.  That's where the file system gets left in the dust.  The file system sees the file as a big binary blob, while SQL sees its relational data as structured.   My SQL browser could then fetch the data from each column, along with the datatype information and display it in a generic, yet fairly rich way.
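Here is a small sketch of that generic browser idea.  It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the sqlite_master catalog and PRAGMA table_info play the role of the sp_ commands mentioned above.  The Customers and Orders tables and the describe_database() helper are made up for the demo.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [(column_name, declared_type), ...]}."""
    # Step 1: discover the list of tables from the catalog.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # Step 2: discover each table's columns and datatypes.
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL)")

schema = describe_database(conn)
for table, columns in schema.items():
    print(table, columns)
```

The browser code knows nothing about Customers or Orders in advance.  Everything it needs comes from the database's own metadata, which is exactly what a plain file system can't offer.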

So the original WinFS vision (long, long ago) had as its goal to “move the Windows file system into SQL”.  But that (perhaps simplistic) approach didn't last long.  Sure, we could have moved the file system to SQL.  But a fundamental change to the OS file system is a big, big, traumatic deal.  We can only afford to make this kind of change perhaps once a decade.  Do we really want to inflict this pain, only to have applications using “SELECT Blah FROM filesystem” SQL commands to directly access file data for the next decade?  Is there something better?

If we only cared about browsing file system data (read only), then “SELECT blah” with a result set of primitive data returned might have been OK.  But this is our big chance; shouldn't we maximize our opportunity?  Raw data is dumb data...  In the next decade, people want to act on their data.  They want their data to have behavior.  When I find a jpg picture on my hard drive, I want to edit it.  I want to email it.  All these things require logic.  And that's something “SELECT blah” can't easily provide.

Hmmm...  data + behavior.  Maybe this CLR / Managed Code thing that Microsoft is working on might apply?

So there you have WinFS...  WinFS is about transforming the file system from a “dumb as a rock” byte stream into an active layer of persistent objects.  These persistent objects should be queryable in a rich fashion.  I should be able to easily go to a metadata catalog and get a list of all persistent object types.  I should be able to update these objects and be assured that any logic to validate the data is present in the persistent object.  And I should be able to update a group of related persistent objects in a transactional manner.
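To give a feel for "persistent objects with behavior", here is a toy sketch in Python.  It is emphatically not the WinFS API: the Photo type, the ItemStore class, and all of their methods are hypothetical names invented for illustration, with an in-memory SQLite database standing in for the store.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Photo:
    """Hypothetical persistent item type; not a real WinFS class."""
    name: str
    width: int
    height: int

    def validate(self):
        # The validation logic travels with the object, not with each app.
        if self.width <= 0 or self.height <= 0:
            raise ValueError(f"{self.name}: dimensions must be positive")


class ItemStore:
    """Toy store standing in for the persistence layer."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE Photo (name TEXT PRIMARY KEY, width INTEGER, height INTEGER)")

    def save_all(self, photos):
        """Validate and save a group of photos; all succeed or none do."""
        with self._conn:  # commits on success, rolls back on an exception
            for photo in photos:
                photo.validate()
                self._conn.execute(
                    "INSERT INTO Photo VALUES (?, ?, ?)",
                    (photo.name, photo.width, photo.height))

    def find(self, min_width):
        """Query by a property and get objects back, not raw rows."""
        rows = self._conn.execute(
            "SELECT name, width, height FROM Photo WHERE width >= ? ORDER BY name",
            (min_width,))
        return [Photo(*row) for row in rows]


store = ItemStore()
store.save_all([Photo("beach.jpg", 1600, 1200), Photo("icon.jpg", 32, 32)])
try:
    store.save_all([Photo("ok.jpg", 800, 600), Photo("bad.jpg", 0, 600)])
except ValueError:
    pass  # the whole second batch is rolled back, including ok.jpg

wide = store.find(min_width=1000)
everything = store.find(min_width=0)
```

The second batch fails validation on bad.jpg, so ok.jpg is never stored either.  The query hands back Photo objects that still carry their validate() logic with them.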

WOW.... that’s really cool!  Persistent objects with behavior.  What a concept!  Oh... wait a minute.  That sounds like MBF.  MBF (the Microsoft Business Framework) is an environment which allows you to create a set of rich business objects (with behavior) which can be persisted to a database.

Hmmm...  Is there more to this story?  How do MBF and WinFS relate?  Inquiring minds want to know!

I guess you'll just have to wait for my next post.  (Don't you just love cliffhangers?)