Hi there, my name is Pat Miller, and I am the development lead for the Enterprise Metadata / Taxonomy features in SharePoint 2010. I've been working on the ECM team and its forebears for the better part of 11 years now: first with NCompass Labs, which was acquired by Microsoft in 2001, then on the Content Management Server team, and then with the CMS team as part of MOSS 2007. This is the first of many blog posts on the Enterprise Metadata Management (EMM) system in the 2010 release. This post gives an overview of the system; future posts will drill into specific areas such as event receivers, field editing and search refinements.
First, some background. During the development of Content Management Server 2002, we spent some time with the folks who run the Microsoft.com set of websites. One of the things they were very keen on was a taxonomy system they had built. It seemed fairly useful, and we considered implementing something like it, but we didn't have the time, and there was a general concern that no one would actually do the work of tagging data. During the development of MOSS 2007, we spent most of our time rewriting our feature set to run on top of SharePoint, and once again taxonomy fell off the list of things we were willing to tackle (and people still consistently told us that users just don't tag).
Around this time, people started tagging things in their own world. The rise of digital cameras and MP3 players brought a huge amount of data that, for the most part, had to be marked up with metadata to be searchable. Some metadata was added to the files automatically (date, size, camera model, etc.), but user-specific information wasn't there. You quickly learned that if you categorized your files (either by folder location or by tags), you could navigate tens of thousands of them (images, music, etc.) in the way that works for you personally, rather than relying on default information like the date a picture was taken. People became more familiar with the concept of navigating their content via metadata: "Let's listen to all my Pearl Jam albums", "I feel like listening to Electronica", "Find me photos of Dad". It's only a small step from there to wanting to impose some sort of hierarchy: find me photos of my whole family, or of my extended family; I want to listen to all classical music, or perhaps just music from the Baroque period. Tagging all that data really unlocked a lot of potential.
Perhaps the landscape had changed...
We decided to run with it in the 2010 release. We tried to let a few main tenets guide us:
To that end, we set out to enable a bunch of new user scenarios for SharePoint 2010.
We started the release with a blank sheet of paper and some very knowledgeable people in the information management space. We also found that most people started twitching uncontrollably when the word "ontology" was mentioned. "Tagging" was fine and "metadata" was OK, but at "taxonomy" they started looking for the exit. Telling people that a taxonomy was just a hierarchy calmed them down, but the whole ontology thing was too much of a stretch. Ontologies also complicated things considerably, and we could still get a huge amount of value out of a taxonomy alone, so that became our starting point.
Some features were very obvious: filtering list views based on hierarchy inclusion, search refinement, etc. Others were a small step beyond that: if you have a consistent vocabulary across an enterprise, you can start to do some interesting things. You can match areas of expertise to specific content or workflows. You can start to relate content in totally different systems based on something with more context than a simple string. What if you could relate your analytics content to your taxonomy system and get a real-time view of which topics people are viewing, instead of simply guessing based on the content's position in a URL namespace? How about overlaying your security model with your metadata, so that people have rights to view content based on the metadata applied to it? First, though, we had to get down to business, focus our resources, and ship a compelling collection of features.
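Filtering by hierarchy inclusion means that picking a term also matches content tagged with any term beneath it, so "classical" finds Baroque pieces too. The sketch below is a minimal, hypothetical Python model of that idea and is not the SharePoint API. The `Term`, `Taxonomy` and `filter_by_hierarchy` names are illustrative assumptions. It also assumes items are tagged by stable term IDs rather than label strings, which is what lets separate systems match on more context than a simple string.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    """A node in a hypothetical in-memory taxonomy (not the SharePoint API)."""
    term_id: str
    label: str
    parent_id: Optional[str] = None


@dataclass
class Taxonomy:
    terms: Dict[str, Term] = field(default_factory=dict)

    def add(self, term_id: str, label: str, parent_id: Optional[str] = None) -> None:
        self.terms[term_id] = Term(term_id, label, parent_id)

    def descendants_of(self, root_id: str) -> Set[str]:
        """Return root_id plus the IDs of every term beneath it."""
        children: Dict[str, List[str]] = {}
        for term in self.terms.values():
            if term.parent_id is not None:
                children.setdefault(term.parent_id, []).append(term.term_id)
        result: Set[str] = set()
        stack = [root_id]
        while stack:  # iterative depth-first walk down the hierarchy
            current = stack.pop()
            if current not in result:
                result.add(current)
                stack.extend(children.get(current, []))
        return result


def filter_by_hierarchy(items: Dict[str, Set[str]], taxonomy: Taxonomy, root_id: str) -> List[str]:
    """Keep items tagged with root_id or any descendant term (hierarchy inclusion)."""
    wanted = taxonomy.descendants_of(root_id)
    return sorted(name for name, tag_ids in items.items() if tag_ids & wanted)


tax = Taxonomy()
tax.add("music", "Music")
tax.add("classical", "Classical", parent_id="music")
tax.add("baroque", "Baroque", parent_id="classical")
tax.add("electronica", "Electronica", parent_id="music")

# Each item is tagged with term IDs, not free-text labels.
library = {
    "Bach: Brandenburg Concertos": {"baroque"},
    "Beethoven: Symphony No. 5": {"classical"},
    "Ambient mix": {"electronica"},
}

print(filter_by_hierarchy(library, tax, "classical"))
# ['Bach: Brandenburg Concertos', 'Beethoven: Symphony No. 5']
print(filter_by_hierarchy(library, tax, "baroque"))
# ['Bach: Brandenburg Concertos']
```

Storing IDs instead of labels also means a term can be renamed without re-tagging any content.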
With that in mind, we came up with the following components in the system:
The taxonomy repository itself, which we call the Term Store. Some companies have very strict, top-down taxonomies, so some term stores may allow only a very few people to edit them. We'll therefore have to support multiple term stores.
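To make the governance point concrete, here is a small hypothetical Python model in which each term store carries its own set of editors. `TermStore`, `add_term`, and the store and user names are illustrative assumptions, not SharePoint's object model. The only point is that a strict, top-down store and a more open team store can coexist with different edit rights.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TermStore:
    """Hypothetical model of one term store and who may edit it."""
    name: str
    editors: Set[str] = field(default_factory=set)
    terms: Set[str] = field(default_factory=set)

    def add_term(self, user: str, label: str) -> None:
        # A strict, top-down store limits edits to a small editor group.
        if user not in self.editors:
            raise PermissionError(f"{user} cannot edit term store {self.name!r}")
        self.terms.add(label)


# Several independent stores, each with its own governance.
stores: Dict[str, TermStore] = {
    "corporate": TermStore("corporate", editors={"taxonomist"}),
    "team": TermStore("team", editors={"alice", "bob", "taxonomist"}),
}

stores["team"].add_term("alice", "Project Phoenix")
try:
    stores["corporate"].add_term("alice", "Project Phoenix")
except PermissionError as err:
    print(err)  # alice cannot edit term store 'corporate'
```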
The taxonomy system needs to be able to support a complex enterprise. A simple flat list of strings isn't going to be sufficient. To that end, we support the following concepts and behaviors:
OK, that's a nice set of features in the taxonomy system. What do we want to do with all those terms and term sets?
The next set of features involves integrating the taxonomy system with SharePoint. The primary place this happens is the new managed metadata field type. Think of it as a choice field that went to the gym: it's much more powerful. The managed metadata field type is a normal field that can be applied to any content type (list or document library). However, it has a few nice things associated with it:
Once data is in SharePoint, other SharePoint features can deliver additional goodness:
Now that we have all that nice consistent metadata on our content, we can do a few more things:
And since we know that we can't possibly implement every feature that everyone would want, everything is accessible through our API. In future blog posts, we'll go over how to use this API to deliver some compelling features.
Hopefully this is a nice introduction to the work we did around taxonomies and enterprise metadata. We had a lot of fun coming up with the design and implementation, and hope that it resonates with you.
Thanks for reading.
Pat.Miller at Microsoft.com