IMMSharepoint - Video Accent

How long should it take you to build out an IMM site like this?

What about if you need to extend our ontology because it doesn’t meet all of your needs?

Just last week we had to do this very thing for one of our internal customers here at Microsoft. In the process a few things really hit us. First: we knew our documentation was poor, but when all of the customizations we did turn out to be undocumented, that is just a sad thing. Second: it only took us about 4 days. This is how long we want it to take when we deploy to customers; however, our customer deployments are nowhere near this timeframe. This is primarily related to my first point.

So what follows is a primer on the Media Library web part. In case you are wondering, the Media Library web part is that thing highlighted in red in the screenshot above.

What do you mean that's not a list, I thought everything was a list in SharePoint?

As some of you may have already thought about or experienced, SharePoint can be pretty unforgiving when working with hundreds of large files. So how then, with IMM being built on SharePoint, do we provide you the ability to manage, dare I say “millions”, of video files through a single SharePoint site? Does IMM use the external storage API identified in KB938499?

While you are pondering that, let us also take a look at what metadata is required to work with these media files. After all, what good is a 4 GB movie if you don't know if it's The Matrix, Jaws, or my high school prom video? While we can all agree on the title, how do we decide how to capture the rest of the metadata? There are already two very popular websites that solve this problem: The Internet Movie Database (IMDb) and Wikipedia. The two websites take very different approaches. Since IMM is built on SharePoint, the obvious choice would have been to store all data in a SharePoint list and use our own content types. While this approach would have given us the ability to inherit content types and allow our customers to add their own metadata onto each list, we still come back to a fundamental problem: performance. As we all know, this would have slowed down SharePoint to the point where it was a very poor experience.

So as you already guessed, we aren’t using a list to store our data. Instead, what we decided to do with IMM is to store all of our metadata in Resource Description Framework (RDF). RDF is a specification outlined by the W3C at http://www.w3.org/RDF/. Since RDF by itself can be a very unstructured way of describing things, the Web Ontology Language (OWL) (http://www.w3.org/2004/OWL/) was created by the W3C to provide a vocabulary so that things could be described consistently and related to each other. In this article I don't want to go into all of the details on why we chose RDF over using a SharePoint list or something else. That is saved for another day.

Do I still have your attention? Are you wondering how we store RDF metadata in SharePoint? We don’t store our RDF metadata in SharePoint; we use our own proprietary RDF store. Where then do we store our large essence files? The typical scenario for this is a large SAN; however, this could be another Digital Asset Management (DAM) system or even a tape. So there you have it: we do not use any lists in SharePoint to store our data!

Getting the data about the data

The media library provides you a glimpse into the metadata repository. It's not meant to show you the whole thing, but just the part that you need to know in order to do your job. It's meant to help you see the forest for the trees, or however that saying goes. The media library consists of views. A view is simply two things: 1) the query on how to get the data, and 2) the way that the grid should look. This allows us to have multiple ways of looking at the same set of data. This view here is the default view that we use for our demos.

Media Library - Default View

While this is the same media library in a different view.

Media Library - Additional Details View

As you can see, this is the same data, just a different way of looking at it. In a typical SharePoint environment you would define your views at the list level and would have somewhat limited options available to you unless you created your own XSLT. With IMM we do things a little differently. Our media library doesn't pull its configuration from SharePoint. Instead, it is all stored inside of an XML file whose location is defined in the web.config file. By default this configuration file is stored in ~\wpresources\ImmMediaLibrary\LibraryTemplates.xml. Rather than going into detail on every piece of this file, I'll highlight the key sections.

LibraryTemplateXML
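Stripped of the XML, a view as described above boils down to a query plus a column layout. The following is only a minimal sketch of that idea in Python, not how IMM stores or loads its views: the `View` and `Column` classes, the `render_row` helper, and the placeholder query text are all hypothetical names made up for illustration, and the description text is shortened from the sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    header: str    # text shown at the top of the grid column (hypothetical)
    bound_to: str  # name of the query result variable to display (hypothetical)


@dataclass(frozen=True)
class View:
    name: str
    query: str      # 1) how to get the data
    columns: tuple  # 2) how the grid should look


def render_row(view, row):
    """Return the cell values for one result row, in this view's column order."""
    return [row.get(col.bound_to, "") for col in view.columns]


# Placeholder text only; this is not a real IMM query.
SHARED_QUERY = "SELECT ?rowId ?title ?description WHERE { ... }"

default_view = View("Default", SHARED_QUERY, (Column("Title", "title"),))
details_view = View(
    "Additional Details",
    SHARED_QUERY,
    (Column("Title", "title"), Column("Description", "description")),
)

row = {
    "rowId": "guid:_c0ea9f7d-02f1-4ea6-868b-e82855c4ed73",
    "title": "Gelatin Video",
    "description": "This is a video about Gelatin.",
}

print(render_row(default_view, row))
print(render_row(details_view, row))
```

Both views share the same query and differ only in their columns, which is the point of separating the two halves.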

As with all large XML files the first step is to know what you need to pay attention to when. There are usually two different reasons to be editing this file. You are going to either be defining your view/grid/columns or defining your data source. This is why I like to use an XML editor where I can collapse all unnecessary elements as I have done above.

What is a Column?

As you can see from the example above, a Grid is made up of Columns. This naturally leads to two questions: what kinds of columns are supported out of box, and how can I make my own Column? The latter question is the more important one. So what do we ship with? Since we are currently sold as a "solution" we only ship with two core column types: Menu and Workflow Selector. Menu is simply a column that can be bound to a result column and contain drop-down menus that contain actions.

Menu Field

The next field is the Workflow Selector. This field is simply a checkbox that can be tied back to the workflow drop-down menu. Now you may be wondering how useful the media library is if we only ship with those two columns. Remember how I mentioned that we are sold as a "solution"? This means that this isn't an out of box piece of software that you can buy, like SharePoint or Exchange, but is something that requires customization to fit into your scenario. As a part of this customization we ship with sample code and extensions that have more column types. Included in these are two very handy column types called Html Field and Proxy Thumbnail Field. The Html Field column is one of the more flexible ones: it allows us to customize the grid with HTML markup that is based upon the type of result being returned. So for example if a TV Season is returned we can display it differently than a Movie.
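To make the idea of type-dependent markup a little more concrete, here is a small Python sketch. It is not the Html Field implementation and does not reflect its configuration format; the `TEMPLATES` table, the type names "Movie" and "TVSeason", and the `render_html_cell` function are all assumptions invented for this illustration.

```python
from html import escape

# Hypothetical mapping from result type to an HTML snippet for the grid cell.
TEMPLATES = {
    "Movie": "<b>{title}</b>",
    "TVSeason": "<b>{title}</b> <i>(season)</i>",
}

# Used when the result type has no dedicated template.
DEFAULT_TEMPLATE = "{title}"


def render_html_cell(result_type, title):
    """Pick the markup for the result type and insert the HTML-escaped title."""
    template = TEMPLATES.get(result_type, DEFAULT_TEMPLATE)
    return template.format(title=escape(title))


print(render_html_cell("Movie", "Jaws"))
print(render_html_cell("TVSeason", "Cops & Robbers"))
```

Escaping the title before it is placed into the markup keeps metadata values from being interpreted as HTML.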

Getting the Data

As I mentioned earlier the data is stored in an RDF store that we have running outside of SharePoint. In order to get to this data we use SPARQL queries. SPARQL is a W3C specification (http://www.w3.org/TR/rdf-sparql-query/) that specifies how to query RDF data. For the media library these queries are stored in the Data Source section of the template XML file.

LibraryXML-SPARQL

The syntax for these queries appears to be similar to SQL, but that is as far as it goes. The similarity is truly only skin deep. Does this make it a shallow language? Not really. SPARQL, as defined in the specification linked above, can only read data with queries such as SELECT; it cannot modify the store. Even so, it provides a lot of power within that scope.

A SPARQL query consists of at least three different sections: PREFIX, SELECT, and WHERE. The PREFIX section is used to define the namespaces for the types that will be used. The SELECT section is used to indicate which variables should be returned. This is similar to a SQL query. The WHERE section is then used to put any conditions on the data. Unlike SQL, this is a required section, as without it nothing would be returned. Also unlike SQL, instead of specifying a table to return results from, you specify which triple pattern you want results returned for by fixing any of the subject, predicate, or object. Please refer to the SPARQL specification for more information, as this article is not intended to go into detail on this subject.
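If triple patterns are new to you, the following Python sketch may help. It is a toy in-memory matcher, not a SPARQL engine and not how our RDF store works: the `match` function and the `TRIPLES` list are invented for illustration, and the description text is shortened. Each position you fix narrows the results, and each position you leave as `None` plays the role of a variable.

```python
GELATIN = "guid:_c0ea9f7d-02f1-4ea6-868b-e82855c4ed73"
FIGHTER = "guid:_f6da28c7-3f86-4ec0-847c-82ff9bcd32cf"

# A tiny set of (subject, predicate, object) triples.
TRIPLES = [
    (GELATIN, "dc:title", "Gelatin Video"),
    (GELATIN, "dc:description", "This is a video about Gelatin."),
    (FIGHTER, "dc:title", "Fighter Pilot"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the fixed positions; None is a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Every title": only the predicate is fixed.
print(match(TRIPLES, predicate="dc:title"))

# "Everything known about one subject": only the subject is fixed.
print(match(TRIPLES, subject=GELATIN))
```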

Instead of dealing with a single query that does it all, the media library is broken down into two different queries. The first query gets back the list of subjects that will be displayed (RowIdSparql), and the second is used to define what data to display about those subjects (RowDetailsSparql). Additionally, since the media library allows you to drill down, the row query takes an optional "{CURRENTSUBJECT}" variable. This variable is inserted automatically when you drill down into an item. The same is true for the row details query, except that a different variable name is used: in this case "{ROWSUBJECT}". ROWSUBJECT is taken from the ?rowId variable used in the row query. Now that I'm bound to have you confused as to how this actually works, let us walk through an example.
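Before the walkthrough, here is a rough Python sketch of the placeholder substitution just described. It only illustrates the idea and is not the media library's code: the `fill_placeholders` function is hypothetical, and the two template strings are simplified guesses at what a RowIdSparql and RowDetailsSparql entry might contain (including the casing of the did: name), not copies of the shipped templates.

```python
# Simplified, assumed templates; the real ones live in the template XML file.
ROW_ID_SPARQL = """SELECT ?rowId ?title
WHERE
{
<{CURRENTSUBJECT}> <did:ItemCollection> ?rowId.
?rowId <dc:title> ?title.
}"""

ROW_DETAILS_SPARQL = """SELECT ?rowId ?title ?description
WHERE
{
?rowId <dc:title> ?title.
?rowId <dc:description> ?description.
FILTER(?rowId = <{ROWSUBJECT}>)
}"""


def fill_placeholders(template, **values):
    """Replace each {NAME} token in the template with the supplied value."""
    query = template
    for name, value in values.items():
        query = query.replace("{" + name + "}", value)
    return query


row_query = fill_placeholders(
    ROW_ID_SPARQL,
    CURRENTSUBJECT="guid:_58cedcd8-9398-4dfe-8205-45e3ef988799",
)
details_query = fill_placeholders(
    ROW_DETAILS_SPARQL,
    ROWSUBJECT="guid:_c0ea9f7d-02f1-4ea6-868b-e82855c4ed73",
)

print(row_query)
print(details_query)
```

The sketch uses plain string replacement rather than `str.format` because SPARQL itself uses curly braces around the WHERE block, which `str.format` would try to interpret.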

In my test environment my SharePoint Web ID is "58cedcd8-9398-4dfe-8205-45e3ef988799", and since I have not specified anything in the properties of the web part for the Root ID, the site's Web ID will be used as the container. The query that is executed then looks like the following:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX did: <urn:mpeg:mpeg21:2002:02-DIDMODEL-NS#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?rowId ?title
WHERE
{
<guid:_58cedcd8-9398-4dfe-8205-45e3ef988799> <did:ItemCollection> ?rowId.
?rowId <dc:title> ?title.
?rowId <did:ResourceCollection> ?proxyRes.
}

With the following result set being returned:

 

rowId title
guid:_c0ea9f7d-02f1-4ea6-868b-e82855c4ed73 Gelatin Video
guid:_f6da28c7-3f86-4ec0-847c-82ff9bcd32cf Fighter Pilot
guid:_bdeca860-127d-41e0-9530-f2653b90a7de Amsterdam Street Performer
guid:_275b1c0a-3d88-4ec0-8332-f154b845293a Media Player
guid:_973fe83b-8d39-4388-8916-85bff9a51d70 Amsterdam Street Performer 2
guid:_69151f9e-2c57-40de-8c89-a4f05c4c5f16 London Stock Exchange
guid:_0801d764-2ea6-49ea-a2c1-6e3c8aa5975a DEMO
guid:_cb9a8e86-8b7f-4332-81c4-eb3c1378f7ce BP MOSS Case Study
guid:_e8090739-e6b4-47cf-94fc-9028d5b7034c IBC 2007
guid:_d3ac0886-a6c7-4380-add9-234c0df3dc9d Universitat
guid:_fe67eca4-e480-40e2-9d2e-3b84d3d139d5 helloworld
guid:_30411e27-2c80-4e13-b87e-9a7d007e2a5d Harris
guid:_cfa6f18f-6734-4a7e-88bd-e86d69c3edac Coffee
guid:_966eed5d-6e3d-4aee-97d5-dbcb6f9657da Party Boat
guid:_7aa32b57-19f6-491c-a4df-d48fbe899970 Mediaset Demo
guid:_70447a00-a5d5-4a94-a323-4d543d130106 Mikhail
guid:_fbbdeedc-68db-48e0-9fe6-06a3e86e6bb7 Alliance
guid:_ca10bc92-0a60-4b78-a568-ec49e8eb8208 Demo 123
guid:_10937f47-e088-4e41-a5d9-6c0c04ea10fa T-Online Video
guid:_476b5dc7-95c3-4cfc-864e-a15f0379c11b HP DEMO
guid:_1576c8ac-1199-467d-b403-4aeffb1180ad Last Presentation of IBC
guid:_b5389571-2641-4ce6-bd06-0dda75f67ed4 Connections

With this data, another query is then generated, this time using the rowId as the key. This query is similar to the following:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX did: <urn:mpeg:mpeg21:2002:02-DIDMODEL-NS#>
PREFIX imm: <http://schemas.microsoft.com/imm/core/1.0#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?rowId ?title ?thumbnail ?description
WHERE
{
?rowId <dc:title> ?title.
?rowId <dc:description> ?description.
?rowId <did:ResourceCollection> ?proxyRes.
?proxyRes <imm:IsProxy> 'true'^^<xsd:boolean>.
?proxyRes <did:Ref> ?thumbnail.
FILTER(?rowId = <guid:_c0ea9f7d-02f1-4ea6-868b-e82855c4ed73>)
}

Please note that this is not the actual query that is used, as that would result in X + 1 queries being executed, where X is the number of results returned by the first query. Instead, all of the queries are appended together and one large batch is sent to the RDF repository. The results for the previous query look like this:

rowId title thumbnail description
guid:_c0ea9f7d-02f1-4ea6-868b-e82855c4ed73 Gelatin Video \2007 09 08\guid__a6c5bbe6-dd1e-400f-ae7c-a3821158b857.wmv This is a video about Gelatin. This is a video about Gelatin. This is a video about Gelatin. This is a video about Gelatin. This is a video about Gelatin.
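The batching mentioned above can be sketched roughly as follows in Python. This is only an illustration of why batching helps, not the actual batch format: how the media library appends the queries and what separator the RDF repository expects are not described here, so the blank-line separator, the simplified template, and the `build_batch` function are all assumptions.

```python
# Simplified, assumed details template with the {ROWSUBJECT} placeholder.
DETAILS_TEMPLATE = """SELECT ?rowId ?title ?description
WHERE
{
?rowId <dc:title> ?title.
?rowId <dc:description> ?description.
FILTER(?rowId = <{ROWSUBJECT}>)
}"""

# The first three row ids from the result set of the row query.
ROW_IDS = [
    "guid:_c0ea9f7d-02f1-4ea6-868b-e82855c4ed73",
    "guid:_f6da28c7-3f86-4ec0-847c-82ff9bcd32cf",
    "guid:_bdeca860-127d-41e0-9530-f2653b90a7de",
]


def build_batch(details_template, row_ids, separator="\n\n"):
    """Fill in the details template once per row and join the results into one payload."""
    queries = [details_template.replace("{ROWSUBJECT}", row_id) for row_id in row_ids]
    return separator.join(queries)


batch = build_batch(DETAILS_TEMPLATE, ROW_IDS)

# One request for the row ids plus one per row, versus one for the row ids plus one batch.
naive_round_trips = 1 + len(ROW_IDS)
batched_round_trips = 2
print(naive_round_trips, batched_round_trips)
```

With the 22 rows returned earlier, the unbatched approach would mean 23 trips to the repository instead of 2.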

Since all of this SPARQL stuff can be very confusing, we have created a SPARQL Cheat Sheet that should be referenced. (I have mine on my wall beside me, where’s yours?)