In the Whidbey release we finally provided a mechanism to retrieve metadata/schema information from a backend.  It’s actually quite cool (caveat:  I was the PM for this feature so that opinion might be a bit skewed) and is used throughout Visual Studio 2005.  Things like Server Explorer, the Data tools in Visual Studio 2005 and a number of other features rely heavily on this schema functionality.  We worked across quite a number of teams to implement this feature and in the end it turned out nice.  It’s even cooler that 3rd Party Provider writers can plug into Visual Studio by implementing our new common model (Db* classes, which subsume the IDB* interfaces), our schema implementation (*connection.GetSchema), and then the Visual Studio specific functionality required to show up in Server Explorer (called DDEX for which an excellent overview can be found here).  Anybody who writes a provider can plug into VS and get all these great features for their provider.  It’s quite powerful. 

 

However, I digress from my original point.  When we began talking about how we go about implementing the ability to retrieve schema we looked at the previous technologies we’ve shipped (OleDB, Odbc, ADO, etc) and we used them as a starting point.  There were a couple of things we wanted to do when we provided this solution:  1) Ensure the new schema API integrated with the existing ADO.NET API and 2) try to address previous concerns with our other schema API’s.  I do believe we fully accomplished the first item because I believe the API fits quite well into ADO.NET (simple things like using DataTable to return the results and making them very provider specific, more on this in a moment, as opposed to making everybody conform to the same format and collections).  For the second item, the jury’s still out.  However, there were 2 decisions we made which I believe help us accomplish that goal.  One of the first things that we looked at is what we would need to expose in our schema implementation:  Tables, Columns, Stored Procedures, Views, and Databases jumped to mind.  But as you take a closer look not all these collections mean the same thing to different backends or are even supported.  The way that Access defines what a stored procedure is differs from how Microsoft Sql Server and Oracle does.  If I have a standard query, without parameters, defined in my MDB, it is reported as a View, not a Stored Procedure; if there are parameters then it’s a stored procedure.  Another example is that Access and Oracle don’t have the concept of a Databases collection while SQL Server does.  One set of collections to rule them all doesn’t work here.

 

So as we looked at the second problem above we made a very specific decision:  we’ll only define a small set of collections and within them provide a way to express backend specific collections.  This decision was difficult for people to see the value in.  The first question we always got, and it was always this exact question, is “Well every database has tables so why don’t you define a tables collection?”.  Well that’s true.  And then, inevitably, the next question would be “And since we have tables, why don’t you define a columns collection, because every table has columns?”.  And so on.  Every other data access API had defined a pretty large set of metadata collections that you had to implement.  The trouble is that not every ADO.NET provider and backend might support the concept of a table.  What about an Object Oriented or an XML database?  Sure they could skew their metadata to fit these collections but we didn’t want people to have to conform to a specific definition of Metadata.  Believe me, it took some time to convince people this was the best decision.  Even today I get these questions and we’ve had our schema implementation for over 2½ years internally.  Time will tell if we’ve made the best decision but so far I think it looks good.  The nice thing about this is that the provider simply describes the backend metadata and we can add layers on top of it that describe the structure, names, and relationships between the collections (notice that DDEX does just this).  It’s actually quite powerful to be able to abstract this out and have it provided other ways. 

 

Another key decision we made as we looked at the second problem is how we enable visibility of new backend specific metadata as new releases come to market.  An example of this is Microsoft Sql Server 2005.  In this release we added UDT’s, the XML DataType, and extensions of existing types making them max types (NVarChar, VarChar, and VarBinary).  Since the Sql Server organization is the one we work on, we had the ability to update System.Data.SqlClient to reflect these new types and metadata.  However, what if another backend was updated and developers needed to express those types in GetSchema()?  In the previous API’s anybody who had a provider that was for use against that backend would have to re-release the provider with updated bits.  Another problem we found was that some customer’s needs around retrieving and using metadata were very specific to their solution and that they ended up having to work around and/or not utilize our schema solution.  This becomes problematic from the API perspective because depending on the application you have to program against a different API or you have to implement added layers of API abstraction to expose this metadata similarly.  With this in mind we went about designing a solution that enabled users to override our existing schema implementation and provide their own collections.  This functionality, the ability to specify your own schema format and content, is available for all the .NET Data Providers that ship in the framework.  Another benefit, although completely unintended, is that if we were to leave out obvious collections (ahem, like Primary Keys in SqlClient) there is a way for developers to override our implementation and provide their own solution.  Below I break out exactly what you must do and provide a sample that enables the PrimaryKeys collection in SqlClient.

 

What you need to do

There are 3 things you must do to enable this functionality:

1)     Provide your own XML File that describes both the collections and their population mechanism. (Example provided below)

2)     That XML File must reside in the config directory for the .NET Framework (on my machine it’s located at c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\CONFIG).

a.      This is required because only power users and administrators are able to override this location on NTFS enabled drives.

3)     Add the following content to one of your config files (the name must be “MetaDataXml” but the value can be your own filename):

  <system.data.sqlclient>

    <settings>

      <add name="MetaDataXml" value="SqlClientSchema.xml" />

    </settings>

  </system.data.sqlclient>

 

So far it’s pretty easy but I must provide 1 caveat:  The following xml file (which describes the collections and their population mechanism) is very large and we didn’t have time to provide a way to override only certain collections.  Currently you must provide the entire xml file for us to use. 

 

Finally, before putting the XML file below, along with some sample code, I need to explain 1 thing in the XML file:  The PopulationMechanism element.  We have defined, in our schema implementation, a way to have a specific collection populated in 3 ways:  1) Hard coded in the XML file (delineated by DataTable as the population mechanism), 2) query based to the same server you are connected to (delineated by SqlCommand as the population mechanism), and 3) an internal implementation that the provider does (delineated by PrepareCollection as the population mechanism).

 

The Code and XML Files

So now let’s see this in action.  The following code will spit out all the collections that are exposed in SqlClient before we do the override:

 

Code

 

        static void Main(string[] args)

        {

 

            using (SqlConnection con = new SqlConnection(ConfigurationManager.ConnectionStrings["MyLocalConnectionString"].ConnectionString))

            {

                try

                {

                    con.Open();

                    DataTable schemaDataTable = con.GetSchema(System.Data.Common.DbMetaDataCollectionNames.MetaDataCollections);

                    foreach (DataRow row in schemaDataTable.Rows)

                    {

                        Console.WriteLine(row[0].ToString());

                    }

                }

                catch (Exception exc)

                {

                    Console.WriteLine(exc);

                }

            }

            Console.WriteLine("hit any key(s) and return to end the program");

            Console.ReadLine();

        }

 

 

Console Output

MetaDataCollections

DataSourceInformation

DataTypes

Restrictions

ReservedWords

Users

Databases

Tables

Columns

Views

ViewColumns

ProcedureParameters

Procedures

ForeignKeys

IndexColumns

Indexes

UserDefinedTypes

hit return to end the program

 

 

As you can see there is no PrimaryKeys collection when in fact there should be (we missed this in the initial implementation and in the end it didn’t make it unfortunately; a problem which bothers me daily). 

 

So now if you add an application configuration file to your Console and/or Winforms application you can have PrimaryKeys added to the Schema implementation of SqlClient.  The SqlClientSchema.xml file must be put in the .Net Framework configuration directory.  A copy of this file is attached to this post.  Once we’ve put the schema xml file in the config directory and we’ve added to our application configuration that we 1) want to override the schema implementation and 2) with this specific file (the full format to add to your configuration file is found above in the “what we need to do” section) we can run the application again and you will get the following output:

 

Console Output

MetaDataCollections

DataSourceInformation

DataTypes

Restrictions

ReservedWords

MyContactInformation

Users

Databases

Tables

Columns

Views

ViewColumns

ProcedureParameters

Procedures

ForeignKeys

IndexColumns

Indexes

UserDefinedTypes

PrimaryKeys

hit any key(s) and return to end the program

 

I’ve bolded and italicized the PrimaryKeys collection to note it’s addition to the return from SqlClient!  I’ve also added another collection that I’ll leave you to find. 

 

It’s interesting to note that this particular issue, the lack of a primary keys collection, isn’t the only problem that our schema override functionality provides.  If the collections we expose are not rich enough for your application or there are other custom collections that you want to expose which are required by your application these can all be provided through the GetSchema implementation.  I’d limit the type of information exposed through the GetSchema API to backend specific information that your application requires in order to accomplish certain metadata driven actions (e.g. building a form that’s based on the format of a table or statistical information about your backends objects in order to do analysis on the clientside).  Have fun and tell me what you think!

 

Finally we offer this schema override mechanism for all our providers (OleDb, Odbc, and OracleClient) with minor adjustments to each provider. With OleDb you can specify an override down to the OleDb provider and with Odbc you can specify an override down to the native ODBC Driver.

 

-- Carl