Welcome to MSDN Blogs Sign in | Join | Help

This is the fifteenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might try searching for the audio tapes on www.Bing.com.  That would be a lot easier than reading. You won't find any, but you'll feel better for having tried.

Complete list of posts in the Building an IQueryable Provider series

Getting this version of the toolkit together has taken a lot of sleepless nights.  Thank goodness for Netflix or I'd have had to spend those sleepless nights actually working on the Toolkit.

Okay, enough with the flavor text, let's get to the crunch.

What's inside:

More Providers - MySQL and SQLite join the previous MS only line up.

Transactions - Use ADO transactions to control the isolation of your queries & updates.

Entity Providers - The provider concept is expanded to include tables of entities

Entity Sessions - The session concept adds identity caching, change tracking and deferred updates via SubmitChanges

Provider Factory - Create providers on the fly w/o knowing anything more than the database name and mapping.

Madness


The full source code and redistributable DLL's can be found at:

http://www.codeplex.com/IQToolkit

More Providers

  • MySQL -- With the stewardship of MySQL in doubt after the purchase of Sun by Oracle, I was leery of taking on the challenge of making a MySQL provider for the toolkit. Yet, the benefits of doing so turned out to be very significant and not just for the MySQL users, as it forced me to challenge some of my very assumptions about SQL which in turn made some of my meager testing better, which in turn made the toolkit better as a whole. 

    I was surprised when I went to download the free and open-source database product that I had to fill out a bunch of information on myself and wait for a formal approval (that took many days) before I was *allowed* to access the product. This seemed like an awful lot of hassle and odd for an open source project; way too corporate. Fortunately, I persisted and got the server up an running with no problem. Next I searched high and low for MySQL version of Northwind so I could have something to run the tests against. If only Bing were available then!  I had to settle for plain ol' Live Search. Fortunately I found what I was looking for, even if it didn't offer me any cash back.

    MySQL seems to me what you get if you take a bunch of C programmers and tell them to make a SQL database product and yet offer no guidance. This is not criticism, just a bit of snarky wit over many of the underscore endowed API's that make up the product. Luckily, you'll not need to worry yourself about the manual dexterity needed to type such awkward function names, as you'll be using LINQ and the MySQL provider will do the work for you. On the positive side, the function library of MySQL is quite large and feature rich. I had no problem finding appropriate translations. MySQL even had many interesting features that I'd like to go back and refactor the toolkit to take advantage of.  Next release maybe.

    There were a few problems I uncovered while trying to get MySQL to pass the test suites.  I took the opportunity to fill out the execution suite that has languished for a few releases now, so I could actually assert what the correct output ought to be for specific queries. In almost all the cases MySQL did the right thing (as far as I was expecting.)  In most of the cases that it did not appear to do the right thing, it often turned out to be bad assumptions on my part about collation ordering and so removing those assumptions from the tests fixed everything up. The remaining problem still boggles my mind.

    In a test designed merely to prove that the translation of simple joins succeeded to generate the right query, the database did not return the correct number of rows. 

    from c in db.Customers
    from o in c.Orders
    from d in o.Details
    select d.ProductId

    which executes a MySql query that looks like this:

    SELECT d.ProductId
    FROM [Customers] AS c
    LEFT OUTER JOIN [Orders] AS o ON o.CustomerID = c.CustomerID
    LEFT OUTER JOIN [Order_Details] AS d ON d.OrderID = o.OrderID

    I expected to get a number of rows corresponding to the total number of order-details in the database.  But I didn't.  I got a whole lot less.  So I started experimenting with different forms of the query.  If I selected o.OrderID instead of d.ProductID, I got a different number from either of the other two. If I selected c.CustomerID, again an entirely different number of rows.  It was only when I chose to select the entire order-detail object did I get the number of rows I was expecting.

    Something strange seems to be going on.  On nobody's definition of SQL should I be getting a different number of rows depending on the columns in the selection (except if that SELECT has a DISTINCT operator specified.)  It was as if all queries were getting a default DISTINCT operation. When I took a look a the results when I selected out only ProductID, sure enough, all I got back were distinct product-id's.  MySQL has an 'ALL' keyword that is the semantic opposite of DISTINCT, so I tried adding that to the query, but to no avail. 

    I don't know if this is a problem in the query engine or the transport layer, or if the MySQL folks actually think it is appropriate behavior.  As far as I'm concerned it is catastrophically bad; E_FAIL. For this problem alone I would recommend not using MySQL. Still, it may not be as bad as it sounds. It is likely very rare that you'd ever actually write a query that was expected to retrieve duplicate rows of data. So it may not ever impact you at all. I do recommend testing all your queries (and your application) before you put it into production.
  • SQLite - This is an open-source database product that is similar in nature to MS Access and SQL Compact, since it runs in-proc and is not really a server. After wrangling with MySQL, I thought I knew all I needed to know about different forms of SQL, until Jonathan Peppers sent me his go at making a SQLite provider. ( See, I told everyone that I'd add new providers if they would send them to me.)  Of course, SQLite had a few problems of its own.

    While MySQL has a huge library of function, SQLite has a tiny one; many .Net API's don't have supported translations.  (This is of the set I had translated for MS SQL.  To be truthful, MS Access falls short on some of these too.)  So queries using API's other than some simple string manipulation or equality testing probably won't work against SQLite. 

    In addition one of SQLite's big drawbacks is its lack of a rich type system.  SQLite's developers claim this to be a feature, and it may afford great flexibility, but is often a death sentence for LINQ queries, which are all strongly typed.  SQLite does have types, but only a few of them, like number, text and binary.  Problems arise when you try to use DateTime's, as they are not really their own type in SQLite, but a text layout.  For example, if you were trying to find all orders for a customer that happen in January, you might write this query.

    from o in db.Orders
    where o.CustomerID == cust && o.OrderDate.Month = 1
    select o

    this would produce this query.

    SELECT o.OrderID, o.OrderDate
    FROM Orders AS o
    WHERE o.CustomerID = @cust AND STRFTIME('%m', o.OrderDate) = 1

    The date function to extract the month out of the OrderDate column is really a text formatting function that extracts the month portion of the date-time text layout, which in this case is the text '01', since the date-time format is always padded to a specific width.  When this is compared against the number (1), another subtle difference crops up. For all other SQL's I've come across, text is considered the weakest form of type, so when two types are compared for equality text is always converted to the other form.  Yet, in SQLite, the opposite it true (which truthfully is more like C# and Java).  So when the two types are compared they are found incompatible, because the number (1) is turned into the text '1' and that is not the same as the text '01'. Adding insult to injury, there appears to be no type conversion functions at all, so I can't even work around the problem by injecting a conversion. My ignorance of SQLite may just be showing here, or a lack of sufficient documentation.

    So my recommendation if you are using SQLite, to stay away from most API functions in your where clauses and such. API calls in the projection are okay, because these get executed on the client.

Transactions

I'm surprised no one's called me on this before. The DbQueryProvider and its ilk have been suspiciously lacking in support for transactions. The providers work, LINQ queries are converted into ADO Commands and executed, yet those ADO Command objects are never assigned an ADO transaction, even if you started one explicitly. 

Of course, the official word from Microsoft is to stop using the ADO transactions altogether and instead use the System.Transactions.TransactionScope object, that is newer, better and enables automatic use of distributed transactions, etc, etc, etc. And if you did use TransactionScope, then the problem I'm referring to would not be a problem. SqlCommand object's would implicitly enlist in the transation without me having to specify anything. 

Unfortunately, TransactionScope is mired with many problems and is not supported by all ADO providers so ADO transactions are still a necessity.  You can now use ADO transactions with query providers in a manner similar to LINQ to SQL.

provider.Transaction = provider.Connection.BeginTransaction();

// use the provider here to execute queries and updates, etc.

provider.Transaction.Commit();
provider.Transaction = null;

The provider will use whatever transaction object you give it when it creates new ADO command objects.

Entity Providers

I decided to formalize the pairing of query providers with a Table object that enables updates and other facilities.  The definition of an entity provider is now defined by these three interfaces.

public interface IEntityProvider : IQueryProvider
{
    IEntityTable<T> GetTable<T>(string tableId);
    IEntityTable GetTable(Type type, string tableId);
}

public interface IEntityTable : IQueryable, IUpdatable
{
    new IEntityProvider Provider { get; }
    string TableId { get; }
    object GetById(object id);
    int Insert(object instance);
    int Update(object instance);
    int Delete(object instance);
    int InsertOrUpdate(object instance);
}

public interface IEntityTable<T> : IQueryable<T>, IEntityTable, IUpdatable<T>
{
    new T GetById(object id);
    int Insert(T instance);
    int Update(T instance);
    int Delete(T instance);
    int InsertOrUpdate(T instance);
}

You can now always get to a table directly from a provider.  The two concepts are coupled together.  An entity table also has explicit CRUD methods and implements IUpdatable, so no more separation between normal tables and updatable tables. In my mind this simplifies things quite a bit. 

Of course, this caused me to want to rename DbQueryProvider.  Ooops.  This will likely cause you some grief as any of your existing code that was using DbQueryProvider directly is now not going to compile.  The new name for this class is now DbEntityProvider. It might not matter so much now that there is a nifty IEntityProvider interface.

Entity Sessions

One thing missing from the Toolkit so far has been all of that context stuff that LINQ to SQL and LINQ to Entities have.  When you use LINQ to SQL you have a change tracking service that detects when your objects change, and sends the updates for you all at the same time when you call SubmitChanges. 

An entity session is all of this change-tracking, deferred updating stuff packaged up together. It is distinctly different from an entity provider in these ways, yet similar to one in many others.

An entity session is defined below:

public interface IEntitySession
{
    IEntityProvider Provider { get; }
    ISessionTable<T> GetTable<T>(string tableId);
    ISessionTable GetTable(Type elementType, string tableId);
    void SubmitChanges();
}

public interface ISessionTable : IQueryable
{
    IEntitySession Session { get; }
    IEntityTable ProviderTable { get; }
    object GetById(object id);
    void SetSubmitAction(object instance, SubmitAction action);
    SubmitAction GetSubmitAction(object instance);
}

public interface ISessionTable<T> : IQueryable<T>, ISessionTable
{
    new IEntityTable<T> ProviderTable { get; }
    new T GetById(object id);
    void SetSubmitAction(T instance, SubmitAction action);
    SubmitAction GetSubmitAction(T instance);
}

public enum SubmitAction
{
    None,
    Update,
    PossibleUpdate,
    Insert,
    InsertOrUpdate,
    Delete
}

public static class SessionTableExtensions
{
    public static void InsertOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Insert);
    }

    public static void InsertOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Insert);
    }

    public static void InsertOrUpdateOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.InsertOrUpdate);
    }

    public static void InsertOrUpdateOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.InsertOrUpdate);
    }

    public static void UpdateOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Update);
    }

    public static void UpdateOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Update);
    }

    public static void DeleteOnSubmit<T>(this ISessionTable<T> table, T instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Delete);
    }

    public static void DeleteOnSubmit(this ISessionTable table, object instance)
    {
        table.SetSubmitAction(instance, SubmitAction.Delete);
    }
}

As you can see, an entity session has tables, just like a provider.  Yet, those tables are not directly updatable.  Instead you can assign entity instances submit actions. These are the actions that take place later when you call SubmitChanges.  There are a bunch of extension methods defined to add the LINQ to SQL like InsertOnSubmit() methods to the interface.  These simply call the SetSubmitAction() method for you.

Also note that a session is not a provider. It is a service used in conjunction with a provider. You can use multiple different sessions with the same provider instance.   

There is one current implementation of an entity session in the Toolkit called (you guessed it) DbEntitySession.  You create a DbEntitySession by giving it an existing DbEntityProvider. The DbEntitySession hooks the provider in such a way that it gets first crack at all materialized objects before they are returned to you. In this way, the DbEntitySession can employ an identity cache so queries that retrieve the same entity will always return the same entity instance, and it can start automatic change tracking on all entities returned.

You are also not locked into the session's behavior.  At any time you can interact with the underlying provider instead for retrieving entities without passing through the identity cache or being changed tracked.  You can even get to the provider's table directly off a session table.

Provider Factory

Now with so many providers and one single way to write queries you'd think it would be easy to switch between them.  In reality it is not. You have to pick the provider you want, reference its library (IQToolkit.Data.XXX), reference its corresponding ADO library (System.Data.XXX), create the ADO connection, the mapping object and construct the provider.

var connection = new SqlConnection("...");
var mapping = new AttributeMapping(typeof(Northwind));
var provider = new SqlProvider(connection, mapping, QueryPolicy.Default, null);
var db = new Northwind(provider);

You can hide this all inside your database context class (or whatever you want to call yours), so you only have to write it once, but then your context class is tied to a specific provider.  Instead, you could wrap this code up into a factory method of your own devising, but then calls to the factory would be spread throughout your codebase.  There no good way to defer all this work to some configuration setting. Until now.

Introducing the new factory methods built into DbEntityProvider.

public static DbEntityProvider FromApplicationSettings();
public static DbEntityProvider From(string filename, string mappingId); public static DbEntityProvider From(string provider, string connectionString, string mappingId);

These methods allow you to get up and running with only knowing a few bits of information.  You don't have to hard link you application to any particular provider.

The FromApplicationSettings method creates you a new instance of a provider from information found in the config file.  It looks for the "Provider", "Connection" and "Mapping" properties in the configuration and feeds them to the other factories.  It is also possible to look this information up in web settings, but I have not formalized that one yet.

The provider argument is a string that refers to the name of an assembly that contains the query provider.  These are generally of the form IQToolkit.Data.XXX.  If that assembly is not loaded, it will be loaded dynamically. This assembly can be in the assembly cache or in the same directory as your app (or other places that the runtime might look.)  From this assembly it will look for a type in the same namespace (as the name of the assembly) that derives from DbEntityProvider.

The connectionString and filename arguments are really the same thing. You can specify either the name of a database file or a full ADO connection string. If a file is specified, a correct connection string is obtained by calling the static GetConnectionString(string) method on the provider. A provider may be inferred from the file extension of a database file if none is specified.

The mappingId can either refer to the name of a context class (like Northwind) that has mapping attributes on it or the name of an xml file. 

So now you can write code like this to get your provider.

var db = new Northwind(DbEntityProvider.From(somedbfile, somemapfile));

Or better yet, you can use the FromApplicationSettings() method in the constructor of your context and still be configurable at runtime.

Madness

Of course, it wouldn't be a new toolkit release without some additional crazy changes.  One significant change is namespaces again. This time its not going to conflict with your code too much.  Most of the classes that where in IQToolkit.Data have been demoted into the namespace IQToolkit.Data.Common.  This includes most all classes that are implementation detail or base classes.  Mapping attributes and the like are now in IQToolkit.Data.Mapping.  This makes the namespace clean and obvious when you start looking for things via intellisense. 

DbEntityProvider and DbEntitySession are the only classes sitting in IQToolkit.Data, as these are the ones you'll likely need to reference when writing code.  IEntityProvider and IEntitySession are in IQToolkit namespace, because they are not specific to ADO (System.Data classes).


I hope you find this version feature rich enough to either build application directly on top of it, or model your own provider or data layer by using these techniques.

Don't forget the audio tapes.

 

This is the fourteenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you might request a weeks vacation, sit back, relax with a mochacino in one hand a netbook in the other, or if you've got better things to do with your time print them all out and stuff them under your pillow. Who knows, it might work better.

Complete list of posts in the Building an IQueryable Provider series

Okay, enough with all the post-is-late guilt! It's done now, so breathe a sigh of relief and get on with the reading.

What's inside:

More Mapping - Finally a real mapping system, with attributes and XML.

More Providers - MS Access and MS SQL Server Compact Edition

More POCO - Constructors, Enum and Interfaces.

More More More

The full source code can be found at:

http://www.codeplex.com/IQToolkit

More Mapping

  • Attribute Mapping - put attributes on the properties in the class that declares your tables.

    This differs from LINQ to SQL mapping attributes which are placed on the entities themselves and is more like the proposed LINQ to Entity mapping attributes. However, I've not actually gone out of my way to make them the same. The advantages to this approach are 1) keeping the mapping separate from the entity objects (more POCO), and 2) being able to supply different mapping for the same entity type based on the table the entities are accessed from. 

    Mapping attributes look like this:
  • [Table]
    [Column(Member = "CustomerId", IsPrimaryKey = true)]
    [Column(Member = "ContactName")]
    [Column(Member = "CompanyName")]
    [Column(Member = "Phone")]
    [Column(Member = "City", DbType="NVARCHAR(20)")]
    [Column(Member = "Country")]
    [Association(Member = "Orders", KeyMembers = "CustomerID", RelatedEntityID = "Orders", RelatedKeyMembers = "CustomerID")]
    public IUpdatableTable<Customer> Customers

You specify the Table, Column and Association attributes as necessary.  The 'Member' refers to the member in the entity type. If this is the same name as the database's column name you don't need to repeat it by specifying 'Name' too. 

You can specify nested mapping information by using a dot in the Member name. This allows you to have what some call value types, but to keep from clashing with .Net terminology I don't. For example, if you've defined an Address type that you want to use in a nested relationship (actually embedded in the same table row) you can do that like this:

[Table]
[Column(Member = "EmployeeID", IsPrimaryKey = true)]
[Column(Member = "LastName")]
[Column(Member = "FirstName")]
[Column(Member = "Title")]
[Column(Member = "Address.Street", Name = "Address")]
[Column(Member = "Address.City")]
[Column(Member = "Address.Region")]
[Column(Member = "Address.PostalCode")]
public IUpdatable<Employee> Employees
  • Xml Mapping -- this is same as attribute based mapping but data is read from an XML file. 

    Xml mapping looks like this:
  • <?xml version="1.0" encoding="utf-8" ?>
    <map>
      <Entity Id="Customers">
        <Table Name="Customers" />
        <Column Member = "CustomerId" IsPrimaryKey = "true" />
        <Column Member = "ContactName" />
        <Column Member = "CompanyName" />
        <Column Member = "Phone" />
        <Column Member = "City" DbType="NVARCHAR(20)" />
        <Column Member = "Country" />
        <Association Member = "Orders" KeyMembers = "CustomerID" RelatedEntityID = "Orders" RelatedKeyMembers = "CustomerID" />
      </Entity>
      <Entity Id="Orders">
        <Column Member = "OrderID" IsPrimaryKey = "true" IsGenerated = "true"/>
        <Column Member = "CustomerID" />
        <Column Member = "OrderDate" />
        <Association Member = "Customer" KeyMembers = "CustomerID" RelatedEntityID = "Customers" RelatedKeyMembers = "CustomerID" />
        <Association Member = "Details" KeyMembers = "OrderID" RelatedEntityID = "OrderDetails" RelatedKeyMembers = "OrderID" />
      </Entity>
      <Entity Id="OrderDetails">
        <Table Name="Order Details"/>
        <Column Member = "OrderID" IsPrimaryKey = "true" />
        <Column Member = "ProductID" IsPrimaryKey = "true" />
        <Association Member = "Product" KeyMembers = "ProductID" RelatedEntityID = "Products" RelatedKeyMembers = "ProductID" />
      </Entity>            
    </map>

You use it like this:

XmlMapping mapping = XmlMapping.FromXml(TSqlLanguage.Default, File.ReadAllText(@"northwind.xml"));
SqlQueryProvider provider = new SqlQueryProvider(connection, mapping);
  • Multi-table mapping -- Map multiple tables into a single entity.  If you've got entity data spread out over multiple tables with a 1:1 association between them you can now specify the additional tables in mapping using the ExtensionTable attribute or equivalent XML element. 

    Here's what a multi-table mapping looks like:
  • [Table(Name = "TestTable1", Alias = "TT1")]
    [ExtensionTable(Name = "TestTable2", Alias = "TT2", KeyColumns = "ID", RelatedAlias = "TT1", RelatedKeyColumns = "ID")]
    [ExtensionTable(Name = "TestTable3", Alias = "TT3", KeyColumns = "ID", RelatedAlias = "TT1", RelatedKeyColumns = "ID")]
    [Column(Member = "ID", Alias = "TT1", IsPrimaryKey = true, IsGenerated = true)]
    [Column(Member = "Value1", Alias = "TT1")]
    [Column(Member = "Value2", Alias = "TT2")]
    [Column(Member = "Value3", Alias = "TT3")]
    public IUpdatable<MultiTableEntity> MultiTableEntities

Extension tables are specified similar to how Associations are specified, except you are never referring to members, only column names.  You use the 'Alias' value to connect column & association mappings with columns from particular tables.  All queries for this multi-table entity treat the 'Table' as the primary table queried, all other tables are queried with left-outer joins.  All keys for associations must be from the same alias.

Can I mix nested mapping with multi-table mapping?  I have not tried it, but in theory it should work. It should not matter which table your nested entity gets it's data from, so in effect you can have a composition relationship between one table and another as long as it is 1:1.  

What about many-to-many?  Not yet. Making the system query a many-to-many relationship is relatively easy.  I haven't yet figured out the right semantics for inserts & updates. Right now, all insert, updates and deletes are explicit via calls to the IUpdatable with real-live entities. Yet how do you make an explicit update to the link table that you don't have an entity directly mapped to?  I need to ponder this some more.  Possibly if one side of the relationship is a composition as opposed to an association, then it would be implied when that side is updated.  Yet what if you chose not to load the relationship, how do you tell the system to not delete all previous relationships?

More Providers

  • MS Access -- This new query provider works with both Access 2000 - 2003 and Access 2007 data files. I don't know what the true differences are between the Jet and the Ace engines; the query language appears to be identical (as per my limited tests so far), yet the filename extension changed in Access 2007 to 'accdb' instead of 'mdb' and the northwind sample database plumped up an extra 66% in disk size without any additional data.

In order to make this work I've added an AccessLanguage object that is necessary to get the correct semantics for MS Access queries and an AccessFormatter object that handles generating the correct command text. In order to salvage as much as I could from the TSqlFormatter, I moved most of this code to a common SqlFormatter base class, and now the TSQL and Access formatters only supply the deviations from the standard syntax.  (Of course, 'standard' is currently whatever I deem it to be so don't go getting some actual online specification and prove me wrong.) Access only allows one command at a time, so that added an extra wrinkle, but in the end there is now support in the system for providers that can only do one command at a time. This means there are multiple round-trips to the engine for things like inserting a record and getting back the computed keys. Luckily, the access engine is in-proc so this is not really a burden. A new property on QueryLanguage, 'AllowMultipleCommands' determines how the execution plan is generated and whether multiple commands can be lumped together into a single ADO command.

The good news is that the access engine passes almost all the Northwind tests; some are not possible (mostly ones testing translation of framework methods that have no apparent equivalent in the access expression engine).  There were a lot of hairy strange & subtle differences in syntax between Access and TSQL, but most were handled by having different format rules, some required new expression visitors to change the query tree, like no explicit cross joins!  This caused me to write a visitor to attempt to get rid of cross joins (often injected by my visitor that tries to get rid of cross-apply joins) which is now generally useful to everyone, and if that didn't do it, another visitor that would attempt to isolate out the cross joins from any other joins and push them into sub-queries where Access lets me use the old-style comma-list, which is truly a cross join, though it just can't be mixed with other kinds of joins in the same from clause.

  • SQL Compact -- Yes, even more SQL Server.  Though to be truthful, SQL Server Compact Edition (aka SQL CE, aka SQL Compact, aka Skweelzy) is not really SQL Server, it is some other entirely different product that handles a subset of TSQL syntax, and is not a server at all since it runs in-proc just like MS Access.

  • What about MySQL or Oracle?
    One day. The fact is that MS SQL and MS Access are easy for me to get to, they are already on my box. Getting something else up and running would take actual effort, and the MS secret database police might come get me. Meanwhile, if someone out there wants to put together a provider implementation I'll add it into the drop.

  • Where did the SqlQueryProvider go?
    I moved it. With the addition of the new providers it became apparent that I'd have to start factoring out all this ADO provider specific nonsense, otherwise all uses of the toolkit would have direct dependencies to way more than necessary. So I made separate projects, each building its own library. I may end up separating all the core 'data' stuff out into its own project too.

    The solution builds these libraries now:

- IQToolkit.dll
- IQToolkit.Data.Access.dll
- IQToolkit.Data.SqlClient.dll
- IQToolkit.Data.SqlServerCe.dll

More POCO

  • Constructors -- Use entities that don't have default constructors.  It is now possible to have entities that require invocation of a constructor with parameters.  The binding process will figure out how to call your constructor and the client side code will call it for you as long as the constructor parameter names match property names. You can even have fully read-only entities if all data member are accounted for in the constructor.

  • Enums -- They actually sort of work now. You can have a member in your entity typed as some enum and you get automatic mapping between that enum and a numeric type in the database.

  • Interfaces and abstract base classes -- You can now declare you IQueryable's as IQueryable<SomeInterface> or IQueryable<SomeAbstractClass> and have the provider create instances of a compatible type under the covers, automatically translating all your references to interface or abstract class members to the appropriate mapped members. You can have mutiple entities share a common interface or base class and get different mapping for each. You can write code using generics and constrain generic parameters based on an interface and write queries that will get correct translation at runtime.  (Note, variation of mapping likely won't work with compiled queries, since the translation is fixed on the first execution.)

Less Policy

There's not a whole lot of policy being used right now and the policy objects dependence on the mapping object was no where near as deep as the mapping object's dependence on the language.  So policy is now independent of mapping, which means you can construct providers without specifying policy and/or reusing mapping with different policies.  Now if I could only make it simpler to specify/construct mapping without needing to know the language.  Back to the drawing board.

More Insanity

I apologize for the churn. The namespace changed so now all heck is going to break loose. Gone is the simple 'IQ' namespace and in its place is the 'IQToolkit' namespace.  I really did like the 'IQ' name, it was short, classy and made you feel intelligent just by looking at it.  Yet, it was hard to guess at if you did not already know what it was.  I chose to change the namespace name to match the product name and the DLL name. You add reference to the IQToolkit.dll and you import/use the IQToolkit namespace. No fuss, no muss.  Except for all those files you'll have to edit now. But hey, this is pre-pre-pre-pre beta stuff. Some people may think they are something special by snarkily keeping all their products in beta. They've got a lot to learn.

I hope this toolkit is becoming useful to many. I realize there have been a variety of requests for new things in the toolkit that I just have not gotten time to put in yet.  So you can expect plenty more in the future.

So enough with reading.  It's time to code!

This is the thirteenth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you probably have a life beyond the keyboard, but if you don't then follow the link below to find oodles more to help fill your meaningless existence.

Complete list of posts in the Building an IQueryable Provider series

It's been precisely the correct amount of time that it took for me to complete the additional goodness that is jam packed into this drop, less actual work, dinners out, dinners in, any interesting film and televisions programs, housework, trips out to the store, family game night, time reading fiction, napping on the couch and other assorted unavoidable activities.

The full source code can be found at:

http://www.codeplex.com/IQToolkit

I'll try to cover as much as I can in this post, however you'll like find other gems by scouring the source itself.

What's inside:

Updates - Insert, Update & Delete operations.

Batch processing - true SQL Server batch processing.

Server language type systems - correct parameter types.

Mapping Changes - use the same class with multiple tables, etc.

Insert, Update and Delete

It's about time that this toolkit actually got usable right out of the box.  My original intention with the series was to show how to build an IQueryable provider and that turned more and more into a fully working query engine that you could actually use to get real work done. Yet, how many real world applications only ever need to pull data out of a database and never push it back?  Not many.

So I knew I'd eventually want to add updates, because I knew that you'd eventually need to do it too. Yet, every time I started thinking about updates I always fell into the trap of thinking about full blown ORM's with object tracking, et al, and I did not really want to go there, at least not yet. As a toolkit I think its just fine to define the primitives that a more advanced system might be built out of. And there is nothing wrong with those primitives being generally useful on their own. So you should be able to use the toolkit as-is and not only get a pretty good query engine but also something that at least works as a rudimentary data access layer.

Common Primitives

So then I set about thinking about just the primitives for updating data. They should have semantics similar to the underlying SQL equivalent commands. That means they should not defer work until some time later, but execute immediately. There should be at least the familiar commands, Insert, Update and Delete; but also Upsert (both Insert & Update combined) since its so often the right thing for many situations.

Also, like other LINQ operations, update commands should be a pattern, and be available for any kind of provider. So I set out thinking about what the pattern would look like and how it might be specified.  This is what I came up with.

public interface IUpdatable : IQueryable
{
}

public interface IUpdatable<T> : IUpdatable, IQueryable<T>
{
}

public static class Updatable
{
    public static S Insert<T, S>(this IUpdatable<T> collection, T instance, Expression<Func<T, S>> resultSelector)
    public static S Update<T, S>(this IUpdatable<T> collection, T instance, Expression<Func<T, bool>> updateCheck, Expression<Func<T, S>> resultSelector)
    public static S InsertOrUpdate<T, S>(this IUpdatable<T> collection, T instance, Expression<Func<T, bool>> updateCheck, Expression<Func<T, S>> resultSelector)
    public static int Delete<T>(this IUpdatable<T> collection, T instance, Expression<Func<T, bool>> deleteCheck)
    public static int Delete<T>(this IUpdatable<T> collection, Expression<Func<T, bool>> predicate)
    public static IEnumerable<S> Batch<T,S>(this IUpdatable collection, IEnumerable<T> instances, Expression<Func<T, S>> fnOperation, int batchSize, bool stream)
... }

This pattern works just like the LINQ Enumerable and Queryable patterns.  I've declared an interface 'IUpdatable' that extends IQueryable, so anything that is updatable is also queryable, and then an Updatable class with a bunch of new extension methods that encapsulate the pattern.  (I realize the IUpdatable name may be in conflict with some other library, but until I think of something better this is what it is.)

The Insert method inserts an object instance into a collection. It's not an ordinary insert, like with List<T>. The collection is considered to be remote and inserting into it copies the data from your instance. It has an optional result-selector argument that can be a function you supply to construct a result out of the object after it has been inserted. This, of course, is intended to occur on the server and can be used to read back auto-generated state and computed expressions.

IUpdatable<Customer> customers = ...;
Customer c = ...;
customers.Insert(c);

IUpdatable<Order> orders = ...;
Order o = ...;
var id = orders.Insert(o, d => d.OrderID);

The Update method updates a corresponding object already in the collection with the values in the instance supplied. This is a complete overwrite, not a dynamic update like LINQ to SQL would have generated. I have not yet defined a piecemeal update operation, but I still can.  We'll see how it goes.  In addition to an update selector (like the one for insert) you can also specify an update-check predicate. This is an expression evaluated against the server's state and can be used to implement optimistic concurrency by basically checking to see if the server's state is still the same as you remembered it. An ORM layer built on top of this primitive might choose to generate this expression automatically, based on mapping information, but here you must specify it manually if you want to use it.

IUpdatable<Customer> customers = ...;
Customer c = ...;
var computedValue = customers.Update(c, d => d.City == originalCity, d => d.ComputedColumn);

The InsertOrUpdate is the 'Upsert' operation.  It will basically insert an object into the collection if a corresponding one does not exist, or update the one that does with the new values. You specify it just like you'd specify an update, instead you call InsertOrUpdate.

There are two flavors of Delete. The first one lets you delete the object in the collection corresponding to the instance. You can optionally specify a delete-check, which is similar to the update-check, a predicate function evaluated against the server's state. The delete will only occur if the check passes. The second flavor just lets you specify a predicate. It's basically a delete-all-where method and will delete all objects from the collection that match the predicate. So far, its the only SQL-like 'set-based' operation I've defined.

IUpdatable<Customer> customers = ...;
Customer c = ...;
customers.Delete(c, d => d.City == originalCity);

IUpdatable<Customer> customers = ...;
Customer c = ...;
customers.Delete(c => c.CustomerID == "ALFKI");

The last operation is Batch.  It will allow you to specify an operation to apply to a whole set of instances. The operation can be one of the other commands like Insert or Update.  You can use this method Insert, Update or Delete a whole bunch of objects all at the same time. If possible, the provider will use optimized batching techniques to give you extra performance.

IUpdatable<Customer> customers = ...;
Customer[] custs = new Customer[] { ... };

customers.Batch(custs, c => customers.Insert(c));

If you've got many objects to update and you want to have instance specific update-checks done, you can sneak the extra information into the batch process by combining the data together into a single collection and then piecing them apart in the operation.

IUpdatable<Customer> customers = ...;
var oldAndNew = new [] { new { Old = oldCustomer, New = newCustomer }, ...};
customers.Batch(oldAndNew, (u, x) => u.Update(x.New, d => d.City == x.Old.City));
 

Updates and DbQueryProvider

In order to make use of this new capability I'm going to need a new object to declare the IUpdatable interface.  The Query<T> class only implemented IQueryable<T>, and that was fine as long as I only ever want to query.  Now I also want to be able to update, so I need a new class to represent the root of my query that I can also update. These things in databases are called tables, so that's what I'll stick with. 

public interface IQueryableTable : IQueryable
{
    string TableID { get; }
}

public interface IQueryableTable<T> : IQueryable<T>, IQueryableTable
{
}

public class QueryableTable<T> : Query<T>, IQueryableTable<T>
{
    string id;

    public QueryableTable(IQueryProvider provider, string id)
        : base(provider)
    {
        this.id = id;
    }

    public QueryableTable(IQueryProvider provider)
        : this(provider, null)
    {
    }

    public string TableID
    {
        get { return this.id; }
    }
}

public interface IUpdatableTable : IQueryableTable, IUpdatable
{
}

public interface IUpdatableTable<T> : IQueryableTable<T>, IUpdatable<T>, IUpdatableTable
{
}

public class UpdatableTable<T> : QueryableTable<T>, IUpdatableTable<T>
{
    public UpdatableTable(IQueryProvider provider, string id)
        : base(provider, id)
    {
    }

    public UpdatableTable(IQueryProvider provider)
        : this(provider, null)
    {
    }
}

You'll note that not only did I define a UpdatableTable<T> class, which is specifically what I wanted, I also went ahead and made a QueryableTable<T>, and extra interfaces to correspond to them.  This is intentional.  Eventually, I may want to add more methods specific to tables here and I'll need a place to put them.  Right now I've only added a property 'TableID'.  You can ignore it for now, though it will get more interesting when I discuss the mapping changes.

Take a look at the Northwind class in the test source code and you'll see how I made use of my new table class.

The Plumbing

Of course, update commands work in the query provider just like queries do.  First there are a bunch of new DbExpression nodes to represent them.

public abstract class CommandExpression : DbExpression
{
}

public abstract class
CommandWithResultExpression : CommandExpression { public abstract Expression Result { get; } }
public class InsertExpression : CommandWithResultExpression { public TableExpression Table { get; } public ReadOnlyCollection<ColumnAssignment> Assignments { get; } public override Expression Result { get; } } public class ColumnAssignment { public ColumnExpression Column { get; } public Expression Expression { get; } } public class UpdateExpression : CommandWithResultExpression { public TableExpression Table { get; } public Expression Where { get; } public ReadOnlyCollection<ColumnAssignment> Assignments { get; } public override Expression Result { get; } } public class UpsertExpression : CommandWithResultExpression { public Expression Check { get; } public InsertExpression Insert { get; } public UpdateExpression Update { get; } public override Expression Result { get; } } public class DeleteExpression : CommandExpression { public TableExpression Table { get; } public Expression Where { get; } } public class BatchExpression : CommandExpression { public Expression Input { get; } public LambdaExpression Operation { get; } public Expression BatchSize { get; } public Expression Stream { get; } }

Then there's the standard visit method in DbExpressionVisitor, DbExpressionWriter, etc.  Binding them happens in the QueryBinder just like all other query operations, but the work of deciding what nodes to generate gets doled out to the QueryMapping object.  Luckily, the base QueryMapping class has a default implementation that builds the correct DbExpression node.  If you want to map a single object into multiple tables or some other crazy scheme you'll probably have to have a more advanced mapping implementation. :-)

These nodes get plumbed through the system until they are encountered by the ExecutionBuilder and are formatted using the QueryLanguage rules. The TSQL formatter converts the nodes into corresponding TSQL text.  Depending on the contents of the command expression, the generated SQL may have one or more actual TSQL operations.

Batch Processing

ADO.Net has this nice feature built into its SqlClient API; the ability to get high-performance batch processing. Yet, the only way to get at it is through use of DataSet's or DataReaders. As far as I'm concerned this is rather low level and a bit complicated to use if you are starting out with domain objects and not DataSets. Your data access layer should do this for you. Yet, in order for it to do it, the abstraction for batch processing has to exist, which is why I added it to the updatable pattern.  After that it was a cinch. :-)  Not really. 

What I needed fundamentally was something that would execute the same database command over and over again with different sets of parameters. This is basically what TSQL batching does as it is sent over the wire. So I needed to add this abstraction to DbQueryProvider. Yet, since only SqlClient supports this actually behavior I'd need a fall back plan. So DbQueryProvider implements a method to do batch processing, but it does not actually do it optimally. 

public virtual IEnumerable<int> ExecuteBatch(QueryCommand query, IEnumerable<object[]> paramSets, int batchSize, bool stream)
{
    var result = this.ExecuteBatch(query, paramSets);
    if (!stream)
    {
        return result.ToList();
    }
    else
    {
        return new EnumerateOnce<int>(result);
    }
}

private IEnumerable<int> ExecuteBatch(QueryCommand query, IEnumerable<object[]> paramSets)
{
    this.LogCommand(query, null);
    DbCommand cmd = this.GetCommand(query, null);
    foreach (var paramValues in paramSets)
    {
        this.LogMessage("");
        this.LogParameters(query, paramValues);
        this.SetParameterValues(cmd, paramValues);
        int result = cmd.ExecuteNonQuery();
        yield return result;
    }
}

public virtual IEnumerable<T> ExecuteBatch<T>(QueryCommand query, IEnumerable<object[]> paramSets, Func<DbDataReader, T> fnProjector, int batchSize, bool stream)
{
    var result = this.ExecuteBatch(query, paramSets, fnProjector);
    if (!stream)
    {
        return result.ToList();
    }
    else
    {
        return new EnumerateOnce<T>(result);
    }
}

private IEnumerable<T> ExecuteBatch<T>(QueryCommand query, IEnumerable<object[]> paramSets, Func<DbDataReader, T> fnProjector)
{
    this.LogCommand(query, null);
    DbCommand cmd = this.GetCommand(query, null);
    cmd.Prepare();
    foreach (var paramValues in paramSets)
    {
        this.LogMessage("");
        this.LogParameters(query, paramValues);
        this.SetParameterValues(cmd, paramValues);
        var reader = cmd.ExecuteReader();
        if (reader.HasRows)
        {
            reader.Read();
            yield return fnProjector(reader);
        }
        else
        {
            yield return default(T);
        }
        reader.Close();
    }
}

What I have here are four methods, two of which are just private implementations, but two others that are virtual so can be overridden. Batching can work via streaming or not. If streamed the results of each execution (or batch) is yielded out. This works great if the number of individual items is large, but takes a lot of discipline to remember to actually inspect the results or nothing gets executed at all!  By requesting no streaming (stream == false) the execution occurs immediately and the results are packaged into a list that you can conveniently ignore if you so choose. That's why the implementation is separated out, so that the enumerable can be captured and converted to a list, enabling either behavior.  The two types of ExecuteBatch differ in whether a result is computed via information coming back from the server or not.

Now that these are defined, I can implement a new kind of provider, a SqlClient specific version that makes automatic use of optimized batching.

public class SqlQueryProvider : DbQueryProvider
{
    ...
public override IEnumerable<int> ExecuteBatch(QueryCommand query, IEnumerable<object[]> paramSets, int batchSize, bool stream) { var result = this.ExecuteBatch(query, paramSets, batchSize); if (!stream) { return result.ToList(); } else { return new EnumerateOnce<int>(result); } } private IEnumerable<int> ExecuteBatch(QueryCommand query, IEnumerable<object[]> paramSets, int batchSize) { SqlCommand cmd = (SqlCommand)this.GetCommand(query, null); DataTable dataTable = new DataTable(); for (int i = 0, n = query.Parameters.Count; i < n; i++) { var qp = query.Parameters[i]; cmd.Parameters[i].SourceColumn = qp.Name; dataTable.Columns.Add(qp.Name, qp.Type); } SqlDataAdapter dataAdapter = new SqlDataAdapter(); dataAdapter.InsertCommand = cmd; dataAdapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None; dataAdapter.UpdateBatchSize = batchSize; this.LogMessage("-- Start SQL Batching --"); this.LogCommand(query, null); IEnumerator<object[]> en = paramSets.GetEnumerator(); using (en) { bool hasNext = true; while (hasNext) { int count = 0; for (; count < dataAdapter.UpdateBatchSize && (hasNext = en.MoveNext()); count++) { var paramValues = en.Current; dataTable.Rows.Add(paramValues); this.LogMessage(""); this.LogParameters(query, paramValues); } if (count > 0) { int n = dataAdapter.Update(dataTable); for (int i = 0; i < count; i++) { yield return (i < n) ? 1 : 0; } dataTable.Rows.Clear(); } } } this.LogMessage(string.Format("-- End SQL Batching --")); } }

Note that I only have an implementation for the variation of ExecuteBatch that computes no user specified result. This is due to there being no back-channel available when using SqlClient batching.

The implementation uses DataTable's and DataAdapters to make this work.  A DataTable is created filled with the parameters necessary for executing the command.  The DataAdapter is used to invoke the batch using the Update method.  Of course, this doesn't actually have to be an update command. I can also use this to batch inserts and deletes too, or really any TSQL that I want to execute as long as I don't expected to get a bunch of data back.

Server Language Types

One thing that has always bugged me about SQL Server was the need to get the command parameters right. If I declare the parameter to be the wrong text flavor I can cause serious performance issues for the query. So just setting the parameter values and having the SqlCommand object guess at the right SqlType encoding is really not a good plan. Fortunately, it is often possible to figure out the correct parameter types to use if the information is available via the mapping. Parameters are often never just sent to the server for no reason. If I use a parameter in a query I'm usually comparing it against a column.  In most cases I can simply infer that the parameter should have the same server type as the column.

So I've gone ahead and defined a new server type system abstraction and threaded it into some of the DbExpressions and make use of it in some of the visitors. 

public abstract class QueryType
{
    public abstract DbType DbType { get; }
    public abstract bool NotNull { get; }
    public abstract int Length { get; }
    public abstract short Precision { get; }
    public abstract short Scale { get; }
}

public abstract class QueryTypeSystem 
{
    public abstract QueryType Parse(string typeDeclaration);
    public abstract QueryType GetColumnType(Type type);        
}

A QueryType encodes the typical database scalar type. It has a few properties that you can use to inspect common attributes of a server type, but most code won't really care to know the details, it will just pass the information along until it ends up where I need it.  A QueryTypeSystem is basically a factory for producing QueryType's.  The Parse method will construct a language-specific QueryType out of some text encoding.  This is typically the server language syntax for declaring a column of that particular type, like 'VARCHAR(10)'. 

A QueryTypeSystem is specific to a language, so QueryLanguage is where you go to get one.

public abstract class QueryLanguage
{
    public abstract QueryTypeSystem TypeSystem { get; }
}

One place I definitely know where to encode server types is in the ColumnExpression.  If a ColumnExpression knows what its server type is, then when I get to the point of comparing parameters to columns I know which server type is in play.

public class ColumnExpression : DbExpression, IEquatable<ColumnExpression>
{
    public QueryType QueryType { get; }
}

I've also stuck it into NamedValueExpression, because that's the type I'm using for parameters.

public class NamedValueExpression : DbExpression
{
    public QueryType QueryType { get; }
}

And I've basically modified Parameterizer, so that if a column and parameter (named-value expression) are ever paired together in any binary expression, I'll infer the parameter to have the same server type as the column.

public class Parameterizer : DbExpressionVisitor
{
    protected override Expression VisitBinary(BinaryExpression b)
    {
        Expression left = this.Visit(b.Left);
        Expression right = this.Visit(b.Right);
        if (left.NodeType == (ExpressionType)DbExpressionType.NamedValue
         && right.NodeType == (ExpressionType)DbExpressionType.Column)
        {
            NamedValueExpression nv = (NamedValueExpression)left;
            ColumnExpression c = (ColumnExpression)right;
            left = new NamedValueExpression(nv.Name, c.QueryType, nv.Value);
        }
        else if (b.Right.NodeType == (ExpressionType)DbExpressionType.NamedValue
         && b.Left.NodeType == (ExpressionType)DbExpressionType.Column)
        {
            NamedValueExpression nv = (NamedValueExpression)right;
            ColumnExpression c = (ColumnExpression)left;
            right = new NamedValueExpression(nv.Name, c.QueryType, nv.Value);
        }
        return this.UpdateBinary(b, left, right, b.Conversion, b.IsLiftedToNull, b.Method);
    }

    protected override ColumnAssignment VisitColumnAssignment(ColumnAssignment ca)
    {
        ca = base.VisitColumnAssignment(ca);
        Expression expression = ca.Expression;
        NamedValueExpression nv = expression as NamedValueExpression;
        if (nv != null)
        {
            expression = new NamedValueExpression(nv.Name, ca.Column.QueryType, nv.Value);
        }
        return this.UpdateColumnAssignment(ca, ca.Column, expression);
    }
}

Of course, the same goes for ColumnAssignment used by Insert and Update commands. You'll notice that I'm not having these types flow throughout the expression tree like normal types.  I could probably get more edge cases correct if I did, but for now this handles most of the cases. 

The GetCommand method in  DbQueryProvider will now make use of this info when constructing parameters. The SqlQueryProvider expects to see a new TSqlType that's made available by a new TSqlTypeSystem found on the TSqlLanguage object.  :-)

public class SqlQueryProvider : DbQueryProvider
{
    protected override DbCommand GetCommand(QueryCommand query, object[] paramValues)
    {
        // create command object (and fill in parameters)
        SqlCommand cmd = new SqlCommand(query.CommandText, (SqlConnection)this.Connection);
        for (int i = 0, n = query.Parameters.Count; i < n; i++)
        {
            QueryParameter qp = query.Parameters[i];
            TSqlType sqlType = (TSqlType)qp.QueryType;
            if (sqlType == null)
                sqlType = (TSqlType)this.Language.TypeSystem.GetColumnType(qp.Type);
            var p = cmd.Parameters.Add("@" + qp.Name, sqlType.SqlDbType, sqlType.Length);
            if (sqlType.Precision != 0)
                p.Precision = (byte)sqlType.Precision;
            if (sqlType.Scale != 0)
                p.Scale = (byte)sqlType.Scale;
            if (paramValues != null)
            {
                p.Value = paramValues[i] ?? DBNull.Value;
            }
        }
        return cmd;
    }
}

Mapping Changes

I've made some changes to the mapping system (or QueryMapping to be particular.)  I came across a variety of odd behavior while developing update logic that basically boiled down to the ImplictMapping object not being able to tell the difference between a type that was intended to correspond to a database table and others that appeared there just for the sake of representation, like LINQ anonymous types.  Some other mapping implementations might be able to tell the difference, but the simplest one couldn't so I needed to find another solution.

Obviously, everything that is an entity in a query comes from somewhere, and that's either from one of the roots of the query (a table) or via a relationship property. It was a mistake to think otherwise (or not think about it at all.) What I needed was a more explicit representation in the expression tree of what was an entity and what was not.  I figured I could either add annotations to the tree in every node, or find some nominal solution that would do the trick. 

I chose to make a new expression node, EntityExpression, which I use as a wrapper around any expression that would be normally constructing an entitiy.  This node is placed into the system when the QueryMapping first creates the sub-express for constructing an entity or relationship, at the time I actually know that I'm dealing with an entity and in particular which entity it is.

public class EntityExpression : DbExpression
{
    public MappingEntity Entity { get ; }
    public Expression Expression { get; }
}

I've also introduced a new abstraction called MappingEntity.  This how I let the QueryMapping object place a bread-crumb into the expression tree so it can be reminded which exact entity was being referred to.

public class MappingEntity
{
    public string TableID { get; }
    public Type Type { get; }
}

It' really just as simple little class that minimally remembers the correspondence between the runtime type and the table its being mapped to.  If you've been paying attention you'll realize that this 'TableID' is the same property that was added to the IQueryableTable interface.  That's how the query engine gets the table-id in the first place, right from the start of the query.  Of course, the IQueryableTable<T> interface also knows the runtime type, that's the 'T' part.  So your table's have all the information needed to make a MappingEntity.  Except that job is deferred to the QueryMapping object so it can do whatever bookkeeping it wants.

public abstract class QueryMapping
{
public virtual MappingEntity GetEntity(Type type); public virtual MappingEntity GetEntity(Type type, string tableID); }

You'll also notice that most of the other methods on QueryMapping are now modified to take a MappingEntity as an argument.

public abstract class QueryMapping
{
    public virtual bool IsMapped(MappingEntity entity, MemberInfo member)
    public virtual bool IsColumn(MappingEntity entity, MemberInfo member)
    public virtual bool IsIdentity(MappingEntity entity, MemberInfo member)
    public virtual bool IsComputed(MappingEntity entity, MemberInfo member)
    public virtual bool IsGenerated(MappingEntity entity, MemberInfo member)
    public virtual bool IsRelationship(MappingEntity entity, MemberInfo member)
    public virtual MappingEntity GetRelatedEntity(MappingEntity entity, MemberInfo member)
    public virtual bool IsAssociationRelationship(MappingEntity entity, MemberInfo member)
... }

Now, what you're probably saying is "Gee, Matt, that's looks a bit ominous. Why aren't these all just methods on MappingEntity now?"  And you'd be right. They probably should be.  Yet, it makes it a lot more difficult to subclass the mapping object and merely override some of the behavior.  Not a big deal for complete mapping sub-systems that someday might exists, but painful for simple uses such as overriding the ImplicitMapping with a few additional rules.  So I'm leaving it as-is for now until I can think about it more.  Any thoughts from the peanut-gallery?

Also, its worth nothing, that given this new arrangement, with explicit entity information in the query tree and connecting each entity info back to the table it originated from, it is now possible to support mapping systems that allow individual runtime types to be mapped to more than one table.  So you can have your cake and eat it too! 

That's All Folks!

At least for today.  The future may hold more goodies.  Any suggestions are welcome, either as ideas or source-code contributions.

Remember, you can get the current sources at CodePlex:  http://www.codeplex.com/IQToolkit

The LINQ IQueryable Toolkit is now a CodePlex project. 

http://www.codeplex.com/IQToolkit 

Going forward this will the be official site to find the latest greatest source bits.  I'll continue to post here about the toolkit, how to use it and to show off new features in this blog, but you'll have to follow a link to download.

You can also start discussions over how utterly fabulous the source code is and how you can't wait for the next drop. 

It's up to you.

This is the twelfth in a series of posts on how to build a LINQ IQueryable provider. If you have not read the previous posts you probably were born yesterday. How could you possibly make sense of this post without any context at all?  At least make an attempt. Sometimes I don't know why I bother.

Complete list of posts in the Building an IQueryable Provider series 

<Insert standard disclaimer for why there have been no updates on this for 'like' forever>

It's been so long since I last posted we must be up to Web 5.0 by now. I suspect there are not any actual web developers left. Its all just twitter application programming and cybernetic mind melds now. Does anybody still write code? In a text based language? Without an electric shunt wired directly into your cerebral cortex?  Nobody?

Sigh. I'm probably just talking to an empty internet; everyone's moved on to Planck Space. But since I've got nothing better to do, I might as well get on with it.

IQueryable Toolkit

One of the first things you'll notice when you take a look at the source is that I changed it quite a bit.  I moved code around, changed names gratuitously, added & removed classes and broke a lot of continuity with the prior versions. One of the biggest changes is that the code is no longer just a sample.  All those internal classes are public, the project builds as a DLL, the tests are hosted separately and the namespace is no longer 'Sample.'  Why, oh why would I do this?  Simple.

I originally started the project to give developers a structured hands-on walk through of the construction of an IQueryable provider; hoping to inspire many of you to build your own. (Many of you have.) The source code was made available so you could learn from it and as a cheap way to get you to debug it for me. (Many of you haven't.) And so you could take from it what you would, copy a class here, steal a method there, etc, to make your job easier to get your own LINQ to XYZ up off the  ground quicker.

Yet, I received so many requests to re-use the entire code base wholesale, that I now realize what you really want is a toolkit not just some sample code. As a toolkit it is still sample code. The sources are still there and you can take from it what you will.  Or you can build it into a shiny DLL and then build your code on top of it.  All the pieces I've built so far are publicly re-usable. Many of the classes have been enhanced for extensibility, so you can mix & match and override to your hearts content to build the system you want out of the pieces I already have.

How it breaks down

At the top there is a namespace 'IQ'.  I thought that it was cleaver & consise. Maybe someone out there will come up with a better one, something catchy, like a cold, and I'll feel compelled to catch it, fall under-the-weather for a few weeks and emerge blissfully sedated enough to go with it. Maybe. Under the 'IQ' namespace exists all the bits and pieces that are generally useful for any IQueryable impementation.  This is where you'll find the basic ExpressionVisitor class, the generic Query and base QueryProvider class.  You'll also get these:

ExpressionComparer - compares two expression trees for equality. immensely useful.

ExpressionWriter - Get a better translation of that narly expression tree into a  C#-ish syntax that you can actually read.  Helps debugging a lot.

Grouping - It's that dead simple implementation of the LINQ IGrouping interface. 

CompoundKey - A class that helps you represent compound key values.

ScopedDictionary - 'cuz I needed it and now you can have it too.  Works sort of like a Dictionary but with nested scopes.

TypeHelper - Used to be called TypeSystem (how pretentious of me).  Helps you know a few things about types.

DeferredList - An implementation of IList that enables deferred loading.  (Bells are probably ringing in your head about now.)

Nested inside this namespace is another one, called simply 'Data'  (I know, original, huh). In here you'll find all those other classes that made up most of my previous posts like the provider itself, DbQueryProvider, that works on top of any DbConnection.  To really understand what has gone on in my head you need to get into this class and look around.

Logically, it still does the same thing.  It primarily just implements the IQueryable.Execute method, converting query expressions into SQL commands, executing them using the DbConnection and translating the results into objects. That's all still there, its just been re-arranged a bit and made a lot more extensible.

How it is extended is the interesting bit that flavors everything else to come. The DbQueryProvider's brains are now fully pluggable. I took a look at what was going on during execution and broke it down into logically atomic steps. The prior version used to first translate the query, build a generic reader and then execute and return.  That's still sort-of what happens, but there's an even better break-down.

Queries are translated, but the translation goes through three distinct phases: 

1) mapping is applied

2) policy is enforced

3) the target query language has the last say. 

Each of these steps is pluggable.  How?  There are three new classes to get acquainted to.

QueryMapping - Defines information & rules to map an object model onto a database model

QueryPolicy - Defines information & rules to determine inclusion of related data & execution plans

QueryLanguage - Defines information & rules to adhere to a target language, including converting the query into text

Every DbQueryProvider is now supplied these three at runtime, each can be overriden by you to take control at different points in then translation and execution process. The mapping, policy & language can each override a portion of the query translation pipeline, injecting its own rules as rewrites of the query expression using the common DbExpression primitives provided.

In addition, the QueryLanguage controls the final translation of the tree to SQL text (or whatever target language is being used), and the QueryPolicy is invoked to build the execution plan.  Note, the execution plan is not the simple object-reader of yore.  It is now an entire runtime generated program for completely executing the database query & constructing objects out of the results.  The QueryPolicy is allowed to do whatever final rewrite is necessary to turn the translated query expression into an executable piece of code.

Of course, by default, these three mostly do what was being done before, and if you care to look you'll see that the QueryLanguage's Format method (for turning queries into text) simply defers to the TSqlFormatter class (which used to be called QueryFormatter.)  You can overidde this method and have it call your own formatter.  You can even override the TSqlFormatter and make a few minor changes if your back end SQL is largely the same.

If you want to implement a different strategy to retrieving hierarchical results, you can inject your own logic both during query translation and you can take over the entire process of building the execution plan.  The default BuildExecutionPlan on the QueryPolicy class defers to the ExecutionBuilder (used to be ProjectionBuilder in its former life), but you can change that and do it your own way.  Of course, all the source is available, so you can build your version based off of the one that exists in the toolkit.

The crazy logic for inferring mapping information by simply using the names of types and members is retained, but walled off in a specific implementation of mapping called the ImplicitMapping.  You can re-use this one if your needs are as meager as my demos.  Or you can build your own.

Of course, now that you've had a moment to think about it, given so much flexibility, you'll inevitably start asking about what database providers are supported.  (Still just TSQL.)  You'll also want to know what complex mapping models I've invented and how they compare to ones used by ORM's.  (Still just the implicit demo one.)  And then you'll want to know what I did to improve reading data hierarchies because tha