The official source of information on Managed Providers, DataSet & Entity Framework from Microsoft
There’s been a lot of interest in the ADO.NET CTP that we posted a few days back, and lots of questions about the internal details. I wanted to briefly describe how we handle queries in the ADO.NET CTP – and hopefully address some of your questions in this area.
The Lay of the Land
The ADO.NET VNext stack provides for two broad programming surfaces – the Object Services layer and the Map Provider layer – in addition to the existing provider programming surface. The Map Provider is a regular .NET data provider that accepts connections, commands, and returns DataReaders. The Map Provider allows you to describe your store data in terms of a conceptual model (the EDM) and allows you to specify client-side views over your store tables. The query language for the Map Provider is Entity SQL (eSQL).
The Object Services layer provides a strongly typed object abstraction, and allows you to program and reason in terms of programming language objects. The Object Services layer provides you a cache of objects, allows you to perform CRUD operations on these objects, allows you to query your data via LINQ or eSQL. The Object Services layer is stacked on top of the Map Provider, and gives you rich objects on top of the client views described by the conceptual model over store tables.
How do I express queries?
At the Object Services layer, you have a choice of expressing queries either via LINQ or via eSQL. At the Map Provider, you need to express queries in terms of eSQL. We may also allow – in a future version – queries to be specified in terms of canonical query trees (CQTs).
Ok, what are Canonical Query Trees (CQTs)?
We use Canonical Query Trees (CQTs) as a declarative representation for queries in ADO.NET. We use this tree representation for multiple purposes – as a common representation for all queries (regardless of how they were expressed); as a provider-neutral mechanism of communicating queries to stores (since every store has its own dialect of SQL); and in the future, as a means of programmatically constructing queries, especially in tooling and query-editing scenarios. More details on CQTs in a follow-on post
The Query Flow
A user query is always converted into a CQT. If the query is expressed in eSQL, the eSQL parser is invoked, the query is validated against the conceptual model, and then converted into a CQT. If the query is expressed in LINQ terms, the LINQ expression tree is converted into a CQT. All further processing of the query happens on the CQT.
We first “expand” (unfold) all references to client views (aka Conceptual Entity Sets) in the query. The client views themselves are simply queries against one or more Store-level Entity Sets (aka tables) and, even possibly other client views; and the view expansion process is identical to what databases do with database views. These views are generated by a mapping compiler (a subject for another blog post – but suffice it to say the mapping compiler is very powerful, and handles a number of different mappings in a very clean fashion) based on a user-supplied mapping specification. The query now targets store tables, but may have references to one or more conceptual types (entity types, complex types etc.).
The query is then run through a number of transformations to produce one or more queries that can be executed at the store. Two of the most important pieces here are the elimination of EDM type constructs (entity types, row types etc.) and the elimination of nesting operations from the query – these require some significant tweaking of the original queries – if you see some funny constants (‘0X1X2X’ etc.) in the generated SQL queries, this is where they come from. We need these pieces of information – as well as some information that we maintain outside the query - to tell us how to reassemble results later. At the end of this stage, we have queries that are entirely in terms of what the store providers support (ie) flat relational queries. We perform a few simplification operations on the query to produce “better” and more compact SQL. In the bits that you’re seeing now, some of this may still be a bit raw, but we’re working on them, and by RTM, we’ll be much better in terms of the queries we generate.
We now have one or more queries (in the form of CQTs) that can be executed by the underlying store provider. Each of these is then sent to the provider to convert into its native SQL dialect (more precisely, the provider provides a mechanism to produce a native DbCommand object from the query tree). We had a ProviderFest earlier this month, where we had provider writers for a number of databases come in to Redmond for a couple of days, and work with us on how to produce their native dialects of SQL from the CQTs – and all of them were able to get their apps up and running very quickly against their respective databases via ADO.NET. We anticipate that by RTM, many databases would have upgraded their providers to plug in easily into ADO.NET VNext.
The queries are then executed, and the results from the different readers are then reassembled (using information that’s been gathered earlier) to produce a single data reader. If the query originated at the Object Services layer, then there’s one additional step. The records from the DataReader are transformed into objects – we call this object materialization – based on the mapping information maintained by our metadata system.
Note that query execution happens entirely at the store. The client side performs result assembly (shaping operations, really), and object materialization; but the real guts of query execution is entirely in the store.
So why eSQL?
SQL has been around for a number of years, and there’s a vast number of folks that are familiar with it. SQL’s great for relational systems, but starts getting a little hairier for more advanced concepts. Large parts of SQL are still interesting though; so we designed Entity SQL (aka eSQL) to be an extension of SQL – very SQL-like in most cases, but more appropriate for the conceptual model (EDM) we’re building. We borrowed heavily from SQL-99, OQL, and earlier internal Microsoft incarnations like WinQL. You’ll see most of your friendly SQL constructs (select, from, where), some new EDM-specific constructs (type constructors, multiset constructors, relationship navigation etc.). More eSQL details in a follow-on post.
I mentioned earlier that every database has a slightly different variant of SQL that it supports. On similar lines, every database supports a slightly different set of builtin functions, and a different set of builtin types. We tried hard to keep the eSQL language agnostic about the builtin functions supported by a database. In fact, eSQL has no builtin functions – it has a very small number of builtin operators (things like +, -, <, >, = etc.). Instead, we rely on a Provider Manifest (I’ll talk about this in a follow-on post) to define the set of builtin functions and builtin types supported by the provider, and these are automatically available to you when you write eSQL queries. We also have the notion of a small set of canonical functions (I’ll describe this in a later post as well) that every provider is expected to support; and you can use these to write provider-independent queries.
When do I use LINQ? When do I use eSQL?
If you’re using the Map Provider, you will need to use eSQL. If you’re using the Object Services layer (Query<T>), you have a choice of using eSQL or LINQ. In a future release, we may also allow queries to be specified in terms of (publicly constructible) CQTs.
Some queries are easier to express in eSQL than in LINQ; some queries are easier to express in LINQ. LINQ queries give you stronger compile-time checking and IntelliSense. On the other hand, eSQL may be more appropriate for dynamic queries. Some constructs of the EDM, and some provider-specific constructs can only be exercised via eSQL. We expect to add a lot more capabilities to eSQL as part of future releases – including query-based DMLs, analytic functions, better aggregation (cube/rollup) support, native span support etc. At the end of the day, however, use the approach that is most suitable for your application. As I described earlier, both eSQL and LINQ queries are first converted into a common tree representation (the CQT) - from that point on, all processing is identical. You can mix eSQL and LINQ queries in your application logic without any trouble.
I hope that gave you all a somewhat reasonable understanding of the way we handle queries in ADO.NET. We’d love to hear what you think. What are we doing right? What parts can we do better? Where are we just plain wrong ?
We also want to get your thoughts on some of these questions. When would you use eSQL, and when would you use LINQ? What kind of constructs do you think are missing from eSQL? Would you like to program against CQTs directly (ie) would you want to build queries directly in terms of CQTs?
Things I will talk about in subsequent posts
· Provider Manifests
· More details on the query pipeline