The official source of information on Managed Providers, DataSet & Entity Framework from Microsoft
The information in this post is out of date.
Visit msdn.com/data/ef for the latest information on current and past releases of EF.
One of the things that we are continuously working on improving is the quality and the readability of the SQL generated when querying using the Entity Framework. We have already made some improvements in .NET 4.0 Beta 1, and we are working on more for .NET 4.0 post Beta 1.
Most of the improvements we have made are in the Entity Framework query pipeline as opposed to specific SqlClient changes. Because the changes were in the query pipeline, this results in simpler output command trees and thus would affect the SQL generated by any backend, not only by our SQL Server provider.
Below we’ve highlighted the biggest improvements available in Beta 1 and then the ones that will be in the next public release of .NET Framework 4.0. Unless otherwise noted, the examples are based on the NorthwindEF Entity Data Model provided with the ADO.NET Entity Framework Query Samples (http://code.msdn.microsoft.com/EFQuerySamples). Also, in all the examples below, the parts of the SQL query highlighted in red have been removed and those in yellow have been added.
In some nested queries in the generated SQL, we used to explicitly project all columns that are brought in scope by the corresponding FROM clause, like the fragments marked red in the sample generated SQL below. Now, we instead only project the columns that are later referenced. This improvement is specific to our T-Sql Generation in SqlClient.
Example:
query = (
p in context.Products
p.ProductName
p.ProductName) .Skip(2).Take(2);
When translating joins in LINQ to Entities, we check whether the columns on which the join is specified are equal or both are null. In the cases when one (or both) of the columns is non-nullable, the “IS NULL” check is redundant and has been removed. This improvement is in the Entity Framework query pipeline.
query = context.Products.Join(context.Categories, p => p.ProductID,c => c.CategoryID,(p, c) =>
{ p.ProductID, c.CategoryID, p.ProductName, c.CategoryName });
In scenarios where a given table is explicitly or implicitly (e.g. via a navigation property) joined to a parent table (i.e. table to which it has a foreign key constraint) we were not eliminating the parent table even if the only columns referenced from that table were the columns comprising the primary key and could have been reused from the child table. (Note: The parent-child terminology assumes that there is foreign key relationship between these tables specified in the SSDL). This improvement is in the Entity Framework query pipeline.
Note: For this example the highlighted following fragments need to be added to the SSDL in the ADO.NET Entity Framework Query Samples noted above:
string entitySQL = "SELECT p.Category.CategoryId FROM Products as p";
For LINQ to Entities in .NET Framework 3.5 SP1 the equivalent query resulted in the same generated query for Entity SQL shown in the After sample.
Comparing a column to multiple values, either as a result of the explicit use of Entity SQL’s IN expression or as result of internally generated checks over a type discriminator value used to result in nested OR expressions in the generated SQL. Now, we are instead producing an IN expression. This improvement is specific to our T-Sql Generation in SqlClient.
For example:
string entitySQL = "SELECT p FROM Products as p where p.ProductId in {1, 2, 3, 4, 5}";
query = context.Products.Where(p =>
[] { 1, 2, 3, 4, 5 }.Contains(p.ProductID));
The resulting generated TSQL also takes advantage of this improvement:
LINQ’s GroupBy operator is richer then SQL’s group by clause. For example, in addition to aggregates over a group it can return the entire group. When translating LINQ’s GroupBy, we try to recognize if it can be expressed by SQL’s group by (i.e. DbGroupByExpression without a group partition), otherwise we translate it into a more complex expression involving a join to the input table.
In Entity Framework 4.0 Beta1 we have expanded the cases that we are able to translate into SQL’s GroupBy. This improvement is in the Entity Framework query pipeline.
query = context.Products.Where(p => p.ProductID < 4) .GroupBy(p => p.ProductID, p => p.ProductID, (key, group) => new { Key = key, Max = group.Max() });
In cases when we need an internally generated constant, like when translating EntitySQL Exists expression or IEnumerable.Count in LINQ to Entities, we previously used “true”, which translates to “cast(1 as bit)” on SQL Server. We now instead use the integer constant 1, which translates to “1”. This improvement is in the Entity Framework query pipeline.
query = context.Categories.Select(c => new { c.CategoryID, count = c.Products.Count() });
In some cases when a query is only interested in entities of a particular subtype (specified via OfType or IsOf for example) we used to query using a view over the base type and thus possibly generate some unnecessary case statements to also check for the types that are not of the desired subtype. This improvement allows us to use the simplified query view that is only over the desired subtype and thereby generate a simpler provider query.
The following example that illustrates that is over a schema with a TPH mapping with the inheritance hierarchy as shown in the figure. This improvement is in the Entity Framework query pipeline.
query = this.tph.Entities.OfType<Tph.
>().GroupBy(d2 => d2).Select(p => p.Max(g => g.Id));
Here is a quick overview of the improvements that we are working on for this release but are not included in Beta1:
Kati IcevaSoftware Development Engineer, Entity Framework