Improvements to the Generated SQL in .NET 4.0 Beta1

Published 05 August 09 09:31 PM | dpblogs 

One of the things that we are continuously working on improving is the quality and the readability of the SQL generated when querying using the Entity Framework. We have already made some improvements in .NET 4.0 Beta 1, and we are working on more for .NET 4.0 post Beta 1.

Most of the improvements we have made are in the Entity Framework query pipeline as opposed to specific SqlClient changes. Because the changes were in the query pipeline, this results in simpler output command trees and thus would affect the SQL generated by any backend, not only by our SQL Server provider.

Below we’ve highlighted the biggest improvements available in Beta 1 and then the ones that will be in the next public release of .NET Framework 4.0. Unless otherwise noted, the examples are based on the NorthwindEF Entity Data Model provided with the ADO.NET Entity Framework Query Samples (http://code.msdn.microsoft.com/EFQuerySamples). Also, in all the examples below, the parts of the SQL query highlighted in red have been removed and those in yellow have been added.

1. Avoid projecting all columns in nested queries

In some nested queries in the generated SQL, we used to explicitly project all columns that are brought in scope by the corresponding FROM clause, like the fragments marked red in the sample generated SQL below. Now, we instead only project the columns that are later referenced. This improvement is specific to our T-Sql Generation in SqlClient.

Example:

var query = (from p in context.Products
orderby p.ProductName
select p.ProductName)
.Skip(2).Take(2);

blog1

2. Avoiding Unnecessary IS NULL Checks in LINQ to Entities Queries

When translating joins in LINQ to Entities, we check whether the columns on which the join is specified are equal or both are null. In the cases when one (or both) of the columns is non-nullable, the “IS NULL” check is redundant and has been removed. This improvement is in the Entity Framework query pipeline.

Example:

var query = context.Products.Join(
context.Categories,
p => p.ProductID,
c => c.CategoryID,
(p, c) => new { p.ProductID, c.CategoryID, p.ProductName, c.CategoryName });

blog1

3. Additional join elimination:  Eliminate parent table in “Child Left Outer Join Parent”

In scenarios where a given table is explicitly or implicitly (e.g. via a navigation property) joined to a parent table (i.e. table to which it has a foreign key constraint) we were not eliminating the parent table even if the only columns referenced from that table were the columns comprising the primary key and could have been reused from the child table. (Note: The parent-child terminology assumes that there is foreign key relationship between these tables specified in the SSDL). This improvement is in the Entity Framework query pipeline.

Example:

Note: For this example the highlighted following fragments need to be added to the SSDL in the ADO.NET Entity Framework Query Samples noted above:

blog1

string entitySQL = "SELECT p.Category.CategoryId FROM Products as p";
blog1

For LINQ to Entities in .NET Framework 3.5 SP1 the equivalent query resulted in the same generated query for Entity SQL shown in the After sample.

4. Using IN instead of nested OR’s

Comparing a column to multiple values, either as a result of the explicit use of Entity SQL’s IN expression or as result of internally generated checks over a type discriminator value used to result in nested OR expressions in the generated SQL. Now, we are instead producing an IN expression. This improvement is specific to our T-Sql Generation in SqlClient.

For example:

string entitySQL = "SELECT p FROM Products as p where p.ProductId in {1, 2, 3, 4, 5}";
blog1

Starting with.NET 4.0 Beta 1 Entity Framework, a LINQ version of this query is also supported:
 
var query = context.Products.Where(p => new int[] { 1, 2, 3, 4, 5 }.Contains(p.ProductID));    

The resulting generated TSQL also takes advantage of this improvement:

blog1

5. Translating more of LINQ’s GroupBy operator into GROUP BY

LINQ’s GroupBy operator is richer then SQL’s group by clause. For example, in addition to aggregates over a group it can return the entire group. When translating LINQ’s GroupBy, we try to recognize if it can be expressed by SQL’s group by (i.e. DbGroupByExpression without a group partition), otherwise we translate it into a more complex expression involving a join to the input table.

In Entity Framework 4.0 Beta1 we have expanded the cases that we are able to translate into SQL’s GroupBy. This improvement is in the Entity Framework query pipeline.

Example:

var query = context.Products.Where(p => p.ProductID < 4)
.GroupBy(p => p.ProductID, p => p.ProductID, (key, group) =>
new { Key = key, Max = group.Max() });

blog1

6. Avoiding “Cast (1 as bit)”, using “1” instead

In cases when we need an internally generated constant, like when translating EntitySQL Exists expression or IEnumerable.Count in LINQ to Entities, we previously used “true”, which translates to “cast(1 as bit)” on SQL Server. We now instead use the integer constant 1, which translates to “1”. This improvement is in the Entity Framework query pipeline.

var query = context.Categories.Select(c => new { c.CategoryID, count = c.Products.Count() });
blog1

7. Simplifying some queries that go against a subtype in a type hierarchy

In some cases when a query is only interested in entities of a particular subtype (specified via OfType or IsOf for example) we used to query using a view over the base type and thus possibly generate some unnecessary case statements to also check for the types that are not of the desired subtype. This improvement allows us to use the simplified query view that is only over the desired subtype and thereby generate a simpler provider query.

The following example that illustrates that is over a schema with a TPH mapping with the inheritance hierarchy as shown in the figure. This improvement is in the Entity Framework query pipeline.

 

 

image

var query = this.tph.Entities.OfType<Tph.Derived1>().GroupBy(d2 => d2).Select(p => p.Max(g => g.Id));
blog1

 

blog1

Improvements in .NET 4.0 post Beta1

Here is a quick overview of the improvements that we are working on for this release but are not included in Beta1:

  • Elimination of null sentinels – In the output query there is often the constant 1 projected both in nested and in top-most queries. By avoiding it and reusing other columns for the same purpose in many cases we are able to avoid levels of nesting.
  • Additional Join Elimination
  • Use of Inner Joins instead of Left Outer Joins when possible
  • Provide mechanism for efficient queries on non-Unicode columns – We now generate non-unicode constants and parameters when these are used in LINQ to Entities queries in comparisons with non-unicode columns. This allows indexes to be used by SQL Server.
  • Improve the translation of the functions String.StartsWith, String.Contains and String.EndsWith in LINQ to Entites to use LIKE.
  • Improve the translation of the canonical functions StartsWith, Contains and EndsWith to use LIKE (these canonical functions became available in .NET 4.0 Beta1)
  • Collapse multiple Case statements into one

 Kati Iceva
Software Development Engineer, Entity Framework

 

Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# Marco Scheel said on August 6, 2009 4:29 AM:

Post Beta 1:

"Improve the translation of the functions String.StartsWith, String.Contains and String.EndsWith in LINQ to Entites to use LIKE.

"

In the current release (3.5) we use the ado.net data services to access the EF and one mayor problem is the performance of the above mentioned improvment.

Please... implement this feature in the final release of EF4.

Thanx

Marco

# Muhammad Mosa said on August 6, 2009 5:41 AM:

Thank you so much Kati,

This is just in time, I was about to start working on it to make this comparison, you just saved my time.

And good job EF team. Good luck with the rest

# Gabor Mezo said on August 7, 2009 9:17 AM:

Hi,

Will DefaultIfEmpty() operator be supported in EF v2 (LINQ outter joins)?

Thanks,

Gabor

# Kati Iceva said on August 7, 2009 3:54 PM:

Hi Gabor,

Yes, DefaultIfEmpty() will be supported in EF in .NET 4.0.

Thanks,

Kati

# Kati Iceva said on August 7, 2009 3:54 PM:

Hi Gabor,

Yes, DefaultIfEmpty() will be supported in EF in .NET 4.0.

Thanks,

Kati

# James Hancock said on August 10, 2009 10:51 AM:

Asside from not being able to do this Product.ID == (Guid) SomeObject in a where clause causing all kinds of variable declarations in our code all of the time for no particular reason, the most frustrating thing is doing Left Outer Joins. These are a major PITA in linq. PLEASE PLEASE PLEASE add an outerjoin command that simplifies this so that I can do simple left outer joins.

The only other thing is that I have all kinds of cases where I want to return a property that isn't part of the database in the select.  This should be supported, because it should know to select on the database the ones that are applicable to the database, then pull the proper properties for the rest from the object. Linq to SQL seems to do a pretty good job of this, but Linq to Entities is horrible.

Solving these issues and your support for functions will make .NET 4 fantastic.... well that and PICO and non diagram based implimentations so that people can inherit from our objects and add fields etc....

# shawn said on August 11, 2009 2:19 PM:

It's great to see improvements in the SQL generated by the EF.  One thing that I think could be a small improvement is to use shorter generated names for the tables/extents/projections.  Linq2Sql uses [t0], [t1], etc and the shorter names IMHO make for less "noise" in the SQL and a more readable query.  Plus every byte counts when sending SQL statements over the wire, especially when going from a local client to a remote database over an encrypted connection.  Just something to consider.

Also, +1 on what James said.

# Mark Rowe said on August 13, 2009 2:58 PM:

Coming from a biztalk environment, will For XML still be supported?

# Kati Iceva said on August 17, 2009 1:12 PM:

Thanks for your comments.

James, Entity Framework 4.0 will support DefaultIfEmpty, whic is used for achieving a LeftOuterJoin in Linq.

Shawn, we will consider your suggestion.

Thanks,

Kati

# danieldsmith said on August 18, 2009 9:16 AM:

Absolutely agree with Shawn about the shorter alias names.

Shorter names are cleaner, easier to read, cut bandwidth costs, and I'd imagine it would also mean slightly faster query parsing on the server side too.

# J-P said on August 28, 2009 3:24 AM:

Good to see those improvements.

Some time ago I had a query something like:

bool b = context.Categories.Any(c => c.Products.Count > 10);

Although mine was much more complicated this one serves quite well as an example.

The SQL according to the SQL Server Profiler for this query is: (with a little beautifying to make it readable)

---------------------------------------------------

CASE WHEN ( EXISTS (SELECT

1 AS [C1]

FROM ( SELECT

(SELECT

COUNT(1) AS [A1]

FROM [dbo].[Products] AS [Extent2]

WHERE [Extent1].[CategoryID] = [Extent2].[CategoryID]) AS [C1]

FROM [dbo].[Categories] AS [Extent1]

)  AS [Project1]

WHERE [Project1].[C1] > 10)

) THEN cast(1 as bit)

WHEN ( NOT EXISTS (SELECT

1 AS [C1]

FROM ( SELECT

(SELECT

COUNT(1) AS [A1]

FROM [dbo].[Products] AS [Extent4]

WHERE [Extent3].[CategoryID] = [Extent4].[CategoryID]) AS [C1]

FROM [dbo].[Categories] AS [Extent3]

)  AS [Project3]

WHERE [Project3].[C1] > 10)

) THEN cast(0 as bit)

END AS [C1]

FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]

---------------------------------------------------

With my limited knowledge of T-SQL performance I would think that this is a real performance killer when having to execute the, possible quite complex, sub-query again for the "NOT EXTIST" when the "EXISTS" fails. Please correct me if I'm wrong.

Why not 'replace' the "WHEN ( NOT EXISTS ....... THEN cast(0 as bit)" with "ELSE cast(0 as bit)". Even if it doesn't give better performance it is much more readable.

For my original complex query I ended up using Count() instead of Any():

bool b = context.Categories.Count(c => c.Products.Count > 10) > 0;

For me this did perform better then the Any() statement.

Just my two cents.

J-P

# Roger Hendriks said on September 4, 2009 6:19 AM:

Great! Good to see you are investing a lot in were we put our future in :)

"Use of Inner Joins instead of Left Outer Joins when possible" I hope this means: use inner joins when relation is 1 and use outer join when relation is 0..1. In this version the outer join does not work correctly in hierarchy classes f.e. order 0..1 to person where person is an extension of user becomes an inner join ;(

It would be preferrable if the developer could force it with a keyword.

Also an isnull on required fields in an outer join is really needed because ?? does not work.

# san_kan1gb@yahoo.co.uk said on September 27, 2009 3:21 AM:

1. It would be very helpfull if u make a menu command to refresh all tables .xml files by getting the last changes from the database instead of us going to each .xml file and refreshing each table one by one. at least for the tables that we submitted to get all the columns like: Select * from table1.

2. Also it would be very good if there is a command that will check the tables in all the files in the Visual Studio solution and their real ones in SQL Server.

# Phil said on October 3, 2009 10:58 PM:

Hello,

This simple query against AdventureWorks database using EF4.0 Beta 1 seems to produce some pretty horrible SQL – superfluous aliases everywhere, a redundant FROM clause, the same condition being tested twice etc.

LINQ TO EF:

bool exists = context.Customers.Any(c => c.AccountNumber == "AW00000003");

GENERATED SQL:

SELECT

CASE WHEN ( EXISTS (SELECT

1 AS [C1]

FROM [Sales].[Customer] AS [Extent1]

WHERE N'AW00000003' = [Extent1].[AccountNumber]

)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT

1 AS [C1]

FROM [Sales].[Customer] AS [Extent2]

WHERE N'AW00000003' = [Extent2].[AccountNumber]

)) THEN cast(0 as bit) END AS [C1]

FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]

EXPECTED:

SELECT

CASE WHEN ( EXISTS (SELECT

1

FROM [Sales].[Customer] AS [Extent1]

WHERE N'AW00000003' = [Extent1].[AccountNumber]

)) THEN 1

ELSE 0 END AS [C1]

# Dan said on October 4, 2009 2:32 AM:

For TPT Inheritance, how about the ability to specify a discriminator column in the base entity to help relate to derived entities?

Example: A Base entity of Vehicle would have a discriminator column called VehicleType.  Then the derived entity, Car, would be conditionally related to Vehicle when Vehicle.VehicleType = [some constant].

Seems like you could make the SQL more efficient if the join to the derived table uses the VehicleType in the condition instead of just an outer join based on the pk/fk.  It would save the DB from even scanning the derived table for the PK unless it's VehicleType matched first.

Don't a lot of databases have that discriminator column anyways in this type of situation?

# Aquilax said on October 23, 2009 8:19 AM:

It would nice if the Linq method Contains would be included and translated to the sql IN statement:

EntitySet.Where(e=>{1,2,3}.Contains(e.Id));

converted to sql:

select ... where Id IN (1,2,3)

# Luther said on October 28, 2009 3:04 PM:

Will you guys deveop a full text -- Contains, freetext, etc feature in Linq to Entities? If not when will this feature be available? I still find myself using inline sql for full text.

Leave a Comment

(required) 
(optional)
(required) 

  
Enter Code Here: Required

Search

This Blog

Syndication

Page view tracker