This post covers one of the most important and frequently misunderstood LINQ features. Understanding deferred execution is a rite of passage that LINQ developers must undergo before they can hope to harness the full power of this technology. The contents of this post assumes an intermediate understanding of LINQ.

Note: A video that accompanies this post can be found here. The sample code from post is found in the attached program called DeferredExecution.

Note: In several places in this explanation of deferred execution I will refer to a data structure called an expression tree. To understand this portion of the text, you need only grasp that an expression tree is a data structure like a list or queue. It holds a LINQ to SQL query -- not the results of the query, but the actual elements of the query itself. In some future post I will cover expression trees in more depth.

Consider this simple LINQ to SQL query:

var query = from customer in db.Customers  << Query does  
            where customer.City == "Paris" << not execute
            select customer;               << here 

It is easy to assume that these few lines of code execute a query against a database. In fact, they do not. This code merely captures the idea of the query in a data structure called an expression tree. The tree contains information about the table you want to query, the question you want to ask of the table, and the result you want to return. It does not, however, actually execute the query!

In LINQ, execution of a query is usually deferred until the moment when you actually request the data. For instance, the following code would cause the query to execute:

foreach (var Customer in query) << Query executes here
{
  Console.WriteLine(Customer.CompanyName);
}

Query expressions often do not cause code to be executed. They only define the question you want to ask. If this were not the case, then it would be difficult or impossible for LINQ to break queries down into a relational algebra which makes composability possible, and which allows developers who care about such things to optimize their code. Deferred execution makes it possible for you to compose quite complex queries from various components without expending the clock cycles necessary to actually query the data. Or at least the data will not be queried until it is absolutely necessary.

Let’s make a slight change to the query shown above:

var query = (from customer in db.Customers << Query executes here 
             where customer.City == "Paris"
             select customer).Count();

This time the query will be run when the execution point moves past it. The code executes when you call Count(), as it would when you call any of the other operators that must iterate over the result of a query in order to return a value that is not IEnumerable<T> or IQueryable<T>. The ToList() operator the query code to execute immediately because it returns a List<T> instead of IQueryable<T>. Consider the following code:

public void SimpleQuery01()
{
  db.Log = Console.Out;

  // Query executes here because it returns a List<T>
  List<Customer> list = (from customer in db.Customers
                         where customer.City == "Paris"
                         select customer).ToList();

  foreach (var Customer in list)
  {
    Console.WriteLine(Customer.CompanyName);
  }
} 

One of the last things the LINQ to SQL provider does before executing a query is to create the SQL it will send across the wire. This fact gives us a clue we can use when trying to determine the exact moment when a query executes.

Assigning db.Log to Console.Out as we do in the first line of this query ensures that the SQL for our query will be written to the console as soon as it is generated. When the code shown above is run, you will see the log written to the screen as the query expression executes.

Consider the following code, which does not call ToList():

db.Log = Console.Out;
var query = from customer in db.Customers
            where customer.City == "Paris"
            select customer;

foreach (var Customer in query)  << Query Executes here
{
  Console.WriteLine(Customer.CompanyName);
}

If you step through this code with the debugger while watching the console window, you will see that the SQL is not generated until the foreach statement executes. When we call ToList(), the SQL is written to the screen earlier, providing the evidence that the query itself executes earlier:

List<Customer> list = (from customer in db.Customers
                       where customer.City == "Paris"
                       select customer).ToList(); << Query Executes here 

Let’s look at deferred execution from a slightly different angle:

public void SimpleQuery03()
{
  string city = "London";

  var query = from c in db.Customers
              where c.City == city
              select new { c.City };

  city = "Madrid";

  foreach (var q in query) << Query Executes here 
  {
    Console.WriteLine(q);
  }
} 

This method will print Madrid to the screen rather than London. By changing the value of the variable city to Madrid just before the foreach statement we ensure that Madrid, rather than London is included in the SQL that is sent to the server. The point being that the execution of the query expression merely generates an expression tree, it does not send SQL to the database.

Let’s take a look at one final example, just to drive this very important point home. Consider this code:

public void DeleteInsert01()
{
  db.Log = Console.Out;

  string customerId = "WIDGE";

  // Query expression to be used repeatedly
  var query = from c in db.Customers
              where c.CustomerID == customerId
              select c;

  Console.WriteLine("Count before insert: {0}", query.Count()); << SQL Sent

  // Object Initializer
  var newCustomer = new Customer
  {
    CustomerID = customerId,
    CompanyName = "Microsoft",
    ContactName = "John Doe",
    ContactTitle = "Sales Manager",
    Address = "1 Microsoft Way",
    City = "Redmond",
    Region = "WA",
    PostalCode = "98052",
    Country = "USA",
    Phone = "(425) 555-1234",
    Fax = null
  };

  // Insert
  db.Customers.InsertOnSubmit(newCustomer);
  db.SubmitChanges();

  Console.WriteLine("Count after insert: {0}", query.Count()); << SQL Sent

  // Delete
  db.Customers.DeleteAllOnSubmit(query);
  db.SubmitChanges();

  Console.WriteLine("Count after delete: {0}", query.Count()); << SQL Sent
}

This first call to WriteLine() shows the initial number of records with a CustomerId of WIDGE. The value returned is zero. The second WriteLine shows the value at 1, because a record with that value has been inserted. The final call shows the count as back at zero, because the inserted record was deleted

Notice that query expression used to retrieve the count from the server is written only once near top of the method. At each of the three WriteLine statements this query is composed with the Count() operator and executed. The following SQL was sent to the server three times:

SELECT COUNT(*) AS [value]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] = @p0

SQL of a different sort was also sent to the server when DeleteAllOnSubmit was called.

Suppose we rewrote the method the method to look like this:

public void DeleteInsert02()
{
    db.Log = Console.Out;

    string customerId = "WIDGE";

    // Execution of the query occurs here
    var query = (from c in db.Customers  << Query executes
                 where c.CustomerID == customerId
                 select c).Count();

    Console.WriteLine("Count before insert: {0}", query); << No execution

    // Object Initializer
    var newCustomer = new Customer
    {
        CustomerID = customerId,
        CompanyName = "Microsoft",
        ContactName = "John Doe",
        ContactTitle = "Sales Manager",
        Address = "1 Microsoft Way",
        City = "Redmond",
        Region = "WA",
        PostalCode = "98052",
        Country = "USA",
        Phone = "(425) 555-1234",
        Fax = null
    };

    // Insert
    db.Customers.InsertOnSubmit(newCustomer);
    db.SubmitChanges();

    Console.WriteLine("Count after insert: {0}", query); << No execution

    var deleteQuery = from c in db.Customers
                      where c.CustomerID == customerId
                      select c;

    // Delete
    db.Customers.DeleteAllOnSubmit(deleteQuery);
    db.SubmitChanges();

    Console.WriteLine("Count after delete: {0}", query); << No execution
}

This time we get the count of records in the database when the query expression executes near the start of the method. The correct value, which is zero, is returned from the database at that time and is sent to the console when the first WriteLine statement executes. The second two times that WriteLine is called, the same value of zero that was returned from the initial query is sent to the server. As a result, invalid data is reported to the user! This illustrates why it is so important that you understand deferred execution.

Deferred execution is a two edged sword:

  • Composable queries and deferred execution work together to make LINQ an unusual rich query language. If you properly understand these features of LINQ you will be able to write less code, that executes faster, in order to accomplish more.
  • As you have seen, if we don’t properly understand deferred execution then we can end up with a sheepish grin on our face. Queries might not execute when we expect them to, and we might report erroneous results to the user.

Note: Deferred execution applies to all varieties of LINQ, including LINQ to SQL, LINQ to Objects and LINQ to XML. However, of the three, it is only LINQ to SQL that returns an expression tree by default. Or more specifically, it returns an instance of the IQueryable interface that references an expression tree. A query that returns IEnumerable still supports deferred execution, but at least some larger portion of the result is likely to have been generated than is the case with LINQ to SQL. In other words, all types of LINQ support deferred execution, but LINQ to SQL supports it more fully than LINQ to Objects or LINQ to XML.

This short post outlined some of the core facts about deferred execution. This is a subject that developers must understand if they want to use the full power of LINQ. It is not a particularly difficult subject, but it is one that requires us to follow chains of thought that we have not typically pursued when working with imperative code.

kick it on DotNetKicks.com