Server queries and identity resolution

I answered a Connect issue today that deals with a very common expectation for users of systems like Entity Framework and LINQ to SQL. The issue was something like this:

When I run a query, I expect entities that I have added to the context but not yet saved, and that match the predicate of the query, to show up in the results.

The reality is that Entity Framework queries are always server queries: all queries, whether LINQ or Entity SQL based, are translated to the native query language of the database server and then evaluated exclusively on the server.

Note: LINQ to SQL actually relaxes this principle in two ways:

1. Identity-based queries are resolved against the local identity map. For instance, if a customer with the key "ALFKI" is already being tracked by the DataContext, the following query will not hit the data store:

var customer = context.Customers
    .Single(c => c.CustomerID == "ALFKI");

2. The outermost projection of the query is evaluated on the client. For instance, the following query sends the server a query that projects only CustomerID, and then invokes the client-side WriteLineAndReturn method on each value as the code iterates through the results:

// WriteLineAndReturn is a method defined in the application, not in the database
var q = context.Customers
    .Select(c => WriteLineAndReturn(c.CustomerID));

But this does not affect the behavior explained in this post.

In sum, Entity Framework does not include a client-side or hybrid query processor.

MergeOption and identity resolution

Chances are that you have seen unsaved modifications to entities reflected in the results of queries. This happens because, for tracked queries (i.e. queries whose MergeOption is set to any value other than NoTracking), Entity Framework performs “identity resolution”.

The process can be explained simply like this:

  1. The identity of each incoming entity is determined by building the corresponding EntityKey.
  2. The ObjectStateManager is searched for an already tracked entity with a matching EntityKey.
  3. If an entity with the same identity is already being tracked, the data coming from the server and the data already in the state manager are merged according to the MergeOption of the query. Otherwise, the incoming entity is attached to the state manager in the Unchanged state.
  4. In the default case, MergeOption is AppendOnly, which means that the tracked entity, including any unsaved changes, is left intact, and that same instance is returned as part of the query results.
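The steps above can be illustrated with a minimal C# sketch. This is not Entity Framework's implementation; it is a hypothetical in-memory model in which the EntityKey is reduced to an integer CustomerID, the ObjectStateManager is a plain dictionary from key to the tracked LastName, and only the AppendOnly and OverwriteChanges behaviors are modeled, selected through a made-up boolean parameter (overwriteChanges):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rows returned by a server query: (CustomerID, LastName).
var serverRows = new List<(int Id, string LastName)> { (1, "Tiger"), (2, "Zombie") };

// Hypothetical stand-in for the ObjectStateManager:
// EntityKey (simplified to CustomerID) -> current LastName of the tracked entity.
// Customer 1 is already tracked and carries an unsaved change.
var stateManager = new Dictionary<int, string> { [1] = "Zebra" };

var appendOnly = Materialize(serverRows, overwriteChanges: false);
Console.WriteLine(string.Join(", ", appendOnly)); // (1, Zebra), (2, Zombie)

var overwritten = Materialize(serverRows, overwriteChanges: true);
Console.WriteLine(string.Join(", ", overwritten)); // (1, Tiger), (2, Zombie)

List<(int Id, string LastName)> Materialize(
    IEnumerable<(int Id, string LastName)> rows, bool overwriteChanges)
{
    var results = new List<(int Id, string LastName)>();
    foreach (var row in rows)
    {
        // Step 1: build the identity (EntityKey) of the incoming entity.
        int key = row.Id;

        // Step 2: look for an entity with the same key in the state manager.
        if (stateManager.ContainsKey(key))
        {
            // Step 3: merge. With AppendOnly (step 4) the tracked values are kept;
            // with OverwriteChanges the server values replace them.
            if (overwriteChanges)
            {
                stateManager[key] = row.LastName;
            }
        }
        else
        {
            // Not tracked yet: start tracking it with the server values.
            stateManager[key] = row.LastName;
        }

        // The query returns the tracked values, not necessarily the server values.
        results.Add((key, stateManager[key]));
    }
    return results;
}
```

In both cases the set of keys returned is exactly the set of rows that came back from the server; the state manager only decides which values those entities carry.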

However, whether an entity belongs to the results of a given query is decided exclusively by the state that exists on the server. For instance, what will the query in this example return?

var customer1 = Customer.CreateCustomer(1, "Tiger");
var customer2 = Customer.CreateCustomer(2, "Zombie");
context.AddObject("Customers", customer1);
context.AddObject("Customers", customer2);
context.SaveChanges();
// Pending changes, not yet saved to the database
customer1.LastName = "Zebra";
var customer3 = Customer.CreateCustomer(100, "Zorro");
context.AddObject("Customers", customer3);
context.DeleteObject(customer2);
// The predicate is evaluated by the database server only
var customerQuery = context.Customers
    .Where(c => c.LastName.StartsWith("Z"));
foreach (var customer in customerQuery)
{
    Console.WriteLine(customer.LastName);
}

The answer is:

  1. The modified entity customer1 won’t show up in the results, because its LastName is still Tiger in the database.
  2. The deleted entity customer2 will be returned by the query, even though it has already been marked as deleted, because it still exists in the database.
  3. The new entity customer3 won’t make it, because it exists only in the local ObjectStateManager and not in the database.
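To make the outcome concrete, here is a small C# model of the example. It is a hypothetical sketch, not how Entity Framework is implemented: the database and the state manager are plain dictionaries, entity states are plain strings, the query's predicate is evaluated only against the database, and the AppendOnly identity resolution described earlier then supplies the tracked values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical "database": what SaveChanges persisted (CustomerID -> LastName).
var database = new Dictionary<int, string> { [1] = "Tiger", [2] = "Zombie" };

// Hypothetical state manager: tracked values plus each entity's state,
// including the pending changes made after SaveChanges.
var stateManager = new Dictionary<int, (string LastName, string State)>
{
    [1] = ("Zebra", "Modified"),  // customer1: LastName changed locally
    [2] = ("Zombie", "Deleted"),  // customer2: marked for deletion
    [100] = ("Zorro", "Added"),   // customer3: exists only locally
};

// Membership is decided on the server: the predicate only sees database rows.
var matchingKeys = database
    .Where(row => row.Value.StartsWith("Z"))
    .Select(row => row.Key)
    .ToList();

// AppendOnly identity resolution: a tracked entity is returned as it is in the
// state manager, whatever its state; an untracked one would be attached as Unchanged.
var results = matchingKeys
    .Select(key => stateManager.TryGetValue(key, out var tracked)
        ? (Id: key, tracked.LastName, tracked.State)
        : (Id: key, LastName: database[key], State: "Unchanged"))
    .ToList();

foreach (var customer in results)
{
    Console.WriteLine($"{customer.Id}: {customer.LastName} ({customer.State})");
}
// Prints: 2: Zombie (Deleted)
```

If the model is changed to reflect a second call to SaveChanges (copying the pending values into the database dictionary and removing the deleted row), the same query returns customer1 and customer3 instead of customer2.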

This behavior is by design, and you need to be aware of it when writing your application.

Put another way, if the units of work in your application follow a pattern in which they query first, then modify entities and finally save them, discrepancies between query results and the contents of the ObjectStateManager cannot be observed.

But as soon as queries are interleaved with modifications, the state manager may hold pending changes that the server knows nothing about. In particular, an entity that exists only in the state manager may match the predicate of a query, yet it won’t be returned as part of the results.

Notice that the likelihood of this happening depends on how long-lived the unit of work in your application is (i.e. how much time passes between the initial query and the call to SaveChanges).

Hope this helps,
Diego