Workaround for performance with Enumerable.Contains and non-Unicode columns against EF in .NET 4.0

This week we announced the availability of EF6 alpha 2 (read the announcement post in our blog and Scott Guthrie's post for more information), which includes a great performance improvement for Enumerable.Contains. This reminded me of another improvement we did in EF5 (in EF core bits in .NET 4.5) to improve the translation of the query for Enumerable.Contains over lists of strings, so that the database server can take advantages of existing indexes on non-Unicode columns.

I got asked the question many times of how to make this work faster with .NET 4.0. Fortunately there is a workaround.

In .NET 4.0 we added the ability to EF to recognize when a comparison between a constant or a parameter and a non-Unicode column should not try to convert the column to Unicode, e.g. assuming that CompanyName is a non-Unicode string column, e.g. varchar(max), that for a query like this:

  var q = context.Customers.Where(c => c.CompanyName == "a");

We can produce a translation like this:

 SELECT 
 [Extent1].[CustomerID] AS [CustomerID],
 [Extent1].[CompanyName] AS [CompanyName],
 …
 FROM [dbo].[Customers] AS [Extent1]
 WHERE [Extent1].[CompanyName] = 'a'

Rather than this:

 SELECT 
 [Extent1].[CustomerID] AS [CustomerID],
 [Extent1].[CompanyName] AS [CompanyName],
 …
 FROM [dbo].[Customers] AS [Extent1]
 WHERE [Extent1].[CompanyName] = N'a'

For simple comparisons and simple string constants we can detect the pattern and do this automatically. Even for cases in which we don't do this automatically we provide a method that can be used to explicitly indicate to LINQ to Entities that a constant should be treated as Non-Unicode. EntityFunctions.AsNonUnicode. For instance, you can write the query this way:

 var q = context.Customers.Where(c => c.CompanyName != EntityFunctions.AsNonUnicode("a"));

But when Enumerable.Contains is involved, the workaround is more complicated because the source argument of Contains is a collection of strings and you cannot directly tell EF that each element has to be considered non-Unicode. E.g. let’s say that we want to write the following query, but we want EF to treat the strings in the list as non-Unicode strings:

 var values = new[] { "a", "b", "c" };
 var q = context.Customers.Where(c => values.Contains(c.CompanyName));

The following helper class defines a method that can be used to build a predicate with all the necessary LINQ expressions:

 public static class PredicateBuilder
 {
     private static readonly MethodInfo asNonUnicodeMethodInfo = 
                 typeof(EntityFunctions).GetMethod("AsNonUnicode");
     private static readonly MethodInfo stringEqualityMethodInfo = 
                 typeof(string).GetMethod("op_Equality");
  
     public static Expression<Func<TEntity, bool>> ContainsNonUnicodeString<TEntity>(
                 IEnumerable<string> source, 
                 Expression<Func<TEntity, string>> expression) 
     {
         if (source == null) throw new ArgumentNullException("source");
         if (expression == null) throw new ArgumentNullException("expression");
  
         Expression predicate = null;
         foreach (string value in source)
         {
             var fragment = Expression.Equal(
                 expression.Body, 
                 Expression.Call(null, 
                     asNonUnicodeMethodInfo, 
                     Expression.Constant(value, typeof(string))), 
                 false, 
                 stringEqualityMethodInfo);
             if (predicate == null)
             {
                 predicate = fragment;
             }
             else
             {
                 predicate = Expression.OrElse(predicate, fragment);
             }
         }
  
         return Expression.Lambda<Func<TEntity, bool>>(predicate,
             ((LambdaExpression)expression).Parameters);
     }
 }

Usage:

 var values = new[] { "a", "b", "c" };
 var q = context.Customers.Where(
     PredicateBuilder.ContainsNonUnicodeString<Customer>(values, a => a.CompanyName));

The translation looks like this:

 SELECT
 [Extent1].[CustomerID] AS [CustomerID],
 [Extent1].[CompanyName] AS [CompanyName],
 …
 FROM [dbo].[Customers] AS [Extent1]
 WHERE [Extent1].[CompanyName] IN ('a','b','c')

As always, treat this sample code carefully. This is something I only tried on my machine and it worked for me. You should test it and verify that it meets your needs.

What we are doing in this workaround is putting together a LINQ expression tree that is valid and that LINQ to Entities can parse and translate, but that would have been very hard (in fact I believe in this case it would not be possible) to get the compiler to produce directly.

Hope this helps,

Diego