
Using LINQ Expressions to Generate Dynamic Methods II

A beta of Visual Studio 2008 SP1 was released on Monday and the ADO.NET Entity Framework (EF) is now in the box! You can download and install the Beta here. The EF Extensions library has been updated to work with the beta and includes several public and internal changes. Source code is available at http://code.msdn.com/EFExtensions. The latest release introduces some performance improvements in the materializer (you can read about the library here). These improvements illustrate another powerful expression pattern.

To improve code clarity, the EF Extensions API encourages you to write:

var products = command.Materialize<Product>(r => new Product {
    ProductID = r.Field<int>("ProductID"),
    Name = r.Field<string>("Name"),
}).ToList();


instead of:

List<Product> products = new List<Product>();
using (SqlDataReader reader = command.ExecuteReader()) {
    int idOrdinal = reader.GetOrdinal("ProductID");
    int nameOrdinal = reader.GetOrdinal("Name");

    while (reader.Read()) {
        Product product = new Product {
            ProductID = (int)reader.GetValue(idOrdinal),
            Name = reader.IsDBNull(nameOrdinal) ? (string)null : (string)reader.GetValue(nameOrdinal),
        };
        products.Add(product);
    }
}


There’s usually a tradeoff, and this is no exception. While the code in the first example is easier to write, read and maintain, looking up column ordinals on each call to Field<T> is expensive: for every column in every row, I’m incurring the cost of the lookup. Field<T> also verifies its arguments on every call and checks for DBNull whether or not the requested type accepts nulls. Most of this work is redundant or unnecessary for materialization.

Fortunately, there’s a simple solution to these problems. If we represent the “shaper” delegate as a LINQ expression, we can rewrite it (using the technique described here) for efficiency. Basically, we can rewrite calls to Field<T> to calls to the underlying reader, caching column ordinals for efficiency. In the above example, the expression:

r => new Product() {ProductID = r.Field("ProductID"), Name = r.Field("Name")}


now becomes

r => new Product() {ProductID = Convert(r.GetValue(0)), Name = Convert(IIF(r.IsDBNull(1), null, r.GetValue(1)))}


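The rewritten version is equivalent to the more performant version we wrote by hand: column ordinals are resolved once, and the DBNull check appears only for the column whose type accepts nulls.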

The EFExtensions library uses an extensible pattern to perform these optimizations.  Methods that can be optimized or rewritten are flagged with an attribute indicating a handler, in this example FieldMethodOptimizer:

[MaterializerOptimizedMethod(typeof(FieldMethodOptimizer))]
public static T Field<T>(this IDataRecord record, string name);


When materialization begins, field names from the reader are immediately retrieved. Whenever a method with this attribute is encountered in the shaper expression, the corresponding optimizer is called to rewrite the expression:

protected override Expression VisitMethodCall(MethodCallExpression m) {
    Expression result = base.VisitMethodCall(m);
    if (result.NodeType == ExpressionType.Call) {
        m = (MethodCallExpression)result;
        MaterializerOptimizedMethodAttribute attribute = m.Method.GetCustomAttributes(typeof(MaterializerOptimizedMethodAttribute), false)
            .Cast<MaterializerOptimizedMethodAttribute>()
            .SingleOrDefault(); // multiple attributes not permitted; not inherited
        if (null != attribute) {
            return attribute.Optimizer.OptimizeMethodCall(this.fieldNames, this.recordParameter, m);
        }
    }
    return result;
}


As in my previous post, I’m leveraging the ExpressionVisitor to do the rewrite. In this case, I’m intercepting and replacing MethodCallExpressions only.
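To make the rewrite concrete, here is a minimal sketch of what a handler along the lines of FieldMethodOptimizer might do. It is not the EFExtensions source: RecordExtensions and FieldRewriter are hypothetical names, the Field<T> body is a stand-in, and the sketch assumes the ExpressionVisitor class that ships in System.Linq.Expressions from .NET 4 onwards. Column names are matched case-sensitively here, which is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in for the library's Field<T>; resolves the ordinal on every call.
public static class RecordExtensions
{
    public static T Field<T>(this IDataRecord record, string name)
    {
        int ordinal = record.GetOrdinal(name);
        return record.IsDBNull(ordinal) ? default(T) : (T)record.GetValue(ordinal);
    }
}

// Hypothetical rewriter: replaces Field<T>("name") calls with ordinal-based reader calls.
public sealed class FieldRewriter : ExpressionVisitor
{
    private static readonly MethodInfo GetValueMethod = typeof(IDataRecord).GetMethod("GetValue");
    private static readonly MethodInfo IsDBNullMethod = typeof(IDataRecord).GetMethod("IsDBNull");
    private readonly IList<string> fieldNames;

    private FieldRewriter(IList<string> fieldNames) { this.fieldNames = fieldNames; }

    public static Expression<Func<IDataRecord, T>> Optimize<T>(
        Expression<Func<IDataRecord, T>> shaper, IList<string> fieldNames)
    {
        return (Expression<Func<IDataRecord, T>>)new FieldRewriter(fieldNames).Visit(shaper);
    }

    protected override Expression VisitMethodCall(MethodCallExpression m)
    {
        // Only calls of the form Field<T>(record, "constant name") are rewritten.
        ConstantExpression name = m.Arguments.Count == 2 ? m.Arguments[1] as ConstantExpression : null;
        bool isField = m.Method.DeclaringType == typeof(RecordExtensions) && m.Method.Name == "Field";
        int ordinal = (isField && name != null) ? fieldNames.IndexOf((string)name.Value) : -1;
        if (ordinal < 0) return base.VisitMethodCall(m);

        // The ordinal is baked into the tree as a constant, so it is looked up once.
        Expression record = Visit(m.Arguments[0]);
        Expression index = Expression.Constant(ordinal);
        Expression value = Expression.Convert(Expression.Call(record, GetValueMethod, index), m.Type);

        // Emit the DBNull check only when the requested type can hold null.
        bool acceptsNull = !m.Type.IsValueType || Nullable.GetUnderlyingType(m.Type) != null;
        if (!acceptsNull) return value;
        return Expression.Condition(
            Expression.Call(record, IsDBNullMethod, index), Expression.Default(m.Type), value);
    }
}
```

Calling FieldRewriter.Optimize(shaper, fieldNames) with the reader's column names returns a new lambda that can be compiled and applied to each row.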

End result: we can now use a more concise coding pattern without sacrificing performance. Unfortunately, we still need to pay the cost of compiling the materializer delegate, but this can be offset by reusing the delegate. To facilitate reuse, the Materializer class in EFExtensions is thread-safe and stores the optimized delegate on first use.
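The compile-once, reuse-many idea can be sketched as follows. This is an illustration under assumptions, not the Materializer class itself: CachedShaper and CompileCount are hypothetical names, and Lazy<T> (available from .NET 4) stands in for whatever synchronization the library uses.

```csharp
using System;
using System.Linq.Expressions;
using System.Threading;

// Hypothetical wrapper: compiles the shaper on first use and reuses the delegate afterwards.
public sealed class CachedShaper<TRecord, TResult>
{
    private readonly Lazy<Func<TRecord, TResult>> compiled;
    private int compileCount;

    public CachedShaper(Expression<Func<TRecord, TResult>> shaper)
    {
        // ExecutionAndPublication guarantees the factory runs at most once, even under contention.
        compiled = new Lazy<Func<TRecord, TResult>>(() =>
        {
            Interlocked.Increment(ref compileCount);
            return shaper.Compile();
        }, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Exposed only so the sketch can show that compilation happens once.
    public int CompileCount { get { return Volatile.Read(ref compileCount); } }

    public TResult Shape(TRecord record) { return compiled.Value(record); }
}
```

Holding one such instance in a static field and calling Shape for every row amortizes the compilation cost across all calls.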

LINQ to Entities: Combining Predicates

Someone asked a great question on the ADO.NET Entity Framework forums yesterday: how do I compose predicates in LINQ to Entities? I’ll give three answers to the question.

Answer 1: Chaining query operators

Basically, you have some query and you have some predicates you want to apply to that query (“the car is red”, “the car costs less than $10”). If both conditions need to be satisfied, you can just chain together some calls to Where (“the car is red and costs less than $10”):

Expression<Func<Car, bool>> theCarIsRed = c => c.Color == "Red";
Expression<Func<Car, bool>> theCarIsCheap = c => c.Price < 10.0;
IQueryable<Car> carQuery = …;
var query = carQuery.Where(theCarIsRed).Where(theCarIsCheap);

 

If you’re willing to exceed the $10 budget for cars that are red, you can chain Unions instead (“the car is red or the car costs less than $10”):

var query2 = carQuery.Where(theCarIsRed).Union(carQuery.Where(theCarIsCheap));

 

This last query has a couple of problems: it’s inefficient (because of the unions) and it eliminates duplicates in the results, something that would not happen if I applied a single predicate.
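The duplicate-elimination difference is easy to see in memory. The sketch below uses LINQ to Objects through AsQueryable rather than LINQ to Entities, and projects to the color so that duplicates can actually occur; the Car class, the sample data and the UnionDemo name are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public class Car
{
    public string Color { get; set; }
    public double Price { get; set; }
}

public static class UnionDemo
{
    // Returns { row count with a single OR predicate, row count with the Union form }.
    public static int[] CountColors()
    {
        IQueryable<Car> carQuery = new[]
        {
            new Car { Color = "Red", Price = 5.0 },
            new Car { Color = "Red", Price = 20.0 },
            new Car { Color = "Blue", Price = 5.0 },
        }.AsQueryable();

        Expression<Func<Car, bool>> theCarIsRed = c => c.Color == "Red";
        Expression<Func<Car, bool>> theCarIsCheap = c => c.Price < 10.0;

        // Single predicate: every matching row survives, duplicates included.
        var single = carQuery.Where(c => c.Color == "Red" || c.Price < 10.0).Select(c => c.Color);

        // Union: set semantics, so repeated values collapse.
        var union = carQuery.Where(theCarIsRed).Select(c => c.Color)
            .Union(carQuery.Where(theCarIsCheap).Select(c => c.Color));

        return new[] { single.Count(), union.Count() };
    }
}
```

With this data the single predicate yields three colors (Red, Red, Blue) while the Union form yields two (Red, Blue).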

Answer 2: Build expressions manually

The LINQ Expressions API includes factory methods that allow you to build up the predicate by hand. I can define the conditions (with respect to a “car” parameter) as follows:

ParameterExpression c = Expression.Parameter(typeof(Car), "car");
Expression theCarIsRed = Expression.Equal(Expression.Property(c, "Color"), Expression.Constant("Red"));
Expression theCarIsCheap = Expression.LessThan(Expression.Property(c, "Price"), Expression.Constant(10.0));
Expression<Func<Car, bool>> theCarIsRedOrCheap = Expression.Lambda<Func<Car, bool>>(
    Expression.Or(theCarIsRed, theCarIsCheap), c);
var query = carQuery.Where(theCarIsRedOrCheap);

 

Building queries by hand isn’t very convenient. If you’re already building expressions from scratch, this is a good approach but otherwise I’d suggest something different…

Answer 3: Composing Lambda Expressions

The Albaharis suggest combining bodies of lambda expressions in their C# 3.0 book (a great resource for all things C# and LINQ). This allows you to describe the parts of the expression using the lambda syntax and build an aggregate expression:

Expression<Func<Car, bool>> theCarIsRed = c1 => c1.Color == "Red";
Expression<Func<Car, bool>> theCarIsCheap = c2 => c2.Price < 10.0;
Expression<Func<Car, bool>> theCarIsRedOrCheap = Expression.Lambda<Func<Car, bool>>(
    Expression.Or(theCarIsRed.Body, theCarIsCheap.Body), theCarIsRed.Parameters.Single());
var query = carQuery.Where(theCarIsRedOrCheap);

 

I’m taking the bodies of the two conditions and OR-ing them in a new lambda expression. There is a subtle problem, however: the parameter for the merged expression (c1) is taken from “theCarIsRed”, which leaves us with a dangling parameter (c2) from “theCarIsCheap”. The resulting query is invalid. How can I force “theCarIsCheap” to use the same parameter? The answer is to invoke the expression using the common parameter:

ParameterExpression p = theCarIsRed.Parameters.Single();
Expression<Func<Car, bool>> theCarIsRedOrCheap = Expression.Lambda<Func<Car, bool>>(
    Expression.Or(theCarIsRed.Body, Expression.Invoke(theCarIsCheap, p)), p);
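Both behaviors can be checked in memory by compiling the merged lambdas directly. This is a sketch against the in-process expression compiler, not against LINQ to Entities; the Car class and the DanglingParameterDemo name are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public class Car
{
    public string Color { get; set; }
    public double Price { get; set; }
}

public static class DanglingParameterDemo
{
    static readonly Expression<Func<Car, bool>> Red = c1 => c1.Color == "Red";
    static readonly Expression<Func<Car, bool>> Cheap = c2 => c2.Price < 10.0;

    // Merging the raw bodies leaves c2 unbound; the tree can be built but not compiled.
    public static bool NaiveMergeCompiles()
    {
        var merged = Expression.Lambda<Func<Car, bool>>(
            Expression.Or(Red.Body, Cheap.Body), Red.Parameters.Single());
        try
        {
            merged.Compile();
            return true;
        }
        catch (InvalidOperationException)
        {
            return false; // c2 is referenced but never defined
        }
    }

    // Invoking the second lambda with the shared parameter binds c2 to the value of c1.
    public static Func<Car, bool> MergeWithInvoke()
    {
        ParameterExpression p = Red.Parameters.Single();
        return Expression.Lambda<Func<Car, bool>>(
            Expression.Or(Red.Body, Expression.Invoke(Cheap, p)), p).Compile();
    }
}
```

The naive merge fails at compile time because c2 is never declared, while the Invoke form compiles and evaluates as "red or cheap" for in-memory objects.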

 

Here’s the problem: LINQ to Entities does not support InvocationExpressions. Rather than invoking the expression with c1, I can manually rebind the parameter. Matt Warren’s series of articles on IQueryable providers includes an ExpressionVisitor implementation that makes it easy to rewrite expression trees. If you do any LINQ expression manipulation, this class is a crucial tool. Here’s an implementation of the visitor that rebinds parameters:

public class ParameterRebinder : ExpressionVisitor {
    private readonly Dictionary<ParameterExpression, ParameterExpression> map;

    public ParameterRebinder(Dictionary<ParameterExpression, ParameterExpression> map) {
        this.map = map ?? new Dictionary<ParameterExpression, ParameterExpression>();
    }

    public static Expression ReplaceParameters(Dictionary<ParameterExpression, ParameterExpression> map, Expression exp) {
        return new ParameterRebinder(map).Visit(exp);
    }

    protected override Expression VisitParameter(ParameterExpression p) {
        ParameterExpression replacement;
        if (map.TryGetValue(p, out replacement)) {
            p = replacement;
        }
        return base.VisitParameter(p);
    }
}

 

Now I can write a general utility method to compose lambda expressions without using invoke (I’ll call it Compose), and leverage it to implement EF-friendly And and Or builder methods:

public static class Utility {
    public static Expression<T> Compose<T>(this Expression<T> first, Expression<T> second, Func<Expression, Expression, Expression> merge) {
        // build parameter map (from parameters of second to parameters of first)
        var map = first.Parameters.Select((f, i) => new { f, s = second.Parameters[i] }).ToDictionary(p => p.s, p => p.f);

        // replace parameters in the second lambda expression with parameters from the first
        var secondBody = ParameterRebinder.ReplaceParameters(map, second.Body);

        // apply composition of lambda expression bodies to parameters from the first expression
        return Expression.Lambda<T>(merge(first.Body, secondBody), first.Parameters);
    }

    public static Expression<Func<T, bool>> And<T>(this Expression<Func<T, bool>> first, Expression<Func<T, bool>> second) {
        return first.Compose(second, Expression.And);
    }

    public static Expression<Func<T, bool>> Or<T>(this Expression<Func<T, bool>> first, Expression<Func<T, bool>> second) {
        return first.Compose(second, Expression.Or);
    }
}

 

To combine lambda expressions, I can write:

Expression<Func<Car, bool>> theCarIsRed = c => c.Color == "Red";
Expression<Func<Car, bool>> theCarIsCheap = c => c.Price < 10.0;
Expression<Func<Car, bool>> theCarIsRedOrCheap = theCarIsRed.Or(theCarIsCheap);
var query = carQuery.Where(theCarIsRedOrCheap);
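For readers who want to try the rebinding idea in isolation, here is a pared-down variant that runs against an in-memory list. It is a sketch, not the code above: it swaps a single parameter instead of using a map, uses Expression.OrElse (the short-circuiting form) instead of Expression.Or, and relies on the ExpressionVisitor that ships in System.Linq.Expressions from .NET 4 onwards. Car, SwapParameter and PredicateDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Car
{
    public string Color { get; set; }
    public double Price { get; set; }
}

// Replaces every occurrence of one parameter with another.
public sealed class SwapParameter : ExpressionVisitor
{
    private readonly ParameterExpression from;
    private readonly ParameterExpression to;

    public SwapParameter(ParameterExpression from, ParameterExpression to)
    {
        this.from = from;
        this.to = to;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return node == from ? to : node;
    }
}

public static class PredicateDemo
{
    public static Expression<Func<T, bool>> OrElse<T>(
        Expression<Func<T, bool>> first, Expression<Func<T, bool>> second)
    {
        // Rebind the second body to the first lambda's parameter, then merge the bodies.
        Expression secondBody =
            new SwapParameter(second.Parameters[0], first.Parameters[0]).Visit(second.Body);
        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(first.Body, secondBody), first.Parameters);
    }

    public static List<string> Run()
    {
        Expression<Func<Car, bool>> theCarIsRed = c1 => c1.Color == "Red";
        Expression<Func<Car, bool>> theCarIsCheap = c2 => c2.Price < 10.0;

        IQueryable<Car> carQuery = new[]
        {
            new Car { Color = "Red", Price = 50000.0 },
            new Car { Color = "Blue", Price = 5.0 },
            new Car { Color = "Green", Price = 300.0 },
        }.AsQueryable();

        return carQuery.Where(OrElse(theCarIsRed, theCarIsCheap)).Select(c => c.Color).ToList();
    }
}
```

Run() keeps the red car and the cheap blue car and drops the green one; the merged lambda contains no InvocationExpression and refers only to c1.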

 

I’ll use this last answer as an excuse to discuss variations on the visitor pattern in a future post...

Using LINQ Expressions to Generate Dynamic Methods

This week at DevConnections in Orlando, I gave a “deep-dive” talk on LINQ. I wanted to give people a feel for what's possible with the new language features and core APIs in .NET 3.5. I spent most of the talk discussing a single example: take an ADO.NET 2.0 code sample and simplify. Instead of using an existing library like LINQ to SQL or LINQ to Entities, I built a (limited) LINQ provider from scratch. As boilerplate code is moved into helper methods and a proto-LINQ-to-SQL API evolves, the sample is boiled down to something much more compact:

static List<Customer> GetCustomers()
{
    using (Table<Customer> table = new Table<Customer>(GetConnectionString(), "Customers"))
    {
        IEnumerable<Customer> query = from customer in table
                                      where customer.City == "London"
                                      select customer;
        return query.ToList();
    }
}

 

In this post, I’ll drill down on one component developed in the talk, which was also included in the EFExtensions library. This component takes rows from a data reader and transforms them (or “shapes” them) into typed results. For instance, I may want to transform records into customers. Using a couple of extension methods (Field<T> and Materialize<T>), we can leverage a “shaper” delegate to do just this:

SqlCommand command = …;

return command.Materialize<Category>(r =>
    new Category
    {
        CategoryID = r.Field<int>("CategoryID"),
        CategoryName = r.Field<string>("CategoryName"),
    });

 

The shaper delegate code (the lambda passed to Materialize above) is still annoying though: for every property of the category, I’m retrieving a column of the same name. The code is mechanical and repetitive, the sort of thing you want a machine to do rather than a programmer.

Here are three strategies you can use to automate this pattern in .NET 2.0:

1. Reflection: Write a general purpose delegate that uses reflection to construct an instance of T and then dynamically invokes property setters. While this code is relatively easy to write, the performance is sub-optimal. For information on the performance of various method dispatch patterns, take a look at this great talk by Joel Pobar and Joe Duffy.

2. Automatically generate the code: You can automatically generate wrapper classes encapsulating shaping logic. Code generation has its challenges however, in particular integration with build systems and the development environment. It’s also a lot of code to maintain.

3. Create a DynamicMethod implementing the pattern: .NET allows you to compile a delegate at runtime using dynamic methods. This resolves the performance and maintenance problems of solutions 1 and 2. MSIL generation is hard to get right unfortunately and the code does not reflect the intent of the generated delegate.
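As a point of comparison, the first strategy (reflection) might look roughly like the sketch below. ReflectionShaper and the Category class are hypothetical names used for illustration, and the sketch assumes every writable public property has a column of the same name.

```csharp
using System;
using System.Data;
using System.Reflection;

// Hypothetical entity used only for this sketch.
public class Category
{
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
}

public static class ReflectionShaper
{
    // Builds a T from the current row by matching property names to column names.
    public static T Shape<T>(IDataRecord record) where T : class, new()
    {
        T result = new T();
        foreach (PropertyInfo property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!property.CanWrite) continue;

            // Reflection and the ordinal lookup both run for every property of every row.
            int ordinal = record.GetOrdinal(property.Name);
            if (record.IsDBNull(ordinal)) continue; // leave the property at its default value

            property.SetValue(result, record.GetValue(ordinal), null);
        }
        return result;
    }
}
```

The code is short, but each row pays for property enumeration, ordinal lookups and late-bound setter calls, which is the performance cost described in the first item.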

Digression: random thoughts on APIs

I recently heard the expression “Swiss Army Knife API”. These are interfaces that handle all kinds of little (possibly unrelated) problems. They can be useful but they are also hard to package and discover. The Zip method in EFExtensions illustrates these problems. It’s a handy method, but what is it doing in an EF library? It has nothing to do with the EF or with the scenarios addressed by the library (it’s poorly packaged), and no one trying to pair the elements of two iterators would think to look in that particular library (it’s not discoverable). If you need to install a PCI card, fillet a fish and hand-stitch a saddle, you might need a Swiss Army API.

At the other extreme, there are narrowly targeted APIs that can solve complex problems but most often require specialized knowledge or training. To make matters worse, once you’ve mastered them, you can rarely apply your knowledge to different domains. You can probably think of a few examples of this pattern.

LINQ achieves a useful balance. While the LINQ project was motivated by a specific requirement – seamless support for non-object data within .NET applications – all components of the solution are generically useful. Consider…

The System.Linq.Expressions API serves a specific need for integrated queries: it allows the compiler to describe the user’s code as a data structure that can then be translated to targets other than MSIL at runtime, like SQL, Web Services, etc. Expressions can also be compiled into delegates at runtime, which brings me to a .NET 3.5 solution to the default shaper problem… If the compiler can use expressions to describe code, so can we!
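A small sketch of that dual nature: the same lambda can be inspected as data and compiled into a delegate. ExpressionAsData is a hypothetical name chosen for this illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionAsData
{
    // The compiler emits a tree describing "x + 1" instead of MSIL for it.
    public static readonly Expression<Func<int, int>> AddOne = x => x + 1;

    // As data: the body is a binary node that can be inspected or translated.
    public static ExpressionType BodyKind()
    {
        return AddOne.Body.NodeType;
    }

    // As code: the tree can be compiled into an ordinary delegate at runtime.
    public static int Evaluate(int input)
    {
        return AddOne.Compile()(input);
    }
}
```

BodyKind() reports ExpressionType.Add, and Evaluate(41) returns 42.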

Here’s the code pattern we want to generate:

r => new T
{
    Property1 = r.Field<Type[Property1]>("Property1"),
    Property2 = r.Field<Type[Property2]>("Property2"),
    …
}

 

Shortcut: learning how to build an expression programmatically

If you want to figure out how to build expressions programmatically, a simple trick will probably save you some time. Just follow the compiler’s lead. First, write an example of the pattern, e.g.: