
Improving ObjectQuery<T>.Include – Updated

Having spent some time using the sample from my previous post on ObjectQuery.Include, I’ve encountered a bug! It turns out that the code generates the wrong include string for

context.Customers.Include(c => c.Order.SubInclude(o=>o.OrderDetail))

The fix is a small change to the BuildString method so that it recurses up the MemberExpression when necessary. The updated code is below – usual disclaimers apply!

    public static class ObjectQueryExtensions
    {
        public static ObjectQuery<TSource> Include<TSource, TPropType>(this ObjectQuery<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
        {
            string includeString = BuildString(propertySelector);
            return source.Include(includeString);
        }
        private static string BuildString(Expression propertySelector)
        {
            switch (propertySelector.NodeType)
            {
                case ExpressionType.Lambda:
                    LambdaExpression lambdaExpression = (LambdaExpression)propertySelector;
                    return BuildString(lambdaExpression.Body);

                case ExpressionType.Quote:
                    UnaryExpression unaryExpression = (UnaryExpression)propertySelector;
                    return BuildString(unaryExpression.Operand);

                case ExpressionType.MemberAccess:

                    MemberExpression memberExpression = (MemberExpression)propertySelector;
                    MemberInfo propertyInfo = memberExpression.Member;

                    if (memberExpression.Expression is ParameterExpression)
                    {
                        return propertyInfo.Name;
                    }
                    else
                    {
                        // we've got a nested property (e.g. MyType.SomeProperty.SomeNestedProperty)
                        return BuildString(memberExpression.Expression) + "." + propertyInfo.Name;
                    }

                case ExpressionType.Call:
                    MethodCallExpression methodCallExpression = (MethodCallExpression)propertySelector;
                    if (IsSubInclude(methodCallExpression.Method)) // check that it's a SubInclude call
                    {
                        // argument 0 is the expression to which the SubInclude is applied (this could be member access or another SubInclude)
                        // argument 1 is the expression to apply to get the included property
                        // Pass both to BuildString to get the full expression
                        return BuildString(methodCallExpression.Arguments[0]) + "." +
                               BuildString(methodCallExpression.Arguments[1]);
                    }
                    // else drop out and throw
                    break;
            }
            throw new InvalidOperationException("Expression must be a member expression or a SubInclude call: " + propertySelector.ToString());

        }

        private static readonly MethodInfo[] SubIncludeMethods;
        static ObjectQueryExtensions()
        {
            Type type = typeof(ObjectQueryExtensions);
            SubIncludeMethods = type.GetMethods().Where(mi => mi.Name == "SubInclude").ToArray();
        }
        private static bool IsSubInclude(MethodInfo methodInfo)
        {
            if (methodInfo.IsGenericMethod)
            {
                if (!methodInfo.IsGenericMethodDefinition)
                {
                    methodInfo = methodInfo.GetGenericMethodDefinition();
                }
            }
            return SubIncludeMethods.Contains(methodInfo);
        }

        public static TPropType SubInclude<TSource, TPropType>(this EntityCollection<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
            where TSource : class, IEntityWithRelationships
            where TPropType : class
        {
            throw new InvalidOperationException("This method is only intended for use with ObjectQueryExtensions.Include to generate expression trees"); // not actually using this - just want the expression!
        }
        public static TPropType SubInclude<TSource, TPropType>(this TSource source, Expression<Func<TSource, TPropType>> propertySelector)
            where TSource : class, IEntityWithRelationships
            where TPropType : class
        {
            throw new InvalidOperationException("This method is only intended for use with ObjectQueryExtensions.Include to generate expression trees"); // not actually using this - just want the expression!
        }
    }
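
To see what the MemberAccess change does in isolation, here is a minimal sketch that runs without Entity Framework. The Customer and Address classes and the PathOf helper are hypothetical stand-ins rather than part of the code above; only the recursion mirrors the updated BuildString case.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in types; not part of any real entity model.
class Address { public string City { get; set; } }
class Customer { public Address Address { get; set; } }

static class MemberPathDemo
{
    // Builds a dotted path from a chain of member accesses, e.g. c => c.Address.City.
    static string PathOf(Expression expression)
    {
        MemberExpression member = expression as MemberExpression;
        if (member == null)
        {
            throw new InvalidOperationException("Expected a member access: " + expression);
        }
        if (member.Expression is ParameterExpression)
        {
            return member.Member.Name; // reached the lambda parameter
        }
        // Nested property: build the path for the parent first, then append this member.
        return PathOf(member.Expression) + "." + member.Member.Name;
    }

    static void Main()
    {
        Expression<Func<Customer, string>> selector = c => c.Address.City;
        Console.WriteLine(PathOf(selector.Body)); // prints "Address.City"
    }
}
```

Without the recursive branch the same selector would yield just "City", which is the kind of wrong include string the bug produced.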
Posted by stuartle | 1 Comments

Cheating at Scrabble with LINQ to Objects

I read an interesting post on Eric Lippert’s blog recently where he was using LINQ to Objects to find possible words from a set of letters in Scrabble (he wasn’t actually cheating – that just makes for a more interesting title!). Eric made a great follow-up post that highlights the need for both defining performance goals and for profiling when working to improve performance. I thoroughly recommend reading both Eric’s posts because they are great posts - plus they set the scene for this post ;-). In Eric’s posts, he noted that he’d met his performance goals for the code but having taken a copy of the code and started to play with it, I became the new user. I guess that I can be a little impatient at times and wanted it to feel as though the results were returned instantly! I stuck with Eric’s decision to not simply cache the dictionary in memory – after all, I may want to use a dictionary that won’t fit into memory at some point.

Running the profiler against the code on my machine shows that even before the modifications in Eric’s second post, reading the data from disk is the bottleneck on my machine. This highlights another issue with profiling your application: you need to perform the profiling on representative hardware. In my case I’m running the application on a laptop so it may be that my hard drive is slower. Having said that, the changes from Eric’s post still made a noticeable difference to performance:

Method      Search time (ms)
Original    2700
Updated     1658

Having determined that reading from disk is the culprit (at least on my machine!), I looked through the code again. In the original code, the SearchDictionary method is called once with the original rack and then 26 further times as a result of adding an extra letter to the rack (to allow matching to existing tiles on the board).

First off, I refactored the code slightly and introduced a SearchResult class:

class SearchResult
{
    public string Rack { get; set; }
    public IEnumerable<string> Words { get; set; }
}

This allows my Search method to return an IEnumerable<SearchResult> that includes the original rack plus the virtual racks formed by using a letter from the board. My implementation of the SearchDictionary method now handles searching with the additional letters. With this change in place I can now update the search method so that it only hits the file once. Without boring you with the details too much, I could have just added the extra items to the HashSet that Eric added in his second post but I wanted to be able to correlate the matches with the original rack for outputting the results (essentially I constrained myself to producing the same output as the original method). I created a type to correlate the original and expanded/canonicalised rack info (called RackInfo) and set up a dictionary keyed on the rack value after expansion for placeholders and canonicalisation. This dictionary can then replace the HashSet.

The original LINQ to Objects query to perform the matching was

from line in FileLines(DictionaryFilename)
where line.Length == originalRack.Length
where racks.Contains(Canonicalize(line))
select line;

Since we’re working with the original rack and the rack combined with additional letters, the line length clause needs a minor tweak. The racks.Contains also needs tweaking to use ContainsKey. Other than that, the main part of the query remains unchanged!

from line in FileLines(DictionaryFilename)
where line.Length == originalRack.Length || line.Length == originalRack.Length+1
where groupedRackInfos.ContainsKey(Canonicalize(line))
select new { Word = line, OriginalRacks = groupedRackInfos[Canonicalize(line)] };

Notice that the result of this query is an IEnumerable that returns each matched word along with the racks that produced a match for that word. The results need to be re-shaped back into the matches for each rack, but even with this small amount of extra work (done with LINQ to Objects, naturally!) the timings speak for themselves:

Method              Search time (ms)
Original            2700
Updated             1658
Single file pass    108

Not a bad result. In fact, it feels near enough to instantaneous to me so I’m calling it a day on this one!

I should probably point out that this post isn’t a dig at the method that Eric used. He was the only consumer and decided (perfectly validly) that the performance was sufficient. On my machine I decided that it wasn’t (hmm, maybe I need to become more patient!). The key things about this for me are that you need to define your performance goals, measure the performance against those goals and then profile your app to determine where the performance bottlenecks are. Oh, and that LINQ to Objects is pretty powerful – I was able to make this change very easily and with minimal change to the original query.
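
The re-shaping step mentioned earlier isn't shown in the post, so here is a minimal sketch of one way it could be done with LINQ to Objects. The sample data and the use of plain strings for the racks are assumptions for illustration (the real code uses a RackInfo type that isn't shown); only the SearchResult class comes from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SearchResult
{
    public string Rack { get; set; }
    public IEnumerable<string> Words { get; set; }
}

static class ReshapeDemo
{
    static void Main()
    {
        // Hypothetical query output: each matched word with the racks that produced it.
        var matches = new[]
        {
            new { Word = "cat",  OriginalRacks = new[] { "act" } },
            new { Word = "act",  OriginalRacks = new[] { "act" } },
            new { Word = "cart", OriginalRacks = new[] { "actr" } },
        };

        // Flatten to (rack, word) pairs, then group the words back under each rack.
        IEnumerable<SearchResult> results =
            from match in matches
            from rack in match.OriginalRacks
            group match.Word by rack into wordsForRack
            select new SearchResult { Rack = wordsForRack.Key, Words = wordsForRack.ToList() };

        foreach (SearchResult result in results)
        {
            Console.WriteLine(result.Rack + ": " + string.Join(", ", result.Words.ToArray()));
        }
    }
}
```

The grouping only runs over the matched words, so it adds very little to the cost of the single pass over the file.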

Posted by stuartle | 4 Comments

Creating database connections with Unity – part 2

Last time we looked at how to set up the configuration file so that Unity would wire up an object that took an IDbConnection parameter in its constructor. Whilst the solution works, it is easy for the various connection strings to become buried away in the Unity configuration. Also, the application configuration file already has somewhere to put connection details: the connectionStrings element. This seemed like the ideal place to store my connection strings so I wanted a way to hook this up using Unity. After hunting around the documentation for a while, I noticed that as well as specifying parameters as dependencies (as in the previous post) you can also specify a value. This value has to be entered as a string in the configuration file but you can use the typeConverter attribute to specify the type converter that should be used to turn the string into the appropriate type. At this point the light bulb flicked on and DbConnectionNameTypeConverter entered stage left.

NOTE: the code below isn’t production ready (and as always, is subject to the standard disclaimer: “These postings are provided "AS IS" with no warranties, and confer no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm”).

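using System;
using System.ComponentModel;
using System.Configuration;
using System.Data;
using System.Data.Common;
using System.Globalization;

public class DbConnectionNameTypeConverter : TypeConverter
{
    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        // The value from the Unity config is the name of an entry in <connectionStrings>
        string connectionStringName = (string) value;
        ConnectionStringSettings connectionStringSettings = ConfigurationManager.ConnectionStrings[connectionStringName];
        if (connectionStringSettings == null)
        {
            throw new InvalidOperationException("No connection string named '" + connectionStringName + "' was found");
        }

        // Let the provider factory create the right connection type for the configured provider
        DbProviderFactory providerFactory = DbProviderFactories.GetFactory(connectionStringSettings.ProviderName);
        IDbConnection connection = providerFactory.CreateConnection();
        connection.ConnectionString = connectionStringSettings.ConnectionString;
        return connection;
    }
}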

This code simply takes the string passed in and looks up the connection details with that name. It uses the DbProviderFactory to create the connection, so it will handle SqlConnection or any other connection type that is registered. With this class in place, we can move the connection configuration to the connectionStrings section:

<connectionStrings>
    <add name="AdventureWorks"
         connectionString="Data Source=(local);Database=AdventureWorks;Integrated Security=SSPI;"
         providerName="System.Data.SqlClient" />
</connectionStrings>

And then configure Unity to use the type converter to get an IDbConnection instance from the connection name:

<types>
    <type type="MyProject.IRepository, MyProject, Version=1.0.0.0, Culture=neutral"
          mapTo="MyProject.DefaultRepository, MyProject, Version=1.0.0.0, Culture=neutral">
        <lifetime type="transient" />
        <typeConfig>
            <constructor>
                <param name="connection" parameterType="IDbConnection">
                    <value value="AdventureWorks" type="IDbConnection" typeConverter="DbConnectionNameTypeConverter" />
                </param>
            </constructor>
        </typeConfig>
    </type>
</types>

And bingo! We get all of the Unity goodness from the previous post but reuse the existing configuration section to centralise the connection settings!
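
If you want to sanity-check the converter outside Unity, a small console harness along these lines should do it. This is only a sketch: it assumes DbConnectionNameTypeConverter is compiled into the same project and that the application config contains the "AdventureWorks" connectionStrings entry shown above. The ConverterCheck class name is made up for the example.

```csharp
using System;
using System.ComponentModel;
using System.Data;

static class ConverterCheck
{
    static void Main()
    {
        // Assumes DbConnectionNameTypeConverter (from the post) is part of this project and
        // that the app config has a connectionStrings entry named "AdventureWorks".
        TypeConverter converter = new DbConnectionNameTypeConverter();
        using (IDbConnection connection = (IDbConnection)converter.ConvertFrom("AdventureWorks"))
        {
            // The concrete type depends on the providerName in the config entry.
            Console.WriteLine(connection.GetType().Name);
            Console.WriteLine(connection.ConnectionString);
        }
    }
}
```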

Posted by stuartle | 3 Comments

Creating database connections with Unity

I was adding dependency injection to an existing project and opted to use Unity, configured via the application configuration file. As I was running through the configuration, I hit a type that required a database connection, which it took as an IDbConnection reference in its constructor, so I added the following config under the container element:

<types>
    <type type="MyProject.IRepository, MyProject, Version=1.0.0.0, Culture=neutral"
          mapTo="MyProject.DefaultRepository, MyProject, Version=1.0.0.0, Culture=neutral">
        <lifetime type="transient" />
        <typeConfig>
            <constructor>
                <param name="connection" parameterType="IDbConnection">
                    <dependency />
                </param>
            </constructor>
        </typeConfig>
    </type>
</types>

(Before going any further, I should point out that I’ve added a type alias to the config for IDbConnection to save entering “System.Data.IDbConnection, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089” each time.)

If you’re not familiar with the configuration file options for Unity, each type element tells Unity what to do when that type is requested. In the case of IRepository the mapTo attribute instructs Unity to create an instance of DefaultRepository. The typeConfig section is used to specify how you want Unity to create the type and in this case we are instructing it to use the constructor that takes a single parameter of type IDbConnection. The dependency sub-element tells Unity to use itself to resolve the value for the IDbConnection parameter, so we need to add that to the config:

<types>
    <type type="MyProject.IRepository, MyProject, Version=1.0.0.0, Culture=neutral"
          mapTo="MyProject.DefaultRepository, MyProject, Version=1.0.0.0, Culture=neutral">
        <lifetime type="transient" />
        <typeConfig>
            <constructor>
                <param name="connection" parameterType="IDbConnection">
                    <dependency />
                </param>
            </constructor>
        </typeConfig>
    </type>
    <type type="IDbConnection" 
          mapTo="SqlConnection" >
        <typeConfig>
            <constructor>
                <param name="connectionString" parameterType="System.String">
                    <value value="Data Source=(local);Database=AdventureWorks;Integrated Security=SSPI;" />
                </param>
            </constructor>
        </typeConfig>
    </type>
</types>

We’ve now given Unity enough information to resolve the IRepository type and wire up the IDbConnection parameter in the constructor. One problem with this config is that whenever we try to resolve an IDbConnection we’re going to get a connection to AdventureWorks and my application needs a couple of different database connections. Fortunately Unity provides for this situation and we can add a name to the type mapping. So the IDbConnection specification becomes:

<type type="IDbConnection" 
          mapTo="SqlConnection" 
          name="AdventureWorksConnection">

and we can now refer to this type by name in the parameter dependency:

<dependency name="AdventureWorksConnection" />

This allows us to define multiple connection entries that we simply refer to by name when we need to. Using this approach, the same connection details can be referenced in multiple places in the config, so connecting to a different server or database is simply a case of changing the connection string in a single place. Also, assuming our repository class is able to deal with multiple database providers, we can simply switch out the provider by changing the type Unity is creating for us (currently SqlConnection).
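
For completeness, this is roughly how the configuration gets applied and the repository resolved in code. Treat it as a sketch under assumptions: it uses the Unity 1.x configuration API as I understand it, assumes the config section is registered under the name "unity", and the Program class is just a placeholder host.

```csharp
using System.Configuration;
using Microsoft.Practices.Unity;
using Microsoft.Practices.Unity.Configuration;
using MyProject;

static class Program
{
    static void Main()
    {
        IUnityContainer container = new UnityContainer();

        // Assumption: the <configSections> entry registers the Unity section as "unity".
        UnityConfigurationSection section =
            (UnityConfigurationSection)ConfigurationManager.GetSection("unity");
        section.Containers.Default.Configure(container);

        // Unity maps IRepository to DefaultRepository and supplies the IDbConnection
        // constructor parameter from the named registration in the config.
        IRepository repository = container.Resolve<IRepository>();
    }
}
```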

There’s lots more to Unity and for more information the Introduction to Unity page on MSDN is a good place to start. Next time we’ll take advantage of some of the flexibility of Unity to fix something that I don’t really like about the above solution.

Posted by stuartle | 3 Comments

Microsoft AutoCollage 2008

I just thought I’d mention that Microsoft AutoCollage 2008 has been released. It is based on research from Microsoft Research in Cambridge and allows you to automatically create collages of images (not that you’d have guessed it from the name ;-) ). You can find out more, see example images, and download a free 30-day trial version from http://research.microsoft.com/AutoCollage (you can follow the download links to purchase it at the Microsoft online store). There are lots of great examples on the AutoCollage page, but I thought I’d add a personal collage…

[Image: a personal AutoCollage example]

In case you’re wondering, the collage is of James – the recent arrival to the Leeks clan :-D

Posted by stuartle | 2 Comments

DataServiceQuery<T>.Expand

ADO.Net Data Services allows you to expose your LINQ To Entities model (or LINQ To SQL model, or even your custom IQueryable model) via a RESTful API with minimal coding. For example, if you’re working with the Northwind database you can use the URL http://server/Service.svc/Customers('ALFKI')/Orders to retrieve the orders for customer ALFKI. This simplicity makes it easy to retrieve data in a variety of scenarios – you simply need to be able to issue HTTP requests to access the data. To make it really easy to consume the data from client-side JavaScript, you can even specify that you want the data to be in JSON format. In rich client scenarios, you can either work with HTTP requests or, if you like to make things really easy, you can create a LINQ data model to access your ADO.Net Data Service. Simply point DataSvcUtil at your service and it will generate the LINQ classes you need. There’s a very good article by Shawn Wildermuth in the September 2008 issue of MSDN Magazine called ‘Creating Data-Centric Web Applications With Silverlight 2’ [1] that takes you through the process of creating a service and the LINQ classes to access the service from Silverlight. The article also discusses the mechanisms that ADO.Net Data Services uses to discover your entities if you want to expose your custom model (or if you’re just interested in how it works). I’ve also included a few extra links in the references at the end of the post if you want to find out more. Now, back to the point of the post...

The client-side code can use the DataServiceQuery<T>.Expand method to specify what properties should also be retrieved (much like the Include method in my last post Improving ObjectQuery<T>.Include). So you can write the following query to retrieve the customer ALFKI from the Northwind database and also pull back the orders:

var q = from c in context.Customers.Expand("Orders")
        where c.CustomerID == "ALFKI"
        select c;

Notice the call to the Expand method. Much like the Include method for LINQ To Entities (see my last post), this lets you specify that you want certain related data to be pre-fetched. If you wanted to get the order details included as well then you could change the query to:

var q = from c in context.Customers.Expand("Orders/Order_Details")
        where c.CustomerID == "ALFKI"
        select c;

One point to notice is that it uses a forward slash (‘/’) rather than a dot (‘.’) as the path separator; other than that it’s pretty similar to Include. Applying the same logic as in my last post, we can quickly create extension methods that allow us to write the above queries as

var q = from c in context.Customers.Expand(c => c.Orders)
        where c.CustomerID == "ALFKI"
        select c;

and

var q = from c in context.Customers.Expand(c => c.Orders.SubExpand(o=> o.Order_Details))
        where c.CustomerID == "ALFKI"
        select c;

Note that you can also expand multiple properties on the same entity. So if you were querying for orders, you can also bring back the customer and order details:

var q = from o in context.Orders.Expand(o=>o.Customer).Expand(o=>o.Order_Details)
        where o.Customer.CustomerID == "ALFKI"
        select o;

The code for the extension method is shown below – note that the code isn’t production ready (and as always, is subject to the standard disclaimer: “These postings are provided "AS IS" with no warranties, and confer no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm”). Since this code is very similar to my previous post, and gives you the same benefits, I’ll refer you to that post if you want to know how the code works and what the benefits are!

public static class DataServiceQueryExtensions
{
    public static DataServiceQuery<TSource> Expand<TSource, TPropType>(this DataServiceQuery<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
    {
        string expandString = BuildString(propertySelector);
        return source.Expand(expandString);
    }
    private static string BuildString(Expression propertySelector)
    {
        switch (propertySelector.NodeType)
        {
            case ExpressionType.Lambda:
                LambdaExpression lambdaExpression = (LambdaExpression)propertySelector;
                return BuildString(lambdaExpression.Body);

            case ExpressionType.Quote:
                UnaryExpression unaryExpression = (UnaryExpression)propertySelector;
                return BuildString(unaryExpression.Operand);

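            case ExpressionType.MemberAccess:
                MemberExpression memberExpression = (MemberExpression)propertySelector;
                MemberInfo propertyInfo = memberExpression.Member;
                if (memberExpression.Expression is ParameterExpression)
                {
                    return propertyInfo.Name;
                }
                // nested property (e.g. o.Customer.Orders): recurse up the MemberExpression,
                // the same fix as in the updated Include code
                return BuildString(memberExpression.Expression) + "/" + propertyInfo.Name;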

            case ExpressionType.Call:
                MethodCallExpression methodCallExpression = (MethodCallExpression)propertySelector;
                if (IsSubExpand(methodCallExpression.Method)) // check that it's a SubExpand call
                {
                    // argument 0 is the expression to which the SubExpand is applied (this could be member access or another SubExpand)
                    // argument 1 is the expression to apply to get the expanded property
                    // Pass both to BuildString to get the full expression
                    return BuildString(methodCallExpression.Arguments[0]) + "/" +
                           BuildString(methodCallExpression.Arguments[1]);
                }
                // else drop out and throw
                break;
        }
        throw new InvalidOperationException("Expression must be a member expression or a SubExpand call: " + propertySelector.ToString());

    }

    private static readonly MethodInfo[] SubExpandMethods;
    static DataServiceQueryExtensions()
    {
        Type type = typeof(DataServiceQueryExtensions);
        SubExpandMethods = type.GetMethods().Where(mi => mi.Name == "SubExpand").ToArray();
    }
    private static bool IsSubExpand(MethodInfo methodInfo)
    {
        if (methodInfo.IsGenericMethod)
        {
            if (!methodInfo.IsGenericMethodDefinition)
            {
                methodInfo = methodInfo.GetGenericMethodDefinition();
            }
        }
        return SubExpandMethods.Contains(methodInfo);
    }

    public static TPropType SubExpand<TSource, TPropType>(this Collection<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
        where TSource : class
        where TPropType : class
    {
        throw new InvalidOperationException("This method is only intended for use with DataServiceQueryExtensions.Expand to generate expression trees"); // not actually using this - just want the expression!
    }
    public static TPropType SubExpand<TSource, TPropType>(this TSource source, Expression<Func<TSource, TPropType>> propertySelector)
        where TSource : class
        where TPropType : class
    {
        throw new InvalidOperationException("This method is only intended for use with DataServiceQueryExtensions.Expand to generate expression trees"); // not actually using this - just want the expression!
    }
}

 

References:

1: Creating Data-Centric Web Applications With Silverlight 2, Shawn Wildermuth. MSDN Magazine, September 2008. http://msdn.microsoft.com/en-us/magazine/cc794279.aspx

2: ADO.Net Data Services team blog. http://blogs.msdn.com/astoriateam/

3: ADO.Net Data Services homepage on MSDN. http://msdn.microsoft.com/en-us/data/bb931106.aspx

4: Mike Taulty has a number of blog posts on ADO.Net Data Services. http://mtaulty.com/CommunityServer/blogs/mike_taultys_blog/archive/category/1027.aspx

Improving ObjectQuery<T>.Include

** UPDATE: There’s a bug in the code below – see the follow-up post, Improving ObjectQuery<T>.Include – Updated, for the fix!

One of the great features of LINQ To SQL and LINQ To Entities is that the queries you write are checked by the compiler, which eliminates typing errors in your query. Unfortunately, the ObjectQuery<T>.Include function (which is used to eager-load data that isn’t directly included in the query) takes a string parameter, opening up opportunities for typos to creep back in. In this post I’ll present some sample code that illustrates one way that you can work round this. To start with, let’s take a quick look at a query against an entity model on the Northwind database.

var query = from customer in context.Customers
            where customer.Orders.Count > 0
            select customer;

This query simply retrieves customers with an order. If the code that uses the query results then needs to make use of the order data, it won’t have been loaded. We can use the Include function to ensure that this data is loaded up front:

var query = from customer in context.Customers.Include("Orders")
            where customer.Orders.Count > 0
            select customer;

Notice the Include(“Orders”) call that we’ve inserted which instructs Entity Framework to retrieve the Orders for each Customer. It would be much nicer if we could use a lambda expression to specify what property to load:

var query = from customer in context.Customers.Include(c => c.Orders)
            where customer.Orders.Count > 0
            select customer;

It turns out that this is very easy to achieve by using an extension method:

public static class ObjectQueryExtensions
{
    public static ObjectQuery<TSource> Include<TSource, TPropType>(this ObjectQuery<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
    {
        MemberExpression memberExpression = propertySelector.Body as MemberExpression;
        if (memberExpression == null)
        {
            throw new InvalidOperationException("Expression must be a member expression: " + propertySelector);
        }
        MemberInfo propertyInfo = memberExpression.Member;
        return source.Include(propertyInfo.Name);
    }
}

This Include extension method allows the query syntax above with the lambda expression. When the Include method is called, it inspects the expression tree. If the method is used as intended, the tree will describe accessing a member of the TSource class. We can then use the name of the member to call the original Include function.
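
If you haven't poked around expression trees before, the following sketch shows what the extension method sees for a valid and an invalid selector. It runs on its own; the Customer and Order classes and the Describe helper are hypothetical stand-ins rather than Entity Framework types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in types; not generated entity classes.
class Order { }
class Customer { public List<Order> Orders { get; set; } }

static class ExpressionShapeDemo
{
    // Prints the node type of the lambda body and, for a member access, the member name.
    static void Describe<TProp>(Expression<Func<Customer, TProp>> selector)
    {
        MemberExpression member = selector.Body as MemberExpression;
        Console.WriteLine(selector.Body.NodeType + " -> " +
            (member == null ? "(not a member access)" : member.Member.Name));
    }

    static void Main()
    {
        Describe(c => c.Orders);         // prints "MemberAccess -> Orders"
        Describe(c => c.Orders.First()); // prints "Call -> (not a member access)"
    }
}
```

The second selector is the kind of input that makes the extension method throw, because its body is a method call rather than a member access.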

Whilst this solves the problem as I described it above, what if the code consuming the query also needed the order details? With the original Include function we can write

var query = from customer in context.Customers.Include("Orders.Order_Details") 
            where customer.Orders.Count > 0
            select customer;

Notice that we can pass a path to the properties to include. The extension method we wrote doesn’t give us a way to handle this case. I imagine using something like the syntax below to describe this situation

var query = from customer in context.Customers.Include(c => c.Orders.SubInclude(o => o.Order_Details))
            where customer.Orders.Count > 0
            select customer;

Here, we’ve specified that we want to include Orders, and then also that we want to include Order_Details. Adding this support is a bit more code, but not too bad:

public static class ObjectQueryExtensions
{
    public static ObjectQuery<TSource> Include<TSource, TPropType>(this ObjectQuery<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
    {
        string includeString = BuildString(propertySelector);
        return source.Include(includeString);
    }
    private static string BuildString(Expression propertySelector)
    {
        switch(propertySelector.NodeType)
        {
            case ExpressionType.Lambda:
                LambdaExpression lambdaExpression = (LambdaExpression)propertySelector;
                return BuildString(lambdaExpression.Body);

            case ExpressionType.Quote:
                UnaryExpression unaryExpression= (UnaryExpression)propertySelector;
                return BuildString(unaryExpression.Operand);

            case ExpressionType.MemberAccess:
                MemberInfo propertyInfo = ((MemberExpression) propertySelector).Member;
                return propertyInfo.Name;

            case ExpressionType.Call:
                MethodCallExpression methodCallExpression = (MethodCallExpression) propertySelector;
                if (IsSubInclude(methodCallExpression.Method)) // check that it's a SubInclude call
                {
                    // argument 0 is the expression to which the SubInclude is applied (this could be member access or another SubInclude)
                    // argument 1 is the expression to apply to get the included property
                    // Pass both to BuildString to get the full expression
                    return BuildString(methodCallExpression.Arguments[0]) + "." +
                           BuildString(methodCallExpression.Arguments[1]);
                }
                // else drop out and throw
                break;
        }
        throw new InvalidOperationException("Expression must be a member expression or a SubInclude call: " + propertySelector.ToString());

    }

    private static readonly MethodInfo[] SubIncludeMethods;
    static ObjectQueryExtensions()
    {
        Type type = typeof (ObjectQueryExtensions);
        SubIncludeMethods = type.GetMethods().Where(mi => mi.Name == "SubInclude").ToArray();
    }
    private static bool IsSubInclude(MethodInfo methodInfo)
    {
        if (methodInfo.IsGenericMethod)
        {
            if (!methodInfo.IsGenericMethodDefinition)
            {
                methodInfo = methodInfo.GetGenericMethodDefinition();
            }
        }
        return SubIncludeMethods.Contains(methodInfo);
    }

    public static TPropType SubInclude<TSource, TPropType>(this EntityCollection<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
        where TSource : class, IEntityWithRelationships
        where TPropType : class
    {
        throw new InvalidOperationException("This method is only intended for use with ObjectQueryExtensions.Include to generate expressions trees"); // no actually using this - just want the expression!
    }
    public static TPropType SubInclude<TSource, TPropType>(this TSource source, Expression<Func<TSource, TPropType>> propertySelector)
        where TSource : class, IEntityWithRelationships
        where TPropType : class
    {
        throw new InvalidOperationException("This method is only intended for use with ObjectQueryExtensions.Include to generate expression trees"); // not actually called - we just want the expression!
    }
}

This code still has the Include method with the original signature, and adds a couple of SubInclude extension methods. The code that extracts the property name has been pulled out into a separate method (BuildString), which now also handles some additional NodeTypes so that we can process SubInclude calls inside the Include call. The IsSubInclude method checks that any method call we encounter at this point really is a SubInclude call. With this code, we can write the previous query as well as:

var query = from customer in context.Customers.Include(c => c.Orders.SubInclude(o => o.Order_Details).SubInclude(od=>od.Products))
            where customer.Orders.Count > 0
            select customer;

This query includes the Orders, Order Details, and Products as if we’d called Include("Orders.Order_Details.Products"), and in fact that is exactly the string the code generates! Additionally, it doesn’t matter whether you chain the SubInclude calls (as above) or nest them:

var query = from customer in context.Customers.Include(c => c.Orders.SubInclude(o => o.Order_Details.SubInclude(od => od.Products)))
            where customer.Orders.Count > 0
            select customer;

Both of these queries have the same effect, so it’s up to you which style you prefer.
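To see the string building in isolation, here is a minimal sketch that runs without the Entity Framework. The entity classes, the IncludePathSketch class and the BuildPath helper are hypothetical stand-ins of my own, only the collection overload of SubInclude is reproduced, and the SubInclude check is simplified to a name comparison, so treat it as an illustration of the technique rather than the code from this post:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-ins for the Northwind entities; no Entity Framework needed.
public class Product { }
public class OrderDetail { public Product Products { get; set; } }
public class Order { public List<OrderDetail> Order_Details { get; set; } }
public class Customer { public List<Order> Orders { get; set; } }

public static class IncludePathSketch
{
    // Marker method: never executed, it only shapes the expression tree.
    public static TProp SubInclude<TSource, TProp>(
        this IEnumerable<TSource> source,
        Expression<Func<TSource, TProp>> selector)
    {
        throw new InvalidOperationException("Only for use inside an expression tree");
    }

    // Hypothetical entry point: returns the include string instead of calling Include.
    public static string BuildPath<TSource, TProp>(Expression<Func<TSource, TProp>> selector)
    {
        return BuildString(selector);
    }

    private static string BuildString(Expression expression)
    {
        switch (expression.NodeType)
        {
            case ExpressionType.Lambda:
                return BuildString(((LambdaExpression)expression).Body);

            case ExpressionType.Quote:
                // Lambdas passed as arguments inside an expression tree are quoted.
                return BuildString(((UnaryExpression)expression).Operand);

            case ExpressionType.MemberAccess:
                MemberExpression member = (MemberExpression)expression;
                return member.Expression is ParameterExpression
                    ? member.Member.Name
                    : BuildString(member.Expression) + "." + member.Member.Name;

            case ExpressionType.Call:
                MethodCallExpression call = (MethodCallExpression)expression;
                // Simplification: match on the method name only.
                if (call.Method.Name == "SubInclude")
                {
                    return BuildString(call.Arguments[0]) + "." + BuildString(call.Arguments[1]);
                }
                break;
        }
        throw new InvalidOperationException("Unsupported expression: " + expression);
    }
}

public static class Program
{
    public static void Main()
    {
        string chained = IncludePathSketch.BuildPath<Customer, Product>(
            c => c.Orders.SubInclude(o => o.Order_Details).SubInclude(od => od.Products));
        string nested = IncludePathSketch.BuildPath<Customer, Product>(
            c => c.Orders.SubInclude(o => o.Order_Details.SubInclude(od => od.Products)));

        Console.WriteLine(chained);           // Orders.Order_Details.Products
        Console.WriteLine(nested);            // Orders.Order_Details.Products
        Console.WriteLine(chained == nested); // True
    }
}
```

In the chained form the outer call's first argument is itself a SubInclude call, whereas in the nested form the inner call sits inside the quoted lambda; the recursion flattens both into the same dotted path.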

The code isn’t production ready (and as always, is subject to the standard disclaimer: “These postings are provided "AS IS" with no warranties, and confer no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm”). However, there are a couple of other features that I think are worth briefly mentioning:

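  • The IsSubInclude function works against a cached array of MethodInfos for the SubInclude methods. Because these methods are generic, the MethodInfo found in the expression tree is a constructed (closed) method, so we have to get its generic method definition before comparing it with the cached ones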
  • The SubInclude functions are not intended to be called at runtime - they are purely there to get the compiler to generate the necessary expression tree!
  • Generic type inference is at work when we write the queries. Notice that we could write Include(c => c.Orders) rather than Include<Customer>(c => c.Orders). Imagine how much less readable the code would become, especially when including multiple levels.
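The point about generic method definitions can be checked with a few lines of reflection. This is a small sketch using a hypothetical Markers.Echo method of my own rather than SubInclude itself:

```csharp
using System;
using System.Reflection;

public static class Markers
{
    // Hypothetical generic method, standing in for SubInclude.
    public static T Echo<T>(T value) { return value; }
}

public static class Program
{
    public static void Main()
    {
        // The open generic definition, as returned by GetMethods/GetMethod.
        MethodInfo definition = typeof(Markers).GetMethod("Echo");

        // A constructed method, like the one an expression tree records for a call.
        MethodInfo closed = definition.MakeGenericMethod(typeof(int));

        Console.WriteLine(closed.Equals(definition));                              // False
        Console.WriteLine(closed.GetGenericMethodDefinition().Equals(definition)); // True
    }
}
```

A direct comparison fails because Echo<int> and Echo<T> are different MethodInfo instances, which is why IsSubInclude calls GetGenericMethodDefinition first.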

I found it quite interesting putting this code together as it pulls in Extension Methods, LINQ to Entities and Expression Trees. The inspiration for this came from the LinkExtensions.ActionLink methods in the ASP.NET MVC framework, which do the same sort of thing for ActionLinks. I’m not sure if I’m entirely satisfied with the syntax for including multiple levels, but it is the best I’ve come up with so far. If you’ve any suggestions for how to improve it then let me know!

Posted by stuartle | 11 Comments

A closer look at yield – part 3

This was only going to be two posts, but after my last post I’d been mulling over a post that looks at the compiler-generated code in a more general way. Whilst catching up on blog posts this morning I saw that Raymond Chen has written a blog post entitled ‘The implementation of iterators in C# and its consequences (part 1)’ that does a better job than I would have! Better still, it looks like there’s more to come…

Posted by stuartle | 0 Comments

A closer look at yield – part 2

In part 1, we took a quick tour of the yield keyword. In this post we’re going to have a look at the code that the compiler generates for us when we use yield. We’ll return to the first example from last time and insert a Console.WriteLine before the yield return statement:

private static readonly string[] StringValues = new string[] { "The", "quick", "brown", "fox", 
                            "jumped", "over", "the", "lazy", "dog" };
static IEnumerable<string> TestIterator()
{
    foreach (string value in StringValues)
    {
        Console.WriteLine("In iterator:{0}", value);
        yield return value;
    }
}

If we run the following code to execute the iterator, we’d now see the output shown below:

foreach(string value in TestIterator()) 
{ 
    Console.WriteLine("In foreach:{0}", value); 
}
Output:
In iterator:The
In foreach:The
In iterator:quick
In foreach:quick
In iterator:brown
In foreach:brown
...
 

It’s clear from this output that TestIterator isn’t producing all of the values in one go and then returning control to the calling code. Conceptually, you can think of the TestIterator method as iterating over the collection that it wants to return, temporarily handing control back to the caller for each item and then resuming where it left off. I recently fired up Reflector (http://www.aisto.com/roeder/dotnet/) to look at the generated code and was initially quite surprised. The most interesting part is the IEnumerator.MoveNext function, and here’s a roughly equivalent piece of code (tidied up for readability):

private bool MoveNext()
{
    switch (this.state)
    {
        case 0:
            this.state = -1;
            this.state = 1;
            this.Values = Program.StringValues;
            this.index = 0;
            while (this.index < this.Values.Length)
            {
                this.temp = this.Values[this.index];
                Console.WriteLine("In iterator:{0}", this.temp);
                this.current = this.temp;
                this.state = 2;
                return true;
            Label_0084:
                this.state = 1;
                this.index++;
            }
            this.state = -1;
            break;

        case 2:
            goto Label_0084;
    }
    return false;

}

The state member is initialised to 0, so when MoveNext is first called it drops into the first case block. This sets up the Values and index members and then starts iterating over the array. Once it has got the first item, it stores the value in current (which is then returned via IEnumerator.Current), sets the state to 2 and returns true so that the caller can process the result. When MoveNext is called the second time, the code drops into the second case block, which causes it to jump to immediately after the return true statement (the Label_0084 line). This was the ‘aha’ moment for me: conceptually it is doing exactly what I described above, iterating over the list, returning each item (and control) to the caller, and then resuming exactly where it left off. It’s so brilliant that it almost shouldn’t work! Since all of the state is stored in member variables, when it comes in to the function on subsequent calls it can safely carry on processing. And when it gets to the end of the list it breaks out of the switch block and returns false (setting the state to -1 to ensure it keeps returning false if called again).
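The decompiled listing jumps into the middle of a loop, which C# source code isn’t allowed to do, so it can’t be compiled as written. As a rough, compilable sketch of the same idea, here is a hand-written enumerator that keeps its position in fields and resumes from it on each call. The class name and the exact state values are my own and are not what the compiler emits:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class HandWrittenIterator : IEnumerator<string>
{
    private readonly string[] values;
    private int state;      // 0 = not started, 1 = running, -1 = finished
    private int index;
    private string current;

    public HandWrittenIterator(string[] values) { this.values = values; }

    public string Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:         // first call: start at the beginning
                index = 0;
                state = 1;
                break;
            case 1:         // later calls: resume just after the previous "yield return"
                index++;
                break;
            default:        // finished: keep returning false
                return false;
        }

        if (index < values.Length)
        {
            Console.WriteLine("In iterator:{0}", values[index]);
            current = values[index];
            return true;    // hand the item (and control) back to the caller
        }

        state = -1;
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}

public static class Program
{
    public static void Main()
    {
        using (var iterator = new HandWrittenIterator(new[] { "The", "quick", "brown" }))
        {
            while (iterator.MoveNext())
            {
                Console.WriteLine("In foreach:{0}", iterator.Current);
            }
        }
    }
}
```

Running this interleaves the "In iterator" and "In foreach" lines in the same way as the yield version, because each MoveNext call does just enough work to produce one item.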

Now imagine the code above (and the other methods required to implement IEnumerator) and compare it to the TestIterator method at the top of the post. Which could you code up more quickly? And more importantly, which is more readable to you?

Posted by stuartle | 1 Comments

Using let in LINQ to Objects – Part 3

 

This is a follow-up to my two previous posts on the let keyword.

The real reason for this post is to link to a great post that K. Scott Allen has just published: Optimizing LINQ Queries. In his post, he shows an example where using the let keyword decreases performance. More importantly, he makes the point that you shouldn’t prematurely optimise your queries and that you need to measure the performance. Although I was taking measurements in Part 2, I don’t think I did a good enough job of making this point in the post. So to make up for that: the let keyword is a tool in your toolbox. Make sure you measure the performance before you decide you need to optimise your queries, and make sure you measure it again after any performance ‘improvements’ to verify the results. And if you haven’t read the Optimizing LINQ Queries post, do it now to find out why let didn’t improve performance...

Posted by stuartle | 0 Comments

A closer look at yield

The yield keyword in C# is pretty powerful and expressive, but it doesn’t seem to be very widely known. In this post we’ll take a quick look at what yield does and then I’ll post a follow-up that looks at what the compiler generates for you. Let’s start by looking at a simple (and contrived) example:

private static readonly string[] StringValues = new string[] { "The", "quick", "brown", "fox", 
                            "jumped", "over", "the", "lazy", "dog" };
static IEnumerable<string> TestIterator()
{
    foreach(string value in StringValues)
    {
         yield return value;
    }
}

The return type for the TestIterator method is IEnumerable<string>, but you can see that we don’t have a return statement in the implementation. Instead, we’re using the yield return statement to return each item that we want the caller to operate on. The compiler automatically generates a class that implements IEnumerable<string> for us. We can call this function using the following code:

foreach(string value in TestIterator())
{
    Console.WriteLine("In foreach:{0}", value);
}

This code will iterate over the IEnumerable<string> instance that is returned from the TestIterator method. In this example we’ve simply iterated over an existing collection, so we’re not providing much functionality! A more interesting example would be for a binary tree such as:

class BinaryTree<T>
{
    public T Item { get; set; }
    public BinaryTree<T> Left { get; set; }
    public BinaryTree<T> Right { get; set; }
}

In this case, the yield keyword makes it a breeze to add IEnumerable support. First, we add IEnumerable<T> to the implemented interfaces and then we just need the code below to provide the implementation (plus a one-line non-generic IEnumerable.GetEnumerator that forwards to it):

public IEnumerator<T> GetEnumerator()
{
    yield return Item;
    if (Left != null)
    {
        foreach (T t in Left)
        {
            yield return t;
        }
    }
    if (Right != null)
    {
        foreach (T t in Right)
        {
            yield return t;
        }
    }
}

This code returns the item for the current node and then recurses into the items from the left and right subtrees – how easy is that? This is one of the big advantages of the yield keyword: it allows you to write readable and concise code to produce an iterator.
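Putting the pieces together, a complete version of the class might look like the sketch below. The explicit non-generic GetEnumerator is required by the IEnumerable<T> interface; the sample tree and the Program class are just for illustration:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class BinaryTree<T> : IEnumerable<T>
{
    public T Item { get; set; }
    public BinaryTree<T> Left { get; set; }
    public BinaryTree<T> Right { get; set; }

    // Pre-order traversal: this node, then the left subtree, then the right subtree.
    public IEnumerator<T> GetEnumerator()
    {
        yield return Item;
        if (Left != null)
        {
            foreach (T t in Left)
            {
                yield return t;
            }
        }
        if (Right != null)
        {
            foreach (T t in Right)
            {
                yield return t;
            }
        }
    }

    // IEnumerable<T> inherits the non-generic IEnumerable, so forward to the generic version.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class Program
{
    public static void Main()
    {
        var tree = new BinaryTree<int>
        {
            Item = 1,
            Left = new BinaryTree<int> { Item = 2, Left = new BinaryTree<int> { Item = 4 } },
            Right = new BinaryTree<int> { Item = 3 }
        };

        Console.WriteLine(string.Join(", ", tree)); // 1, 2, 4, 3
    }
}
```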

Stay tuned for part 2, when we’ll take a look at the code that the compiler generates for us...

Posted by stuartle | 3 Comments

Using let in LINQ to Objects - Part 2

In my previous post, I looked at what the compiler generates when you use the let keyword in LINQ to Objects. This is a follow-up post slanted towards performance. To this end, I set up four tests:

        static void TestBaseline()
        {
            var q = from c in Customer.AllCustomers
                    select c;
            int count = q.Count();
        }
        static void TestWithAllocation()
        {
            var q = from c in Customer.AllCustomers
                    select new Customer()
                    {
                        CustomerID = c.CustomerID, CompanyName = c.CompanyName,
                        ContactName = c.ContactName, ContactTitle = c.ContactTitle,
                        Address = c.Address, City = c.City, Country = c.Country
                    };
            int count = q.Count();
        }
        static void TestWithSingleLet()
        {
            var q = from c in Customer.AllCustomers
                    let customerId = c.CustomerID
                    select new Customer()
                    {
                        CustomerID = customerId, CompanyName = c.CompanyName,
                        ContactName = c.ContactName, ContactTitle = c.ContactTitle,
                        Address = c.Address, City = c.City, Country = c.Country
                    };
            int count = q.Count();
        }
        static void TestWithMultipleLet()
        {
            var q = from c in Customer.AllCustomers
                    let customerId = c.CustomerID
                    let companyName = c.CompanyName
                    let contactName = c.ContactName
                    let contactTitle = c.ContactTitle
                    let address = c.Address
                    let city = c.City
                    let country = c.Country
                    select new Customer()
                    {
                        CustomerID = customerId, CompanyName = companyName,
                        ContactName = contactName, ContactTitle = contactTitle,
                        Address = address, City = city, Country = country
                    };
            int count = q.Count();
        }

The first of these tests is a simple count over a list of customers. The second adds in an allocation for comparison with the other tests. The third test adds a single let statement, and the final test goes all out with let statements to exaggerate the effect. I've deliberately kept the tests simple (e.g. avoided group by/where statements) to maximise the effect of the let statement in each case. I ran each test 10,000 times and got the following results:

Test                   Time per test (ms)
TestBaseline           0.0051
TestWithAllocation     0.0085
TestWithSingleLet      0.0155
TestWithMultipleLet    0.0640

Comparing the test with allocation to the test with the multiple let statements, there is clearly an overhead to using the let keyword, but bear in mind that these tests have been designed to emphasise the effects of using let. Real-world queries are likely to include filtering, grouping, etc. that will diminish the relative effect of using let. Also, even with seven let keywords the query time is measured in hundredths of a millisecond. I wouldn't necessarily recommend using let everywhere, but the performance overhead is likely to be minimal, so if it makes the code easier to read then it's probably worth it.
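If you want to take this kind of measurement yourself, a simple harness along the following lines is enough to get a rough per-run figure. This is only a sketch: the TimingSketch class, the warm-up call and the sample queries are my own choices rather than the harness behind the numbers above, and the figures you get will vary from machine to machine:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Runs the test repeatedly and returns the average time per run in milliseconds.
    public static double MillisecondsPerRun(Action test, int iterations)
    {
        test(); // warm-up run so that JIT compilation is not included in the timing

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            test();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(0, 100).ToArray();

        double withoutLet = MillisecondsPerRun(() =>
        {
            int sum = (from i in data select 2 * i).Sum();
        }, 10000);

        double withLet = MillisecondsPerRun(() =>
        {
            int sum = (from i in data let doubled = 2 * i select doubled).Sum();
        }, 10000);

        Console.WriteLine("without let: {0:F4} ms/run", withoutLet);
        Console.WriteLine("with let:    {0:F4} ms/run", withLet);
    }
}
```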

Aside from code readability, the let keyword can be used to increase performance. Suppose you have a value that is relatively expensive to calculate, but that needs to appear in the where clause multiple times:

        static decimal SumOrders(Customer customer)
        {
            var q = from order in customer.Orders
                    from Order_Detail orderDetail in order.Order_Details
                    select orderDetail.Quantity * orderDetail.UnitPrice;
            GC.KeepAlive(q);
            return q.Sum();
        }
        static void CalcNoLet()
        {
            var q = from c in AllCustomers
                    where SumOrders(c) < 10000 && SumOrders(c) > 1000
                    select c;
            int count = q.Count();
        }
        static void CalcWithLet()
        {
            var q = from c in AllCustomers
                    let expensiveValue = SumOrders(c)
                    where expensiveValue < 10000 && expensiveValue > 1000
                    select c;
            int count = q.Count();
        }

In this example, we're summing the total value of the orders for each customer and only want customers where this value is between 1,000 and 10,000. Without let, this value gets calculated twice for each customer. We can use let to effectively give us query-local storage for the calculation so that it only performs the calculation once per customer. The results for these two queries are:

Test           Time per test (ms)
CalcNoLet      0.9387
CalcWithLet    0.7352

The results show that, in this case, the cost of performing the calculation a second time outweighs the cost of the extra query that is generated by the let statement. So despite the addition of the extra query, using let reduces the query execution time. An alternative approach is to define a Between function:

        static bool Between(decimal value, decimal lower, decimal upper)
        {
                return value < upper && value > lower;
        }

and then re-write the query as

        static void CalcBetween()
        {
            var q = from c in AllCustomers
                    where Between(SumOrders(c), 1000, 10000)
                    select c;
            int count = q.Count();
        }

For comparison, this query came out at 0.7077ms/test, so it's marginally faster than the let query above, but it shows that as you start making your query more complicated the overhead of let becomes less significant.
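The difference between the three queries comes down to how often the expensive calculation runs. The sketch below makes that visible by counting calls. The Expensive method is a hypothetical stand-in for SumOrders and the totals array is made-up sample data:

```csharp
using System;
using System.Linq;

public static class LetCallCountSketch
{
    public static int Calls;

    // Stand-in for SumOrders: just counts how often it is invoked.
    static decimal Expensive(decimal total)
    {
        Calls++;
        return total;
    }

    static bool Between(decimal value, decimal lower, decimal upper)
    {
        return value > lower && value < upper;
    }

    public static int NoLet(decimal[] totals)
    {
        return (from t in totals
                where Expensive(t) < 10000 && Expensive(t) > 1000
                select t).Count();
    }

    public static int WithLet(decimal[] totals)
    {
        return (from t in totals
                let value = Expensive(t)
                where value < 10000 && value > 1000
                select t).Count();
    }

    public static int WithBetween(decimal[] totals)
    {
        return (from t in totals
                where Between(Expensive(t), 1000, 10000)
                select t).Count();
    }

    public static void Main()
    {
        decimal[] totals = { 500m, 2000m, 5000m, 20000m };

        Calls = 0;
        int matches = NoLet(totals);
        Console.WriteLine("no let:  {0} matches, {1} calls", matches, Calls); // 2 matches, 7 calls

        Calls = 0;
        matches = WithLet(totals);
        Console.WriteLine("let:     {0} matches, {1} calls", matches, Calls); // 2 matches, 4 calls

        Calls = 0;
        matches = WithBetween(totals);
        Console.WriteLine("between: {0} matches, {1} calls", matches, Calls); // 2 matches, 4 calls
    }
}
```

Note that the version without let makes 7 calls rather than 8: the && operator short-circuits, so the second call is skipped for the 20,000 total that already fails the first test.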

Posted by stuartle | 3 Comments

Using let in LINQ to Objects

I've been delving into LINQ to Objects recently (and enjoying it), but had missed the 'let' keyword. A colleague, Rupert Benbrook (http://phazed.com), and I had been chatting about how to solve a particular issue using LINQ to Objects. He sent me a follow-up email with some code that used the 'let' keyword, so I thought I'd take a closer look and see what happens under the covers.

To take a simple example, suppose we have int[] values. We could write the following query to double all the values

var query = from i in values
             select 2 * i;

With the let keyword, this could be rewritten as

var query = from i in values
        let doublei = 2 * i
        select doublei;

I'll give a slightly more interesting example in a moment, but this simple example lets us easily look at what actually gets generated. After firing up Reflector, it seems that the compiler treats the code above as though we'd typed

var query = from temp in
        (
            from i in values
            select new { i, doublei = 2 * i }
        )
        select temp.doublei;

So each time you add a let statement into your query, the compiler creates a new subquery that returns an anonymous type composed of the original value plus the new value specified by the let.
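In method syntax, that translation corresponds to a pair of chained Select calls. The sketch below writes out the equivalent by hand (the compiler uses its own generated names rather than temp) and checks that both forms produce the same sequence:

```csharp
using System;
using System.Linq;

public static class LetTranslationSketch
{
    public static void Main()
    {
        int[] values = { 1, 2, 3 };

        // Query syntax with let.
        var withLet = from i in values
                      let doublei = 2 * i
                      select doublei;

        // Hand-written equivalent: project into an anonymous type that carries
        // both the original value and the let value, then select from it.
        var byHand = values
            .Select(i => new { i, doublei = 2 * i })
            .Select(temp => temp.doublei);

        Console.WriteLine(string.Join(", ", withLet));     // 2, 4, 6
        Console.WriteLine(withLet.SequenceEqual(byHand));  // True
    }
}
```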

For a more interesting example of this, suppose we have IEnumerable<FileSystemInfo> fileSystemInfos, and that we want to pick out the files (i.e. ignore directories) where the size is under 1000 bytes. We could use the following query

var query = from fileSystemInfo in fileSystemInfos
            where fileSystemInfo is FileInfo && ((FileInfo)fileSystemInfo).Length < 1000
            select (FileInfo) fileSystemInfo;

This code requires the cast to FileInfo in order to access the Length property, and again to ensure that we return the correct type in the query. Using let, we can perform the cast in a single place

var query = from fileSystemInfo in fileSystemInfos
        where fileSystemInfo is FileInfo
        let fileInfo = (FileInfo)fileSystemInfo
        where fileInfo.Length < 1000
        select fileInfo;

I think that this version is probably more readable, and if we had more constraints in the where clause that needed the FileInfo rather than FileSystemInfo then the let version would show a bigger difference. Again, the compiler seems to treat the above query as though it was written as

var query = from temp in
        (
            from fileSystemInfo in fileSystemInfos
            where fileSystemInfo is FileInfo
            select new { fileSystemInfo, fileInfo = (FileInfo)fileSystemInfo }
        )
        where temp.fileInfo.Length < 1000
        select temp.fileInfo;

Having used LINQ to Objects a reasonable amount, I found it interesting to come across a new keyword and was curious to find out how it works - hopefully this post gives an explanation!

Posted by stuartle | 5 Comments

Breaking the ice

Hi, my name is Stuart Leeks (as you may have guessed from the blog title!) and I'm an Application Development Consultant at Microsoft in the UK.

In keeping with the tradition of content-free first blog posts, this is a non-technical post. Future posts will be more technical in nature, and hopefully useful to someone (other than me). For a bit more info about me see the About page!

Now the legal bit:
These postings are provided "AS IS" with no warranties, and confer no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm

Posted by stuartle | 0 Comments
 