My Name is
I'm also known as @writeline
I'm a Technical Evangelist @ Microsoft Germany and you can follow me on Twitter or visit my private blog stream.
Important Update: This post is not any more valid for current Entity Framework versions. Current Versions don't have this performance impact as seen here. This is out dated! Sorry for any inconvenience on applying this to current versions!
I was really shocked when I saw this blog post: Entity Framework and LINQ to SQL Performance. I decided to download the performance test solution and try the tests myself. I did not changed any of the tests so far, I just added for my own interest NHibernate into the game as I think that it is the choice of Frameworks for Domain Driven Design.
So, I was shocked again, as I saw my numbers in the chart:
The elapsed time was 4 times higher than with plain SqlDataReader. But more worse was the fact the the elapsed processor time was nearly 8 times higher. NHibernate was really good in performance compared to Entity Framework, but LINQ to SQL was still better.
What was wrong? I couldn’t imagine that we had such an issue in data access using the Entity Framework. I took a look at the code. All test runs are looping 10000 times over the access method to retrieve the customers data. In this loop, each time a connection will be created and closed. Linq to SQL and Linq to Entities are using compiled queries, so, this optimization was already done. I also added in my tests a pre-generated Entity Framework view (using edmgen.exe) to enhance startup performance. Still those poor results.
Here is the original code from the tests:
1: public static List<CustomerResults> GetCustomers()
3: using (NorthwindEntities ne = new NorthwindEntities())
5: return GetCustomersQuery(ne).ToList();
9: private static Func<NorthwindEntities, IQueryable<CustomerResults>>
10: GetCustomersQuery = CompiledQuery.Compile(
11: (NorthwindEntities ne) => from c in ne.Customers
12: select new CustomerResults
14: CustomerID = c.CustomerID,
15: CompanyName = c.CompanyName,
16: ContactName = c.ContactName,
17: ContactTitle = c.ContactTitle,
18: City = c.City,
19: Country = c.Country
My guess was that the instantiation of the connection string could be the problem. Why? Because several things happen here. First, the EntityConnection which is created when not specifying any argument in the constructor of the typed object context class NorthwindEntities, has to read from the config file. I/O cost. Second, it has to read the metadata information, which is stored by default in the assembly as resource. Second performance cost. Here is my optimization approach. One of the overload constructors is taking an EntityConnection object. Guess what, you probably just use this one connection object during the whole life time of your application, so why not create it once and give it into the constructor. This does not mean that I open the connection beforehand, no, I just create the type EntityConnection and retrieve the meta information in the first call. From that, I reuse this instance, so no need anymore to retrieve the metadata information again.
Here is the optimization in my code base:
1: private static EntityConnection entityConnection;
3: public static void InitializeConnectionString()
5: SqlConnectionStringBuilder sqlBuilder = new SqlConnectionStringBuilder();
6: sqlBuilder.DataSource = @"(local)\sqlexpress";
7: sqlBuilder.InitialCatalog = "NorthwindPerfTest";
8: sqlBuilder.IntegratedSecurity = true;
9: sqlBuilder.MultipleActiveResultSets = true;
12: EntityConnectionStringBuilder builder = new EntityConnectionStringBuilder();
13: builder.Provider = "System.Data.SqlClient";
14: builder.Metadata = "res://*/Customers.csdl|" +
15: "res://*/Customers.ssdl|" +
17: builder.ProviderConnectionString = sqlBuilder.ToString();
19: entityConnection = new EntityConnection(builder.ToString());
I have a static variable entityConnection which holds the reference on the object. In the constructor I just pass this reference:
3: using (NorthwindEntities ne = new NorthwindEntities(entityConnection))
Now I have difference numbers to show in my chart:
For this specific test scenario I cutted performance cost two times for the elapsed time and four times for the process elapsed time. Hope that helps.