The Scenario

Throughout the article, I will refer to the AdventureWorks database and use the following hypothetical customer scenarios:

  • Customer browses a list of all products
  • Customer checks the status of an order through a web portal that lists all previous orders; detail is shown for the first order

Each step in the scenario requires different entities in different configurations; to keep explanations brief, we will only use the Customer, SalesOrderHeader, SalesOrderDetail, and Product entities. Tuning should be driven by the business objective behind each bullet of the scenario, starting with minimizing the data you actually need.

Minimizing your Payload

The most straightforward way to accomplish the second item of the scenario is to retrieve all orders for a given customer and display them in a table on the portal. The user clicks some header information (such as the order identifier) and is taken to a detail page. Simple enough: the relationships of the entities, and how the context traverses them, are unchanged. The customer object is loaded, and when the page requests the order data for rendering, that data is also retrieved. The fundamental issue is that we must make two trips to the database: one for the customer, the other for their orders. How can this be made more efficient?

Data shaping with DataLoadOptions

The LINQ to SQL framework allows the developer to specify how eagerly or lazily the DataContext loads related entities and materializes them. By default, related entities (in a 1:1 or 1:* relationship) are materialized "on demand", or lazily, when the property wrapping them is accessed. Conversely, a DataContext may be configured to eagerly load related entities whenever a parent entity is materialized, reducing the number of queries issued for a given workload (Microsoft Corporation, 2007); various configurations in between are also possible.

There is no absolute rule for configuring the DataContext one way or the other; the right choice is a function of what the data will be used for. The differing configurations are evident in the scenarios above: when browsing products, the customer needs to know nothing about previous orders; when viewing the order list, the customer needs the order header information but not the details; and when drilling into an order, we need the customer, the order header, and the detail.

The mechanism to control this is called DataLoadOptions and can be associated with a DataContext to control how aggressively entities are materialized.
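As a minimal sketch (assuming the designer-generated AdventureWorksDataContext and entity classes), a DataLoadOptions instance might be configured like this:

```csharp
using System.Data.Linq;
using System.Linq;

// Sketch only: AdventureWorksDataContext, Customer, and
// SalesOrderHeaders are assumed from the generated model.
using (var context = new AdventureWorksDataContext())
{
    var options = new DataLoadOptions();

    // Eagerly load order headers whenever a Customer is materialized.
    options.LoadWith<Customer>(c => c.SalesOrderHeaders);

    // Load options must be assigned before the first query executes.
    context.LoadOptions = options;

    var customer = context.Customers.First();
    // customer.SalesOrderHeaders is already populated;
    // no second round trip is required.
}
```

Note that LoadWith describes a relationship to traverse eagerly; it applies to every query the context subsequently executes.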

Projections and change tracking

Additionally, a projection can be produced from a series of joins to provide a flattened, read-only view of the data. Unlike entities, which have identities and can be modified, projections cannot be persisted; they represent denormalized views.
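For illustration (entity and property names assumed from the AdventureWorks model), such a flattened projection over a join might look like:

```csharp
using System.Linq;

// Sketch: joins Customer to SalesOrderHeader and projects into an
// anonymous type. The result is read-only and is not change-tracked.
using (var context = new AdventureWorksDataContext())
{
    var orderSummaries =
        from c in context.Customers
        join soh in context.SalesOrderHeaders
            on c.CustomerID equals soh.CustomerID
        select new
        {
            c.CustomerID,
            soh.SalesOrderNumber,
            soh.OrderDate,
            soh.TotalDue
        };

    // Materialize the denormalized, read-only view.
    var rows = orderSummaries.ToArray();
}
```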

Comparing LINQ to SQL performance with ADO.NET

ADO.NET handles only data transfer and the materialization of fundamental CLR types; it does not generate classes representing the entities themselves. With less overhead, its raw speed produces fairly impressive numbers, at the cost of developer convenience.

Method                                        | Median   | Mean     | Std Dev | Minimum  | Maximum
----------------------------------------------|----------|----------|---------|----------|---------
DLINQ: no change tracking/compiled/projection | 3407.861 | 3387.762 | 200.316 | 3073.884 | 3644.475
ADO.NET: DataReader                           | 3461.938 | 3543.22  | 232.798 | 3453.19  | 4240.387
DLINQ: no change tracking/compiled            | 4978.377 | 5018.974 | 130.961 | 4949.884 | 5407.276
DLINQ: no change tracking                     | 4978.478 | 5010.796 | 100.207 | 4940.606 | 5296.035
DLINQ                                         | 5381.326 | 5398.57  | 54.385  | 5340.671 | 5543.181
DLINQ: compiled                               | 5396.588 | 5462.503 | 154.085 | 5351.099 | 5786.155
ADO.NET: DataSet                              | 6198.333 | 6276.153 | 162.064 | 6146.066 | 6604.432

Table 1 Scenario 1: read-only list of products (times in microseconds)

The table shows roughly what we might expect: about 3.5 milliseconds to load the products from the database using an ADO.NET DataReader and about 6.2 milliseconds using an ADO.NET DataSet. The data also shows that while a simple single-entity LINQ to SQL object graph does not outperform an ADO.NET DataReader, a projection that bypasses the identity cache performs just as well. For comparison, let us examine the code required to perform the action: first as the traditional ADO.NET DataReader, then as the LINQ to SQL projection.

List<object[]> productValues = new List<object[]>();

using( var connection = new SqlConnection( LinqToSqlPerf.Properties.Settings.Default.AdventureWorksConnectionString ) )
{
    using( var command = new SqlCommand( @"SELECT [t0].[ProductID], [t0].[Name],
        [t0].[ProductNumber], [t0].[MakeFlag], [t0].[FinishedGoodsFlag], [t0].[Color],
        [t0].[SafetyStockLevel], [t0].[ReorderPoint], [t0].[StandardCost],
        [t0].[ListPrice], [t0].[Size], [t0].[SizeUnitMeasureCode],
        [t0].[WeightUnitMeasureCode], [t0].[Weight], [t0].[DaysToManufacture],
        [t0].[ProductLine], [t0].[Class], [t0].[Style], [t0].[ProductSubcategoryID],
        [t0].[ProductModelID], [t0].[SellStartDate], [t0].[SellEndDate],
        [t0].[DiscontinuedDate], [t0].[rowguid], [t0].[ModifiedDate] FROM
        [Production].[Product] AS [t0]" ) )
    {
        command.Connection = connection;
        connection.Open();

        var dr = command.ExecuteReader( CommandBehavior.CloseConnection );

        while( dr.Read() )
        {
            var productRowValues = new object[ dr.FieldCount ];
            dr.GetValues( productRowValues );
            productValues.Add( productRowValues );
        }

        connection.Close();
    }
}

Figure 1 Scenario 1 product retrieval as ADO.NET DataReader

The resulting implementation is 11 lines of code with an embedded TSQL statement.


using( var context = new DataClasses1DataContext() )
{
    context.DeferredLoadingEnabled = false;
    context.ObjectTrackingEnabled = false;

    var productsProjection = from product in context.GetScaledDownProductsFast()
                             select new
                             {
                                 ProductID = product.ProductID,
                                 Name = product.Name,
                                 Number = product.ProductNumber,
                                 Color = product.Color,
                                 ListPrice = product.ListPrice,
                                 Size = product.Size,
                                 Weight = product.Weight,
                                 Style = product.Style
                             };

    var products = productsProjection.ToArray();
}

Figure 2 Scenario 1 product retrieval as LINQ to SQL projection

The resulting implementation is 5 lines of code with no embedded TSQL statement.

For the second scenario, the query must join all four tables and return read-only views.

Method                                                               | Median   | Mean     | Std Dev | Minimum  | Maximum
---------------------------------------------------------------------|----------|----------|---------|----------|---------
DLINQ: data load options (details)/compiled                          | 1313.25  | 1433.731 | 255.303 | 1285.668 | 1975.581
DLINQ: data load options (details)/no change tracking/compiled       | 1362.623 | 1405.075 | 158.995 | 1302.991 | 1874.912
DLINQ: data load options (details)/no change tracking/compiled/cache | 1554.027 | 1659.795 | 284.601 | 1392.884 | 2289.98
ADO.NET: DataReader                                                  | 1614.67  | 1694.006 | 319.316 | 1379.959 | 2373.989
ADO.NET: DataSet                                                     | 1940.488 | 2072.922 | 306.141 | 1830.352 | 2871.116
DLINQ: data load options (headers)/compiled                          | 6112.38  | 6213.204 | 472.648 | 5784.032 | 7345.794
DLINQ: compiled                                                      | 8458.7   | 8693.848 | 494.015 | 8070.511 | 9690.908
DLINQ: data load options (headers)                                   | 8657.555 | 8685.617 | 135.128 | 8404.756 | 8931.598
DLINQ                                                                | 9148.816 | 9371.009 | 509.871 | 8833.747 | 10461.56
DLINQ: data load options (details)/no change tracking                | 12069.17 | 12212.82 | 252.35  | 11975.24 | 12714.18
DLINQ: data load options (details)/no change tracking/cache          | 12176.1  | 12330.98 | 335.642 | 12136.32 | 13297.72
DLINQ: data load options (details)                                   | 12821.73 | 12903.43 | 394.513 | 12476.51 | 13517.67

Table 2 Scenario 2: read-only customer and order entities (times in microseconds)

The leading rows of Table 2 show load options optimized for read-only scenarios, with the traditional DataSet implementation included for reference. Given the data for both scenarios, several important observations can be made:

  1. Compiling a query over a simple object graph has little impact on the performance of the query
  2. The depth in the object graph at which optimizations are applied matters significantly
  3. Not registering objects with the change tracker affects overall performance significantly
  4. Optimized correctly, LINQ to SQL performance meets expectations

In the first observation, we see that for the second scenario query compilation positively affects data-loading performance: the query is compiled once into a delegate, along with any methods required, for later use. When the resulting expression is invoked, the cost of preparing the object graph and associated methods has already been paid (some elements of the query can still be varied through parameters), and the translation work is not repeated, resulting in gains of over 85.6% compared to a straight LINQ to SQL implementation.
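The pattern can be sketched briefly (the context type and entity names are assumed from the generated AdventureWorks model):

```csharp
using System;
using System.Data.Linq;
using System.Linq;

// Sketch: the expression tree is translated to SQL once, when the
// delegate is created; subsequent invocations reuse that translation.
static readonly Func<AdventureWorksDataContext, int, IQueryable<SalesOrderHeader>> OrdersForCustomer =
    CompiledQuery.Compile(
        ( AdventureWorksDataContext context, int customerId ) =>
            context.SalesOrderHeaders.Where( soh => soh.CustomerID == customerId ) );

using( var context = new AdventureWorksDataContext() )
{
    // The parameter varies per call; the translation cost does not recur.
    var orders = OrdersForCustomer( context, 29475 ).ToArray();
}
```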

Additionally, we find that the level at which the data load options are placed in the object graph dramatically affects query performance. The leading LINQ to SQL configuration places load options along the path from the Customer entity down through the SalesOrderDetail entity, so that when a Customer entity is loaded, so are the related sales order headers, the order details, and the corresponding products. Table 2 shows that placing the load optimization on the details yields a performance increase over the other LINQ to SQL methods with different or no optimizations.

In the third observation, we see that when the context is not tracking object changes (and results are therefore read-only), the code can execute faster, although in some instances not significantly. For the first scenario, disabling change tracking resulted in about a 7.5% performance gain. We also see that rewriting the query as a projection, which reduces the amount of data materialized, in addition to disabling change tracking, produced further gains, consistently matching or exceeding ADO.NET DataReader performance.

As another point of comparison, the two fastest implementations from ADO.NET and LINQ to SQL are shown below.

object[] customerValues = null;
List<object[]> salesHeaderValues = null;
List<object[]> salesDetailValues = null;
List<object[]> salesDetailProductValues = null;

using( var connection = new SqlConnection( AdventureWorksConnectionString ) )
{
    using( var command = new SqlCommand( "SELECT [t0].[CustomerID], [t0].[AccountNumber], [t0].[rowguid], [t0].[ModifiedDate] FROM [Sales].[Customer] AS [t0] WHERE [t0].[CustomerID] = @p0" ) )
    {
        command.Parameters.AddWithValue( "@p0", 29475 );
        command.Connection = connection;
        connection.Open();

        var dr = command.ExecuteReader( System.Data.CommandBehavior.CloseConnection );
        customerValues = new object[ dr.FieldCount ];

        while( dr.Read() )
        {
            dr.GetValues( customerValues );
        }
        connection.Close();
    }
}

using( var connection = new SqlConnection( AdventureWorksConnectionString ) )
{
    using( var command = new SqlCommand( @"SELECT [t0].[SalesOrderID], [t0].[RevisionNumber],
        [t0].[OrderDate], [t0].[DueDate], [t0].[ShipDate], [t0].[Status],
        [t0].[OnlineOrderFlag], [t0].[SalesOrderNumber], [t0].[PurchaseOrderNumber],
        [t0].[AccountNumber], [t0].[CustomerID], [t0].[SubTotal], [t0].[TaxAmt],
        [t0].[Freight], [t0].[TotalDue], [t0].[Comment], [t0].[rowguid],
        [t0].[ModifiedDate] FROM [Sales].[SalesOrderHeader] AS [t0] WHERE
        [t0].[CustomerID] = @x1" ) )
    {
        command.Connection = connection;
        command.Parameters.AddWithValue( "@x1", customerValues[ 0 ] );
        connection.Open();

        var dr = command.ExecuteReader( System.Data.CommandBehavior.CloseConnection );
        salesHeaderValues = new List<object[]>( 16 );
        while( dr.Read() )
        {
            var salesHeaderRowValues = new object[ dr.FieldCount ];
            dr.GetValues( salesHeaderRowValues );
            salesHeaderValues.Add( salesHeaderRowValues );
        }
        connection.Close();
    }
}

using( var connection = new SqlConnection( AdventureWorksConnectionString ) )
{
    using( var command = new SqlCommand( @"SELECT [t0].[SalesOrderID],
        [t0].[SalesOrderDetailID], [t0].[CarrierTrackingNumber], [t0].[OrderQty],
        [t0].[ProductID], [t0].[UnitPrice], [t0].[LineTotal], [t0].[rowguid],
        [t0].[ModifiedDate] FROM [Sales].[SalesOrderDetail] AS [t0] WHERE
        [t0].[SalesOrderID] = @p0" ) )
    {
        command.Connection = connection;
        command.Parameters.AddWithValue( "@p0", salesHeaderValues[ 0 ][ 0 ] );
        connection.Open();

        var dr = command.ExecuteReader( System.Data.CommandBehavior.CloseConnection );
        salesDetailValues = new List<object[]>( 16 );
        while( dr.Read() )
        {
            var salesDetailRowValues = new object[ dr.FieldCount ];
            dr.GetValues( salesDetailRowValues );
            salesDetailValues.Add( salesDetailRowValues );
        }
        connection.Close();
    }
}

using( var connection = new SqlConnection( AdventureWorksConnectionString ) )
{
    using( var command = new SqlCommand( @"SELECT [t0].[ProductID], [t0].[Name],
        [t0].[ProductNumber], [t0].[MakeFlag], [t0].[FinishedGoodsFlag], [t0].[Color],
        [t0].[SafetyStockLevel], [t0].[ReorderPoint], [t0].[StandardCost],
        [t0].[ListPrice], [t0].[Size], [t0].[SizeUnitMeasureCode],
        [t0].[WeightUnitMeasureCode], [t0].[Weight], [t0].[DaysToManufacture],
        [t0].[ProductLine], [t0].[Class], [t0].[Style], [t0].[ProductSubcategoryID],
        [t0].[ProductModelID], [t0].[SellStartDate], [t0].[SellEndDate],
        [t0].[DiscontinuedDate], [t0].[rowguid], [t0].[ModifiedDate] FROM
        [Production].[Product] AS [t0] WHERE [t0].[ProductID] = @p0" ) )
    {
        command.Connection = connection;
        command.Parameters.AddWithValue( "@p0", salesDetailValues[ 0 ][ 4 ] );
        connection.Open();

        var dr = command.ExecuteReader( System.Data.CommandBehavior.CloseConnection );
        salesDetailProductValues = new List<object[]>( 16 );
        while( dr.Read() )
        {
            var salesDetailProductRowValues = new object[ dr.FieldCount ];
            dr.GetValues( salesDetailProductRowValues );
            salesDetailProductValues.Add( salesDetailProductRowValues );
        }
        connection.Close();
    }
}

Figure 3 Scenario 2 using ADO.NET DataReader

Much like the previous example, the ADO.NET version is more verbose and includes embedded TSQL statements for data retrieval, totaling nearly 50 lines of code.

public static readonly Func<AdventureWorksDataContext, int, Customer> CustomerWithSalesOrderDetails =
    CompiledQuery.Compile(
        ( AdventureWorksDataContext context, int identifier ) =>
            context.Customers.Single( c => c.CustomerID == identifier )
    );

using( var context = new AdventureWorksDataContext() )
{
    var dlo = new DataLoadOptions();
    dlo.LoadWith<Customer>( c => c.SalesOrderHeaders );
    dlo.LoadWith<SalesOrderHeader>( soh => soh.SalesOrderDetails );
    dlo.LoadWith<SalesOrderDetail>( sod => sod.Products );

    context.ObjectTrackingEnabled = false;
    context.DeferredLoadingEnabled = false;
    context.LoadOptions = dlo;

    var customer = CustomerWithSalesOrderDetails( context, 29475 );
    var headers = customer.SalesOrderHeaders.ToArray();
    var header = headers.First();
    var details = header.SalesOrderDetails.ToArray();
    var products = details.Select( sod => sod.Products ).ToArray();
}

Figure 4 Scenario 2 using LINQ to SQL

By contrast, LINQ to SQL achieves the same result in 14 lines of code.

Caching

To this point, all of the tests and observations have been made against contexts with short lifetimes, which has worked fine for our scenarios. In other scenarios, however, repeatedly loading fairly static data may produce unnecessary overhead and slow the application down.

To address this, LINQ to SQL includes an identity cache: when an entity is retrieved, it is automatically inserted into this cache. If another query in the same context returns results that overlap with an entity that has already been materialized, the entity is returned from the identity cache instead of the source data store, unless a projection is used, which bypasses the cache.
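This identity resolution can be sketched as follows (a hypothetical example against the generated AdventureWorks model; both lookups resolve to the same object instance within one context):

```csharp
using System;
using System.Linq;

// Sketch: within a single DataContext, results are resolved against
// the identity cache, so repeated lookups of the same key return
// the same materialized entity instance.
using( var context = new AdventureWorksDataContext() )
{
    var first  = context.Products.Single( p => p.ProductID == 1 );
    var second = context.Products.Single( p => p.ProductID == 1 );

    // Both variables reference the one cached object.
    bool sameInstance = ReferenceEquals( first, second );
}
```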

Let us extend the second scenario such that the customer calls into a call center to check the status of the order, where an operator uses a Windows application to view and update the order on behalf of the customer. Given that operators reuse the same reference data repeatedly (such as product information, country, region, state, and province), it seems inefficient to reload that data for every screen that requires it.

Custom Object Pooling

To avoid this overhead, a crude object pool was created using the HttpRuntime.Cache object to store reference information. Some modifications were also made to the entities to accommodate lazy lookups of reference data without compromising the developer experience.

[Association(Name="SalesOrderDetail_Product", Storage="_Products", ThisKey="ProductID", OtherKey="ProductID", IsUnique=true, IsForeignKey=false)]
public Product Product
{
    get { return _Product.Entity; }
    set
    {
        var previousValue = _Product.Entity;
        if ((previousValue != value) || (_Product.HasLoadedOrAssignedValue == false))
        {
            SendPropertyChanging();

            if ((previousValue != null))
            {
                _Product.Entity = null;
                previousValue.SalesOrderDetail = null;
            }

            _Product.Entity = value;
            if ((value != null))
            {
                value.SalesOrderDetail = this;
            }

            SendPropertyChanged("Product");
        }
    }
}

Figure 5 Entity relationship for LINQ to SQL


public Product Product2
{
    get { return EntityCache.GetOrAdd<Product>( p => p.ProductID == ProductID ).FirstOrDefault(); }
    set
    {
        SendPropertyChanging();

        // Manipulate the foreign key directly rather than the entity reference.
        ProductID = ( value == null ) ? 0 : value.ProductID;

        SendPropertyChanged( "Product2" );
    }
}

Figure 6 Entity relationship for cache

The work is performed by the EntityCache type, which loads data from a context and stores it in the local cache object. The property accessors on the entity were updated to manage the relationship differently because of issues with associating entities from multiple contexts. Since LINQ to SQL ultimately translates strongly typed entity properties to foreign keys, the updated property manipulates the keys manually.
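A minimal sketch of what such an EntityCache helper could look like (the class and its GetOrAdd method are this article's own construct, not part of LINQ to SQL; the cache key scheme, expiration policy, and context creation are simplified assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Web;
using System.Web.Caching;

// Sketch: caches reference entities in HttpRuntime.Cache so repeated
// lookups avoid the database. Not hardened for concurrency.
public static class EntityCache
{
    public static IEnumerable<T> GetOrAdd<T>( Expression<Func<T, bool>> predicate )
        where T : class
    {
        // Crude key: type name plus the predicate's string form.
        string key = typeof( T ).FullName + ":" + predicate.ToString();

        var cached = HttpRuntime.Cache[ key ] as T[];
        if( cached == null )
        {
            using( var context = new AdventureWorksDataContext() )
            {
                // Disable tracking so the results are detached from the context.
                context.ObjectTrackingEnabled = false;
                cached = context.GetTable<T>().Where( predicate ).ToArray();
            }

            HttpRuntime.Cache.Insert( key, cached, null,
                DateTime.UtcNow.AddMinutes( 10 ), Cache.NoSlidingExpiration );
        }

        return cached;
    }
}
```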

Table 2 shows the overhead of this operation: the caching method is about 12% slower, but retransmits less data over the network (each product here is about 1 KB) and still performs as well as or better than a straight DataReader. While the performance trade-off over 1 KB of data is hardly persuasive, applied to our extension of scenario 2, where operators load the same reference data repeatedly, the approach may yield significant gains.
