<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Rico Mariani's Performance Tidbits : databases</title><link>http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx</link><description>Tags: databases</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Linq Compiled Queries Q &amp; A</title><link>http://blogs.msdn.com/ricom/archive/2008/08/25/linq-compiled-queries-q-a.aspx</link><pubDate>Mon, 25 Aug 2008 22:21:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8894803</guid><dc:creator>ricom</dc:creator><slash:comments>5</slash:comments><comments>http://blogs.msdn.com/ricom/comments/8894803.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=8894803</wfw:commentRss><description>&lt;P&gt;I did a series of postings on &lt;A href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx"&gt;Linq Compiled Queries&lt;/A&gt; last year, I recently got some questions on those postings that I thought would be of general interest.&lt;/P&gt;
&lt;P&gt;Q1:&lt;/P&gt;
&lt;P&gt;Why use the 'new' keyword in this snippet?&lt;/P&gt;
&lt;P&gt;var q = from o in nw.Orders &lt;BR&gt;select &lt;STRONG&gt;new &lt;/STRONG&gt;{o.everything …};&lt;/P&gt;
&lt;P&gt;A:&lt;/P&gt;
&lt;P&gt;If you did just :&lt;/P&gt;
&lt;P&gt;var q = from o in nw.Orders &lt;BR&gt;select o;&lt;/P&gt;
&lt;P&gt;You're getting editable orders. Linq then has to track them in case you change them and want to submit the changes. If you use &lt;STRONG&gt;new&lt;/STRONG&gt;, you're effectively making a copy of the orders that is not going to be change tracked. That's faster for read-only cases. The other thing you can do is mark the query context as read-only and then you get the same effect.&amp;nbsp; When I wrote that test case, that feature wasn't available yet so I used &lt;STRONG&gt;new &lt;/STRONG&gt;to simulate it.&lt;/P&gt;
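&lt;P&gt;Here is a minimal sketch of those two options. It assumes a small hand-written Northwinds DataContext with a single Orders table (a stand-in for whatever your generated context looks like) and a placeholder connection string. The read-only switch is assumed to be the DataContext's ObjectTrackingEnabled property, which has to be set before the first query runs.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System; &lt;BR&gt;
using System.Data.Linq; &lt;BR&gt;
using System.Data.Linq.Mapping; &lt;BR&gt;
using System.Linq; &lt;BR&gt;
&lt;BR&gt;
// Hypothetical minimal mapping; a real project would use the generated classes. &lt;BR&gt;
[Table(Name = "Orders")] &lt;BR&gt;
public class Order &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column(IsPrimaryKey = true)] public int OrderID; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column] public string CustomerID; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column] public DateTime? ShippedDate; &lt;BR&gt;
} &lt;BR&gt;
&lt;BR&gt;
public class Northwinds : DataContext &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Northwinds(string connection) : base(connection) { } &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Table&amp;lt;Order&amp;gt; Orders { get { return GetTable&amp;lt;Order&amp;gt;(); } } &lt;BR&gt;
} &lt;BR&gt;
&lt;BR&gt;
public static class ReadOnlyDemo &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void Main() &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Placeholder connection string; adjust for your server. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;string cs = "Data Source=.;Initial Catalog=Northwind;Integrated Security=True"; &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (var nw = new Northwinds(cs)) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Option 1: project into a new (anonymous) type, so nothing is change tracked. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;var copies = from o in nw.Orders &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;select new { o.OrderID, o.CustomerID, o.ShippedDate }; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine(copies.Take(5).ToList().Count); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (var nw = new Northwinds(cs)) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Option 2: keep the entity type but turn tracking off before the first query. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;nw.ObjectTrackingEnabled = false; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;var orders = from o in nw.Orders select o; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine(orders.Take(5).ToList().Count); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;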
&lt;P&gt;Q2:&amp;nbsp; &lt;/P&gt;
&lt;P&gt;What do you mean when you say that linq will 'Create custom methods that bind the data perfectly' ?&lt;/P&gt;
&lt;P&gt;A:&lt;/P&gt;
&lt;P&gt;Whenever you use Linq to SQL to read data from a database it has to do two important things for you. The first is to convert your Linq query into SQL. The second is to make a method that takes the stream of data that comes back from the database and converts it into the managed objects you asked for. That's the data-binding step. Linq creates the necessary methods automatically, and it makes the perfect code for doing this.&lt;/P&gt;
&lt;P&gt;Q3:&lt;/P&gt;
&lt;P&gt;How did Linq to SQL beat your ADO.Net code for insert times?&amp;nbsp; Shouldn't a tie be the best possible result?&lt;/P&gt;
&lt;P&gt;A:&lt;/P&gt;
&lt;P&gt;The SQL I used in my test case was pretty much the standard simplest SQL you would use for such a job. The automatically generated SQL from Linq was better than what I wrote by hand because they had parameterized the insert statements which I never bothered to do. Had I changed my SQL to what they created it would have been a tie. This is kind of like when the C++ compiler finds a machine code pattern that is better than what you would have written doing it by hand because it did something you don't usually bother doing with hand tuned machine code. But you *could* replace what you wrote with what the compiler generated.&lt;/P&gt;
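&lt;P&gt;To make the difference concrete, here is a rough sketch of the two styles in plain ADO.Net. This is not the code from the original test; the account table and its columns are hypothetical, and the caller is assumed to pass in an open SqlConnection.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System.Data.SqlClient; &lt;BR&gt;
&lt;BR&gt;
public static class InsertDemo &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Hypothetical table and columns, used only for illustration. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void InsertLiteral(SqlConnection conn, int id, string name) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// "Simplest SQL" style: the values are pasted into the statement text, &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// so every insert is a different statement as far as the server can tell. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;string sql = "insert into account (id, name) values (" + id + ", '" + name.Replace("'", "''") + "')"; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (var cmd = new SqlCommand(sql, conn)) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cmd.ExecuteNonQuery(); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void InsertParameterized(SqlConnection conn, int id, string name) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Parameterized style: the statement text is identical on every call, &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// only the parameter values change. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (var cmd = new SqlCommand("insert into account (id, name) values (@id, @name)", conn)) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cmd.Parameters.AddWithValue("@id", id); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cmd.Parameters.AddWithValue("@name", name); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cmd.ExecuteNonQuery(); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;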
&lt;P&gt;Q4: &lt;/P&gt;
&lt;P&gt;What are the downsides to precompiled queries?&lt;/P&gt;
&lt;P&gt;A:&lt;/P&gt;
&lt;P&gt;There is no penalty to precompiling (&lt;A href="http://blogs.msdn.com/ricom/archive/2008/01/11/performance-quiz-13-linq-to-sql-compiled-queries-cost.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2008/01/11/performance-quiz-13-linq-to-sql-compiled-queries-cost.aspx"&gt;see Quiz #13&lt;/A&gt;). The only way you might lose performance is if you precompile a zillion queries and then hardly use them at all -- you'd be wasting a lot of memory for no good reason.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;But measure :)&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8894803" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/design+advice/default.aspx">design advice</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>Performance Quiz #13 -- Linq to SQL compiled query cost -- solution</title><link>http://blogs.msdn.com/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx</link><pubDate>Mon, 14 Jan 2008 20:51:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7110071</guid><dc:creator>ricom</dc:creator><slash:comments>18</slash:comments><comments>http://blogs.msdn.com/ricom/comments/7110071.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=7110071</wfw:commentRss><description>&lt;P&gt;Well is there really a "solution" at all in general?&amp;nbsp; This particular case I think I constrained enough that you can claim an answer but does it generalize?&amp;nbsp; Let's look at what I got first, the raw results are pretty easy to understand.&lt;/P&gt;
&lt;P&gt;The experiment I conducted was to run a fixed number of queries (5000 in this case) but to break them up so that the compiled query was reused a decreasing amount.&amp;nbsp; The first run is the "best" 1 batch of 5000 selects all using the compiled query.&amp;nbsp; Then 2 batches of 2500, and so on down to 5000 batches of 1.&amp;nbsp; As a control I also run the uncompiled case at each step expecting of course that it makes no difference.&amp;nbsp; Note the output indicates we selected a total of 25000 rows of data -- that is 5 per select as expected.&amp;nbsp; Here are the raw results:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Testing 1 batches of 5000 selects &lt;BR&gt;5000 selects uncompiled 9200.0ms 25000 records total 543.48 selects/sec &lt;BR&gt;5000 selects compiled 5401.0ms 25000 records total 925.75 selects/sec &lt;/P&gt;
&lt;P&gt;Testing 2 batches of 2500 selects &lt;BR&gt;5000 selects uncompiled 9181.0ms 25000 records total 544.60 selects/sec &lt;BR&gt;5000 selects compiled 5402.0ms 25000 records total 925.58 selects/sec &lt;/P&gt;
&lt;P&gt;Testing 5 batches of 1000 selects &lt;BR&gt;5000 selects uncompiled 9169.0ms 25000 records total 545.32 selects/sec &lt;BR&gt;5000 selects compiled 5432.0ms 25000 records total 920.47 selects/sec &lt;/P&gt;
&lt;P&gt;Testing 100 batches of 50 selects &lt;BR&gt;5000 selects uncompiled 9184.0ms 25000 records total 544.43 selects/sec &lt;BR&gt;5000 selects compiled 5511.0ms 25000 records total 907.28 selects/sec &lt;/P&gt;
&lt;P&gt;Testing 1000 batches of 5 selects &lt;BR&gt;5000 selects uncompiled 9166.0ms 25000 records total 545.49 selects/sec &lt;BR&gt;5000 selects compiled 6526.0ms 25000 records total 766.17 selects/sec &lt;/P&gt;
&lt;P&gt;Testing 2500 batches of 2 selects &lt;BR&gt;5000 selects uncompiled 9165.0ms 25000 records total 545.55 selects/sec &lt;BR&gt;5000 selects compiled 7892.0ms 25000 records total 633.55 selects/sec &lt;/P&gt;
&lt;P&gt;Testing 5000 batches of 1 selects &lt;BR&gt;5000 selects uncompiled 9157.0ms 25000 records total 546.03 selects/sec &lt;BR&gt;5000 selects compiled 10825.0ms 25000 records total 461.89 selects/sec&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;And there you have it.&amp;nbsp; Even at 2 uses the compiled query still wins but at 1 use it loses.&amp;nbsp; In fact, the magic number for this particular query is about 1.5 average uses to break even.&amp;nbsp; But why?&amp;nbsp; And how might it change?&lt;/P&gt;
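&lt;P&gt;One way to recover that break-even number from the raw results is sketched below. It assumes a simple linear model (total time = selects x per-run cost + batches x per-batch setup cost), which is my simplification and not something the test measures directly; the three timings are copied from the output above. Under that model the break-even comes out a little over 1.4 uses.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System; &lt;BR&gt;
&lt;BR&gt;
public static class BreakEven &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void Main() &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;const double selects = 5000; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double uncompiledMs = 9200.0;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// 1 batch of 5000, uncompiled &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double compiledOneBatchMs = 5401.0;&amp;nbsp;&amp;nbsp;// 1 batch of 5000, compiled (1 setup) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double compiledPerUseMs = 10825.0;&amp;nbsp;&amp;nbsp;&amp;nbsp;// 5000 batches of 1, compiled (5000 setups) &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Assumed model: total = selects * perRun + batches * perSetup. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double perSetup = (compiledPerUseMs - compiledOneBatchMs) / (selects - 1); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double perRun = (compiledOneBatchMs - perSetup) / selects; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double perUncompiled = uncompiledMs / selects; &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Break even when n * perUncompiled == perSetup + n * perRun. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double breakEven = perSetup / (perUncompiled - perRun); &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("setup per batch:&amp;nbsp;&amp;nbsp;{0:F3} ms", perSetup); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("compiled run:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{0:F3} ms", perRun); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("uncompiled run:&amp;nbsp;&amp;nbsp;&amp;nbsp;{0:F3} ms", perUncompiled); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("break-even uses:&amp;nbsp;&amp;nbsp;{0:F2}", breakEven); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;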
&lt;P&gt;Well, as has been observed in the comments, Linq query compilation isn't like regular expression compilation.&amp;nbsp; Compiling the query doesn't do anything that isn't going to have to happen anyway.&amp;nbsp; In fact, actually creating the compiled query with CompiledQuery.Compile hardly does anything at all; it's all deferred until the query is run, just as it would have been had the query not been compiled.&amp;nbsp; So what is the overhead?&amp;nbsp; Why is it slower at all?&amp;nbsp; And what's the point of it?&lt;/P&gt;
&lt;P&gt;Well the main purpose of that compiled query object is to have an object, of the correct type, that also has the correct lifetime.&amp;nbsp; The compiled query can live across DataContexts, in fact it could potentially live for the entire life of your program.&amp;nbsp; And since it has no shared state in it, it's thread-safe and so forth.&amp;nbsp; It exists to:&lt;/P&gt;
&lt;P&gt;1) Give the Linq to SQL system a place to store the results of analyzing the query (i.e. the actual SQL plus the delegate that will be used to extract data from the result set)&lt;/P&gt;
&lt;P&gt;2) Allow the user to specify the "variable parts" of the query.&amp;nbsp; The most common case isn't that the query is exactly the same from run to run, usually it's "nearly" the same... That is it's the same except that perhaps the search string is different in the where clause, or the ID being fetched is different.&amp;nbsp; The shape is the same.&amp;nbsp; Creating a delegate with parameters allows you to specify which things are fixed and which things are variable.&lt;/P&gt;
&lt;P&gt;Now there was some debate about how to make compiled queries durable, automatically caching them was considered, but this was something I was strongly against.&amp;nbsp; Largely because of the object lifetime issues it would cause.&amp;nbsp; First, you would have to do complicated matching of a created query against something that was already in the cache -- something I'd like to avoid.&amp;nbsp; Secondly you have to decide where to store the cache, if you associate it with the DataContext then you get much less query re-use because you only get a benefit if you run the same query twice in the same data context.&amp;nbsp; To get the most benefit you want to be able to re-use the query across DataContexts.&amp;nbsp; But then, do you make the cache global?&amp;nbsp; If you do you have threading issues accessing it, and you have the terrible problem that you don't know when is a good time to discard items from the cache.&amp;nbsp; Ultimately this was my strongest point, at the Linq data level we do not know enough about the query patterns to choose a good caching policy, and, as I've written many times before, when it comes to caching good policy is crucial.&amp;nbsp; In fact, analogously, we had to make changes in the regular expression caching system back in Whidbey precisely because we were seeing cases where our caching assumptions were resulting in catastrophically bad performance (Mid Life Crisis due to retained compiled regular expressions in our cache) --&amp;nbsp; I didn't want to make that mistake again.&lt;/P&gt;
&lt;P&gt;So that's roughly how we end up at our final design.&amp;nbsp; Any Linq to SQL user can choose how much or how little caching is done.&amp;nbsp; They control the lifetime, they can choose an easy mechanism (e.g. stuff it in a static variable forever) or a complicated recycling method depending on their needs.&amp;nbsp; Usually the simple choice is adequate.&amp;nbsp; And they can easily choose which queries to compile and which to just run in the usual manner.&lt;/P&gt;
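&lt;P&gt;As an illustration of the easy mechanism, here is a sketch of a compiled query with one variable part, stuffed in a static field and reused across DataContexts. The Order mapping, the Northwinds context, and the connection string are minimal hypothetical stand-ins; CompiledQuery.Compile is the real entry point in System.Data.Linq.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System; &lt;BR&gt;
using System.Data.Linq; &lt;BR&gt;
using System.Data.Linq.Mapping; &lt;BR&gt;
using System.Linq; &lt;BR&gt;
&lt;BR&gt;
// Hypothetical minimal mapping; a real project would use the generated classes. &lt;BR&gt;
[Table(Name = "Orders")] &lt;BR&gt;
public class Order &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column(IsPrimaryKey = true)] public int OrderID; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column] public string CustomerID; &lt;BR&gt;
} &lt;BR&gt;
&lt;BR&gt;
public class Northwinds : DataContext &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Northwinds(string connection) : base(connection) { } &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Table&amp;lt;Order&amp;gt; Orders { get { return GetTable&amp;lt;Order&amp;gt;(); } } &lt;BR&gt;
} &lt;BR&gt;
&lt;BR&gt;
public static class OrderQueries &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Created once and kept for the life of the program. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// The string parameter is the "variable part"; the shape of the query is fixed. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static readonly Func&amp;lt;Northwinds, string, IQueryable&amp;lt;Order&amp;gt;&amp;gt; ByCustomer = &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CompiledQuery.Compile((Northwinds nw, string customerId) =&amp;gt; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;from o in nw.Orders &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where o.CustomerID == customerId &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;select o); &lt;BR&gt;
} &lt;BR&gt;
&lt;BR&gt;
public static class Program &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void Main() &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Placeholder connection string; adjust for your server. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;string cs = "Data Source=.;Initial Catalog=Northwind;Integrated Security=True"; &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;foreach (string id in new[] { "ALFKI", "ANATR" }) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// A fresh DataContext each time; the same compiled query serves both. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (var nw = new Northwinds(cs)) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;foreach (Order o in OrderQueries.ByCustomer(nw, id)) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("{0} {1}", o.OrderID, o.CustomerID); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;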
&lt;P&gt;Let's get back to the overhead of compiled queries.&amp;nbsp; Besides the one-time cost of creating the delegate there is also a little extra delegate indirection on each run of the query, plus the more complicated thing we have to do: since the compiled query can span DataContexts we have to make sure that the DataContext we are being given in any particular execution of a compiled query is compatible with the DataContext that was provided when the query was compiled the first time.&lt;/P&gt;
&lt;P&gt;Other than that the code path is basically the same, which means you come out ahead pretty quickly.&amp;nbsp; This test case was, as usual, designed to magnify the typical overheads so we can observe them.&amp;nbsp; The result set is a small number of rows, it is always the same rows, the database is local, and the query itself is a simple one.&amp;nbsp; All the usual costs of doing a query have been minimized.&amp;nbsp; In the wild you would expect the query to be more complicated, the database to be remote, the actual data returned to be larger and not always the same data.&amp;nbsp; This of course both reduces the benefit of compilation in the first place but also, as a consolation prize, reduces the marginal overhead.&lt;/P&gt;
&lt;P&gt;In short, if you expect to reuse the query at all, there is no performance-related reason not to compile it.&amp;nbsp; &lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=7110071" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category><category domain="http://blogs.msdn.com/ricom/archive/tags/quiz/default.aspx">quiz</category></item><item><title>Performance Quiz #13 -- Linq to SQL compiled queries cost</title><link>http://blogs.msdn.com/ricom/archive/2008/01/11/performance-quiz-13-linq-to-sql-compiled-queries-cost.aspx</link><pubDate>Fri, 11 Jan 2008 23:08:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7078749</guid><dc:creator>ricom</dc:creator><slash:comments>28</slash:comments><comments>http://blogs.msdn.com/ricom/comments/7078749.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=7078749</wfw:commentRss><description>&lt;P&gt;I've written a few articles about Linq now and you know I was a big fan of &lt;A href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx"&gt;compiled queries in Linq&lt;/A&gt; but what do they cost?&amp;nbsp; Or more specifically, how many times do you have to use a compiled query in order for the cost of compilation to pay for itself?&amp;nbsp; With regular expressions, for instance, it's usually a mistake to compile a regular expression if you only intend to match it against a fairly small amount of text.&lt;/P&gt;
&lt;P&gt;Let's do a specific experiment to get an idea.&amp;nbsp; Using the ubiquitous Northwinds database and getting the same data over and over to control for the cost of the database accesses (and magnify any Linq overheads) we run this query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;var q = (from o in nw.Orders &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select new { &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OrderID = o.OrderID, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerID = o.CustomerID, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; EmployeeID = o.EmployeeID, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ShippedDate = o.ShippedDate &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }).Take(5);&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;and compare it against:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;var fq = CompiledQuery.Compile &lt;BR&gt;( &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (Northwinds nw) =&amp;gt; &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (from o in nw.Orders &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select new &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; { &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OrderID = o.OrderID, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerID = o.CustomerID, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; EmployeeID = o.EmployeeID, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ShippedDate = o.ShippedDate &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }).Take(5) &lt;BR&gt;);&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;So now the quiz:&amp;nbsp; How many times do I have to use the compiled version of the query in order for it to be cheaper to compile than it would have been to just use the original query directly?&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=7078749" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category><category domain="http://blogs.msdn.com/ricom/archive/tags/quiz/default.aspx">quiz</category></item><item><title>Database Performance, Correctness, Composition, Compromise, and Linq too</title><link>http://blogs.msdn.com/ricom/archive/2007/08/31/database-performance-correctness-compostion-compromise-and-linq-too.aspx</link><pubDate>Sat, 01 Sep 2007 04:10:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4678887</guid><dc:creator>ricom</dc:creator><slash:comments>16</slash:comments><comments>http://blogs.msdn.com/ricom/comments/4678887.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=4678887</wfw:commentRss><description>&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Introduction and Disclaimer&lt;/STRONG&gt; 
&lt;P&gt;Regular readers of my blog are already familiar with my goal to provide brief and useful information that is approximately correct and that illustrates some key truths. Most of the time my articles are not authoritative and that is especially true in this case. I am certainly not an Official Microsoft Authority on databases and data systems, I just have a good bit of experience in that area, and I wanted to convey some things I learned that I thought were important, and that I’ve never seen assembled as a whole before, so I’ve written this article. This article uses Linq to SQL for its examples but I think it is actually more broadly applicable, with due caution. 
&lt;P&gt;&lt;STRONG&gt;Performance in Many Tier Systems&lt;/STRONG&gt; 
&lt;P&gt;Again if you read my blog you’ll know that I always talk about the importance of &lt;A href="http://blogs.msdn.com/ricom/archive/2006/12/21/do-performance-analysis-in-context.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2006/12/21/do-performance-analysis-in-context.aspx"&gt;measuring performance in context&lt;/A&gt;.&amp;nbsp; This is especially important in systems with multiple tiers because making a choice in one tier can profoundly impact the consequences in other tiers. For instance, a client-side cache might take a lot of load off of the middle tier. Sounds great, right? Oops, you of course remembered that if you cached the contents on the client then they are going to be at least a little out of date, right? You’re ok with that? Or did you add a scheme to periodically recheck validity? Oops, now you have more traffic to the middle tier again. No, wait, you have a nice periodic discard policy where you only keep local data for a known interval? Ah yes, but now your cache contents aren’t necessarily self-consistent because they were fetched with different queries at different times and unified to create the cache.&amp;nbsp; Does it ever end? 
&lt;P&gt;There is a dance here, and it is a complicated one. Only by understanding how all the dancers play together in your system can you truly create systems that have solid correctness characteristics and good performance. And it’s a bad idea to look at the performance of any one of the dancers independently. 
&lt;P&gt;&lt;STRONG&gt;Key Factors&lt;/STRONG&gt; 
&lt;P&gt;I break it down into a fairly small list of considerations, and these are of course deeply entangled. 
&lt;UL&gt;
&lt;LI&gt;Locality&lt;/LI&gt;
&lt;LI&gt;Isolation&lt;/LI&gt;
&lt;LI&gt;Unit of Work&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;You might wonder why I’m not mentioning things like network, schema, and so forth… It’s because I think about them in the context of the bigger phenomena. And maybe you’re wondering what this could possibly have to do with Linq to SQL but don’t worry we’re going to get there by the time we talk about unit of work. 
&lt;P&gt;&lt;STRONG&gt;Locality&lt;/STRONG&gt; 
&lt;P&gt;Even when we’re not talking about databases I often say “Locality is everything” and it’s possible that it is even more true in the database world (don’t nitpick me on “degrees of truth”, it’s only a figure of speech ^_^). When you’re talking about a data system, bad locality generally translates directly to more disk operations which in turn translate into torpedoed performance. 
&lt;P&gt;What do we do about it? Two big things: Schema and Indexes. Maybe that’s really two sides of the same thing but let me separate those concerns for a moment. 
&lt;P&gt;We create schema in a manner that is consistent with the data we intend to represent and so that logically related concerns will tend to be physically together. If we normalize well we also tend to find that a single logical fact tends to be represented in a single physical location. All of this bodes well for locality. 
&lt;P&gt;But now we’re faced with a problem. One organization of the data is frequently not enough to support all the operations that we intend to perform on the data. If we want to look up only by (e.g.) ID, one ordering is fine, but the moment that we also want to look up by (e.g.) surname we need to organize the data in more than one way. That is where indexes come into play. 
&lt;P&gt;Trust me, I’m going somewhere with all of this. 
&lt;P&gt;To get multiple orderings of the same data you could create multiple tables with the same data arranged differently. That would work but it would be highly inconvenient as you would always have to be choosing the flavor of the data to access when you queried the data and you would have to update multiple copies of the data whenever you made changes. 
&lt;P&gt;Indexes let you automate this. Indexes are your way of telling a database that you want multiple copies of your data, ordered differently, and any time you do an operation that updates the main copy you want all of the alternate copies also updated automatically in the same transaction. 
&lt;P&gt;This is actually a profound statement. Remember we added indexes because there were some questions we could not answer without looking at a lot of data (bad locality) and the price we pay is that now when we make changes to one spot in the data we actually have to propagate that change to many places atomically (reduced locality). 
&lt;P&gt;In the final analysis an index isn’t much more than a second, or third, etc. copy of the table with some of the columns reordered and some removed entirely. 
&lt;P&gt;Locality means we get just the data we need, that it’s organized in such a way that we do not have to sift through vast volumes of data to get to what we want, and that we can make surgical changes to the data in support of the things our system needs to do as it runs. Good locality translates directly into good performance. 
&lt;P&gt;&lt;STRONG&gt;Isolation&lt;/STRONG&gt; 
&lt;P&gt;Isolation is a notion that is cloaked in mystery, perhaps unnecessarily. I’ve met many developers that know what a transaction is but far fewer that know about isolation levels and fewer still that understand how these things are entangled. In short, isolation is about giving every program using a data system the illusion that it is the only user. Levels of isolation, roughly, describe how imperfect that illusion is going to be. 
&lt;P&gt;OK if you’re still with me at this point you must be wondering what any of this could possibly have to do with performance and even more wondering why on earth I am now talking about Isolation, a concept that is understood even less well than Locality. 
&lt;P&gt;I’m glad you asked :) 
&lt;P&gt;The first thing you should know is that the better your data locality the easier it is to create the illusion of isolation. The simplest case is two clients looking at two completely unrelated sections of a database – they have no overlap in their operations whatsoever and so isolation is easy. Now of course the more data you access the greater the likelihood that you will overlap with someone else and some isolation technique is going to be necessary to preserve the illusion. If your data has great locality then you’ll be able to make nice tight queries minimizing your isolation needs. 
&lt;P&gt;The second thing you should know is that maintaining isolation has a cost, like everything else. Depending on the technique used it can have both direct costs and indirect costs. But that’s really abstract so let’s be more concrete to illustrate these costs with a specific example. 
&lt;P&gt;Let’s suppose I have a simple database with just one table in it. It’s a set of account numbers, names, and balances. Now just one table isn’t enough to really model a bank with necessary auditing and so forth but this example is already complicated enough to show some isolation concerns and how they turn into performance problems. Furthermore let’s say the isolation required is one of the most basic choices “READ_COMMITTED” meaning I’m never allowed to observe someone else’s data if they have not yet committed their transaction – so no intermediate results. 
&lt;P&gt;Now let’s suppose I’m a big customer of this bank and I have 100 accounts, conveniently numbered 1000 to 1099. Some of the accounts are in my name and some of them are in my wife’s name. I want to know the total amount of money in the accounts in my name. In SQL it would look like this: 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select sum(balance) &lt;BR&gt;from account &lt;BR&gt;where account.id &amp;gt;= 1000 and account.id &amp;lt;= 1099 and account.name = 'Rico'&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The situation doesn’t get much easier than that. We could assume there is a nice index on the account table by ID so that all those accounts are (nearly) contiguous and with one scan of just my slice of the data we can get the answer. That’s about the best locality we could hope for. 
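&lt;P&gt;Since the examples here are meant to be Linq to SQL flavored, the same question might be asked like this. It is only a sketch: the Account class, the Bank context, and the connection string are hypothetical mappings onto the one-table database described above. 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System; &lt;BR&gt;
using System.Data.Linq; &lt;BR&gt;
using System.Data.Linq.Mapping; &lt;BR&gt;
using System.Linq; &lt;BR&gt;
&lt;BR&gt;
// Hypothetical mapping of the single account table. &lt;BR&gt;
[Table(Name = "account")] &lt;BR&gt;
public class Account &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column(Name = "id", IsPrimaryKey = true)] public int Id; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column(Name = "name")] public string Name; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column(Name = "balance")] public decimal Balance; &lt;BR&gt;
} &lt;BR&gt;
&lt;BR&gt;
public class Bank : DataContext &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Bank(string connection) : base(connection) { } &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Table&amp;lt;Account&amp;gt; Accounts { get { return GetTable&amp;lt;Account&amp;gt;(); } } &lt;BR&gt;
} &lt;BR&gt;
&lt;BR&gt;
public static class SumDemo &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void Main() &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Placeholder connection string; adjust for your server. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (var bank = new Bank("Data Source=.;Initial Catalog=Bank;Integrated Security=True")) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// SQL's sum() yields NULL when no rows match, so sum a nullable &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// and fall back to zero. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;decimal total = (from a in bank.Accounts &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where a.Id &amp;gt;= 1000 &amp;amp;&amp;amp; a.Id &amp;lt;= 1099 &amp;amp;&amp;amp; a.Name == "Rico" &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;select (decimal?)a.Balance).Sum() ?? 0m; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine(total); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;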
&lt;P&gt;Now you can imagine that you are the little database engine, you go and read account #1000, check the name, find that it’s Rico, and add the balance. Move on to #1001, check the name, add the balance. Chug chug chug. 
&lt;P&gt;Now let me complicate things just a little bit. While this is happening, and we’re on account #1050 (chug chug) another user comes along and deposits money into one of my accounts. Let’s say it’s account #1060. Is this a problem? Well no it isn’t. Since I haven’t yet read the contents of account #1060 when I get there I will find the new balance and all is well. Or is it? 
&lt;P&gt;Here’s a subtle point: the total that I get isn’t guaranteed to be the total that I would have gotten if the whole query had run at the instant that it started (you could imagine such a system but that’s unusual). The only guarantee is that we will get some total that represents the sum of the balances as they could have existed at some moment where all the data was committed. So here we’ve created a valid total, an especially interesting one because it shows the balance at the instant the query finishes. 
&lt;P&gt;Great, so we can write into areas we have not yet read without any trouble. 
&lt;P&gt;In fact we could even do something a little more complicated. Suppose we did a transfer of $100 from one of my accounts, #1060, to another account, #1070. We’re still in good shape because neither of those two accounts has yet been read and so the sum will be computed correctly. 
&lt;P&gt;Hey, this isolation thing doesn’t seem very hard so far! I wanted to subtract money from #1060 and put it in #1070 and everything is great. But what if I had wanted to put the money in account #1030? 
&lt;P&gt;Now I have a problem: #1030 has already been read. If I allow the money to be moved then the sum calculation will be off by $100, because the $100 is gone from #1060, which I have not yet read, and it was not yet in #1030 when I read it. Oops. 
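&lt;P&gt;If you want to see the bad sum happen, here is a toy in-memory simulation. It is not a database and has no isolation at all, which is the point; the simplifying assumptions are mine: every account is in my name and starts with $100, so the right answer is $10,000. The scan reports 9900 while the balances still add up to 10000. 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System; &lt;BR&gt;
using System.Collections.Generic; &lt;BR&gt;
&lt;BR&gt;
public static class TornSum &lt;BR&gt;
{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void Main() &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Accounts 1000..1099, $100 each. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;var balance = new SortedDictionary&amp;lt;int, decimal&amp;gt;(); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for (int id = 1000; id &amp;lt;= 1099; id++) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;balance[id] = 100m; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;decimal scanned = 0m; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Iterate over a copy of the keys so the writer can change balances mid-scan. &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;foreach (int id in new List&amp;lt;int&amp;gt;(balance.Keys)) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if (id == 1050) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// The writer sneaks in: $100 leaves #1060 (not yet read) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// and lands in #1030 (already read). &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;balance[1060] -= 100m; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;balance[1030] += 100m; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;scanned += balance[id]; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;decimal actual = 0m; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;foreach (decimal b in balance.Values) &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{ &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;actual += b; &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("scanned sum: {0}", scanned); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("actual total: {0}", actual); &lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;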
&lt;P&gt;What do we do about this? Well there are many approaches, I will pick one for this example, the key thing to remember is that all the approaches have costs. 
&lt;P&gt;You might think that what the database must do when the query begins is lock everything that is going to be read and prevent anyone/everyone from writing to those things. That could work, if only it could predict, accurately, what it is that will be read. Or it could lock more than will be read, hopefully not much more; that could work too. But keep in mind these things: 
&lt;P&gt;#1 predicting the future is hard (e.g. in this example, which rows belong to me and which belong to my wife?) 
&lt;P&gt;#2 we want to lock as little as possible so that as much can continue running as possible (e.g. all updates to my wife’s accounts would be fine since they do not affect the sum) 
&lt;P&gt;So we might end up at a different scheme – rather than guess the future, let’s lock the past. When the transaction wrote to #1060, subtracting $100, that was just fine, so far. The contents of #1060 are still dirty so if the reader arrives there it must wait for the transaction to finish. Meanwhile, if the writer chooses to move the $100 to account #1070 all is well. #1070 has not been read yet, it isn’t locked, the writer moves the cash and commits the transaction, and if the reader had been waiting for #1060 to finalize, it is released and the read operation can proceed. So perhaps the summing was paused for a moment but otherwise everything went well. 
&lt;P&gt;Notice that we already have taken a bunch of performance hits. We had to mark rows that were “dirty” and pending a transaction with some kind of write lock. We had to have readers checking the write locks and waiting (i.e. taking longer) if they encountered a lock. And we had to clean out the write locks whenever a transaction committed. But that was the easy case. 
&lt;P&gt;What if, after subtracting $100 from account #1060, we then attempted to add the $100 to account #1030? Lucky for us, our new “lock the past” isolation scheme says that row #1030, since it has already been read, now has a read lock. That means we can’t write to it. But wait. #1060 has a write lock. So when the reader eventually gets to #1060 it will stop. It will not give up its read lock on #1030, so the write operation cannot complete; meanwhile the read operation on #1060 cannot complete, and so the read lock on #1030 will never be released. 
&lt;P&gt;Deadlock. 
&lt;P&gt;You’ll notice I’ve managed to create a deadlock with just one writer in the system merely because of the need for isolation. 
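&lt;P&gt;One way to picture what the database sees at this point is a "wait-for" graph: the reader waits for the writer (to release #1060) and the writer waits for the reader (to release #1030). The sketch below is only an illustration under a simplifying assumption, namely that each transaction waits on at most one other; the WaitForGraph class and its method are hypothetical names, not how any particular database engine does it. 
&lt;PRE&gt;using System;
using System.Collections.Generic;

public static class WaitForGraph
{
    // waitsFor[a] = b means transaction a is blocked until b releases a lock.
    // Returns true if following the chain from 'start' leads back to 'start'.
    public static bool IsDeadlocked(Dictionary&amp;lt;string, string&amp;gt; waitsFor, string start)
    {
        var seen = new HashSet&amp;lt;string&amp;gt;();
        string current = start;
        while (waitsFor.TryGetValue(current, out string next))
        {
            if (next == start) return true;    // the chain came back to us
            if (!seen.Add(next)) return false; // a cycle, but 'start' is not on it
            current = next;
        }
        return false; // the chain ended at a transaction that is not waiting
    }
}
&lt;/PRE&gt;
&lt;P&gt;With the two entries reader → writer and writer → reader, IsDeadlocked reports true for either transaction. Remove the second entry (the writer targets #1070 instead) and it reports false, which is the easy case described earlier. 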
&lt;P&gt;Now you can imagine other isolation methods that would not get a deadlock in this case but they have other costs and we’re not studying them. The point here is that there is a direct isolation cost, in any scheme, and now we’re about to see an example of an indirect cost. 
&lt;P&gt;Since the system is deadlocked, the database must now choose one of those two transactions (the reader or the writer) and abort it. 
&lt;P&gt;If it chooses the reader, then the write will be allowed to complete, the reader must retry the operation and then get the correct and accurate new sum based on the completed write transaction. 
&lt;P&gt;If it chooses the writer, then the reader may complete but first we must use the transaction log to restore the original contents of account #1060 (i.e. the transaction does not commit, it aborts) and we get the correct sum. Meanwhile the writer must retry its operation. 
&lt;P&gt;Can you see indirect costs? Probably the most significant is that either the read or the write must be retried. Now these were both tiny cases but you can imagine if many accounts were implicated and if there was complicated math, sorting, etc. that redoing that read query could be costly and likewise the write query with many indices and if it spanned many tables could be rather complex, to say nothing of the business logic that might be required to redo the math. In this case it’s a simple add $100, subtract $100 but in a complex system it might be a tricky reservation computation, mileage credit and commission distribution – with auditing. Anything you have to redo is just wasted work. 
&lt;P&gt;Importantly, the above is a normal and natural part of using a database and nothing has gone “wrong” here and we can reach this vital conclusion. &lt;B&gt;The more we read or write in one unit, the greater the amortized cost of isolation. &lt;/B&gt;The more we attempt to write in one unit the greater the chance that the entire write will have to be aborted (and then redone from scratch). 
&lt;P&gt;This leads us right into the next topic… 
&lt;P&gt;&lt;STRONG&gt;Unit of Work&lt;/STRONG&gt; 
&lt;P&gt;More than anything else how much work you do in one chunk drives everything else. Sure you need good locality – via a proper schema and indexes – and yes you need to choose the right isolation technique, where you have options, but you can destroy a system by trying to do too much at once – and people do. 
&lt;P&gt;If you’ve been waiting for me to make the connection to Linq here it is: every data layer has to make choices about how much to read and when. It may seem like a good idea to do large amounts of pre-reading, and in fact when measured in isolation it may seem like you get better performance when doing so, but, like the &lt;A href="http://en.wikipedia.org/wiki/Prisoner%27s_dilemma" mce_href="http://en.wikipedia.org/wiki/Prisoner%27s_dilemma"&gt;prisoners’ dilemma&lt;/A&gt;, when composed with the other operations that are happening on the server you may find that making the best looking choice independently results in a poor situation for the system as a whole. 
&lt;P&gt;Should you make the choice that is best for one part of the system at the expense of the system as a whole? Probably not. 
&lt;P&gt;Linq to SQL is faced with very real choices around how much to read in one chunk and how much isolation to guarantee. Creating a solution where we appear to offer excellent but non-composable performance would simply be leading people down a path to disaster – and you know I always advocate the Pit of Success. The obvious way should work out well. 
&lt;P&gt;&lt;STRONG&gt;Correctness&lt;/STRONG&gt; 
&lt;P&gt;Once you’ve thought about unit of work you soon realize that you cannot afford to submit large transactions to your database – they are too likely to not commit successfully. So what do you do? 
&lt;P&gt;A classic thing to do is to break up those transactions into smaller pieces, but doing so creates a new set of intermediate stored results which must be valid. It’s a designed “workflow” for your uber-transaction where for instance every-line item in an order, or every 10 or something like that, is independently written and the order is flagged as “in flight.” In that world “in flight” orders are a normal healthy thing and your system has to be able to handle them appropriately – basically some intermediate states have become visible. 
&lt;P&gt;It sounds bad, but it’s essentially inescapable. The alternative is to *try* to commit ginormous transactions that really have no hope of *actually* committing with any kind of reliability. You may think that you’re going to get great performance by submitting all that work in one nice big chunk but you may find that all those savings are lost to the cost of all the extra retries you have to do to actually succeed. 
&lt;P&gt;And it gets worse. 
&lt;P&gt;If you have a client side component, like you do with Linq, then the data you have saved on the client might become “wrong” if something happens on the server that you don’t know about. You’ll need some kind of &lt;A href="http://blogs.msdn.com/ricom/archive/2004/06/24/165063.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2004/06/24/165063.aspx"&gt;locking mechanism&lt;/A&gt; like Optimistic (what Linq uses) or Pessimistic. What optimistic locking says is that before you write data back to the database you first re-verify that the data now in the database is still what it was before you made the update – if anything has changed out from under you then you throw an exception. 
&lt;P&gt;What does that mean? Well if you are writing a large amount of data and it has to be atomic that’s a bad thing because even if there wasn’t a database deadlock you still might find that some part of what you wrote has been altered. If you require that it all be as-it-was you may find you can never write anything, or that you often have to try 2, 3, 4, 5, 10, 50… who knows how many times to actually get the stuff to write. 
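&lt;P&gt;A back-of-the-envelope model shows why the retry count climbs so quickly. Assume, purely for illustration, that each row independently has some small chance of having been changed by someone else, and that one changed row fails the whole atomic write. Real conflicts are not independent, so treat the RetryModel class below as a toy and not a prediction. 
&lt;PRE&gt;using System;

public static class RetryModel
{
    // Toy model: each row independently conflicts with probability
    // 'conflictPerRow', and a single conflict fails the whole write.
    public static double SuccessProbability(int rows, double conflictPerRow)
    {
        return Math.Pow(1.0 - conflictPerRow, rows);
    }

    // Mean of a geometric distribution: attempts until the first success.
    public static double ExpectedAttempts(int rows, double conflictPerRow)
    {
        return 1.0 / SuccessProbability(rows, conflictPerRow);
    }
}
&lt;/PRE&gt;
&lt;P&gt;Under those assumptions, with a one-in-a-thousand chance per row, a 10 row write almost always goes through the first time, a 1,000 row write needs roughly 2.7 attempts on average, and a 5,000 row write needs roughly 150. 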
&lt;P&gt;What do you do about this? 
&lt;P&gt;We’re right back to the unit of work discussion. If you break those writes down into smaller chunks and make it so that you can write back in pieces – including some markers to show what is in flight and what is not – then you can “partly succeed” in your writes, even “mostly succeed” and when things go wrong because of a conflict you only need resolve those few records that actually did conflict and write them back to the database in your retry operation. 
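&lt;P&gt;Here is a rough sketch of that idea in C#. ChunkedWriter, Split and SubmitAll are hypothetical names, and the trySubmit delegate stands in for whatever actually writes one batch (and flags it as in flight) in your system; nothing here is specific to Linq. 
&lt;PRE&gt;using System;
using System.Collections.Generic;

public static class ChunkedWriter
{
    // Splits 'items' into consecutive batches of at most 'size' elements.
    public static List&amp;lt;List&amp;lt;T&amp;gt;&amp;gt; Split&amp;lt;T&amp;gt;(IList&amp;lt;T&amp;gt; items, int size)
    {
        if (size &amp;lt;= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var batches = new List&amp;lt;List&amp;lt;T&amp;gt;&amp;gt;();
        for (int i = 0; i &amp;lt; items.Count; i += size)
        {
            int end = Math.Min(i + size, items.Count);
            var batch = new List&amp;lt;T&amp;gt;();
            for (int j = i; j &amp;lt; end; j++)
                batch.Add(items[j]);
            batches.Add(batch);
        }
        return batches;
    }

    // Submits each batch on its own and returns the batches that failed,
    // so only those need to be resolved and retried.
    public static List&amp;lt;List&amp;lt;T&amp;gt;&amp;gt; SubmitAll&amp;lt;T&amp;gt;(
        List&amp;lt;List&amp;lt;T&amp;gt;&amp;gt; batches, Func&amp;lt;List&amp;lt;T&amp;gt;, bool&amp;gt; trySubmit)
    {
        var failed = new List&amp;lt;List&amp;lt;T&amp;gt;&amp;gt;();
        foreach (var batch in batches)
        {
            if (!trySubmit(batch))
                failed.Add(batch);
        }
        return failed;
    }
}
&lt;/PRE&gt;
&lt;P&gt;Splitting 25 line items with a batch size of 10 gives batches of 10, 10 and 5. If only the middle batch hits a conflict, the retry deals with those 10 items and not all 25. 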
&lt;P&gt;It is not wise to expect to successfully write thousands and thousands of rows in one operation and actually succeed on a production database under load. 
&lt;P&gt;Yes breaking those operations down will result in more round trips to the server and it may seem that such a thing performs more poorly (it will if measured alone) but those operations are much more composable with other things the server is going to be doing. 
&lt;P&gt;Your overall performance could be, almost certainly will be,&amp;nbsp;A Lot Better (TM). 
&lt;P&gt;You’ll of course have to measure, in context. 
&lt;P&gt;&lt;STRONG&gt;One Last Warning&lt;/STRONG&gt; 
&lt;P&gt;If you consider what I said, about the natural occurrence of failures in a database, then you’ll soon realize that it is *normal*, using Linq parlance, for db.SubmitChanges() to throw an exception from time to time. If you are trying to write a robust application with high reliability you need to think about that. 
&lt;P&gt;In addition to obvious things like, “the network went down”, “the database went down”, there are less obvious things like, “there was a deadlock”, “there was an optimistic lock conflict” that can and do happen. Those latter two things should be appropriately retried because *nothing bad has happened*. The strategy you choose, especially for cases where the optimistic lock failed, can have a profound impact on your performance and certainly you can’t just let those exceptions flow to the user. I think I can safely say that my mom doesn’t want to hear about how table X on connection A deadlocked with table Y on connection B. 
&lt;P&gt;If you’ve been reading carefully then you’ll see that it’s also “normal” for a foreach operation over a Linq query to fail from time to time – you need a retry strategy for those too to be fully robust. 
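&lt;P&gt;A retry wrapper does not have to be complicated. The sketch below is one possible shape, not a recommendation for every case: Retry.Run and its isTransient callback are hypothetical names, and deciding which exceptions count as "nothing bad has happened" is up to you. As far as I know, in the shipped version of Linq to SQL an optimistic lock failure surfaces as a ChangeConflictException and SQL Server reports a deadlock victim as error 1205, so those are the natural candidates. 
&lt;PRE&gt;using System;

public static class Retry
{
    // Runs 'operation', retrying when 'isTransient' says the failure is of
    // the harmless kind. Any other exception, and the last transient failure
    // once the attempts run out, propagates to the caller.
    public static T Run&amp;lt;T&amp;gt;(Func&amp;lt;T&amp;gt; operation, Func&amp;lt;Exception, bool&amp;gt; isTransient, int maxAttempts)
    {
        if (maxAttempts &amp;lt; 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                if (attempt &amp;gt;= maxAttempts || !isTransient(ex))
                    throw;
                // Otherwise loop and try again. A real system would re-read
                // or merge the conflicting rows before the next attempt.
            }
        }
    }
}
&lt;/PRE&gt;
&lt;P&gt;The operation you pass in should redo the whole unit of work, including the read, which is one more reason to keep that unit small. 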
&lt;P&gt;Don’t get down on Linq though, those problems exist with all data solutions, the productivity benefits you get from Linq will go a long way to helping you to add the robustness you need in the areas you need it. 
&lt;P&gt;Don’t read “too much” at once. Don’t write “too much” at once. Handle deadlocks, they’re normal. Handle optimistic lock failures, they’re also normal. You should land in the Pit of Success.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=4678887" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/design+advice/default.aspx">design advice</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>DLinq (Linq to SQL) Performance (Part 5)</title><link>http://blogs.msdn.com/ricom/archive/2007/07/16/dlinq-linq-to-sql-performance-part-5.aspx</link><pubDate>Mon, 16 Jul 2007 21:17:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3899391</guid><dc:creator>ricom</dc:creator><slash:comments>23</slash:comments><comments>http://blogs.msdn.com/ricom/comments/3899391.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=3899391</wfw:commentRss><description>&lt;P&gt;This posting is the last of what I had planned in this series but I think there are likely to be questions, especially when Orcas Beta 2 is more widely available so we're likely to talk about this some more.&lt;/P&gt;
&lt;P&gt;First let's talk about the result I got and what it means.&amp;nbsp; On my particular hardware I was able to achieve 93% of the throughput of the underlying provider.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;How did I do this?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;[7/17: Note: I just realized this could be read not how I intended.&amp;nbsp; I mean&amp;nbsp;'how I did this' from the perspective of how the benchmark is built and how is it that the benchmark could be expected to achieve such a result. I did not write *any* of the&amp;nbsp;Linq code myself, I only gave them some ideas to help improve this result.&amp;nbsp; &lt;A class="" href="http://blogs.msdn.com/mattwar" mce_href="http://blogs.msdn.com/mattwar"&gt;Matt&lt;/A&gt; didn't ask me to clarify this but&amp;nbsp;he would have been more than justified if he had.&amp;nbsp;:) ]&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Well think about it, even though my test case is designed to magnify Linq overheads (because there is no business logic to do anything with the data and the data is all local and in the cache) what we've done by compiling the query is basically to remove most of the Linq overhead entirely.&amp;nbsp; It's almost cheating, except that you can do it too, and in meaningful cases.&amp;nbsp; In fact, arguably in the most common and important cases where you most need it you'll be able to get the best performance.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What are the major steps in running a query?&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The expression tree that represents the query has to be created (var q = from etc...) 
&lt;UL&gt;
&lt;LI&gt;You could avoid this even if you don't compile the query by just saving your query somewhere reusable 
&lt;LI&gt;If you weren't using Linq you'd have to just execute a command with parameters&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;The expression tree has to be converted to SQL 
&lt;UL&gt;
&lt;LI&gt;The query can have parameters which in turn appear in the generated query 
&lt;LI&gt;The actual parameters are applied when the query is executed 
&lt;LI&gt;We make this a one-time cost by compiling the query once and saving the resulting SQL&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;The query has to execute 
&lt;UL&gt;
&lt;LI&gt;The cost of this step is the same with Linq and without since the query looks very similar 
&lt;LI&gt;SQL server can use the same plan for the query since it looks the same each time 
&lt;LI&gt;This cost has been minimized in this test case by running the same query and keeping the database local&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;The results of the query have to be turned into objects 
&lt;UL&gt;
&lt;LI&gt;In the May CTP this was done with reflection, however, now this is done by creating a custom method with light-weight code generation that does the object creation and data copying exactly the way you would do it by hand 
&lt;LI&gt;Since the columns that come back from the query are the same regardless of values of any parameters in the query&amp;nbsp;this custom method can be re-used 
&lt;LI&gt;Instead of paying a cost to create the method every time you pay it once 
&lt;LI&gt;You still have to pay the cost of invoking this method via a delegate on each row; if you do it manually that code is inline so there is no function call&lt;/LI&gt;&lt;/UL&gt;
&lt;LI&gt;Whatever you do with your data 
&lt;UL&gt;
&lt;LI&gt;In this example it's&amp;nbsp;basically nothing (one add operation)&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;So what's left?&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We made #1, #2 and #4 one-time-costs.&amp;nbsp; #3, and #5&amp;nbsp;we have to do in both cases no matter what.&amp;nbsp; So what's the overhead?&amp;nbsp; Not a whole lot, some checks to make sure we really can use the saved versions of everything and then the cost of calling the delegate.&amp;nbsp; The reason the cost is as &lt;STRONG&gt;high &lt;/STRONG&gt;as 7% is because so little is happening in steps 3 and 5.&amp;nbsp; In real cases those steps would tend to be the bulk of your cost.&lt;/P&gt;
&lt;P&gt;When can you get this benefit?&amp;nbsp; Any time you are running the same (parameterized) query -- which is very often.&amp;nbsp; Just as prepared statements and stored procedures are common/popular in regular SQL compiled queries should be popular in Linq to SQL.&amp;nbsp; The most critical queries of your application can probably be compiled.&amp;nbsp; Others might not be worth the hassle.&lt;/P&gt;
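&lt;P&gt;The "pay once, reuse many times" pattern is easy to see in miniature with plain expression trees. The sketch below is an analogy only: it compiles an expression to a delegate with Expression.Compile, whereas CompiledQuery.Compile does its own work to produce SQL and a materializer. The CompileOnce class, its BuildCount counter and the predicate are made up for the example. The usual habit of keeping the compiled result in a static field is what makes the cost one-time.&lt;/P&gt;
&lt;PRE&gt;using System;
using System.Linq.Expressions;

public static class CompileOnce
{
    // The expression tree is built and compiled a single time, when the
    // type is initialized, and the resulting delegate is reused afterwards.
    private static readonly Func&amp;lt;int, int, bool&amp;gt; InRange = Build();

    // Counts how often the expensive step ran (for demonstration only).
    public static int BuildCount;

    private static Func&amp;lt;int, int, bool&amp;gt; Build()
    {
        BuildCount++;
        Expression&amp;lt;Func&amp;lt;int, int, bool&amp;gt;&amp;gt; tree =
            (orderId, minId) =&amp;gt; orderId &amp;gt;= minId;
        return tree.Compile(); // the expensive step, paid once
    }

    public static bool Matches(int orderId, int minId)
    {
        return InRange(orderId, minId); // cheap: just a delegate call
    }
}
&lt;/PRE&gt;
&lt;P&gt;However many times Matches is called, with whatever parameter values, BuildCount stays at 1. What remains per call is the delegate invocation, which is the same kind of residual overhead described above.&lt;/P&gt;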
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What about those insert and update cases?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In the Linq world, the update looks like a select, some data changes, followed by an update.&amp;nbsp; I used the very same select statement and I arbitrarily updated the first dozen or so rows with a trivial update (I added 1s to a date field).&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;            var fq = CompiledQuery.Compile
                (
                    (Northwinds nw) =&amp;gt;
                            from o in nw.Orders
                            select o
                );

            int i = 0;
            for (; i &amp;lt; updateruns; i++)
            {
                using (Northwinds dc = new Northwinds(conn))
                {
                    int j = 0;

                    foreach (Orders o in fq(dc))
                    {
                        if (j++ &amp;gt; updatebatch)
                            continue;

                        o.OrderDate = o.OrderDate.Value.AddSeconds(1);
                    }

                    dc.SubmitChanges();
                }
            }
&lt;/PRE&gt;
&lt;P&gt;Note that I reported times for both the compiled case and the non-compiled case.&amp;nbsp; That's because you can either compile the select part of the update or not.&amp;nbsp; Depending on the frequency of actual updates you might find it worthwhile or not.&amp;nbsp; Again this is a very dumb example designed to magnify Linq overheads.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why did I get such outstanding performance?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The alternative code looked&amp;nbsp;like this:&lt;/P&gt;&lt;PRE&gt;    
    // also runs in a loop updateruns times, not shown here

    StringBuilder sb = new StringBuilder();

    while (dr.Read())
    {
        OrderDetail o = new OrderDetail();

        ... populate the fields...

        if (j++ &amp;gt; updatebatch) // updatebatch size was 10 in my test case
            continue;

        o.ShippedDate = o.ShippedDate.Value.AddSeconds(1);

        sb.AppendFormat("update Orders set ShippedDate = '{0}' where OrderID = {1}\r\n", 
               o.ShippedDate.ToString(), 
               o.OrderID);
    }

    // execute the query in the stringbuilder&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="Trebuchet MS"&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Trebuchet MS"&gt;That is all pretty simple stuff. It's actually cheating a little because Linq to SQL will make sure that the data hasn't changed since it was read and I don't bother with that.&amp;nbsp; However all of this is trumped by the fact that I didn't bother using prepared statements (but I did execute my updates in one batch) and Linq to SQL automatically made a prepared statement for doing the update and as a result SQL was able to process it better.&lt;/FONT&gt;&lt;/P&gt;
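&lt;P&gt;One plausible reason the prepared form is processed better is that its text never changes, so the server can recognize it and reuse what it already worked out, much as noted earlier for the select. The sketch below only counts distinct statement texts; it does not talk to a database, the StatementTextDemo class is hypothetical, and the parameter names @p0 and @p1 are placeholders, not necessarily what Linq to SQL emits.&lt;/P&gt;
&lt;PRE&gt;using System;
using System.Collections.Generic;

public static class StatementTextDemo
{
    // Counts how many distinct statement texts a batch contains; each
    // distinct text is something the server may have to parse and plan anew.
    public static int DistinctTexts(IEnumerable&amp;lt;string&amp;gt; statements)
    {
        return new HashSet&amp;lt;string&amp;gt;(statements).Count;
    }

    // Literal values baked into the text, as in the hand-written batch.
    public static string Concatenated(int orderId, DateTime shipped)
    {
        return string.Format(
            "update Orders set ShippedDate = '{0:s}' where OrderID = {1}",
            shipped, orderId);
    }

    // Values travel separately as parameters, so the text never changes.
    public static string Parameterized()
    {
        return "update Orders set ShippedDate = @p0 where OrderID = @p1";
    }
}
&lt;/PRE&gt;
&lt;P&gt;Ten concatenated updates for ten different orders give ten distinct texts; ten parameterized updates give one.&lt;/P&gt;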
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Could you do that yourself?&amp;nbsp; &lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Yes.&amp;nbsp; Would you?&amp;nbsp; Maybe.&amp;nbsp; Or you might use a stored proc to do the update for you.&amp;nbsp; At that point my guess is that you would break even as you'd be back to doing exactly what Linq to SQL does.&amp;nbsp; Isn't it strange that we're talking about what you have to do to the no-Linq case in order to get the speed you get with Linq by default?&amp;nbsp; I think that's a good sign.&amp;nbsp; But see the overall conclusions at the end.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What about the Insert case?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The insert test case gets its performance boost basically the same way except it's a batch of insert statements rather than updates and of course there is no select.&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;    using (Northwinds dc = new Northwinds(conn))
    {
        for (int j = 0; j &amp;lt; insertbatch; j++) // 10 items
        {
            Categories cat = new Categories();
            cat.CategoryName = "dummy_category" + j.ToString();
            cat.Description = "Description... Description... ' Description...Description...Description...Description...";

            dc.Categories.Add(cat);
        }

        dc.SubmitChanges();
    }
&lt;/PRE&gt;
&lt;P&gt;The batch insertion code in my test case (without Linq) looks nearly identical to the update case.&lt;/P&gt;&lt;PRE&gt;        sb.AppendFormat("insert into Categories (CategoryName, Description) values('{0}', '{1}')\r\n",
            "dummy_category" + j.ToString(),
            "Description... Description... Description...Description...Description...Description...");
&lt;/PRE&gt;
&lt;P&gt;So you can expect the Linq version will perform better for the same (artificial) reason as in the update. &lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What's my final word?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;On selects you can pretty much make the Linq overhead vanish if you have a repeatable query pattern, which is a pretty common thing.&amp;nbsp; That's great news.&lt;/P&gt;
&lt;P&gt;On inserts and updates, my test cases weren't especially great and the main thing&amp;nbsp;they illustrate is that good connection management and prepared statements dwarf the other costs in simple insert/update cases.&amp;nbsp; The good news there is that Linq gives you both for free.&lt;/P&gt;
&lt;P&gt;Despite the fact that I'm pretty handy with SQL it took me a LOT longer to write the no-Linq version of these tests, and I'd much rather maintain the Linq version than the no-Linq one.&lt;/P&gt;
&lt;P&gt;None of the times I reported have anything to do with what actual applications with normal latencies and data logic will experience.&amp;nbsp; In those cases the results show that you're likely to see little to no difference between using linq and not using linq (if you compile etc.) which is also a great result.&lt;/P&gt;
&lt;P&gt;Overall I'm very pleased.&amp;nbsp; I hope you will be too.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3899391" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>DLinq (Linq to SQL) Performance (Part 4)</title><link>http://blogs.msdn.com/ricom/archive/2007/07/05/dlinq-linq-to-sql-performance-part-4.aspx</link><pubDate>Fri, 06 Jul 2007 03:32:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3716711</guid><dc:creator>ricom</dc:creator><slash:comments>24</slash:comments><comments>http://blogs.msdn.com/ricom/comments/3716711.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=3716711</wfw:commentRss><description>&lt;P&gt;Well it's high time I gave you some numbers for the new stuff.&lt;/P&gt;
&lt;P&gt;In &lt;A href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx"&gt;the original benchmark&lt;/A&gt; the Linq version was running at 13.62% of the underlying provider's speed.&amp;nbsp; And while I'm discussing that result,&amp;nbsp;Sekiya Sato pointed out an error in my original benchmark (see the comments of the above posting) in which I had one of my IsDBNull() checks backwards.&amp;nbsp; That error&amp;nbsp;made the "nolinq" version actually run 3.6% faster than it should have.&amp;nbsp; So the number I reported, 13.62%, should have actually been 14.09% -- let me restate that result for clarity: in May 2006, DLinq was running at 14.09% of the underlying provider speed in this (harsh) test case on my hardware and not 13.62% as previously reported.&lt;/P&gt;
&lt;P&gt;I have in my hands a nice fresh build, which is similar to what you're going to get when you adopt Beta 2.&amp;nbsp; The results below include my original test plus some quick insert and update tests I added --&amp;nbsp;I'll describe those in the next installment.&amp;nbsp; What we want to talk about right now is the select cases.&amp;nbsp; The regular select is as originally described.&amp;nbsp; The syntax for the compiled select (and this really builds) is this:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;var fq =CompiledQuery.Compile(&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (Northwinds nw) =&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from o in nw.Orders&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select new OrderDetail&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OrderID = o.OrderID,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerID = o.CustomerID,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; EmployeeID = o.EmployeeID,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ShippedDate = o.ShippedDate&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR&gt;);&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Note that with the nice type inferencing you never have to see the generic types in your code but it's still strongly typed.&amp;nbsp; To use this query you simply invoke it and iterate over the results:&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;foreach (var detail in fq(nw))&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum += detail.OrderID;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count++;&lt;BR&gt;}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now let's have a look at the numbers:&lt;/P&gt;
&lt;TABLE class=""&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class=""&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/TD&gt;
&lt;TD class="" align=middle colSpan=3&gt;no linq&lt;/TD&gt;
&lt;TD class="" align=middle colSpan=5&gt;
&lt;P align=center&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; with linq&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select &lt;/P&gt;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp; update&lt;/P&gt;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; insert&lt;/P&gt;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;original&lt;BR&gt;select&lt;/P&gt;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp;&amp;nbsp; compiled&lt;BR&gt;&amp;nbsp;select&lt;/P&gt;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp; update&lt;/P&gt;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; compiled&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; update&lt;/P&gt;&lt;/TD&gt;
&lt;TD class=""&gt;
&lt;P align=right&gt;&amp;nbsp;&amp;nbsp; insert&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;run 1&lt;/TD&gt;
&lt;TD class="" align=right&gt;915.25 &lt;/TD&gt;
&lt;TD class="" align=right&gt;4.87 &lt;/TD&gt;
&lt;TD class="" align=right&gt;4.29 &lt;/TD&gt;
&lt;TD class="" align=right&gt;497.81 &lt;/TD&gt;
&lt;TD class="" align=right&gt;858.07 &lt;/TD&gt;
&lt;TD class="" align=right&gt;20.65 &lt;/TD&gt;
&lt;TD class="" align=right&gt;21.05 &lt;/TD&gt;
&lt;TD class="" align=right&gt;20.25&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;run 2&lt;/TD&gt;
&lt;TD class="" align=right&gt;916.25 &lt;/TD&gt;
&lt;TD class="" align=right&gt;5.02&lt;/TD&gt;
&lt;TD class="" align=right&gt;4.76 &lt;/TD&gt;
&lt;TD class="" align=right&gt;491.59 &lt;/TD&gt;
&lt;TD class="" align=right&gt;864.60 &lt;/TD&gt;
&lt;TD class="" align=right&gt;20.34 &lt;/TD&gt;
&lt;TD class="" align=right&gt;20.62 &lt;/TD&gt;
&lt;TD class="" align=right&gt;12.02&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;run 3&lt;/TD&gt;
&lt;TD class="" align=right&gt;942.86 &lt;/TD&gt;
&lt;TD class="" align=right&gt;4.87&lt;/TD&gt;
&lt;TD class="" align=right&gt;4.66 &lt;/TD&gt;
&lt;TD class="" align=right&gt;496.57 &lt;/TD&gt;
&lt;TD class="" align=right&gt;859.11 &lt;/TD&gt;
&lt;TD class="" align=right&gt;21.03 &lt;/TD&gt;
&lt;TD class="" align=right&gt;20.47 &lt;/TD&gt;
&lt;TD class="" align=right&gt;16.08&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;average&lt;/TD&gt;
&lt;TD class="" align=right&gt;924.79 &lt;/TD&gt;
&lt;TD class="" align=right&gt;4.92&lt;/TD&gt;
&lt;TD class="" align=right&gt;4.57 &lt;/TD&gt;
&lt;TD class="" align=right&gt;495.32 &lt;/TD&gt;
&lt;TD class="" align=right&gt;860.59 &lt;/TD&gt;
&lt;TD class="" align=right&gt;20.67 &lt;/TD&gt;
&lt;TD class="" align=right&gt;20.71 &lt;/TD&gt;
&lt;TD class="" align=right&gt;16.12&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P&gt;The units for all of the above are test iterations&amp;nbsp;per second so bigger is better.&amp;nbsp; 
&lt;P&gt;&lt;BR&gt;
&lt;TABLE class=""&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class=""&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class=""&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dlinq&lt;/TD&gt;
&lt;TD class=""&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;nolinq&lt;/TD&gt;
&lt;TD class=""&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ratio&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;select&lt;/TD&gt;
&lt;TD class="" align=right&gt;495.32 &lt;/TD&gt;
&lt;TD class="" align=right&gt;924.79 &lt;/TD&gt;
&lt;TD class="" align=right&gt;53.56%&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;compiled select&lt;/TD&gt;
&lt;TD class="" align=right&gt;860.59 &lt;/TD&gt;
&lt;TD class="" align=right&gt;924.79 &lt;/TD&gt;
&lt;TD class="" align=right&gt;&lt;STRONG&gt;&lt;FONT color=#008000&gt;93.06%&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;update&lt;/TD&gt;
&lt;TD class="" align=right&gt;20.67 &lt;/TD&gt;
&lt;TD class="" align=right&gt;4.92 &lt;/TD&gt;
&lt;TD class="" align=right&gt;420.19%&lt;/TD&gt;
&lt;TD class=""&gt;&amp;nbsp; (DLinq is faster)&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;compiled update&lt;/TD&gt;
&lt;TD class="" align=right&gt;20.71 &lt;/TD&gt;
&lt;TD class="" align=right&gt;4.92 &lt;/TD&gt;
&lt;TD class="" align=right&gt;421.00%&lt;/TD&gt;
&lt;TD class=""&gt;&amp;nbsp; (DLinq is faster)&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;insert&lt;/TD&gt;
&lt;TD class="" align=right&gt;16.12 &lt;/TD&gt;
&lt;TD class="" align=right&gt;4.57 &lt;/TD&gt;
&lt;TD class="" align=right&gt;352.66%&lt;/TD&gt;
&lt;TD class=""&gt;&amp;nbsp; (DLinq is faster)&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Wow that's pretty good.&amp;nbsp; If you do nothing to your code, just raw internal improvements go from 14.09% of the underlying provider to 53.56% -- that's a 3.8x improvement.&amp;nbsp; But look at what you can do with compiled queries: if you compile the select statement I got 93.06% of the underlying provider's raw speed -- that's 6.6x faster than what I got back in May of 2006.&amp;nbsp; This is a truly great result because, as I've mentioned before, this is a harsh test. With the normal overheads associated with actual business logic and data transfer this result basically means that you may not even be able to measure any throughput degradation at all if you use compiled DLinq queries in your program.&lt;/P&gt;
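&lt;P&gt;If you want to check the arithmetic, the percentages and speedups come straight from the averages in the tables. A small sketch (the Throughput class and its method names are just for illustration):&lt;/P&gt;
&lt;PRE&gt;using System;

public static class Throughput
{
    // Linq throughput as a percentage of the no-Linq baseline,
    // both measured in test iterations per second.
    public static double PercentOfBaseline(double linqPerSecond, double baselinePerSecond)
    {
        return linqPerSecond / baselinePerSecond * 100.0;
    }

    // How many times better a new percentage is than an old one.
    public static double Speedup(double newPercent, double oldPercent)
    {
        return newPercent / oldPercent;
    }
}
&lt;/PRE&gt;
&lt;P&gt;PercentOfBaseline(495.32, 924.79) comes out at about 53.56 and PercentOfBaseline(860.59, 924.79) at about 93.06; Speedup(53.56, 14.09) is about 3.8 and Speedup(93.06, 14.09) is about 6.6.&lt;/P&gt;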
&lt;P&gt;I think I'll let &lt;A href="http://blogs.msdn.com/mattwar" mce_href="http://blogs.msdn.com/mattwar"&gt;Matt&lt;/A&gt; talk about the details of how we did this because he did the work but I can give you the high level points if you haven't already guessed them from the previous postings:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;create custom methods that bind the data perfectly using lightweight code generation&lt;/LI&gt;
&lt;LI&gt;create reusable SQL with parameters to avoid&amp;nbsp;generating the SQL&amp;nbsp;query again&lt;/LI&gt;
&lt;LI&gt;provide read-only contexts to avoid any unnecessary entity management (this is not needed anyway in my case because I new up an OrderDetail object with only part of the data)&lt;/LI&gt;&lt;/UL&gt;
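&lt;P&gt;To make the second point concrete, here is a minimal sketch of a compiled, parameterized query built with CompiledQuery.Compile from System.Data.Linq. The Orders mapping, the Northwinds context and the OrderQueries holder class are assumptions invented for illustration; this is not the benchmark code, and it needs a reachable Northwind database to actually run.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System;&lt;BR&gt;
using System.Data;&lt;BR&gt;
using System.Data.Linq;&lt;BR&gt;
using System.Data.Linq.Mapping;&lt;BR&gt;
using System.Linq;&lt;BR&gt;
&lt;BR&gt;
// Hypothetical mapping with just enough columns for the sketch.&lt;BR&gt;
[Table(Name = "Orders")]&lt;BR&gt;
public class Orders&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column(IsPrimaryKey = true)] public int OrderID;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;[Column] public string CustomerID;&lt;BR&gt;
}&lt;BR&gt;
&lt;BR&gt;
// Hypothetical strongly typed context over an existing connection.&lt;BR&gt;
public class Northwinds : DataContext&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Northwinds(IDbConnection conn) : base(conn) { }&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public Table&amp;lt;Orders&amp;gt; Orders { get { return GetTable&amp;lt;Orders&amp;gt;(); } }&lt;BR&gt;
}&lt;BR&gt;
&lt;BR&gt;
public static class OrderQueries&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Built once and kept in a static; orderid stays a parameter of the query.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static readonly Func&amp;lt;Northwinds, int, IQueryable&amp;lt;Orders&amp;gt;&amp;gt; ById =&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CompiledQuery.Compile((Northwinds nw, int orderid) =&amp;gt;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;from o in nw.Orders&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where o.OrderID == orderid&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;select o);&lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;A caller would then write foreach (Orders o in OrderQueries.ById(nw, orderid)) and pay the expression-tree and SQL-generation cost once rather than on every call.&lt;/P&gt;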
&lt;P&gt;When I modelled this on paper last summer it looked like we could get about 95% of the underlying provider speed plus or minus measurement error and we seem to have landed at 93%.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;Now you may ask, why is DLinq doing better at updates than my code that writes directly to the underlying provider?&amp;nbsp; I'll talk about this a bit more next time but the short answer is this:&amp;nbsp; the code I wrote to do the updates looks like pretty typical SQL sent to the database in batches.&amp;nbsp; However, I didn't go to the trouble of creating prepared statements for the update and insert cases; DLinq gives you this automatically.&amp;nbsp; So despite my more complicated logic, the savings DLinq got from superior SQL trumped my technique.&lt;/P&gt;
&lt;P&gt;And lastly, as always, this doesn't necessarily translate to any specific numbers for your application but it sure bodes well.&amp;nbsp; I'm very pleased indeed.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3716711" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>DLinq (Linq to SQL) Performance (Part 3)</title><link>http://blogs.msdn.com/ricom/archive/2007/06/29/dlinq-linq-to-sql-performance-part-3.aspx</link><pubDate>Fri, 29 Jun 2007 19:01:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3605797</guid><dc:creator>ricom</dc:creator><slash:comments>24</slash:comments><comments>http://blogs.msdn.com/ricom/comments/3605797.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=3605797</wfw:commentRss><description>&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I’d like to start with a little housekeeping. Some readers asked me how I made the nifty table in &lt;A href="http://blogs.msdn.com/ricom/archive/2007/06/25/dlinq-linq-to-sql-performance-part-2.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2007/06/25/dlinq-linq-to-sql-performance-part-2.aspx"&gt;part 2&lt;/A&gt; that showed the costs broken down by major area.&lt;/P&gt;
&lt;P&gt;It was actually pretty easy to create that table using our profiler. I did 500 iterations of the test case in sampled mode and that gave me plenty of samples. I could see which callstacks ended in mscorjit.dll – even without symbols --&amp;nbsp;which I never bothered using -- that gave me a good idea how much jit time there was. I could see the data-fetching functions from the underlying provider being called -- the same ones that appear in the no-linq version of the code (GetInt32, GetString and so forth) so I knew what the costs of actually getting the data were. I could see the path that creates the expression tree for the query and I could see stub dispatch functions. So I added up the related ones, broke them into 5 categories and then showed one more line for the bits that didn’t fit into any of those categories. Then I scaled the numbers up so that the part of the benchmark I cared about was 100% (there was other junk in my harness not relevant to the benchmark).&amp;nbsp; That’s it :)&lt;/P&gt;
&lt;P&gt;One more gotcha though, when I talked to &lt;A href="http://blogs.msdn.com/mattwar/" mce_href="http://blogs.msdn.com/mattwar/"&gt;Matt&lt;/A&gt; about this&amp;nbsp;we realized that I had reported the breakdown from an internal&amp;nbsp;build that included one change he had made for me from the May 2006 CTP. The breakdown you would see if you did this experiment again, on a May 2006 CTP build, would have reflection costs instead of jitting costs.&amp;nbsp; I'll discuss that more when I go into the specific improvements we made which are coming soon... &lt;/P&gt;
&lt;P&gt;But, meanwhile, let's move on to the real topic of Part 3… 
&lt;H2&gt;Per Row Costs&lt;/H2&gt;
&lt;P&gt;In this area there were two things that I worried about. 
&lt;H3&gt;&lt;B&gt;Entity/Identity Management&lt;/B&gt;&lt;/H3&gt;
&lt;P&gt;The data binding problem used to be at the top of my list but entity creation costs started to worry me more. This query is sort of an example of a troublesome one: 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;var q = from o in nw.Orders&lt;BR&gt;select o;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Doing foreach over the above is much less efficient than the below, and the astonishment factor for that might be pretty high. 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;var q = from o in nw.Orders&lt;BR&gt;select new {o.everything …};&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In general all these “read a bunch of data” queries are likely to suffer significant overhead because of temporary allocations and the mid-to-long life of the entities being enumerated. In the first formulation, the objects have to be stored because they might be modified as you foreach over them (or later). In the second formulation no object management is required because you’re newing up a synthetic object not associated with a table. 
&lt;P&gt;But there is a strong expectation that such code is just a forward-only read operation with at most one temporary business object created per iteration, especially if you don’t modify the objects in the foreach loop, so that the loop is read-only. That expectation also implies a nifty solution. 
&lt;P&gt;Since the Connection is a property of the context you could reasonably have multiple contexts associated with one connection. In particular a read-only context could be just the thing to avoid all this entity creation. Importantly the read-only context can be connected to the same Connection which means isolation concerns go away – it looks like one logical view of the database. This is important because otherwise a thread owning two DataContext objects could self-deadlock. 
&lt;P&gt;So I recommend creating some kind of read-only context to avoid the overhead of object management when it is not necessary. 
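&lt;P&gt;A minimal sketch of such a read-only context, using the ObjectTrackingEnabled property that System.Data.Linq.DataContext exposes. The ReadOnlyScan class and CountRows method are hypothetical names, T is assumed to be an attribute-mapped entity class, and conn is assumed to be the connection your read-write context already uses.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System.Data;&lt;BR&gt;
using System.Data.Linq;&lt;BR&gt;
&lt;BR&gt;
public static class ReadOnlyScan&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Enumerates table T through a second, non-tracking context on the same connection.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static int CountRows&amp;lt;T&amp;gt;(IDbConnection conn) where T : class&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;DataContext readOnly = new DataContext(conn);&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;readOnly.ObjectTrackingEnabled = false; // must be set before the first query runs&lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;int count = 0;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;foreach (T row in readOnly.GetTable&amp;lt;T&amp;gt;())&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;count++; // forward-only read; nothing is kept for change tracking&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return count;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;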
&lt;H3&gt;&lt;B&gt;Data Binding&lt;/B&gt;&lt;/H3&gt;
&lt;P&gt;The second item is overhead associated with extracting the data. In the then-current implementation there were several virtual calls between the MoveNext/GetCurrent on the enumerator and the actual data field fetches. 
&lt;P&gt;However, if we do query compilation as described in the last installment we have the opportunity to significantly limit the dispatches – as low as one delegate call. 
&lt;P&gt;To do this we need to pre-build the necessary helper for getting and storing the fields so that there is effectively straight line code calling the underlying provider. Basically the same code you would have to write if you were doing the data access manually. 
&lt;P&gt;Pre-compiling it also means we don’t have to jit the helper every time the query runs – we have a place to store the compiled helper function that stores the data. 
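&lt;P&gt;One way to picture the pre-built helper is a delegate compiled once from an expression tree and then reused for every row. This is only an illustration of the idea, not the lightweight code generation DLinq uses internally; the OrderDetail shape and the Materializer class are assumptions, and only two columns are bound to keep it short.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System;&lt;BR&gt;
using System.Data;&lt;BR&gt;
using System.Linq.Expressions;&lt;BR&gt;
&lt;BR&gt;
// Hypothetical result type for the sketch.&lt;BR&gt;
public class OrderDetail&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public int OrderID { get; set; }&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public string CustomerID { get; set; }&lt;BR&gt;
}&lt;BR&gt;
&lt;BR&gt;
public static class Materializer&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Built and compiled once; afterwards each row costs one delegate call.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static readonly Func&amp;lt;IDataRecord, OrderDetail&amp;gt; ReadOrder = Build();&lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;static Func&amp;lt;IDataRecord, OrderDetail&amp;gt; Build()&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ParameterExpression rec = Expression.Parameter(typeof(IDataRecord), "rec");&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Equivalent to: rec =&amp;gt; new OrderDetail { OrderID = rec.GetInt32(0), CustomerID = rec.GetString(1) }&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Expression body = Expression.MemberInit(&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Expression.New(typeof(OrderDetail)),&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Expression.Bind(typeof(OrderDetail).GetProperty("OrderID"),&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Expression.Call(rec, typeof(IDataRecord).GetMethod("GetInt32"), Expression.Constant(0))),&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Expression.Bind(typeof(OrderDetail).GetProperty("CustomerID"),&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Expression.Call(rec, typeof(IDataRecord).GetMethod("GetString"), Expression.Constant(1))));&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return Expression.Lambda&amp;lt;Func&amp;lt;IDataRecord, OrderDetail&amp;gt;&amp;gt;(body, rec).Compile();&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Inside the read loop the per-row work is then OrderDetail o = Materializer.ReadOrder(dr); where dr is any IDataRecord, such as a SqlDataReader.&lt;/P&gt;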
&lt;P&gt;At this point we had a pretty good idea what kinds of actions to take to make things a whole lot better.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3605797" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>DLinq (Linq to SQL) Performance (Part 2)</title><link>http://blogs.msdn.com/ricom/archive/2007/06/25/dlinq-linq-to-sql-performance-part-2.aspx</link><pubDate>Tue, 26 Jun 2007 03:10:10 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3532684</guid><dc:creator>ricom</dc:creator><slash:comments>23</slash:comments><comments>http://blogs.msdn.com/ricom/comments/3532684.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=3532684</wfw:commentRss><description>&lt;p&gt;So after getting some high level times I started digging into the particulars of the costs more broadly and I ended up studying a very simple query like the below one.&amp;nbsp;  &lt;blockquote&gt; &lt;p&gt;Northwinds nw = new Northwinds(conn); &lt;/p&gt; &lt;p&gt;var q = from&amp;nbsp;o in nw.Orders&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where o.OrderId ==&amp;nbsp;orderid&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select o; &lt;/p&gt; &lt;p&gt;foreach (Orders&amp;nbsp;o in q)&lt;br&gt;{&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; …&lt;br&gt;}&lt;/p&gt;&lt;/blockquote&gt; &lt;p&gt;It was the per-query costs that seemed to be the greatest trouble spot in the then-current profiles. 
Those costs would be the most problematic for complex queries which return comparatively few rows – all too common in business logic.&amp;nbsp; For small numbers of rows the rough bucketization of costs looked like this:&lt;/p&gt; &lt;table&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td&gt;&lt;strong&gt;Category&lt;/strong&gt;&lt;/td&gt; &lt;td&gt;&lt;strong&gt;&amp;nbsp;&amp;nbsp; Time&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;Total Benchmark&lt;/td&gt; &lt;td align="right"&gt;100.00%&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp; Query Build&lt;/td&gt; &lt;td align="right"&gt;24.40%&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp; Query Enumeration&lt;/td&gt; &lt;td align="right"&gt;74.55%&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Dispatch Glue&lt;/td&gt; &lt;td align="right"&gt;7.34%&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Jitting Costs&lt;/td&gt; &lt;td align="right"&gt;18.07%&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Data Reading&lt;/td&gt; &lt;td align="right"&gt;49.14%&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp; Misc&lt;/td&gt; &lt;td align="right"&gt;1.06%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; &lt;p&gt;In short the problem is that the basic Linq construction (we don’t really have to reach for a complex query to illustrate) results in repeated evaluations of the query if you ran the query more than once.  &lt;p&gt;Each execution builds the expression tree, and then builds the required SQL. In many cases all that will be different from one invocation to another is a single integer filtering parameter. Furthermore, any databinding code that we must emit via lightweight reflection will have to be jitted each time&amp;nbsp;the query runs. 
Implicit caching of these objects seems problematic because we could never know what good policy is for such a cache – only the user has the necessary knowledge.  &lt;p&gt;But all is not lost... the usual parameterized query model seems to be helpful here without unduly complicating everything. You could imagine a sequence something&amp;nbsp;like this:  &lt;blockquote&gt; &lt;p&gt;Func&amp;lt;Northwinds, IQueryable&amp;lt;Orders&amp;gt;, int&amp;gt; q =&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CompiledQuery.Compile&amp;lt;Northwinds,&amp;nbsp;int, IQueryable&amp;lt;Orders&amp;gt;&amp;gt;&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ((Northwinds nw, int orderid) =&amp;gt; &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from o in nw.Orders&amp;nbsp;&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;where o.OrderId == orderid &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select o );  &lt;p&gt;Northwinds nw = new Northwinds(conn);  &lt;p&gt;foreach (Orders o in q(nw, orderid))&lt;br&gt;{&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ...&lt;br&gt;}&lt;/p&gt;&lt;/blockquote&gt; &lt;p&gt;The important thing here now is that 
q is a durable thing that can be applied to different data contexts and we've identified the orderid parameter.&amp;nbsp; You'll have to forgive my syntax; I don't think there's a compiler in existence (old or new) that compiles precisely the above but hopefully you'll get the idea.  &lt;p&gt;Importantly, upon compilation, the query can be reduced to some kind of prepared statement. At this time any helper methods that need to be code-generated are also created. Upon binding we do the minimal query formatting for the constants and no jitting. The compiled query has a lifetime specified by the user, so it lives exactly as long as it needs to.  &lt;p&gt;These operations would drastically reduce per-query overhead while simultaneously giving us a good place to hang state with a suitable lifetime – compiled queries.&lt;/p&gt; &lt;p&gt;That seemed to get us forward progress on the per-query costs but what about the per-row costs?&lt;/p&gt; &lt;p&gt;We had a couple of different ideas to help with those as well.&lt;/p&gt; &lt;p&gt;Stay tuned for part 3.&amp;nbsp; :)&lt;/p&gt; &lt;p&gt;P.S. 
Keep your eye on &lt;a href="http://blogs.msdn.com/mattwar/archive/2007/06/23/linq-to-sql-learning-to-crawl.aspx"&gt;Matt Warren's Weblog&lt;/a&gt;&amp;nbsp;as he'll likely comment on what I'm saying as the series evolves,&amp;nbsp;&amp;nbsp;it was his hand that actually made the changes I'm talking about here.&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3532684" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>Caching Redux</title><link>http://blogs.msdn.com/ricom/archive/2007/06/25/caching-redux.aspx</link><pubDate>Mon, 25 Jun 2007 18:47:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3521960</guid><dc:creator>ricom</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/ricom/comments/3521960.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=3521960</wfw:commentRss><description>&lt;p&gt;I got some interesting questions about how to build good middle-tier caches in my inbox last week.&amp;nbsp; I cleaned up the responses a little bit and I'm posting them here because they're actually pretty general.&amp;nbsp; I've written about this before but some things merit repeating :)&lt;/p&gt; &lt;p&gt;Here's what I wrote:&lt;/p&gt; &lt;p&gt;If I had a dime for every person who thought caching was the answer but then didn’t actually build a cache… &lt;p&gt;First, consider your cache &lt;b&gt;&lt;i&gt;policy&lt;/i&gt;&lt;/b&gt; carefully.&amp;nbsp; As I’ve often written, &lt;b&gt;&lt;i&gt;caching implies policy&lt;/i&gt;&lt;/b&gt; &lt;p&gt;&lt;a href="http://blogs.msdn.com/ricom/archive/2004/01/19/60280.aspx"&gt;http://blogs.msdn.com/ricom/archive/2004/01/19/60280.aspx&lt;/a&gt; &lt;p&gt;And as I told Raymond – a cache with bad policy is another name for a 
memory leak &lt;p&gt;&lt;a href="http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx"&gt;http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx&lt;/a&gt; &lt;p&gt;Raymond turns this into some excellent recommendations, including instrumentation and observation which result in cache design by a &lt;i&gt;quantitative&lt;/i&gt; approach. &lt;p&gt;If I had a dime for everyone who built a cache because they thought it was a good idea but then did not measure the efficacy of what they had built… &lt;p&gt;Explore the space, try rough experiments at different layers and try different policies.&amp;nbsp; Often very aggressive policies (fast retirement of cache data) are effective but you must understand not only how data gets in the cache (that is obvious) but how does it get OUT?&amp;nbsp; Actively or passively?&amp;nbsp; Based on limits or hit rate or? &lt;p&gt;Whatever you do, be sure you do it on the basis of measurements.&amp;nbsp; Any kind of automatic “magic” caching layer that somehow knows about new business objects immediately sounds like a disaster to me.&amp;nbsp; It’s not a question of knowing the business objects it’s a question of knowing usage patterns and policy.&amp;nbsp; I don’t know how to do that automatically -- but maybe your particular problem has patterns you can leverage.&amp;nbsp; I also know that (e.g.) SQL server already has a good cache at the data level and if you do your job right that is often all you need.&lt;/p&gt; &lt;p&gt;Policy is everything. 
&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3521960" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/design+advice/default.aspx">design advice</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>DLinq (Linq to SQL) Performance (Part 1)</title><link>http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx</link><pubDate>Fri, 22 Jun 2007 22:20:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3466539</guid><dc:creator>ricom</dc:creator><slash:comments>41</slash:comments><comments>http://blogs.msdn.com/ricom/comments/3466539.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=3466539</wfw:commentRss><description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;[ By popular demand, here are links for all 5 parts in the series&amp;nbsp;&lt;A class="" title="Part 1" href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx" mce_href="http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part-1.aspx"&gt;Part 1&lt;/A&gt;, &lt;A href="http://blogs.msdn.com/ricom/archive/2007/06/25/dlinq-linq-to-sql-performance-part-2.aspx"&gt;Part 2&lt;/A&gt;, &lt;A href="http://blogs.msdn.com/ricom/archive/2007/06/29/dlinq-linq-to-sql-performance-part-3.aspx"&gt;Part 3&lt;/A&gt;, &lt;A href="http://blogs.msdn.com/ricom/archive/2007/07/05/dlinq-linq-to-sql-performance-part-4.aspx"&gt;Part 4&lt;/A&gt;, &lt;A href="http://blogs.msdn.com/ricom/archive/2007/07/16/dlinq-linq-to-sql-performance-part-5.aspx"&gt;Part 5&lt;/A&gt;&lt;BR&gt;&amp;nbsp; -Rico ]&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;I love Linq.&amp;nbsp; Really.&amp;nbsp; That might scare you because it has all these wacky new constructs and as a performance guy you'd think that I'd be all about getting rid of abstractions and just getting to the metal.&amp;nbsp; But don't be scared, I haven't lost my mind.&amp;nbsp; Linq is great because, even though it adds some levels of complexity, it simultaneously increases the chunkiness of the work that the framework receives in such a way that it creates fantastic opportunities to deliver performance.&amp;nbsp; Just like SQL can do a great job optimizing database queries because they are chunky enough.&lt;/P&gt;
&lt;P&gt;And speaking of databases, DLinq is really where the opportunities for amazing coolness are present.&lt;/P&gt;
&lt;P&gt;I first started looking at the performance of DLinq (it's officially called Linq to SQL but I still call it DLinq) shortly after the May 2006 CTP -- the very same one many of you are still using.&amp;nbsp; There were some great opportunities at that time and I'm happy to report that we've capitalized on a lot of what we found then; in fact I'll be writing about that in the next few postings.&amp;nbsp; But for today I want to talk about how things looked back in May of 2006.&amp;nbsp; How they might still look to you at this very moment if you're using the May 2006 CTP.&lt;/P&gt;
&lt;P&gt;I wanted to look at the basics of DLinq performance in a very simple case to get an idea what the raw overhead was.&amp;nbsp; So I set up a harsh test environment:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;northwinds database, local 
&lt;LI&gt;many queries have already been run so the database is hot, no disk activity 
&lt;LI&gt;the body of the dlinq query is minimal so all that code is hot 
&lt;LI&gt;no entities need be stored, so CLR memory is also hot&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Why would I do this?&amp;nbsp; Because it's the "nightmare" scenario for DLinq.&amp;nbsp; There is no business logic.&amp;nbsp; There is no database latency.&amp;nbsp; There is no inherent CLR overhead associated with the processing.&amp;nbsp; The cost is DLinq and nothing but DLinq.&amp;nbsp;&amp;nbsp; Normally there is some business processing so the DLinq code is only a portion, maybe even a small portion of the total processing time.&amp;nbsp; Normally you have to connect to a database over the network and maybe even read the data off disk, again that reduces the portion of time you spend waiting for DLinq.&amp;nbsp; But not in my test case... my test case is beating on DLinq so I can see how it stands up.&lt;/P&gt;
&lt;P&gt;I won't show you the whole program (you could create it in a few seconds with the CTP) but the key parts are like this:&lt;/P&gt;
&lt;P&gt;Here's the query&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;var q = from o in nw.Orders&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select new OrderDetail&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; OrderID = o.OrderID,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CustomerID = o.CustomerID,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;EmployeeID = o.EmployeeID,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ShippedDate = o.ShippedDate&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Then we run this in a loop&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;foreach (var detail in q)&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum += detail.OrderID;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count++;&lt;BR&gt;}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Note the highly useful (sarcasm) business logic -- adding the OrderID.&amp;nbsp; I did that so&amp;nbsp;I could print the total at the end and make sure, as I changed the test, that I was really reading the rows.&amp;nbsp; Same for the count.&amp;nbsp;&amp;nbsp; Any real application would do something -- anything -- with those order detail lines; my application is basically throwing them away. 
&lt;P&gt;OK great so I have a little test harness scientifically designed to maximize the DLinq overhead.&amp;nbsp; What do I compare it against? 
&lt;P&gt;Well I wrote this other program that gets the same data and does the same (trivial) operation the old fashioned way so that I could compare. 
&lt;P&gt;The body of that one looks like this: 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SqlCommand cmd = con.CreateCommand();&lt;BR&gt;string cmdText = "select OrderID, CustomerID, EmployeeID, ShippedDate from Orders";&lt;BR&gt;cmd.CommandText = cmdText; 
&lt;P&gt;SqlDataReader dr = cmd.ExecuteReader(); 
&lt;P&gt;while (dr.Read())&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; OrderDetail o = new OrderDetail(); 
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.OrderID = dr.GetInt32(0);&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; o.CustomerID = dr.GetString(1);&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (!dr.IsDBNull(2)) o.EmployeeID = dr.GetInt32(2); else o.EmployeeID = null;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (!dr.IsDBNull(3)) o.ShippedDate = dr.GetDateTime(3); else o.ShippedDate = null; 
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum += o.OrderID;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count++;&lt;BR&gt;}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;It's a lot more code but that's how you do it using just SqlDataReader. 
&lt;P&gt;So now I can measure how many queries/sec I can do with the DLinq code and compare it to the SqlDataReader equivalent. 
&lt;P&gt;Any guesses?&amp;nbsp; DLinq can't be faster because of course it uses SqlDataReader itself to do the job so the best you could get is a tie. 
&lt;P&gt;No peeking now. 
&lt;P&gt;Think about it.&amp;nbsp; How much slower do you think DLinq was in May of 2006 in this "worst case" scenario? 
&lt;P&gt;Got your number? 
&lt;P&gt;Last chance now. 
&lt;P&gt;Final answer? 
&lt;P&gt;OK here's what I got when I did the experiment, back in July of 2006.&amp;nbsp; 
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE class=""&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class=""&gt;&lt;STRONG&gt;Build/Test&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD class=""&gt;&lt;STRONG&gt;Time for&lt;BR&gt;500 Queries&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD class=""&gt;&lt;STRONG&gt;Queries/sec&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;May 2006 CTP&lt;/TD&gt;
&lt;TD class=""&gt;8.027s&lt;/TD&gt;
&lt;TD class=""&gt;62.29&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class=""&gt;Raw Cost (SqlDataReader)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/TD&gt;
&lt;TD class=""&gt;1.094s&lt;/TD&gt;
&lt;TD class=""&gt;457.04&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can pretty much eyeball it from those times.&amp;nbsp; In May 2006 DLinq is running at about 1/8 the speed of the underlying provider (13.62%).*&amp;nbsp;**&lt;/P&gt;
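&lt;P&gt;The percentage follows directly from the two timings in the table. As a quick check, here is the arithmetic as a small program (the Throughput class is just a hypothetical wrapper). Since the published times are rounded to the millisecond, the last digit can differ slightly from the quoted 13.62%; the result is roughly 13.6% either way.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System;&lt;BR&gt;
&lt;BR&gt;
public static class Throughput&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Queries per second for a run of the given number of queries.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static double PerSecond(int queries, double seconds)&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return queries / seconds;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public static void Main()&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double dlinq = PerSecond(500, 8.027); // May 2006 CTP row&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;double raw = PerSecond(500, 1.094);&amp;nbsp;&amp;nbsp; // SqlDataReader row&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Prints both rates and DLinq's share of the raw provider speed.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Console.WriteLine("{0:F2} {1:F2} {2:P2}", dlinq, raw, dlinq / raw);&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;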
&lt;P&gt;We can do better than that.&amp;nbsp; And we did...&lt;/P&gt;
&lt;P&gt;Stay tuned for the details and some modern era DLinq results.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;*Remember no real application would ever see a result as poor as&amp;nbsp;13.62% because of course they would be doing "actual work" as well as the DLinq operations resulting in more comparable performance.&lt;/P&gt;
&lt;P&gt;**Sekiya Sato (see below) pointed out an error in my original benchmark in which I had one of my IsDBNull() checks backwards.&amp;nbsp; That error&amp;nbsp;made the "nolinq" version actually run 3.6% faster than it should have.&amp;nbsp; So the number I reported, 13.62%, should have actually been 14.09% -- let me restate that result for clarity: in May 2006, DLinq was running at 14.09% of the underlying provider speed in this (harsh) test case on my hardware and not 13.62% as previously reported.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3466539" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>Optimistic vs. Pessimistic Locking consequences</title><link>http://blogs.msdn.com/ricom/archive/2004/06/24/165063.aspx</link><pubDate>Thu, 24 Jun 2004 19:39:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:165063</guid><dc:creator>ricom</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/ricom/comments/165063.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=165063</wfw:commentRss><description>&lt;P&gt;Putting on my "database guy" hat, a few days ago I was asked to comment on locking techniques for databases and to point out some problems with using transactions as a pessimistic locking technique.&amp;nbsp; Here's what I said...&lt;/P&gt;
&lt;P&gt;Some definitions:&lt;/P&gt;
&lt;P&gt;Optimistic locking -- that's where you assume things will go well and design your locks so that they handle conflicts as the exceptional case&lt;/P&gt;
&lt;P&gt;Pessimistic locking -- the converse where you assume conflicts are likely and create some kind of reservation system where sections are locked while they are edited&lt;/P&gt;
&lt;P&gt;[In response to the suggestion, you must] remember that transactions are not really durable things and that any strategy that is designed around transactions having very long life (e.g. minutes) must also be designed around the chance that such a transaction is aborted for any number of reasons, not the least of which is deadlock.&lt;BR&gt;&amp;nbsp;&lt;BR&gt;The math of the situation tells you that the longer running a transaction is, the lower the likelihood you can commit it.&lt;BR&gt;&amp;nbsp;&lt;BR&gt;So using something as simple as open transactions for your locking strategy is pretty much right out.&amp;nbsp; &lt;BR&gt;&amp;nbsp;&lt;BR&gt;Now the moment you choose a scheme where a given client applies a (pessimistic) software lock on some set of objects you have another set of issues -- what if the client disconnects?&amp;nbsp; How do you administer and recover these locks?&amp;nbsp; What operations do you allow on the locked objects?&amp;nbsp; How will you provide (manually) the isolation that's needed (if any) to hide any in-flight changes that are committed from the database perspective but pending based on business rules?&lt;BR&gt;&amp;nbsp;&lt;BR&gt;An optimistic locking strategy provides one omnibus solution to these problems.&amp;nbsp; Other solutions are possible but you should not underestimate their difficulty because they create interim/valid committed states.&amp;nbsp; States you might find yourself in, for instance, after a backup, with no supporting client info whatsoever.&lt;BR&gt;&amp;nbsp;&lt;BR&gt;It's not that the pessimistic lock is inherently bad but once you've decided on such a path, you then have a variety of other problems to solve.&amp;nbsp; By its nature optimistic locking does not create interim valid states and so finesses many of these issues.&lt;/P&gt;
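&lt;P&gt;In a database, optimistic locking is commonly done with a version or timestamp column that is checked when the update is written. The same shape can be sketched in memory; the VersionedCell class below is a hypothetical illustration, not code from the discussion, and it assumes a runtime that has System.Threading.Volatile.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;using System.Threading;&lt;BR&gt;
&lt;BR&gt;
public sealed class VersionedCell&lt;BR&gt;
{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Immutable snapshot: a value plus the version it was committed at.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;private sealed class State&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public readonly int Version;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public readonly string Value;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public State(int version, string value) { Version = version; Value = value; }&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;private State current = new State(0, "");&lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Returns the value together with the version it belongs to; takes no lock.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public string Read(out int version)&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;State s = Volatile.Read(ref current);&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;version = s.Version;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return s.Value;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Succeeds only if nobody has committed since the caller read expectedVersion.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public bool TryUpdate(int expectedVersion, string newValue)&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;State seen = Volatile.Read(ref current);&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if (seen.Version != expectedVersion) return false; // conflict: caller re-reads and retries&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;State next = new State(expectedVersion + 1, newValue);&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// The swap is atomic, so a half-applied update is never visible to readers.&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return Interlocked.CompareExchange(ref current, next, seen) == seen;&lt;BR&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;
}&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;A conflict shows up as a false return rather than as a lock someone has to administer or recover, which is the property the argument above relies on.&lt;/P&gt;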
&lt;P&gt;It turns out many of these same issues apply to pretty much any free threaded datastructure.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=165063" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/design+advice/default.aspx">design advice</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item><item><title>Some thoughts/advice about databases and caching</title><link>http://blogs.msdn.com/ricom/archive/2004/05/10/129307.aspx</link><pubDate>Mon, 10 May 2004 19:57:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:129307</guid><dc:creator>ricom</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/ricom/comments/129307.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ricom/commentrss.aspx?PostID=129307</wfw:commentRss><description>&lt;P&gt;As usual, kindly allow me to speak in rough terms so that I might be brief keeping in mind that there are exceptions to what I write below.&lt;/P&gt;
&lt;P&gt;I think the most important thing to remember about serving data from a database is that, as a practical matter, you can&amp;#8217;t realistically serve data that you know to be absolutely current. Even if you somehow get the data from the database and&amp;nbsp;magically&amp;nbsp;format it in zero time, there are enough delays along the way that it might have changed by the time your end user sees it and tries to do something with it.&lt;/P&gt;
&lt;P&gt;Your user is going to be seeing stale data; the only question is how stale.&lt;/P&gt;
&lt;P&gt;Not only is the data going to be stale, it&amp;#8217;s also likely to lack self-consistency.&amp;nbsp; For instance, unless you go to very great pains (and cost), when your user asks for &amp;#8220;page 2&amp;#8221; of a dataset via the &amp;#8220;give me page 2&amp;#8221; button, most algorithms for computing &amp;#8220;page 2&amp;#8221; might return duplicate items, or might miss items entirely, if something is added or removed that would have been on &amp;#8220;page 1&amp;#8221;.&lt;/P&gt;
&lt;P&gt;Yet despite these theoretical and practical limitations, real people interact with real databases every day and have satisfactory experiences.&amp;nbsp; This comes about because in practice good database analysts think carefully about the sorts of queries that will be made against their database, and the elementary business operations that will change the contents of that database, plus the nature of the user interface that will be doing the asking.&lt;/P&gt;
&lt;P&gt;Now at last we come to the performance part of this discussion&amp;nbsp; :)&lt;/P&gt;
&lt;P&gt;While you&amp;#8217;re thinking about the sort of experience that you want your user to have, it&amp;#8217;s vital to consider how much staleness and inconsistency you can afford to tolerate, and how best to leverage that tolerance to get the best overall experience (including performance) for your customers.&lt;/P&gt;
&lt;P&gt;I&amp;#8217;m sure many people are screaming about trading off correctness for performance at this point but I&amp;#8217;ll remind you that delivering 100% correct data to your customers is impossible as a practical matter &amp;#8211; nobody could afford such a system.&amp;nbsp; So from there good engineering forces you to think, &amp;#8220;Since perfection is far too expensive, what guarantees &lt;STRONG&gt;should &lt;/STRONG&gt;I be making?&amp;#8221;&lt;/P&gt;
&lt;P&gt;At the core of the database, the kinds of consistency guarantees that you want to make will translate into factors like the size of your transactions and the breadth of allowed queries.&amp;nbsp; Having made those choices, you are then in a position to select a suitable isolation level to facilitate them.&lt;/P&gt;
&lt;P&gt;The choices you make in the next layer, your business objects, tend to mirror the choices made at the database level.&amp;nbsp; At this level you might concern yourself with caching choices to avoid a lot of round-trips to the database.&amp;nbsp; Avoiding&amp;nbsp;round-trips is what caching is all about.&lt;BR&gt;&lt;STRONG&gt;&lt;BR&gt;Some things to consider:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Every caching strategy is based on &lt;STRONG&gt;not&lt;/STRONG&gt; asking the database a question it otherwise would have had to answer, which ultimately increases the staleness of the data you serve.&amp;nbsp;&amp;nbsp; Is a time-based staleness guarantee suitable for your application?&amp;nbsp; This approach is actually a lot more versatile than it might seem.&lt;/P&gt;
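&lt;P&gt;To make the time-based guarantee concrete, here is a minimal Python sketch of a cache that never serves a value older than a fixed age. The names (TimeBoundCache, loader, max_age_seconds, clock) are illustrative assumptions, not part of any real library, and loader stands in for whatever query your business layer would otherwise send to the database.&lt;/P&gt;
&lt;PRE&gt;
```python
import time


class TimeBoundCache:
    """Serve a cached value only while it is younger than max_age_seconds."""

    def __init__(self, loader, max_age_seconds, clock=time.monotonic):
        self._loader = loader            # hypothetical function that asks the database
        self._max_age = max_age_seconds  # the staleness we have agreed to tolerate
        self._clock = clock              # injectable so the policy can be tested without sleeping
        self._entries = {}               # key -> (time loaded, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        # Fresh enough: answer from memory and skip the round-trip.
        if entry is not None and self._max_age > now - entry[0]:
            return entry[1]
        # Missing or too old: pay for one round-trip and remember the answer.
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value
```
&lt;/PRE&gt;
&lt;P&gt;With a policy like this the staleness bound is a single number per cache, which makes it easy to reason about and easy to tune for each kind of data.&lt;/P&gt;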
&lt;P&gt;What cache-hit ratio do you need to have for your cache to be a net win?&amp;nbsp; Remember SQL itself caches the underlying data which partly competes with any caching you might do yourself.&amp;nbsp; Given the mix of queries you expect to arrive at your business layer, what hit rate would you need to actually see a benefit?&amp;nbsp; &lt;/P&gt;
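&lt;P&gt;One rough way to estimate that break-even point is sketched below in Python. It assumes a deliberately simple cost model that is not from any particular product: every request pays for a cache lookup, and every miss additionally pays for the database query plus the cost of filling the cache. The function names and the numbers are hypothetical, and real systems will have costs this model ignores (memory pressure, invalidation and so on).&lt;/P&gt;
&lt;PRE&gt;
```python
def expected_cost(hit_ratio, db_cost, lookup_cost, fill_cost):
    """Average cost per request when a cache sits in front of the database."""
    miss_ratio = 1.0 - hit_ratio
    # Every request probes the cache; only misses go on to the database.
    return lookup_cost + miss_ratio * (db_cost + fill_cost)


def break_even_hit_ratio(db_cost, lookup_cost, fill_cost):
    """Hit ratio at which the cached path costs the same as always querying.

    Derived by solving expected_cost(h, ...) == db_cost for h.
    """
    return (lookup_cost + fill_cost) / (db_cost + fill_cost)
```
&lt;/PRE&gt;
&lt;P&gt;Under this model, a 10 ms query with a 1 ms lookup and a 2 ms fill breaks even at a hit ratio of 0.25. If the server&amp;#8217;s own caching brings the same query down to 4 ms, the break-even rises to 0.5, which is the sense in which the two caches compete.&lt;/P&gt;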
&lt;P&gt;Caching is no substitute for a well-considered indexing strategy on the server itself.&amp;nbsp; Enormous caches should be the province of the database; their presence elsewhere often indicates that not enough attention has been paid to the schema and indexes. &lt;/P&gt;
&lt;P&gt;Is it fairly easy to detect when your cache is &amp;#8220;too stale&amp;#8221; or does that require a complex computation?&amp;nbsp; If it requires complex calculations, you might want to try something simpler, or you might want to try reversing the logic so that the server notifies you of potential changes &amp;#8211; but beware: complicating the back end might add friction to the whole system, making it very difficult to do large batches of updates, for instance.&amp;nbsp; Complex notification systems are frequently not worth the effort or the expense, and of course they do not necessarily provide a 100% up-to-date data guarantee either.&lt;/P&gt;
&lt;P&gt;One caching policy is not likely to be suited to all your application&amp;#8217;s needs.&amp;nbsp; For instance, you can probably have an astonishingly large time tolerance on data you use for things like converting from a &amp;#8220;State ID&amp;#8221; to a &amp;#8220;State Name&amp;#8221; but I could hardly recommend the same for information associated with an &amp;#8220;Order ID.&amp;#8221; &lt;/P&gt;
&lt;P&gt;A cache that holds &amp;#8220;partly cooked&amp;#8221; data which can be assembled or otherwise quickly processed to create &amp;#8220;fully cooked&amp;#8221; results is often preferable.&amp;nbsp; The partly cooked results can be used in a greater variety of queries and often take less space, increasing the effective size of the cache.&amp;nbsp;&amp;nbsp; In web servers this means caching fragments of pages, or caching the raw data from which the HTML can be quickly formatted, in preference to caching entire HTML pages.&lt;/P&gt;
&lt;P&gt;If a cache will be a key feature of your business layer, be sure to optimize the underlying schema so that it is suitable for the cache-populating queries rather than potentially optimizing for accessing singleton objects.&amp;nbsp; This is akin to designing DRAMs so that they are best at filling the CPU's cache lines and not optimizing for byte-wise access.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Summary&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Select a simple caching mechanism that produces a predictable and satisfactory customer experience, does not add much in the way of complexity, and co-operates well with the underlying schema.&lt;BR&gt;&lt;BR&gt;Hold the mayo :)&lt;/P&gt;</description><category domain="http://blogs.msdn.com/ricom/archive/tags/performance/default.aspx">performance</category><category domain="http://blogs.msdn.com/ricom/archive/tags/design+advice/default.aspx">design advice</category><category domain="http://blogs.msdn.com/ricom/archive/tags/databases/default.aspx">databases</category></item></channel></rss>