[ By popular demand, here are links for all 5 parts in the series Part 1, Part 2, Part 3, Part 4, Part 5
  -Rico ]

I love Linq.  Really.  That might scare you because it has all these wacky new constructs, and as a performance guy you'd think I'd be all about getting rid of abstractions and just getting to the metal.  But don't be scared, I haven't lost my mind.  Linq is great because, even though it adds some levels of complexity, it simultaneously increases the chunkiness of the work that the framework receives in such a way that it creates fantastic opportunities to deliver performance.  Just like SQL can do a great job optimizing database queries because they are chunky enough.

And speaking of databases, DLinq is really where the opportunities for amazing coolness are present.

I first started looking at the performance of DLinq (it's officially called Linq to SQL but I still call it DLinq) shortly after the May 2006 CTP -- the very same one many of you are still using.  There were some great opportunities at that time and I'm happy to report that we've capitalized on a lot of what we found then; in fact, I'll be writing about that in the next few postings.  But for today I want to talk about how things looked back in May of 2006.  How they might still look to you at this very moment if you're using the May 2006 CTP.

I wanted to look at the basics of DLinq performance in a very simple case to get an idea of what the raw overhead was.  So I set up a harsh test environment:

  • Northwind database, local
  • many queries have already been run, so the database is hot: no disk activity
  • the body of the DLinq query is minimal, so all that code is hot
  • no entities need to be stored, so CLR memory is also hot

Why would I do this?  Because it's the "nightmare" scenario for DLinq.  There is no business logic.  There is no database latency.  There is no inherent CLR overhead associated with the processing.  The cost is DLinq and nothing but DLinq.   Normally there is some business processing so the DLinq code is only a portion, maybe even a small portion of the total processing time.  Normally you have to connect to a database over the network and maybe even read the data off disk, again that reduces the portion of time you spend waiting for DLinq.  But not in my test case... my test case is beating on DLinq so I can see how it stands up.

I won't show you the whole program (you could create it in a few seconds with the CTP) but the key parts are like this:

Here's the query:

var q = from o in nw.Orders
        select new OrderDetail
        {
            OrderID = o.OrderID,
            CustomerID = o.CustomerID,
            EmployeeID = o.EmployeeID,
            ShippedDate = o.ShippedDate
        };

Then we run this in a loop:

foreach (var detail in q)
{
    sum += detail.OrderID;
    count++;
}

Note the highly useful (sarcasm) business logic -- adding up the OrderID values.  I did that so I could print the total at the end and make sure, as I changed the test, that I was really reading them.  Same for the count.  Any real application would do something -- anything -- with those order detail lines; my application is basically throwing them away.

OK great so I have a little test harness scientifically designed to maximize the DLinq overhead.  What do I compare it against?

Well, I wrote another program that gets the same data and does the same (trivial) operation the old-fashioned way, so that I could compare.

The body of that one looks like this:

SqlCommand cmd = con.CreateCommand();
string cmdText = "select OrderID, CustomerID, EmployeeID, ShippedDate from Orders";
cmd.CommandText = cmdText;

SqlDataReader dr = cmd.ExecuteReader();

while (dr.Read())
{
    OrderDetail o = new OrderDetail();

    o.OrderID = dr.GetInt32(0);
    o.CustomerID = dr.GetString(1);
    if (!dr.IsDBNull(2)) o.EmployeeID = dr.GetInt32(2); else o.EmployeeID = null;
    if (!dr.IsDBNull(3)) o.ShippedDate = dr.GetDateTime(3); else o.ShippedDate = null;

    sum += o.OrderID;
    count++;
}

It's a lot more code but that's how you do it using just SqlDataReader.

So now I can measure how many queries/sec I can do with the DLinq code and compare it to the SqlDataReader equivalent.
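(If you're curious, the measurement itself is nothing fancy.  Here's a minimal sketch of the kind of timing harness I mean -- the runQuery delegate below is a stand-in for either the DLinq loop or the SqlDataReader loop above, not my actual test code.)

```csharp
using System;
using System.Diagnostics;

class Harness
{
    // Times 'iterations' runs of the supplied query body and
    // returns the observed queries per second.
    public static double MeasureQueriesPerSec(Action runQuery, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            runQuery();
        sw.Stop();
        return iterations / sw.Elapsed.TotalSeconds;
    }

    static void Main()
    {
        // Stand-in body: a real test would run the DLinq or
        // SqlDataReader loop here instead of this trivial work.
        int sum = 0;
        double qps = MeasureQueriesPerSec(() => { sum++; }, 500);
        Console.WriteLine("{0:F2} queries/sec", qps);
    }
}
```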

Any guesses?  DLinq can't be faster, because of course it uses SqlDataReader itself to do the job, so the best you could get is a tie.

No peeking now.

Think about it.  How much slower do you think DLinq was in May of 2006 in this "worst case" scenario?

Got your number?

Last chance now.

Final answer?

OK here's what I got when I did the experiment, back in July of 2006. 

 

Build/Test                   Time for 500 Queries   Queries/sec
May 2006 CTP                 8.027s                 62.29
Raw Cost (SqlDataReader)     1.094s                 457.04

 

You can pretty much eyeball it from those times.  In May 2006, DLinq was running at about 1/8 the speed of the underlying provider (13.62%).* **
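If you'd rather check the arithmetic than eyeball it, the ratio falls straight out of the measured times.  A quick sketch using the figures from the table above:

```csharp
using System;

class RatioCheck
{
    static void Main()
    {
        double dlinqSeconds = 8.027;   // May 2006 CTP, 500 queries
        double rawSeconds   = 1.094;   // raw SqlDataReader, 500 queries

        double dlinqQps = 500 / dlinqSeconds;  // ~62.29
        double rawQps   = 500 / rawSeconds;    // ~457.04

        // Same ratio either way: (dlinqQps / rawQps) == (rawSeconds / dlinqSeconds)
        double ratio = dlinqQps / rawQps;      // roughly 1/8

        Console.WriteLine("DLinq: {0:F2} q/s, raw: {1:F2} q/s, ratio: {2:F4}",
            dlinqQps, rawQps, ratio);
    }
}
```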

We can do better than that.  And we did...

Stay tuned for the details and some modern era DLinq results.

 

*Remember, no real application would ever see a result as poor as 13.62%, because of course it would be doing "actual work" as well as the DLinq operations, resulting in more comparable performance.

**Sekiya Sato (see below) pointed out an error in my original benchmark in which I had one of my IsDBNull() checks backwards.  That error made the "nolinq" version run 3.6% faster than it should have.  So the number I reported, 13.62%, should actually have been 14.09% -- let me restate that result for clarity: in May 2006, DLinq was running at 14.09% of the underlying provider's speed in this (harsh) test case on my hardware, not 13.62% as previously reported.