For primers, you need to read this post first: http://buffered.io/2008/07/29/unit-tests-boldly-crossing-boundaries-and-gently-breaking-rules/

So the other day I received an email from a group that I am helping to understand TDD, test concepts, and Separation of Concerns, and they asked, "Is this relevant?" Well, here's my reply to them.

Relevant: Yes and no.
There's some good conversation going on here, but the critical concept, IMHO, regarding what constitutes the unit test case is being missed entirely. They're getting so caught up in testing the side effects in the database versus what a unit test should actually be asserting that they've missed the forest for the trees, so to speak.

So to get there with full understanding, we're actually going to need to go through the whole design-for-testability process and some Object Thinking. Bear with me and just come along for the ride, as there are going to be some required "side trips", so to speak, to understand all the reasoning going on here. If you're already there and this is remedial, just have a smoke and look out the window while we get to our destination. :)


So for instance, I am going to go with the simplest solution they could have written for this repository: it implements some interface (good!) and it uses some ADO.NET data provider to invoke commands based on input parameters (or as we'd say in modeling, received messages). This is as basic a pattern implementation as we can get (if you are using some mapping software underneath, the concepts will still apply, but there are some extra thoughts depending on the concrete technology that we'll talk about later).

You see, the unit test is about the REPOSITORY only, even if it has collaborators. We don't and shouldn't be testing the side effects in / caused by the collaborator, but instead the ACTIONS the SUBJECT UNDER TEST (SUT) takes based on INPUT made during the test. So to put it another way, a good unit test is not "if we get a new Product entity, it writes to the database". That's a *** unit test. Instead the test should be confirming "if we get a new Product entity, the SUT makes this call with these inputs to the collaborator, and those input values should correspond to the original Product instance input". If we think about tests correctly, then you should see that there are NO side effects, as we're testing the SUT in ISOLATION. :)

[sidetrip warning]
The catch here to make this work is that we need to be programming against the abstract provider model (the common interfaces and base classes in System.Data / System.Data.Common) and not against the concrete provider types (SqlClient and friends). This is the same reason we have an interface for the repository in the first place. Let's not fall down and make mistakes on the implementation of the repository now too. So to walk through the why, we're first going to need to go over the better way, testing be damned, this repository should have been written in the first place.

So to get a somewhat reasonable but still simple implementation, let's try this class prototype:

public class MyRepository : IMyRepository
{
    public MyClass GetById(Int32 id)
    {
        // omitted for now
    }
}

All is fine and dandy and the type of code you'd commonly expect to see in this type of pattern. So many times we'd see people implement the GetById method as such (IDisposable pattern omitted for brevity):

var conn = new SqlConnection(this.ConnectionString);
var cmd = new SqlCommand("SOMESQL", conn);
conn.Open();
var reader = cmd.ExecuteReader();
//yada yada yada

 

This is simple, direct, and an almost ubiquitous implementation style, to the point of being holy scripture. It also happens, frankly, to be WRONG.

First of all, we shouldn't be dependent on the concrete provider (SqlClient in this case) but instead on the common interfaces. So if we were to "correct" this, we'd instead see code written at least like this:

IDbConnection conn = new SqlConnection(this.ConnectionString);
IDbCommand cmd = conn.CreateCommand();
cmd.CommandText = "SOMESQL";
conn.Open();
var reader = cmd.ExecuteReader();
//yada yada yada 

So we're somewhat better off with the code that uses the provider, but we still have a problem with the factory line. We're still tightly coupled to SqlClient because we're taking on too many responsibilities (think back to the Single Responsibility Principle). Our single responsibility as a repository is to mediate interactions between the DOMAIN MODEL and the DATA STORAGE layers, not to act as a factory for the data layer. So then, where do we get the actual IDbConnection instance at runtime if we don't create it? We document it in the code as a dependency! That's accomplished in most class-based languages via a constructor. So let's rewrite this code as this:

 public class MyRepository : IMyRepository
{
    public MyRepository(IDbConnection conn)
    {
        this.Connection = conn;
    } 

    protected IDbConnection Connection { get; set; }

    public MyClass GetById(Int32 id)
    {
// omitted for now
    }
}

"GREAT" you say. "But isn't that cheating? All we did is push the problem to some other code!" Well, yes and no. Yes, the responsibility now lives somewhere else (but that's a good thing, as now we can get the repository focused back on doing what it is supposed to be doing), but no, it isn't cheating, as that's what TDD is all about: clearly modeling our discrete behaviors, via best practices (basically DDD), into individual partitions (classes) that can be brought together in interesting ways to get things done (e.g. solve the business problem), and leveraging an INTENTION REVEALING INTERFACE that helps add to the self-documentation of the system and adds to the richness of the system model. By clearly documenting our dependencies via the constructor, we achieve both.

Of course, when the rubber hits the road, someone needs to create it. This is where we can write some code to mediate the resolution of the type or, in more modern best practice, leverage a DI container; all the major vendors have facilities to perform this type of work for you automatically. The manual approach, however, would be to use the ADO.NET factory types to build up the connection at runtime. You might be tempted to just take the easy way out and use them directly in the repository implementation. That way lies madness, however! Think about it: using the factory types directly in the repository means that the repository is now making assumptions about the configuration of the hosting application! In addition, it removes the clarity of the collaboration from the code by removing the constructor parameter!
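To make that concrete, here's a minimal sketch of what the "someone" (the composition root) might look like using the ADO.NET provider factory model. The method name and parameters here are illustrative assumptions; in a real app the provider invariant name and connection string would come from configuration (e.g. the connectionStrings section of app.config):

public static class RepositoryFactory
{
    // Composition root: the ONE place that knows which concrete
    // provider the hosting application is configured to use.
    // providerName would typically come from config, e.g.
    // "System.Data.SqlClient".
    public static IMyRepository Create(string providerName, string connectionString)
    {
        // Resolve the concrete provider from its invariant name
        // instead of hard-coding "new SqlConnection(...)".
        DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);

        DbConnection conn = factory.CreateConnection();
        conn.ConnectionString = connectionString;

        // The repository itself only ever sees IDbConnection;
        // it has no idea (nor should it) which provider this is.
        return new MyRepository(conn);
    }
}

The repository stays ignorant of SqlClient; only this one factory knows about provider configuration, and a DI container can replace it wholesale.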

[end of sidetrip]

"OK", you say. "I'm picking up what you're laying down and I'm drinking the Kool-Aid, so let's get back to the core focus: unit testing and the repository." So because we actually decided to follow best practices (DDD) to begin with, we find ourselves in the interesting situation where the code is intrinsically easy to test! Because we have now declared our external dependencies (COLLABORATORS) via constructor(s) and pushed factory concerns into their own problem space, we can actually write GOOD unit tests that focus on the subject in ISOLATION.

So the one thing you should walk away from this diatribe with is this: a good unit test for the repository should document and assert the interactions of the SUT with the underlying data layer and NOT what the data layer does in response! Because we can supply any implementation of IDbConnection/Command/Parameter we want, this means we don't need a database; we can simply perform some test setup for a mock data provider (dynamic or not), supply it to the repository, and the test should check that if we supply this input to the repository, the repository makes these calls, in this order, with these inputs to the underlying data layer. THAT'S A UNIT TEST, not what this guy is doing.
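Here's a rough sketch of what such a test could look like. I'm using Moq and xUnit here purely as an example (any mocking/test framework works), and since the real GetById body was omitted above, I've inlined a minimal plausible implementation so the example stands on its own; the SQL text and types are assumptions:

// Minimal versions of the post's types, inlined so the example
// is self-contained (the real GetById body was omitted earlier).
public class MyClass { }

public interface IMyRepository { MyClass GetById(Int32 id); }

public class MyRepository : IMyRepository
{
    private readonly IDbConnection connection;

    public MyRepository(IDbConnection conn) { this.connection = conn; }

    public MyClass GetById(Int32 id)
    {
        IDbCommand cmd = this.connection.CreateCommand();
        cmd.CommandText = "SOMESQL";
        this.connection.Open();
        using (IDataReader reader = cmd.ExecuteReader())
        {
            return reader.Read() ? new MyClass() : null;
        }
    }
}

public class MyRepositoryTests
{
    [Fact]
    public void GetById_IssuesExpectedCallsAgainstTheCollaborator()
    {
        // Arrange: mock the entire data layer; no database anywhere.
        var reader = new Mock<IDataReader>();
        reader.Setup(r => r.Read()).Returns(false); // no rows needed

        var command = new Mock<IDbCommand>();
        command.SetupAllProperties();
        command.Setup(c => c.ExecuteReader()).Returns(reader.Object);

        var connection = new Mock<IDbConnection>();
        connection.Setup(c => c.CreateCommand()).Returns(command.Object);

        var repository = new MyRepository(connection.Object);

        // Act
        repository.GetById(42);

        // Assert: we document the INTERACTION with the collaborator,
        // not a side effect in any database.
        connection.Verify(c => c.Open(), Times.Once());
        command.Verify(c => c.ExecuteReader(), Times.Once());
        Assert.Equal("SOMESQL", command.Object.CommandText);
    }
}

Notice the assertions: they describe which calls the SUT made and with what values, never what a database did in response.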

The entire point of the original blog post is crap because it's a complete NON-ISSUE in properly designed code in the first place. If they had followed proper decoupling to begin with, I will assert that most of the time the situation this blog is discussing is moot for unit tests.
 
The irony here is that if they'd switched gears and actually wanted to perform some good basic INTEGRATION TESTS, they'd have nailed it! As I've said before, the tools and technologies are in general all the same (though of course you would want to organize them to make it clear what is a unit test versus some other type of test), but the approach and the mindset you should have when designing them is different.

Now of course, when you are using something that isn't directly the ADO.NET provider model, you need to look at what technology you are using and isolate it where you can. Sometimes the choice of technology makes this easy (NHibernate, for instance, has a very good set of interfaces and an in-memory execution model to support unit testing); other times it doesn't (EF comes to mind). There you can introduce abstractions to get over the hurdle (adding/implementing interfaces on your DataContext, for example, that you take a dependency on) or leverage tools like MS Moles, which allow you to intercept and supply your own implementation of any type in the framework.
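As a sketch of that "interfaces on your DataContext" approach (the interface shape, the Product type, and the fake are all illustrative assumptions, not a prescribed design):

// Hypothetical domain type for illustration.
public class Product { public int Id { get; set; } }

// A seam over the EF/LINQ to SQL context: the repository depends on
// this interface rather than on the concrete DataContext type.
public interface IMyDataContext
{
    IQueryable<Product> Products { get; }
    void SaveChanges();
}

// The generated context implements the interface via a partial class:
// public partial class MyDataContext : IMyDataContext { ... }

// A test double backs the same interface with in-memory collections,
// so repository unit tests never touch EF or a database.
public class FakeDataContext : IMyDataContext
{
    private readonly List<Product> products = new List<Product>();

    public IQueryable<Product> Products
    {
        get { return products.AsQueryable(); }
    }

    public void Add(Product product) { products.Add(product); }

    public void SaveChanges() { /* no-op for unit tests */ }
}

The repository takes an IMyDataContext in its constructor, just like it took an IDbConnection earlier, and the test decides which implementation it gets.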

Here's a reply to my thread from my associate Chris Martinez that has some good points as well, which I am going to reproduce now:

[quote]
As usual, Jimmy is spot-on.  To expand slightly further, TDD is a software development technique; it is not a C# technique or any other specific language for that matter.  An obvious, but highly overlooked concept is a database unit test.  A database unit test you say?  Why a database unit test? 

Does a database not have functions and procedures?  This would also be applicable for any SQL or DML statement that causes implicit operations in views, triggers, etc.  If you would create unit tests for functions and methods in C# or some other language, why not the database?  While this is not a new concept, it is often overlooked.  Surprisingly, Visual Studio has supported this concept for years.  As Jimmy points out, there are actually two distinct unit tests: one against the repository and one against the database.  In the provided example we are trying to test the database command (most likely a stored procedure) that is invoked.  We are not trying to test ADO.NET.  While it's true that at some point the two tests can be unioned together to verify end-to-end functionality, that is not a unit test; that is an integration test.

In some cases, a database unit test may not be necessary.  For example, if we are using LINQ to SQL or LINQ to Entities, we don't need to test whether the LINQ provider generates the correct SQL; we get that for free.  We do, however, need to verify that our LINQ statements are correct such that they render the correct SQL.  This type of test is still an integration test.  While it's possible to use tools such as Moles to perform unit tests, I don't see the benefit in this case.  We shouldn't need to unit test shrink-wrapped products.  This, however, does not subvert the need for integration tests.

A scenario of when you might need to unit test an out-of-the-box product would be extensions to the product itself.  For example, in one of our frameworks I created a series of extension methods to the DataServiceContext for WCF Data Services.  In this case, how can I test the extension methods against the DataServiceContext without actually creating and executing a data service?  This is where tools such as Moles shine.  I can mock, stub, and intercept the data service invocations in a unit test to simulate the conditions of executing a real data service.  This can include the behavior of things working as expected as well as the behavior of what happens when things do not go as expected (ex: exceptions).

A TDD concept that is beginning to gain popularity (Pex comes to mind) is a "parameterized unit test".  Parameterized unit tests afford reuse of tests by passing a limited, controlled set of parameters to a unit test.  Using this technique, you could have a single, parameterized unit test that can accept a mocked repository or the real repository for an integration test, assuming they are in the same project.  This ultimately reduces the amount of test code that needs to be written, which IMHO every developer dreads.
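A rough sketch of that idea (all names here are illustrative, and I've inlined minimal versions of the post's types so the snippet stands alone): the test body is written once and runs against whatever IMyRepository it is handed, so a unit run passes a fake and an integration run would pass the real repository.

// Minimal inlined versions of the post's types, for a self-contained example.
public class MyClass { }
public interface IMyRepository { MyClass GetById(Int32 id); }

// Hypothetical fake used for the unit-test variant.
public class FakeMyRepository : IMyRepository
{
    public MyClass GetById(Int32 id)
    {
        return new MyClass(); // canned, in-memory result
    }
}

public class ProductRetrievalTests
{
    // The parameterized test: the behavior is expressed exactly once
    // and runs against whatever IMyRepository implementation it is given.
    private static void AssertGetByIdReturnsAResult(IMyRepository repository)
    {
        MyClass result = repository.GetById(42);
        Assert.NotNull(result);
    }

    [Fact]
    public void GetById_Unit()
    {
        // Unit variant: a fake repository, no external dependencies.
        AssertGetByIdReturnsAResult(new FakeMyRepository());
    }

    // The integration variant would hand the SAME parameterized body a
    // real repository wired to a real database, e.g.:
    // AssertGetByIdReturnsAResult(someRealRepositoryBuiltFromConfig);
}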

A key concept to keep in mind is the business requirement at hand.  We must be careful not to infer nonexistent requirements for the purposes of testing.  For example, the requirement "Insert a new customer" should not be interpreted as "Insert a new customer into the database".  From the perspective of the application, inserting a customer might be in-memory, in an XML file, or managed by "magic data fairies".  The application does not know and should not care exactly what happens when the InsertCustomer() method is called.  Of course we know that a database is ultimately going to be used, but we must be very cautious to wear the right hat, at the right time.
[/quote]