I don’t know if you’re the same, but when coding away I often find myself wishing for a String.Unformat function – call it the evil twin of String.Format. With String.Format I can build up strings like this;
var result = String.Format(
"http://{0}:{1}/{2}",
"localhost",
"12345",
"TestPage.aspx");
... which will return "http://localhost:12345/TestPage.aspx".
But what if I want to do the opposite; split out the contents of this URL, in one easy statement, much like String.Format? What if I could do this...?
string input = @"http://localhost:12345/TestPage.aspx";
object[] results = input.Unformat(@"http://{0}:{1}/{2}");
CollectionAssert.AreEquivalent(
new object[] { "localhost", "12345", "TestPage.aspx" },
results);
Of course something similar is already possible with Regular Expressions, but I find them much more unwieldy, they result in longer code, they’re overkill for what I’m trying to achieve, and they’re difficult to remember the syntax (oh, and even harder to get it right when coding late at night!). Take this equivalent example and judge for yourself;
string input = @"http://localhost:12345/TestPage.aspx";
string matchingformat = @"^http://(?<C1>.+):(?<C2>.+)/(?<C3>.+)$";
Regex expression = new Regex(matchingformat);
Match match = expression.Match(input);
Assert.AreEqual<object>("localhost", match.Groups["C1"].Value);
Assert.AreEqual<object>("12345", match.Groups["C2"].Value);
Assert.AreEqual<object>("TestPage.aspx", match.Groups["C3"].Value);
To get around this I have created my own String.Unformat function... all it does is take a simple format string and convert it into a regular expression by applying a number of transforms to it. And how do we do these transforms? With regular expressions, of course J
Now before we dive in here, I must make a point. This is not a replacement for regular expressions. It will be difficult to get the precision you need for anything complex – I’m only after a quick and simple solution to the basic case problem. Knowing this, read on...
How it does it
My sample code (and a bunch of demonstrations in the form of unit tests) is attached. Basically there are five phases to the code;
1. It escapes any special Regular Expression characters (e.g. ^, [, ], $ and so on) with a backslash so that they don’t accidentally affect the matching... they are assumed to be literal in the input string.
2. It replaces the {0} match syntax with regular expression (?<C1>.+) syntax.
3. It adds begin (^) and end ($) markers to the match string.
4. It performs the match.
5. It loops through the results, extracting matches and adding them to an object array to return.
Have a look in the attached and see what you think.
But...
The regular expressions I’m using are compiled where possible... but of course I’m applying multiple transforms (i.e. regular expressions) to a string just to do a simple match.
What does this mean? It means this will always be slower than if you just wrote a regular expression yourself (check out here and here if you want to)... so if performance really is that key to your code, or you’re doing this in a long loop, you might want to avoid it. If on the other hand you can see a great use for this when performance isn’t key – give it a go and let me know what you think!
I have also not had time to write a full suite of real unit tests, so if you find a bug shout up.
Conclusion
What do you think? Have I created a monster, or cooked up a treat?
After writing this I came across an interesting conversation here... it seems I have not been alone in my desires for String.Unformat.
Recently I’ve come across a requirement to easily switch between two Visual Studio 2008 configurations. I might need these two configurations open at the same time (so can’t just manually edit settings in Tools->Options), and don’t want any lengthy scripts, log on/log off activities, or virtualisation involved... mainly because each of those adds effort and is therefore a possible blocker to adoption.
One solution seemed to be using the “RunAs” command, and to configure each user’s settings differently. But I have seen issues with using RunAs – for example, a TFS installation with a customer failed a number of times and we couldn’t work out why (and never did), but logging on instead of using RunAs solved the issue. It can also take a while to do the logon before starting the executable. So I’m a little cautious of recommending it.
After a couple of internal conversations around solutions the Visual Studio SDK was discussed, which suddenly reminded me of the “Experimental Hive”. This is just another place in the Registry that Visual Studio can be instructed to load settings from instead of the usual place, and is generally used when testing customisations to Visual Studio (e.g. Domain Specific Languages) that might cause it to fail – so you don’t want to corrupt your main VS install.
It occurred to me that this is pretty much exactly what I’m after! Two sets of Visual Studio settings that can be used independently without machine virtualisation, user virtualisation, or anything else that might seem like overkill.
Solution
To get this working is quite simple. First, you need the VsRegEx.exe command from the Visual Studio 2008 SDK 1.1. Next, change into the directory it is in (I found it under “C:\Program Files (x86)\Microsoft Visual Studio 2008 SDK\VisualStudioIntegration\Tools\Bin”, but it may differ on your machine), and run the following command;
vsregex GetOrig 9.0 Test ranu
I’ve chosen to call my hive “Test”, but you could call it something else. This creates a new Registry hive under “HKCU\Software\Microsoft\VisualStudio\9.0Test” (alongside the usual “HKCU\Software\Microsoft\VisualStudio\9.0”) and file system storage under “%USERPROFILE%\Application Data\Microsoft\VisualStudio\9.0Test” (alongside the usual “%USERPROFILE%\Application Data\Microsoft\VisualStudio\9.0”).
Next, change to the directory that Visual Studio 2008’s “devenv.exe” command is stored. On my machine this is under “C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE” – again, yours may differ. You can then start Visual Studio with the following command line;
devenv /RootSuffix Test /ranu
This starts VS using your new “Test” hive. You could even setup a shortcut to start this version of Visual Studio running off your different configuration.
The first time it starts Visual Studio will take longer than usual as it is effectively a new install. But don’t worry; next time it will be just as quick as what you’re used to.
You are now free to go into Tools->Options and change your settings in this Visual Studio instance, and they won’t affect your standard Visual Studio install (the one started with “devenv.exe”, or from the standard shortcuts).
Caveats
As always there are some caveats. I have this working, and it seems fine, but I haven’t been using it long term for development. When developing for Visual Studio developers often use this approach for testing the software they have written, but generally not for doing their day-to-day development work in VS. Therefore I’d recommend giving it a try and seeing if you notice any odd behaviour specific to your configuration (Note: personally I think this is pretty unlikely as this approach was designed for testing Visual Studio! But I’m just being cautious J)
The final thing to be aware of is the “run as a normal user” switch (“/ranu”) that is used above. You may need to do some configuration as an elevated administrative user to get Visual Studio working if you’re planning on running VS as an admin (there are some notes on the ranu docs here). The most common reason for running Visual Studio as an admin is if you’re testing web apps in a locally hosted IIS rather than the built in Visual Studio Development Web Server. I haven’t tested this stuff, so try it first if this matters to you.
If you do use this and find anything interesting, feel free to report back here using a comment.
Anyway, I hope that helps – and I hope it saves you some time!
A customer asked me about doing this the other day, and I’d never come across it, although it seems obvious now! If I have a page that makes use of ASP.NET’s server-side ScriptManager AJAX history features but also want to link to an anchor on that page, I get problems.
For example, if I have a page like this;

... imagine (or try the download!) that I can click the “here” link to jump to an anchor defined further down the page as follows;
<h1><a id="bottom">Bottom</a></h1>
<p>
This is at the bottom.
Click <a href="#">here</a> to jump to the top.
</p>
Nothing complex... until I point out that the drop-down list and Person details are in an UpdatePanel, and that when the user selects a different user, server-side code such as the following is run (note that Person.All is a very basic repository of test data);
var selectedId =
Int32.Parse(PeopleList.SelectedItem.Value);
var selected =
Person.All.Where(p => p.Id == selectedId).Single();
if (PageScriptManager.IsInAsyncPostBack)
PageScriptManager.AddHistoryPoint(
PersonIdHistoryKey,
selectedId.ToString(),
String.Format("Viewing {0}", selected.Name));
This adds a “history point” to the URL that I can use to reload the correct Person should the user bookmark the page and revisit another time. The key here is how it encodes it – the fragment of the URL after the hash is used (note I’ve disabled the encryption for this post);
http://myserver/Default.aspx#&&PersonIdHistoryKey=3
Of course, if I now click on my “click here” link shown above, the target anchor (“#bottom”) will replace the AJAX history state data. If you want to see what I mean, download my demo solution and delete the component we’re about to write from the page.
A Solution
I thought about this for a while, and came up with an interesting and reusable solution. I created an ASP.NET Extender component that is used to “extend” the ScriptManager control. This extender emits an ASP.NET AJAX JavaScript behaviour that supplies some additional client-side functionality.
Specifically, when the behaviour is initialised, it performs the following actions;
// add a handler for the navigate event
var navigateDelegate =
Function.createDelegate(this, this.handleNavigate);
Sys.Application.add_navigate(navigateDelegate);
// when the page is loaded, scan for every link that
// targets a "#something" anchor and add a "click"
// javascript handler to it
var clickDelegate =
Function.createDelegate(this, this.handleClick);
Sys.Application.add_load(function() {
$('a[href^=#]').click(clickDelegate);
});
Basically we’re doing two things here – firstly, hooking up to the client-side “navigate” event that is fired when a page loads that has client-side History information embedded in the URL... this is done in the same way as for the server-side history portion we saw above, but after the # and before the && symbols. Have a read of this walkthrough for a bit more info on the client-side aspect to AJAX history.
Secondly we’re adding some code to run once the page has finished loading. This code uses a jQuery selector to find all “a” (anchor) tags for which the HREF begins with a # symbol. Once these have been identified, a click event handler is assigned to them all. This means we’re applying functionality to every link on the page that targets a local anchor mark with a single statement – nice!
There are then two more elements to the solution; handleClick (fired when the user clicks on one of these “a” tags), and handleNavigate (fired when a page loads that contains client-side history markers).
handleClick
When a user clicks on link to a local anchor (i.e. of the form “#something”) we run the following script;
var eventSource = args.target;
var parts = eventSource.href.split('#');
if (parts && parts.length > 1) {
var target = parts[1];
Sys.Application.addHistoryPoint({ jumpTo: target }, document.title);
if (target) {
var targetElement = $get(target)
if (targetElement)
this._scrollToElement(targetElement);
}
else
window.scrollTo(0, 0);
return false;
}
What this does is three things;
1. It finds the link that caused the event, and splits out the portion of the URL after the “#” – which of course is the ID of the anchor we want to navigate to.
2. It adds a client-side history point, including the name of the target anchor we want to display as an expando property called “jumpTo” on an anonymous object. This means we can recreate the behaviour if the user bookmarks the page.
3. It finds the target element and scrolls it into view. If the target is blank, it assumes we’re aiming for the top of the page.
Easy huh?!
handleNavigate
This function is fired when a page loads, if it contains client-side history information.
var target = args.get_state().jumpTo;
if (target) {
var targetElement = $get(target);
if (targetElement)
this._scrollToElement(targetElement);
}
All this does is to extract the data from the history state that we encoded in the handleClick function – we’re after the value of the “jumpTo” expando property. Once it has that, it tries to find the element on the page, and if it is found, scrolls it into view.
Summary
The end result is a URL that can look something like this;
http://myserver/Default.aspx#jumpTo=bottom&&PersonIdHistoryKey=1
... it should be obvious that we have both a client-side (in red) and server-side (in green) portion encoded here.
This component is so easy to use – we can just drop it onto a page, extending the ScriptManager control, and immediately enabling the handling of local anchor references without customising the HTML. Of course, if JavaScript is disabled, it will still work fine.
Let me know what you think – there are plenty of ways to enhance or alter this behaviour if it doesn’t work quite how you want it to (for example, you might not like it returning to the same place after every postback, or you might need to use client-side history for both this and something else at the same time). I’d also love to hear if you use it and have some feedback.
I’ve attached the source (usual disclaimers apply – this is not production strength code) so download it and have play.
This is late news as I’ve been busy recently, but if you’ve been following Blueprints make sure you read Michael’s post here. They’re evolving into something else, that I’m sure will be even more exciting...
I’ll be staying tuned so no doubt I’ll shout up when I hear more.
One of the tasks I used to hate when building a new system was adding admin pages to allow simple create/update functionality for lookup data. For example, maintaining a simple list of Products. It takes ages to write, and is very boring – even if you write templates to generate code it consumes a lot of time for such a commonly repeated and simple task.
ASP.NET Dynamic Data has now arrived, which gives a great way to “scaffold” CRUD pages over an Entity Framework Object Context (or other sources). But what if I have a web site that is mostly a line-of-business system, with complex business logic and custom UI, and I need to administer a small set of tables too?
Well, you can add Dynamic Data to a portion of your existing web site! This means you no longer need to write your own administration pages.
* Before you start off on this merry path, read this to make sure the relevant sections of your Entity Framework model are supported by Dynamic Data... for example, Complex Types are not.
How?
There’s a walkthrough that covers adding Dynamic Data to an existing web site here, so give that a read and you’ll be 80% of the way there. [Note: I also had to add a reference to System.Web.Entity at the end of the walkthrough to get it all working]
Next, make sure you have set ScaffoldAllTables to false – the last thing you want to do is expose your non-lookup data for direct editing by admin staff. All that should be exposed is a few key tables of static data. So, in Global.asax, make sure your data model registration looks like this;
model.RegisterContext(
typeof(OrderDbEntities),
new ContextConfiguration()
{ ScaffoldAllTables = false });
But how does the user get to this administration functionality? I’d recommend putting it under a virtual subfolder in your web site – perhaps named “Admin”. This is really just a route (read up on the new ASP.NET Routing features if this is unfamiliar) and needn’t really exist as a physical folder. So in Global.asax, we set it up as follows;
routes.Add(new DynamicDataRoute("Admin/{table}/{action}.aspx")
{
Constraints = new RouteValueDictionary(
new { action = "List|Details|Edit|Insert" }),
Model = model
});
Note that my route begins with “Admin/...”, which means the user will browse to URLs as follows to get to the admin pages;
http://{add address here}/myfakewebsite/Admin/Product/List.aspx
You can of course link to these pages from elsewhere within your site to save the user typing them in.
At the moment, we’re not scaffolding any of our tables, so none of these URLs will work yet. The next step, then, is to pick a table to expose for administration. I’m going to use a Product table that has ID, Description, Price and CreatedBy/CreatedDate fields. In my EF model the entity also has an OrderItems navigation property which allows me to get all Order Item records that are an instance of an order for the current product...

Due to the way EF supports partial classes, I can tell the Dynamic Data runtime to scaffold the Product table using an attribute (from the System.ComponentModel.DataAnnotations namespace & assembly) on a partial class. I added a C# class to the Web Site with the same namespace as the Entity Framework model, and pasted in the following code;
[ScaffoldTable(true)]
public partial class Product
{
}
Immediately this attribute is added the Product table is available for editing. There are a few problems when I browse to the new page though;
1. It is displaying the link to Order Items even though it isn’t editable (because Order Item isn’t scaffolded).
2. I can edit the columns I’m using for Audit – CreatedBy and CreatedDate. Not good.
It turns out changing this is quite easy. Using some additional attributes and a metadata class as follows customises the model sufficiently (see here for more info on this metadata class approach);
[MetadataType(typeof(ProductMetadata))]
[DisplayColumn("Description")]
[ScaffoldTable(true)]
public partial class Product
{
}
public class ProductMetadata
{
[UIHint("TextReadOnly")]
public string CreatedBy;
[UIHint("TextReadOnly")]
public string CreatedDate;
[ScaffoldColumn(false)]
public EntityCollection<OrderItem> OrderItem;
}
I’ve now instructed the runtime not to display the Order Item navigation property, and to use the Description field should I display Products in a list somewhere (as part of a foreign key drop-down, for example).
I’ve also asked it to use the “TextReadOnly” user control to render my CreatedDate and CreatedBy fields instead of the standard Text control, by placing a [UIHint] attribute on two metadata properties. To create the TextReadOnly control I copied-and-pasted the “Text.ascx” control twice, and renamed the copies to “TextReadOnly.ascx” and “TextReadOnly_Edit.ascx”. This control will now be used whenever the CreatedDate and CreatedBy fields are displayed (note I’ve cheated here – CreatedDate is really a date but as I’m happy with the default ToString display mechanism for it I didn’t bother creating a DateTime ReadOnly control).

As you can see, these user controls are available for you to edit under the DynamicData/FieldTemplates folder that was copied in as part of the original walkthrough. If you want to change this folder to be somewhere else, check out this article.
Finally, we probably want to limit access to the admin pages to a limited set of privileged users. The easiest way to do this is using standard ASP.NET authorisation, by adding a location and authorisation element to web.config;
<location path="Admin">
<system.web>
<authorization>
<allow roles="domain\role"/>
<deny users="*"/>
</authorization>
</system.web>
</location>
Next steps...
There’s plenty more you might like to do, so I would recommend looking further into the features that Dynamic Data provides.
One that is particularly likely to be of interest is customising validation of data that is maintained... a good starting point is the ValidationAttribute.
You probably also want to change Site.master to better reflect your site (or move it, rename it, and update the templates under DynamicData/PageTemplates to use the new one). As long as you have a ScriptManager and the required Content Placeholder on there, it should work great. You can obviously make use of your own CSS too, or even go as far as changing all the field templates.
If you’d like a general overview of how Dynamic Data fits together, have a read of the Infrastructure Overview.
Finally, remember that I’m basing mine on an Entity Framework model, so I could customise the model too – for example, hooking up to the SavingChanges event to perform processing and/or validation on the Product table before it is saved to the database.
I hope all that is useful, and saves you time and boredom for your next system... enjoy!
Is this a question of taste, or something more serious? [Warning: this post won’t change your life J]
I found myself writing the following extension method yet again the other day;
public static void ForEach<T>(
this IEnumerable<T> source,
Action<T> action)
{
foreach (T item in source)
{
action(item);
}
}
This means I can write simple (contrived) code like the following;
entities.ForEach<ObjectDefinition>(o => o.DeleteDbRecord());
This will enumerate all my ObjectDefinition objects and call DeleteDbRecord on each. But why didn’t I just write it like this?
foreach (ObjectDefinition o in entities)
{
o.DeleteDbRecord();
}
I’ve raised this because some code I’ve seen (and written!) has reached a saturation point, where lambdas, extension methods, and other concepts have taken over - meaning the code looks more like it was written in LISP than C#.
But does that matter?
A quick search shows there are all sorts of discussions about this online, which is interesting. It is actually quite an old discussion now!
If you’ve got some XML data in a SQL Server column, how can you flatten it out and query it or present it as though it was relational data? It turns out this is quite easy...
Setup
Let’s create a simple table to hold our data;
CREATE TABLE XmlSourceTable
(
RecordId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
XmlData XML NOT NULL
)
GO
And we’ll define some straightforward XML to import;
<?xml version="1.0" ?>
<Root>
<Person>
<Name>Simon</Name>
<Age>20</Age>
<Skills>
<Skill>Cooking</Skill>
<Skill>Cleaning</Skill>
</Skills>
</Person>
<Person>
<Name>Peter</Name>
<Age>21</Age>
<Skills>
<Skill>Ironing</Skill>
</Skills>
</Person>
</Root>
Ages may have been changed to protect the innocent J
Next, we’ll import it into my table using one of the mechanisms SQL Server provides – each XML file will be imported into a single row in the target table.
INSERT INTO XmlSourceTable(XmlData)
SELECT *
FROM OPENROWSET(
BULK 'C:\XmlSource.xml', SINGLE_BLOB)
AS ImportSource
GO
After this, if we do a quick query...
SELECT * FROM XmlSourceTable
... we can see that we get a single row back containing an ID and some XML;
RecordId XmlData
----------- -------
1 <Root><Person><Name>Simon</Name><Age>20</Age... (snip)
(1 row(s) affected)
Queries
The simplest way to extract this data is to use the CROSS APPLY keyword, as this executes a function against each row and then adds the returned data to the result set. Combining this with a method that can be called on the XML data type called nodes, we get some great results. A quick query like this;
SELECT
pref.value('(Name/text())[1]', 'varchar(50)') as PersonName,
pref.value('(Age/text())[1]', 'int') as PersonAge,
pref.query('Skills') as PersonSkills
FROM
XmlSourceTable CROSS APPLY
XmlData.nodes('/Root/Person') AS People(pref)
GO
... yields a completely different result set to our last query;
PersonName PersonAge PersonSkills
---------- --------- ----------------------
Simon 20 <Skills><Skill>Cooking</Ski... (snip)
Peter 21 <Skills><Skill>Ironing</Ski... (snip)
(2 row(s) affected)
We can see this query has flattened my single row of relational data with embedded hierarchical XML into two rows and columns of relational data. I’ve also included a subset of the XML as a column, just to show I can! Of course, if I wanted to I could modify this to get a list of people and their skills;
SELECT
pref.value('(Name/text())[1]', 'varchar(50)') as PersonName,
sref.value('(text())[1]', 'varchar(50)') as PersonSkill
FROM
XmlSourceTable CROSS APPLY
XmlData.nodes('//Person') AS People(pref) CROSS APPLY
pref.nodes('Skills/Skill') AS Skill(sref)
GO
What is it Doing?
This SQL can be difficult to understand when you first look at it, but it isn’t really that complex. Breaking it down there are three key concepts that we’ll cover below – using my first query above that fetches PersonName, PersonAge, and PersonSkills as an example.
CROSS APPLY
The first concept is the use of CROSS APPLY, which many people haven’t used before. What this is doing is roughly equivalent to the following steps (note: this is my idiots guide to how I think about it, not a description of how the query optimiser does it);
1. Fetch the rows from the XmlSourceTable table.
2. For each row, call the “nodes” function on the XmlData column. This could be some other function too – it needn’t be on XML data. See the docs on APPLY for more info.
3. Duplicate the XmlSourceTable row once for every row returned by the table valued function “nodes”.
4. Add the columns returned by the “nodes” function to the columns in the result set.
5. Continue doing filtering, joining, and column selection as for any other SQL query.
I hope that makes it a little clearer.
nodes() function
The XML data type in SQL Server defines a number of methods that can be called on it. One of these is “nodes” – and what this does is basically select a list of XML nodes that match an XQuery expression. Knowing this, look at the statement in my SQL;
XmlData.nodes('/Root/Person') AS People(pref)
This is using the path “/Root/Person” to ensure that all Person nodes that exist under the Root are selected. The result is aliased as a table named People, and each resulting XML node will be output as a separate row, in the “pref” column.
Plenty of alternative syntaxes are available for selecting this list of nodes, and this really is the core of how to flatten out the XML. I’ve also used “//Skill” syntax in my second query, for example, which selects every Skill node found in the XML it is used on.
Other XML functions
The last concept is the use of the selector XML functions – in my example I’ve used “value()” and “query()”. Both of these use XQuery expressions to select XML data.
Firstly, I’ve used the value() function to extract a specific value from the XML, and convert it to a SQL Server data type;
pref.value('(Name/text())[1]', 'varchar(50)') as PersonName
The “text()” function here retrieves the inner text from within the XML “Name” node. The “[1]” suffix acts as an indexer, and fetches the first result matched. I know there’s only one name node per person, but I need to make sure SQL Server knows this too. Finally, the second argument is the data type it should be converted to. For an Age, I’ve used ‘int’, but here we’re converting to a varchar type.
The “query()” function allows me to return an XML fragment;
pref.query('Skills') as PersonSkills
This returns the XML that matches the “Skills” node, and is found underneath the current Person element held in “pref” (i.e. the search is relative to the contents of pref). This means it returns fragments such as;
<Skills><Skill>Ironing</Skill></Skills>
Conclusion
None of this is all that difficult once you know how – so I hope this has given you a quick start to using XML data in SQL! Don’t forget to read up more generally on the XML data type and XML indexes.
As the beta 1 drop of Visual Studio Team System 2010 approaches, Brian Harry has a great little summary of the high level new feature areas we’re delivering. Check it out here.
There are some great nuggets in there, so I’m really looking forward to RTM – lab management, gated check-ins, and the architecture tooling are some of my favourites.
In my previous post I described the basic concept behind my auditing approach using the Entity Framework, and covered some of the problems I encountered. This post focuses on the solution I went with; do feel free to comment if you have any thoughts.
Audit Lifecycle
To get to the root of my requirements I wrote down the lifecycle of a piece of data, and what needed to be tracked against it.
|
Action |
Audit Requirements |
|
Record Added |
The date it was created, by whom, and what the original values are. |
|
Record Modified
* repeated many times |
The date it was changed, by whom, and which values were changed. |
|
Record Deleted |
The date it was deleted, and by whom. |
The problem is that creating an audit record is tricky based on point (2) in my previous post. Therefore, I chose to record the original values when each change is made. This means I actually track the following;
|
Action |
Audit Requirements |
|
Record Added |
The date it was created, and by whom. |
|
Record Modified
* repeated many times |
The date it was changed, by who, and what the pre-change values were. |
|
Record Deleted |
The date it was deleted, by who, and what the pre-delete values were. |
This has effectively flipped the approach on its head, but I can still get a point-in-time view of the data whenever I need it. Therefore my data model becomes something like this;
When a new Product is added, the CreatedBy and CreatedDate fields are recorded on the Product entity. When changes are made, the previous unchanged version of the entity is copied to the ProductHistory table, and then the new Product entity is saved. When a Product is deleted, it is removed from the Product table and a copy of its last values is saved to ProductHistory.
This means my ChangedDate field on ProductHistory could arguably be described better as “ExpiredDate”, as it is the date and time that the data on that record became out of date.
Enabling Auditing on the ObjectContext
I described in my previous post how I would use the SavingChanges event to create audit records. To wire this up, I’ve created an extension method that can be called on an ObjectContext;
public static void Audit<fromType, toType>(
this ObjectContext context,
Func<IDataRecord, EntityState, toType> mapping,
Action<toType> addToContext)
{
context.SavingChanges +=
new EventHandler((o, e) =>
CreateAuditRecord<fromType, toType>(
context, mapping, addToContext));
}
Calling this method sets up auditing for one specific type of entity by adding an event handler to the SavingChanges event. The handler invokes a method called CreateAuditRecord.
You might think it would be better to set all audit types up at once, or you might want to configure auditing in the ObjectContext’s OnContextCreated partial method... and that should be easy enough. The point of this code is not to be final, but that it should be easy for you to understand and adapt.
So why have I used two type parameters, a Func<T,U,V> and an Action<T> on the Audit method then? It comes down to points (4) and (5) in my previous post; I wanted auditing to be very explicit about how it behaved, and I wanted to avoid passing string values around to identify field names or entity sets.
To demonstrate this, consider the following example of how I would enable auditing on the Product entity, saving changes to ProductHistory.
using (MyEntities db = new MyEntities())
{
db.Audit<Product, ProductHistory>(
(record, action) => new ProductHistory() {
Id = record.Field<Product, int>(f => f.Id),
Description = record.Field<Product, string>(
f => f.Description),
Price = record.Field<Product, double>(f => f.Price),
CreatedDate = record.Field<Product, DateTime>(
f => f.CreatedDate),
CreatedBy = record.Field<Product, string>(
f => f.CreatedBy),
ChangedBy = "Simon",
ChangedDate = DateTime.Now,
ChangeType = action.ToString() },
(ph) => db.AddToProductHistory(ph));
// TODO: make some changes to Product here...
db.SaveChanges();
}
In my Audit method I specify the source entity (Product) and target audit store entity (ProductHistory) as type parameters. This then enables me to easily create a strongly typed lambda for the first parameter that I know accepts an IDataRecord (“record”) and an EntityState (“action”), returning a ProductHistory record;
(record, action) => new ProductHistory() {
Id = record.Field<Product, int>(f => f.Id), ... <snip>
The purpose of this method is to map a Product data record to a ProductHistory entity. It has an IDataRecord input as I pass the OriginalValues to it, not the full Product entity (remember I’m saving the previous values in my audit table, not the new ones). Therefore this code creates a ProductHistory record, and initialises its properties using object initialiser syntax. If this code became too unwieldy I could easily factor it out into a helper method.
Next, you’ll notice that I’m using another extension method named “Field” that applies to the IDataRecord type. I use this to fetch a property value in a strongly typed way, and to make it easy to get at fields that represent an entity property.
In this case, I’m specifying that I expect the IDataRecord to contain fields that represent Product’s members. I’m also saying the particular property I’m after is an integer, and then I use a lambda to identify the property itself (Id in this case).
The second parameter to my Audit call looks like this;
(ph) => db.AddToProductHistory(ph)
This is because I need to know how to add audit records that are created to the ObjectContext, ensuring they are saved as part of the same SaveChanges transaction as the actual changes we’re tracking. The lambda receives a ProductHistory entity (which is known because of the “toType” type parameter in my call to Audit) which I choose to add to the ObjectContext by calling AddToProductHistory.
For this task I could have used the more general AddObject method available on an ObjectContext – but this needs a string value to identify the entity set, and I want as much compile time checking as possible.
The Mechanics of Auditing
Now you see how we setup auditing, let’s dive into the CreateAuditRecord method to see how it actually creates the audit trail.
First, the method retrieves a list of ObjectStateEntry objects that describe entities that have either been modified or deleted. Remember that my approach doesn’t record anything in the audit table when they are added;
IEnumerable<ObjectStateEntry> entities =
from e in context.ObjectStateManager.GetObjectStateEntries(
EntityState.Modified | EntityState.Deleted)
where
e.IsRelationship == false &&
typeof(fromType).IsAssignableFrom(e.Entity.GetType())
select e;
This also ensures that we only get results for entity types that are of type “fromType”, or inherit from it. Arguably this would be more efficient if it looped through all changes looking for the audit configuration for each type as it went... or there may be other desired behaviours for inheritance heirarchies – but that is out of scope for this post, and it should be easy for you to see how you might change it.
Next, we loop through all of our results;
foreach (ObjectStateEntry item in entities)
{
toType auditRecord =
mapping(item.OriginalValues, item.State);
addToContext(auditRecord);
}
... and it is just a case of using the supplied helper functions to map the OriginalValues IDataRecord to a ProductHistory (in this example), and then add it to the ObjectContext. Of course these helper functions are in the form of the lambdas we passed to our call to the Audit extension method.
Retrieving Point-in-Time Products
This design is based on the assumption that I will rarely need to retrieve audited data, as it requires some calculations and slightly complex SQL to complete. This is fine, but if you were retrieving audit data very frequently I would consider doing some further testing, with a view to optimising the approach.
So how do I get a Product record as it looked at 3pm on the 20th April 2009, for example?
Simply put, to get the correct data fields for a given time I need to find the first ProductHistory record after the specified date & time. If there are none, I need the values on the Product record (as this means no changes had been made to the record at that point in time). Let’s break that down.
First I need to get the first record from ProductHistory for products that were created before the specified date & time, but were changed after it...
SELECT
Id, [Description], Price, CreatedBy, CreatedDate
FROM
(
SELECT
Id, [Description], Price, CreatedBy, CreatedDate,
ROW_NUMBER() OVER(ORDER BY ChangedDate) AS DateOrder
FROM
ProductHistory
WHERE
CreatedDate <= @PointInTime
AND ChangedDate > @PointInTime
AND Id = @Id
) A
WHERE
A.DateOrder = 1
Next, I need all Products that have no ProductHistory records that fall into the above category, but were created before the specified date & time;
SELECT
Id, [Description], Price, CreatedBy, CreatedDate
FROM
Product
WHERE
NOT EXISTS (
SELECT * FROM ProductHistory ph
WHERE ph.Id = Product.Id AND
CreatedDate <= @PointInTime
AND ChangedDate > @PointInTime)
AND CreatedDate <= @PointInTime
AND Id = @Id
Put these two query sections together using a UNION statement and we have the content of my [GetPointInTimeProduct] Stored Procedure (see the code download), which allows us to find the exact state of a Product given its identifier and a specific date and time. I’ve brought it into my model using a Function Import.
The Good and the Bad
I hope all that has made sense – as usual I’ve included the code as a download (standard disclaimers apply). There are of course some pros and cons to my approach – some that spring to mind are below. If you have comments on better ways to do this, or thoughts about my approach, feel free to chip in!
· We can easily record the username of whoever made the changes as we can deduce it in our C# code; even in a trusted subsystem model.
· The approach is easy to configure in a compile-time checked fashion. It doesn’t rely on possibly flawed convention (e.g. always store audit records in tables ending “Audit”). It should also therefore be easy to refactor to suit your needs.
· Mappings between entity and audit records are explicit, so it does not dictate too much about how you would do this. You could store completely different audit data to me if you’d like.
· The audit records participate in the ObjectContext’s Unit of Work; that is, they are saved in a batch of SQL statements with the data changes. It also means that due to the ordering and batching of updates that the likelihood of locks and deadlocks is arguably reduced when compared to some other approaches.
· The configuration of auditing for each type independently keeps the code clear, but introduces some inefficiency (such as many subscribers to the SavingChanges event, and many loops through the ObjectStateEntry collection).
· The data model, although simple, needs explanation. The history records may not work quite how a newcomer would assume.
· The complexity of the point-in-time SQL is slightly higher than I would like.
· [Edit] It is important to use some kind of concurrency checking to ensure that multiple audit records are not written by different users... I tend to use a timestamp with “Fixed” concurrency.
· [Edit] The biggest drawback of the code so far is that it doesn’t record changes to relationships... so it is suited well to resource data tables, but not so well to those involved in a complex model.
On balance, overall I like the approach. What do you think?
Note: Regarding the edits, I just reread this and realised I’d oversimplified for this post and missed two important points from my notes... that’ll teach me to type up blog posts late at night! Sorry.
If you’ve read my last post on Types of Auditing, you should be primed for this one; I’m looking at a way to do Data Operation Auditing in the Entity Framework. That is, I want to track who changes which tables & columns, when, and in what way.
Why does this specifically apply to the Entity Framework? Well there are three main reasons;
1. A lot of DBAs don’t like triggers in their database. In fact, I don’t like triggers in my database! I find them to be more difficult to manage, and often overlooked during defect investigation. I also don’t like the cascading updates that can occur as a result of triggers.
2. In a Trusted Subsystem model, which is often my preferred security configuration, the database is accessed as a service account. This means that triggers and SQL statements have no automatic concept of who is performing the data changes, as the user’s security context has not been delegated through to the data tier.
3. People are starting to use the Entity Framework, so I’m keen to maximise the benefits of using it. It also places some constraints on an auditing approach – for example, out of the box, the Entity Framework cannot pass values that are not contained in an entity to the database during DML operations... so username, for example, can’t be passed down to a Stored Procedure when saving entity changes.
In this post I’ll cover the basic solution concept I’ve used, and some of the problems I encountered. In the next post I’ll detail my final solution. This isn’t some poor attempt at leaving a post on a cliff-hanger to keep you coming back – I just want to keep each post short enough to digest easily!
Basic Design
The Entity Framework extensibility point that seems to fit my requirements best is the SavingChanges event on an ObjectContext. This fires when changes are just about to be saved (i.e. before the SQL is actually generated). Therefore this is a great opportunity for us to check what has been changed, and generate our audit records. I can even add more entities (containing audit information) at this point to the ObjectContext and they’ll get saved too.
To find what has changed we can use ObjectStateManager.GetObjectStateEntries – this is accessible via the ObjectContext’s ObjectStateManager property, and gives us access to information about the original and current values of each property on an entity, whether it has been Added, Modified, or Deleted, and some other useful metadata.
To store data audit records I created a copy of the table I’m auditing, and added some metadata fields. For example, if I have a Product table, I would add a ProductHistory table – something like the following;

Note that the Primary Key in Product is the Id field – but in ProductHistory a basic counter field is the Primary Key (AuditId), and Id is a reference to the Id field in the Product table. This is because we’re expecting multiple history records per product.
I’ve also added metadata about who changed the record, when, and in what way (i.e. Modified, Deleted, etc).
None of this is particularly surprising – it is a pretty standard way to record audit information, in the form of data “deltas”.
Gotchas
I did try a number of different approaches before I settled on one, and this was due to some interesting gotchas...
1: Don’t Create a Foreign Key
You may have noticed there’s no relationship in my data model. That’s intentional; if I delete a Product, I still want to keep my ProductHistory records for that Product, and that means I can’t have ProductHistory.Id related to Product.Id – as referential integrity would break.
2: Can’t Save Added Records
The SavingChanges event fires before changes are saved to the database. In my data model I’m using database-generated keys (i.e. IDENTITY(1,1) fields). This means only after the changes have been saved to the database is the value of the key available in our in-memory representation of the database – our ObjectContext.
This means that if we detect an entity has been added to the ObjectContext, I might want to add a record to the history table... but I cannot get the value to put in the Id field, because it hasn’t been created yet. To get around this, I could allow SaveChanges to complete, and then create my audit records, and call SaveChanges again... but that feels like a hack and would lead to other compromises, so I avoided it.
3: Don’t Create a Conceptual Association
To get around point (2), I thought I had a solution. The Entity Framework automatically manages this key-relationship problem for us using navigation properties. If I just add an association in my CSDL linking Product and ProductHistory, this would mean ProductHistory records would be inserted after Product records, and the framework will fix up the Id values for me.
But of course, as soon as we come to deleting records we come across the same issue as point (1), but within the ObjectContext rather than the database – it will not permit us to add a ProductHistory record with an Id that refers to a Product record that we’re about to delete in the same transaction. In fact, a bit of hacking and I managed to create some pretty serious bugs in my code trying to work around this!
There are possibly some solutions here that involve Conditional Mappings, QueryViews, DefiningQuery, database-level updateable Views, or more. But each one felt wrong to me – they had maintenance implications, could create quite brittle code, would be overwritten when using the “update from database” menu option, or worse. If you have a good solution, let me know – but I avoided them.
4: Avoid Passing Strings Around
One of the things I love about the Entity Framework is that a lot of code becomes strongly typed, and is therefore checked at compile time. In a few places this isn’t necessarily the case – such as accessing DbDataRecord field values, or calling ObjectContext.AddObject.
To address this I took a tip from Stuart Leeks – check out this great post on how lambda expressions can be used to bring compile-time checking to your application. I added similar code to my solution to address slightly different areas.
5: Avoid Auditing by Convention
Convention over configuration can be a valid approach in some circumstances. For a general blog post and flexible solution, though, it felt wrong – I wanted to explicitly select which entities are audited to which tables, through compile-time checked, strongly typed, clear code.
You might not like my approach to this – but I chose it to ensure it was clear what I was doing. Therefore it’s easy for you to put your own flavour on it if needed; I hope you agree!
Solution
So how did I implement it in the end? Watch this space and I’ll post my code shortly.
Auditing almost means something different to everyone I speak to... just a few things people mean when they say “our system has auditing” to me are;
Business Operation Auditing. This is recording when a business operation is performed, such as “Hire Employee”. They usually signify that a whole lot of work has been done across the system’s databases and services.
Data Operation Auditing. This refers to tracking individual changes to database tables. For example, this might involve recording that a user performed an operation that caused the Employee table’s row with identifier “5” to have the “Hired” flag set to “true”.
System Auditing. This is usually what I would call tracing or instrumentation rather than auditing – i.e. outputting detailed information about the running of the system. This often uses the TraceSource class, performance counters, or might be custom statistics recorded in a database/file about long running tasks – it can take many forms.
I’m sure this list is by no means complete – it isn’t intended to be. It’s just designed to make you think “what do they mean by auditing?” when you’re reading the requirements for the shiny new system you’re about to write. What is it you need to achieve?
I’m also hopeful that when I visit you one day in the future and say “what do you mean by auditing?”... you’ll realise I’m just trying to make sure we’re talking about the same thing J
My next couple of posts will cover one approach to Data Operation Auditing in the Entity Framework, so stay tuned...
* You might think there’s a rather large gap in the above... what about Security Auditing? Well, I’ve left it out deliberately as I see that as more of a vertical slice compared to the horizontal list that I’ve come up with. In other words, it’s just a slightly different way to different cut it – I’d categorise Security Auditing as either System or Business Operation auditing, depending upon your requirements, the technology you’re using for authentication and authorisation, and how you choose to implement it.
I spend quite a lot of time nowadays advising on architecting applications to be scalable, recommending performance testing approaches and processes, and being called in to help an organisation remediate problems with an existing application. It’s one of my favourite sides to the job.
But in spite of spending so much time on this, I still catch myself apparently confusing three terms – Performance, Scalability, and Stability. Why? Well, I’m not really confusing them – it’s just that they’re so closely related I don’t think you can consider any one of them in isolation. Let’s consider my (unofficial and buzzword-free) definition of each;
Performance is about the time taken to execute tasks. The quicker a task completes, the better the application is performing.
Scalability is about the number of tasks a system can execute at the same time. This usually maps almost directly to the number of concurrent users that an application can support.
Stability is about how many failures an application exhibits; whether that is manifested as unexpected or unintended behaviour, users receiving errors, or a catastrophic failure that brings a system down. The fewer failures that are observed the more stable an application is.
I’ve over-simplified a little and left out some other concepts, but I think it serves to illustrate the point. A nice way to visualise this is in the form of a triangle;

This visualisation helps to emphasise that each quality can affect the other. The word “quality” is key here – each of these is basically a “Quality Attribute” (or “Quality of Service” requirement) – although Stability arguably covers a number of attributes. If this term is unfamiliar to you, check out the patterns & practices Application Architecture Guide 2.0, in section II (Design), under Quality Attributes.
It is easy to say they affect each-other, but how exactly does this happen? Let’s consider them one at a time;
Poor Performance affects Scalability
Slow performance means tasks take longer. If they take longer, they are more likely to overlap when multiple users do them at the same time. Therefore improving performance reduces the likelihood of contention for shared resources (such as a database), and therefore increases scalability.
Poor Performance affects Stability
Slow performance means tasks take longer. If they take longer, they are more likely to overlap when multiple users do them at the same time. This is likely to lead to more frequent locks, deadlocks, and resource contention, which is more likely to lead to errors and stability problems. It also means that poor programming practices (such as race conditions and incorrectly shared state) are more likely to be spotted.
Poor Scalability affects Performance
Poor scalability means an application supports fewer concurrent tasks. Therefore tasks may have to queue up if there is no capacity to process them. This usually reduces both perceived and actual performance.
Poor Scalability affects Stability
Poor scalability means an application has two options when demand exceeds capacity; queue requests, or reject requests. Rejecting requests results in an error or unexpected behaviour, and is my definition of instability. Queuing requests leads to reduced performance, but also puts demands on sharing resources, connections, memory, CPU, and more. All this increased demand inevitably leads to further stability issues eventually.
Poor Stability affects Performance
Poor stability generally means three things. Firstly, errors must be propagated throughout the system. .NET exceptions do carry overheads and therefore increase CPU and memory requirements of the application.
Secondly, partial success and partial failure must be handled, usually with database rollbacks but also with manual compensation logic. This inevitably leads to additional resource requirements on the system – whether that is the database server dealing with rollbacks, or the additional processing of compensation logic.
Thirdly, when unexpected results are received the user is likely to retry them. This means the load on parts of an application can literally double (or worse) during times of instability.
The first two of these impacts the actual physical performance of the system, but the third also affects the perceived performance from the user’s point of view.
Poor Stability affects Scalability
Poor stability means that additional resources are used, and users are likely to retry tasks. This means more tasks are being performed by fewer users, and therefore ultimately less users can be supported concurrently as a result.
... wow, what should I do?!
After all that, you might be thinking “wow, what should I do?”
The answer is that you should always consider performance, scalability, and stability when architecting, building, and testing your applications. To be precise, you should;
1. Set objectives for each Quality Attribute (including those I’ve not mentioned in this post).
2. Be aware of how each Quality Attribute affects the others.
3. Invest time, hardware, and money in testing that your application meets the objectives you set for each Quality Attribute.
4. Fix issues that you find through re-architecting, code refactoring or tuning, or by tuning your deployment configuration and/or environment.
5. Do all this as early as possible, as it is considerably cheaper to fix performance, scalability, and stability issues early in the development lifecycle.
During my day job I come across customers that are struggling to make a system perform to the expectations of their users, and they often see this as a SQL Server problem. This post is designed to demonstrate how much of a difference your approach to SQL tasks can make to the performance and scalability of your system. Of course, the real trick is to make sure you performance test thoroughly; a system can behave totally differently in a live environment to how it was seen to behave during development and system testing.
The Scenario
For this post I’ve tried to illustrate a common scenario with the attached SQL script. Very simply, we have Customers, Orders, OrderItems, and Products tables following a very familiar relationship pattern.

I’ve then invented a set of Stored Procedures to retrieve a list of “Big Orders”. The idea is that a “Big Order” is defined according to some rules, based on the date, size of the order, a substring in the order description, and so on.
There are three types of Stored Procedure, and they are all based on approaches that I’ve regularly seen in real implementations, and therefore hopefully illustrate my points well;
1. GetBigOrdersReselecting. This procedure also uses a function named IsBigOrder. Basically it retrieves a list of Order Ids to work with, and then loops through them calling IsBigOrder with each. IsBigOrder then selects the row it needs to work with using that ID, and determines if the order is “big”. If it is, GetBigOrdersReselecting adds this ID to the result set. This means the row is “reselected” multiple times – a very common approach.
2. GetBigOrdersCursor. This procedure opens a cursor and steps through all the orders determining if the order is “big”. If it is, it adds the ID to the result set using a temporary table. This approach is used in many, many systems.
3. GetBigOrdersSetBased. This procedure uses a single SELECT statement to fetch all the orders that are considered “big”... and is my preferred approach and the reason for this post.
The Tests
What I wanted to show was how each of these approaches behaves differently as the volume of data increases. Therefore I have used a Visual Studio 2008 Data Generation Plan to fill my tables with test data (with ratios Customer:Order:OrderItem of 1:5:15). I have then run each of the Stored Procedures and captured some statistics to look at.
I did this for two test data sets – those that included less than 15% of “big” orders (“Low Hit Ratio”), and those that included about 45% of “big” orders (“High Hit Ratio”). This is intended to reflect two different types of SQL – those that retrieve and process a large percentage of the data as a batch job might, and those that need a smaller percentage of the rows, like an OLTP system might.
It is at this point you must realise that this post is indicative only. It is actually pretty unscientific due to my time constraints. I’m running SQL 2008 on my laptop, with various programs open (e.g. Outlook), using randomly generated data that is different each time, without any sensible indexing strategy, without recompiling stored procs between runs, and so on. In other words, I believe the figures show a valid pattern, but don’t go on the individual millisecond results. Just take the core message on board.
The Results
Let’s get to the point then, and look at how long each of the three approaches took to run (in seconds) for increasing volumes of records in the Order table.
|
50 |
250 |
500 |
1,000 |
2,500 |
5,000 |
10,000 |
25,000 |
50,000 |
100,000 |
|
Set based
High Hit % |
0.000 |
0.000 |
0.017 |
0.017 |
0.046 |
0.047 |
0.126 |
0.237 |
0.453 |
0.686 |
|
Cursor
High Hit % |
0.000 |
0.013 |
0.030 |
0.046 |
0.140 |
0.313 |
0.437 |
1.007 |
2.074 |
3.687 |
|
Reselect
High Hit % |
0.017 |
0.127 |
0.406 |
1.507 |
8.727 |
5.193 |
20.073 |
|
|
|
|
Set based
Low Hit % |
0.000 |
0.000 |
0.013 |
0.017 |
0.033 |
0.064 |
0.046 |
0.186 |
0.343 |
0.513 |
|
Cursor
Low Hit % |
0.000 |
0.013 |
0.017 |
0.030 |
0.077 |
0.173 |
0.280 |
0.734 |
1.413 |
2.530 |
|
Reselect
Low Hit % |
0.017 |
0.127 |
0.406 |
1.490 |
8.917 |
5.157 |
19.747 |
|
|
|
Note I didn’t bother running the 25,000+ runs for the “reselect” approach. There is also an interesting anomaly in there (I wish I had time to investigate!), in that “reselect” seems to temporarily improve around the 5,000 record mark. That could be for all sorts of reasons but at least you know I’m not faking my results J
Anyway, I think two graphs illustrate the results best; the first includes the “reselect” procedure results, the second excludes it so that we can see more detail of the other two.

Including Reselect

Excluding Reselect
You can clearly see a few things from this;
1. The Set-based approach is faster than any other approach for anything but the smallest amounts of data.
2. The Cursor approach gets a lot slower quickly (fairly proportionally, at least).
3. The Reselect approach is unpredictable, and much, much slower than either of the other two approaches. It almost spirals out of control quickly - with very few records it is probably in the unacceptable performance category for an OLTP system.
It is important to understand that as these queries run slower and slower so they will also have a disproportionately bad effect on any multi-user system. There is more chance that multiple connections will be executing queries against the same data at the same time, more memory will probably be in use, the processor will be busier, and your disks might well be more active. This increases the chances of deadlocks and more. Therefore the wrong approach to SQL statements not only compromises the performance of your queries, but also the scalability of your concurrent queries, and ultimately the stability of your system.
Remember that my example just uses SELECTs, but we could easily be looking at UPDATE statements too.
Implementation
So how is this set-based approach implemented for complex logic you have traditionally used procedural code for? Quite easily actually, after a bit of practice - complex rules can often be expressed using a combination of AND/OR, CASE, and more. Have a look at the HAVING clause of my set-based query;
SELECT
... SQL removed...
HAVING
(
o.OrderDate < GETDATE() AND
(
(
SUM(oi.NumberOfItems * p.Price) > 100 AND
CASE SUBSTRING(o.OrderDescription, 0, 4)
WHEN '011' THEN 1
WHEN '012' THEN 1
ELSE 0
END = 1
) OR
(
SUM(oi.NumberOfItems * p.Price) <= 100 AND
SUM(oi.NumberOfItems) > 3
)
)
) OR
(
o.OrderDate >= GETDATE() AND
SUM(oi.NumberOfItems * p.Price) > 150
)
This is by no means perfect. I could replace SUBSTRING with LIKE, I haven’t tuned it by looking at the query plan at all, and I should consider adding some indices; I might even choose to cache some data using an indexed view, and more. But without any of this work it still runs much faster than the other options, and I think you can see from my HAVING clause how easy it is to express the logic.
I’m trying to keep this post as brief as possible, so I won’t go into approaches to writing this set-based logic – but one key point is to ensure you use mutually exclusive OR clauses (so you might duplicate some lines – such as my use of o.OrderDate), and don’t forget to test, test, test. In fact, if you’re replacing procedural code you might consider automating running both queries against the same data and comparing the results.
Summary
This is one of the easiest summaries I’ve ever written...
Whenever you’re writing SQL always start out with writing set-based statements. It may look harder at first, but it gets easier once you’ve done it a few times, and I firmly believe the effort up front pays dividends when you come to scaling your application. Set-based SQL is also far easier to tune than procedural logic using the usual SQL tricks of changing indexes, partitioning, and so on.
There are always exceptions to rules of thumb, and I’m sure you’ll have a scenario where something can’t or shouldn’t be done in a set-based way... but just be absolutely sure that’s the case. Usually it comes down to data that must be dealt with in an ordered manner, perhaps with a running cumulative total or similar.
If you try reworking some SQL into a set-based approach I’d love to hear what you find; whether it is huge performance improvements, or otherwise – feel free to comment here.
Whilst at Q-Con a few weeks ago someone asked me about how to map two database tables to one entity in the Entity Framework... something a lot of people want to do at some point or another. At this point I must thank Chris Barker for pinging me some links to get this working. It seems a common thing to need to do so I thought I’d post briefly here.
Options
There are actually two main options;
1. Do the mapping in the Entity Framework, which requires that both tables must share the same primary key. Therefore if you have Customer and CustomerAddress tables (with a 1:1 relationship) the CustomerId must be the primary key on the CustomerAddress table.
2. Do the mapping in the database, using an updatable View that you manually map into the Entity Framework’s model. Rick has a post about how you might do this here.
If you want to use Option 1, which I personally prefer (assuming you have control of the database schema at this point), there is a walkthrough in the documentation here. The walkthrough creates a new entity type – I actually cheated and made it even simpler.
Walkthrough
I have two simple tables – Customer and CustomerAddress. I want them to be encapsulated in a single Customer entity within my Entity Framework model.

When I generate my entity model this gives me two entities in both the Conceptual and Storage schemas, appropriately mapped. It looks a bit like this;
The problem is, I want my address fields on the Customer entity. There are two ways to achieve this... the “designer” way is to right click “Line1” and choose Cut, then Paste it into the Scalar Properties in Customer, then repeat for all the remaining properties (except of course CustomerId – that’s already there).
The XML fan way is to dive into the Conceptual Schema in the dbml file and add these properties manually. I find this quicker as I can copy/paste and tweak the names very easily and all in one operation (for example, if I want to call a field AddressLine1 instead). I end up with the following XML;

This gives us a slightly different looking interim model, as CustomerAddress starts to become a bit empty;

So now we have a nice complete Entity in our Conceptual space; next we need to map these new properties to the table in the Store. I could do this in the mapping section of the XML, or we can use the designer. The tool makes some assumptions here that actually automate mapping all the properties for you if you haven’t changed their names, so I’ll use that for simplicity.
Right-click on the Customer entity and choose “Table Mapping”. You’ll see something like the following. We can now choose to add a Table to the mapping;

As soon as I pick CustomerAddress, it pre-populates all the fields for me with a guess at their mapping. If you’ve renamed any properties in your entity you need to fix them up manually. Also, map the CustomerId field in the database to the Id property on the entity. The result is a mapping that looks like this;

Finally, delete the CustomerAddress entity from the designer (which just deletes it from the Conceptual schema, not the Storage schema – which is good because we need to table still!). In deleting this entity, the navigation property named CustomerAddress that was on Customer disappears, completing the tidy-up of our model.
... and we’re finished!
A Quick Query
Just to prove it, a nice simple query like this now works;
var customersAndAddresses = from c in db.Customer
where c.Name.StartsWith("Simon")
select c;
var result = customersAndAddresses.First();
string address = String.Format("{0}\n{1}\n{2} {3}",
result.Name,
result.Line1,
result.Line2,
result.PostCode);
Using SQL Profiler to see what’s going on shows that the Entity Framework is doing the join for us, just as we would expect;

Good huh?
Hope that helps!
I’ve been working with a few colleagues over the past months to come up with something we’ve called the “Development Foundation Assessment” (DFA). It’s a quick test that helps us understand where a customer is strong, and where they could benefit from our help – and therefore helps feed into our ongoing consulting service delivery planning. It will be in the form of a simple WPF-based Windows application deployed to our consultants’ laptops. If you already work with the ADC team speak to your consultant and they will let you know when it is ready for you to try; or if you don’t have a contract with us and are interested, head over to our landing page or drop me an email.
The geek in me also thinks this is quite fun. I keep running through it for projects I’ve been involved in or run in the past, and I think the results are really interesting. I particularly like the visual output.
One of these colleagues is the talented Mr Twist, who has knocked us up a pretty fantastic looking version of the DFA test that runs on a Microsoft Surface. Check out his blog post about it here, which includes a video of the test in action. He also explains some of our reasoning behind the DFA test, how we came up with the questions, and more.
And yes... I’ll be one of the ADCs at DevWeek (for one of the days at least), so you can come say hello and see our preview of this app all at once.