
Departing Microsoft...

It's taken a while to reach this decision, but Friday will be my last day at Microsoft.  I'll be taking a week off (imagine all the housework I can get done...) and starting work at a company where I expect to be working on a lot of R&D and prototypes.  It should be interesting, and I hope to be able to play with the bits I spent the last couple years of my life working on.

As I mentioned in my last post, I'm focussing any blogging efforts at http://www.thuban.org.  Maybe I'll have something to write about during my week off.

Migrating

With the release of LINQ, the ASP.NET MVC Preview, and Server 2008 with IIS7, I've finally gotten over the web-developer's block that has prevented me from porting my old website off of Zope.

So I'll be posting to my revised website from now on.  It'll let me play around more with the new toys, including Silverlight.

Where(prototype)

Just something I decided to try cobbling together tonight.  Didn't take more than a few minutes, trying to type quietly (didn't want to wake anyone).

It allows you to do things like:

        var q = db.Customers.Where(new Customer { CustomerID = "EASTC" });

Basically, it's a where-by-prototype. One obvious function, as seen above, is to get a LINQ to SQL entity by ID.

Here's the code:

        // Requires: System, System.Linq, System.Linq.Expressions,
        // System.Reflection and System.Data.Linq.Mapping (for ColumnAttribute).
        // As an extension method, it must live in a static class.
        public static IQueryable<TEntity> Where<TEntity>(
            this IQueryable<TEntity> seq, TEntity entity)
        {
            IQueryable<TEntity> result = seq;

            // Only consider fields and properties mapped with [Column].
            var memberTypes = new[] { MemberTypes.Property, MemberTypes.Field };
            var members = typeof(TEntity).GetMembers()
                .Where(mi => memberTypes.Contains(mi.MemberType)
                    && mi.GetCustomAttributes(typeof(ColumnAttribute), true).Length > 0);

            foreach (var member in members)
            {
                var memberAsField = member as FieldInfo;
                var memberAsProperty = member as PropertyInfo;
                var isField = memberAsField != null;

                var memberValue = isField ? memberAsField.GetValue(entity)
                                          : memberAsProperty.GetValue(entity, null);
                var memberType = isField ? memberAsField.FieldType
                                         : memberAsProperty.PropertyType;

                // Skip members left at their default value (null for reference
                // types, default(T) for value types).  Use Equals here: == on
                // boxed values compares references and would never match.
                var defaultValue = memberType.IsValueType
                    ? Activator.CreateInstance(memberType)
                    : null;

                if (object.Equals(memberValue, defaultValue))
                {
                    continue;
                }

                // Build p => p.Member == value and AND it onto the query.
                var parameter = Expression.Parameter(typeof(TEntity), "p");
                var memberAccess = Expression.MakeMemberAccess(parameter, member);
                var value = Expression.Constant(memberValue, memberType);
                var equality = Expression.Equal(memberAccess, value);
                var predicate = Expression.Lambda<Func<TEntity, bool>>(equality, parameter);

                result = result.Where(predicate);
            }

            return result;
        }

Encapsulation and LINQ to SQL

In my last post I rendered some opinions on how to approach using LINQ to SQL in an encapsulated manner.  In response, several folks requested I put together some more concrete examples.  I have now done so.  Pardon the high code-to-text ratio: I'm not feeling particularly poetic at the moment.  I also just wrote this code up, so it hasn't had anything more than trivial testing.  But, as a place to get people started it's as good as any at the moment.

TableView

The TableView class encapsulates an IQueryable constructed by applying a predicate filter to a Table.  This enables the pattern of client-side row restriction by way of forcing WHERE constraints in the generated SQL.  For example, permitting the retrieval of only those customers residing in London:

var tableView = new TableView<Customer>(dataContext, c => c.City == "London");

It includes a subset of the functionality of Table, including Attach, Insert and Delete, which it forwards to the encapsulated table provided the entities in question pass the filter.

TableView.cs

using System;
using System.Collections.Generic;
using System.Data.Linq;
using System.Linq;
using System.Linq.Expressions;

namespace Thuban.Data.Linq
{
    public sealed class TableView<TEntity> : IQueryable<TEntity>, ITable
        where TEntity : class
    {
        IQueryable<TEntity> baseQuery;
        Table<TEntity> table;
        Func<TEntity, bool> predicate;

        public TableView(DataContext dataContext, Expression<Func<TEntity, bool>> predicate)
        {
            this.table = dataContext.GetTable<TEntity>();
            this.baseQuery = table.Where(predicate);
            this.predicate = predicate.Compile();
        }

        #region IEnumerable<TEntity> Members

        public IEnumerator<TEntity> GetEnumerator()
        {
            return this.baseQuery.GetEnumerator();
        }

        #endregion

        #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        #endregion

        #region PermissionChecks

        internal void PermissionCheck(Action<TEntity> action, TEntity entity)
        {
            if (predicate(entity))
            {
                action(entity);
            }
            else
            {
                throw new InvalidOperationException("No permission");
            }
        }

        internal void PermissionCheck(Action<TEntity, TEntity> action, TEntity entity1, TEntity entity2)
        {
            if (predicate(entity1) && predicate(entity2))
            {
                action(entity1, entity2);
            }
            else
            {
                throw new InvalidOperationException("No permission");
            }
        }

        internal TResult PermissionCheck<TResult>(Func<TEntity, TResult> function, TEntity entity)
        {
            if (predicate(entity))
            {
                return function(entity);
            }
            else
            {
                throw new InvalidOperationException("No permission");
            }
        }

        internal void PermissionCheck(Action<IEnumerable<TEntity>> action, IEnumerable<TEntity> entities)
        {
            if (entities.All(predicate))
            {
                action(entities);
            }
            else
            {
                throw new InvalidOperationException("No permission");
            }
        }
        
        #endregion

        #region FilteredTable<TEntity> Members

        public void Attach(TEntity entity, TEntity original)
        {
            PermissionCheck((x, y) => table.Attach(x, y), entity, original);
        }

        public void Attach(TEntity entity, bool asModified)
        {
            PermissionCheck(x => table.Attach(x, asModified), entity);
        }

        public void Attach(TEntity entity)
        {
            PermissionCheck(x => table.Attach(x), entity);
        }

        public void AttachAll(IEnumerable<TEntity> entities, bool asModified)
        {
            PermissionCheck(x => table.AttachAll(x, asModified), entities);
        }

        public void AttachAll(IEnumerable<TEntity> entities)
        {
            PermissionCheck(x => table.AttachAll(x), entities);
        }

        public void DeleteAllOnSubmit(IEnumerable<TEntity> entities)
        {
            PermissionCheck(x => table.DeleteAllOnSubmit(x), entities);
        }

        public void DeleteOnSubmit(TEntity entity)
        {
            PermissionCheck(x => table.DeleteOnSubmit(x), entity);
        }

        public ModifiedMemberInfo[] GetModifiedMembers(TEntity entity)
        {
            return PermissionCheck(x => table.GetModifiedMembers(x), entity);
        }

        public TEntity GetOriginalEntityState(TEntity entity)
        {
            return PermissionCheck(x => table.GetOriginalEntityState(x), entity);
        }

        public void InsertAllOnSubmit(IEnumerable<TEntity> entities)
        {
            PermissionCheck(x => table.InsertAllOnSubmit(x), entities);
        }

        public void InsertOnSubmit(TEntity entity)
        {
            PermissionCheck(x => table.InsertOnSubmit(x), entity);
        }

        #endregion

        #region ITable Members

        void ITable.Attach(object entity, object original)
        {
            this.Attach((TEntity)entity, (TEntity)original);
        }

        void ITable.Attach(object entity, bool asModified)
        {
            this.Attach((TEntity)entity, asModified);
        }

        void ITable.Attach(object entity)
        {
            this.Attach((TEntity)entity);
        }

        void ITable.AttachAll(System.Collections.IEnumerable entities, bool asModified)
        {
            this.AttachAll(entities.Cast<TEntity>(), asModified);
        }

        void ITable.AttachAll(System.Collections.IEnumerable entities)
        {
            this.AttachAll(entities.Cast<TEntity>());
        }

        DataContext ITable.Context
        {
            get { throw new InvalidOperationException("Access to the underlying context is not allowed."); }
        }

        void ITable.DeleteAllOnSubmit(System.Collections.IEnumerable entities)
        {
            this.DeleteAllOnSubmit(entities.Cast<TEntity>());
        }

        void ITable.DeleteOnSubmit(object entity)
        {
            this.DeleteOnSubmit((TEntity)entity);
        }

        ModifiedMemberInfo[] ITable.GetModifiedMembers(object entity)
        {
            return this.GetModifiedMembers((TEntity)entity);
        }

        object ITable.GetOriginalEntityState(object entity)
        {
            return this.GetOriginalEntityState((TEntity)entity);
        }

        void ITable.InsertAllOnSubmit(System.Collections.IEnumerable entities)
        {
            this.InsertAllOnSubmit(entities.Cast<TEntity>());
        }

        void ITable.InsertOnSubmit(object entity)
        {
            this.InsertOnSubmit((TEntity)entity);
        }

        bool ITable.IsReadOnly
        {
            get { return table.IsReadOnly; }
        }

        #endregion

        #region IQueryable Members

        Type IQueryable.ElementType
        {
            get { return baseQuery.ElementType; }
        }

        Expression IQueryable.Expression
        {
            get { return baseQuery.Expression; }
        }

        IQueryProvider IQueryable.Provider
        {
            get { return baseQuery.Provider; }
        }

        #endregion
    }
}
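
To see the filter-plus-permission-check idea in isolation, here's a minimal in-memory sketch.  It is not TableView itself: FilteredList, the Customer class and FilteredListDemo are hypothetical names made up for this illustration, and a plain List<T> stands in for the LINQ to SQL table.  The point is only that queries composed on the view always carry the filter, and that writes failing the filter are rejected.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory analogue of TableView: a list plus a mandatory filter.
public sealed class FilteredList<T> : IQueryable<T>
{
    readonly List<T> items;
    readonly IQueryable<T> baseQuery;
    readonly Func<T, bool> predicate;

    public FilteredList(List<T> items, Expression<Func<T, bool>> predicate)
    {
        this.items = items;
        // Every query composed on this view starts from the filtered query.
        this.baseQuery = items.AsQueryable().Where(predicate);
        // The compiled form is used to check entities handed to us for writes.
        this.predicate = predicate.Compile();
    }

    // Writes are forwarded only when the item passes the filter.
    public void Add(T item)
    {
        if (!predicate(item))
        {
            throw new InvalidOperationException("No permission");
        }

        items.Add(item);
    }

    public IEnumerator<T> GetEnumerator() { return baseQuery.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // Delegating these three members is what makes later Where/Select calls
    // build on top of the filtered query instead of the raw list.
    Type IQueryable.ElementType { get { return baseQuery.ElementType; } }
    Expression IQueryable.Expression { get { return baseQuery.Expression; } }
    IQueryProvider IQueryable.Provider { get { return baseQuery.Provider; } }
}

// Hypothetical entity, for illustration only.
public sealed class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class FilteredListDemo
{
    public static FilteredList<Customer> CreateLondonView()
    {
        var backing = new List<Customer>
        {
            new Customer { Name = "Bob", City = "London" },
            new Customer { Name = "Bea", City = "Paris" },
            new Customer { Name = "Al", City = "London" },
        };

        return new FilteredList<Customer>(backing, c => c.City == "London");
    }
}
```

With this, FilteredListDemo.CreateLondonView().Where(c => c.Name.StartsWith("B")) yields only Bob (Bea is filtered out by the view), and calling Add with a customer whose City is "Paris" throws InvalidOperationException.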

DataContextWrapper

The other type I've worked on is the abstract class, DataContextWrapper.  As TableView does for Table, it encapsulates and provides a subset of the original's function.  In particular, it includes a hook for table initialization (not entirely hashed out, to be sure -- I haven't added support for UDFs).  I've surfaced some members of the underlying data context where I thought (quickly) that it made sense to do so, such as CommandTimeout, SubmitChanges, and strongly-typed versions of Refresh; I omitted Log since it's primarily for debugging, and not necessarily appropriate for end-user consumption in this situation.  I've also implemented the dispose pattern, and provided a helper method to create a new TableView, for use in the GetQuery(type) initialization hook.  Of the two -- TableView and DataContextWrapper -- this is the more primitive.

DataContextWrapper.cs

using System;
using System.Collections.Generic;
using System.Data.Linq;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

namespace Thuban.Data.Linq
{
    public abstract class DataContextWrapper : IDisposable
    {
        protected DataContext DataContext { get; private set; }
        protected bool IsDisposed { get; private set; }

        Dictionary<Type, object> tables = new Dictionary<Type, object>();

        protected DataContextWrapper(DataContext dataContext)
        {
            this.DataContext = dataContext;

            this.InitializeTables();
        }

        void InitializeTables()
        {
            CheckDisposed();

            var filteredTableFields =
                this.GetType()
                .GetFields(BindingFlags.Public | BindingFlags.Instance)
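                // Check IsGenericType first: GetGenericTypeDefinition throws on non-generic types.
                .Where(fi => fi.FieldType.IsGenericType
                          && fi.FieldType.GetGenericTypeDefinition() == typeof(TableView<>));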

            foreach (var filteredTableField in filteredTableFields)
            {
                var type = filteredTableField.FieldType.GetGenericArguments()[0];

                var getTable = this.GetType().GetMethod("GetTable").MakeGenericMethod(type);

                var table = getTable.Invoke(this, null);

                filteredTableField.SetValue(this, table);
            }
        }

        #region Abstract Methods

        protected abstract IQueryable GetQuery(Type entityType);

        #endregion

        #region Helpers

        protected TableView<TEntity> CreateTableView<TEntity>(Expression<Func<TEntity, bool>> predicate)
            where TEntity : class
        {
            return new TableView<TEntity>(this.DataContext, predicate);
        }

        #endregion

        #region DataContext Analogues

        public int CommandTimeout
        {
            get
            {
                return this.DataContext.CommandTimeout;
            }
            set
            {
                this.DataContext.CommandTimeout = value;
            }
        }

        public void SubmitChanges()
        {
            this.DataContext.SubmitChanges();
        }

        public void SubmitChanges(ConflictMode conflictMode)
        {
            this.DataContext.SubmitChanges(conflictMode);
        }

        public TableView<TEntity> GetTable<TEntity>()
            where TEntity : class
        {
            CheckDisposed();

            if (tables.ContainsKey(typeof(TEntity)))
            {
                return (TableView<TEntity>)tables[typeof(TEntity)];
            }

            var query = this.GetQuery(typeof(TEntity));

            if (query == null)
            {
                throw new InvalidOperationException("Table of type '" + typeof(TEntity).Name + "' does not have a defined query.");
            }
            else
            {
                tables.Add(typeof(TEntity), (TableView<TEntity>)query);

                return (TableView<TEntity>)tables[typeof(TEntity)];
            }
        }

        public void Refresh<TEntity>(RefreshMode refreshMode, TEntity entity)
            where TEntity : class
        {
            this.GetTable<TEntity>().PermissionCheck(x => this.DataContext.Refresh(refreshMode, x), entity);
        }

        public void Refresh<TEntity>(RefreshMode refreshMode, IEnumerable<TEntity> entities)
            where TEntity : class
        {
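            // Forward to the underlying context; calling this.Refresh here would bind back to this same overload.
            this.GetTable<TEntity>().PermissionCheck(x => this.DataContext.Refresh(refreshMode, x), entities);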
        }

        public void Refresh<TEntity>(RefreshMode refreshMode, params TEntity[] entities)
            where TEntity : class
        {
            this.Refresh(refreshMode, (IEnumerable<TEntity>)entities);
        }

        #endregion

        #region IDisposable Members

        protected void CheckDisposed()
        {
            if (this.IsDisposed)
            {
                throw new ObjectDisposedException("DataContextWrapper");
            }
        }

        protected virtual void Dispose(bool disposing)
        {
            if (!this.IsDisposed && disposing)
            {
                if (this.DataContext != null)
                {
                    this.DataContext.Dispose();

                    this.IsDisposed = true;
                }
            }
        }

        public void Dispose()
        {
            this.Dispose(true);

            // Suppress finalization of this disposed instance.
            GC.SuppressFinalize(this);
        }

        ~DataContextWrapper()
        {
            Dispose(false);
        }

        #endregion
    }
}

Example

So how are these intended to be used?  Here's an example DataContextWrapper implementation:

using System;
using System.IO;
using System.Linq;
using Northwind;
using Thuban.Data.Linq;

namespace Sample
{
    sealed class NorthwindWrapper: DataContextWrapper
    {
        public TableView<Customer> Customers;

        public NorthwindWrapper()
            : base(new NorthwindDataContext())
        {
        }

        public TextWriter Log
        {
            get
            {
                return this.DataContext.Log;
            }

            set
            {
                this.DataContext.Log = value;
            }
        }

        protected override IQueryable GetQuery(Type entityType)
        {
            if (entityType == typeof(Customer))
            {
                return this.CreateTableView<Customer>(c => c.City == "London");
            }
            else
            {
                return null;
            }
        }
    }
}

Like the familiar DataContext, you subclass it and add public fields to represent the tables, which are populated during initialization.  The DataContextWrapper has only one constructor, which takes the DataContext instance to wrap, and so we forward our default constructor to that.  We add a Log property, since this is just a sample, but the most important item is the (required) implementation of GetQuery.

GetQuery accepts the entity type that is being requested, and returns the IQueryable (i.e., the TableView) that is appropriate.  The interaction here's a bit rough, so I may change it at some point, but it does what it's supposed to for now.  Maybe I'll just switch it to use the dictionary directly.  Remember -- this is just a quick sketch of one approach.

As you can see in the following, you use it as you would use DataContext normally:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Sample
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var db = new NorthwindWrapper() { Log = Console.Out })
            {

                var q = from c in db.Customers
                        where c.ContactName.StartsWith("B")
                        select c;

                foreach (var item in q)
                {
                    Console.WriteLine("Name = {0}; City = {1}", item.ContactName, item.City);
                }
            }

            Console.ReadKey(true);
        }
    }
}

And it will produce the following output:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle],
       [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country],
       [t0].[Phone], [t0].[Fax] 
FROM [dbo].[Customers] AS [t0] 
WHERE ([t0].[ContactName] LIKE @p0) AND ([t0].[City] = @p1)
-- @p0: Input NVarChar (Size = 2; Prec = 0; Scale = 0) [B%] 
-- @p1: Input NVarChar (Size = 6; Prec = 0; Scale = 0) [London] 
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.7

As you can see, the restriction on City was added automatically.

Future Considerations

This approach does not address the must-not-be-able-to-access-information-even-through-reflection requirement.  Reflection is powerful -- if you wanted to, you could access the internals of the change tracker.  For those cases -- where information is secured by its absence, rather than secured by hiding it in the closet -- I'd suggest enforcing it at the server.  Otherwise, of course, what is to prevent someone from using ADO.NET?

Also, don't fall into the trap of thinking there's only one possible data context for a given database.  Create multiple ones when it makes sense:  an administrative context with everything, a narrowly-focused context for plug-in modules, and so forth.  The context isn't so much the database made manifest as it is just the limit of LINQ to SQL's view.  Anything outside that view just isn't known about -- it can't be queried, and it can't be meddled with.  Provided you encapsulate your data context, you can prevent gratuitous GetTable calls on newly-mapped types.

LINQ and 3-Tier Dogma

One of the most frequent questions we've received about LINQ to SQL deals with fitting it into the classic three-tiered scenario.  That is:

Presentation --- Logic --- Data

I know someone somewhere is going to accuse me of heresy for what I'm about to state, but it's something I've wanted to get off my chest for a couple years, since I first encountered LINQ to SQL, and started to realize what it meant...

The 3-Tier Model is just a pattern. 

And, like all patterns,

It is meant to serve our needs, not the other way around. 

Don't get hung up on academic concerns such as how many tiers you have in your system.  You'll just waste the time you could spend doing things you actually need to do.  Such as watching Avatar or playing Warcraft.

There, I said it.

And saying that, I think LINQ to SQL fits perfectly well into the 3-Tier Model.

The spirit of the 3TM is that, in keeping your layers separate, you achieve the ability to swap out data access, business logic, or presentation interfaces without disturbing the other two (or the functionality that depends on them).  This, without dispute, is a useful thing.  But there are various levels of purity that people seem to think apply:

  • contractual purity (there is a set of well-known interfaces between the tiers)
  • binary purity (the types are completely separate in each tier, *and* there is a contract between each)

Frequently, I've seen people get hung up on the idea of binary purity.  I'll say it:  I don't think it honestly matters.  Within the scope of a single project (and I'll submit that a web service and a web client are in fact two different projects), there is no gain in creating parallel sets of objects just for the sake of doing so.  That is, there is no longer a compellingly good reason, and in fact I now consider it questionable design, in having "ProductDataObject" and "ProductLogicalObject" in the same project, when the shapes of the two are the same.  You'll spend all your time copying property values back and forth, your references won't track well, and you'll forgo any change- or identity- tracking that the data layer happens to give you. 

Don't underestimate the value of tracking features in a data layer.

So, let's just ditch the idea of binary purity for purity's sake; its only real use is when transmitting data packets to foreign systems.  Instead, let's talk about contractual purity.

Contractual purity is great.  I like it a lot, and I think this is where the true heart of n-tier modelling is to be found.  The only question is how you define your contracts.

Really, that's up to you.

Consider LINQ to SQL for a moment.  LINQ to SQL was designed for those situations where the schema of your database matches the shape of data in what we'll loosely call your "logic" or "business" layer.  (Loosely, because there can be many layers to this, or none at all for a dumb reporting app.)  That is:

Presentation --- Logic --- Data
A --- B --- B

LINQ to SQL presents us with this:

A --- B === B

LINQ to SQL's already encapsulated your database access code (that is why it was written); it's all contained in System.Data.Linq.dll and your mapping metadata.  It also already talks in terms of your logical model, and that's fine because we already said that was what LINQ to SQL was designed to do.

That leaves us with the question of what we use for presentation.  Again, that's entirely up to you.  If your project is such that it doesn't require formal contracts, and if the data being displayed has the same shape as what's available, why *not* use your logical model directly?

ui.DataSource = db.Customers;

B === B === B

More often, of course, you'll be displaying composite data, and the shape of that will be somewhat different from your logical data.  Fine.  We can do that:  project into anonymous types (e.g., if you don't need to access the properties directly -- if your UI does reflection), or project into a contractually-defined type, and return that:

ui.DataSource = db.Customers.Select(c => new { c.LastName, c.FirstName });
ui.DataSource = db.Customers.Select(c => new CustomerName { Last = c.LastName, First = c.FirstName });

A --- B === B

Ah, you say.  You feel it's sinful to even expose the tables on your data context.  Okay, fine.  Remove them.  In fact,

Encapsulate your data context.

That is,

public class MyLogicLayer: IDisposable // MyLogicLayer is IDisposable because MyDataContext is IDisposable
{
    // We retain an instance of the context.  It's not static, because contexts are designed to be short-lived.
    MyDataContext db = new MyDataContext();

    public void Dispose()
    {
        db.Dispose();
    }

    // We want to update data, right?
    public void SubmitChanges()
    {
        db.SubmitChanges();
    }

    // I prefer IQueryable, because I want my filtering and sorting to happen at the server, not the client.
    // That is, I want to write:  logicLayer.GetCustomersInWA().OrderBy(c => c.Age)
    public IQueryable<Customer> GetCustomersInWA()
    {
        return db.Customers.Where(c => c.State == "WA").OrderBy(c => c.LastName).ThenBy(c => c.FirstName);
    }

    // TVFs are a much more useful alternative to sprocs.  In particular, they're composable.
    public IQueryable<Customer> TVF_GetCustomersInWA()
    {
        return db.GetCustomersInStateTVF("WA").OrderBy(c => c.LastName).ThenBy(c => c.FirstName);
    }

    // Of course, getting a single customer is useful
    public Customer GetCustomer(int id)
    {
        return db.Customers.Single(c => c.CustomerId == id);
    }

    // Maybe some admin doesn't want you to know about the customer's birthdate.  So make a trimmed-down ICustomer,
    // and return based on that interface.  (This is an alternative to the GetCustomer above, not an overload:
    // C# can't overload on return type alone, so a real class would keep only one of them.)
    public ICustomer GetCustomer(int id)
    {
        // where the interface is sufficient protection
        return db.Customers.Single(c => c.CustomerId == id);

        // or, where CustomerWithoutBirthdate implements ICustomer, and whose properties defer to the contained Customer
        // return db.Customers.Where(c => c.CustomerId == id).Select(c => new CustomerWithoutBirthdate(c)).Single();
    }
}

Which, interestingly, looks like:

A === [ A --- B ] === B
B === [ B === B ] === B
B' === [ B' === B ] === B

.. depending on which method you call.  In other words,

The encapsulation of the data context is the creation of the logical layer. 

I'm sure that in the upcoming years, there will be plenty of other ways people will find to adapt the traditional architectures to new innovations, and I'm certain there will be further innovations that make what I just proposed easier to swallow.  And that's what makes programming interesting.  If we have indeed solved all the problems and programming is perfect forever, where's the fun in that?

 

PS CodePlex

For some reason I decided to write some PowerShell functions to manage checking out projects from CodePlex, and updating them in batch.  It makes use of the CodePlex Client (cpc), which allows anonymous access to CodePlex's TFS repository.

They're fairly primitive, but seem to do the job.  You'll need to have cpc in your path, and you'll notice I've defined a drive 'CodePlex' at the repository root.  There's also a $CodePlexDirectory variable I'm setting in my profile which contains the path to where CodePlex: is rooted.

What I should do is install them on one of my servers at home and set update-codeplex on a timer to just automatically retrieve the latest changes.  Maybe even do a build.

function view-codeplex
{
    param([string] $project = $(throw "Provide a project name")) 

    $destination = 'http://www.codeplex.com/' + $project

    start $destination
}

function get-codeplex
{
    param([string] $project = $(throw "Provide a project name")) 

    $destination = (Join-Path $CodePlexDirectory $project)

    cpc checkout $project $destination

    push-location $destination

    cpc info

    pop-location
}

function update-codeplex
{
    param([string] $project) 

    $list = , $project

    if ($project -eq "")
    {
        $list = (get-childitem 'CodePlex:\' | foreach-object { $_.Name })
    }

    foreach ($item in $list)
    {
        push-location (Join-Path 'CodePlex:' $item)

        cpc info
        cpc update

        pop-location
    }
}

Quantum Computer Demonstration

Tomorrow, D-Wave Systems will either become a laughing stock, or the state of computing will arguably have advanced by 20 years.

They plan to demonstrate a 16-qubit computer running two commercial apps simultaneously, and repeat the feat on Thursday.

Now if they can figure out how to get it to simultaneously compute every possible bug-free application, they could save us a lot of effort.

You can sign up for premium content on their website.

Multi-threaded Linq

I decided to play around with the idea of a multi-threaded Select extension.  This could be useful, for example, if you're creating a system that farms out work to various web services and have a limited number of connections to use.

Attached should be the results of my excursion.

While I've created multi-threaded applications before, I hadn't yet created one where it continuously tried to ensure that the maximum number of threads possible were used, while also ensuring that the sequence of results matched the sequence of inputs.  The basic algorithm I came up with is fairly straightforward:

  1. Fetch the enumerator of the input.
  2. Set lastMoveNextWasTrue to true, as an initial value.
  3. While lastMoveNextWasTrue:
    1. While there are spare threads and buffer space:
      1. Set lastMoveNextWasTrue to enumerator.MoveNext()
      2. If lastMoveNextWasTrue, then create a new BackgroundWorker and results object, and store them off.  Build the arguments, and run the worker.  Else break.
    2. Wait for at least one worker to complete.
    3. While true:
      1. Remove completed workers.
      2. While the head of the buffer contains results from completed workers, yield them.
      3. If lastMoveNextWasTrue was false (no more items to consume) and there are still workers running, continue at (3).  Else break.

A similar logic can be had for the ForEach case (takes an Action<T>).

The results show that this sort of approach is worthwhile when the processing is IO-bound (simulated here with Thread.Sleep(1)):  with 5 threads and 3 maximum pending results, I get roughly 2x performance.  If instead I'm calculating factorials -- CPU-bound -- the overhead involved makes this just a little slower.
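
For readers without the attachment, here is a minimal sketch of the order-preserving idea.  It is a simplification, not the attached code: it uses Task.Run on the thread pool instead of BackgroundWorker, and a single maxPending window in place of separate thread and buffer limits.  The names OrderedParallelSelect, SelectOrdered and maxPending are assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class OrderedParallelSelect
{
    // Hypothetical helper: keeps at most maxPending selector calls in flight
    // and yields their results in the same order as the inputs.
    public static IEnumerable<TResult> SelectOrdered<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector,
        int maxPending)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");
        if (maxPending < 1) throw new ArgumentOutOfRangeException("maxPending");

        return Iterate(source, selector, maxPending);
    }

    static IEnumerable<TResult> Iterate<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, TResult> selector,
        int maxPending)
    {
        // Tasks are queued in input order, so dequeuing preserves that order.
        var pending = new Queue<Task<TResult>>(maxPending);

        foreach (var item in source)
        {
            // Window full: block on the oldest task before starting another.
            if (pending.Count == maxPending)
            {
                yield return pending.Dequeue().GetAwaiter().GetResult();
            }

            var captured = item;
            pending.Enqueue(Task.Run(() => selector(captured)));
        }

        // Input exhausted: drain whatever is still running, oldest first.
        while (pending.Count > 0)
        {
            yield return pending.Dequeue().GetAwaiter().GetResult();
        }
    }
}
```

Something like urls.SelectOrdered(Fetch, 5), with a hypothetical Fetch function, would keep up to five calls outstanding while still returning results in input order.  Because it always waits on the oldest task, one slow item holds up the results behind it; that is the price of preserving order with a bounded buffer.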

 

Indexed Linq

Brian Beckman shows off a simple hash join in LINQ, using VB9.  The performance boosts reinforce some work my partner and I did, experimenting with load-time generated indexes over collections.

There was a significant hit incurred when creating the index, but it was assumed that you wouldn't bother unless it actually did any good.

The code below is somewhat naive, but shows one approach.  In this case, the index tracks several different features of a number:  even/odd, positive/negative, and the values modulo 10 and modulo 3.  The query is then over the index partitions, rather than the raw collection, and returns a new index containing just the partitions that matched.  Enumerating over an index yields the indexed items.
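
As a rough illustration of querying partitions rather than items, here's a small sketch built on the standard ToLookup operator.  It is not the Indexable class from this post and not the code behind the timings; the PartitionIndexSketch class and its (isEven, mod3) key are assumptions chosen to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionIndexSketch
{
    // Builds the index once: each number lands in the partition named by
    // its (isEven, value modulo 3) key.
    public static ILookup<Tuple<bool, int>, int> BuildIndex(IEnumerable<int> numbers)
    {
        return numbers.ToLookup(n => Tuple.Create(n % 2 == 0, n % 3));
    }

    // The predicate runs once per partition key instead of once per number;
    // matching partitions are then counted wholesale.
    public static int CountEvenMultiplesOfThree(ILookup<Tuple<bool, int>, int> index)
    {
        return index
            .Where(partition => partition.Key.Item1 && partition.Key.Item2 == 0)
            .Sum(partition => partition.Count());
    }
}
```

For Enumerable.Range(1, 30) the index has six partitions, and the query inspects only those six keys to count the five even multiples of three (6, 12, 18, 24, 30).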

On my system, I get times similar to these.  The qa* are queries over a List<double>, while qb* are queries over an Indexable<double, NumberIndex>.  The total range of doubles went from -1e7 to +1e7.

Time to create index = 21943ms
qa1: Count = 1000000; Time = 2131ms
qa2: Count = 9999999; Time = 1470ms
qa3: Count = 10000000; Time = 2396ms
qa4: Count = 333333; Time = 1965ms
qb1: Count = 1000000; Time = 87ms
qb2: Count = 9999999; Time = 616ms
qb3: Count = 10000000; Time = 812ms
qb4: Count = 333333; Time = 22ms

Code follows...

using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;

namespace IndexLinq
{
    public class Indexable<TItem, TKey> : IEnumerable<TItem>
    {
        internal Dictionary<TKey, List<TItem>> _index = new Dictionary<TKey, List<TItem>>();
        internal Func<TItem, TKey> _indexer;
        private int? _capacity = null;

        public Indexable(Func<TItem, TKey> indexer)
        {
            this._indexer = indexer;
        }

        public void AddRange(IEnumerable<TItem> source)
        {
            if (this._indexer != null)
            {
                Refresh(source);
            }
        }

        public void AddRange(IEnumerable<TItem> source, int capacity)
        {
            this._capacity = capacity;

            AddRange(source);
        }

        public void AddRange(IList<TItem> source)
        {
            AddRange(source, source.Count / 10);
        }

        public void Clear()
        {
            this._index.Clear();
        }

        private void Refresh(IEnumerable<TItem> items)
        {
            if (items == null || _indexer == null)
            {
                throw new InvalidOperationException("Unable to refresh when indexable does not have both a source and an indexer defined.");
            }

            foreach (TItem item in items)
            {
                TKey key = _indexer(item);

                if (!_index.ContainsKey(key))
                {
                    if (this._capacity.HasValue)
                    {
                        _index.Add(key, new List<TItem>(this._capacity.Value));
                    }
                    else
                    {
                        _index.Add(key, new List<TItem>());
                    }

                    Console.WriteLine("Added new key: {0}", key);
                }

                _index[key].Add(item);
            }
        }

        #region IEnumerable<TItem> Members

        public IEnumerator<TItem> GetEnumerator()
        {
            foreach (List<TItem> list in _index.Values)
            {
                foreach (TItem item in list)
                {
                    yield return item;
                }
            }
        }

        #endregion

        #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        #endregion
    }
}

using System;
using System.Collections.Generic;
using System.Text;

namespace IndexLinq
{
    public static class IndexLinqExtensions
    {
        public static Indexable<TItem, TKey> Where<TItem, TKey>(this Indexable<TItem, TKey> indexable, Predicate<TKey> predicate)
        {
            Indexable<TItem, TKey> result = new Indexable<TItem, TKey>(indexable._indexer);

            foreach (TKey key in indexable._index.Keys)
            {
                if (predicate(key))
                {
                    result._index.Add(key, indexable._index[key]);
                }
            }

            return result;
        }
    }
}

using System;
using System.Collections.Generic;
using System.Text;
using System.Query;
using System.Xml.XLinq;
using System.Data.DLinq;
using IndexLinq;
using System.Diagnostics;

namespace IndexLinqTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var numbers = new List<double>(Numbers(-1e7, 1e7));

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Reset();
            stopwatch.Start();

            var indexed = new Indexable<double, NumberIndex>(new Func<double, NumberIndex>(Indexer));
            indexed.AddRange(numbers);

            stopwatch.Stop();

            Console.WriteLine("Time to create index = {0}ms", stopwatch.ElapsedMilliseconds);

            // non-indexed

            var qa1 = from i in numbers where i % 10 == 1 select i; 
            var qa2 = from i in numbers where i > 0 select i;
            var qa3 = from i in numbers where i % 2 == 0 select i;
            var qa4 = from i in numbers where i % 10 == 1 && i % 3 == 0 select i;

            // indexed

            var qb1 = from i in indexed where i.Mod10 == 1 select i; 
            var qb2 = from i in indexed where i.IsPositive == true select i;            
            var qb3 = from i in indexed where i.IsEven == true select i;
            var qb4 = from i in indexed where i.Mod10 == 1 && i.Mod3 == 0 select i;

            var queries = new[] {
                new object[]{ qa1, "qa1" },
                new object[]{ qa2, "qa2" },
                new object[]{ qa3, "qa3" },
                new object[]{ qa4, "qa4" },
                new object[]{ qb1, "qb1" },
                new object[]{ qb2, "qb2" },
                new object[]{ qb3, "qb3" },
                new object[]{ qb4, "qb4" }
            };

            int count;
            
            foreach (object[] query in queries)
            {
                count = 0;

                stopwatch.Reset();
                stopwatch.Start();

                foreach (var i in query[0] as IEnumerable<double>)
                {
                    count++;
                }

                stopwatch.Stop();
                
                Console.WriteLine("{0}: Count = {1}; Time = {2}ms", 
                    query[1], 
                    count, 
                    stopwatch.ElapsedMilliseconds
                    );
            }

            Console.ReadKey(true);
        }

        public struct NumberIndex
        {
            public bool IsEven;
            public bool IsPositive;
            public double Mod10;
            public double Mod3;

            public override bool Equals(object obj)
            {
                if (!(obj is NumberIndex)) return false;

                NumberIndex numberIndex = (NumberIndex) obj;

                return this.IsEven == numberIndex.IsEven
                    && this.IsPositive == numberIndex.IsPositive
                    && this.Mod10 == numberIndex.Mod10
                    && this.Mod3 == numberIndex.Mod3;
            }

            public override int GetHashCode()
            {
                return (int) (this.Mod10 * 1000 + (this.Mod3 * 100) + (this.IsEven ? 10 : 0) + (this.IsPositive ? 1 : 0));
            }

            public override string ToString()
            {
                return String.Format(@"[Mod10 = {0}; Mod3 = {3}; IsEven = {1}; IsPositive = {2}]", this.Mod10, this.IsEven, this.IsPositive, this.Mod3);
            }
        }

        static NumberIndex Indexer(double i)
        {
            NumberIndex result = new NumberIndex();

            result.IsEven = (i % 2 == 0);
            result.IsPositive = (i > 0);
            result.Mod10 = (i % 10);
            result.Mod3 = (i % 3);

            return result;
        }

        static IEnumerable<double> Numbers(double start, double end)
        {
            if (start > end)
            {
                double x = end;
                end = start;
                start = x;
            }

            for (double i = start; i < end; i++)
            {
                yield return i;
            }
        }
    }
}
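
For comparison, the same partition-then-filter idea can be sketched with the standard operators in current versions of LINQ. This is not the code that was timed above. It assumes a modern C# compiler (value tuples, System.Linq), uses ints instead of doubles, and tracks only two features to keep it short; the names LookupIndexSketch, BuildIndex and WhereKey are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of an index built on ILookup, not the Indexable class above.
public static class LookupIndexSketch
{
    // Build the index once: group the items by the features we expect to query on.
    public static ILookup<(bool IsPositive, int Mod10), int> BuildIndex(IEnumerable<int> numbers)
    {
        return numbers.ToLookup(n => (IsPositive: n > 0, Mod10: n % 10));
    }

    // Test the predicate once per partition key, then enumerate only the
    // items in the partitions whose key matched.
    public static IEnumerable<int> WhereKey(
        ILookup<(bool IsPositive, int Mod10), int> index,
        Func<(bool IsPositive, int Mod10), bool> keyPredicate)
    {
        return index.Where(partition => keyPredicate(partition.Key))
                    .SelectMany(partition => partition);
    }
}
```

A call such as LookupIndexSketch.WhereKey(index, k => k.Mod10 == 1) evaluates the predicate once per distinct key rather than once per item, which is where the qb* queries above get their speed.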

ChemLinq

A customer posed a good question to me the other day:  outside of relational databases, what good is LINQ?  As an example, he suggested a chemistry drawing application.

I couldn't help him directly on the GUI aspects, but I decided to figure out how to find patterns in the structure, given a graph of atoms and bonds representing a molecule.  In the specific example, I decided to locate hydroxyl groups (-OH).

The query boils down to this:  find the hydrogen atoms with a single bond, which goes to an oxygen atom, which has only one other bond not going to the original hydrogen.

The following is a working example, and compiles against the CTP release of LINQ. Obviously, with some more thought put into the chemistry object model, the code could be simplified, and potentially more sophisticated queries executed.  One early (and as yet untested) speculation had me writing something like:

from atom in molecule.Atoms
where atom.Element == Element.Hydrogen
&& atom.Bonds.Count == 1
from atom2 in atom.Bonds[0].FindAtom(a => a != atom && a.Element == Element.Oxygen)
where atom2.FindBonds(b => b != atom.Bonds[0]).Count == 1
select atom2.FindBonds(b => b != atom.Bonds[0])[0].FindAtom(a => a != atom2)

Code follows:

using System;
using System.Collections.Generic;
using System.Text;
using System.Query;
using System.Xml.XLinq;
using System.Data.DLinq;
using System.Collections.ObjectModel;

namespace ChemLinq
{
    class Program
    {
        enum Element
        {
            Hydrogen = 1,   // values are atomic numbers
            Oxygen = 8,
            Carbon = 6
        }

        class Atom
        {
            public Element Element;

            public Collection<Bond> Bonds;

            public Atom(Element element)
            {
                this.Element = element;
                Bonds = new Collection<Bond>();
            }
        }

        class Bond
        {
            public Collection<Atom> Atoms;

            public Bond(params Atom[] atoms)
            {
                Atoms = new Collection<Atom>(atoms);

                foreach (Atom a in atoms)
                {
                    a.Bonds.Add(this);
                }
            }
        }

        static void Main(string[] args)
        {
            // create Carbon Ring -- a MoleculeBuilder class would be useful here

            var c1 = new Atom(Element.Carbon);
            var c2 = new Atom(Element.Carbon);
            var c3 = new Atom(Element.Carbon);
            var c4 = new Atom(Element.Carbon);
            var c5 = new Atom(Element.Carbon);
            var c6 = new Atom(Element.Carbon);

            var bond12 = new Bond(c1, c2);
            var bond23 = new Bond(c2, c3);
            var bond34 = new Bond(c3, c4);
            var bond45 = new Bond(c4, c5);
            var bond56 = new Bond(c5, c6);
            var bond61 = new Bond(c6, c1);

            // create hydroxyl groups

            var o1 = new Atom(Element.Oxygen);
            var o3 = new Atom(Element.Oxygen);
            var o5 = new Atom(Element.Oxygen);

            var h1 = new Atom(Element.Hydrogen);
            var h3 = new Atom(Element.Hydrogen);
            var h5 = new Atom(Element.Hydrogen);

            var bondOh1 = new Bond(o1, h1);
            var bondOh3 = new Bond(o3, h3);
            var bondOh5 = new Bond(o5, h5);

            // bond the hydroxyl groups to the ring

            var bond1 = new Bond(o1, c1);
            var bond3 = new Bond(o3, c3);
            var bond5 = new Bond(o5, c5);

            // bundle these into a molecule
            var molecule = new[] { c1, c2, c3, c4, c5, c6, o1, o3, o5, h1, h3, h5 };

            // query the structure to locate the hydroxyls

            // this could be cleaned up with some appropriate helper methods
            // on Atom and Bond

            var query = 
                from hAtom in molecule                          // search the molecule
                where hAtom.Element == Element.Hydrogen         // for hydrogen atoms
                && hAtom.Bonds.Count == 1                       // with a single bond;
                    from hBond in hAtom.Bonds                   // search those bonds
                    where hBond.Atoms.Count == 2                // which have only 2 atoms
                        from oAtom in hBond.Atoms               // and find those where
                        where oAtom.Element == Element.Oxygen   // the other atom is oxygen
                        && oAtom.Bonds.Count == 2               // and which has only 2 bonds;
                            from oBond in oAtom.Bonds           // then take the bonds
                            where oBond.Atoms.Count == 2        // that go to only 1 other atom
                            && !oBond.Atoms.Contains(hAtom)     // but not to the original hAtom
                            select oAtom;                       // and select that oxygen atom


            Console.WriteLine(query.Count()); // outputs 3

            Console.ReadKey(true);
        }
    }
}
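
As a sketch of the cleanup hinted at in the comments above, here is one way the query might look with a simpler object model. It is an assumption-laden variation rather than the original design: it drops the Bond class in favour of a hypothetical Neighbors list on Atom, targets current C# with System.Linq instead of the CTP, and the names HydroxylQuery, FindHydroxylOxygens and BuildRingWithHydroxyls are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Element { Hydrogen = 1, Carbon = 6, Oxygen = 8 }

public class Atom
{
    public Element Element { get; }

    // Atoms directly bonded to this one (a simplification: no separate Bond class).
    public List<Atom> Neighbors { get; } = new List<Atom>();

    public Atom(Element element) { Element = element; }

    // Hypothetical helper: record a bond in both directions.
    public static void Bond(Atom a, Atom b)
    {
        a.Neighbors.Add(b);
        b.Neighbors.Add(a);
    }
}

public static class HydroxylQuery
{
    // Hydrogens with a single neighbour, where that neighbour is an oxygen with
    // exactly two neighbours (the hydrogen itself plus one other atom).
    public static IEnumerable<Atom> FindHydroxylOxygens(IEnumerable<Atom> molecule)
    {
        return from h in molecule
               where h.Element == Element.Hydrogen && h.Neighbors.Count == 1
               let o = h.Neighbors[0]
               where o.Element == Element.Oxygen && o.Neighbors.Count == 2
               select o;
    }

    // Rebuilds the test molecule from the listing above: a six-carbon ring
    // with an -OH attached to every other carbon.
    public static List<Atom> BuildRingWithHydroxyls()
    {
        var carbons = Enumerable.Range(0, 6).Select(_ => new Atom(Element.Carbon)).ToList();
        for (int i = 0; i < 6; i++)
        {
            Atom.Bond(carbons[i], carbons[(i + 1) % 6]);
        }

        var molecule = new List<Atom>(carbons);
        foreach (int i in new[] { 0, 2, 4 })
        {
            var o = new Atom(Element.Oxygen);
            var h = new Atom(Element.Hydrogen);
            Atom.Bond(o, h);
            Atom.Bond(o, carbons[i]);
            molecule.Add(o);
            molecule.Add(h);
        }
        return molecule;
    }
}
```

Calling HydroxylQuery.FindHydroxylOxygens(HydroxylQuery.BuildRingWithHydroxyls()).Count() gives 3, the same answer as the longer query.  Like the original, this version would also match the oxygen in a water molecule (once per hydrogen), so a stricter pattern would need an extra condition on the oxygen's other neighbour.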
 