In the Orcas timeframe, Microsoft will be supplying a couple of specialized flavors of Linq to address common data access scenarios.  DLinq covers SQL servers and XLinq handles XML documents, but what about the countless other data sources out there that a user might want to interact with using Linq?  For some, you can simply gather your data into a CLR collection and make use of the default Linq experience.  For example, if you wanted to find new *.exe files on your hard drive, you might use Linq to do something like this:

Dim newExe = From fileName In Directory.GetFiles( _
                 My.Computer.FileSystem.SpecialDirectories.MyDocuments, _
                 "*.exe", SearchOption.AllDirectories) _
             Where (New FileInfo(fileName)).CreationTime > #6/30/2007# _
             Select fileName

This is pretty cool, but it’s also very simple and doesn’t make for a very interesting blog post, so let’s go ahead and complicate things…

For many backend data sources out there in the IT world today, there exist query APIs and object models to represent the various “entities” found in the data.  A good example of what I’m talking about can be found in none other than the Windows file system.  Beyond using the methods in the .NET Framework to find files, Windows Desktop Search (available in Windows Vista or downloadable from http://www.microsoft.com/windows/desktopsearch/default.mspx) exposes an OLE DB Provider, allowing you to query its index.  Wouldn’t it be nice if we could somehow use Linq to query the index instead of having to type out SQL queries as Strings to send along to OLE DB?  Well, by creating a custom Linq provider, we can.

When it comes to creating a custom Linq provider, several informative blog posts exist on the Internet (see resources section), but I hope to detail a bit more of a practical “HowTo” on implementing a useful custom Linq provider.

Creating your own provider starts with implementing the IQueryable and IQueryProvider Interfaces.  Often, a custom object model exists for representing data in OO form.  For files on disk, we can use the FileInfo class.  Here’s the code we’ve got so far:

Imports System.IO

Public Class WDSQueryObject

    Implements IQueryable(Of FileInfo), IQueryProvider

End Class

If you type the above into VS, you’ll immediately notice that there are several methods on IQueryable and IQueryProvider that you must implement.  In the following post(s), I will detail each one.
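For orientation, here is a rough sketch of what the full set of members might look like once stubbed out. The class name WDSQueryStub, the member names QueryExpression, CreateQuery1, Execute1 and GetEnumeratorNonGeneric, and all of the placeholder bodies are assumptions made for illustration; only the interface members themselves come from System.Linq.

```vb
Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Linq.Expressions

' Hypothetical stub: every body below is a placeholder, not the real provider.
Public Class WDSQueryStub
    Implements IQueryable(Of FileInfo), IQueryProvider

    ' --- IQueryable members ---
    Public ReadOnly Property ElementType() As Type Implements IQueryable.ElementType
        Get
            Return GetType(FileInfo)
        End Get
    End Property

    Public ReadOnly Property QueryExpression() As System.Linq.Expressions.Expression Implements IQueryable.Expression
        Get
            ' The root of the query is simply a reference to this object.
            Return System.Linq.Expressions.Expression.Constant(Me)
        End Get
    End Property

    Public ReadOnly Property Provider() As IQueryProvider Implements IQueryable.Provider
        Get
            Return Me
        End Get
    End Property

    ' --- IEnumerable members ---
    Public Function GetEnumerator() As IEnumerator(Of FileInfo) Implements IEnumerable(Of FileInfo).GetEnumerator
        Throw New NotImplementedException()
    End Function

    Private Function GetEnumeratorNonGeneric() As IEnumerator Implements IEnumerable.GetEnumerator
        Return GetEnumerator()
    End Function

    ' --- IQueryProvider members ---
    Public Function CreateQuery(ByVal expression As Expression) As IQueryable Implements IQueryProvider.CreateQuery
        Throw New NotImplementedException()
    End Function

    Public Function CreateQuery1(Of TElement)(ByVal expression As Expression) As IQueryable(Of TElement) Implements IQueryProvider.CreateQuery
        Throw New NotImplementedException()
    End Function

    Public Function Execute(ByVal expression As Expression) As Object Implements IQueryProvider.Execute
        Throw New NotImplementedException()
    End Function

    Public Function Execute1(Of TResult)(ByVal expression As Expression) As TResult Implements IQueryProvider.Execute
        Throw New NotImplementedException()
    End Function
End Class

Module StubDemo
    Sub Main()
        Dim q As IQueryable = New WDSQueryStub()
        Console.WriteLine(q.ElementType.Name)
    End Sub
End Module
```

Because one class plays both roles here, Provider can just return Me; a provider split across two classes would return the separate provider object instead.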

NOTES:

  • The full source code for this project is available under the resources section.  It may be useful to download it and step through the code in a debugger as you read along.
  • All of my code samples are based on Orcas Beta 2.  The IQueryable interface has been refactored since the Beta 1 release.

 

CreateQuery

There are two CreateQuery methods on the IQueryProvider Interface.  One returns a generic IQueryable(Of TElement), and the other returns the non-generic IQueryable.  For most implementations, you can probably just have the non-generic one call the generic one:

    Public Function CreateQuery(ByVal expression As Expression) As IQueryable Implements IQueryProvider.CreateQuery

        Return CreateQuery1(Of FileInfo)(expression)

    End Function

For a simple query, CreateQuery will be called once for the “Where” clause and once for the “Select” clause.  For example, if we had a query like:

        Dim r = From file In index _
                Where file.Name Like "%.exe" _
                Select file.FullName

Then CreateQuery will be called once with the expression ‘file.Name Like "%.exe"’ and once with the expression ‘file.FullName’.  Here’s my skeleton implementation of CreateQuery to handle both cases:

    Public Function CreateQuery1(Of TElement)(ByVal expression As Expression) As IQueryable(Of TElement) Implements IQueryProvider.CreateQuery

        Dim querySource As IQueryable(Of TElement) = Nothing

 

        Dim nodeType = expression.NodeType

        Select Case nodeType

            Case ExpressionType.Call

                Dim m As MethodCallExpression = expression

                Dim methodName = m.Method.Name

                Select Case methodName

                    Case "Select"

                        ' insert Select processing code

                    Case "Where"

                        ' insert Where processing code

                    Case Else

                        Throw New NotSupportedException("Queries using '" & methodName & "' are not supported for this collection.")

                End Select

            Case Else

                Throw New NotSupportedException("Creating a query from an expression of type '" & nodeType.ToString() & "' is not supported.")

        End Select

 

        Return querySource

    End Function

You’ll notice that the expression we get in CreateQuery actually contains the information about who is calling us (i.e. Select, Where, etc), and we’ll use that information to process the rest of the query appropriately.  For the filesystem example, the “Where” clause is the most interesting, so we’ll discuss that first.  The signature for the Where extension method defined on Queryable is:

    Public Shared Function Where(Of TSource)( _
        ByVal source As IQueryable(Of TSource), _
        ByVal predicate As Expression(Of System.Func(Of TSource, Boolean)) _
    ) As System.Linq.IQueryable(Of TSource)

If you look at the details of the expression tree, you’ll see that all of the information regarding the above signature is encoded (method call to Where(Of FileInfo), value for source, and quoted lambda for predicate).

[Figure: Expression tree for Where]
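The shape described above can also be inspected in code. The sketch below uses an ordinary array wrapped with AsQueryable as a stand-in for the WDS query object (an assumption made purely so the snippet runs on its own), and the module and helper names are hypothetical.

```vb
Imports System
Imports System.Linq
Imports System.Linq.Expressions

Module WhereTreeDemo
    ' Summarizes the root method call of a query: its name and the node types of its two arguments.
    Function DescribeRootCall(ByVal query As IQueryable) As String
        Dim m As MethodCallExpression = CType(query.Expression, MethodCallExpression)
        Return m.Method.Name & "(" & m.Arguments(0).NodeType.ToString() & ", " & _
               m.Arguments(1).NodeType.ToString() & ")"
    End Function

    Sub Main()
        ' Stand-in data source; the real provider would be the WDS query object.
        Dim names As String() = New String() {"setup.exe", "notes.txt"}
        Dim query As IQueryable(Of String) = names.AsQueryable().Where(Function(n) n.EndsWith(".exe"))

        Console.WriteLine(DescribeRootCall(query))   ' Where(Constant, Quote)

        ' The quoted argument unwraps to the lambda that holds the filter.
        Dim m As MethodCallExpression = CType(query.Expression, MethodCallExpression)
        Dim quoted As UnaryExpression = CType(m.Arguments(1), UnaryExpression)
        Dim lambda As LambdaExpression = CType(quoted.Operand, LambdaExpression)
        Console.WriteLine(lambda.Parameters.Count)   ' 1
    End Sub
End Module
```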

Below is the ‘Where processing code’ I use to process the above expression.  Let me explain what’s going on and then we’ll dive into the implementation.

m_query = New StringBuilder()

m_funclets = New List(Of KeyValuePair(Of String, Func(Of String)))()

Dim lambda As LambdaExpression = CType(m.Arguments(1), UnaryExpression).Operand

ExpandExpression(lambda.Body)

m_query.Insert(0, "SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (")

m_query.Append(")")

querySource = Me

You can pretty much ignore the "SELECT" string in the above code for now.  It's simply a boilerplate Windows Desktop Search query with an incomplete "WHERE" clause (see links under resources section for more on querying WDS).  What we'll be doing here is simply filling in the "WHERE" clause.  The actual projection (SELECT) for our query will be handled in the second call to CreateQuery.  As you can see in the Expression tree above, the first argument to Where (the ConstantExpression) is simply a reference to us (i.e. whatever is returned by the implementor of IQueryable.Expression).  Since we are both the IQueryable and IQueryProvider implementor, we don’t need to worry about this argument.

The second argument (the quoted lambda expression) is interesting, because we need to translate it into an SQL string.  This is the heart of our IQueryable implementation: translating expressions from a Linq query into a set of instructions that can be used to retrieve data from the underlying data source.  In my implementation, this translation is handled by ExpandExpression.  It recursively traverses the expression tree and expands each node into an SQL string fragment.  After it returns, m_query will contain the string we need to plug into the "WHERE" clause above.  Here’s the code for ExpandExpression:

    Private Sub ExpandExpression(ByVal e As Expression)

        Select Case e.NodeType

            Case ExpressionType.And

                ExpandBinary(e, "AND")

            Case ExpressionType.Equal

                ExpandBinary(e, "=")

            Case ExpressionType.GreaterThan

                ExpandBinary(e, ">")

            Case ExpressionType.GreaterThanOrEqual

                ExpandBinary(e, ">=")

            Case ExpressionType.LessThan

                ExpandBinary(e, "<")

            Case ExpressionType.LessThanOrEqual

                ExpandBinary(e, "<=")

            Case ExpressionType.NotEqual

                ExpandBinary(e, "!=")

            Case ExpressionType.Not

                ExpandUnary(e, "NOT")

            Case ExpressionType.Or

                ExpandBinary(e, "OR")

            Case ExpressionType.Call

                ExpandCall(e)

            Case ExpressionType.MemberAccess

                ExpandMemberAccess(e)

            Case ExpressionType.Constant

                ExpandConstant(e)

            Case Else

                Throw New NotSupportedException("Expressions of type '" & e.NodeType.ToString() & "' are not supported.")

        End Select

    End Sub

You’ll see that we simply go through all the different expression tree nodes we want to support and call the appropriate processing method (note also that the implementation is incomplete, but we cover most of the common types of expressions).  Recursive processing continues until all the nodes in the expression tree have been evaluated.  Let’s have a look at a simple query and walk through the processing methods that will be called.  Given the following query:

        Dim index As New WDSQueryObject

        Dim cutoffDate = #6/28/2007#

 

        Dim r = From file In index _

                Where file.CreationTime > cutoffDate And _

                file.Name Like "%.exe" _

                Select file.FullName

The first method that will get called is ExpandBinary.  This, in turn, calls ConcatBinary and combines the left and right hand expressions using the appropriate operator (in this case, “AND”).

    Private Sub ExpandBinary(ByVal b As BinaryExpression, ByVal op As String)

        ConcatBinary(b.Left, b.Right, op)

    End Sub

 

    Private Sub ConcatBinary(ByVal left As Expression, ByVal right As Expression, ByVal op As String)

        ExpandExpression(left)

        m_query.Append(" ")

        m_query.Append(op)

        m_query.Append(" ")

        ExpandExpression(right)

    End Sub
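To see the recursion in isolation, here is a stripped-down, stand-alone sketch of the same idea. It is not the provider code: it returns strings instead of appending to m_query, it handles only a handful of node types, and the Item class, its Size field and the module name are all hypothetical.

```vb
Imports System
Imports System.Linq.Expressions

' Hypothetical element type used only for this illustration.
Public Class Item
    Public Size As Integer
End Class

Module MiniTranslator
    ' Recursively turns a small subset of expression nodes into SQL-like text.
    Function Translate(ByVal e As Expression) As String
        Select Case e.NodeType
            Case ExpressionType.And
                Return Binary(CType(e, BinaryExpression), "AND")
            Case ExpressionType.GreaterThan
                Return Binary(CType(e, BinaryExpression), ">")
            Case ExpressionType.LessThan
                Return Binary(CType(e, BinaryExpression), "<")
            Case ExpressionType.MemberAccess
                ' A real provider would map the member name to a data source attribute here.
                Return CType(e, MemberExpression).Member.Name
            Case ExpressionType.Constant
                Return CType(e, ConstantExpression).Value.ToString()
            Case Else
                Throw New NotSupportedException("Expressions of type '" & e.NodeType.ToString() & "' are not supported.")
        End Select
    End Function

    ' Left operand, operator, right operand: the same order ConcatBinary uses.
    Private Function Binary(ByVal b As BinaryExpression, ByVal op As String) As String
        Return Translate(b.Left) & " " & op & " " & Translate(b.Right)
    End Function

    Sub Main()
        Dim filter As Expression(Of Func(Of Item, Boolean)) = Function(i) i.Size > 10 And i.Size < 20
        Console.WriteLine(Translate(filter.Body))   ' Size > 10 AND Size < 20
    End Sub
End Module
```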

 

Processing the left hand side of the expression will end up calling ConcatBinary again (this time with the “>” operator) and will subsequently call ExpandMemberAccess.  This is where the interesting processing begins.

 

    Private Sub ExpandMemberAccess(ByVal m As MemberExpression)

        Dim member = m.Member

        Dim e = m.Expression

        Select Case e.NodeType

            Case ExpressionType.Parameter

                ' Parameter processing code

            Case ExpressionType.Constant

                ' Constant processing code

            Case Else

                Throw New NotSupportedException("Accessing member '" & member.Name & "' is not supported in this context.")

        End Select

    End Sub

 

The first block that we’re going to hit is the ‘Parameter processing code’.  In this context, ‘parameters’ are going to be the iteration variables of the query (the ‘file’ object).  What we need to do with that information is translate the property access on the FileInfo object (file.CreationTime) into a Windows filesystem attribute name.  Here’s the code I use to do that:

 

    Private Function GetAttributeName(ByVal m As MemberInfo) As String

        Dim name As String

 

        Dim memberName = m.Name

        Select Case memberName

            Case "CreationTime"

                name = "System.DateCreated"

            Case "Name"

                name = "System.FileName"

            Case Else

                Throw New NotSupportedException("Using the property '" & memberName & "' in filter expressions is not supported.")

        End Select

 

        Return name

    End Function

As before, the implementation is incomplete, but adding translations for more properties should be very straightforward.  A complete list of supported filesystem attributes can be found in the links under the resources section.  The next block of code we’re going to hit is the ‘Constant processing code’.  Here, we will need to interpret the access to the variable cutoffDate.  The code I use is as follows:

 

Dim valueName = "[value" & m_funclets.Count & "]"
Dim memberType = member.MemberType

' String and Date values must be quoted in the generated SQL
If m.Type Is GetType(String) OrElse m.Type Is GetType(Date) Then
    m_query.Append("'")
    m_query.Append(valueName)
    m_query.Append("'")
Else
    m_query.Append(valueName)
End If

' Build a funclet that reads the captured value when the query is executed
Dim funclet As Func(Of String) = Nothing
Select Case memberType
    Case MemberTypes.Field
        Dim f As FieldInfo = member
        Dim c As ConstantExpression = e
        If m.Type Is GetType(Date) Then
            funclet = Function() CDate(f.GetValue(c.Value)).ToString("yyyy-MM-dd")
        Else
            funclet = Function() CStr(f.GetValue(c.Value))
        End If
    Case Else
        Throw New NotSupportedException("Accessing member of type '" & memberType.ToString() & "' is not supported.")
End Select
m_funclets.Add(New KeyValuePair(Of String, Func(Of String))(valueName, funclet))

 

So what’s all this ‘funclet’ nonsense?  Well, the Linq architecture revolves around the concept of delayed execution.  In other words, I create the query at one point, but I don’t actually evaluate it (capture input values and query the underlying data source) until I start to use the query results.  Because of this, we want to capture the information about how to access the contents of cutoffDate, but we don’t want to store the value away just yet.  What I’m doing is placing a token ([value*]) in the query string and then creating a function that I can use to get the value of cutoffDate when the results of the query are accessed.

I create the function using a lambda expression.  This is basically a convenient way to create an inline, anonymous delegate in my code.  It also has the benefit of automatically creating a closure class to store all of the information about the variables I access in the current block.  For example, when I enter the ‘Case’ block, a new instance of a compiler-generated closure class is created, and the values for ‘f’ and ‘c’ are stored in it.  The compiler automatically translates these local variable accesses into field accesses on the appropriate members of the closure class.  When the query is executed, and I execute the funclets to replace the [value*] tokens, I will get the value of the variables at that point in the program’s execution (rather than at the point when the query is created).  You’ll notice that the MemberExpression for cutoffDate also represents a lifted local variable.  This is why the member type is ‘Field’.  Since cutoffDate is being used in a query, the value is actually being stored in a field on a closure class.
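Both halves of that explanation can be checked with a few lines of stand-alone code. This is only an illustrative sketch: the module name is hypothetical, and the invariant culture is my own addition so that the printed date does not depend on the machine's regional settings.

```vb
Imports System
Imports System.Globalization
Imports System.Linq.Expressions
Imports System.Reflection

Module FuncletDemo
    Sub Main()
        Dim cutoffDate As Date = #6/28/2007#

        ' The lambda captures the variable itself, not the value it holds right now.
        Dim funclet As Func(Of String) = _
            Function() cutoffDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)

        ' Change the variable after the funclet has been created...
        cutoffDate = #7/15/2007#

        ' ...and the funclet reports the value at the time it is invoked.
        Console.WriteLine(funclet())                    ' 2007-07-15

        ' In an expression tree, the lifted local appears as a field access
        ' on a constant (the closure instance).
        Dim tree As Expression(Of Func(Of Date)) = Function() cutoffDate
        Dim access As MemberExpression = CType(tree.Body, MemberExpression)
        Console.WriteLine(access.Member.MemberType)     ' Field
        Console.WriteLine(access.Expression.NodeType)   ' Constant
    End Sub
End Module
```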

 

The next expression that we’ll end up expanding is the one representing ‘file.Name Like "%.exe"’.  You might be surprised to find out that we process this node in ExpandCall rather than ExpandBinary.  As it turns out, the VB compiler translates several of the common binary operators into calls to VB runtime functions.  This allows VB to add extra functionality that is not supported by the CLR.  LikeString (generated for the VB ‘Like’ operator) and CompareString (generated for string comparison expressions like “a” = “A”) are examples of this behavior.  Here’s my implementation of ExpandCall that takes LikeString into account:

 

    Private Sub ExpandCall(ByVal m As MethodCallExpression, Optional ByVal op As String = "")

        Dim methodName = m.Method.Name

        Select Case methodName

            Case "LikeString"

                ConcatBinary(m.Arguments(0), m.Arguments(1), "LIKE")

            Case Else

                Throw New NotSupportedException("Using method '" & methodName & "' in a filter expression is not supported.")

        End Select

    End Sub
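If you want to confirm the LikeString translation for yourself, a small stand-alone check along these lines should do it. The module and function names are hypothetical, and the filter runs over a plain String parameter rather than a FileInfo to keep the sketch short.

```vb
Imports System
Imports System.Linq.Expressions

Module LikeDemo
    ' Returns the name of the runtime method the compiler emitted for the Like operator.
    Function LikeMethodName() As String
        Dim filter As Expression(Of Func(Of String, Boolean)) = Function(s) s Like "%.exe"
        Dim c As MethodCallExpression = CType(filter.Body, MethodCallExpression)
        Return c.Method.Name
    End Function

    Sub Main()
        Console.WriteLine(LikeMethodName())                 ' LikeString

        Dim filter As Expression(Of Func(Of String, Boolean)) = Function(s) s Like "%.exe"
        Dim c As MethodCallExpression = CType(filter.Body, MethodCallExpression)
        ' Arguments(0) is the value being tested, Arguments(1) is the pattern.
        Console.WriteLine(c.Arguments(0).NodeType)          ' Parameter
        Console.WriteLine(c.Arguments(1).NodeType)          ' Constant
    End Sub
End Module
```

This is why ExpandCall passes Arguments(0) and Arguments(1) to ConcatBinary: they are the left and right operands of the original Like expression.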

 

The last thing we need to do in order to wrap up processing of the “Where” is process the constant string value "%.exe".  This translation is straightforward, and the only thing worth paying attention to is that the default conversion for some data types may not work for your data source.  In this case, WDS requires dates to be in a specific format.

 

    Private Sub ExpandConstant(ByVal c As ConstantExpression)

        Dim value = c.Value

        If value.GetType() Is GetType(String) Then

            m_query.Append("'")

            m_query.Append(CStr(value))

            m_query.Append("'")

        ElseIf value.GetType() Is GetType(Date) Then

            m_query.Append("'")

            m_query.Append(CDate(value).ToString("yyyy-MM-dd"))

            m_query.Append("'")

        Else

            m_query.Append(value.ToString())

        End If

    End Sub
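Two details are easy to overlook when formatting literals: a string value that itself contains a single quote, and date or number formatting that changes with the machine's regional settings. The helper below is a hypothetical variation, not part of the original source. It assumes that doubling an embedded single quote is the right escape for the target query language, which is the usual SQL convention but should be verified against the WDS documentation.

```vb
Imports System
Imports System.Globalization

Module LiteralDemo
    ' Hypothetical helper: formats a constant as a query literal.
    Function FormatLiteral(ByVal value As Object) As String
        If TypeOf value Is String Then
            ' Assumption: embedded single quotes are escaped by doubling them.
            Return "'" & CStr(value).Replace("'", "''") & "'"
        ElseIf TypeOf value Is Date Then
            ' The invariant culture keeps the output independent of regional settings.
            Return "'" & CDate(value).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) & "'"
        Else
            Return Convert.ToString(value, CultureInfo.InvariantCulture)
        End If
    End Function

    Sub Main()
        Console.WriteLine(FormatLiteral("%.exe"))        ' '%.exe'
        Console.WriteLine(FormatLiteral(#6/28/2007#))    ' '2007-06-28'
        Console.WriteLine(FormatLiteral(1.5))            ' 1.5
    End Sub
End Module
```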

 

This ends the processing of the “Where” clause and concludes the CreateQuery method.  We have built up the following query to pass along to WDS:

 

"SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (System.DateCreated > '[value0]' AND System.FileName LIKE '%.exe')"

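The [value0] token is still unresolved at this point.  As a rough sketch of how the substitution might work at execution time (the real code belongs with GetEnumerator and may well differ), the hypothetical ResolveTokens helper below walks the funclet list and replaces each token with the value its funclet returns.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text

Module TokenDemo
    ' Hypothetical helper: replaces each [valueN] token with the current result of its funclet.
    Function ResolveTokens(ByVal template As String, _
                           ByVal funclets As List(Of KeyValuePair(Of String, Func(Of String)))) As String
        Dim sql As New StringBuilder(template)
        For Each pair As KeyValuePair(Of String, Func(Of String)) In funclets
            sql.Replace(pair.Key, pair.Value.Invoke())
        Next
        Return sql.ToString()
    End Function

    Sub Main()
        Dim cutoffDate As Date = #6/28/2007#
        Dim funclets As New List(Of KeyValuePair(Of String, Func(Of String)))()
        funclets.Add(New KeyValuePair(Of String, Func(Of String))("[value0]", _
            Function() cutoffDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)))

        Dim template As String = "System.DateCreated > '[value0]' AND System.FileName LIKE '%.exe'"
        Console.WriteLine(ResolveTokens(template, funclets))
        ' System.DateCreated > '2007-06-28' AND System.FileName LIKE '%.exe'
    End Sub
End Module
```

Because each token includes its closing bracket, replacing [value1] cannot accidentally match part of [value10].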
 

In my next post, I’ll finish up by covering GetEnumerator and Select.

 

Resources

 

Full source code for this project:

http://hresult.members.winisp.net/FileSystemQuery.zip

 

Bart De Smet’s excellent blog on Implementing IQueryable for Linq to LDAP:

http://community.bartdesmet.net/blogs/bart/archive/2007/04/05/the-iqueryable-tales-linq-to-ldap-part-0.aspx

 

Fabrice Marguerie’s blog on implementing Linq to Amazon:

http://weblogs.asp.net/fmarguerie/archive/2006/06/26/Introducing-Linq-to-Amazon.aspx

 

Catherine Heller’s blog on Windows Desktop (Vista) Search:

http://blogs.msdn.com/cheller/archive/2006/06/21/642220.aspx

 

List of query attributes supported by the Windows filesystem:

http://msdn2.microsoft.com/en-us/library/aa830600.aspx