In the Orcas timeframe, Microsoft will be supplying a couple of specialized flavors of Linq to address common data access scenarios. DLinq covers SQL servers and XLinq handles XML documents, but what about the countless other data sources out there that a user might want to interact with using Linq? For some, you can simply gather your data into a CLR collection and make use of the default Linq experience. For example, if you wanted to find new *.exe files on your hard drive, you might use Linq to do something like this:
Dim newExe = From fileName In Directory.GetFiles( _
    "C:\", "*.exe", SearchOption.AllDirectories) _
    Where (New FileInfo(fileName)).CreationTime > #6/30/2007# _
    Select fileName
This is pretty cool, but it’s also very simple and doesn’t make for a very interesting blog post, so let’s go ahead and complicate things…
For many backend data sources out there in the IT world today, there exist query APIs and object models to represent the various “entities” found in the data. A good example of what I’m talking about can be found in none other than the Windows file system. Beyond using the methods in the .NET Framework to find files, Windows Desktop Search (available in Windows Vista or downloadable from http://www.microsoft.com/windows/desktopsearch/default.mspx) exposes an OLE DB Provider, allowing you to query its index. Wouldn’t it be nice if we could somehow use Linq to query the index instead of having to type out SQL queries as Strings to send along to OLE DB? Well, by creating a custom Linq provider, we can.
When it comes to creating a custom Linq provider, several informative blog posts exist on the Internet (see resources section), but I hope to detail a bit more of a practical “HowTo” on implementing a useful custom Linq provider.
Creating your own provider starts with implementing the IQueryable and IQueryProvider Interfaces. Often, a custom object model exists for representing data in OO form. For files on disk, we can use the FileInfo class. Here’s the code we’ve got so far:
Public Class WDSQueryObject
    Implements IQueryable(Of FileInfo), IQueryProvider
    ' Interface members are implemented in the sections that follow
End Class
If you type the above into VS, you’ll immediately notice that there are several methods on IQueryable and IQueryProvider that you must implement. In the following post(s), I will detail each one.
There are two CreateQuery methods on the IQueryProvider Interface. One returns a generic IQueryable(Of TElement), and the other returns the non-generic IQueryable. For most implementations, you can probably just have the non-generic one call the generic one:
Public Function CreateQuery(ByVal expression As Expression) As IQueryable Implements IQueryProvider.CreateQuery
    Return CreateQuery1(Of FileInfo)(expression)
End Function
For a simple query, CreateQuery will be called once for the “Where” clause and once for the “Select” clause. For example, if we had a query like:
Dim r = From file In index _
    Where file.Name Like "%.exe" _
    Select file.FullName
Then CreateQuery will be called once with the expression ‘file.Name Like "%.exe"’ and once with the expression ‘file.FullName’. Here’s my skeleton implementation of CreateQuery to handle both cases:
Public Function CreateQuery1(Of TElement)(ByVal expression As Expression) As IQueryable(Of TElement) Implements IQueryProvider.CreateQuery
    Dim querySource As IQueryable(Of TElement) = Nothing
    Dim nodeType = expression.NodeType
    Select Case nodeType
        Case ExpressionType.Call
            Dim m As MethodCallExpression = DirectCast(expression, MethodCallExpression)
            Dim methodName = m.Method.Name
            Select Case methodName
                Case "Select"
                    ' insert Select processing code
                Case "Where"
                    ' insert Where processing code
                Case Else
                    Throw New NotSupportedException("Queries using '" & methodName & "' are not supported for this collection.")
            End Select
        Case Else
            Throw New NotSupportedException("Creating a query from an expression of type '" & nodeType.ToString() & "' is not supported.")
    End Select
    Return querySource
End Function
You’ll notice that the expression we get in CreateQuery actually contains the information about who is calling us (i.e. Select, Where, etc), and we’ll use that information to process the rest of the query appropriately. For the filesystem example, the “Where” clause is the most interesting, so we’ll discuss that first. The signature for the Where extension method defined on Queryable is:
Public Shared Function Where(Of TSource)( _
ByVal source As IQueryable(Of TSource), _
ByVal predicate As Expression(Of System.Func(Of TSource, Boolean)) _
) As System.Linq.IQueryable(Of TSource)
If you look at the details of the expression tree, you’ll see that all of the information regarding the above signature is encoded (method call to Where(Of FileInfo), value for source, and quoted lambda for predicate).
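To see that encoding for yourself without writing a provider first, you can inspect the tree that the built-in in-memory provider builds. The following is a minimal sketch, not code from this project: `OperatorChain` is a hypothetical helper name, and the query runs over a plain String array rather than the WDS index.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Linq.Expressions

Module ExpressionTreeInspection
    ' Hypothetical helper: walks the nested Queryable calls and returns
    ' the operator names from the outermost call to the innermost one.
    Function OperatorChain(ByVal query As IQueryable) As List(Of String)
        Dim names As New List(Of String)
        Dim current As Expression = query.Expression
        While current.NodeType = ExpressionType.Call
            Dim callNode = DirectCast(current, MethodCallExpression)
            names.Add(callNode.Method.Name)
            ' Argument 0 of a Queryable operator is its source expression
            current = callNode.Arguments(0)
        End While
        Return names
    End Function

    Sub Main()
        Dim fileNames() As String = {"a.exe", "b.txt"}
        Dim q = fileNames.AsQueryable() _
            .Where(Function(s) s.EndsWith(".exe")) _
            .Select(Function(s) s.ToUpper())

        ' Outermost operator first: Select wraps Where
        Console.WriteLine(String.Join(", ", OperatorChain(q).ToArray()))

        ' Argument 1 of Where is the predicate, wrapped in a Quote node
        Dim selectCall = DirectCast(q.Expression, MethodCallExpression)
        Dim whereCall = DirectCast(selectCall.Arguments(0), MethodCallExpression)
        Dim quoted = DirectCast(whereCall.Arguments(1), UnaryExpression)
        Console.WriteLine(quoted.NodeType.ToString())
        Console.WriteLine(quoted.Operand.NodeType.ToString())
    End Sub
End Module
```

Note the explicit `ToString()` calls on the enum values. Passing or concatenating a VB enum directly tends to yield its numeric value rather than its name.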
Below is the ‘Where processing code’ I use to process the above expression. Let me explain what’s going on and then we’ll dive into the implementation.
m_query = New StringBuilder()
m_funclets = New List(Of KeyValuePair(Of String, Func(Of String)))()
Dim lambda As LambdaExpression = CType(m.Arguments(1), UnaryExpression).Operand
' Translate the predicate body first, then wrap it in the boilerplate query
ExpandExpression(lambda.Body)
m_query.Insert(0, "SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (")
m_query.Append(")")
querySource = Me
You can pretty much ignore the "SELECT" string in the above code for now. It's simply a boilerplate Windows Desktop Search query with an incomplete "WHERE" clause (see links under resources section for more on querying WDS). What we'll be doing here is simply filling in the "WHERE" clause. The actual projection (SELECT) for our query will be handled in the second call to CreateQuery. As you can see in the Expression tree above, the first argument to Where (the ConstantExpression) is simply a reference to us (i.e. whatever is returned by the implementor of IQueryable.Expression). Since we are both the IQueryable and IQueryProvider implementor, we don’t need to worry about this argument. The second argument (the quoted lambda expression) is interesting, because we need to translate it into an SQL string. This is the heart of our IQueryable implementation: translating expressions from a Linq query into a set of instructions that can be used to retrieve data from the underlying data source. In my implementation, this translation is handled by ExpandExpression. It recursively traverses the expression tree and expands its nodes into SQL strings. After it returns, m_query will contain the string we need to plug into the "WHERE" clause above. Here’s the code for ExpandExpression:
Private Sub ExpandExpression(ByVal e As Expression)
    Select Case e.NodeType ' one Case per supported node type, each calling its Expand* method
        Case Else
            Throw New NotSupportedException("Expressions of type '" & e.NodeType.ToString() & "' are not supported.")
    End Select
End Sub
You’ll see that we simply go through all the different expression tree nodes we want to support and call the appropriate processing method (note also that the implementation is incomplete, but we cover most of the common types of expressions). Recursive processing continues until all the nodes in the expression tree have been evaluated. Let’s have a look at a simple query and walk through the processing methods that will be called. Given the following query:
Dim index As New WDSQueryObject
Dim cutoffDate = #6/28/2007#
Dim r = From file In index _
    Where file.CreationTime > cutoffDate And _
    file.Name Like "%.exe" _
    Select file.FullName
The first method that will get called is ExpandBinary. This, in turn, calls ConcatBinary and combines the left and right hand expressions using the appropriate operator (in this case, “AND”).
Private Sub ExpandBinary(ByVal b As BinaryExpression, ByVal op As String)
    ConcatBinary(b.Left, b.Right, op)
End Sub
Private Sub ConcatBinary(ByVal left As Expression, ByVal right As Expression, ByVal op As String)
    ' Expands the left and right hand expressions and joins them with op
End Sub
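To make the recursion concrete, here is a stripped-down, self-contained sketch of the same idea. It is not the provider's actual code: `WhereTranslator`, `Translate` and `Expand` are hypothetical names, it only handles `And`/`AndAlso`, `>`, `<`, property access and constants, and it emits the CLR property name as-is instead of mapping it to a WDS attribute.

```vb
Imports System
Imports System.IO
Imports System.Linq.Expressions
Imports System.Text

' Hypothetical, simplified translator used only for illustration
Class WhereTranslator
    Private ReadOnly m_query As New StringBuilder()

    Public Function Translate(ByVal predicate As LambdaExpression) As String
        m_query.Length = 0
        Expand(predicate.Body)
        Return m_query.ToString()
    End Function

    ' Dispatches on the node type and recurses into child nodes
    Private Sub Expand(ByVal e As Expression)
        Select Case e.NodeType
            Case ExpressionType.And, ExpressionType.AndAlso
                ConcatBinary(DirectCast(e, BinaryExpression), "AND")
            Case ExpressionType.GreaterThan
                ConcatBinary(DirectCast(e, BinaryExpression), ">")
            Case ExpressionType.LessThan
                ConcatBinary(DirectCast(e, BinaryExpression), "<")
            Case ExpressionType.MemberAccess
                ' A real provider would map the member to a data source attribute here
                m_query.Append(DirectCast(e, MemberExpression).Member.Name)
            Case ExpressionType.Constant
                m_query.Append(DirectCast(e, ConstantExpression).Value)
            Case Else
                Throw New NotSupportedException( _
                    "Expressions of type '" & e.NodeType.ToString() & "' are not supported.")
        End Select
    End Sub

    ' Expands the left operand, appends the operator, then expands the right operand
    Private Sub ConcatBinary(ByVal b As BinaryExpression, ByVal op As String)
        Expand(b.Left)
        m_query.Append(" " & op & " ")
        Expand(b.Right)
    End Sub
End Class

Module TranslatorDemo
    Sub Main()
        Dim predicate As Expression(Of Func(Of FileInfo, Boolean)) = _
            Function(f) f.Length > 1024L And f.Length < 4096L
        Console.WriteLine(New WhereTranslator().Translate(predicate))
    End Sub
End Module
```

Because each binary node simply expands its children in order, nested conditions come out left to right with no extra bookkeeping. A fuller version would also need to add parentheses when mixing operators of different precedence.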
Processing the left hand side of the expression will end up calling ConcatBinary again (this time with the “>” operator) and will subsequently call ExpandMemberAccess. This is where the interesting processing begins.
Private Sub ExpandMemberAccess(ByVal m As MemberExpression)
    Dim member = m.Member
    Dim e = m.Expression
    Select Case e.NodeType
        Case ExpressionType.Parameter
            ' Parameter processing code
        Case ExpressionType.Constant
            ' Constant processing code
        Case Else
            Throw New NotSupportedException("Accessing member '" & member.Name & "' is not supported in this context.")
    End Select
End Sub
The first block that we’re going to hit is the ‘Parameter processing code’. In this context, ‘parameters’ are going to be the iteration variables of the query (the ‘file’ object). What we need to do with that information is translate the property access on the FileInfo object (file.CreationTime) into a Windows filesystem attribute name. Here’s the code I use to do that:
Private Function GetAttributeName(ByVal m As MemberInfo) As String
    Dim name As String
    Dim memberName = m.Name
    Select Case memberName
        Case "CreationTime"
            name = "System.DateCreated"
        Case "Name"
            name = "System.FileName"
        Case Else
            Throw New NotSupportedException("Using the property '" & memberName & "' in filter expressions is not supported.")
    End Select
    Return name
End Function
As before, the implementation is incomplete, but adding translations for more properties should be very straightforward. A complete list of supported filesystem attributes can be found in the links under the resources section. The next block of code we’re going to hit is the ‘Constant processing code’. Here, we will need to interpret the access to the variable cutoffDate. The code I use is as follows:
Dim valueName = "[value" & m_funclets.Count & "]"
Dim valueFunc As Func(Of String) = Nothing
Dim memberType = member.MemberType
If m.Type Is GetType(String) OrElse m.Type Is GetType(Date) Then
    Dim funclet As Func(Of String) = Nothing
    Select Case memberType
        Case MemberTypes.Field
            Dim f As FieldInfo = DirectCast(member, FieldInfo)
            Dim c As ConstantExpression = DirectCast(e, ConstantExpression)
            If m.Type Is GetType(Date) Then
                funclet = Function() CDate(f.GetValue(c.Value)).ToString("yyyy-MM-dd")
            Else
                funclet = Function() CStr(f.GetValue(c.Value))
            End If
        Case Else
            Throw New NotSupportedException("Accessing member of type '" & memberType.ToString() & "' is not supported.")
    End Select
    m_funclets.Add(New KeyValuePair(Of String, Func(Of String))(valueName, funclet))
    ' Leave a quoted token in the query text; it is replaced when the query executes
    m_query.Append("'" & valueName & "'")
End If
So what’s all this ‘funclet’ nonsense? Well, the Linq architecture revolves around the concept of delayed execution. In other words, I create the query at one point, but I don’t actually evaluate it (capture input values and query the underlying data source) until I start to use the query results. Because of this, we want to capture the information about how to access the contents of cutoffDate, but we don’t want to store the value away just yet. What I’m doing is placing a token ([value*]) in the query string and then creating a function that I can use to get the value of cutoffDate when the results of the query are accessed. I create the function using a lambda expression. This is basically a convenient way to create an inline, anonymous delegate in my code. It also has the benefit of automatically creating a closure class to store all of the information about the variables I access in the current block. For example, when I enter the ‘Case’ block, a new closure class will be generated, and the values for ‘f’ and ‘c’ will be stored in it. The compiler automatically translates these local variable accesses into field accesses on the appropriate members of the closure class. When the query is executed, and I execute the funclets to replace the [value*] tokens, I will get the value of the variables at that point in the program’s execution (rather than at the point when the query is created). You’ll notice that the MemberExpression for cutoffDate also represents a lifted local variable. This is why the member type is ‘Field’. Since cutoffDate is being used in a query, the value is actually being stored in a field on a closure class.
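The effect of deferring the value lookup is easiest to see in isolation. The sketch below is an illustration under my own assumptions rather than the provider's code: `ResolveTokens` is a hypothetical helper, the query text is a shortened fragment, and the date is formatted with the invariant culture so the result does not depend on machine settings.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module FuncletDemo
    ' Hypothetical helper: replaces each token with the value its funclet
    ' produces at the moment this method runs, not when the funclet was created.
    Function ResolveTokens(ByVal query As String, _
                           ByVal funclets As List(Of KeyValuePair(Of String, Func(Of String)))) As String
        Dim resolved = query
        For Each pair In funclets
            resolved = resolved.Replace(pair.Key, pair.Value.Invoke())
        Next
        Return resolved
    End Function

    Sub Main()
        Dim cutoffDate = #6/28/2007#
        Dim funclets As New List(Of KeyValuePair(Of String, Func(Of String)))

        ' The lambda captures the variable cutoffDate itself, not its current value
        funclets.Add(New KeyValuePair(Of String, Func(Of String))( _
            "[value0]", _
            Function() cutoffDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)))

        Dim query = "System.DateCreated > '[value0]'"

        ' Changing the variable after the "query" was built still affects the result
        cutoffDate = #1/1/2008#
        Console.WriteLine(ResolveTokens(query, funclets)) ' System.DateCreated > '2008-01-01'
    End Sub
End Module
```

Note the explicit `Invoke()` call: writing `pair.Value()` in VB reads the property rather than calling the delegate it holds.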
The next expression that we’ll end up expanding is the one representing ‘file.Name Like "%.exe"’. You might be surprised to find out that we process this node in ExpandCall rather than ExpandBinary. As it turns out, the VB compiler translates several of the common binary operators into calls to VB runtime functions. This allows VB to add extra functionality that is not supported by the CLR. LikeString (generated for the VB ‘Like’ operator) and CompareString (generated for string comparison expressions like “a” = “A”) are examples of this behavior. Here’s my implementation of ExpandCall that takes LikeString into account:
Private Sub ExpandCall(ByVal m As MethodCallExpression, Optional ByVal op As String = "")
    Dim methodName = m.Method.Name
    Select Case methodName
        Case "LikeString"
            ConcatBinary(m.Arguments(0), m.Arguments(1), "LIKE")
        Case Else
            Throw New NotSupportedException("Using method '" & methodName & "' in a filter expression is not supported.")
    End Select
End Sub
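If you want to check what the compiler generates for ‘Like’ on your own machine, a small inspection sketch along these lines should do it. `RootCallName` is a hypothetical helper, and the exact declaring type of the runtime function may differ between compiler versions, so treat the printed name as something to verify rather than a guarantee.

```vb
Imports System
Imports System.Linq.Expressions

Module LikeOperatorInspection
    ' Hypothetical helper: returns the name of the method called at the root
    ' of the lambda body, or Nothing if the root node is not a method call.
    Function RootCallName(ByVal predicate As LambdaExpression) As String
        Dim callNode = TryCast(predicate.Body, MethodCallExpression)
        If callNode Is Nothing Then
            Return Nothing
        End If
        Return callNode.Method.Name
    End Function

    Sub Main()
        Dim likePredicate As Expression(Of Func(Of String, Boolean)) = _
            Function(s) s Like "%.exe"

        ' Per the discussion above, this is expected to be a call to the
        ' VB runtime function LikeString rather than a binary node.
        Console.WriteLine(likePredicate.Body.NodeType.ToString())
        Console.WriteLine(RootCallName(likePredicate))
    End Sub
End Module
```

The same approach works for string equality: inspecting a lambda such as `Function(s) s = "A"` lets you see where CompareString shows up in the tree before you write the translation for it.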
The last thing we need to do in order to wrap up processing of the “Where” is process the constant string value "%.exe". This translation is straightforward, and the only thing worth paying attention to is that the default conversion for some data types may not work for your data source. In this case, WDS requires dates to be in a specific format.
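Private Sub ExpandConstant(ByVal c As ConstantExpression)
    Dim value = c.Value
    If value.GetType() Is GetType(String) Then
        m_query.Append("'" & CStr(value) & "'")
    ElseIf value.GetType() Is GetType(Date) Then
        m_query.Append("'" & CDate(value).ToString("yyyy-MM-dd") & "'")
    End If
End Sub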
Here ends the processing of the “Where” clause and, with it, the CreateQuery method. We have built up the following query to pass along to WDS:
"SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (System.DateCreated > '[value0]' AND System.FileName LIKE '%.exe')"
In my next post, I’ll finish up by covering GetEnumerator and Select.
Full source code for this project:
Bart De Smet’s excellent blog on Implementing IQueryable for Linq to LDAP:
Fabrice Marguerie’s blog on implementing Linq to Amazon:
Catherine Heller’s blog on Windows Desktop (Vista) Search:
List of query attributes supported by the Windows filesystem