Kevin's VB Adventures

  • How to implement IQueryable (Part 2)

    This is Part 2 of my two-part series on “How to Implement IQueryable”.  Please see the first post for additional resource links and the full source code download (http://blogs.msdn.com/kevin_halverson/archive/2007/07/10/how-to-implement-iqueryable.aspx).

     

    GetEnumerator

    In many Linq providers, it will probably make sense to combine the expressions from the various query methods (Where, Select, etc) into a single set of instructions and allow the “server” to do all of the data processing (returning the final results of the query).  For some underlying data APIs, however, it may not be possible or may not make sense to try to translate the rich set of expressions available in the client language.  In the interest of providing the most flexible end-user scenario (greatest functionality), it may be useful to process part of the query on the “client” side for these cases (as if you were using Linq on an ordinary CLR collection).  In my example, this is how I chose to handle “SELECT” expressions.  As was detailed in my previous post, we will generate an SQL string to query the “server” (Windows Desktop Search index) for “WHERE” expressions.  This query will return a collection of FileInfo objects that we can in turn project into the appropriate form with a little help from the expression tree compiler.  To do this processing, I created a new IQueryable object called WDSQueryObjectProjector.   Inside of CreateQuery1, I construct one of these and pass along the “SELECT” expression:

     

        Public Function CreateQuery1(Of TElement)(ByVal expression As Expression) As IQueryable(Of TElement) Implements IQueryProvider.CreateQuery

           

            Select Case nodeType

                Case ExpressionType.Call

                   

                    Select Case methodName

                        Case "Select"

                            querySource = New WDSQueryObjectProjector(Of TElement)(expression)

                        Case "Where"

                           

                        Case Else

                            

                    End Select

                Case Else

                   

            End Select

     

            Return querySource

        End Function

     

    Let’s take a look at the expression we get when CreateQuery1 is called for the “SELECT” case.

     

    [Figure: expression tree for Select]

    As before (with the Where method), the outer node in the expression is the query method that is calling us (in this case, Select).  The first argument is the instance of the WDSQueryObject returned by CreateQuery1 when we processed the “WHERE” expression.  Also as before, the second argument is a quoted lambda.  In this case, it represents the projection expression ‘file.FullName’ (see the original query in the previous post for details).  When we attempt to enumerate the results of the query, the GetEnumerator method will be called on the WDSQueryObjectProjector instance.  It will in turn enumerate the items in the WDSQueryObject instance (first argument above), apply the projection expression to each one, and return the projected results.  To make that a bit clearer, here’s the implementation of WDSQueryObjectProjector.GetEnumerator:

     

        Public Function GetEnumerator() As IEnumerator(Of TResult) Implements IEnumerable(Of TResult).GetEnumerator

            Dim m As MethodCallExpression = m_expression

            Dim qo As WDSQueryObject = CType(m.Arguments(0), ConstantExpression).Value

            Dim quote As UnaryExpression = m.Arguments(1)

            Dim projector As Expression(Of Func(Of FileInfo, TResult)) = quote.Operand

            Dim func = projector.Compile()

     

            Dim tuples As New List(Of TResult)()

            For Each obj In qo

                tuples.Add(func(obj))

            Next

     

            Return tuples.GetEnumerator()

        End Function
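
    As a standalone illustration of the compile step used in WDSQueryObjectProjector.GetEnumerator, here is a minimal sketch.  The module name ProjectionSketch, the helper Project and the sample path are made up for this example and are not part of the downloadable project; the sketch only assumes that a projection arrives as an Expression(Of Func(Of FileInfo, TResult)).

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq.Expressions

Module ProjectionSketch
    ' Hypothetical helper: compiles the projection once, then applies it to each item.
    Public Function Project(Of TResult)(ByVal files As IEnumerable(Of FileInfo), _
                                        ByVal projector As Expression(Of Func(Of FileInfo, TResult))) As List(Of TResult)
        ' Compile turns the expression tree into an ordinary delegate.
        Dim func As Func(Of FileInfo, TResult) = projector.Compile()

        Dim results As New List(Of TResult)()
        For Each item As FileInfo In files
            results.Add(func(item))
        Next
        Return results
    End Function

    Sub Main()
        ' FileInfo does not require the file to exist, so this path is only illustrative.
        Dim files As New List(Of FileInfo)()
        files.Add(New FileInfo(Path.Combine(Path.GetTempPath(), "sample.exe")))

        ' Because the target type is Expression(Of ...), the compiler builds an
        ' expression tree from the lambda instead of a delegate.
        Dim projector As Expression(Of Func(Of FileInfo, String)) = Function(f) f.Name

        For Each fileName As String In Project(files, projector)
            Console.WriteLine(fileName)
        Next
    End Sub
End Module
```

    The sketch compiles the projection once and reuses the delegate for every item, which is the same shape as the loop in GetEnumerator above.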

     

    The interesting part of WDSQueryObjectProjector.GetEnumerator is the call to Compile on the Expression(Of Func(Of FileInfo, TResult)).  The expression tree compiler is doing a lot of work for us to translate the “SELECT” expression into a delegate that can take a FileInfo object and transform it into the type of our result.  In our case (file.FullName), it will simply return a String, but the expression tree compiler is a very powerful tool and supports every kind of expression supported by the expression tree API (something far outside the scope of my little project).  So for example, if you had an expression like:

     

                    Select file.FullName, file.CreationTime

     

    Then the delegate will correctly return an anonymous type with properties for FullName (as String) and CreationTime (as Date).  Pretty cool, huh?  Since the expression tree compiler is doing all the heavy lifting here, we simply need to loop through the results of the Where method and apply the projection.  Speaking of which, how do we get the results from the Where (‘qo’ in the code above)?  Well, when we iterate over ‘qo’, WDSQueryObject.GetEnumerator will be called, and we’ll finally execute the query string that we built up when processing the “WHERE” expression.  Here’s the code:

     

        Public Function GetEnumerator() As IEnumerator(Of FileInfo) Implements IEnumerable(Of FileInfo).GetEnumerator

            Dim tuples As New List(Of FileInfo)()

     

            EvaluateFunclets()

     

            Dim connection As New OleDbConnection("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")

            connection.Open()

     

            Dim command As New OleDbCommand()

            command.Connection = connection

            command.CommandText = m_query.ToString()

     

            Dim reader = command.ExecuteReader()

            While reader.Read()

                Dim col = reader(0)

                If Not IsDBNull(col) Then

                    tuples.Add(New FileInfo(col))

                End If

            End While

            reader.Close()

     

            connection.Close()

     

            Return tuples.GetEnumerator()

        End Function

     

    This is pretty straightforward.  We open a connection to the WDS OLE DB provider, execute the query and gather the results (skipping null values).  The thing to note is the call to EvaluateFunclets.  In my previous post, I talked a little about lambdas and how we captured the information necessary to access the variables used in our query expression (but didn’t actually capture the values).  Since this is the point when the query is executed, now is the time to evaluate all of the variables and plug their values into our query.  I accomplish this by simply iterating through all the funclets and inserting the result of their invocation into the query string.

     

        Private Sub EvaluateFunclets()

            For Each pair In m_funclets

                m_query.Replace(pair.Key, pair.Value.Invoke())

            Next

        End Sub
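
    To see the token substitution in isolation, here is a minimal sketch.  The module name FuncletSketch, the helper Resolve and the sample query text are invented for this example; only the token format and the List(Of KeyValuePair(Of String, Func(Of String))) shape come from the code in this series.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Module FuncletSketch
    ' Replaces each token in the query text with the value its funclet yields right now.
    Public Function Resolve(ByVal template As String, _
                            ByVal funclets As List(Of KeyValuePair(Of String, Func(Of String)))) As String
        ' Work on a copy so the template still contains its tokens afterwards.
        Dim query As New StringBuilder(template)
        For Each pair As KeyValuePair(Of String, Func(Of String)) In funclets
            query.Replace(pair.Key, pair.Value.Invoke())
        Next
        Return query.ToString()
    End Function

    Sub Main()
        Dim extension As String = "%.exe"

        ' The lambda captures the variable itself, not its current value.
        Dim funclets As New List(Of KeyValuePair(Of String, Func(Of String)))()
        funclets.Add(New KeyValuePair(Of String, Func(Of String))("[value0]", Function() extension))

        Dim template As String = "System.FileName LIKE '[value0]'"
        Console.WriteLine(Resolve(template, funclets))

        ' Changing the variable changes what the next resolution produces.
        extension = "%.dll"
        Console.WriteLine(Resolve(template, funclets))
    End Sub
End Module
```

    One design difference worth noting: the sketch resolves tokens on a copy of the text, so the same template can be resolved again later with fresh values.  EvaluateFunclets above replaces the tokens in m_query itself; whether the query object is meant to be enumerated more than once is not something this post covers.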

     

    And that’s it—happy Linqing!

  • How to implement IQueryable (Part 1)

    In the Orcas timeframe, Microsoft will be supplying a couple of specialized flavors of Linq to address common data access scenarios.  DLinq covers SQL servers and XLinq handles XML documents, but what about the countless other data sources out there that a user might want to interact with using Linq?  For some, you can simply gather your data into a CLR collection and make use of the default Linq experience.  For example, if you wanted to find new *.exe files on your hard drive, you might use Linq to do something like this:

    Dim newExe = From fileName In Directory.GetFiles( _

    My.Computer.FileSystem.SpecialDirectories.MyDocuments, _

    "*.exe", SearchOption.AllDirectories) _

    Where (New FileInfo(fileName)).CreationTime > #6/30/2007# _

    Select fileName

    This is pretty cool, but it’s also very simple and doesn’t make for a very interesting blog post, so let’s go ahead and complicate things…

    For many backend data sources out there in the IT world today, there exist query APIs and object models to represent the various “entities” found in the data.  A good example of what I’m talking about can be found in none other than the Windows file system.  Beyond using the methods in the .NET Framework to find files, Windows Desktop Search (available in Windows Vista or downloadable from http://www.microsoft.com/windows/desktopsearch/default.mspx) exposes an OLE DB Provider, allowing you to query its index.  Wouldn’t it be nice if we could somehow use Linq to query the index instead of having to type out SQL queries as Strings to send along to OLE DB?  Well, by creating a custom Linq provider, we can.

    When it comes to creating a custom Linq provider, several informative blog posts exist on the Internet (see resources section), but I hope to detail a bit more of a practical “HowTo” on implementing a useful custom Linq provider.

    Creating your own provider starts with implementing the IQueryable and IQueryProvider Interfaces.  Often, a custom object model exists for representing data in OO form.  For files on disk, we can use the FileInfo class.  Here’s the code we’ve got so far:

    Imports System.IO

    Public Class WDSQueryObject

        Implements IQueryable(Of FileInfo), IQueryProvider

    End Class

    If you type the above into VS, you’ll immediately notice that there are several methods on IQueryable and IQueryProvider that you must implement.  In the following post(s), I will detail each one.

    NOTES:

    • The full source code for this project is available under the Resources section.  It may be useful to download it and step through the code in a debugger as you read along. 
    • All of my code samples are based on Orcas Beta 2.  The IQueryable interface has been refactored since the Beta 1 release.

     

    CreateQuery

    There are two CreateQuery methods on the IQueryProvider Interface.  One returns a generic IQueryable(Of TElement), and the other returns the non-generic IQueryable.  For most implementations, you can probably just have the non-generic one call the generic one:

        Public Function CreateQuery(ByVal expression As Expression) As IQueryable Implements IQueryProvider.CreateQuery

            Return CreateQuery1(Of FileInfo)(expression)

        End Function

    For a simple query, CreateQuery will be called once for the “Where” clause and once for the “Select” clause.  For example, if we had a query like:

            Dim r = From file In index _

                    Where file.Name Like "%.exe" _

                    Select file.FullName

    Then CreateQuery will be called once with the expression ‘file.Name Like "%.exe"’ and once with the expression ’file.FullName’.  Here’s my skeleton implementation of CreateQuery to handle both cases:

        Public Function CreateQuery1(Of TElement)(ByVal expression As Expression) As IQueryable(Of TElement) Implements IQueryProvider.CreateQuery

            Dim querySource As IQueryable(Of TElement) = Nothing

     

            Dim nodeType = expression.NodeType

            Select Case nodeType

                Case ExpressionType.Call

                    Dim m As MethodCallExpression = expression

                    Dim methodName = m.Method.Name

                    Select Case methodName

                        Case "Select"

                            ' insert Select processing code

                        Case "Where"

                            ' insert Where processing code

                        Case Else

                            Throw New NotSupportedException("Queries using '" & methodName & "' are not supported for this collection.")

                    End Select

                Case Else

                    Throw New NotSupportedException("Creating a query from an expression of type '" & nodeType.ToString() & "' is not supported.")

            End Select

     

            Return querySource

        End Function

    You’ll notice that the expression we get in CreateQuery actually contains the information about who is calling us (i.e. Select, Where, etc), and we’ll use that information to process the rest of the query appropriately.  For the filesystem example, the “Where” clause is the most interesting, so we’ll discuss that first.  The signature for the Where extension method defined on Queryable is:

                    Public Shared Function Where(Of TSource)( _

    ByVal source As IQueryable(Of TSource), _

    ByVal predicate As Expression(Of System.Func(Of TSource, Boolean)) _

    ) As System.Linq.IQueryable(Of TSource)

    If you look at the details of the expression tree, you’ll see that all of the information regarding the above signature is encoded (method call to Where(Of FileInfo), value for source, and quoted lambda for predicate).

    [Figure: expression tree for Where]
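
    If you don’t have the debugger handy, the same shape can be inspected in code.  The following is a minimal sketch that uses an ordinary in-memory list wrapped with AsQueryable rather than WDSQueryObject; the module name WhereTreeSketch and the helper Describe are made up for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Linq.Expressions

Module WhereTreeSketch
    ' Summarizes the outer method call: its name, the node types of both
    ' arguments, and the node type wrapped by the quote.
    Public Function Describe(ByVal tree As Expression) As String
        Dim m As MethodCallExpression = DirectCast(tree, MethodCallExpression)
        Dim quote As UnaryExpression = DirectCast(m.Arguments(1), UnaryExpression)
        Return m.Method.Name & ", " & m.Arguments(0).NodeType.ToString() & ", " & _
               quote.NodeType.ToString() & ", " & quote.Operand.NodeType.ToString()
    End Function

    Sub Main()
        ' An empty list is enough: the query is only built here, never executed.
        Dim source As IQueryable(Of FileInfo) = New List(Of FileInfo)().AsQueryable()
        Dim query As IQueryable(Of FileInfo) = source.Where(Function(f) f.Name.EndsWith(".exe"))

        Console.WriteLine(Describe(query.Expression))
    End Sub
End Module
```

    With the built-in in-memory provider this should print "Where, Constant, Quote, Lambda", which matches the three pieces described above: the call to Where, the constant source, and the quoted lambda.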

    Below is the ‘Where processing code’ I use to process the above expression.  Let me explain what’s going on and then we’ll dive into the implementation.

        m_query = New StringBuilder()

        m_funclets = New List(Of KeyValuePair(Of String, Func(Of String)))()

        Dim lambda As LambdaExpression = CType(m.Arguments(1), UnaryExpression).Operand

        ExpandExpression(lambda.Body)

        m_query.Insert(0, "SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (")

        m_query.Append(")")

        querySource = Me

    You can pretty much ignore the "SELECT" string in the above code for now.  It's simply a boilerplate Windows Desktop Search query with an incomplete "WHERE" clause (see the links under the Resources section for more on querying WDS).  What we'll be doing here is simply filling in the "WHERE" clause.  The actual projection (SELECT) for our query will be handled in the second call to CreateQuery.  As you can see in the expression tree above, the first argument to Where (the ConstantExpression) is simply a reference to us (i.e. whatever is returned by the implementor of IQueryable.Expression).  Since we are both the IQueryable and IQueryProvider implementor, we don’t need to worry about this argument.  The second argument (the quoted lambda expression) is interesting, because we need to translate it into an SQL string.  This is the heart of our IQueryable implementation: translating expressions from a Linq query into a set of instructions that can be used to retrieve data from the underlying data source.  In my implementation, this translation is handled by ExpandExpression.  It recursively traverses the expression tree and expands each node into SQL text.  After it returns, m_query will contain the string we need to plug into the "WHERE" clause above.  Here’s the code for ExpandExpression:

        Private Sub ExpandExpression(ByVal e As Expression)

            Select Case e.NodeType

                Case ExpressionType.And

                    ExpandBinary(e, "AND")

                Case ExpressionType.Equal

                    ExpandBinary(e, "=")

                Case ExpressionType.GreaterThan

                    ExpandBinary(e, ">")

                Case ExpressionType.GreaterThanOrEqual

                    ExpandBinary(e, ">=")

                Case ExpressionType.LessThan

                    ExpandBinary(e, "<")

                Case ExpressionType.LessThanOrEqual

                    ExpandBinary(e, "<=")

                Case ExpressionType.NotEqual

                    ExpandBinary(e, "!=")

                Case ExpressionType.Not

                    ExpandUnary(e, "NOT")

                Case ExpressionType.Or

                    ExpandBinary(e, "OR")

                Case ExpressionType.Call

                    ExpandCall(e)

                Case ExpressionType.MemberAccess

                    ExpandMemberAccess(e)

                Case ExpressionType.Constant

                    ExpandConstant(e)

                Case Else

                    Throw New NotSupportedException("Expressions of type '" & e.NodeType.ToString() & "' are not supported.")

            End Select

        End Sub

    You’ll see that we simply go through all the different expression tree nodes we want to support and call the appropriate processing method (note also that the implementation is incomplete, but we cover most of the common types of expressions).  Recursive processing continues until all the nodes in the expression tree have been evaluated.  Let’s have a look at a simple query and walk through the processing methods that will be called.  Given the following query:

            Dim index As New WDSQueryObject

            Dim cutoffDate = #6/28/2007#

     

            Dim r = From file In index _

                    Where file.CreationTime > cutoffDate And _

                    file.Name Like "%.exe" _

                    Select file.FullName

    The first method that will get called is ExpandBinary.  This, in turn, calls ConcatBinary and combines the left and right hand expressions using the appropriate operator (in this case, “AND”).

        Private Sub ExpandBinary(ByVal b As BinaryExpression, ByVal op As String)

            ConcatBinary(b.Left, b.Right, op)

        End Sub

     

        Private Sub ConcatBinary(ByVal left As Expression, ByVal right As Expression, ByVal op As String)

            ExpandExpression(left)

            m_query.Append(" ")

            m_query.Append(op)

            m_query.Append(" ")

            ExpandExpression(right)

        End Sub

     

    Processing the left hand side of the expression will end up calling ConcatBinary again (this time with the “>” operator) and will subsequently call ExpandMemberAccess.  This is where the interesting processing begins.

     

        Private Sub ExpandMemberAccess(ByVal m As MemberExpression)

            Dim member = m.Member

            Dim e = m.Expression

            Select Case e.NodeType

                Case ExpressionType.Parameter

                    ' Parameter processing code

                Case ExpressionType.Constant

                    ' Constant processing code

                Case Else

                    Throw New NotSupportedException("Accessing member '" & member.Name & "' is not supported in this context.")

            End Select

        End Sub

     

    The first block that we’re going to hit is the ‘Parameter processing code’.  In this context, ‘parameters’ are going to be the iteration variables of the query (the ‘file’ object).  What we need to do with that information is translate the property access on the FileInfo object (file.CreationTime) into a Windows filesystem attribute name.  Here’s the code I use to do that:

     

        Private Function GetAttributeName(ByVal m As MemberInfo) As String

            Dim name As String

     

            Dim memberName = m.Name

            Select Case memberName

                Case "CreationTime"

                    name = "System.DateCreated"

                Case "Name"

                    name = "System.FileName"

                Case Else

                    Throw New NotSupportedException("Using the property '" & memberName & "' in filter expressions is not supported.")

            End Select

     

            Return name

        End Function

    As before, the implementation is incomplete, but adding translations for more properties should be very straightforward.  A complete list of supported filesystem attributes can be found in the links under the Resources section.  The next block of code we’re going to hit is the ‘Constant processing code’.  Here, we will need to interpret the access to the variable cutoffDate.  The code I use is as follows:

     

        ' Each captured value gets its own placeholder token in the query text.
        Dim valueName = "[value" & m_funclets.Count & "]"

        Dim valueFunc As Func(Of String) = Nothing

        Dim memberType = member.MemberType

     

        ' Strings and dates must be quoted in the WDS query.
        If m.Type Is GetType(String) OrElse m.Type Is GetType(Date) Then

            m_query.Append("'")

            m_query.Append(valueName)

            m_query.Append("'")

        Else

            m_query.Append(valueName)

        End If

     

        Dim funclet As Func(Of String) = Nothing

        Select Case memberType

            Case MemberTypes.Field

                Dim f As FieldInfo = member

                Dim c As ConstantExpression = e

                ' The funclet reads the field only when it is invoked, not now.
                If m.Type Is GetType(Date) Then

                    funclet = Function() CDate(f.GetValue(c.Value)).ToString("yyyy-MM-dd")

                Else

                    funclet = Function() CStr(f.GetValue(c.Value))

                End If

            Case Else

                Throw New NotSupportedException("Accessing members of type '" & memberType.ToString() & "' is not supported.")

        End Select

        m_funclets.Add(New KeyValuePair(Of String, Func(Of String))(valueName, funclet))

     

    So what’s all this ‘funclet’ nonsense?  Well, the Linq architecture revolves around the concept of delayed execution.  In other words, I create the query at one point, but I don’t actually evaluate it (capture input values and query the underlying data source) until I start to use the query results.  Because of this, we want to capture the information about how to access the contents of cutoffDate, but we don’t want to store the value away just yet.  What I’m doing is placing a token ([value*]) in the query string and then creating a function that I can use to get the value of cutoffDate when the results of the query are accessed.  I create the function using a lambda expression.  This is basically a convenient way to create an inline, anonymous delegate in my code.  It also has the benefit of automatically creating a closure class to store all of the information about the variables I access in the current block.  For example, when I enter the ‘Case’ block, a new instance of the compiler-generated closure class will be created, and the values for ‘f’ and ‘c’ will be stored in it.  The compiler automatically translates these local variable accesses into field accesses on the appropriate members of the closure class.  When the query is executed, and I execute the funclets to replace the [value*] tokens, I will get the value of the variables at that point in the program’s execution (rather than at the point when the query is created).  You’ll notice that the MemberExpression for cutoffDate also represents a lifted local variable.  This is why the member type is ‘Field’.  Since cutoffDate is being used in a query, the value is actually being stored in a field on a closure class.

     

    The next expression that we’ll end up expanding is the one representing ‘file.Name Like "%.exe"’.  You might be surprised to find out that we process this node in ExpandCall rather than ExpandBinary.  As it turns out, the VB compiler translates several of the common binary operators into calls to VB runtime functions.  This allows VB to add extra functionality that is not supported by the CLR.  LikeString (generated for the VB ‘Like’ operator) and CompareString (generated for string comparison expressions like “a” = “A”) are examples of this behavior.  Here’s my implementation of ExpandCall that takes LikeString into account:

     

        Private Sub ExpandCall(ByVal m As MethodCallExpression, Optional ByVal op As String = "")

            Dim methodName = m.Method.Name

            Select Case methodName

                Case "LikeString"

                    ConcatBinary(m.Arguments(0), m.Arguments(1), "LIKE")

                Case Else

                    Throw New NotSupportedException("Using method '" & methodName & "' in a filter expression is not supported.")

            End Select

        End Sub
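
    A quick way to check which runtime helper the compiler picked is to look at the body of a small lambda.  This is a minimal sketch; the module name LikeCallSketch and the helper BodyMethodName are invented for this example, and the expected result is based on the behavior described above rather than on anything the sketch itself guarantees.

```vb
Imports System
Imports System.Linq.Expressions

Module LikeCallSketch
    ' Returns the called method's name when the lambda body is a method call,
    ' otherwise the node type of the body.
    Public Function BodyMethodName(ByVal lambda As LambdaExpression) As String
        Dim body As Expression = lambda.Body
        If body.NodeType = ExpressionType.Call Then
            Return DirectCast(body, MethodCallExpression).Method.Name
        End If
        Return body.NodeType.ToString()
    End Function

    Sub Main()
        ' The Like operator on two strings is lowered to a VB runtime call.
        Dim filter As Expression(Of Func(Of String, Boolean)) = Function(name) name Like "%.exe"
        Console.WriteLine(BodyMethodName(filter))
    End Sub
End Module
```

    If the compiler behaves as described in this post, the sketch prints LikeString, which is the method name ExpandCall switches on.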

     

    The last thing we need to do in order to wrap up processing of the “Where” is process the constant string value "%.exe".  This translation is straightforward, and the only thing worth paying attention to is that the default conversion for some data types may not work for your data source.  In this case, WDS requires dates to be in a specific format.

     

        Private Sub ExpandConstant(ByVal c As ConstantExpression)

            Dim value = c.Value

            If value.GetType() Is GetType(String) Then

                m_query.Append("'")

                m_query.Append(CStr(value))

                m_query.Append("'")

            ElseIf value.GetType() Is GetType(Date) Then

                m_query.Append("'")

                m_query.Append(CDate(value).ToString("yyyy-MM-dd"))

                m_query.Append("'")

            Else

                m_query.Append(value.ToString())

            End If

        End Sub

     

    This completes the processing of the “Where” and brings us to the end of the CreateQuery method.  We have built up the following query to pass along to WDS:

     

    "SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (System.DateCreated > '[value0]' AND System.FileName LIKE '%.exe')"

     
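
    To tie the pieces together, here is a compact, self-contained sketch of the same recursive idea.  It is not the code from the download: it returns strings instead of appending to m_query, it inlines constants instead of using funclets, it handles only a handful of node types, and it builds the expression tree by hand with the Expression factory methods so that it does not depend on how the compiler lowers any particular operator.  The module name WhereTranslationSketch and the helpers ColumnFor, TranslateBinary and Translate are made up for this example.

```vb
Imports System
Imports System.Globalization
Imports System.IO
Imports System.Linq.Expressions

Module WhereTranslationSketch
    ' Hypothetical mapping from FileInfo property names to WDS column names.
    Private Function ColumnFor(ByVal memberName As String) As String
        Select Case memberName
            Case "CreationTime"
                Return "System.DateCreated"
            Case "Name"
                Return "System.FileName"
            Case Else
                Throw New NotSupportedException("Property '" & memberName & "' is not mapped.")
        End Select
    End Function

    ' Translates both operands and joins them with the given SQL operator.
    Private Function TranslateBinary(ByVal b As BinaryExpression, ByVal op As String) As String
        Return Translate(b.Left) & " " & op & " " & Translate(b.Right)
    End Function

    ' Recursively turns a small subset of expression nodes into SQL text.
    Public Function Translate(ByVal e As Expression) As String
        Select Case e.NodeType
            Case ExpressionType.And
                Return TranslateBinary(DirectCast(e, BinaryExpression), "AND")
            Case ExpressionType.GreaterThan
                Return TranslateBinary(DirectCast(e, BinaryExpression), ">")
            Case ExpressionType.Equal
                Return TranslateBinary(DirectCast(e, BinaryExpression), "=")
            Case ExpressionType.MemberAccess
                Dim m As MemberExpression = DirectCast(e, MemberExpression)
                If m.Expression IsNot Nothing AndAlso m.Expression.NodeType = ExpressionType.Parameter Then
                    Return ColumnFor(m.Member.Name)
                End If
                Throw New NotSupportedException("Only members of the query parameter are handled.")
            Case ExpressionType.Constant
                Dim value As Object = DirectCast(e, ConstantExpression).Value
                If TypeOf value Is Date Then
                    Return "'" & CDate(value).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) & "'"
                ElseIf TypeOf value Is String Then
                    Return "'" & CStr(value) & "'"
                End If
                Return Convert.ToString(value, CultureInfo.InvariantCulture)
            Case Else
                Throw New NotSupportedException("Node type '" & e.NodeType.ToString() & "' is not handled.")
        End Select
    End Function

    Sub Main()
        ' Hand-built equivalent of: file.CreationTime > #6/28/2007# And file.Name = "setup.exe"
        Dim fileParam As ParameterExpression = Expression.Parameter(GetType(FileInfo), "file")
        Dim newer As Expression = Expression.GreaterThan( _
            Expression.Property(fileParam, "CreationTime"), Expression.Constant(#6/28/2007#))
        Dim named As Expression = Expression.Equal( _
            Expression.Property(fileParam, "Name"), Expression.Constant("setup.exe"))

        Console.WriteLine(Translate(Expression.And(newer, named)))
    End Sub
End Module
```

    Running the sketch should print System.DateCreated > '2007-06-28' AND System.FileName = 'setup.exe', which has the same shape as the text between the parentheses in the query above.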

    In my next post, I’ll finish up by covering GetEnumerator and Select.

     

    Resources

     

    Full source code for this project:

    http://hresult.members.winisp.net/FileSystemQuery.zip

     

    Bart De Smet’s excellent blog on Implementing IQueryable for Linq to LDAP:

    http://community.bartdesmet.net/blogs/bart/archive/2007/04/05/the-iqueryable-tales-linq-to-ldap-part-0.aspx

     

    Fabrice Marguerie’s blog on implementing Linq to Amazon:

    http://weblogs.asp.net/fmarguerie/archive/2006/06/26/Introducing-Linq-to-Amazon.aspx

     

    Catherine Heller’s blog on Windows Desktop (Vista) Search:

    http://blogs.msdn.com/cheller/archive/2006/06/21/642220.aspx

     

    List of query attributes supported by the Windows filesystem

    http://msdn2.microsoft.com/en-us/library/aa830600.aspx

     


