Note : I plan to post this blog on the VB team blog eventually. However, because of the length and complexity, I'm posting this up on my blog for now to hopefully get some eyes on it so that I won't make a fool of myself on the team blog :) Please feel free to contact me if you think something is wrong, misleading, or could be improved on. Thanks!

 

A couple months ago, I had a chance to do a presentation at the New England Code Camp #8 where I talked about LINQ under the hood. This blog post is an attempt at reconstructing the presentation as best as I can.

Disclaimer: This is an advanced post on LINQ, the new VB9 features, and how the compiler does its magic to make LINQ work.

One day, I may try to record a screen cast of this as well if I manage to find time (and if people will find it helpful).

Overview

The goal for us today is to understand what exactly happens when the compiler sees a LINQ query that is like this:

Dim q = From c In customers Where c.ID Mod 2 = 0 Select c.ID, c.Name

In order to understand what happens, we're going to start with exploring the world as it existed in VB 2005.

We will attempt to create a doubly linked list from first principles (that is, without the help of existing FX types that were introduced in VB 2005, such as List(Of T)), because building this from scratch will help us understand the concepts at a deeper level.

Creating a doubly linked list

Let's try to create the doubly linked list from the ground up, without relying on FX collection types. A doubly linked list will contain elements that have forward and backward pointers, so the following basic implementation will suffice:

Class DoublyLinkedListElement(Of T)
    Public Sub New(ByVal data As T)
        Me.m_data = data
    End Sub

    Private m_data As T
    Public ReadOnly Property Data() As T
        Get
            Return m_data
        End Get
    End Property

    Private m_nextElement As DoublyLinkedListElement(Of T)
    Public Property NextElement() As DoublyLinkedListElement(Of T)
        Get
            Return Me.m_nextElement
        End Get
        Set(ByVal value As DoublyLinkedListElement(Of T))
            Me.m_nextElement = value
        End Set
    End Property

    Private m_prevElement As DoublyLinkedListElement(Of T)
    Public Property PrevElement() As DoublyLinkedListElement(Of T)
        Get
            Return m_prevElement
        End Get
        Set(ByVal value As DoublyLinkedListElement(Of T))
            m_prevElement = value
        End Set
    End Property
End Class

Class DoublyLinkedList(Of T)
    Private m_Head As DoublyLinkedListElement(Of T)
    Private m_Tail As DoublyLinkedListElement(Of T)

    Public Sub InsertFirst(ByVal data As T)
        Dim ele = New DoublyLinkedListElement(Of T)(data)
        ele.NextElement = m_Head

        If m_Head IsNot Nothing Then
            m_Head.PrevElement = ele
            m_Head = ele
        Else
            m_Head = ele
            m_Tail = ele
        End If
    End Sub
    Public Sub InsertLast(ByVal data As T)
        Dim ele = New DoublyLinkedListElement(Of T)(data)
        ele.PrevElement = m_Tail

        If m_Tail IsNot Nothing Then
            m_Tail.NextElement = ele
            m_Tail = ele
        Else
            m_Head = ele
            m_Tail = ele
        End If
    End Sub
End Class

(BTW the final code will be available as a ZIP file; but the intermediate steps won't be, so you'll have to follow this post through in order to understand the concepts).

Take some time to understand what this code is doing; it's a very simply, feature-lacking doubly linked list. All that you can do to it is add to the beginning or the end of the list. Nothing more.

Modeling some data on top of the doubly linked list

The first thing that we'll try to do is to model some data on top of the doubly linked list. A typically "real world" model that maps to doubly linked lists well are trains. As a result, let's try to build some train models on top of the doubly linked list.

Here's an example:

Class TrainCar
    Public ID As Integer
    Public Color As String

    Public Overrides Function ToString() As String
        Return "ID: " & ID & vbCrLf & "Color: " & Color
    End Function
End Class

Class PassengerCar : Inherits TrainCar
    Public Count As Integer
    Public Overrides Function ToString() As String
        Return "Passenger Car" & vbCrLf & "Count: " & Count & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Cargo : Inherits TrainCar
    Public Company As String
    Public Overrides Function ToString() As String
        Return "Cargo Car" & vbCrLf & "Company: " & Company & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Engine : Inherits TrainCar
    Public HorsePower As Integer
    Public Overrides Function ToString() As String
        Return "Engine Car" & vbCrLf & "Horsepower: " & HorsePower & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Caboose : Inherits TrainCar
    Public Overrides Function ToString() As String
        Return "Caboose" & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Train : Inherits DoublyLinkedList(Of TrainCar)
End Class

Here, we are creating a small hierarchy of car types, along with a Train model that simply inherits from DoublyLinkedList and providing a specific parameter for the generic argument.

This isn't necessarily the best way to model the train domain; it simply serves as an example for this blog post. Also, the code is "demo" code; a real production model for train cars will be much more robust.

So now we have a basic working train model from which you can add cars to it, and that's about it.

Populating the trains - new language syntax!

Here's a simple function that will populate a train class with a number of train cars:

    Function PopulateTrains() As Train
        Dim t = New Train()

        t.InsertFirst(New Engine With {.ID = 1, .Color = "Red", .HorsePower = 100})
        t.InsertLast(New Engine With {.ID = 2, .Color = "Black", .HorsePower = 100})
        t.InsertLast(New Engine With {.ID = 3, .Color = "Red", .HorsePower = 75})
        t.InsertLast(New Engine With {.ID = 4, .Color = "Green", .HorsePower = 75})
        t.InsertLast(New Engine With {.ID = 5, .Color = "Red", .HorsePower = 75})
        t.InsertLast(New PassengerCar With {.ID = 6, .Color = "Red", .Count = 50})
        t.InsertLast(New PassengerCar With {.ID = 7, .Color = "Green", .Count = 51})
        t.InsertLast(New PassengerCar With {.ID = 8, .Color = "Green", .Count = 50})
        t.InsertLast(New Cargo With {.ID = 9, .Color = "Blue", .Company = "Microsoft"})
        t.InsertLast(New Cargo With {.ID = 10, .Color = "Blue", .Company = "Microsoft"})
        t.InsertLast(New Cargo With {.ID = 11, .Color = "Blue", .Company = "Microsoft"})
        t.InsertLast(New PassengerCar With {.ID = 12, .Color = "Black", .Count = 55})
        t.InsertLast(New Cargo With {.ID = 13, .Color = "Green", .Company = "Starbucks"})
        t.InsertLast(New Cargo With {.ID = 14, .Color = "Green", .Company = "Starbucks"})
        t.InsertLast(New Caboose With {.ID = 15, .Color = "Gold"})

        Return t
    End Function

The first thing to notice is the new object initializer syntax. This allows you to construct an instance of an object and initialize it's fields and properties in an "expression context" - that is, any place in the VB language that expects an expression, you can use an object initializer. In the example above, we used the object initializer as an argument to a function call.

This is different from the with statement, which works as a statement, and not as an expression.

The second thing to notice is type inference. In the PopulateTrains method, the type of t is not object, as it would be in VS2005. Rather, the compiler inferred the type from the right hand side of the assignment statement, and t is typed as a Train.

What the compiler does in this scenario is that it notices that you did not specify a type for the variable t, and so it tries its best to determine what the type is of the right hand side of the assignment, and use that type as the type of t. However, if it cannot, then it will give up and generate an error.

This is especially useful in certain places where we introduced the concept of unnamed types. You'll see this later.

This new feature is controlled by a new option : option infer. By default, it is on for new projects. You can turn it off, and you'll get VS2005 behavior.

Here are a couple more posts that talk about type inference.

Enumerating the doubly linked list

Now that we have some data, we really need more support in the doubly linked list class for this data to be useful. In the .Net Framework, all collection types can be enumerated, so let's add a custom enumerator and take advantage of the fact that Visual Basic allows you to enumerate over a type that exposes a GetEnumerator method.

Here's a simple and straightforward implementation of IEnumerator for our doubly linked list (the Visual Basic IDE automatically generated the stubs for me, so I just filled in where necessary):

Class DoublyLinkedListEnumerator(Of T) : Implements IEnumerator(Of T)
    Private m_First As DoublyLinkedListElement(Of T)
    Private m_Current As DoublyLinkedListElement(Of T)
    Private m_Started As Boolean

    Public Sub New(ByVal elem As DoublyLinkedListElement(Of T))
        m_First = elem
        m_Current = Nothing
        m_Started = False
    End Sub

    Public ReadOnly Property Current() As T Implements System.Collections.Generic.IEnumerator(Of T).Current
        Get
            Return m_Current.Data
        End Get
    End Property

    Public ReadOnly Property Current1() As Object Implements System.Collections.IEnumerator.Current
        Get
            Return m_Current.Data
        End Get
    End Property

    Public Function MoveNext() As Boolean Implements System.Collections.IEnumerator.MoveNext
        If Not m_Started Then
            m_Started = True
            m_Current = m_First

            Return If(m_Current Is Nothing, False, True)
        End If

        m_Current = m_Current.NextElement
        If m_Current IsNot Nothing Then
            Return True
        Else
            Return False
        End If
    End Function

    Public Sub Reset() Implements System.Collections.IEnumerator.Reset
        m_Current = Nothing
        m_Started = False
    End Sub

    Private disposedValue As Boolean = False        ' To detect redundant calls

    ' IDisposable
    Protected Overridable Sub Dispose(ByVal disposing As Boolean)
        If Not Me.disposedValue Then
            If disposing Then
                ' TODO: free other state (managed objects).
            End If

            ' TODO: free your own state (unmanaged objects).
            ' TODO: set large fields to null.
        End If
        Me.disposedValue = True
    End Sub

#Region " IDisposable Support "
    ' This code added by Visual Basic to correctly implement the disposable pattern.
    Public Sub Dispose() Implements IDisposable.Dispose
        ' Do not change this code.  Put cleanup code in Dispose(ByVal disposing As Boolean) above.
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
#End Region

End Class

Take a look at the class above; it walks the doubly linked list as expected every time you ask the enumerator for the next element.

Now, we need to add the following method to the DoublyLinkedList(Of T) class defined at the beginning of the article:

    Public Overridable Function GetEnumerator() As IEnumerator(Of T)
        Return New DoublyLinkedListEnumerator(Of T)(m_Head)
    End Function 

We now added enumeration support to the doubly linked list, so that we can iterate the items and get the items out.

Example

Ok so we have enough pieces to actually get a somewhat meaningful example. Here's some code that will create a train with a bunch of train cars, and then iterate and print them out:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        For Each t in trains
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

You should see all the train cars printed to the console. So far, so good.

Searching for specific cars

Right now, we have a bunch of data in the form of train cars, and the only thing that we can do is to iterate the data and print it out. That's not very interesting, because very quickly we will want to be able to ask questions about our data.

In real life, data is really useless unless you can ask questions about it. If you had a portfolio of stocks, you want to ask questions like, "which ones are outperforming others?", or "how long have I held these stocks?". If you had a social network of friends on Facebook, you want to ask questions like, "which friends have updated their profile recently?", or "which friends have recently poked me?".

If you aren't able to ask questions and filter your data, you'll quickly be overwhelmed.

So let's start by asking a specific question: let's find all the train cars that are red.

One of the most natural ways (and probably the one that most people would think of) is to add a new method to the Train class. Object oriented programming is all about sending a message (calling a method) on an object, and finding the right place to handle these is of paramount importance. In this scenario, the Train class is the right model.

Here's some code that will do this. Notice that we take advantage of the fact that we can iterate the train cars internally:

Function FindRedTrainCars() As Train
    Dim ret = New Train
    For Each c In Me
        If c.Color = "Red" Then
            ret.InsertLast(c)
        End If
    Next
    Return ret
End Function

This does specifically what we asked for, and nothing more. It's fairly straight forward; so far, so good.

More, and more, and more...

As with all things in life, things constantly change. The problem with the prior API, FindRedTrainCars, is that it's too inflexible. You can imagine that you would end up with dozens of public APIs that must be exposed with this model. The problem with this model is that:

  • It's inflexible for change - clients cannot easily customize the behavior
  • It's expensive to test - every public API must be rigorously tested, security reviewed, scrubbed for API consistency, etc
  • It "pollutes" your design - a class with many methods that do similar things (searching) just doesn't seem like it's been well thought out
  • It's inelegant - this one is a personal preference, but where possible, I prefer for methods to be extensible through aggregation, rather than having lots of specific methods

It's the last point that's really a kicker for me. So let's see what we can do about it here.

The template method pattern

As our first stab at making the train class "feel better", let's apply the template method pattern with a twist. In the official template method pattern, you define a "template" method (which has boilerplate code) and you customize with inheritance; for us, we'll customize with aggregation rather than inheritance for a variety of reasons (for me, any time where aggregation is viable I prefer it, because I don't like inheritance unless it's needed).

The way we'll do aggregation is via delegates.

Let's modify the find method so that it employs the template method pattern.

The first order of business is to define a delegate that is suitable. We'll follow the same implementation strategy as before, so that calls for a delegate that is applied to each train car, and we'll let the delegate decide whether or not the train car fits the bill.

Here's a straight forward implementation:

Public Delegate Function FindFilter(ByVal c As TrainCar) As Boolean
Public Function Find(ByVal filter As FindFilter) As Train
    Dim ret = New Train
    For Each c In Me
        If filter(c) Then
            ret.InsertLast(c)
        End If
    Next
    Return ret
End Function

There's nothing tricky about this : we define a delegate function that filters a train car and returns true if the filter is positive, and false otherwise.

Notice that with this strategy, we've accomplished a design goal - we've deferred filtering elements to the client, and in a way that doesn't require inheritance.

Built-in delegate types

One of the small niceties we added to the .NET Framework 3.5 is the Func type. You can use the Func type as a predefined delegate type. There are several definitions that follow this pattern:

Delegate Function Func(Of R)() As R
Delegate Function Func(Of TArg1, R)(a As TArg1) As R

And so on (there are definitions for up to 5 arguments).

Thus you can write the above without having to define your delegate, FindFilter, like the following:

Public Function Find(ByVal filter As Func(Of TrainCar, Boolean)) As Train
    Dim ret = New Train
    For Each c In Me
        If filter(c) Then
            ret.InsertLast(c)
        End If
    Next
    Return ret
End Function

This is just a nice addition so that you don't have to define delegate types for the most common (less then 5 arguments) delegates.

Using the new Find method

Ok cool, so now let's see how we can make use of this new find method. Here's the typical way you would pass a function as a delegate to Find. First, we define a function called Filter which matches the signature, and then we take the address of the function when we call the Find method:

Module Module1
    Private Function Filter(ByVal c As TrainCar) As Boolean
        Return c.Color = "Red"
    End Function

    Sub Main()
        Dim trains = PopulateTrains()
        Dim results = trains.Find(AddressOf Filter)
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

This is pretty good! We now control the filtering, and we (as the client) can do whatever filtering we want, without making requirements to the train class.

There are a couple of things that are not so cool about this though:

  1. When defining the Filter method, we have to make sure that the Filter method signature matches the delegate signature.
  2. The Filter method can be anywhere ... meaning it's not close to the usage of it. In this particular example, if the filter method was "far away" from main, you wouldn't know exactly what criteria you were using to filter the trains, unless you stuck with good naming conventions. And we know where "good" naming conventions lead ...

Ok, so can we do better?

Lambda expressions

We definitely can do better. We'll need a new feature in VB9 called lambda expressions. Think of a lambda expression as a "inline function" that contains an expression. The following is equivalent to the above:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim results = trains.Find(Function(c) c.Color = "Red")
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

Notice all the following cool things:

  1. You do not need to create your own private method that implements the function! The compiler essentially generates the moral equivalent of "Filter" in the above example.
  2. If you followed through by typing this in the IDE, you will notice the rocking Intellisense. When you typed "c.", the VB IDE presented to you rich Intellisense! This is because the compiler was able to infer (there's type inference again) the type of c by looking at the delegate signature this time. Since the delegate said that the type of the lambda method is a TrainCar, and you asked the compiler to find the type, we did. And you got rocking Intellisense.
  3. Locality - the filter is right there! No need to "go to definition" and find the filter method : it's all in the code!

Lambda expressions are really powerful.

But it gets better.

Let the user determine what the filter criteria is

Let's suppose that we want to allow the user to control what color to filter the trains with. If we go back to the original design with the delegates, we essentially need a way to pass data to the Filter method. Since delegates don't allow you pass data to it, there are several unpleasant workarounds:

  1. Use a global variable (boy this sucks, I didn't even want to write it down as an option)
  2. You could also use a member variable, which is marginally better than a global variable.
  3. You could wrap a class around your filter method.

The third option is the most viable, so let's see what that looks like:

Class ColorFilter
    Private m_Color As String
    Public Sub New(ByVal color As String)
        Me.m_Color = color
    End Sub
    Private Function Filter(ByVal c As TrainCar) As Boolean
        Return c.Color = m_Color
    End Function
End Class

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim filter = new ColorFilter(Console.Readline())
        Dim results = trains.Find(AddressOf filter.Filter)
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

This just got a lot worst. We now need a private class just for filtering. If we wanted 100 filters, we'd need 100 private classes. This sucks. Not much better than just inheriting 100 classes from the Train class, either.

It would be nice if the lambda expression feature can handle this scenario, and it most definitely can.

Lambda expressions part two - lifting variables

Let's consider the following example, which is slightly modified from the lambda expression example above. Pay particular attention to the body of the lambda expression:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim color = Console.Readline()
        Dim results = trains.Find(Function(c) c.Color = color)
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

Notice that there's a subtlety here : the lambda expression is referring to a variable, color, which is not a lambda parameter, but is defined in Main. When this happens, the compiler generates code that is morally equivalent to the ColorFilter class, except we call the ColorFilter class a closure.

That is, the two examples are equivalent. The compiler provides powerful variable lifting services for you so that you can write code that refers to variables that are not in the scope of the lambda expression, and the compiler will "make it work".

You've seen the magic (we wrote the code in the ColorFilter class example!) - now you know what the compiler does.

Lambda expressions are extremely powerful : check out my Lambda expression article on MSDN magazine for more coverage.

Now, we really have a clean, good solution. I like what we have. But as usual, let's push on, and see what else we can do to make this even better.

Can we make it more general?

Everything we've done so far has been pretty cool, and I feel that we are converging on a good design. But let's continue to push further : we added a very general find method to the train class, but it feels like it's in the wrong place, since the act of finding elements is not necessarily restricted to finding train cars in trains.

We really want to add this method to the DoublyLinkedList class, so that the train class just gains the benefit of the find method, as well as any other class that uses the DoublyLinkedList class.

But what if we've already shipped the DoublyLinkedList class, and we can't modify the interface anymore? Or what if the DoublyLinkedList class belong to someone else (and we purchased the rights to the API library, for example)?

We need a way to extend the behavior of an existing class.

Extension methods

One of the most trivial ways we can "extend" the behavior of an existing class is to create a shared method that takes this class as the first parameter (sort of like the ADT way of doing things way before OOP became popular). Here's some code that will do just that:

Module DoublyLinkedListExtensions
    Public Function Find(Of TElement)(ByVal col As DoublyLinkedList(Of TElement), _
ByVal filter As Func(Of TElement, Boolean)) As DoublyLinkedList(Of TElement) Dim ret = New DoublyLinkedList(Of TElement) For Each c In col If filter(c) Then ret.InsertLast(c) End If Next Return ret End Function End Module

The implementation is pretty straight forward and is very similar to what you would do if you were to implement this function as an instance method (just like above).

You use this method like so:

Module Module1
    Sub Main()
        Dim t = PopulateTrains()
        Dim results = Find(t, Function(c) c.Color = "Red")
        For Each train in results
            Console.WriteLine(train.ToString)
        Next
    End Sub
End Module

This is how you would call the shared method. There are a couple of things with the shared method approach that are a bit lacking:

  1. We really want Find to be like an instance method on DoublyLinkedList. The shared method syntax obfuscates this.
  2. In this particular case, the type argument TElement is fixed by the element type for DoublyLinkedList, but there's no way the compiler knows this. We really want to tell the compiler that TElement should be inferred based on only the DoublyLinkedList's argument.

For example, here's what the tooltip and intellisense show:

shared_method

Notice that in this case, we have no idea what TElement is. As a result, when you are writing the lambda expression, you don't get intellisense:

shared_method_2

So while shared methods work, they don't really give the greatest user experience.

This is where extension methods come in. One of my team mates, Scott, has a series of articles on extension methods that are extremely well written and in-depth : I encourage you to check those out if you aren't too familiar with extension methods.

What I want to highlight is that extension methods have two very nice properties:

  1. You can "see" extension methods as if they were instance methods.
  2. A feature called partial type inference makes the intellisense scenarios amazing.

As above, I'll demonstrate with some samples. To make your shared method an extension method, simply supply the <Extension> attribute : and that's it! Nothing more. It's really easy to do:

Module DoublyLinkedListExtensions
    <Extension> _
    Public Function Find(Of TElement)(ByVal col As DoublyLinkedList(Of TElement), _
ByVal filter As Func(Of TElement, Boolean)) As DoublyLinkedList(Of TElement) Dim ret = New DoublyLinkedList(Of TElement) For Each c In col If filter(c) Then ret.InsertLast(c) End If Next Return ret End Function End Module

Be sure to import the System.Runtime.CompilerServices namespace.

Now, you can write code that looks like this:

Module Module1
    Sub Main()
        Dim t = PopulateTrains()
        Dim results = t.Find(Function(c) c.Color = "Red")
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

As you can see, you now call the Find method as if it were an instance method : in a sense, we've "extended" the type DoublyLinkedList with a new method called Find.

But the power really comes from the user experience. Check out the intellisense:

extension_method

Check that out! The partial type inference feature essentially performs type inference on TElement based on the "instance" type : in this case, it decided that TElement is a TrainCar, and now, you've got better tooltips!

And not only that, when you write your lambda expression, we now know what the lambda argument is, so you get intellisense there as well:

extension_method_2

Look at that! We now know that x is a TrainCar, so you get intellisense there!

This is the power of extension methods. Again, please check out Scott's articles for more details.

Projections

To hammer home the idea of extension methods, I want to define another extension method as an example. This method transforms a DoublyLinkedList of type T to a DoublyLinkedList of type R, like so:

<Extension()> _
Public Function Transform(Of TSource, TResult)(ByVal col As DoubleLinkedList(Of TSource), _
ByVal Projection As Func(Of TSource, TResult)) As DoublyLinkedList(Of TResult) Dim ret = New DoublyLinkedList(Of TResult) For Each c In col ret.InsertLast(Projection(c)) Next Return ret End Function

This is a simple extension method that just converts elements in a list. Check it out yourself and make sure you understand the concepts, and try to call it and see what results you get.

Where are we?

So now we have a very good design. We can choose to use instance methods or extension methods to provide a template method pattern of access for our clients, and our clients can write clear, concise code to use our collection.

But we can do more. I have not written at all about LINQ yet, and we're now going to see how LINQ works.

Using LINQ on our DoublyLinkedList class

The wishful thinking part of me wishes that the following code would compile, but it does not:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim q = From t In trains _
                Where t.Color = "Red" _
                Select t
    End Sub
End Module

We really want to make this work, because this would be the ultimate syntax for what we want. It would allow us to ask information about our train classes (and any class that inherits from our doublylinkedlist class) through the new LINQ operators.

Are you ready to see this work? Simply change the method "Find" to "Where" and "Transform" to "Select":

Module DoublyLinkedListExtensions
    <Extension()> _
    Public Function Where(Of TElement)(ByVal col As DoublyLinkedList(Of TElement), _
ByVal filter As Func(Of TElement, Boolean)) As DoublyLinkedList(Of TElement) Dim ret = New DoublyLinkedList(Of TElement) For Each c In col If filter(c) Then ret.InsertLast(c) End If Next Return ret End Function <Extension()> _ Public Function [Select](Of TSource, TResult)(ByVal col As DoublyLinkedList(Of TSource), _
ByVal Projection As Func(Of TSource, TResult)) As DoublyLinkedList(Of TResult) Dim ret = New DoublyLinkedList(Of TResult) For Each c In col ret.InsertLast(Projection(c)) Next Return ret End Function End Module

And the code compiles! We can use LINQ over our DoublyLinkedList!

Wrapping up

Whew. I had originally hoped to have this post done a long time ago. And the initial draft ideas had this post at about a quarter of the length. If you're still reading, I hope you've gotten a very good idea of what exactly the compiler does when it sees a LINQ statement.

After all, the whole point of this post is to understand what magic the compiler performs, in order for you to have the knowledge and know-how of how LINQ works.

To summarize in reverse order:

  1. We've seen how the compiler takes a LINQ query and decomposes it into method calls (extension methods or instance methods).
  2. We've seen how the compiler takes the expressions in a LINQ query and builds lambda expressions out of them.
  3. We've seen how the compiler takes lambda expressions and creates closure classes and methods that have the expression body, and replaces the call site with the address of the compiler generated method.
  4. We've seen that the pattern LINQ employs is a template pattern with customization via delegates, and how that is a better way to write code then customization via inheritance.

I hope that you can move forward and backwards : that is, that not only do you understand how to decompose a LINQ query into its parts (for those rare cases where you need to call a method that's not available in query format), but you also understand how to take advantage of LINQ in your legacy applications, by following a sequence of refactorings similar to the one I did in this article.

And there's still much more to say about LINQ:

  • Anonymous types
  • Deferred execution
  • Expression trees
  • Other complex operators, like groupby and join

I'll save these for a post in the future.

As always, if you have questions, feel free to email me or leave a comment!

Thanks, and I hope you enjoy Visual Basic 2008! On the product team, we love this product, we love LINQ, and we hope that you find it as enjoyable, productive, and useful as we do.

Tim

 

Technorati Tags: , ,