Welcome to MSDN Blogs Sign in | Join | Help

Hey there! In this post, I'll continue describing some of the things that are interesting about the VB compiler, especially related to expression trees and the consumption of expression trees in your LINQ provider.

Again, this may not be too interesting if you aren't writing a LINQ provider, but I hope you read on anyway because I'll be discussing some of the subtle features of the language. I'm going to diverge far and wide, and I apologize up front if some of this is esoteric, but I want to give the appropriate background information.

Coalesce

One of the new operators introduced in VB is the coalesce operator. The coalesce operator looks like the ternary operator:

REM Ternary operator
if(a, b, c)

REM Coalesce operator
if(a, b)

The coalesce operator allows you to return the value "a" if "a" is not nothing, and if "a" is nothing, then it will return the value "b". We typically use this operator if we don't know whether "a" has a value or not, and want to use a "default value" in the case where "a" has no value.

For example:

Dim b = GetCityFromDatabase()
Dim city = if(b, "Seattle")

In this case, we get a city from a database using some API, and if there is no city in the database (the database contains a NULLABLE column and no value was stored there), then we'll use the default text "Seattle" as the city.

Ok, cool - but what does this have to do with expression trees?

The important thing to remember is this: in C#, the compiler generates the Coalesce operator node if the user explicit uses the coalesce operator (the ?? operator in C#). In VB, the compiler will generate a Coalesce operator node in some cases even if the user did not explicitly use the coalesce operator (if).

This means that even if you don't plan to support the coalesce operator, you still need to support it for the best VB experience possible.

Nullables

In order to explore this in the appropriate amount of detail, I have to diverge into VB's nullable semantics. In VB, the language design team decided to implement three-value nullable semantics rather than two-value nullable semantics (like C#).

What's the difference? Here's a table for VB for x OP y, where OP is any relational operator (<, <=, =, >=, >), and the type of x and y are not boolean (VB has special semantics for these operators if x and y are boolean).

x y RESULT
Not Nothing Not Nothing True/False
Not Nothing Nothing Nothing
Nothing Not Nothing Nothing
Nothing Nothing Nothing

In essence, if any of the operands are Nothing, then the result is nothing. In C#, you always get a true or false answer.

Let's just do a quick exercise:

dim x as integer = 10
dim y as integer = 12
dim z as integer? = nothing
dim u as integer? = nothing

dim b1 = x < y
dim b2 = x < z
dim b3 = z < x
dim b4 = z < u

Here are the results:

b1 = true
b2 = nothing
b3 = nothing
b4 = nothing

What falls out of this is that the type returned by the logical operators is Boolean?, and not Boolean.

Conversions

In addition, a conversion from Boolean? to Boolean is a narrowing conversion. That is, if the Boolean? value is Nothing, converting it to Boolean will throw an exception.

So what are the implications?

If you have a method that takes a Boolean, and you pass it a Boolean? argument:

sub bar(b as boolean)
end sub

bar(x < z)

Then if you have option strict on, you will get a compiler error. But if option strict is off, the compiler, as usual, applies narrowing conversions implicitly. In this particular case, you'll get a runtime exception.

This may or may not surprise you; but it falls out logically from the facts discussed above. Keep this in mind, because you're going to see this pop up again later when we discuss queries.

Conditional context

Ok, here's a quick quiz. What happens in the following code, with option strict on/off?

if x < z then
end if

If you said that this code is an error when option strict is on and throws an exception when option strict is off (just like how it does in the above example with the method call to "bar"), you would be correct given what I've told you.

However, VB has a special mode called the conditional context mode. In certain cases, the compiler tries really hard to get a boolean out of an expression. We may do things like look for a CBool cast, or look for an IsTrue operator on your type. Or we may apply the coalesce operator if the expression is a nullable-type result (I knew I would get back to the coalesce operator eventually :)

And yes, you guessed correctly, the if statement is one of those places where we do this special work.

In the above example, the compiler generates code that's equivalent to the following:

if if(x < z, false) then
end if

This almost looks like a syntax error! But what you are really doing is applying the coalesce operator to the expression x < z, and if that expression is nothing, then use the RHS (that is, false) as the result.

I hope everything so far has been clear. The compiler in spirit applies a coalesce operator in order to remove the potential nothing-ness from an expression safely, without the throwing a runtime error.

Queries

Another place where we apply the conditional context mode is in queries. In particular, the where clause has this mode applied. For example:

dim q = from c in customers where x < z select c

This case is similar to the if; the compiler generates something like the coalesce operator in order to try to get a boolean result out of x < z, and so, this code does not throw an exception.

As a result, when the above where clause gets converted to an expression tree, we will generate a coalesce operator node to get the same semantics as in the non-expression tree case (I knew we were talking about expression trees!)

Aside: We recognize that generating a coalesce is inefficient for LINQ to SQL because that translates to a CASE statement in SQL - essentially, this means that every row in the table needs to be run through the query : therefore, no indexes are used, and the performance of such a query will be very slow.

For the relational operators, we can optimize the coalesce away by using a different form of the relational operator, but this won't work in scenarios that don't have a relational operator (such as a scalar Boolean? or cases where you funcletize a function that returns Boolean?). We decided that being consistent and providing the full VB semantic understanding was important, because there are more scenarios for expression trees then just LINQ to SQL.

Update: I held off posting this blog long enough to report that the LINQ to SQL team fixed their query analyzer for VB and will now recognize this pattern and generate optimal SQL so that you get full index search speeds :) This fix should be available in an update coming later this year.

The query scenario is exactly the scenario I described at the beginning of the article. This means that if you write a LINQ provider, you will almost invariably have to handle the coalesce node. This is because if your source is a database, you'll almost always have nullable columns in a database, meaning we will generate a tree with the coalesce node.

Extension methods

Before I close, I want to examine the case where you call the where operator in extension method format, rather than query format:

dim q = customers.where(function(c) x < z)

Here's the final quiz: will this throw an exception when option strict is off? If you said yes, you totally rock :) Method calls, as I described above, do not fall under the conditional context group, and thus, this is as unsafe as before, when we called bar(x < z).

In this case, the expression tree that you get will not have a coalesce operator node, and thus, you'll get the unsafe, lifted version of the < operator, which results in a Boolean? result.

How do you get the same semantics in the extension method version? Use the coalesce operator:

dim q = customers.where(function(c) if(x < z, false))

Now you know why we chose to generate the coalesce operator in the case of the where query operator: to make the code as consistent as if you, the user, wanted to apply the safe conversion from Boolean? to Boolean, mapping Nothing to False.

That is, the following query and extension method call is equivalent:

dim q = from c in customers where c.id < 10) select c
dim q = customers.where(function(c) if(c.id < 10)).select(function(c) c)

Summary

I think I said a lot more about this subject then I intended to. I hope this all makes sense; there's definitely a lot of subtle things when you start to look at how various language features interact together.

The bottom line is that a good LINQ provider should handle the coalesce operator, and in the VB case, it's very important to handle the coalesce operator.

As always, if you have questions, feel free to send me email.

Consider the following code fragment:

Module Module1
Sub Foo(Of T)(ByVal x As T)
If x Is Nothing Then

End If
End Sub
Sub Main()
Foo(10)
End Sub
End Module

The first thing you might ask is : "hey, how come the compiler allows me to check for Nothing? Isn't that a reference comparison? There is no constraint on T, so why does the compiler allow you to do this?" The simple answer is : it makes your life easier. Imagine that the compiler did not allow this : you now have to write two versions of every generic method : one that assumes the parameter can be Nothing and thus must make a null-check, and one that assumes the parameter must have a value. That really sucks.

Ok, but what code does the compiler generate? Here's the bit of IL code that's interesting:

IL_0001:  ldarg.0
IL_0002: box !!T
IL_0007: ldnull
IL_0008: ceq

Ah ha! The compiler treats x as a value type, and emits a box instruction. The it emits code that checks against null : since we've now boxed x, this pushes two reference types onto the stack for ceq to operate on.

One way therefore to think about this, is that when a valuetype is passed to a method that takes a generic argument which is not constrained, the compiler will emit box instructions to make them reference types. Indeed, if you take a look at the number of managed objects, you'll definitely see that when you call Foo with a value type, an extra object is created on the heap (to see this, use the !dumpheap instruction in Visual Studio with SOS loaded - check this for more information). But when you call Foo with a reference type, no object is created on the heap.

Why does this happen? Let's take a look at how the CLR jits the above IL.

Here's the retail x86:

00000000  push        edi  
00000001 push esi
00000002 mov edi,ecx
00000004 cmp dword ptr ds:[007795E0h],0
0000000b je 00000012
0000000d call 79826E47
00000012 mov ecx,79102290h
00000017 call FFE60B1C
0000001c mov esi,eax
0000001e mov dword ptr [esi+4],edi
00000021 test esi,esi
00000023 jne 0000003B

There, definitely, the first call is constructing the boxed value, and the second call is calling the managed helper to test object equivalence.

Here's the version of retail x86 that's generated when Foo is called with a reference type:

00000000  push        esi  
00000001 mov esi,ecx
00000003 cmp dword ptr ds:[007795E0h],0
0000000a je 00000011
0000000c call 79826D47
00000011 test esi,esi
00000013 jne 0000001C

See, no extra call to construct an object on the heap.

Next time, I hope to post about some of the changes we are considering making for the next release.

Tim

There has been quite a few discussions on immutability lately. Patrick has written an excellent blog post summarizing some of the various thoughts on immutability along with a cool feature in NDepend to do immutable queries. The only thing that I would add is that immutability is definitely something to consider early on in the design process. It's much easier to start with a design that promotes immutability, rather then to try to graft immutability into an inherently imperative design.

Next time I will post on a specific bug that we had in the VB compiler that would have been prevented if the VB compiler implementation was designed to be more immutable then it is today.

Technorati Tags: ,

Last time, I showed how you can use the Cast() extension method defined on IEnumerable/IEnumerable(Of T) to do an element conversion for a sequence that implements one of the enumerable interfaces.

Today I want to talk about the ToDictionary extension method. This extension method lets you take a sequence that implements IEnumerable(Of T) and build a dictionary out of the elements of the sequence, allowing you to specify the key for each element. The key is the hash for the dictionary.

For example, consider the following code:

Class Car
    Public id As Integer
    Public manufacturer As String
    Public color As String
End Class

Function GetCars() As List(Of Car)

    Dim ret = New List(Of Car)
    ret.Add(New Car With {.id = 1, .manufacturer = "Toyota", .color = "Red"})
    ret.Add(New Car With {.id = 2, .manufacturer = "Honda", .color = "Red"})

    Return ret
End Function

Sub Main()
    Dim cars = GetCars()
    Dim dictionary = cars.ToDictionary(Function(c) c.id)
End Sub

This simple example builds a list of cars. We then get this list in Main(), and assign it to the variable "cars". Then we use type inference and assign to dictionary the value returned from the ToDictionary() extension method. This method returns a IDictionary(Of K, V), and in this case, the type argument V is inferred to be Car, and the type argument K is inferred based on the result of the lambda expression. Here, since we're returning the ID, it's going to be an integer.

Thus the final type of dictionary is IDictionary(Of Integer, Car). Neat eh? Type inference is really awesome because it allows you to "just let the compiler do its work".

You can then use dictionary like you would use any other IDictionary: you can retrieve elements using the () accessor, etc, etc. Play around with it, and see how powerful this extension method is.

Aside: how does the ToDictionary extension method work?

The ToDictionary method creates a new Dictionary(Of K, V) and adds elements to the dictionary, using the lambda expression you supplied to get the value of the key. Since the compiler has resolved all the type parameters at the call site (using both the hints from IEnumerable(Of T) and the lambda expression), you do not need to specify any type arguments.

Since your lambda expression is used to get the key value, this implies that your lambda expression should return values that are "good hashes".

 

Technorati Tags: , ,

If you take a look at the set of extension methods offered on IEnumerable (and IEnumerable(Of T)), you'll find that there are plenty of very interesting extension methods. One of them is the Cast() extension method.

What does this extension method do? It provides a way to convert an IEnumerable(Of T) to an IEnumerable(Of U). Even more conveniently, it allows you to convert an IEnumerable to an IEnumerable(Of T).

Many pre-VS2005 frameworks do not take advantage of generic collection types (since they weren't invented then). As a result, if you have the following code, it may not do what you expect:

Dim s = New ArrayList()
s.Add("String")
s.Add("two")
For Each c In s
    Console.WriteLine(c.ToUpper)
Next

If you compile this code with option strict on, you'll get an error saying that latebinding is not supported. Indeed, if you hover over the variable c, you'll notice that the type is object. Huh? I thought VB supported type inference?

VB in fact does support type inference, but because an ArrayList is essentially an IEnumerable of type object, the compiler does not know what element type to infer. Therefore, it infers object, since you can think of ArrayList as isomorphic to IEnumerable(Of Object).

You can tell the compiler that you really have an IEnumerable(Of T) by applying the Cast extension method like so:

Dim s = New ArrayList()
s.Add("String")
s.Add("two")
For Each c In s.Cast(Of String)
    Console.WriteLine(c.ToUpper)
Next

And now, the compiler infers c as String, and you get your method, intellisense, etc. Sweet!

Be warned though : the elements of ArrayList better be strings! Or else Cast will throw an exception.

 

Technorati Tags: , ,

Note : I plan to post this blog on the VB team blog eventually. However, because of the length and complexity, I'm posting this up on my blog for now to hopefully get some eyes on it so that I won't make a fool of myself on the team blog :) Please feel free to contact me if you think something is wrong, misleading, or could be improved on. Thanks!

 

A couple months ago, I had a chance to do a presentation at the New England Code Camp #8 where I talked about LINQ under the hood. This blog post is an attempt at reconstructing the presentation as best as I can.

Disclaimer: This is an advanced post on LINQ, the new VB9 features, and how the compiler does its magic to make LINQ work.

One day, I may try to record a screen cast of this as well if I manage to find time (and if people will find it helpful).

Overview

The goal for us today is to understand what exactly happens when the compiler sees a LINQ query that is like this:

Dim q = From c In customers Where c.ID Mod 2 = 0 Select c.ID, c.Name

In order to understand what happens, we're going to start with exploring the world as it existed in VB 2005.

We will attempt to create a doubly linked list from first principles (that is, without the help of existing FX types that were introduced in VB 2005, such as List(Of T)), because building this from scratch will help us understand the concepts at a deeper level.

Creating a doubly linked list

Let's try to create the doubly linked list from the ground up, without relying on FX collection types. A doubly linked list will contain elements that have forward and backward pointers, so the following basic implementation will suffice:

Class DoublyLinkedListElement(Of T)
    Public Sub New(ByVal data As T)
        Me.m_data = data
    End Sub

    Private m_data As T
    Public ReadOnly Property Data() As T
        Get
            Return m_data
        End Get
    End Property

    Private m_nextElement As DoublyLinkedListElement(Of T)
    Public Property NextElement() As DoublyLinkedListElement(Of T)
        Get
            Return Me.m_nextElement
        End Get
        Set(ByVal value As DoublyLinkedListElement(Of T))
            Me.m_nextElement = value
        End Set
    End Property

    Private m_prevElement As DoublyLinkedListElement(Of T)
    Public Property PrevElement() As DoublyLinkedListElement(Of T)
        Get
            Return m_prevElement
        End Get
        Set(ByVal value As DoublyLinkedListElement(Of T))
            m_prevElement = value
        End Set
    End Property
End Class

Class DoublyLinkedList(Of T)
    Private m_Head As DoublyLinkedListElement(Of T)
    Private m_Tail As DoublyLinkedListElement(Of T)

    Public Sub InsertFirst(ByVal data As T)
        Dim ele = New DoublyLinkedListElement(Of T)(data)
        ele.NextElement = m_Head

        If m_Head IsNot Nothing Then
            m_Head.PrevElement = ele
            m_Head = ele
        Else
            m_Head = ele
            m_Tail = ele
        End If
    End Sub
    Public Sub InsertLast(ByVal data As T)
        Dim ele = New DoublyLinkedListElement(Of T)(data)
        ele.PrevElement = m_Tail

        If m_Tail IsNot Nothing Then
            m_Tail.NextElement = ele
            m_Tail = ele
        Else
            m_Head = ele
            m_Tail = ele
        End If
    End Sub
End Class

(BTW the final code will be available as a ZIP file; but the intermediate steps won't be, so you'll have to follow this post through in order to understand the concepts).

Take some time to understand what this code is doing; it's a very simply, feature-lacking doubly linked list. All that you can do to it is add to the beginning or the end of the list. Nothing more.

Modeling some data on top of the doubly linked list

The first thing that we'll try to do is to model some data on top of the doubly linked list. A typically "real world" model that maps to doubly linked lists well are trains. As a result, let's try to build some train models on top of the doubly linked list.

Here's an example:

Class TrainCar
    Public ID As Integer
    Public Color As String

    Public Overrides Function ToString() As String
        Return "ID: " & ID & vbCrLf & "Color: " & Color
    End Function
End Class

Class PassengerCar : Inherits TrainCar
    Public Count As Integer
    Public Overrides Function ToString() As String
        Return "Passenger Car" & vbCrLf & "Count: " & Count & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Cargo : Inherits TrainCar
    Public Company As String
    Public Overrides Function ToString() As String
        Return "Cargo Car" & vbCrLf & "Company: " & Company & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Engine : Inherits TrainCar
    Public HorsePower As Integer
    Public Overrides Function ToString() As String
        Return "Engine Car" & vbCrLf & "Horsepower: " & HorsePower & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Caboose : Inherits TrainCar
    Public Overrides Function ToString() As String
        Return "Caboose" & vbCrLf & MyBase.ToString()
    End Function
End Class

Class Train : Inherits DoublyLinkedList(Of TrainCar)
End Class

Here, we are creating a small hierarchy of car types, along with a Train model that simply inherits from DoublyLinkedList and providing a specific parameter for the generic argument.

This isn't necessarily the best way to model the train domain; it simply serves as an example for this blog post. Also, the code is "demo" code; a real production model for train cars will be much more robust.

So now we have a basic working train model from which you can add cars to it, and that's about it.

Populating the trains - new language syntax!

Here's a simple function that will populate a train class with a number of train cars:

    Function PopulateTrains() As Train
        Dim t = New Train()

        t.InsertFirst(New Engine With {.ID = 1, .Color = "Red", .HorsePower = 100})
        t.InsertLast(New Engine With {.ID = 2, .Color = "Black", .HorsePower = 100})
        t.InsertLast(New Engine With {.ID = 3, .Color = "Red", .HorsePower = 75})
        t.InsertLast(New Engine With {.ID = 4, .Color = "Green", .HorsePower = 75})
        t.InsertLast(New Engine With {.ID = 5, .Color = "Red", .HorsePower = 75})
        t.InsertLast(New PassengerCar With {.ID = 6, .Color = "Red", .Count = 50})
        t.InsertLast(New PassengerCar With {.ID = 7, .Color = "Green", .Count = 51})
        t.InsertLast(New PassengerCar With {.ID = 8, .Color = "Green", .Count = 50})
        t.InsertLast(New Cargo With {.ID = 9, .Color = "Blue", .Company = "Microsoft"})
        t.InsertLast(New Cargo With {.ID = 10, .Color = "Blue", .Company = "Microsoft"})
        t.InsertLast(New Cargo With {.ID = 11, .Color = "Blue", .Company = "Microsoft"})
        t.InsertLast(New PassengerCar With {.ID = 12, .Color = "Black", .Count = 55})
        t.InsertLast(New Cargo With {.ID = 13, .Color = "Green", .Company = "Starbucks"})
        t.InsertLast(New Cargo With {.ID = 14, .Color = "Green", .Company = "Starbucks"})
        t.InsertLast(New Caboose With {.ID = 15, .Color = "Gold"})

        Return t
    End Function

The first thing to notice is the new object initializer syntax. This allows you to construct an instance of an object and initialize it's fields and properties in an "expression context" - that is, any place in the VB language that expects an expression, you can use an object initializer. In the example above, we used the object initializer as an argument to a function call.

This is different from the with statement, which works as a statement, and not as an expression.

The second thing to notice is type inference. In the PopulateTrains method, the type of t is not object, as it would be in VS2005. Rather, the compiler inferred the type from the right hand side of the assignment statement, and t is typed as a Train.

What the compiler does in this scenario is that it notices that you did not specify a type for the variable t, and so it tries its best to determine what the type is of the right hand side of the assignment, and use that type as the type of t. However, if it cannot, then it will give up and generate an error.

This is especially useful in certain places where we introduced the concept of unnamed types. You'll see this later.

This new feature is controlled by a new option : option infer. By default, it is on for new projects. You can turn it off, and you'll get VS2005 behavior.

Here are a couple more posts that talk about type inference.

Enumerating the doubly linked list

Now that we have some data, we really need more support in the doubly linked list class for this data to be useful. In the .Net Framework, all collection types can be enumerated, so let's add a custom enumerator and take advantage of the fact that Visual Basic allows you to enumerate over a type that exposes a GetEnumerator method.

Here's a simple and straightforward implementation of IEnumerator for our doubly linked list (the Visual Basic IDE automatically generated the stubs for me, so I just filled in where necessary):

Class DoublyLinkedListEnumerator(Of T) : Implements IEnumerator(Of T)
    Private m_First As DoublyLinkedListElement(Of T)
    Private m_Current As DoublyLinkedListElement(Of T)
    Private m_Started As Boolean

    Public Sub New(ByVal elem As DoublyLinkedListElement(Of T))
        m_First = elem
        m_Current = Nothing
        m_Started = False
    End Sub

    Public ReadOnly Property Current() As T Implements System.Collections.Generic.IEnumerator(Of T).Current
        Get
            Return m_Current.Data
        End Get
    End Property

    Public ReadOnly Property Current1() As Object Implements System.Collections.IEnumerator.Current
        Get
            Return m_Current.Data
        End Get
    End Property

    Public Function MoveNext() As Boolean Implements System.Collections.IEnumerator.MoveNext
        If Not m_Started Then
            m_Started = True
            m_Current = m_First

            Return If(m_Current Is Nothing, False, True)
        End If

        m_Current = m_Current.NextElement
        If m_Current IsNot Nothing Then
            Return True
        Else
            Return False
        End If
    End Function

    Public Sub Reset() Implements System.Collections.IEnumerator.Reset
        m_Current = Nothing
        m_Started = False
    End Sub

    Private disposedValue As Boolean = False        ' To detect redundant calls

    ' IDisposable
    Protected Overridable Sub Dispose(ByVal disposing As Boolean)
        If Not Me.disposedValue Then
            If disposing Then
                ' TODO: free other state (managed objects).
            End If

            ' TODO: free your own state (unmanaged objects).
            ' TODO: set large fields to null.
        End If
        Me.disposedValue = True
    End Sub

#Region " IDisposable Support "
    ' This code added by Visual Basic to correctly implement the disposable pattern.
    Public Sub Dispose() Implements IDisposable.Dispose
        ' Do not change this code.  Put cleanup code in Dispose(ByVal disposing As Boolean) above.
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
#End Region

End Class

Take a look at the class above; it walks the doubly linked list as expected every time you ask the enumerator for the next element.

Now, we need to add the following method to the DoublyLinkedList(Of T) class defined at the beginning of the article:

    Public Overridable Function GetEnumerator() As IEnumerator(Of T)
        Return New DoublyLinkedListEnumerator(Of T)(m_Head)
    End Function 

We now added enumeration support to the doubly linked list, so that we can iterate the items and get the items out.

Example

Ok so we have enough pieces to actually get a somewhat meaningful example. Here's some code that will create a train with a bunch of train cars, and then iterate and print them out:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        For Each t in trains
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

You should see all the train cars printed to the console. So far, so good.

Searching for specific cars

Right now, we have a bunch of data in the form of train cars, and the only thing that we can do is to iterate the data and print it out. That's not very interesting, because very quickly we will want to be able to ask questions about our data.

In real life, data is really useless unless you can ask questions about it. If you had a portfolio of stocks, you want to ask questions like, "which ones are outperforming others?", or "how long have I held these stocks?". If you had a social network of friends on Facebook, you want to ask questions like, "which friends have updated their profile recently?", or "which friends have recently poked me?".

If you aren't able to ask questions and filter your data, you'll quickly be overwhelmed.

So let's start by asking a specific question: let's find all the train cars that are red.

One of the most natural ways (and probably the one that most people would think of) is to add a new method to the Train class. Object oriented programming is all about sending a message (calling a method) on an object, and finding the right place to handle these is of paramount importance. In this scenario, the Train class is the right model.

Here's some code that will do this. Notice that we take advantage of the fact that we can iterate the train cars internally:

Function FindRedTrainCars() As Train
    Dim ret = New Train
    For Each c In Me
        If c.Color = "Red" Then
            ret.InsertLast(c)
        End If
    Next
    Return ret
End Function

This does specifically what we asked for, and nothing more. It's fairly straight forward; so far, so good.

More, and more, and more...

As with all things in life, things constantly change. The problem with the prior API, FindRedTrainCars, is that it's too inflexible. You can imagine that you would end up with dozens of public APIs that must be exposed with this model. The problem with this model is that:

  • It's inflexible for change - clients cannot easily customize the behavior
  • It's expensive to test - every public API must be rigorously tested, security reviewed, scrubbed for API consistency, etc
  • It "pollutes" your design - a class with many methods that do similar things (searching) just doesn't seem like it's been well thought out
  • It's inelegant - this one is a personal preference, but where possible, I prefer for methods to be extensible through aggregation, rather than having lots of specific methods

It's the last point that's really a kicker for me. So let's see what we can do about it here.

The template method pattern

As our first stab at making the train class "feel better", let's apply the template method pattern with a twist. In the official template method pattern, you define a "template" method (which has boilerplate code) and you customize with inheritance; for us, we'll customize with aggregation rather than inheritance for a variety of reasons (for me, any time where aggregation is viable I prefer it, because I don't like inheritance unless it's needed).

The way we'll do aggregation is via delegates.

Let's modify the find method so that it employs the template method pattern.

The first order of business is to define a delegate that is suitable. We'll follow the same implementation strategy as before, so that calls for a delegate that is applied to each train car, and we'll let the delegate decide whether or not the train car fits the bill.

Here's a straight forward implementation:

Public Delegate Function FindFilter(ByVal c As TrainCar) As Boolean
Public Function Find(ByVal filter As FindFilter) As Train
    Dim ret = New Train
    For Each c In Me
        If filter(c) Then
            ret.InsertLast(c)
        End If
    Next
    Return ret
End Function

There's nothing tricky about this : we define a delegate function that filters a train car and returns true if the filter is positive, and false otherwise.

Notice that with this strategy, we've accomplished a design goal - we've deferred filtering elements to the client, and in a way that doesn't require inheritance.

Built-in delegate types

One of the small niceties we added to the .NET Framework 3.5 is the Func type. You can use the Func type as a predefined delegate type. There are several definitions that follow this pattern:

Delegate Function Func(Of R)() As R
Delegate Function Func(Of TArg1, R)(a As TArg1) As R

And so on (there are definitions for up to 5 arguments).

Thus you can write the above without having to define your delegate, FindFilter, like the following:

Public Function Find(ByVal filter As Func(Of TrainCar, Boolean)) As Train
    Dim ret = New Train
    For Each c In Me
        If filter(c) Then
            ret.InsertLast(c)
        End If
    Next
    Return ret
End Function

This is just a nice addition so that you don't have to define delegate types for the most common (less then 5 arguments) delegates.

Using the new Find method

Ok cool, so now let's see how we can make use of this new find method. Here's the typical way you would pass a function as a delegate to Find. First, we define a function called Filter which matches the signature, and then we take the address of the function when we call the Find method:

Module Module1
    Private Function Filter(ByVal c As TrainCar) As Boolean
        Return c.Color = "Red"
    End Function

    Sub Main()
        Dim trains = PopulateTrains()
        Dim results = trains.Find(AddressOf Filter)
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

This is pretty good! We now control the filtering, and we (as the client) can do whatever filtering we want, without making requirements to the train class.

There are a couple of things that are not so cool about this though:

  1. When defining the Filter method, we have to make sure that the Filter method signature matches the delegate signature.
  2. The Filter method can be anywhere ... meaning it's not close to the usage of it. In this particular example, if the filter method was "far away" from main, you wouldn't know exactly what criteria you were using to filter the trains, unless you stuck with good naming conventions. And we know where "good" naming conventions lead ...

Ok, so can we do better?

Lambda expressions

We definitely can do better. We'll need a new feature in VB9 called lambda expressions. Think of a lambda expression as a "inline function" that contains an expression. The following is equivalent to the above:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim results = trains.Find(Function(c) c.Color = "Red")
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

Notice all the following cool things:

  1. You do not need to create your own private method that implements the function! The compiler essentially generates the moral equivalent of "Filter" in the above example.
  2. If you followed through by typing this in the IDE, you will notice the rocking Intellisense. When you typed "c.", the VB IDE presented to you rich Intellisense! This is because the compiler was able to infer (there's type inference again) the type of c by looking at the delegate signature this time. Since the delegate said that the type of the lambda method is a TrainCar, and you asked the compiler to find the type, we did. And you got rocking Intellisense.
  3. Locality - the filter is right there! No need to "go to definition" and find the filter method : it's all in the code!

Lambda expressions are really powerful.

But it gets better.

Let the user determine what the filter criteria is

Let's suppose that we want to allow the user to control what color to filter the trains with. If we go back to the original design with the delegates, we essentially need a way to pass data to the Filter method. Since delegates don't allow you pass data to it, there are several unpleasant workarounds:

  1. Use a global variable (boy this sucks, I didn't even want to write it down as an option)
  2. You could also use a member variable, which is marginally better than a global variable.
  3. You could wrap a class around your filter method.

The third option is the most viable, so let's see what that looks like:

Class ColorFilter
    Private m_Color As String
    Public Sub New(ByVal color As String)
        Me.m_Color = color
    End Sub
    Private Function Filter(ByVal c As TrainCar) As Boolean
        Return c.Color = m_Color
    End Function
End Class

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim filter = new ColorFilter(Console.Readline())
        Dim results = trains.Find(AddressOf filter.Filter)
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

This just got a lot worst. We now need a private class just for filtering. If we wanted 100 filters, we'd need 100 private classes. This sucks. Not much better than just inheriting 100 classes from the Train class, either.

It would be nice if the lambda expression feature can handle this scenario, and it most definitely can.

Lambda expressions part two - lifting variables

Let's consider the following example, which is slightly modified from the lambda expression example above. Pay particular attention to the body of the lambda expression:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim color = Console.Readline()
        Dim results = trains.Find(Function(c) c.Color = color)
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

Notice that there's a subtlety here : the lambda expression is referring to a variable, color, which is not a lambda parameter, but is defined in Main. When this happens, the compiler generates code that is morally equivalent to the ColorFilter class, except we call the ColorFilter class a closure.

That is, the two examples are equivalent. The compiler provides powerful variable lifting services for you so that you can write code that refers to variables that are not in the scope of the lambda expression, and the compiler will "make it work".

You've seen the magic (we wrote the code in the ColorFilter class example!) - now you know what the compiler does.

Lambda expressions are extremely powerful : check out my Lambda expression article on MSDN magazine for more coverage.

Now, we really have a clean, good solution. I like what we have. But as usual, let's push on, and see what else we can do to make this even better.

Can we make it more general?

Everything we've done so far has been pretty cool, and I feel that we are converging on a good design. But let's continue to push further : we added a very general find method to the train class, but it feels like it's in the wrong place, since the act of finding elements is not necessarily restricted to finding train cars in trains.

We really want to add this method to the DoublyLinkedList class, so that the train class just gains the benefit of the find method, as well as any other class that uses the DoublyLinkedList class.

But what if we've already shipped the DoublyLinkedList class, and we can't modify the interface anymore? Or what if the DoublyLinkedList class belong to someone else (and we purchased the rights to the API library, for example)?

We need a way to extend the behavior of an existing class.

Extension methods

One of the most trivial ways we can "extend" the behavior of an existing class is to create a shared method that takes this class as the first parameter (sort of like the ADT way of doing things way before OOP became popular). Here's some code that will do just that:

Module DoublyLinkedListExtensions
    Public Function Find(Of TElement)(ByVal col As DoublyLinkedList(Of TElement), _
ByVal filter As Func(Of TElement, Boolean)) As DoublyLinkedList(Of TElement) Dim ret = New DoublyLinkedList(Of TElement) For Each c In col If filter(c) Then ret.InsertLast(c) End If Next Return ret End Function End Module

The implementation is pretty straight forward and is very similar to what you would do if you were to implement this function as an instance method (just like above).

You use this method like so:

Module Module1
    Sub Main()
        Dim t = PopulateTrains()
        Dim results = Find(t, Function(c) c.Color = "Red")
        For Each train in results
            Console.WriteLine(train.ToString)
        Next
    End Sub
End Module

This is how you would call the shared method. There are a couple of things with the shared method approach that are a bit lacking:

  1. We really want Find to be like an instance method on DoublyLinkedList. The shared method syntax obfuscates this.
  2. In this particular case, the type argument TElement is fixed by the element type for DoublyLinkedList, but there's no way the compiler knows this. We really want to tell the compiler that TElement should be inferred based on only the DoublyLinkedList's argument.

For example, here's what the tooltip and intellisense show:

shared_method

Notice that in this case, we have no idea what TElement is. As a result, when you are writing the lambda expression, you don't get intellisense:

shared_method_2

So while shared methods work, they don't really give the greatest user experience.

This is where extension methods come in. One of my team mates, Scott, has a series of articles on extension methods that are extremely well written and in-depth : I encourage you to check those out if you aren't too familiar with extension methods.

What I want to highlight is that extension methods have two very nice properties:

  1. You can "see" extension methods as if they were instance methods.
  2. A feature called partial type inference makes the intellisense scenarios amazing.

As above, I'll demonstrate with some samples. To make your shared method an extension method, simply supply the <Extension> attribute : and that's it! Nothing more. It's really easy to do:

Module DoublyLinkedListExtensions
    <Extension> _
    Public Function Find(Of TElement)(ByVal col As DoublyLinkedList(Of TElement), _
ByVal filter As Func(Of TElement, Boolean)) As DoublyLinkedList(Of TElement) Dim ret = New DoublyLinkedList(Of TElement) For Each c In col If filter(c) Then ret.InsertLast(c) End If Next Return ret End Function End Module

Be sure to import the System.Runtime.CompilerServices namespace.

Now, you can write code that looks like this:

Module Module1
    Sub Main()
        Dim t = PopulateTrains()
        Dim results = t.Find(Function(c) c.Color = "Red")
        For Each t in results
            Console.WriteLine(t.ToString)
        Next
    End Sub
End Module

As you can see, you now call the Find method as if it were an instance method : in a sense, we've "extended" the type DoublyLinkedList with a new method called Find.

But the power really comes from the user experience. Check out the intellisense:

extension_method

Check that out! The partial type inference feature essentially performs type inference on TElement based on the "instance" type : in this case, it decided that TElement is a TrainCar, and now, you've got better tooltips!

And not only that, when you write your lambda expression, we now know what the lambda argument is, so you get intellisense there as well:

extension_method_2

Look at that! We now know that x is a TrainCar, so you get intellisense there!

This is the power of extension methods. Again, please check out Scott's articles for more details.

Projections

To hammer home the idea of extension methods, I want to define another extension method as an example. This method transforms a DoublyLinkedList of type T to a DoublyLinkedList of type R, like so:

<Extension()> _
Public Function Transform(Of TSource, TResult)(ByVal col As DoubleLinkedList(Of TSource), _
ByVal Projection As Func(Of TSource, TResult)) As DoublyLinkedList(Of TResult) Dim ret = New DoublyLinkedList(Of TResult) For Each c In col ret.InsertLast(Projection(c)) Next Return ret End Function

This is a simple extension method that just converts elements in a list. Check it out yourself and make sure you understand the concepts, and try to call it and see what results you get.

Where are we?

So now we have a very good design. We can choose to use instance methods or extension methods to provide a template method pattern of access for our clients, and our clients can write clear, concise code to use our collection.

But we can do more. I have not written at all about LINQ yet, and we're now going to see how LINQ works.

Using LINQ on our DoublyLinkedList class

The wishful thinking part of me wishes that the following code would compile, but it does not:

Module Module1
    Sub Main()
        Dim trains = PopulateTrains()
        Dim q = From t In trains _
                Where t.Color = "Red" _
                Select t
    End Sub
End Module

We really want to make this work, because this would be the ultimate syntax for what we want. It would allow us to ask information about our train classes (and any class that inherits from our doublylinkedlist class) through the new LINQ operators.

Are you ready to see this work? Simply change the method "Find" to "Where" and "Transform" to "Select":

Module DoublyLinkedListExtensions
    <Extension()> _
    Public Function Where(Of TElement)(ByVal col As DoublyLinkedList(Of TElement), _
ByVal filter As Func(Of TElement, Boolean)) As DoublyLinkedList(Of TElement) Dim ret = New DoublyLinkedList(Of TElement) For Each c In col If filter(c) Then ret.InsertLast(c) End If Next Return ret End Function <Extension()> _ Public Function [Select](Of TSource, TResult)(ByVal col As DoublyLinkedList(Of TSource), _
ByVal Projection As Func(Of TSource, TResult)) As DoublyLinkedList(Of TResult) Dim ret = New DoublyLinkedList(Of TResult) For Each c In col ret.InsertLast(Projection(c)) Next Return ret End Function End Module

And the code compiles! We can use LINQ over our DoublyLinkedList!

Wrapping up

Whew. I had originally hoped to have this post done a long time ago. And the initial draft ideas had this post at about a quarter of the length. If you're still reading, I hope you've gotten a very good idea of what exactly the compiler does when it sees a LINQ statement.

After all, the whole point of this post is to understand what magic the compiler performs, in order for you to have the knowledge and know-how of how LINQ works.

To summarize in reverse order:

  1. We've seen how the compiler takes a LINQ query and decomposes it into method calls (extension methods or instance methods).
  2. We've seen how the compiler takes the expressions in a LINQ query and builds lambda expressions out of them.
  3. We've seen how the compiler takes lambda expressions and creates closure classes and methods that have the expression body, and replaces the call site with the address of the compiler generated method.
  4. We've seen that the pattern LINQ employs is a template pattern with customization via delegates, and how that is a better way to write code then customization via inheritance.

I hope that you can move forward and backwards : that is, that not only do you understand how to decompose a LINQ query into its parts (for those rare cases where you need to call a method that's not available in query format), but you also understand how to take advantage of LINQ in your legacy applications, by following a sequence of refactorings similar to the one I did in this article.

And there's still much more to say about LINQ:

  • Anonymous types
  • Deferred execution
  • Expression trees
  • Other complex operators, like groupby and join

I'll save these for a post in the future.

As always, if you have questions, feel free to email me or leave a comment!

Thanks, and I hope you enjoy Visual Basic 2008! On the product team, we love this product, we love LINQ, and we hope that you find it as enjoyable, productive, and useful as we do.

Tim

 

Technorati Tags: , ,

Here's another way to use the generic parameter trick to get an anonymous type as a field. The other day, I was trying to store the result of a lambda expression in a field, but the lambda was typed as Func(Of <anonymoustype>) and since there is no type inference for fields, and I didn't want to use object (which would be perfectly fine normally, but in this case I wanted a typed anonymous type), I used a similar trick to a post I did a while ago on creating collections of anonymous types.

However, this is not as straightforward because we cannot rely on type inference in the constructor. What I would have liked to be able to do is something like this:

Public Class Test(Of T)
Private x As Func(Of T)
Sub New(y As T)
End Sub
End Class

Sub UseTest()
Dim x = new Test(New With {.x = 10, .y = "Tim"})
End Sub

However, this doesn't quite work because we currently don't allow type inference on constructor calls; that is, you have to explicitly state the type parameter for T when you new an object.

Ok, so let's try another technique, which involves using a factory method:

Public Class TestFactory
Public Shared Function CreateTest(Of T)(x As T) As Test(Of T)
Return New Test(Of T)
End Function
End Class

Public Class Test(Of T)
Private x As Func(Of T)
End Class

Sub UseTest()
Dim x = TestFactory.CreateTest(New With {.x = 10, .y = "Tim"})
End Sub

Ok, this works now, and you get a Test(Of <anonymoustype>) which has a field that's Func(Of <anonymoustype>), but we had to jump through hoops and use some unnatural techniques to get this.

In fact, since the main benefit of anonymous types is to avoid creating types, but in this case we have to create a factory class in order to get to this "hack", we didn't really gain anything.

This definitely belongs to the category "interesting, but I would never use this unless I wanted to confuse people". But it really shows how powerful type inference is for generic parameters.

However, I'm sure someone will find a use for this.

Anyways, I'll try to stop beating this horse :)

 

Technorati Tags: , ,

It's official - I'll be going to the New England Code Camp this weekend and presenting two talks on LINQ features and Visual Basic. I hope that it will be a blast, and if you're around the New England area, I hope you drop by!

The two talks I have planned are "LINQ Fundamentals" (all about understanding the core concepts of LINQ and how LINQ queries translate into language features) and "LINQ-ing your data" (all about understanding how LINQ changes the way you view your data, with particular focus on LINQ to XML and XML features in VB).

I'll be hanging around before/after the talks, so if you just wanna stop by and talk VB, LINQ, or compiler stuff in general, that'll be cool too! It looks like there will be a bunch of good talks at this code camp as well, so definitely try to stop by if you're in the area.

After the talks, I'll post information and samples/slide decks on my blog (or the VB team blog, which ever is more appropriate).

Hope to see you there!

Hey guys, just a quick note to let you know that my article on lambda expressions has been published on MSDN magazine. I'll be updating this blog (actually, probably the VB team blog) with more information on lambda expressions that I didn't have time to get into the article.

Hope you enjoy!

In the mean time, our team has been busy wrapping up Orcas, fixing the remaining bugs, and otherwise getting ready to ship Orcas. I hope to have some time to discuss later the kind of changes we are exploring for the compiler in general, li