I'm proud to tell you all about the availability *today* of C# Express for the insanely high price of $0.  That's right.  You can get a streamlined and slick version of Visual Studio with complete support for C# editing, WinForms development and debugging for FREE!  I worked hard on this product (from the feature level all the way to SKU creation) and i'm incredibly proud of it.  Having a free development platform is something that i've been wanting to provide ever since i came to the company.  You were always able to develop on Windows for free using the SDKs we've provided.  However, there's a vast difference between using notepad and csc.exe and having a rich development environment that can make development so much faster.

So if you want a great, free IDE, head over to the C# Express page and go get it.  Or, heck, if you want to try out the IDE to see what it's like before shooting for one of its more expensive siblings, you can now find out what they're like without needing a beta.  And, i might as well mention it, if you want great, free VB, C++, ASP.Net, J# tools or even SQL Server, you can get those for free as well!!!

This is an awesome day for me and i hope it is for you as well!

Well, unless you've been living under a rock, you know that we finally signed off on VS 2005 and that it's available now (or soon) to MSDN subscribers.  The "boxed" versions should be coming in just a few weeks, and i can't wait to see what you think about it.  I was involved in many parts of the VS2005 release, including:

  1. Working heavily on the C# IDE experience.  Primarily IntelliSense, but also a myriad of other design time components.
  2. Working some on the C# compiler.
  3. Working on creating the C# Express SKU.  I was heavily involved in piecing things together and helping with componentization so that we could sculpt VS into the lean and mean app that is C# Express.

So this is a huge release for me.  I wasn't involved with 2k3 and so i don't really feel strongly about that tool.  But with 2k5 you're going to be using my code (potentially for hours upon hours every day) and that's a huge thing for me.  I'm feeling a rush of emotions right now like you wouldn't believe.  Tons of excitement, but also a lot of fear.  Will you guys like what we've done?  Will you *love* it?  Will i be the one who introduced a bug that you're going to be cursing out for months?  Who knows?!

Now, with the release of Whidbey i can start setting my sights on Orcas.  As you've probably been able to tell, i've taken on a position of helping to craft what will eventually become the C# 3.0 language.  It's my hope that as you all start using 2.0 day in and day out you can then work with us (and us with you) on trying to make 3.0 awesome.  Toward that goal i've already tried to be intimately involved in the prototyping we've been doing.  If you've tried out the Beta2 preview then you're using code that i was actively involved with creating.  I'd like to talk about that a lot more at some point, but that will have to wait.  For now, i wanted to let you know that we've updated the Linq preview bits to work with the RTM version of VS2005.  You can find the download for that on the linq page here: http://msdn.microsoft.com/netframework/future/linq/

I'm going to be very clear here: Do not run these bits on Beta2.  And do not run the Beta2 bits on RTM.  There were binary breaking changes and things will crash almost immediately :-)

Normally we'd try to detect something like that so we could at least give a useful message to the user.  However, since 99% of our energy was focussed on Whidbey we simply couldn't spare the time for such niceties.

Anyways, i hope you can get this stuff soon.  And when you do, let me know what you think!

For all who missed it (like me), you can now see all the 2005 PDC sessions.  They're available here: http://microsoft.sitestream.com/PDC05/ 

Usually these are available for a while (but not indefinitely), so grab 'em while they're hot.

Obviously i think you should check out the Linq videos, but i also recommend the Monad video.

Cheers!

So this is the start of a series of posts that will dive a little deeper into the new C# 3.0 features.  My previous posts covered an overall view of what we were doing with the language as well as the Linq initiative, but didn't delve deep into the nitty gritty behind everything.  Hopefully through these posts i can make each individual facet of the future work clearer.  I'll also bring up open questions we still have and my own personal thoughts on where we are now and where i hope to be.

It should be noted that what we have shown at the PDC is just a *preview*.  Nothing is set in stone, and it's quite likely that things will change before the final release.  It's my hope that by communicating with the developer community we can end up creating a better C# 3.0 for everyone.

So, i'm going to start with the "var" feature.  The full name is actually "Implicitly Typed Local Variables", but that's quite a mouthful, so we'll just be calling it the "var" feature for now.  So what is this feature?  Well, as the full name would imply, it's a feature that allows you to declare a local variable without having to explicitly declare its type.  For example, say you currently had the following code within some member:

int i = 5;
string s = "Hello";
double d = 1.0;
int[] numbers = new int[] {1, 2, 3};
Dictionary<int,Order> orders = new Dictionary<int,Order>();

You could now write that as:

var i = 5;
var s = "Hello";
var d = 1.0;
var numbers = new int[] {1, 2, 3};
var orders = new Dictionary<int,Order>();

The important thing to realize is that "var" does *not* mean "object".  In fact, both code samples above will compile to the *exact* same thing. 

So how does this work?  Well, unlike a regular local variable declaration, a "var" declaration is required to have not just a name, but an initializer as well.  The compiler will then figure out the type of the initializer expression (which is well defined as per the rules of the C# language) and treat the declaration as if you'd used that type as the local variable type.  So if you then try to type the following, the first line compiles and the second doesn't:

s.IndexOf('{'); //This will compile.  's' is a string, and string has an IndexOf(char) method on it.
s.FoobyBoob(); //This won't compile.  string doesn't have 'FoobyBoob' method

To make things clear: "var" isn't some "variant" type (although the name is certainly unfortunate), and it doesn't imply some sort of "dynamic" typing system going on.  Your code is statically checked exactly as it would be if you had explicitly written down the type.
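One way to see the static typing for yourself is the small sketch below.  It is only an illustration, not anything from the compiler team: the class name VarDemo and the helper TypeOf are made up for this example, and a Dictionary<int,string> stands in for the Dictionary<int,Order> above because the Order type isn't shown.  The generic helper reports the type the compiler picked for each local at compile time.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // Returns the compile-time types the compiler inferred for each "var" local.
    public static Type[] InferredTypes()
    {
        var i = 5;
        var s = "Hello";
        var d = 1.0;
        var numbers = new int[] { 1, 2, 3 };
        var orders = new Dictionary<int, string>(); // string stands in for Order (assumption)

        return new Type[] { TypeOf(i), TypeOf(s), TypeOf(d), TypeOf(numbers), TypeOf(orders) };
    }

    // T is bound by the compiler from the static type of the argument,
    // so typeof(T) shows what "var" was inferred to be, not a runtime guess.
    private static Type TypeOf<T>(T value)
    {
        return typeof(T);
    }
}
```

Calling VarDemo.InferredTypes() should hand back int, string, double, int[] and Dictionary<int,string>, in that order, which are the same types you would have written by hand.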

Now, right now "var" is just for local variables.  Why not allow it for something like a method parameter declaration?  Well, say you had the following:

interface IExample {
    void ShowOff(var parameter); //what type would this be?  It has no initializer to determine the type from
}

class OtherExample {
    void Demonstration(var parameter) { //what type would this be?
        parameter.Convert();            //we can't figure out what type it should be based on what we see here.
    }
}

In the first case, there's simply nothing we could do.  Without any sort of code in scope that uses "parameter" we couldn't hope to determine what its type was.  In the second case, it's possible we could try to figure out the type somehow, but it would probably be enormously complex and confusing (and often we'd still be unable to figure it out).  By limiting "var" to local variables that *have* to have an initializer, we ensure a feature that will be usable and available in pretty much all places where local variables are allowed.

So... um... neat... but why would i want that?

That's a fantastic question.  Let's start by referring to some shortcomings/negatives first.  While implicitness can be quite handy for writing code, it can make things quite difficult when trying to just read code.  In all the above examples it's fairly easy to determine what the type of each of the variables is.  That's because you have either a nice primitive that you can look at, or the call to some constructor which you can then look directly at to figure out the type of variable.  But it's not always that simple.  What if you were to have:

var lollerskates = GetSkates(these, parameters, will, affect, overload, resolution);

Now what do you do?  As i said before, the type of the variable will be statically determined by the compiler using the expression binding rules that are spelled out in detail in the C# specification.  But that in itself is an extraordinary problem.  There is a huge number of binding rules, and some of them (like generic type inference) are quite complex.  Keeping all those rules in your head and correctly trying to apply them on-the-fly in order to just comprehend what your code means sounds like a rather onerous burden to put on you.  In effect we'd be asking you to do the compiler's job just so you could answer the question "is lollerskates an IEnumerable or an IList??"

On the other hand, var does make other things nicer.  For one thing, it avoids some pretty ugly duplication that arises when you start to write heavily genericized code.  Instead of needing to write:

Dictionary<IOptional<IList<string>>,IEnumerable<ICollection<int>>> characterClasses = 
    new Dictionary<IOptional<IList<string>>,IEnumerable<ICollection<int>>>();

you can now write:

var characterClasses = new Dictionary<IOptional<IList<string>>,IEnumerable<ICollection<int>>>();

There's a heck of a lot of duplication that you can now cut out.  It cleans up your code *and* makes it more readable (IMO).  Two plusses that are always welcome in my book.  Another benefit is if you have code like this:

object o = ExistingMethod();

...

object ExistingMethod() { ... }

If you then update ExistingMethod like so:

string ExistingMethod() { ... }

Well, your code still compiles.  However, if you want to take advantage of the fact that ExistingMethod now returns "string" (i.e. to call some method like IndexOf on the result) you'll have to update every declaration that holds its result from "object" to "string".  If you had declared it as:

var o = ExistingMethod();

then you would only have to update one location while still being able to take advantage of that redefinition everywhere.
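Here's a minimal sketch of that second benefit.  The body of ExistingMethod and the class name ReturnTypeDemo are invented for illustration; the point is only that the "var" local picks up the new return type without being touched.

```csharp
public static class ReturnTypeDemo
{
    // Hypothetical stand-in for ExistingMethod *after* its return type
    // was changed from object to string.
    private static string ExistingMethod()
    {
        return "Hello{World";
    }

    public static int FindBrace()
    {
        // Inferred as string, so no edit was needed here when the
        // return type of ExistingMethod changed.
        var o = ExistingMethod();

        // Had "o" been declared as "object o", this line would not compile.
        return o.IndexOf('{');
    }
}
```

With the sample string used here, FindBrace() returns 5.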

Ok.  Well, those both seem somewhat *meh*'ish.  Convenient sure, but worth the potential negatives in code readability?  With C# 2.0 as it exists today... probably not.  But with some of the Linq work we have coming up, then the picture changes quite a bit.  Let's start by looking at a potential way you can write queries in C# 3.0:

   var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

In this case we're generating a hierarchical anonymous type (which i'll go in depth on in a future article).  Say we didn't have "var", but we *did* have some hypothetical syntax for expressing an anonymous type.  We'd end up having to write the following:

  IEnumerable<class { string Name; IEnumerable<class { int Cost; DateTime Date }> Orders }> q =
      from c in customers 
      where c.City == "Seattle" 
      select new { 
          c.Name, 
          Orders = 
              from o in c.Orders 
              where o.Cost > 1000 
              select new { o.Cost, o.Date } 
      };

Good golly!  That's a lot to type!  As with the "Dictionary" example above, you end up with a declaration with a lot of duplication in it.  Why should i have to fully declare this hierarchical type when the structure of the type i'm creating is fairly clear from the query initializing it?  You could make things easier for yourself by defining your own types ahead of time instead of using anonymous types, but that would make the act of projecting in a query far less simple than what you can do with anonymous types.  Of course, if you want to do this, you're completely able to do so while still working fully within the Linq system.  And, because this seems like a feature with its own plusses/minuses, and because it seems like people will want to move back and forth between implicit/explicit types depending on the code they're writing, it will make a lot of sense for us to provide user tools for this space.  Some sort of refactoring like "Reify variables".  Or... since people won't know what the heck that means: "make variable implicit/explicit."   :-)

So what do you think?  Are there things you do/don't like about "var"?  Personally, i think the name is somewhat of a problem.  C++ is introducing a similar concept, while calling it "auto".  I'm partial to that, but leaning more to making "infer" a keyword itself.  I think writing down "infer a = 5" reads very well and helps alleviate confusion issues that might arise with "var".

FYI: There seems to be a problem with the blog software i'm using where i'm not getting notified about all posts that you guys are making.  It's being actively investigated, and i'm hoping for a resolution soon.  So if you're finding that i'm not responding to a post in a timely manner (say a few days max), then feel free to directly reach me through the Contact link at the top of the page.

Sorry for all this.  And please don't think that i'm ignoring you or your questions!

In the last post i discussed a little bit of background on why we wanted to introduce Linq, as well as a bit of info on what some basic C# Linq looked like.  In this post i'm going to dive in a little bit deeper to some other interesting things we're introducing as well.

Here's the current example we've been using to drive the discussion along:

        Customer[] customers = GetCustomers();
        var custs = customers.Where(c => c.City == "Seattle").Select(c => c.Name);

Now, so far that's a very C#-centric way to do queries over data.  However, it's still a little bit heavyweight.  What about a more query-like syntax that does the same thing and is far more convenient?  Well, it turns out that we have that as well:

   var q = 
      from c in customers
      where c.City == "Seattle"
      select c.Name;

This new query syntax is in fact just syntactic sugar that uses patterns to transform itself into the *exact* same C# query that i listed above.  In fact, this is the same way that we handle foreach (specifically by transforming it into a loop with calls to MoveNext, Current, Dispose).
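To make the equivalence concrete, here's a small sketch that runs the same query three ways.  Treat it as an illustration under assumptions: it uses the names that eventually shipped (the System.Linq namespace) rather than the preview's System.Query, and the Customer class and its sample data are made up since the real type isn't shown here.  The third method spells out roughly what a foreach over the results turns into.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QueryTranslationDemo
{
    // Hypothetical Customer type; only the two members the query needs.
    public class Customer
    {
        public string Name;
        public string City;
    }

    // Hypothetical sample data standing in for GetCustomers().
    public static Customer[] GetCustomers()
    {
        return new Customer[]
        {
            new Customer { Name = "Ann", City = "Seattle" },
            new Customer { Name = "Bob", City = "Portland" },
            new Customer { Name = "Cid", City = "Seattle" }
        };
    }

    // The query-expression form.
    public static List<string> WithQuerySyntax()
    {
        var q =
            from c in GetCustomers()
            where c.City == "Seattle"
            select c.Name;
        return q.ToList();
    }

    // The method-call form the compiler rewrites the query into.
    public static List<string> WithMethodCalls()
    {
        var q = GetCustomers().Where(c => c.City == "Seattle").Select(c => c.Name);
        return q.ToList();
    }

    // Roughly what "foreach (string name in names) result.Add(name);" expands to.
    public static List<string> WithManualEnumeration()
    {
        IEnumerable<string> names =
            GetCustomers().Where(c => c.City == "Seattle").Select(c => c.Name);

        var result = new List<string>();
        IEnumerator<string> e = names.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                result.Add(e.Current);
            }
        }
        finally
        {
            e.Dispose();
        }
        return result;
    }
}
```

All three methods return the same list ("Ann", then "Cid") for the sample data above.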

Now, when looking at this you'll almost certainly notice how it looks *almost*, but not quite, like SQL.  And, you'll probably be asking: "can't you just make it look like SQL if it's that close!  Is this just MS wanting to be a pain just for the heck of it??"  In this case, the answer is "No".  One of the problems with the straight SQL-like approach is that we'd have to put the "select" first.  "Ok... what's wrong with that" you say.   Well, let's take a look:

   var q = 
      select c<dot>

Now, at this point, you're constructing the final shape for this query.  You know you want to write "c.Name" and you'd like to utilize handy features like IntelliSense to help speed you up with typing that.  But you can't!  Because you haven't even stated where your data is coming from, there's no way to understand what "c" is this early in the expression.  This is because in SQL the scope of a variable actually flows backwards, i.e. you use variables before you've even declared them.  However, in C# you can only use something after it's been declared.  So in order to better fit within this model (which has some very nice benefits), we made it so that "from" has to come first.  Beyond statement completion there are also issues of being able to construct large hierarchical queries in an understandable way.  Having the scope flow from left to right, top to bottom, makes that much simpler and brings a lot of clarity to your expressions.

Now what about projections?  They're incredibly common operations in SQL.  You're always doing things like "select a, b, c" and in essence projecting out the information you care about into these columns.  So how would we go about doing this sort of thing in C# 3.0?  Well, you could do this:

   var q = 
      from c in customers
      where c.City == "Seattle"
      select new NameAndAge(c.Name, c.Age);

but that's a real pain.  Any time i want to project any information out, i need to generate a new type and fill all its gunk in.  That means writing the class somewhere.  Creating a constructor for it.  Creating fields and properties.  Implementing .Equals and .GetHashCode.  etc. etc.  yech.  Far too much work, error prone, and it causes API clutter.  So what can we do to alleviate that?  Well, in C# 3.0 a new feature called "Anonymous Types" comes to the rescue.  We can now write the following:

   var q = 
      from c in customers
      where c.City == "Seattle"
      select new { c.Name, c.Age };

What this is doing is projecting the customer out into a new structural type with two properties "Name" and "Age", both of which are strongly typed and which have been assigned the values of their corresponding properties in "c".  What's the type of "q" at this point?  Well, it's an IEnumerable<???> where ??? is some anonymous type with those two properties on it.  BTW, it should now seem somewhat more obvious why the "var" keyword was added to the language.  In this case you cannot actually write down the type of "q", but you need some way to declare it.  "var" comes to the rescue here.

So i could now write:

   foreach (var c in q) {
      Console.WriteLine(c.Name);
   }

and that would compile and run just fine.

Now "wait a minute!" you're saying.  "Is this some sort of late-binding thang where we're using reflection to pull out this data?"  No sir-ee.  In fact, if you were to try and write:

   foreach (var c in q) {
      Console.WriteLine(c.Company);
   }

then you would get a compiler error immediately.  Why?  Well, the compiler knows that the anonymous type which you've instantiated only has two members on it (Name and Age), and it's able to flow that information into the type signature of 'q'.  Then when foreach'ing over 'q', it knows that the type of 'c' is the same structural anonymous type we created earlier in the 'select'.  So it will know that it has no "Company" property and appropriately inform you that your code is bogus.  All the strong, static typing of C# is there.  You are just allowed to eschew writing the type now and instead allow inference to take care of all of it for you.  Users of languages like OCaml will find this immediately familiar and comfortable.
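If you want to poke at this yourself, here's a small sketch.  It assumes the feature as it eventually shipped (System.Linq, and anonymous types that come with compiler-generated Equals and GetHashCode); the Customer class and its data are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnonymousTypeDemo
{
    // Hypothetical Customer type for illustration only.
    public class Customer
    {
        public string Name;
        public int Age;
        public string City;
        public string Company;
    }

    public static List<string> Describe()
    {
        var customers = new Customer[]
        {
            new Customer { Name = "Ann", Age = 31, City = "Seattle", Company = "Contoso" },
            new Customer { Name = "Bob", Age = 45, City = "Portland", Company = "Fabrikam" }
        };

        var q =
            from c in customers
            where c.City == "Seattle"
            select new { c.Name, c.Age };

        var lines = new List<string>();
        foreach (var c in q)
        {
            // Name is a string and Age is an int; both are checked at compile time.
            lines.Add(c.Name + " is " + c.Age);

            // Uncommenting the next line is a compile error, because the
            // anonymous type only has Name and Age:
            // lines.Add(c.Company);
        }
        return lines;
    }

    // Two anonymous objects with the same property names, types and order
    // share one compiler-generated type, which compares by value.
    public static bool StructuralEquality()
    {
        var a = new { Name = "Ann", Age = 31 };
        var b = new { Name = "Ann", Age = 31 };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }
}
```

Describe() yields the single line "Ann is 31", and StructuralEquality() returns true, so you get the Equals/GetHashCode work mentioned earlier for free.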

Now, one thing that's quite common in the object world is the usage and manipulation of hierarchical data.  i.e. objects formed by collection of other objects formed by collections of... you get the idea.  Now, say you wanted to query your customers to get not only the customer name, but information about the orders they've been creating.  You could write the following very SQL-esque query:

   var q = 
      from c in customers
      where c.City == "Seattle"
      from o in c.Orders 
      where o.Cost > 1000
      select new { c.Name, o.Cost, o.Date };

We've now joined the customer with their own orders.  This would get the job done, but maybe it's not really returning the information in the structure you want.  For one thing, the data isn't grouped by customer.  So for every order made by the same customer you're going to get a new element.  So let's take it a little further:

   var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

Voila.  We've now created a hierarchical result.  Now, per customer you'll only get one item returned.  And that item will have information about all the different orders they've made that fit your criteria.  Now you can trivially create queries that get you the results you want in the exact shape you want.
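The difference in shape between the flat and the hierarchical query is easy to check by counting results.  The sketch below is just that, a sketch: the Customer and Order classes and the sample data are made up, and it uses the System.Linq namespace that eventually shipped.

```csharp
using System.Linq;

public static class HierarchyDemo
{
    // Hypothetical types and data for illustration only.
    public class Order
    {
        public int Cost;
    }

    public class Customer
    {
        public string Name;
        public string City;
        public Order[] Orders;
    }

    private static Customer[] GetCustomers()
    {
        return new Customer[]
        {
            new Customer { Name = "Ann", City = "Seattle",
                Orders = new Order[] { new Order { Cost = 1500 }, new Order { Cost = 2000 }, new Order { Cost = 500 } } },
            new Customer { Name = "Bob", City = "Seattle",
                Orders = new Order[] { new Order { Cost = 3000 } } },
            new Customer { Name = "Cid", City = "Portland",
                Orders = new Order[] { new Order { Cost = 5000 } } }
        };
    }

    // Flat join: one element per (customer, matching order) pair.
    public static int FlatCount()
    {
        var q =
            from c in GetCustomers()
            where c.City == "Seattle"
            from o in c.Orders
            where o.Cost > 1000
            select new { c.Name, o.Cost };
        return q.Count();
    }

    // Hierarchical: one element per customer, each carrying its own orders.
    public static int NestedCount()
    {
        var q =
            from c in GetCustomers()
            where c.City == "Seattle"
            select new
            {
                c.Name,
                Orders =
                    from o in c.Orders
                    where o.Cost > 1000
                    select new { o.Cost }
            };
        return q.Count();
    }

    // Number of matching orders attached to the first customer in the nested result.
    public static int OrdersForFirstCustomer()
    {
        var q =
            from c in GetCustomers()
            where c.City == "Seattle"
            select new
            {
                c.Name,
                Orders =
                    from o in c.Orders
                    where o.Cost > 1000
                    select new { o.Cost }
            };
        return q.First().Orders.Count();
    }
}
```

For this data the flat query produces 3 elements (two for Ann, one for Bob), while the nested one produces 2, with Ann's element holding her 2 matching orders.  One thing to keep in mind: in the nested form a Seattle customer with no orders over 1000 still shows up, just with an empty Orders sequence.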

Next up!  Drill downs into many of the specific new features that we're bringing to the table.

But first: a teaser!  Say you have the following code:

   var customers = GetCustomersFromDataBase();
   var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

   foreach (var c in q) {
       //Do something with c
   }

Did you know that you will be able to write that code in C# 3.0 and DLinq will make sure that that query executes on the DB using SQL?  It will then only suck down the results that matched the query, and only when you foreach over them.   That's right.  That entire "from ... " expression will execute server side.  And it didn't even need to be in "from" form.  If you'd written it as "customers.Where(c => c.City == "Seattle").Select(c => c.Name)" then the same would be true.  How's that for cool.  Stay tuned and a later post will tell you how that all works!
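The "only when you foreach over them" part can be seen even without a database.  The sketch below does *not* use DLinq or SQL at all; it's plain in-memory querying with the System.Linq names that eventually shipped, and it only demonstrates that building a query runs nothing until you enumerate it.  The class name and the counter are made up for the example.

```csharp
using System.Linq;

public static class DeferredDemo
{
    // Returns { evaluations before enumerating, matches found, evaluations after }.
    public static int[] Run()
    {
        int evaluations = 0;
        int[] numbers = { 1, 2, 3, 4 };

        // Building the query does not call the predicate yet.
        var q = numbers.Where(n =>
        {
            evaluations++;          // count how often the filter actually runs
            return n % 2 == 0;
        });

        int before = evaluations;   // still 0: nothing has executed

        int matches = 0;
        foreach (var n in q)        // the filter runs here, one element at a time
        {
            matches++;
        }

        int after = evaluations;    // 4: once per source element
        return new int[] { before, matches, after };
    }
}
```

Run() returns { 0, 2, 4 }: no work before the loop, two even numbers found, and the predicate invoked once per element during enumeration.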

I've been mulling over the best way to talk about the new C# 3.0 stuff we've been working on.  I presented the post on how you could use the new C# 3.0 features to go beyond the basic query functionality we've been targeting.  That was to help give an appreciation of how we've added strong query support through the addition of several new smaller features that can be used for more than query (although that's the foremost area that we're trying to attack).  However, i then realized that it was somewhat interesting that i would present the post on "what *else* you can do with C# 3.0" before anyone even had an idea of what you *could* do with C# 3.0 in the first place.

I could do a fairly detailed drill down of the new C# features, but i actually thought a more holistic approach would be better in this case.  So i'm actually going to talk about the general problem space we're confronting, and i'll try to provide some running examples to help carry me through this.

So what is Linq?  Well, Linq is the culmination of a number of techniques we're producing to help deal with the large disconnect between data programming and general purpose programming languages.  Linq stands for Language INtegrated Query, and simply put, it's about taking query, set operations and transforms and making them first class concepts in the .Net world.  This means making them available in the CLR, in .Net programming languages, and in the APIs that you're going to be using to program against data in the future.  Through all this you can get a completely unified query experience against objects, XML, and relational data, i.e. the most common forms of data that will appear in your application.  And, what's best, if you happen to have your own form of data that doesn't fit into those different models, then you can use our extensible system to target that model as well.  After all, our XML and relational data access models (called XLinq and DLinq respectively) are just APIs built on top of the core Linq infrastructure.  As such, i'm not going to dive too deeply into those specific models.  I'm going to let the individual teams who are responsible for them (and who know those APIs far more intimately) give you all the information at their disposal.

So, let's first talk about data access today and how our new approach most likely differs from what you've been used to.  If you're accessing a database somewhere in your application, then there's a good chance that you've embedded some bit of SQL somewhere.  Maybe you've kept it fairly clean and abstracted away, or maybe you have SqlCommand's left, right and center, all with their own "select *"'s or other raw SQL commands stored hither and thither.  Of course, when writing this code you had no compile time checking that your SQL strings were well formed, no IntelliSense, etc.  Because, effectively, you are using two completely different languages in an environment that only understands one.  This is pretty bad, but really only begins to scratch the surface of the deep mismatch between this relational data domain and the object domain.

Through and through you have mismatches between objects and relational data and XML in your system.  Different types.  Different operations.  Different programming models.  Your code which works on XML won't work on relational data.  Your code which works on relational data won't work on objects. etc.  But there's a better way.  Now we can allow you to work with all these different data systems right within C# (or VB).  This means using the same syntax, the same types, and the same programming models to query and manipulate all these different forms of data in a unified manner.  And, because support for these models has been built on top of an extensible system, it means that if necessary you can do the same as what we've done and bring this strong query support anywhere you need it to go where we don't currently have an offering.

To ground this discussion a little, let's start looking at a simple example of C# 3.0/Linq in action.  (Note: this example might look very familiar.  That's because many demos and examples are made to run against the Northwind DB.  This allows us to all talk about the same thing and have consistent and clear names for entities).   You start with a simple list of Customers:

        Customer[] customers = GetCustomers();

Nothing magic going on here.  Nothing up my sleeves.   Just a regular .Net array initialized from some source.  Now, to make things a little simpler (especially for later examples) we can then write that as:

        Customer[] customers = GetCustomers();
        var custs = customers;

What's going on in that second line?  Well, "var" is our way of introducing "local variable type inference".  It's a new C# 3.0 feature that allows you to save space by not writing the type of a local variable, while also having the type inferred from the expression that initializes the variable.  So, in the above code, "custs" is known at compile time to be a "Customer[]".  If you were to write:

        var i = 10;
        var b = true;
        var s = "hello";

then it would be the *exact* same as writing:

        int    i = 10;
        bool   b = true;
        string s = "hello";

We'll see later on why this can be quite a handy thing.  Now, let's extend our code a bit further to start querying that array of customers:

        Customer[] customers = GetCustomers();
        var custs = customers.Where(c => c.City == "Seattle");

Here we're simply querying all our customers for the set of customers that are from Seattle.  And "custs" will be an IEnumerable<Customer>.  We can even carry that a little further into the following query:

        Customer[] customers = GetCustomers();
        var custs = customers.Where(c => c.City == "Seattle").Select(c => c.Name);

Here we're projecting out the name of all our customers from Seattle.   So custs will be an IEnumerable<string>.  Now, what the heck is this code?  This isn't your daddy's C# anymore.  What are those funky arrows?  And where did the "Where" and "Select" methods come from??  They certainly don't seem to be defined on the array type when i look at it in ILDasm!  Well, to answer the first question, the funky => arrow is the new C# 3.0 syntax that allows you to create a lambda expression.  You can think of a lambda expression as a natural evolution of the anonymous methods introduced in C# 2.0.  Lambda expressions benefit from simpler syntax and the ability to use inference.  So now you can write:

        c => c.City == "Seattle"  //instead of
        delegate (Customer c) { return c.City == "Seattle"; }

As you can see, the C# 2.0 method just drowns you in syntax and it makes it a rather poor choice to use in queries (heck! there's a 2x increase in query size between the two).  However, the new C# lambda expression succinctly encapsulates the test we want to perform, with only about 5 characters overhead.
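Here are the two forms side by side in something you can compile.  This is a sketch with a couple of assumptions: the Customer class is a made-up one-field stand-in, and the delegate type is Func<Customer, bool> from the System namespace, which is what the shipped libraries use for predicates in queries.

```csharp
using System;

public static class LambdaDemo
{
    // Hypothetical one-field Customer for illustration.
    public class Customer
    {
        public string City;
    }

    // Returns the answers of both predicates for a customer in the given city.
    public static bool[] Compare(string city)
    {
        // C# 2.0 anonymous method: parameter type, braces and "return" all spelled out.
        Func<Customer, bool> viaAnonymousMethod =
            delegate (Customer c) { return c.City == "Seattle"; };

        // C# 3.0 lambda: the parameter type is inferred from the delegate type.
        Func<Customer, bool> viaLambda = c => c.City == "Seattle";

        var customer = new Customer { City = city };
        return new bool[] { viaAnonymousMethod(customer), viaLambda(customer) };
    }
}
```

Both delegates behave identically: Compare("Seattle") gives { true, true } and Compare("Portland") gives { false, false }.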

That answers the first question, but what about the second?  Where, oh where did "Where" come from?  This is an example of another new C# 3.0 feature we call "extension methods".  Extensions are a way to allow you to add operations to existing types that aren't under your control.  While that may give you the heebie-jeebies, rest assured, you're not actually modifying the actual type.  Rather, you're being allowed to use succinct syntax to in effect execute a method as if it existed on this type.  Specifically, extension methods are static methods that look like so:

namespace System.Query {
    public static class Sequence {
        public static IEnumerable<T> Where<T>(this IEnumerable<T> e, Predicate<T> p) {
            foreach (T t in e) {
                if (p(t)) {
                    yield return t;
                }
            }
        }
    }
}

This declares an "extension method" on the IEnumerable<T> type.  When you import the namespace by writing "using System.Query", you now gain the ability to call the "Where" method on anything that implements IEnumerable<T> (like arrays).  With these extension methods we can now compose powerful query functions together to manipulate data easily.
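You can write one of these yourself in a few lines.  The sketch below is an illustration, not the library's code: the class SequenceSketch and the method WhereSketch are made-up names (chosen so they don't clash with the real "Where"), the predicate is typed as Func<T, bool> from the System namespace, and the class lives in the global namespace so no "using" is needed to bring it into scope.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceSketch
{
    // The "this" modifier on the first parameter is what makes this an
    // extension method on IEnumerable<T>.
    public static IEnumerable<T> WhereSketch<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    public static List<int> LargerThanTwo()
    {
        int[] numbers = { 1, 2, 3, 4 };

        // Instance-style call; the compiler rewrites it to
        // SequenceSketch.WhereSketch(numbers, n => n > 2).
        return new List<int>(numbers.WhereSketch(n => n > 2));
    }
}
```

LargerThanTwo() returns a list containing 3 and 4.  Nothing was added to int[] itself; the call is just a static method call wearing instance-method syntax.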

So at this point we've seen three new C# 3.0 features that can be used together to build a powerful base for querying objects.  In future posts i'll include information about the rest of the new language features, and i'll give a more comprehensive view of how sophisticated our query support is.

If you've heard about the new C#/Linq work that got announced today but don't know where to go to find it, well, the awesome folks at MSDN have put up a great page that highlights what we're working on, with great links to our latest specs as well as awesome videos that show off what this is all about.  In the next few days i'll start going over this information in depth to give you an idea of what all the pieces are and how they all fit together.

Can’t attend PDC but still want to talk to the C# team about what's coming up? This chat is your chance! Join the C# team to discuss the newly announced C# 3.0 features like:

  1. extension methods
  2. lambda expressions
  3. type inference
  4. anonymous types
  5. query comprehensions
  6. Expression trees
  7. and the rest of the .NET Language Integrated Query Framework.

You've been hearing rumblings about this for a while, now we finally talk in depth about the future of the C# language.  Hope to see a lot of you there.  Current ETA is 9/22/05 at 1-2pm.  But check the main link to MSDN for more details and to track this on your calendar!

The previous post ended up showing that while visitors are available in C#, they lack the usability that built-in language constructs could bring, constructs that would make them an ideal choice to solve our problem.  In Java, we saw that anonymous inner classes provided such a convenient construct, and in this post i wanted to show how several of the future language enhancements we're bringing to C# will also make this convenient usage possible.  Now, from what you've seen from PDC so far, the new Linq enhancements are targeted all around object, data, and xml querying and manipulation.  However, as this post will show, the core functionality we've added to C# can be used for far more than just that.

Specifically, we're going to make use of two of the new core language features: Lambda Expressions and Object Initializers.  Let's take a look at how we could use those two features to make visitors far more enjoyable.  To begin with, we're going to define our visitor interface and DefaultTokenVisitor slightly differently than how we did in Java.  In C# they're now going to look like this:

    public delegate void Action<A>(A a);

    public interface ITokenVisitor
    {
        Action<Token>                  VisitToken                  { get; set; }
        Action<KeywordToken>           VisitKeywordToken           { get; set; }
        Action<InterfaceToken>         VisitInterfaceToken         { get; set; }
        Action<ClassToken>             VisitClassToken             { get; set; }
        Action<IdentifierToken>        VisitIdentifierToken        { get; set; }
        Action<ContextualKeywordToken> VisitContextualKeywordToken { get; set; }
        Action<AccessibilityToken>     VisitAccessibilityToken     { get; set; }
        Action<NoisyToken>             VisitNoisyToken             { get; set; }
        Action<CommentToken>           VisitCommentToken           { get; set; }
        Action<WhitespaceToken>        VisitWhitespaceToken        { get; set; }
    }

    public class DefaultTokenVisitor : ITokenVisitor
    {
        public virtual Action<Token>                  Default                     { get { ... } set { ... } }
        public virtual Action<Token>                  VisitToken                  { get { ... } set { ... } }
        public virtual Action<KeywordToken>           VisitKeywordToken           { get { ... } set { ... } }
        public virtual Action<InterfaceToken>         VisitInterfaceToken         { get { ... } set { ... } }
        public virtual Action<ClassToken>             VisitClassToken             { get { ... } set { ... } }
        public virtual Action<IdentifierToken>        VisitIdentifierToken        { get { ... } set { ... } }
        public virtual Action<ContextualKeywordToken> VisitContextualKeywordToken { get { ... } set { ... } }
        public virtual Action<AccessibilityToken>     VisitAccessibilityToken     { get { ... } set { ... } }
        public virtual Action<NoisyToken>             VisitNoisyToken             { get { ... } set { ... } }
        public virtual Action<CommentToken>           VisitCommentToken           { get { ... } set { ... } }
        public virtual Action<WhitespaceToken>        VisitWhitespaceToken        { get { ... } set { ... } }
    }

So far, this code just makes use of C# 2.0 features.  It might take a couple of minutes for your mind to wrap around it, but what we've ended up doing here is creating a type whose methods can be overridden trivially at runtime.  We do this by simulating methods with properties that return delegates.  For example, you can provide a method implementation by writing: ".VisitToken = ..." and that method can then be invoked just as you would expect: "VisitToken()".  While they are actually properties and delegates, they appear (for all intents and purposes) as methods that you can change at runtime.  In fact, from a syntactic perspective, the calling code is indistinguishable, and so we do not even need to change any of our visitor accept methods on our token classes.
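To make the "properties that return delegates" idea concrete, here is a minimal, self-contained sketch.  The two-token hierarchy, the MiniVisitor name, the backing fields and the "fall back to Default when nothing was assigned" getters are all assumptions standing in for the elided "{ ... }" bodies above, not the actual implementation.  It also leans on the framework's System.Action<T> and on lambdas purely for brevity.

```csharp
using System;
using System.Collections.Generic;

public class Token
{
    // Reads like a method call, but VisitToken is really a delegate-typed property.
    public virtual void AcceptVisitor(MiniVisitor visitor) { visitor.VisitToken(this); }
}

public class ClassToken : Token
{
    public override void AcceptVisitor(MiniVisitor visitor) { visitor.VisitClassToken(this); }
}

// Hypothetical cut-down visitor: every "method" is a settable delegate property.
public class MiniVisitor
{
    private Action<Token> defaultAction = delegate { };
    private Action<Token> visitToken;
    private Action<ClassToken> visitClassToken;

    public Action<Token> Default
    {
        get { return defaultAction; }
        set { defaultAction = value; }
    }

    // Assumed behavior: a handler that was never assigned defers to Default.
    public Action<Token> VisitToken
    {
        get { return visitToken ?? (t => Default(t)); }
        set { visitToken = value; }
    }

    public Action<ClassToken> VisitClassToken
    {
        get { return visitClassToken ?? (t => Default(t)); }
        set { visitClassToken = value; }
    }
}

public static class DelegatePropertyDemo
{
    public static List<string> Run()
    {
        List<string> log = new List<string>();
        MiniVisitor visitor = new MiniVisitor();
        visitor.Default = t => log.Add("default");
        visitor.VisitClassToken = t => log.Add("class");

        new ClassToken().AcceptVisitor(visitor);   // uses the assigned handler
        new Token().AcceptVisitor(visitor);        // nothing assigned, so Default runs

        // "Override" the method again at runtime by assigning a new delegate.
        visitor.VisitClassToken = t => log.Add("class v2");
        new ClassToken().AcceptVisitor(visitor);
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));   // class, default, class v2
    }
}
```

Note that the accept methods look exactly as they would against a visitor with ordinary methods, which is the point being made above.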

So what can we do with this?  Well, let's see what we can turn our parser code into thanks to the above classes:

        void parseType()
        {
            parseModifiers();

            CurrentToken.AcceptVisitor(new DefaultTokenVisitor {
                VisitClassToken     = token => parseClass(),
                VisitInterfaceToken = token => parseInterface(),
                /* include other cases */
                Default             = token => { /* handle error */ }
            });

            //Parse rest of type
        }

The code behaves how it reads.  We create a simple visitor with most of the logic built in.  But at construction time we tell it how to behave when it sees a "class" or "interface" token.  And in those cases, we are able to specify the behavior through a lambda expression in a succinct manner.

The "Object Initializer" feature is what allows us to instantiate an object and assign into its fields and properties (much like attribute constructors), in an expression form rather than a statement form.  The lambda expressions then allow us to easily create closures of code that we wish to execute at a later point in time.  Both features are simply lightweight syntactic constructs on top of what's already available in C# 2.0, but as you can see from the following C# 2.0 code, the difference they make is enormous:

        void parseType()
        {
            parseModifiers();

            DefaultTokenVisitor visitor = new DefaultTokenVisitor();
            visitor.VisitClassToken = delegate (ClassToken token) {
                this.parseClass();
            };

            visitor.VisitInterfaceToken = delegate (InterfaceToken token) {
                this.parseInterface();
            };

            /* include other cases */
            
            visitor.Default = delegate (Token token) {
                /*handle error*/
            };

            CurrentToken.AcceptVisitor(visitor);

            //Parse rest of type
        }

(In fact, it's probably worse than the initial example of visitors in C#.)  Completely unintelligible IMO.

While the lambda-based code appears similar to Java's anonymous inner classes, the two actually share almost nothing in common.  In Java you would be creating an instance of an unnamed subclass of the DefaultTokenVisitor type, whereas in C# you are just instantiating an instance of the actual DefaultTokenVisitor class.  In Java the method declarations we used were overrides of the methods defined in the DefaultTokenVisitor class, whereas in C# we're just assigning lambda expressions into the properties which we're using to simulate methods.  In Java, if we were to use the "this" keyword, we'd be referring to the instance of the anonymous inner class the code was executing in, whereas in C# the "this" still refers to the parser instance in scope.  By using these two constructs we end up with the ability to use visitors from C# easily with a syntax that is even less heavyweight than the corresponding Java constructs.  There are many ways in which the code is simpler:

  1. The C# code doesn't need () on the instantiation of DefaultTokenVisitor.  Very minor.
  2. The C# code can eschew the "public void" on all its visitation operations.  Nice, and cleans things up well.
  3. The C# code allows the type of the parameter to be optional.  If you want it, you can keep it.  However, you can leave it out and still have things statically checked at compile time.  It cuts down on duplication to not have to say: VisitClassToken(ClassToken..., and instead say: VisitClassToken(token...
  4. The C# code doesn't need the {}'s if the code is a trivial expression or statement.  Very minor.
  5. The C# code doesn't need to disambiguate the instance of the parser and the instance of the inner class like Java does.  So there's no need to write: Parser.this.parseInterface().  Instead you can just do the simple: this.parseInterface()

All in all, the C# version is about 2/3s the length and cuts out a lot of the cruft.  This allows you to intermingle your visitor logic with the rest of your code just as you'd like to be able to do.
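The point about "this" is easy to check with a tiny sketch.  The Handler and Parser types below are made up for illustration (they are not the visitor classes from this post); both expose a Name property so we can see which object "this" actually binds to inside a lambda that is assigned through an object initializer.

```csharp
using System;

// Hypothetical stand-in for a visitor: one delegate-typed property.
public class Handler
{
    public Action OnClass { get; set; }
    public string Name { get { return "handler"; } }
}

public class Parser
{
    public string Name { get { return "parser"; } }
    public string Seen;

    public void ParseType()
    {
        // Object initializer plus lambda.  Inside the lambda, "this" is still
        // the enclosing Parser, not the Handler being constructed.
        Handler handler = new Handler {
            OnClass = () => this.Seen = this.Name
        };
        handler.OnClass();
    }
}

public static class ThisCaptureDemo
{
    public static string Run()
    {
        Parser parser = new Parser();
        parser.ParseType();
        return parser.Seen;
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // parser
    }
}
```

In the Java anonymous inner class, the equivalent unqualified "this" would have been the inner class instance, which is why that version needed "Parser.this".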

So while Linq is a focused effort from many different teams at Microsoft to provide powerful integrated query support, you can see that the individual features that have been introduced in C# can be used in a lot of powerful ways beyond just that.  Hopefully you'll have your own good ideas about how to use these new language features in other interesting ways.  If so, let me know so we can share it with the rest of the development community out there!

So we left off on the previous post with the question of why we were using Java to work with our new token visitors.  Can't visitors be used in C# as well?  Well, yes.  However, not necessarily as conveniently as with Java.  How so?  Well, let's take a look at what the code would look like in C#.  First off, the visitor interfaces and DefaultTokenVisitor will be the same as with the Java code (albeit with slight syntactic differences).  However, in order to write the parser we'd have to do the following:

    public class Parser {
        Token CurrentToken { get { ... } }

        void parseType() {
            parseModifiers();

            CurrentToken.AcceptVisitor(new DetermineTypeToParse(this));

            //Parse rest of type
        }

        void parseModifiers() { ... }
        void parseClass() { ... }
        void parseInterface() { ... }

        class DetermineTypeToParse : DefaultTokenVisitor {
            readonly Parser parser;

            public DetermineTypeToParse(Parser parser) {
                this.parser = parser;
            }

            public override void VisitClassToken(ClassToken token) {
                parser.parseClass();
            }

            public override void VisitInterfaceToken(InterfaceToken token) {
                parser.parseInterface();
            }

            /* include other cases */

            public override void Default(Token token) {
                /* handle error */
            }
        }
    }

Functionally, this is equivalent to the Java code from the previous post (and in actuality is basically what the Java compiler generates when you type in the anonymous inner class); however, we've lost quite a lot in the translation.  Specifically, we've now had to separate and make disjoint the parser's logic and the visitor's logic.  However, both sets of logic are closely related and benefit highly from tight locality in the code.  Depending on how the code is structured, and how many visitors are needed, you might end up having this logic hundreds of lines apart.  Verifying that the parser code works as it should then becomes far more difficult than it needs to be.  You also end up creating a type that exists solely to be instantiated in one and only one location.  But you've now cluttered your namespace with this class which you then need to police to ensure that it's used properly.
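For anyone who wants to run the nested-class version, here is a compact, self-contained sketch of it.  It is trimmed to three token types, flattens the hierarchy (no KeywordToken), and the Result field and HandleError method are assumptions added only so there is something observable; the shape of DetermineTypeToParse follows the code above.

```csharp
using System;

public interface ITokenVisitor
{
    void VisitToken(Token token);
    void VisitClassToken(ClassToken token);
    void VisitInterfaceToken(InterfaceToken token);
}

public class Token
{
    public virtual void AcceptVisitor(ITokenVisitor visitor) { visitor.VisitToken(this); }
}

public class ClassToken : Token
{
    public override void AcceptVisitor(ITokenVisitor visitor) { visitor.VisitClassToken(this); }
}

public class InterfaceToken : Token
{
    public override void AcceptVisitor(ITokenVisitor visitor) { visitor.VisitInterfaceToken(this); }
}

// Every Visit method defers to Default unless a subclass overrides it.
public class DefaultTokenVisitor : ITokenVisitor
{
    public virtual void Default(Token token) { }
    public virtual void VisitToken(Token token) { Default(token); }
    public virtual void VisitClassToken(ClassToken token) { Default(token); }
    public virtual void VisitInterfaceToken(InterfaceToken token) { Default(token); }
}

public class Parser
{
    private readonly Token currentToken;
    public string Result = "";   // hypothetical: records which path was taken

    public Parser(Token currentToken) { this.currentToken = currentToken; }

    public void ParseType()
    {
        currentToken.AcceptVisitor(new DetermineTypeToParse(this));
    }

    private void ParseClass() { Result = "class"; }
    private void ParseInterface() { Result = "interface"; }
    private void HandleError() { Result = "error"; }

    // Nested, so it can reach the parser's private methods, but it still
    // lives apart from ParseType and needs the parser passed in by hand.
    private class DetermineTypeToParse : DefaultTokenVisitor
    {
        private readonly Parser parser;

        public DetermineTypeToParse(Parser parser) { this.parser = parser; }

        public override void VisitClassToken(ClassToken token) { parser.ParseClass(); }
        public override void VisitInterfaceToken(InterfaceToken token) { parser.ParseInterface(); }
        public override void Default(Token token) { parser.HandleError(); }
    }
}

public static class NestedVisitorDemo
{
    public static string Parse(Token token)
    {
        Parser parser = new Parser(token);
        parser.ParseType();
        return parser.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Parse(new ClassToken()));       // class
        Console.WriteLine(Parse(new InterfaceToken()));   // interface
        Console.WriteLine(Parse(new Token()));            // error
    }
}
```

Even at this size you can see the problem: the three lines that decide what ParseType does sit in a separate class with its own constructor and field.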

So while the visitor pattern is fully functional from C#, it lacks the usability that one would want in order to use it as a core designing principle in your APIs.  Is there a way that we can get the power of visitors without this drawback?  Wait and see!

The previous post on this topic gave us a problem statement to look at.  Specifically, how to design an internal structure that we want to be easily consumable from many different locations, while not weighing down the structure with any orthogonal, unnecessary functionality.  At the end of the previous post we also discussed how the Replace Type Code with Class refactoring had several benefits and drawbacks.  Let's look at how that technique would address the code example we've been using so far.

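When we replace the type switch with separate classes, we'll end up with a hierarchy that looks like this (simplified):

    Token
        KeywordToken
            InterfaceToken
            ClassToken
            AccessibilityToken
        IdentifierToken
            ContextualKeywordToken
        NoisyToken
            CommentToken
            WhitespaceToken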

We now have a flexible type system that encodes the identifying information in the type itself instead of in the discriminant ID field.  This carries a lot of benefits (not the least of which is that there is now compile-time type checking on these types); however, it also (initially) carries some drawbacks.  What would our parser code look like now?

    public class Token {
        // virtual, so the call in parseType() dispatches on the token's runtime type
        public virtual void DetermineWhichTypeToParse(Parser parser) {
            //error
        }
    }

    public class InterfaceToken : KeywordToken {
        public override void DetermineWhichTypeToParse(Parser parser) {
            parser.parseInterface();
        }
    }

    public class ClassToken : KeywordToken {
        public override void DetermineWhichTypeToParse(Parser parser) {
            parser.parseClass();
        }
    }

    public class Parser {
        Token CurrentToken { get { ... } }

        void parseType() {
            parseModifiers();

            CurrentToken.DetermineWhichTypeToParse(this);

            //Parse rest of type
        }
    }

Sure it works.  But *bleagh*.  Now our nice token hierarchy is cluttered with parser knowledge that it should know nothing about.  On top of that, it's quite possible that to be able to pull this off i'd have to expose private parser-specific functionality (i.e. make my private parser functions internal).  This is really not the path that i want to go down.  I want to have this nice rich type system, and i want to be able to use it in a flexible manner, but i don't want to end up with ugly code like the above.

Is there a solution?  Luckily, yes, there are many.  One of which is multi-methods (which are already available in .Net, albeit not in a clear form), the other of which is to implement the well known visitor pattern on this new token hierarchy.  So what would that look like?  Well, we'd start with the following code:

    public interface ITokenVisitor {
        void VisitToken(Token token);
        void VisitKeywordToken(KeywordToken token);
        void VisitInterfaceToken(InterfaceToken token);
        void VisitClassToken(ClassToken token);
        void VisitIdentifierToken(IdentifierToken token);
        void VisitContextualKeywordToken(ContextualKeywordToken token);
        void VisitAccessibilityToken(AccessibilityToken token);
        void VisitNoisyToken(NoisyToken token);
        void VisitCommentToken(CommentToken token);
        void VisitWhitespaceToken(WhitespaceToken token);
    }

    public class DefaultTokenVisitor implements ITokenVisitor {
        public void Default(Token token) {
        }

        public void VisitToken(Token token) {
            Default(token);
        }

        public void VisitKeywordToken(KeywordToken token) {
            Default(token);
        }

        /* all further implementation just defers to "Default" */
    }

    public class Token {
        public void AcceptVisitor(ITokenVisitor visitor) {
            visitor.VisitToken(this);
        }
    }

    public class KeywordToken extends Token { ... }
    public class InterfaceToken extends KeywordToken { ... }
    public class ClassToken extends KeywordToken { ... }
    public class IdentifierToken extends Token { ... }
    public class ContextualKeywordToken extends IdentifierToken { ... }
    public class AccessibilityToken extends KeywordToken { ... }
    public class NoisyToken extends Token { ... }
    public class CommentToken extends NoisyToken { ... }
    public class WhitespaceToken extends NoisyToken { ... }

Pretty standard Visitors right?  Yup, nothing special about them.  Except... these visitors are written in Java.  Why Java?  Well, as it turns out, Java has a very nice language construct that makes using visitors quite handy.  Let's take a look at how our parser code would look in Java:

    public class Parser {
        Token getCurrentToken() { ... }

        void parseType() {
            parseModifiers();

            getCurrentToken().AcceptVisitor(new DefaultTokenVisitor() {
                public void VisitClassToken(ClassToken token) {
                    Parser.this.parseClass();
                }

                public void VisitInterfaceToken(InterfaceToken token) {
                    Parser.this.parseInterface();
                }

                /* Further cases */

                public void Default(Token token) {
                    //handle error
                }
            });
        }
    }

Here i've used Java's "Anonymous Inner Classes" to trivially create a visitor that allows me to drive my parser on top of these new tokens.  Specifically, the visitor says that when it "visits" a "class" token the outer parser ("Parser.this") should start parsing a class, and likewise with an interface token.  Any other token (beyond enum/delegate) will cause an error (recall that all Visit methods in DefaultTokenVisitor defer to the Default method, which we've overridden in our anonymous inner class).  As you can see, this structure is completely isomorphic to the "switch" statement we saw in the original post.  Here it is again for reference:

    public class Parser {
       ...
            switch (CurrentToken.ID) {
                case TokenID.Class:
                    parseClass();
                    break;
                case TokenID.Interface:
                    parseInterface();
                    break;
                /* Further cases */

                default:
                    //Handle errors
                    break;
            }
        }
    }

  1. "switch (CurrentToken.ID)" corresponds to the "getCurrentToken().AcceptVisitor".
  2. the "case" statements correspond to the overridden "Visit" methods in the anonymous inner class
  3. the "default" case corresponds to the overridden "Default" method in the anonymous inner class

Seems great!  We now have a convenient hierarchy for describing tokens in a type safe manner, and we have a Visitor system that allows us to use them flexibly without clutter, while also allowing the code around the token handling to be self-describing.  i.e. i can easily look at the anonymous visitor and see what it's doing.

So we're done right?  This is the path we should go down?  Well... not yet... there's still one unanswered question: Why was i using Java to demonstrate this style of development?

One of the things we love doing around here is discussing different design techniques for attacking problems in general and the work we do in particular.  One of the very common conversations we have is simply how to structure internal data within the C# compiler and language service to best facilitate things like maintainability, readability, performance, correctness, etc.  With that in mind, i thought i'd bring up a recent discussion we were having along those lines.  We were specifically talking about token streams and parse trees, and how we'd like to see them represented.  To start, let's give a simple example of how a token could be represented:

    public enum TokenID {
        Class,
        Interface,
        Identifier,
        Comment,
        Whitespace,
        Public
        /* remaining enum values for the rest of the C# tokens */
    }

    public class Token {
        public TokenID  ID       { get { ... } }
        public Position Position { get { ... } }
        public string   Text     { get { ... } }
    }

A fairly simple representation.  In this case we're basically storing tokens as a discriminated union (with the ID as the discriminator).  One of the benefits of this approach is the ease with which a low overhead parser can be built.  For example, let's look at a hypothetical parser and how it would deal with such a structure:

    public class Parser {
        Token CurrentToken { get { ... } }

        ///This function handles parsing of classes/interfaces/enums/delegates
        void parseType() {
            parseModifiers();

            switch (CurrentToken.ID) {
                case TokenID.Class:
                    parseClass();
                    break;
                case TokenID.Interface:
                    parseInterface();
                    break;
                /* Further cases */

                default:
                    //Handle errors
                    break;
            }
        }
    }

The code above attempts to parse in modifiers (like public/internal) and then determines what to do next based on if it sees the "class" or "interface" tokens. 

This seems to work great, and there isn't a lot of good reason to change from the above style of code.  For a parser, which only uses tokens to determine its parsing path, it's more than sufficient to have a simplified way for each token to easily affect flow control.  Now, once you go past parsing things get a little interesting.  For example, an IDE needs to examine the token stream intimately in order to do colorization, formatting, IntelliSense, handling incomplete generic declarations, etc.  You end up with lots of these switches all over the place, and in general your code can get very messy.  For example, say you want to have code that handles all accessibility tokens.  You then need to have a switch with fall-throughs defined for those tokens (yech).  Also, if you add a new token you need to make sure that you update all those switches appropriately.
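Here's roughly what one of those fall-through switches looks like.  This is only a sketch: the TokenID enum earlier in this post lists just Public, so the Private, Protected and Internal members (and the IsAccessibility helper itself) are assumptions added for illustration.

```csharp
using System;

// Trimmed enum; Private/Protected/Internal are assumed additions.
public enum TokenID
{
    Class,
    Interface,
    Identifier,
    Public,
    Private,
    Protected,
    Internal
}

public static class TokenSwitchDemo
{
    public static bool IsAccessibility(TokenID id)
    {
        switch (id)
        {
            // Stacked labels: every accessibility token has to be listed by hand,
            // and every switch like this must be revisited when a token is added.
            case TokenID.Public:
            case TokenID.Private:
            case TokenID.Protected:
            case TokenID.Internal:
                return true;

            default:
                return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsAccessibility(TokenID.Protected));   // True
        Console.WriteLine(IsAccessibility(TokenID.Class));       // False
    }
}
```

Now imagine a copy of that label list in the colorizer, the formatter and IntelliSense, and you can see where the mess comes from.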

Now, coming from what we know about refactorings, it seems like there's a very good alternative to this.  Specifically: Replace Type Code with Class.  This would work fine... except for one niggling detail: in many cases you end up cluttering up a class with unnecessary functionality.  For example, why should a token have any understanding of how colorization works?  That's a completely orthogonal idea that should really be kept separate from tokens.  This keeps tokens simple, and simplicity is a good thing(tm).

So is there another alternative?  Wait and see :-)

The PDC keynote is going to be in a few hours.  And while i won't be there, i'm definitely going to be watching it live (to access it go here).  There's a lot i'm going to want to blog about at that point, but to help out my later posts i'm going to start by presenting a short series of lead-up articles that discuss the state of affairs today, and how some of what we're working on can make a certain programming style much nicer.

As many of you may know, we recently announced a pretty big change to the C# 2.0 language.  The full details of the change can be found at Soma's blog but i'll include the information here.

We designed the Nullable type to be the platform solution, a single type that all applications can rely on to uniformly represent the null state for value types.  Languages like C# went ahead and built in further language features to make this new primitive feel even more at home.  The idea was to blur the subtle distinction between this new value-type null and the familiar reference-type null.  Yet, as it turns out, enough significant differences remained to cause quite a bit of confusion.

 

We soon realized the root of the problem sat in how we chose to define the Nullable type.  Generics were now available in the new runtime and it seemed quite simple to use this feature to build up a new parameterized type that could easily encode both a value type and an extra flag to describe its null state.  And by defining the Nullable type also as a value type we retained both the runtime behaviors and most of the performance of the underlying primitive. No need to special case anything in the runtime.  We could handle it all as just an addition to the runtime libraries, or so we thought.

 

As several of you pointed out, the Nullable type worked well only in strongly-typed scenarios.  Once an instance of the type was boxed (by casting to the base ‘Object’ type), it became a boxed value type, and no matter what its original ‘null’ state claimed, the boxed value-type was never null. 

 

      int? x = null;
      object y = x;
      if (y == null) {  // oops, it is not null?
        ...
      }

 

It also became increasingly difficult to tell whether a variable used in a generic type or method was ever null.

 

    void Foo<T>(T t) {
       if (t == null) {  // never true if T is a Nullable<S>?
       }
    }

 

Clearly this had to change.  We had a solution in Visual Studio 2005 Beta2 that gave users static methods that could determine the correct null-ness for nullable types in these more or less ‘untyped’ scenarios.  However, these methods were costly to call and difficult to remember to use.  The feedback you gave us was that you expected it to simply work right by default.

 

So we went back to the drawing board.  After looking at several different workarounds and options, it became clear to all that no amount of tweaking of the languages or framework code was ever going to get this type to work as expected.

 

The only viable solution was one that needed the runtime to change.  To do that, it would require concerted effort by a lot of different teams working under an already constrained schedule.  This was a big risk for us because so many components and products depend on the runtime that it has to be locked down much sooner than anything else.  Even a small change can have significant ripple effects throughout the company, adding work and causing delays.  Even the suggestion of a change caused quite a bit of turmoil.  Needless to say, many were against the proposal for very credible reasons.  It was a difficult decision to make. 

 

We were fortunate that so many here were willing to put in the extra work it took to explore the change, prototyping it and testing it, that a lot of the uncertainty and angst was put to rest, making the decision to go ahead all that much easier.

 

The outcome is that the Nullable type is now a new basic runtime intrinsic.  It is still declared as a generic value-type, yet the runtime treats it special.  One of the foremost changes is that boxing now honors the null state.  A Nullable int now boxes to become not a boxed Nullable int but a boxed int (or a null reference as the null state may indicate.)  Likewise, it is now possible to unbox any kind of boxed value-type into its Nullable type equivalent.

 

      int x = 10;
      object y = x;
      int? z = (int?) y;  // unbox into a Nullable<int>

 

Together, these changes allow you to mix and match Nullable types with boxed types in a variety of loosely typed API’s such as reflection.  Each becomes an alternative, interchangeable representation of the other.

 

The C# language was then able to introduce additional behaviors that make the difference between the Nullable type and reference types even more seamless.  For example, since boxing now removes the Nullable wrapper, boxing instead the enclosed type, other kinds of coercions that also implied boxing became interesting.  It is now possible to coerce a Nullable type to an interface implemented by the enclosed type.

 

       int? x = 0;
       IComparable<int> ic = x;  // implicit coercion
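If you want to see the new behavior for yourself, the following small program (not part of Soma's text) exercises each of the cases described above.  It assumes the released .NET 2.0 runtime or later; the class and method names are made up for the demo.

```csharp
using System;

public static class NullableBoxingDemo
{
    // Unconstrained generic null check, like Foo<T> above.
    public static bool IsNull<T>(T t) { return t == null; }

    // Boxing a null int? yields a null reference.
    public static bool BoxedNullIsNull()
    {
        int? none = null;
        object boxed = none;
        return boxed == null;
    }

    // Boxing a non-null int? yields a boxed int, not a boxed Nullable<int>.
    public static bool BoxedValueIsPlainInt()
    {
        int? ten = 10;
        object boxed = ten;
        return boxed.GetType() == typeof(int);
    }

    // A boxed int can be unboxed straight into an int?.
    public static int? UnboxToNullable()
    {
        int x = 10;
        object y = x;
        return (int?)y;
    }

    // An int? converts implicitly to an interface implemented by int.
    public static int CompareViaInterface()
    {
        int? ten = 10;
        IComparable<int> ic = ten;
        return ic.CompareTo(5);
    }

    public static void Main()
    {
        int? none = null;
        Console.WriteLine(BoxedNullIsNull());           // True
        Console.WriteLine(BoxedValueIsPlainInt());      // True
        Console.WriteLine(UnboxToNullable() == 10);     // True
        Console.WriteLine(IsNull(none));                // True
        Console.WriteLine(CompareViaInterface() > 0);   // True
    }
}
```

Under the original Beta2 design the first and fourth checks are exactly the ones that gave the surprising answer.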

The reason i'm bringing this up is that i wanted to call out something specific that Soma mentions:

In the past, I have talked about how your feedback is a critical part of us building the right product.  Recently, we took a big DCR (Design Change Request) into Visual Studio 2005 that was in response to your feedback.  This was a hard call, because it was a big change that touched many components including the CLR.  Nonetheless, we decided to take this change at this late stage in the game because a) this was the right product design and I always believe in optimizing for the long-term and b) I had confidence in the team(s) to be able to get this work done in time for Visual Studio 2005.  This is a classic example of how we are listening to your feedback that results in a better product for all of us.

I cannot stress to you how true and honest a statement this is.  This issue would not have been addressed had it not been for the amazing feedback we received from some amazingly helpful people.  There were several that i can think of, but i definitely wanted to call out one person in specific:

Stuart Ballard took the time on several occasions to send us the message that our Nullable solution was unsatisfactory.  However, instead of just saying "it sucks" and leaving it at that, he willingly engaged us and took quite a lot of time to write up a full and detailed explanation of why it sucked, and why he felt that it was an unacceptable solution for him and the rest of the development community.  He even wrote up a great blog post on the subject that drilled down into many different areas where our Nullable implementation was unsatisfactory.  This page was sent out to the entire language design group where we discussed it on many occasions.  While we were aware of the limitations of our original Nullable implementation, we had previously existed in a sort of limbo state where we felt the problems were unfortunate, but acceptable.  And, when we were considering the cost of "doing it right", we felt that this might be a case where it was OK to get it slightly wrong since we could do it so cheaply.  Great community members like Stuart told us, unequivocally, that it wasn't.

Thanks Stuart!  Thanks for letting us know that you wouldn't let us settle for "good enough."  With your help we'll have made the VS2005 release that much better for everybody.  When it comes to C# 3.0 i hope that we'll be doing a lot more of this since the benefits are so fantastic to all.
