
Not too long ago, I did another Channel 9 video on Dynamic in C# with another compiler dev, Chris Burrows. In this video, we discuss the making of dynamic, as well as some of the drawbacks, design decisions, and philosophies behind the feature. Enjoy!

link: http://channel9.msdn.com/posts/CharlieCalvert/CSharp-4-Dynamic-with-Chris-Burrows-and-Sam-Ng/

One of the things I love about my job is that I get to make people happy. How do I do that? By giving them what they want of course! One of the things I don’t like so much about my job is going back on a decision that we made before, and having to revert some of the behavior. Well, we’ve been talking about COM interop in C# 4.0, and are now in the thick of things. Today we’ll chat about a feature which many people have asked for in the past, but we’ve stayed away from until now – Indexed Properties.

What are indexed properties?

Great question! Indexed properties are exactly what they sound like – they’re properties that you can index into. In a sense, you can think of them as an atomic form of accessing a property, and then indexing off of the return value of the property. The following is an example of an indexed property access:

myObject.MyIndexedProperty[index];

Indexed properties under the covers are simply accessor methods and a property, much like regular properties, except that both the accessor methods and the property have extra parameters. Or another way to look at them is that they’re just regular indexers except with a name other than “Item”.

Why didn’t we add them earlier?

We didn’t add these guys earlier because we felt that this wasn’t the design principle that we wanted people to conform to. We believed (and still do!) that the correct meaning of our sample statement is “I’m accessing a property named P off of my type C. Once I’ve got a resulting value, I’m going to index off of it.”

So what’s the issue here?

The issue is that while we firmly believe that the current C# semantics are the right ones, we unfortunately have a large legacy set of libraries that have shipped with indexed properties – COM.

We’ve applied the same philosophy to indexed properties that we discussed last time, and have put support in the language for them only in COM interop scenarios.

On the declaration front, we have not provided the ability to declare these guys. Since true COM libraries are created via tlbimp, these should never be created in C#.

On the consumption front, the compiler will now import anything that looks like an indexed property off of a COM type (i.e. a type with the tdImport flag), and allow it to be used via the named indexer syntax. Keen readers will note that the above sentence reads “the compiler will allow the indexed property to be consumed via named indexer syntax”, and not “the compiler will treat the indexed property as a property”. The difference is, of course, that true properties cannot be accessed via their accessor methods directly, but in order to maintain backwards compatibility, accessing the accessor of an indexed property directly must still be allowed.

Overload resolution

Let me go into a bit more depth and, using our sample statement again, describe the changes that we’ve made to overload resolution to allow these guys to be bound.

The first thing the compiler does when it sees the statement is it binds the left side of the dot, and resolves myObject to its type, say MyType. Once it’s done that, it goes and looks up everything with the name MyIndexedProperty on MyType.

In order to maintain backwards compatibility, if the compiler finds something other than an indexed property, it will go ahead and bind to whatever it found. Indexed properties in that sense are like extension methods – they only apply when all the normal binding steps have failed. Notice, however, that this happens at name lookup time, and not at indexer argument resolution time. More explicitly, if the compiler found a regular property named MyIndexedProperty of type C, and C does not have an indexer on it, the compiler will not bind to an indexed property named MyIndexedProperty. It will provide the same error you would have gotten with the C# 3.0 compiler – namely that there is not an indexer on the type C.

Once the compiler has verified that “normal” binding (ie binding that we would have done without considering indexed properties) has failed, it then checks the type MyType to see if it’s a ComImport type. If it is, then the compiler looks at all available indexed properties and performs the regular overload resolution algorithm on the indexed property candidates much like it would for regular indexers. It simply treats the candidates as indexers with names.

IL Generation

Because we bind the indexed property like a named indexer, we generate IL as if we were calling an indexer that has a name – namely, we call the equivalent of an indexer’s get_Item accessor, but with the name of the indexed property in place of Item.

In our example, the IL we generate would simply be a call to myObject.get_MyIndexedProperty(index), which is exactly what users had to write in previous versions of the language. Because indexed properties (and indexers in general) are really just syntactic sugar for calling accessor methods, nothing special needs to happen here both from a compiler standpoint and a runtime standpoint.

So really, these guys are just sugar?

Yep. These guys are just nice syntactic sugar. The great thing about sugar though, is that it really is sweet. Let me just leave you with a comparison of a common scenario.

// C# 3.0
myObject.set_MyIndexedProperty(index, myObject.get_MyIndexedProperty(index) + 1);
// C# 4.0
myObject.MyIndexedProperty[index]++;

The indexed property syntax is much nicer isn’t it? If you’re a COM programmer, I’m absolutely positive that you’ll agree!
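To make the sugar concrete without needing a real COM library, here is a minimal sketch in plain C#. The class FakeComObject and its backing array are hypothetical stand-ins of mine; only the accessor-method shape (get_MyIndexedProperty and set_MyIndexedProperty, each taking the index as an extra parameter) mirrors what the post describes. It spells out roughly what myObject.MyIndexedProperty[index]++ boils down to.

```csharp
using System;

// Hypothetical stand-in for an imported COM type. A real one would come
// from tlbimp; this one just stores values in an array.
public class FakeComObject
{
    private readonly int[] _values = new int[4];

    // Accessor methods with an extra "index" parameter: the shape an
    // indexed property has under the covers.
    public int get_MyIndexedProperty(int index) { return _values[index]; }
    public void set_MyIndexedProperty(int index, int value) { _values[index] = value; }
}

public static class IndexedPropertyDemo
{
    // Roughly what myObject.MyIndexedProperty[index]++ expands to:
    // read through the getter, add one, write back through the setter.
    public static int Increment(FakeComObject myObject, int index)
    {
        int old = myObject.get_MyIndexedProperty(index);
        myObject.set_MyIndexedProperty(index, old + 1);
        return myObject.get_MyIndexedProperty(index);
    }
}
```

Note that the setter takes the index as well as the new value, which is easy to forget when writing the accessor calls by hand.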


Wow, it’s been a while since I’ve last posted! Don’t worry, I’m still alive and kickin’, and we’re still workin’ on cool stuff for y’all to use.

Let’s take a bit of a recap of how far we’ve come. We’ve chatted about dynamic binding in C# and how that all plays in with the DLR, and about named and optional arguments and how they change the way methods are bound. The only other major piece in C# 4.0 is this notion of COM interop. We chatted about how dynamic really is a gateway to interop with different object models and languages (ie interacting with dynamic languages, dynamic object models, javascript objects, HTML DOM etc), but in C# 4.0, we want to go a bit further and provide you a few more tools to help make your interop life much easier.

These remaining features that we’ll chat about all have a strong tie to the COM world – that is, the features themselves require that the objects that you’re playing with are COM objects. How do we determine that? Well, you’ll soon find out!

Keep on rockin’ in a… COM… world!

Alright, I admit it – I’m a Neil Young fan. The man wrote some great tunes! Anyway, the point is that Rock’n Roll ain’t goin’ nowhere. And neither is COM. No matter how hard we try to get rid of it, it just won’t die! So we decided this time around that instead of trying to beat ‘em, we might as well join ‘em.

We’ve therefore created several features that are geared towards making your COM programming experience much easier. First let me list them:

  1. Passing non-ref arguments to by-ref parameters (aka COM omit ref in our world)
  2. Indexed properties
  3. No PIA deployment

The first two of these are comparatively smaller features from a complexity standpoint, so we’ll tackle those one at a time. We’ll discuss the No PIA feature at length in the next several posts.

Passing non-ref arguments to by-ref parameters

The first feature is really an acknowledgement that the APIs generated for COM interop are quite poor. For those of you who have worked in any amount of detail with the Office PIAs, you’ll quickly realize that for some reason, just about everything is passed around by ref.

This can get quite tiresome! Let’s look at a quick Office example.

static void Main()
{
    Word.Application app = new Word.Application();
    object m = Type.Missing;
    app.Documents.Add(ref m, ref m, ref m, ref m);
}

This is such typical code! I have to struggle with the type system to make it happy, just to add a simple Word document!

C# 4.0 makes this easier. The compiler will now determine that you’re working with a COM object by looking at the attributes of the type of that object, and checking to see if it has the [ComImport] attribute. Once it determines that you indeed are working with a COM object, it then gives you the ability to pass arguments to the method, indexer or property (yes, properties can have arguments a la indexed properties! We’ll talk about that later!) without giving them by ref.

This is really just compiler syntactic sugar – the compiler does the work to generate the temporaries for you, and slot them in place of the actual arguments. That means that in the following code, call (1) gets transformed into call (2).

static void Main()
{
    Word.Application app = new Word.Application();
    // (1) Your initial call.
    app.Documents.Add(Type.Missing, Type.Missing, Type.Missing, Type.Missing);

    // (2) Compiler generates locals and inserts them.
    object m = Type.Missing;
    app.Documents.Add(ref m, ref m, ref m, ref m);
}
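If you don’t have the Office PIAs handy, the rewrite can be sketched against an ordinary method. Everything here is hypothetical: OmitRefDemo.Add is a stand-in I made up that merely has the same all-by-ref shape as the Word method, and since it is not a [ComImport] type the ref keywords cannot be omitted, so the sketch only shows the hand-written form of step (2). I’ve used one temporary per argument, on the assumption that this is closer to what the compiler does than sharing a single local.

```csharp
using System;

public static class OmitRefDemo
{
    // Hypothetical stand-in for a COM method whose parameters are all by-ref.
    // It simply counts how many arguments were left as Type.Missing.
    public static int Add(ref object template, ref object newTemplate,
                          ref object documentType, ref object visible)
    {
        int missing = 0;
        if (ReferenceEquals(template, Type.Missing)) missing++;
        if (ReferenceEquals(newTemplate, Type.Missing)) missing++;
        if (ReferenceEquals(documentType, Type.Missing)) missing++;
        if (ReferenceEquals(visible, Type.Missing)) missing++;
        return missing;
    }

    // The hand-written equivalent of the compiler's rewrite: copy each
    // argument value into a temporary local, then pass the local by ref.
    public static int CallWithTemporaries()
    {
        object t1 = Type.Missing;
        object t2 = Type.Missing;
        object t3 = Type.Missing;
        object t4 = Type.Missing;
        return Add(ref t1, ref t2, ref t3, ref t4);
    }
}
```

Because the temporaries are throwaway locals, anything the callee writes back through the ref parameters is simply discarded, which is fine for the typical Office call where the parameters are only by-ref in name.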

The great thing about this too, is that with the introduction of named and optional arguments, and using the fact that the feature generates Type.Missing in place of default values for object on COM types, we can simply remove the arguments altogether!

static void Main()
{
    Word.Application app = new Word.Application();
    // (1) Your optional-parameter-omitted initial call.
    app.Documents.Add();

    // (2) Compiler generates locals and inserts them.
    object m = Type.Missing;
    app.Documents.Add(ref m, ref m, ref m, ref m);
}

Pretty cool stuff huh? Definitely makes programming against the Office APIs much nicer. The added bonus is that the IDE helps you out by letting you know that the parameters are optional, indicating that you can omit them, and indicating what the default value used in place will be.

So why now?

Let’s get into a bit of a philosophy discussion now, about why we’re doing all these different COM interop features now. In the past, we’ve been asked for these features – nay, we even pushed back against some of them.

For example, indexed properties is something that the VB language has had support for for quite some time now, but C# had decided that they weren’t the right way to go. Our standpoint was (and still is!) that the programming paradigm ought to be that the property is accessed, and that it is the thing that should supply the indexing.

So why are we adding all these features in now?

Well, for starters, COM’s pretty entrenched in the application programming world today. Many developers are having to struggle with the COM APIs and Office programming models every day, and it just doesn’t look like that model is going to be replaced, or will go away any time soon.

Next, C# 4.0 is really an interop release. Our focus this time around was to be able to interop with different programming languages and programming models. It seemed only fitting that one of the largest programming models still out there ought to be pretty high up on our list of priorities.

Dynamic binding allows for ease of interop with other object models and dynamic languages. The ability to use named and optional arguments allows ease of interop with legacy libraries like COM which have a lot of optional parameters and large parameter lists. The DLR was introduced with the sole purpose of providing a common runtime for interop with dynamic languages. All of these point us towards interop, and that, combined with the knowledge that COM interop has been a big pain point for our users, really tipped the scale towards pushing more interop features out this time around.

Agree? Disagree? As always, with philosophy things (and with everything else for that matter), I’d love to get your feedback. Until then, happy coding!


Okay, my attempt at a clever title failed… Ties and Philosophers? I oughtta stick with technical writing. :)

We’re almost done with our chat about named and optional arguments. We’ve covered what the feature is about, and covered overload resolution in more detail. This time I want to do a quick wrap up of our discussion by talking about the tie breaker rules, and then I want to give a bit of background and philosophy behind why we’re electing to do this feature now, instead of several releases ago when it was first considered.

Tie breakers

During overload resolution, the compiler may find that there are two or more candidates that are perfectly legal candidates given the arguments specified. When that happens, we apply our tie breaker rules to figure out which of the methods is the best one.

In the past, breaking ties was simple (well, not really, but simpler than it will be now!). Because every candidate had the same number of arguments, (again, not really – param arrays! I’ll explain that in a sec) we simply needed to apply our betterness rules to the conversions to see if we could find a best method. This check amounted to taking each pair of candidates, and checking if either of them has equal-or-better conversions for each parameter. If so, that method was considered the better choice, and was chosen as the best method to bind to. Optional arguments throw a wrench in the mix – now we can have two candidates that are both perfectly valid, but have a different number of parameters.

First, a quick example:

class C
{
    public void Foo(object o) { }
    public void Foo(int x) { }

    static void Main()
    {
        C c = new C();
        c.Foo(10);
    }
}

In this example, both overloads of Foo are applicable – the integer 10 is convertible to object, and is certainly convertible to int. The traditional tie breaker rule simply says that since 10 is an integer, the conversion to int is better than the conversion to object. Therefore Foo(int x) wins.
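A quick way to see this for yourself is to have each overload report its own name. This is just a sketch; the class name TieBreakerBasics and the string return values are mine, added so the chosen overload is observable.

```csharp
public class TieBreakerBasics
{
    public string Foo(object o) { return "Foo(object)"; }
    public string Foo(int x) { return "Foo(int)"; }

    // Both overloads are applicable for the argument 10; the identity
    // conversion int -> int beats the boxing conversion int -> object.
    public static string CallWithInt() { return new TieBreakerBasics().Foo(10); }

    // Only Foo(object) is applicable for a string, so there is no tie.
    public static string CallWithString() { return new TieBreakerBasics().Foo("ten"); }
}
```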

Param arrays

Well, what about parameter arrays? Let’s tweak our example a little bit:

class C
{
    public void Foo(int x, int y) { }
    public void Foo(params int[] x) { }

    static void Main()
    {
        C c = new C();
        c.Foo(10, 20);
    }
}

In this example, the compiler takes the params array, and expands it so that the signature looks like Foo(int x_1, int x_2), and uses that signature as the candidate signature. However, it also notes that the signature came from a params array, and param arrays are treated as second-class compared to real parameters. Since everything else is equal, the second-class-ness of the params array loses the tie to the first class parameters.
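The same trick of returning the overload’s name shows the params rule in action. Again a sketch: ParamsTieBreaker and its return strings are hypothetical names of mine.

```csharp
public class ParamsTieBreaker
{
    public string Foo(int x, int y) { return "Foo(int, int)"; }
    public string Foo(params int[] x) { return "Foo(params int[])"; }

    // Two arguments: both overloads apply, and the params overload is only
    // applicable in its expanded form, so the ordinary overload wins.
    public static string TwoArguments() { return new ParamsTieBreaker().Foo(10, 20); }

    // Three arguments: only the params overload applies, so it is chosen.
    public static string ThreeArguments() { return new ParamsTieBreaker().Foo(10, 20, 30); }
}
```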

Optional arguments

So how do optionals play into the mix? Well, let’s consider this third example:

class C
{
    public void Foo(int x) { }
    public void Foo(int x, int y = 0, int z = 10) { }

    static void Main()
    {
        C c = new C();
        c.Foo(10);
    }
}

In this example, both candidates are applicable, since they both match for the first argument, and the second candidate has optional arguments for the rest. The conversion for the first argument is the same for both candidates, and there are no param arrays. How do we pick which one is better?

Much like param arrays, optional arguments are treated as second-class arguments. Because the first method has no optional arguments and the second one does, we pick the first one.

Notice that if both candidates had optional arguments, at that point we’ve got nothing more to go on, and so the call is ambiguous.
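Here is the optional-argument tie breaker as a small sketch you can run. The class name OptionalTieBreaker and the returned strings are hypothetical, added only to make the winner visible.

```csharp
public class OptionalTieBreaker
{
    public string Foo(int x) { return "Foo(int)"; }

    public string Foo(int x, int y = 0, int z = 10)
    {
        return "Foo(int, int, int) y=" + y + " z=" + z;
    }

    // One argument: both overloads apply, but the second one needs default
    // values substituted for y and z, so the first one wins the tie.
    public static string OneArgument() { return new OptionalTieBreaker().Foo(10); }

    // Two arguments: only the second overload applies; z takes its default.
    public static string TwoArguments() { return new OptionalTieBreaker().Foo(10, 5); }
}
```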

Philosophy time!

Okay! Time for some philosophy. Why are we doing this feature now? Why didn’t we do it earlier?

Let me tackle the second question first, as it’s got a shorter answer. The reason we didn’t do it earlier is that we really didn’t want this feature in our language. We’ve pushed back on it for this long because it’s not the paradigm that we want.

So that brings us to the first question – why now?

Well, the quick answer is because of COM. It just won’t go away! Try as we might, people still use it (and are still going to continue using it). What does COM have to do with C#? Office. Office PIAs. The Office PIAs are designed such that many of the methods have about 30 parameters, and all of them are optional. Most of the time, what you’d want to do is specify one argument, and use the defaults for the rest.

Enter named and optional arguments. Because we allow you now to call methods without specifying their optional arguments, you can now call these Office methods without passing Type.Missing in as every other argument. And because we allow you to use names to specify exactly which parameter you’re passing this argument for, you can call the method by only passing what you want, and omitting the rest.

Combine this with our COM-no-ref feature in which the compiler takes the optional ref parameter and generates a local temporary variable for you and passes that as the parameter, your COM code will look much cleaner and will be much less tedious to write.

As I mentioned way back when, one of the big themes for C# 4.0 is interop with other “runtimes” – COM, dynamic languages, other user-defined type systems. That theme made this feature set a must-have.

As always, I would love all the feedback you guys have! And of course, happy coding!


Last time we talked about the basics of named arguments, optional arguments, and default values. From here on out, I’m just going to refer to the whole feature group as “named and optional arguments” – it’s just too much typing otherwise (we actually just refer to the feature as N&O internally). Let’s now dive a little deeper into how overload resolution works for the feature.

Where do we get those wonderful names?

The first thing we need to figure out is where we get the names from. Because the CLR does not consider parameter names as part of the method signature, it is perfectly legal to override a method and specify different parameter names than the base method that one is overriding. Let’s consider the following:

public class Animal
{
    public virtual void Eat(string foodType = "Grub") { }
}

public class Monkey : Animal
{
    public override void Eat(string bananaType = "Green banana") { }
}

This is perfectly legal C# code. So then which names do we pick? Do we pick the ones from the base method? Or the most derived one? What about interfaces? Let’s consider a usage of named arguments with this example.

public class Program
{
    static void Main()
    {
        Monkey m = new Monkey();
        Animal a = m;

        m.Eat(bananaType:"Ripe banana");
        a.Eat(foodType:"Yummy grub");
    }
}

If we consider the receiver (aka the calling object/type) as the anchor in this whole scheme, and use its statically determined type to figure out the names, then we’ve got ourselves a nice little scheme that is deterministic and quite sensible.

In our example then, because m is statically typed to be of type Monkey, m.Eat gets the names of the parameters from the Monkey class, and so bananaType is the correct name to be using. Similarly, a is typed Animal, and so the call to a.Eat gets the name foodType as its parameter. Notice that even though their runtime types will be identical (that is, we’ve taken an instance of Monkey and assigned it into an Animal local variable), that doesn’t matter – the named parameter feature is simply a syntactic sugar for a compile time rewrite.
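The sketch below puts the two calls side by side. It reuses the Animal and Monkey classes from above but, as an assumption of mine, changes Eat to return a string so that both the name lookup (compile time) and the virtual dispatch (run time) can be observed. NamedOverrideDemo is a hypothetical wrapper.

```csharp
public class Animal
{
    public virtual string Eat(string foodType = "Grub") { return "Animal eats " + foodType; }
}

public class Monkey : Animal
{
    public override string Eat(string bananaType = "Green banana") { return "Monkey eats " + bananaType; }
}

public static class NamedOverrideDemo
{
    // Receiver is statically a Monkey, so the name bananaType is the one in scope.
    public static string ThroughMonkey()
    {
        Monkey m = new Monkey();
        return m.Eat(bananaType: "Ripe banana");
    }

    // Receiver is statically an Animal, so the name foodType is used, even
    // though virtual dispatch still runs Monkey.Eat at run time.
    public static string ThroughAnimal()
    {
        Animal a = new Monkey();
        return a.Eat(foodType: "Yummy grub");
    }
}
```

Both calls end up in Monkey.Eat; only the spelling of the argument name at the call site differs.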

So what happens under the covers?

Let’s take a look at the example that we used last time:

public class ContactList
{
    List<Contact> SearchForContacts(
        string name = "any",
        int age = -1,
        string address = "any") { ... }

    static void Main()
    {
        ContactList list = new ContactList();
        var x = list.SearchForContacts(age:26);
    }
}

What actually happens under the covers here? When the compiler sees the call to list.SearchForContacts, it first performs a few quick validations to make sure that any positional arguments (arguments not specified by name) occur before any named arguments, and that no names are specified twice. Then it generates a set of all applicable candidates. In our example, there is but one candidate to consider. Then for each candidate, the compiler looks at the arguments, and performs a few verifications.

First it checks to make sure that the names specified in the call (in our case, “age”) are valid for the candidate (i.e. the candidate has a parameter named “age”). Next, the compiler moves past all positional arguments and attempts to match each named argument up with its corresponding parameter. Of course, each named argument must match a parameter that does not already have a positional argument specified for it (i.e. you cannot specify an argument for the same parameter twice).

For each parameter that does not have a corresponding positional or named argument, the compiler checks to make sure it is optional. It then uses the default parameter value for each of those arguments.

Once the compiler has generated this augmented argument list, it performs argument convertibility on it as usual. The resulting augmented list in our example then, is: “any”, 26, “any”.
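To see that the named call and the augmented list really are the same call, here is a sketch in which the search method just echoes back the arguments it received. ContactSearch is a hypothetical stand-in of mine; the real example returns a List<Contact> and its body is elided.

```csharp
public static class ContactSearch
{
    // Echoes the arguments it actually received, so the effect of the
    // compiler filling in defaults is visible.
    public static string SearchForContacts(
        string name = "any",
        int age = -1,
        string address = "any")
    {
        return name + ", " + age + ", " + address;
    }

    // What you write: one named argument, everything else omitted.
    public static string ViaNamedArgument() { return SearchForContacts(age: 26); }

    // What the call is treated as once the augmented list has been built.
    public static string ViaAugmentedList() { return SearchForContacts("any", 26, "any"); }
}
```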

Just compiler magic

I love magic. I love the concept of magic. It amazes me. I went to Disneyland recently for PDC, and it was magical. I know it isn’t real, and that something else is happening, but I love it anyway. That’s what named arguments are like. Underneath the covers, there is no trace of named (or optional) things – it all looks like straight IL, as if you called the methods with the augmented argument list.

That means that after the compiler has generated the augmented list for you and has found the best candidate, it treats the call as if you had called it with the augmented list, and everything else behaves as it used to.

That being said, it means that this feature is purely compile-time syntactic sugar, and so it doesn’t introduce any new dependencies or binary compatibility issues. Programs that you compiled against one set of names and default values will continue to work even if the library changes and has a new set of names and default values, because the names were resolved and the default values were copied into the call site at compile time. The flip side is that an already-compiled caller keeps passing the old default values until it is recompiled.

One more piece of magic

Turns out there’s one more crucial thing that the compiler does for you, which is really worth mentioning: order of evaluation. Let’s consider the following example:

public class C
{
    static void Main()
    {
        C c = new C();

        c.M(z:Foo(), x:Bar(), y:Baz());
    }

    void M(int x, int y, int z) { ... }
}

What happens here? Well, the compiler will reorder the arguments so that the result of Bar() gets passed as the first argument, the result of Baz() gets passed as the second, and the result of Foo() gets passed as the third. However, in what order should these sub-expressions be evaluated? You’d expect them to be evaluated as written – Foo first, then Bar, then Baz.

And that’s exactly what we do. We essentially create temporaries which store the value of evaluating each of those sub-expressions (which means that all side effects of evaluating each expression happen in syntax order as you’d expect), and reorder those temporaries to match their respective named positions. See? Magic!
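You can watch this happen by giving each sub-expression a side effect. In this sketch, the logging list, the return values 1, 2 and 3, and the class name EvaluationOrderDemo are all assumptions of mine; the call itself has the same shape as the one above.

```csharp
using System.Collections.Generic;

public class EvaluationOrderDemo
{
    private readonly List<string> _log = new List<string>();

    // Each helper records that it ran, then returns a distinct value.
    private int Foo() { _log.Add("Foo"); return 1; }
    private int Bar() { _log.Add("Bar"); return 2; }
    private int Baz() { _log.Add("Baz"); return 3; }

    private string M(int x, int y, int z) { return "x=" + x + " y=" + y + " z=" + z; }

    // Arguments are evaluated in the order written (Foo, Bar, Baz) and only
    // then slotted into their named positions (x=Bar, y=Baz, z=Foo).
    public static string Run()
    {
        var c = new EvaluationOrderDemo();
        string bound = c.M(z: c.Foo(), x: c.Bar(), y: c.Baz());
        return string.Join(",", c._log) + " -> " + bound;
    }
}
```

Run() returns "Foo,Bar,Baz -> x=2 y=3 z=1": the log shows the written order, while the parameter values show the reordering.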

So hopefully that gives you a good feel for how the feature works, and how overload resolution for it works. Have fun with it, and definitely give us your feedback!


C# 4.0 introduces the concept of optional parameter values into the language. Now, this has been a controversial subject in the past, and we have had many requests for the feature, but have traditionally stayed away from it. So, why now?

Well, before we get into the philosophy of why we decided to add it this time (which we will! I promise!), first let's discuss the feature itself.

I'll first talk about the features themselves, and will follow that with a brief discussion about the focus of C# 4.0 and how these features align with that focus. I'll expand in depth as to how each feature works, and what the compiler does behind the scenes, in future posts.

Default parameter values

We've all had the experience of writing (or at least using) methods whose parameters are optional in nature. Often, those methods will contain overloads which will have the optional parameters removed, and will simply be a wrapper for the actual method, passing the default values that the library writer wants to provide for the method.

Default parameter values give us a way to explicitly declare the default value that we would like to be used for the parameter in the method signature itself, instead of in the forwarding method. This gives the added benefit of allowing the IDE to help out and inform the consumer of the default value for the given parameter.
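As a before-and-after sketch of that idea: the Greeter class, its method names and the "Hello" default below are all hypothetical, made up purely to contrast a forwarding overload with a default parameter value.

```csharp
public static class Greeter
{
    // Before: a forwarding overload exists only to supply the default.
    public static string GreetOld(string name) { return GreetOld(name, "Hello"); }

    public static string GreetOld(string name, string greeting)
    {
        return greeting + ", " + name + "!";
    }

    // After: the default lives in the signature itself, where both the
    // compiler and the IDE can see it.
    public static string Greet(string name, string greeting = "Hello")
    {
        return greeting + ", " + name + "!";
    }
}
```

Greeter.GreetOld("Sam") and Greeter.Greet("Sam") produce the same string, but the second version needs one method instead of two.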

These guys are specified in one of two ways:

public class C
{
    static void Foo([DefaultParameterValueAttribute(10)] int x, int y = 20) { }
}

The first mechanism is purely a backwards compatibility issue and an interop issue with COM and VB. We highly recommend against using this method, however, because it specifies that there is a default parameter value for the parameter, but does not specify that the parameter is in fact optional. This means that the default value given will never be used.

The second mechanism really says two things - first, it tells the compiler that this parameter is optional, meaning the user does not have to specify an argument in this position. Second, it gives the compiler the value to use when the user does not specify an argument for the parameter.

How default parameter values are encoded

Because the new syntax really does two things, it is encoded in the metadata as two things. We encode both the optional flag in the metadata signature (which is equivalent to using the OptionalAttribute attribute), and we encode the default value as the DefaultParameterValueAttribute attribute.

These two mechanisms have already existed in the CLR, and are in fact what the VB compiler produces today, giving us an easy choice for how to encode the feature.

Note that the compiler will give you an error if you try to specify both a DefaultParameterValueAttribute attribute as well as a default value using the new syntax.

Also note that the compiler enforces that all optional parameters using the new syntax are specified after all required parameters.

Lastly, the compiler will not allow you to specify default parameter values for ref or out parameters. This is because there is no valid constant expression that is convertible to the ref or out type.

Optional arguments - how default parameter values are used

In order to make use of the default parameter values, we've added a feature which allows the caller to omit arguments for parameters which have default parameter values specified.

Notice that because we enforce that optional parameter values are specified at the end of the parameter list, you cannot omit an argument but specify an argument for a later position.

This means that even though existing libraries may have specified the DefaultParameterValueAttribute and the OptionalAttribute for a parameter in the middle of the list, the C# compiler will not allow you to call that method without specifying a value for that parameter.

You can kind of think of optional arguments as a params array - the compiler will allow you to call the method without specifying the arguments, but they must be at the end of the parameter list.

What gets code gen'ed?

First we need to note that the use of optional arguments is really just syntactic sugar. The compiler cannot generate a call without actually providing all the arguments - it simply provides an argument for you and allows you to omit specifying it.

Once the compiler realizes that you're calling a method and omitting an argument because it is optional, it takes the value specified in the DefaultParameterValueAttribute and encodes that as a constant value for the argument that you've omitted. Note that if you're calling a library that doesn't have a default value specified but is still optional, the compiler will use default(T) as the value, where T is the type of the parameter. Also note that for COM calls, we also allow you to omit arguments to ref parameters by generating the temporaries for you.

Putting it all together - Named arguments

The ability to omit specifying arguments for parameters that have a default value is really taken advantage of by the Named arguments feature. This feature allows you to specify by name the parameter for which you are providing a value. This means that you can now omit specifying arguments for all parameters, and only specify the ones which you actually care about.

For COM programmers, this is like heaven! I recently "got" to play with Office interop a bit, and found that the average method had 30 parameters! Most of the parameters are wonderfully optional, and with the ability to now omit them, code looks much cleaner and is much more readable.

Sorry, I'm getting ahead of myself. First let me describe the feature.

Named argument usage

This feature introduces new syntax at the call site of a method. Consider the following example:

public class ContactList
{
    List<Contact> SearchForContacts(
        string name = "any",
        int age = -1,
        string address = "any") { ... }

    static void Main()
    {
        ContactList list = new ContactList();
        var x = list.SearchForContacts(age:26);
    }
}

We've got some library method that searches through some contacts and finds the ones that match our criteria (I know, it's a horrible API, but bear with me). The library is great! ;) It specifies the default wildcards for us so that we can omit them as we wish. Now suppose we want to find all contacts aged 26.

Well, since we now have optional arguments, we can omit the arguments after the age parameter, but that only gets us halfway there: we would still have to supply the name argument that comes before it. Named arguments get us the rest of the way.

By using the new syntax, we can specify by name the argument that we want to specify a value for, and omit the rest. The syntax is quite simple (well, not really, but I'll get into the details of its complexities later): simply specify the name of the parameter that you want to specify an argument for, add a colon, then add the value that you want.

Note that the value can be any expression type that you normally could have specified. Note also that named arguments are not restricted to parameters that are optional or have default values. Lastly, note that you don't even have to specify the arguments in order! We could call our method with the following:

list.SearchForContacts(address:"home", name:"sam", age:30);

The arguments don't have to be in any particular order. The only rule is that the named arguments must be specified at the end of the argument list. This means that all positional arguments (the "normal" ones that aren't specified by name) must be given first. Once the compiler encounters a named specification, it will produce an error upon encountering any further unnamed arguments.
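The ordering rules can be sketched like this. NamedOrderDemo is a hypothetical class of mine, and its SearchForContacts just echoes its arguments instead of doing a real search.

```csharp
public static class NamedOrderDemo
{
    public static string SearchForContacts(
        string name = "any",
        int age = -1,
        string address = "any")
    {
        return name + ", " + age + ", " + address;
    }

    // All arguments named, in an order unrelated to the declaration.
    public static string AllNamedOutOfOrder()
    {
        return SearchForContacts(address: "home", name: "sam", age: 30);
    }

    // A positional argument first, then a named one; age is omitted.
    public static string PositionalThenNamed()
    {
        return SearchForContacts("sam", address: "home");
    }

    // Not allowed: a positional argument after a named one that is out of
    // position. The compiler rejects the following call.
    // SearchForContacts(age: 30, "sam");
}
```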

So that's the feature in a nutshell. There are more details that I'll get to later, but in the mean time I'd love to get your feedback on the feature, on its uses, and on how we've chosen to design it.

And as always, happy coding!


By now, my hope is that you all have a well-rounded view of dynamic. We started this series by introducing dynamic and talking about the basics of the feature, and have just finished talking about some of the feature's limitations with the intent that giving both the good and the bad will help us gain a firm understanding of the topic.

So what more is there to talk about?

The thing that's been occupying my thoughts lately is the semantics around the phantom method. Recall from our previous discussion of the phantom method that it is the method which the compiler binds to when there is a dynamically typed argument with a static receiver. What exactly happens when the compiler determines that it needs to bind to this method? What checks will the compiler do? What checks should it do?

Static or dynamic?

The first question we need to ask ourselves is a somewhat philosophical one. Should the compiler treat binding against the phantom method as a dynamic operation with some static parts? Or a static operation with some dynamic parts? A quick example will allow us to consider both views:

public class C
{
    static void Main()
    {
        C c = new C();
        dynamic d = 10;

        c.Foo(d, 10);
        d.Foo(10, 10);
    }

    public void Foo<T, S>(T x, S y) { }
}

The first call is a statically known receiver with a dynamically typed argument. The second is a dynamically typed receiver with a statically known argument.
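To make the two shapes concrete, here is a small sketch of what each call resolves to at runtime. It is a variation on the example rather than the example itself: Foo returns the names of the inferred type arguments (an addition of mine), and the dynamic receiver is an actual instance of the class, since the integer 10 has no Foo to find.

```csharp
using System;

public class GenericDispatchSketch
{
    // Reports which type arguments were chosen for T and S.
    public string Foo<T, S>(T x, S y)
    {
        return typeof(T).Name + ", " + typeof(S).Name;
    }

    public static string StaticReceiver()
    {
        GenericDispatchSketch c = new GenericDispatchSketch();
        dynamic d = 10;
        // Static receiver, dynamic argument: T comes from the runtime type of d.
        return c.Foo(d, 10);
    }

    public static string DynamicReceiver()
    {
        dynamic receiver = new GenericDispatchSketch();
        // Dynamic receiver, statically typed arguments.
        return receiver.Foo("text", 10);
    }

    public static void Main()
    {
        Console.WriteLine(StaticReceiver());  // Int32, Int32
        Console.WriteLine(DynamicReceiver()); // String, Int32
    }
}
```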

A 'dynamic' phantom

Let's consider the former position first. When the compiler binds the call to Foo against a dynamic receiver, it has no knowledge of the receiver's members. It therefore does not check for the existence of a Foo member, and does not perform checks like accessibility, arity, argument convertibility, or method type inference.

If the compiler were to consider binding against the phantom method as a dynamic operation with some static parts, then it should treat the first call in the same manner. This means that once the compiler encounters any method call with a dynamic argument, it should stop checking these things against the receiver, even though it knows the receiver's type at compile time and can therefore determine these things.

Seems counterintuitive, doesn't it? If the compiler has the information, why wouldn't it use it to help give the user good diagnostic information at compile time? Ok, ok, I'm clearly biased and think that this is the wrong approach. :) Let's consider the second approach then.

A 'static' phantom

This position argues that the compiler should use whatever static information it knows to help give diagnostics to the user wherever it can.

This means that it should perform name lookup on the receiver to make sure there is a method named Foo on class C.

It should check that the method Foo is accessible from the current location.

It should do an arity check to make sure that there is a Foo that takes two arguments.

It should do method type inference to determine as much information about the type parameters as it can. In our example, this means that the compiler will infer S to be type int, but not be able to infer T.

It should check that any non-dynamic arguments are convertible to their respective parameter types. In our example, this means verifying that the second argument is convertible to type int, since we inferred S to be int.

It should check the constraints of the method type parameters against the argument types.

Use the static information!

Though there may be some dynamic language guys that think we should be in the first camp, I'm of the opinion that C# is a static language, so let's stay in the latter camp and use as much static information as we can.

Moreover, we've already decided that for the method call off of a dynamic receiver, the compiler will encode the static types of the arguments so that the static types and not the runtime types will be used for overload resolution.

If we look on our little checklist above, all the items seem pretty straightforward - all but one (in my opinion). Method type inference. How should the type inference algorithm be altered to infer the most that it can, giving errors where it can guarantee that the code will never succeed, no matter what the runtime arguments?

Currently in our working compiler, we simply ignore type inference. That is, we skip type inference, and upon encountering a type parameter, we assume it is convertible at compile time. This can produce some unexpected behavior!

Consider the following example:

public class C
{
    static void Main()
    {
        C c = new C();
        dynamic d = 10;

        c.Foo(10, d);
    }

    public void Foo<T>(T t, int x) where T : class { }
}

One would expect that this produces a compile time error - the integer 10 given as the first argument will never satisfy the constraints on the type parameter T. However, we currently allow this call to compile successfully and fail at runtime!

Quite unexpected, yes?

Well, turns out we need to figure out a good way to have the type inference algorithm behave when it sees dynamically typed arguments in order for this to work.

A modification to the type inference algorithm

Here's where things get fun. Our current type inference algorithm has two results: pass or fail. We now need to introduce a third result: inconclusive.

Because I am the strongest believer that the current behavior is unacceptable, and that we need to make this change to type inference, it fell on me to come up with a reasonable proposal for the design team, and to see if I can usher it through.

So here goes!

I propose that we add the following behavior to the type inference algorithm. If the type of the argument is dynamic, then take all type parameters in the corresponding parameter's constructed type and mark them as inconclusive. No errors will be reported on inconclusive type parameters, and no constraint checks will be performed on them in the constraint checking phase.

Note that this proposal says nothing about types constructed over dynamic. For example, if we supplied an argument of type List<dynamic> to a parameter expecting an IEnumerable<T> where T : struct, then we would not mark T as inconclusive, and therefore report the error that the constraint is not satisfied.

A complex scenario

Let's consider a slightly more complex scenario.

public interface IAnimal { }
public interface IWatcher<out T> { }
public class Watcher<T> : IWatcher<T> { }

public class C
{
    static void Main(string[] args)
    {
        C c = new C();

        IWatcher<Giraffe> a = new Watcher<Giraffe>();
        IWatcher<Monkey> b = new Watcher<Monkey>();
        dynamic d1 = 10;
        dynamic d2 = new Watcher<Mammal>();
        IWatcher<dynamic> d3 = new Watcher<dynamic>();
        c.Bar(a, b, d1); // (1)
        c.Bar(a, b, d2); // (2)
        c.Bar(a, b, d3); // (3)
    }

    public void Bar<T>(IWatcher<T> w1, IWatcher<T> w2, IWatcher<T> w3) where T : IAnimal { }
}

public class Mammal : IAnimal { }
public class Giraffe : Mammal { }
public class Monkey : Mammal { }

In this example, the first two calls contain an argument that is typed dynamic. However, notice that we cannot simply ignore the dynamic argument, and with it the third parameter, for method type inference.

If we were to do that, the type inference algorithm would first determine that the candidate set for T is {Giraffe, Monkey}. However, even though there is a common base class (i.e. Mammal), C#'s type inference algorithm requires that the base class be in the candidate set in order for a successful inference. Type inference would therefore fail at compile time.

In the first call, this is all fine and good - runtime type inference would also fail on the call. However, the second call will succeed at runtime! Because the runtime type of d2 is Watcher<Mammal>, Mammal is added to the candidate set. And because IWatcher is covariant on T, choosing T to be Mammal satisfies argument convertibility for each of the three arguments.

The third call will fail at compile time, because the candidate set for T is {Giraffe, Monkey, dynamic}, and T is not marked inconclusive. Type inference will infer T to be dynamic, since it is the common base class and IWatcher is covariant. However, constraint checking will fail, since dynamic is not an IAnimal.

Questions? Thoughts? Pick it apart!

This being my first real attempt at directly changing the spec and not just being a voice on the voting party, I'd greatly appreciate your thoughts and comments! Are there flaws in my proposed argument? Are there refinements that can help make the algorithm more robust? Are there clear scenarios that break down (i.e. we report an error at compile time, but the call can succeed at runtime)? Please send me your thoughts!

And as always, happy coding! And have a Merry Christmas!

As I mentioned last time, there are a few gotchas that we'll need to look at in order to get a full understanding of the dynamic feature and its capabilities. Today we'll take a look at some of those limitations. As we go along, I'll try to shed some insights as to how the decision making process came about, and why we feel these calls are the right ones.

Mutating values using dynamic

Consider the following code:

static void Main()
{
    dynamic d = 10;
    d++;
}

What should happen here?

Intuitively, we'd expect d to contain the value 11. However, recall that under the covers, dynamic is really object. That means that the first line will generate a boxing conversion from int to object. The local variable d contains a boxed copy of the integral value specified. The second line, then, is really an application of the ++ operator on the local variable d. At runtime, calling operator ++ on d will result in an unbox of the object d into an integer, and a call to the operator ++ on the unboxed value. But this value is not copied back inside the box!

Turns out that what you expect to happen isn't really what would happen if we naively implemented this in the runtime binder. Good thing your trusty C# compiler team isn't naive!!

The solution to this little problem then, is essentially to pass things by ref to the runtime binder so that the runtime binder can write back into the value. This gets us half-way there - we still have the problem of boxed structs. Luckily for us, the architecture of the runtime binder is such that we return expression trees to the DLR (for more information on this, read my post that talks about this). There is an expression tree that performs an unbox and modifies the boxed value, and puts that value back into the box, allowing you to mutate the boxed values of these things.
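As a quick sanity check, here's a tiny sketch of the increment case. The class and method names are mine, purely for illustration; the point is just that after the write-back described above, the local really does hold the incremented value.

```csharp
using System;

public static class IncrementSketch
{
    public static object IncrementBoxed()
    {
        dynamic d = 10; // boxes the int
        d++;            // dispatched at runtime; the result is written back into d
        return d;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementBoxed()); // 11
    }
}
```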

Nested struct mutation

If we take our above example to the next level however, things start getting a bit trickier. Because of the nature of our architecture, dotted expressions get broken up into parts and bound in segments. So that means that for the expression A.B.C.D, the compiler will encode a site for A.B, use that result as the receiver for a second site for .C, and use the result of that as the receiver for the third site for .D.

That seems like a sensible architecture, doesn't it? Indeed, it is the same architecture that the compiler uses when it does its binding. However, the runtime architecture has the limitation that it does not have the ability to return values by ref. (Well, this really isn't a limitation in the CLR, as they already have the support for this. This is more a limitation in the .NET languages as none of them provide the ability to have ref returns).

This means that if any of the dotted expressions were to bind to a value type, that value will be boxed (and hence a copy would be made), and further dots into it would be made on the copy of the value, and not the initial value as would be expected. Consider the following code:

public struct S
{
    public int i;
}

public class D
{
    public S s;
    public static void Main()
    {
        dynamic d = new D();
        d.s = default(S);
        d.s.i = 10;
        Console.WriteLine(d.s.i);
    }
}

We would intuitively expect the value '10' to be printed in the console. However, the value '0' is printed instead. We're currently working on determining the best way to fix this issue, and are also debating whether or not this is a critical enough scenario to fix.
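In the meantime, one way to sidestep the copy is to do the struct mutation statically and assign the whole struct back. This is only a sketch of a workaround, not something the design team has blessed, and the Holder and NestedStructSketch names are made up for the example.

```csharp
using System;

public struct S
{
    public int i;
}

public class Holder
{
    public S s;
}

public static class NestedStructSketch
{
    public static int WriteBack()
    {
        dynamic d = new Holder();
        S copy = d.s; // pull a statically typed copy out of the dynamic object
        copy.i = 10;  // mutate the copy with ordinary static binding
        d.s = copy;   // assign the whole struct back in one dynamic operation
        return d.s.i;
    }

    public static void Main()
    {
        Console.WriteLine(WriteBack()); // 10
    }
}
```

Because d.s = copy is a single site whose receiver is a reference type, there is no intermediate boxed value for the write to get lost in.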

The rule of thumb? Remember that dynamic is like object, and so boxing happens!

Base calls

There is a restriction in the CLR that prevents the compiler from generating non-virtual calls on virtual methods. As a result, there is no way to call a base overload dynamically, and so one cannot make a base call with any dynamically typed arguments, since doing so would trigger a dynamic binding.

The possible solution (which we have chosen not to implement) would be somewhat akin to the solution we performed for lambdas. Recall that if you had the lambda: x => base.M(x), the compiler will generate a private method that performs the call to the base access, and will have the lambda body call the generated method. The down side, however, is that for lambdas, we knew exactly which call the user was trying to make. In the dynamic scenario, we would be doing overload resolution at runtime, and so we would have to generate a base call stub for each possible overload. This solution is quite ugly, and since we currently lack an extremely compelling scenario, we have opted not to do this and simply give a compile time error when the user attempts to make a base call with any dynamic arguments.
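If you do hit that error, the simplest way out is to cast the dynamic argument to a static type, so that the base call binds at compile time again. A minimal sketch, with Animal and Dog as made-up types:

```csharp
using System;

public class Animal
{
    public virtual string Describe(int legs)
    {
        return "Animal with " + legs + " legs";
    }
}

public class Dog : Animal
{
    public override string Describe(int legs)
    {
        return "Dog";
    }

    public string DescribeAsBase(dynamic legs)
    {
        // base.Describe(legs) would need a dynamic dispatch and is rejected
        // by the compiler; the cast makes the argument statically typed.
        return base.Describe((int)legs);
    }
}

public static class BaseCallSketch
{
    public static void Main()
    {
        Console.WriteLine(new Dog().DescribeAsBase(4)); // Animal with 4 legs
    }
}
```

The conversion to int still happens at runtime, but the choice of overload is made statically, which is exactly what the base call needs.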

Explicitly implemented interface methods

As one avid reader commented in one of my previous posts, explicitly implemented interfaces kinda get the shaft again here. Because interfaces are really compile time constructs, and have no runtime representation, explicitly implemented interface members get the short end of the stick at runtime. Consider the following:

interface IFoo
{
    void M();
}

class C : IFoo
{
    void IFoo.M() { }
}

Because of the way the compiler implements explicitly implemented interfaces, C.M gets its name removed (making it uncallable via a C pointer). Now this is fine at compile time, because the compiler can see when a receiver is known to be an IFoo pointer. However, at runtime, there is no notion of interfaces, and so there is no IFoo available for the runtime binder to use to dispatch methods. Combined with the fact that C.M's name has been removed, this makes the method entirely uncallable dynamically.
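The workaround is to get back to a statically typed interface reference before making the call. Here's a sketch, assuming the binder behaves as described above; Widget and the helper names are mine, and M returns a string only so that we can see which path was taken.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public interface IFoo
{
    string M();
}

public class Widget : IFoo
{
    string IFoo.M() { return "IFoo.M"; }
}

public static class ExplicitInterfaceSketch
{
    public static string CallDynamically()
    {
        dynamic d = new Widget();
        try
        {
            return d.M(); // the binder finds no member named M on Widget
        }
        catch (RuntimeBinderException)
        {
            return "not found";
        }
    }

    public static string CallThroughInterface()
    {
        dynamic d = new Widget();
        IFoo foo = d;   // runtime conversion to the interface type
        return foo.M(); // ordinary static interface dispatch
    }

    public static void Main()
    {
        Console.WriteLine(CallDynamically());      // not found
        Console.WriteLine(CallThroughInterface()); // IFoo.M
    }
}
```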

Accessibility

This last topic isn't really a limitation yet. We are still working on drawing the line between doing the pragmatic thing and doing the most consistent thing on this issue. The CTP implementation of dynamic currently performs accessibility checks only on the member that you are accessing. This means that the runtime binder checks to verify that any member you're trying to use is public.

Namely, we do not do accessibility checks on the type itself (i.e. if you really ought to be able to access the type of the object in your current context, or should it just look like an opaque object to you), and do not allow any non-public members to be used dynamically.

The down side of this scenario is that you could make a call with a static receiver to a private method that you know you can access from your context, but because a dynamic argument is given, the runtime binder will prevent you from calling the method. Below is an example:

public class C
{
    private void M(int x) { }

    static void Main()
    {
        dynamic d = 10;
        C c = new C();
        c.M(d);
    }
}

When the compiler encounters this expression at compile time, it will do the verification and know that C.M is accessible from the calling context, but because the argument is dynamic, the call gets resolved at runtime. Because of the public-only policy of the current binder, overload resolution will not bind to the method.

Conclusions?

As always, I love getting your feedback, whether positive or negative. But this post in particular I would love to get your thoughts on! The design is not set in stone, so any of your thoughts will definitely be personally brought to the design team by yours truly. Thanks for your comments in advance, and as always, happy coding!

Now that we're all experts in how dynamic invocations work for regular method calls, let's extrapolate from our previous discussion about phantom methods a bit and take a look at how those basic concepts apply to other dynamic operations.

Today we'll just go through a laundry list of each type of operation, and throw in a few caveats and gotchas (limitations really, but that's such a negative word) that come along with the whole package. As always, I'll try to give some insights as to why we made the decisions that we did, and if there are workarounds for certain scenarios, I'll definitely point them out.

So without further ado, let's hit the list!

Properties

Named properties take the form d.Foo, where d is some dynamic object and Foo is some name for some field or property that lives on the runtime type of d. When the compiler encounters this, it encodes the name "Foo" in the payload, and instructs the runtime binder to bind off of the runtime type of d.

Note however, that named properties are always used in context! You can do one of three things with these guys - access the value, assign a value to the member, or do both (compound operations, such as += etc). The compiler will thus encode the intent of the usage in the payload as well, so that the runtime will allow you to bind to a get-only property only if you're trying to access it, and will throw you an error if you're trying to assign to it.

The thing to note here is that the compiler will treat any named thing the same, and allow the runtime to differentiate between properties and fields.

The return type of these guys is dynamic at compile time.
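A short sketch of the above, using a made-up Person type with one field and one get-only property. The same d.Name syntax works for both kinds of member, and the error for assigning to the get-only property shows up at runtime rather than at compile time.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public class Person
{
    public string Name = "sam";            // a field
    public int Age { get { return 26; } }  // a get-only property
}

public static class PropertySketch
{
    public static string ReadBoth()
    {
        dynamic d = new Person();
        string name = d.Name; // binds to the field at runtime
        int age = d.Age;      // binds to the property at runtime
        return name + " " + age;
    }

    public static bool AssignToGetOnly()
    {
        dynamic d = new Person();
        try
        {
            d.Age = 30; // compiles fine; the runtime binder rejects it
            return true;
        }
        catch (RuntimeBinderException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadBoth());        // sam 26
        Console.WriteLine(AssignToGetOnly()); // False
    }
}
```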

Indexers

You can think of indexers in one of two ways - properties with arguments, or method calls with a set name. The latter is a much more useful way to think of these guys when we're dealing with dynamic. The reason is that just like method calls, even if the indexer itself can be statically bound, any dynamic arguments that don't directly map to dynamic can cause the phantom overload to come into play, and cause a late binding based on the static type of the receiver, and the dynamic types of the arguments.

They still do have some similarities to properties however - they're always used in context. As such, the compiler again will encode whether or not the user is accessing the value of the indexer, setting a value, or performing a compound operation into the payload for the binder to use.

The return type of these guys is also dynamic at compile time.

Conversions

Last time we mentioned that although dynamic is not convertible to any other type, there are certain scenarios in which we allow it to be convertible. Assignments, conditional expressions, and foreach iteration variables are a few examples.

These payloads are quite simple - because the compiler already knows the type that we're attempting to convert to (i.e. the type of the variable you're assigning to), it simply encodes the conversion type in the payload, indicating to the runtime binder that it should attempt all implicit (or explicit if it's a cast) conversions from the runtime type of the argument to the destination type.

Note that user-defined conversions will be applied as well. We worked pretty hard to make sure that the runtime semantics will behave just like the compile time ones, so argument conversions for overload resolution and the like will all happen exactly as you'd expect.

These guys return the destination type at compile time. Note that these guys are the only guys who have a non-dynamic return type at compile time.
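Here's a sketch that exercises the three flavors: a user-defined implicit conversion, an explicit cast, and an implicit conversion that doesn't exist. The Celsius type and the helper names are invented for the example.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public class Celsius
{
    public double Degrees;

    public static implicit operator double(Celsius c)
    {
        return c.Degrees;
    }
}

public static class ConversionSketch
{
    public static double UserDefinedImplicit()
    {
        dynamic d = new Celsius { Degrees = 21.5 };
        double degrees = d; // the user-defined conversion is found at runtime
        return degrees;
    }

    public static int ExplicitCast()
    {
        dynamic d = 3.9;
        return (int)d; // a cast asks the binder for an explicit conversion
    }

    public static bool IntToStringIsRejected()
    {
        dynamic d = 10;
        try
        {
            string s = d; // there is no implicit conversion from int to string
            return s == null;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(UserDefinedImplicit());   // 21.5
        Console.WriteLine(ExplicitCast());          // 3
        Console.WriteLine(IntToStringIsRejected()); // True
    }
}
```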

Operators

Operators are a bit of a strange beast. At first glance, it's hard to tell that anything dynamic is going on. However, a simple statement like d+1 still needs to be dispatched at runtime, because user-defined operators can come into play.

As such, any operation that has a dynamic argument will be dispatched at runtime. This includes all of the in place operators as well (+=, -= etc).
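To see why even a humble + needs a runtime dispatch, consider this sketch. Money is a made-up struct with a user-defined operator; the compiler can't know at compile time that the operands will turn out to be Money values, so the operator is only found by the runtime binder.

```csharp
using System;

public struct Money
{
    public int Cents;

    public Money(int cents)
    {
        Cents = cents;
    }

    public static Money operator +(Money left, Money right)
    {
        return new Money(left.Cents + right.Cents);
    }
}

public static class OperatorSketch
{
    public static int AddMoney()
    {
        dynamic a = new Money(150);
        dynamic b = new Money(250);
        Money total = a + b; // binds to Money.operator + at runtime
        return total.Cents;
    }

    public static object CompoundAssign()
    {
        dynamic n = 1;
        n += 2.5; // int + double, resolved at runtime; n now holds a double
        return n;
    }

    public static void Main()
    {
        Console.WriteLine(AddMoney());       // 400
        Console.WriteLine(CompoundAssign()); // 3.5
    }
}
```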

Note that the compiler will do the magic to figure out if you've got a member assignment (ie d.Foo += 10) or a variable assignment (ie d += 10), and figures out if it needs to pass d by ref to the call site so that it can be mutated. Note also that structs will get mutated as well! So if we were to do:

public struct S
{
    public int Foo;
}

public class C
{
    static void Main()
    {
        dynamic d = new S();
        d.Foo += 10;
    }
}

the result would be that d would point to a struct whose Foo member is set to 10.

Lastly, note that the compiler knows that if you're doing something like d.Foo += x, and at runtime d.Foo binds to a delegate type or an event type, then the correct combine/add call will be invoked for you.

Delegate invoke

The invocation syntax is very much like a method call. The only difference is that the name of the action is not explicitly stated. This means that just like calls, both of the following examples would end up causing runtime dispatches:

public class C
{
    static void Main()
    {
        MyDel c = new MyDel();
        dynamic d = new MyDel();
        d();
        c(d);
    }
}

The first example causes a runtime dispatch of an invoke that takes no arguments. At runtime, the binder will check to verify that the receiver is indeed a delegate type, and then perform overload resolution to make sure the arguments supplied match the delegate signature.

The second example causes a runtime dispatch because the argument specified is dynamic. The compiler determines at compile time that we have an invoke of a delegate since c's type is a delegate, but the actual resolution must be done at runtime.
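For something you can actually run, here's a sketch of both cases using Func<int, int> in place of MyDel (a substitution of mine, so that the delegate has a concrete signature and a target).

```csharp
using System;

public static class DelegateInvokeSketch
{
    public static int DynamicReceiver()
    {
        Func<int, int> addOne = x => x + 1;
        dynamic d = addOne;
        return d(41); // the binder checks at runtime that d holds a delegate
    }

    public static int DynamicArgument()
    {
        Func<int, int> addOne = x => x + 1;
        dynamic arg = 41;
        return addOne(arg); // known to be a delegate invoke, but resolved at runtime
    }

    public static void Main()
    {
        Console.WriteLine(DynamicReceiver()); // 42
        Console.WriteLine(DynamicArgument()); // 42
    }
}
```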

 

Okay, that's enough laundry listing for today. Next time we'll look at a few small caveats of things that aren't allowed in dynamic contexts. After that, I think we'll switch gears and start looking at some other VS 2010 features - named arguments, optional parameters, and more!

Yes, this does sound like a Star Wars movie, but no, I'm not a Star Wars geek that just likes to pull lines from my favorite movies (though I rather enjoyed Star Wars). This post will deal with what we've coined "the phantom method". It's the method that the static compiler will bind to during the initial binding phase when it recognizes that the invocation it's trying to bind needs to be bound dynamically and cannot be resolved statically. It uses the rules that we talked about last time to determine what types to use at runtime.

Let's consider a simple example:

public class C
{
    public void Foo(int x)
    {
    }

    static void Main()
    {
        dynamic d = 10;
        C c = new C();
        c.Foo(d);
    }
}

When we try to bind the call to Foo, the compiler's overload resolution algorithm will construct the candidate set containing the sole candidate, C.Foo(int). At that point, we consider whether or not the arguments are convertible. But wait! We haven't talked about the convertibility of dynamic yet!

Let's take a quick segue into talking about the convertibility of the dynamic type.

Dynamic conversions

The quick and easy way to think about dynamic conversions is that everything is convertible to dynamic, and dynamic is not convertible to anything. "Wait a sec!", you say. "That doesn't make any sense!" And you're absolutely right, it doesn't make any sense - not until we talk about the special handling of dynamic in situations where you would expect convertibility.

In each of these special situations that you would expect some sort of conversion, the dynamic type signifies that the conversion is to be done dynamically, and the compiler generates all the surrounding DLR code that prompts a runtime conversion.

Let's let the local variable "c" denote some statically typed local, and the variable "d" denote some dynamically typed expression. The special situations in question are the following:

  1. Overload resolution - c.Foo(d)
  2. Assignment conversion - C c = d
  3. Conditionals - if (d)
  4. Using clauses - using (dynamic d = ...)
  5. Foreach - foreach (var c in d)
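A couple of these are easy to try out. The sketch below (with names of my own choosing) covers the conditional and foreach cases; in each, the conversion that the statement needs is performed at runtime against whatever d actually holds.

```csharp
using System;
using System.Collections.Generic;

public static class SpecialConversionSketch
{
    public static string Conditional()
    {
        dynamic d = true;
        if (d) // converted to bool at runtime
        {
            return "taken";
        }
        return "skipped";
    }

    public static int Foreach()
    {
        dynamic d = new List<int> { 1, 2, 3 };
        int sum = 0;
        foreach (int item in d) // d is converted to an enumerable at runtime
        {
            sum += item;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Conditional()); // taken
        Console.WriteLine(Foreach());     // 6
    }
}
```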

We'll look at overload resolution today and explore the concepts, and leave the remaining scenarios as exercises for the reader. :)

Overload resolution

Back to argument convertibility. Since dynamic is not convertible to anything else, our argument d is not convertible to int. However, since we've got a dynamically typed argument, we really want the overload resolution for this call to be bound dynamically. Enter the phantom method.

The phantom method is a method which is introduced into the candidate set that has the same number of parameters as the number of arguments given, and each of those parameters is typed dynamic.

When the phantom method is introduced into the candidate set, it is treated like any other overload. Recall that dynamic is not convertible to any other type, but all types are convertible to dynamic. This means that though all (well, not really all, but we'll discuss that later) of the normal overloads will fail due to the dynamic arguments being present, the phantom method will succeed.

In our example, we have one argument which is typed dynamic. We also have two overloads: Foo(int) and Foo(dynamic). The first overload fails because dynamic is not convertible to int. The second, the phantom, succeeds and so we bind to it.

Once a call is bound to the phantom overload, the compiler knows to generate the correct DLR magic to signal dispatching the call at runtime.

Only one question remains: when does the phantom overload get introduced?

Introduction of the phantom overload

When the compiler performs overload resolution, it considers each overload in the initial candidate set. If the invocation has any dynamic arguments, then for each candidate in the initial set, the compiler checks to see if the phantom overload should be introduced. The phantom will be introduced if:

  1. All of the non-dynamic arguments are convertible to their respective parameters.
  2. At least one of the dynamic arguments is not convertible to its respective parameter.

Recall that earlier we had said that it would be possible for a call containing a dynamic argument to be dispatched statically instead of dynamically. This is explained by condition 2. If the overload in question contains dynamic parameters for each of the dynamic arguments, then the binding will be dispatched statically.

The following example will not yield a dynamic lookup, but will be bound statically:

public class C
{
    public void Foo(int x, dynamic y) { ... }

    static void Main()
    {
        C c = new C();
        dynamic d = 10;
        c.Foo(10, d);
    }
}

Once the compiler has gone through each of the overloads in the initial binding pass, if the phantom has not yet been introduced, then overload resolution will behave as it always has, despite the occurrence of a dynamic parameter.

How is the phantom dispatch different than dispatch from a dynamic receiver?

It is important to note that there is a subtle difference between dispatch signaled from the phantom method and dispatch signaled from a dynamic receiver.

With a dynamic receiver, the overloads that the runtime binder will consider are determined based on the runtime type of the receiver. However, with the phantom dispatch, the overloads will be determined based on the compile time type of the receiver.

This is because of intuition - one would expect that though the arguments are dynamic, the receiver is known at compile time, and so the candidate set that the call can dispatch to should be known at compile time as well.

More specifically, one would not expect some overload defined in some derived class (possibly not defined in one's own source!) to be called. This is precisely the consideration we took when designing the behavior of the dynamic dispatch.

So what's next?

Next time, we'll apply the rules and concepts that we talked about today to other invocations - operators, conversions, indexers, and properties.

Last time we dealt with the basics of dynamic binding. This time, we'll add a small twist.

First, let's recall the example we were using last time:

static void Main(string[] args)
{
    dynamic d = 10;
    C c = new C();

    // (1) Dynamic receivers.
    d.Foo(); // Call.
    d.PropOrField = 10; // Property.
    d[10] = 10; // Indexer.

    // (2) Statically typed receivers (or static methods)
    //     with dynamic arguments.
    c.Foo(d); // Instance method call.
    C.StaticMethod(d); // Static method call.
    c.PropOrField = d; // Property.
    c[d] = 10; // Indexer.
    d++; // Think of this as op_increment(d).
    var x = d + 10; // Think of this as op_add(d, 10).
    int y = d; // Think of this as op_implicit(d).
    int z = (int)d; // Think of this as op_explicit(d).
}

Last time we dealt with the first set of invocations - those with dynamic receivers. This time, we'll deal with the second set - those with either static receivers, or no real apparent receiver.

What do you expect?

Let's take the simplest of this set of invocations and expand it out a bit. Suppose we have something like the following:

public class C
{
    public void Foo(decimal x) { ... }
    public void Foo(string x) { ... }
    static void Main(string[] args)
    {
        C c = new C();
        dynamic d = 10;
        c.Foo(d);
    }
}

First let's consider this from a purely intuitive standpoint. What would we expect to happen?

Since we know the type of our local variable 'c', intuitively, we know that one of the two overloads of Foo on C should be called. However, we also know that d is dynamically typed, so the compiler cannot determine the exact overload to be called until runtime. We would therefore expect the combination of these two to happen - the compiler will determine the candidate set at compile time, and determine which the call should resolve to at runtime. In this case, since d has the value 10 at the time of the call, we would expect the overload of Foo that takes a decimal to be called, since the value 10 is not convertible to type string.

What don't you expect?

Let's be a little more specific here, and expand our example to illustrate what we would NOT expect to have happen:

public class C
{
    public void Foo(decimal x) { ... }
    public void Foo(string x) { ... }
    static void Main(string[] args)
    {
        C c = new D();
        dynamic d = 10;
        c.Foo(d);
    }
}

public class D : C
{
    public void Foo(int x) { ... }
}

First of all, let's notice the subtle change in our source code: the local variable c is now initialized with new D(). We are now creating an instance of the derived class D. This means that at runtime, the local variable c will be an instance of type D instead of C as in our previous example. Note also that D contains an overload of Foo that is a better match than all of the overloads on C - the value 10 is intrinsically typed int and so D.Foo is the best match.

However, note that although our example instantiates the local variable c within our code, it is very easy to imagine a method taking a parameter of type C and being given some other derived class at runtime. We do not expect this to change our candidate set used for overload resolution! Specifically (in compiler terminology), since the call to c.Foo can be bound statically to a method group, we expect the statically determined method group to be the one that is used. The dynamic argument should only serve to influence the resolution of the method group, not to influence the creation of the group itself.

What actually happens?

As I mentioned before, one of the design tenets that we've been trying to maintain is that dynamic binding behaves exactly as it would at static compile time, with the exception that the type used in place of the dynamic objects (arguments or receiver) is the runtime determined type instead of the compile time determined one. This means that for all arguments not statically typed dynamic, the compile time types will be used, regardless of their runtime types.

Applying this rule to our example means that at runtime, we should bind as if the type of the receiver c is C, and the type of the argument d is int. Using these types for overload resolution will yield C.Foo(decimal) as the result.
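
To see this rule in action, here is a minimal runnable sketch of the example. It assumes a few things that are not in the original: the methods return a string naming the overload instead of having elided bodies, and the class name StaticReceiverDemo and the method Run are made up for illustration.

```csharp
using System;

public class C
{
    public string Foo(decimal x) { return "C.Foo(decimal)"; }
    public string Foo(string x) { return "C.Foo(string)"; }
}

public class D : C
{
    // Never considered by the call below: the receiver's static type is C.
    public string Foo(int x) { return "D.Foo(int)"; }
}

public static class StaticReceiverDemo
{
    public static string Run()
    {
        C c = new D();   // static type C, runtime type D
        dynamic d = 10;  // runtime type int
        // Candidate set is fixed at compile time: the two overloads on C.
        // Only the choice between them is deferred to runtime.
        return c.Foo(d);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "C.Foo(decimal)"
    }
}
```

Since int converts implicitly to decimal but not to string, the decimal overload is the one picked at runtime.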

A slightly more complex example to (hopefully) drill home the point:

public class C
{
    public void Foo(object x, C c) { ... }
    static void Main(string[] args)
    {
        C c = new D();
        dynamic d = 10;
        c.Foo(d, c);
    }
}

public class D : C
{
    public void Foo(int x, D d) { ... }
}

Notice in this example that at runtime, c contains an instance of type D, and d contains the value 10. If we were to use the runtime types for everything involved in the binding at runtime, then the receiver would be type D, with the argument types being int and D respectively. This would yield D.Foo(int, D) as the best result, but that's not at all what we would expect.

Because the only statically-known dynamically typed argument is the first argument d, it is the only one that has its runtime type used. The remainder of the arguments to the call (the receiver c, and the second argument c) have their static types used. As such, the only method considered is C.Foo(object, C), which is the method we'd expect to have resolved.
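
The difference between the two readings can be sketched side by side. This is an illustration under assumptions: the methods return strings rather than void, and MixedBindingDemo, StaticReceiver and EverythingDynamic are hypothetical names. The second method is not from the example; it declares the receiver as dynamic purely for contrast, so that every type involved really is taken from runtime.

```csharp
using System;

public class C
{
    public string Foo(object x, C c) { return "C.Foo(object, C)"; }
}

public class D : C
{
    public string Foo(int x, D d) { return "D.Foo(int, D)"; }
}

public static class MixedBindingDemo
{
    public static string StaticReceiver()
    {
        C c = new D();
        dynamic d = 10;
        // Receiver and second argument keep their static type C;
        // only d contributes its runtime type (int).
        return c.Foo(d, c);
    }

    public static string EverythingDynamic()
    {
        dynamic c = new D(); // now the receiver and second argument are dynamic too
        dynamic d = 10;
        // Bound against runtime types: receiver D, arguments (int, D).
        return c.Foo(d, c);
    }

    public static void Main()
    {
        Console.WriteLine(StaticReceiver());    // prints "C.Foo(object, C)"
        Console.WriteLine(EverythingDynamic()); // prints "D.Foo(int, D)"
    }
}
```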

What's next?

Next time we'll look deeper at exactly what the compiler does in order to determine that we need a runtime dispatch for the given expression. After that, we'll apply the same principles we've discussed to other invocation types in our example (such as property-accessor-looking things and operators). By the end of it, we'll have turned this whole dynamic thing inside out, so stay tuned!


A few weeks ago, a few of us on the compiler team did a Channel9 interview, discussing some of the new features that we're working on and how they fit into the whole Visual Studio 2010 story.

The video is now online! In it, we talk at some length about the dynamic feature, so if you've been following along with my posts but are still a little confused, the video may shed some more perspective. We go into some quick whiteboard examples of simple dynamic calls, and discuss what happens under the covers.

We also talk about some of our other new features, such as the Named and Optional arguments feature, and the Co and Contra-variance feature. Hope you enjoy!

link: http://channel9.msdn.com/shows/Going+Deep/Inside-C-40-dynamic-type-optional-parameters-more-COM-friendly/

Last time, we began to dive into dynamic binding in C# and what happens through the pipeline. This time, we'll take a simple scenario and pick apart the details of what happens under the covers, both during compile time and runtime.

We can break down what the compiler does into three parts: type and member declarations with dynamics (i.e. methods that return dynamic), binding and lookup, and emitting. We'll deal now with the binding aspects of dynamic.

Dynamic binding

Dynamic binding itself can be broken into two scenarios. Let's consider the following example.

static void Main(string[] args)
{
    dynamic d = 10;
    C c = new C();

    // (1) Dynamic receivers.
    d.Foo(); // Call.
    d.PropOrField = 10; // Property.
    d[10] = 10; // Indexer.

    // (2) Statically typed receivers (or static methods)
    //     with dynamic arguments.
    c.Foo(d); // Instance method call.
    C.StaticMethod(d); // Static method call.
    c.PropOrField = d; // Property.
    c[d] = 10; // Indexer.
    d++; // Think of this as op_increment(d).
    var x = d + 10; // Think of this as op_add(d, 10).
    int y = d; // Think of this as op_implicit(d).
    int z = (int)d; // Think of this as op_explicit(d).
}

Consider the first set of examples under (1). Each of these dynamic invocations happens off of the dynamically typed expression. It is clear where the dynamicity (yes, I like that word, even though it isn't one...) comes from, and where it goes.

The second set of examples under (2) is a little more complex. The use of dynamic is indirect in each of these. Because the argument to each operation is dynamic, it flows into the containing operation and makes that operation dynamic as well. As such, the compiler does sort of a mix of dynamic binding and static binding - it will use the static type of the receiver to determine the set of members to overload on, but will use the runtime types of the arguments to perform overload resolution.

The first set of examples is much more straightforward to understand, so we'll use this set as our foundation for exploring the feature.

Dynamic receivers

When the compiler encounters an expression typed dynamic, it knows to treat the subsequent operation as a dynamic operation. Whether it's an index get, index set, method call etc, the result type of the operation will be determined at runtime, and so at compile time, the result of the operation must also be dynamic.
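
A small sketch of how that dynamic result type propagates. The class and method names (DynamicResultDemo, LengthRuntimeType) are made up for illustration.

```csharp
using System;

public static class DynamicResultDemo
{
    public static Type LengthRuntimeType()
    {
        dynamic d = "hello";
        var n = d.Length;     // 'var' infers dynamic here, not int
        var doubled = n * 2;  // so this is itself another dynamic operation
        // Only at runtime do we learn that the value is an int.
        return doubled.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(LengthRuntimeType()); // prints "System.Int32"
    }
}
```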

The compiler transforms all dynamic operations into what we'll call a dynamic call site. This consists of creating a compiler generated static field on a generated static class that stores the DLR site instance for the invocation, and initializing it as necessary.

The DLR call site is a generic object that is generic on the delegate type of the call. More on how this delegate gets generated later. The type names may not be final yet, but currently the creation of the DLR call site takes a CallSiteBinder which is an object that knows how to perform the specific binding that is required for the call site. The DLR provides a set of standard actions that can be used to take advantage of the DLR's support for interop with dynamic objects (more on that in a later post).

The call site contains a field of type T that is an instance of the delegate type that the site is instantiated with. This delegate is used to contain the DLR caching mechanism which you can learn about on Jim Hugunin's blog. It stores the results of each bind and is used to invoke the resulting operation.

Once the call site has been created, the compiler then emits the code to invoke the delegate, passing it the arguments that the user passed to the call site.

What happens at runtime?

Once the compiler has created the DLR call site, it then invokes the delegate, which causes the DLR to do its magic with interop types, and its magic with caching. Assuming that we don't have a true IDynamicObject and we don't have a cache hit, the CallSiteBinder that we seeded the DLR site with will be invoked. C# has its own derived CallSiteBinders that will know how to perform the correct binding, and will return an expression tree which will be merged into the DLR call site's target delegate for caching.

The current caching mechanism simply checks exact type matches on the arguments. For example, suppose our call looks like the following:

arg0.M(arg1, arg2, ...);

And suppose our call has arg0.Type == C, and all the arguments passed to the call are of type int. The cache check would look like the following:

if (arg0.GetType() == typeof(C) &&
    arg1.GetType() == typeof(int) &&
    arg2.GetType() == typeof(int) &&
    ...
    )
{
    // Merge CallSiteBinder's bind result here.
}
... // More cache checks
else
{
    // Call the CallSiteBinder to bind, and update cache.
}
}
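
The idea of "check exact types, bind only on a miss" can be modelled with a toy class. This is emphatically not the DLR's implementation or data structures; ToyCallSite, Bind and BindCount are hypothetical names, and a dictionary keyed on a single argument type stands in for the chain of type checks.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a one-argument "call site" that binds once per runtime type.
public sealed class ToyCallSite
{
    private readonly Dictionary<Type, Func<object, string>> _cache =
        new Dictionary<Type, Func<object, string>>();

    // How many times the (pretend) binder had to run.
    public int BindCount { get; private set; }

    public string Invoke(object arg)
    {
        Type key = arg.GetType();
        Func<object, string> target;
        if (!_cache.TryGetValue(key, out target))
        {
            // Cache miss: ask the binder, then remember the result.
            BindCount++;
            target = Bind(key);
            _cache[key] = target;
        }
        return target(arg);
    }

    // Stand-in for a real binder: produces a delegate specialised to the type.
    private static Func<object, string> Bind(Type runtimeType)
    {
        string label = runtimeType.Name;
        return value => label + ": " + value;
    }
}

public static class ToyCallSiteDemo
{
    public static void Main()
    {
        var site = new ToyCallSite();
        Console.WriteLine(site.Invoke(1));   // prints "Int32: 1" (bind)
        Console.WriteLine(site.Invoke(2));   // prints "Int32: 2" (cache hit)
        Console.WriteLine(site.Invoke("x")); // prints "String: x" (bind)
        Console.WriteLine(site.BindCount);   // prints "2"
    }
}
```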

C# CallSiteBinder creation

The last thing we need to paint a full picture of dynamic binding is to understand what the C# CallSiteBinder implementation does.

In our example, we have 3 different types of dynamic operations. We have a call, a property access, and an indexer. Each of these operations has its own unique pieces, but they still share much common functionality. As such, they all are initialized with a common C# runtime binder, and are used by the runtime binder as data objects that describe the action that needs to be bound. We'll call these objects the C# payloads.

A good way to think of the C# runtime binder is a mini compiler - it has many of the concepts you'd expect in a traditional compiler, such as a symbol table, and a type system, and much of the functionality as well, such as overload resolution and type substitution.

Let's use the simple example of d.Foo(1) for our consideration.

Once the runtime binder gets invoked, it is given the payload for the current call site, and the runtime arguments the site is being bound against. It takes the types of all the arguments (including the receiver) and populates its symbol table with those types. It then unpacks the payload to find out the name of the operation it's trying to perform on the receiver (in this case, "Foo"), and uses reflection to load all members named "Foo" off of the runtime type of d, putting those members into its symbol table.

From there, we have enough information in the binder's internal system to do the binding that the action describes. At this point, we fork off and bind based on the payload's description.

One of the design choices we made was that the runtime binder should have the exact same semantics that the static compiler has. This includes reporting the same set of errors that the compiler would produce, and performing the same set of conversions (user-defined or otherwise).

As such, each payload is bound exactly as the static compiler would have. The result of the bind is an expression tree that represents the action to take if the binding was successful. Otherwise, a runtime binder exception is thrown. The resulting expression tree is then taken and merged into the call site's delegate to become part of the DLR cache mechanism, and is then invoked so that the result of the user's dynamic bind gets executed.
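
A failed bind can be observed directly. The sketch below assumes the exception type from the released framework, Microsoft.CSharp.RuntimeBinder.RuntimeBinderException; BinderErrorDemo and FailsToBind are made-up names. The exact message text is deliberately not asserted, since it may vary by version.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class BinderErrorDemo
{
    public static bool FailsToBind()
    {
        dynamic d = 10;
        try
        {
            d.Foo(1); // compiles fine, but int has no member named Foo
            return false;
        }
        catch (RuntimeBinderException ex)
        {
            // The message mirrors what the static compiler would have reported.
            Console.WriteLine(ex.Message);
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FailsToBind()); // prints the binder's message, then "True"
    }
}
```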

A slight limitation

As I mentioned, we tried to keep the philosophy of matching exactly what the static compiler would do. However, there are several scenarios that will not work in Visual Studio 2010 that we will hopefully get to in a future release.

Several to note are lambdas, extension methods, and method groups.

Because we currently do not have a way of representing the source of a lambda at runtime without a binding, dynamic invocations that contain lambdas produce a compile time error.

Also, because we don't currently have a way of passing in using clauses and scopes during runtime, extension method lookup will also not be available for this next release.

There is currently no way to represent a method group at runtime (i.e. there is no MethodGroup type), and so without introducing that concept into .NET, there is no good way for us to allow method groups to be represented dynamically. This means that you cannot do the following:

delegate void D();
public class C
{
    static void Main(string[] args)
    {
        dynamic d = 10;
        D del = d.Foo; // This would bind to a method group at runtime. 
    }
}

Because we cannot represent method groups at runtime, a runtime exception will be thrown if the runtime binder binds d.Foo to a method group.
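
For all three limitations, the usual way out is to give the compiler a static type to work with. The following is a sketch under assumptions, not an official recommendation: Calculator, LimitationsDemo and the method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CSharp.RuntimeBinder;

public class Calculator
{
    public int Apply(Func<int, int> f, int value) { return f(value); }
}

public static class LimitationsDemo
{
    public static int LambdaWorkaround()
    {
        dynamic calc = new Calculator();
        // calc.Apply(x => x + 1, 41) would be a compile-time error.
        // Casting gives the lambda a static delegate type first.
        return calc.Apply((Func<int, int>)(x => x + 1), 41);
    }

    public static int ExtensionMethodWorkaround()
    {
        dynamic numbers = new List<int> { 1, 2, 3 };
        // numbers.Sum() would fail at runtime: extension methods are not looked up.
        // Converting back to a static type restores normal compile-time lookup.
        return ((IEnumerable<int>)numbers).Sum();
    }

    public static bool MethodGroupThrows()
    {
        dynamic s = "abc";
        try
        {
            Func<string> f = s.ToUpper; // binds to a method group at runtime
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(LambdaWorkaround());          // prints "42"
        Console.WriteLine(ExtensionMethodWorkaround()); // prints "6"
        Console.WriteLine(MethodGroupThrows());         // prints "True"
    }
}
```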

Hopefully you have a better understanding about the details of what happens when C# performs a dynamic bind. Next time we'll take a look at the second set of scenarios we discussed today. We'll also introduce the Phantom Method, and describe what it does and see how it affects overload resolution.

Until then, happy coding!


The other day I was playing around with some Office code, and I found myself writing a lot of code much like the following sample that Anders used at his PDC talk:

static void Main(string[] args)
{
    var xl = new Excel.Application();

    ((Excel.Range)xl.Cells[1, 1]).Value2 = "Process Name";
    ((Excel.Range)xl.Cells[1, 2]).Value2 = "Memory Usage";
}

As you can imagine, it very quickly became tiresome assigning the results of each call to a local variable, debugging and finding out what type it returns for my scenarios, making the cast to the strong type so that I can call certain methods on it, rinse, and repeat.

This pattern is common in dynamic APIs, and causes a lot of excess code to be written that essentially is used just to make the type system happy. Wouldn't it be nice if instead, we could write something like this?

static void Main(string[] args)
{
    var xl = new Excel.Application();

    xl.Cells[1, 1].Value2 = "Process Name";
    xl.Cells[1, 2].Value2 = "Memory Usage";
}

Well, in C# 4.0, we now allow you to write exactly that. One of the main features that we're working on in C# 4.0 is the dynamic late binding feature. This feature allows you to tell the compiler that the thing that I'm returning really ought to be treated like a dynamic type, and that any dispatch on it should be done dynamically. The runtime will then do the binding for you based on the runtime type of the object instead of the static compile time return type, and if the binding succeeds, then the code will succeed. This gives us exactly what we want.

So how does this feature work?

Firstly, we've introduced a dynamic type into the type system. This type indicates to the compiler that all operations based on the type should be bound dynamically and not at compile time. Secondly, we've created a C# runtime binder which does the late binding for you. Lastly, we've baked in the usage of the DLR and are making full use of its caching and dynamic dispatch capabilities, so that you can interop with dynamic objects (objects created from IronPython for instance).

The dynamic type

In order to start using the dynamic binding, we've got to have some way to signify to the compiler that we want our object or expression to be bound dynamically. Enter the dynamic type.

The dynamic type is just a regular type that you can use in your code to denote local variables, fields, method return values etc. It tells the compiler that everything to do with that object or expression should be done dynamically. Consider the following example:

static void Main(string[] args)
{
    dynamic d = SomeInitializingStatement;

    d.Foo(1, 2, 3); // (1)
    d.Prop = 10; // (2)
    var x = d + 10; // (3)
    int y = d; // (4)
    string s = (string)d; // (5)
    Console.WriteLine(d); // (6)
}

In this example, each of the statements has some element of the dynamic type flowing through it, and is therefore dispatched dynamically. Let's consider each of them.

  1. Since the receiver in this example is typed dynamic, the compiler will indicate to the runtime that it needs to bind some method named "Foo" on whatever the runtime type of d happens to be, with the arguments {1, 2, 3} applied to it.
  2. This example also has a dynamic receiver, and so the compiler indicates to the runtime that it needs to bind a property-looking-thing (could be a field or property) named "Prop", and that it wants to set the value to 10.
  3. In this example, the operator "+" becomes a dynamically bound operation because one of its arguments is dynamically typed. The runtime then does the normal operator overload resolution rules for "+", finding any user-defined operators named "+" on the runtime type of d, and considering that along with the regular predefined binary operators for int.
  4. In this example, we have an implicit conversion from the runtime type of d to int. The compiler signifies to the runtime that it should consider all implicit conversions on int and on the runtime type of d, and determine if there is a conversion to int.
  5. This example highlights an explicit conversion to string. The compiler encodes this cast and tells the runtime to consider explicit casts to string.
  6. In this example, despite the fact that we're calling a statically known method at compile time, we have dynamic arguments. As such, we cannot perform overload resolution correctly at compile time, and so the dynamic-ness of d flows out to its containing call, and we end up dispatching Console.WriteLine dynamically as well.
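
Points 3 to 5 can be tried out with a type that declares its own operator and conversions. This is an illustrative sketch: Celsius, DynamicOperatorsDemo and Run are hypothetical names, chosen only so that each dynamic operation has something user-defined to find at runtime.

```csharp
using System;

public sealed class Celsius
{
    public int Degrees { get; private set; }
    public Celsius(int degrees) { Degrees = degrees; }

    public static Celsius operator +(Celsius c, int delta)
    {
        return new Celsius(c.Degrees + delta);
    }

    public static implicit operator int(Celsius c) { return c.Degrees; }
    public static explicit operator string(Celsius c) { return c.Degrees + " C"; }
}

public static class DynamicOperatorsDemo
{
    public static string Run()
    {
        dynamic d = new Celsius(20);
        var x = d + 10;        // (3) finds the user-defined operator + at runtime
        int y = d;             // (4) finds the implicit conversion to int
        string s = (string)d;  // (5) finds the explicit conversion to string
        int total = x;         // x holds a Celsius, so this converts implicitly too
        return total + ", " + y + ", " + s;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "30, 20, 20 C"
    }
}
```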

There are several other scenarios that dynamic flows out to, but I've listed these to give you a general idea of what the dynamic type's implications are.

Now, we should note that the dynamic type is really just syntactic sugar to signify to the compiler that it should treat bindings dynamically. In metadata, dynamic is just object with an attribute signifying its dynamicity (if that's even a word... I don't think it is though!).
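
Reflection makes this visible. The sketch assumes the attribute name used in the released framework, System.Runtime.CompilerServices.DynamicAttribute; Holder and MetadataDemo are made-up names.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public class Holder
{
    public dynamic Value; // compiled as a field of type object plus an attribute
}

public static class MetadataDemo
{
    public static bool FieldIsObjectWithDynamicAttribute()
    {
        FieldInfo field = typeof(Holder).GetField("Value");
        return field.FieldType == typeof(object)
            && field.IsDefined(typeof(DynamicAttribute), false);
    }

    public static void Main()
    {
        Console.WriteLine(FieldIsObjectWithDynamicAttribute()); // prints "True"
    }
}
```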

What happens at compile time?

For each dynamic operation, the compiler generates calls into the DLR, and takes advantage of its call sites. For more information about the DLR and what call sites are, my colleague Jim Hugunin gave an excellent talk at PDC about it - you can view his blog here, and watch his talk here.

The DLR call site takes a set of standard actions which indicate what type of dynamic action we want to take. The C# compiler emits a subclass of these standard actions, annotated with some C# specific details, and emits invocations of the call sites in place of the call that the user makes. For instance, this code sample gets translated into something like the following pseudocode:

// This code...
static void Main(string[] args)
{
    dynamic d = SomeInitializingStatement;
    d.Foo(1, 2, d);
}

// transforms into this code.
static void Main(string[] args)
{
    dynamic d = SomeInitializingStatement;
    _csharpCallAction = new CSharpCallAction("Foo");
    _dlrSite<T> = new Site<T>(_csharpCallAction); // Create the site. 
    _dlrSite.Target(1, 2, d); // Invoke the delegate. 
}

If you want more information on this, my colleague Chris Burrows has written an excellent blog on the dynamic type and what the compiler generates.

Note that the site creation pseudocode specifies a generic argument, T. This argument is a delegate type that represents the signature of the call. So in our example, our call takes 2 integer arguments and a dynamic argument, and has a dynamic receiver. T would then be a delegate that represents that.

Invoking that delegate invokes the C# runtime binder, which binds the expression based on the runtime types of the arguments and the receiver.
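
For a more concrete picture, here is a hand-written call site for d.Foo(1, 2, d). It is a sketch that uses the names from the released .NET Framework 4 API (CallSite<T> and Microsoft.CSharp.RuntimeBinder.Binder), which differ from the placeholder names in the pseudocode; the code the compiler actually generates will differ in detail. Target, HandWrittenCallSite and CallFoo are hypothetical, and Foo returns a string here so the result can be inspected.

```csharp
using System;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

public class Target
{
    public string Foo(int a, int b, object c)
    {
        return "Foo(" + a + ", " + b + ", " + c.GetType().Name + ")";
    }
}

public static class HandWrittenCallSite
{
    // T: the site itself, the receiver, two ints, the dynamic argument, the result.
    private static CallSite<Func<CallSite, object, int, int, object, object>> _site;

    public static object CallFoo(object d)
    {
        if (_site == null)
        {
            // Statically typed arguments are flagged so the binder uses their
            // compile-time type; the others are bound by runtime type.
            var literal = CSharpArgumentInfoFlags.UseCompileTimeType
                        | CSharpArgumentInfoFlags.Constant;
            CallSiteBinder binder = Microsoft.CSharp.RuntimeBinder.Binder.InvokeMember(
                CSharpBinderFlags.None,
                "Foo",
                null,                          // no generic type arguments
                typeof(HandWrittenCallSite),   // context for accessibility checks
                new[]
                {
                    CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null), // receiver
                    CSharpArgumentInfo.Create(literal, null),                      // 1
                    CSharpArgumentInfo.Create(literal, null),                      // 2
                    CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)  // d
                });
            _site = CallSite<Func<CallSite, object, int, int, object, object>>.Create(binder);
        }
        // Invoking the delegate runs the cache checks and, on a miss, the binder.
        return _site.Target(_site, d, 1, 2, d);
    }

    public static void Main()
    {
        Console.WriteLine(CallFoo(new Target())); // prints "Foo(1, 2, Target)"
    }
}
```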

What happens at runtime?

When the DLR delegate gets invoked, it does a couple of cool things that I'll describe briefly. For more information, check out Jim Hugunin's blog.

  1. The DLR checks a cache to see if the given action has already been bound against the current set of arguments. So in our example, we would do a type match based on 1, 2, and the runtime type of d. If we have a cache hit, then we return the cached result.
  2. If we do not have a cache hit, then the DLR checks to see if the receiver is an IDynamicObject. These guys are essentially objects which know how to take care of their own binding, such as COM IDispatch objects, real dynamic objects such as Ruby or Python ones, or some .NET object that implements the IDynamicObject interface. If it is any of these, then the DLR calls off to the IDO and asks it to bind the action.

    Note that the result of invoking the IDO to bind is an expression tree that represents the result of the binding.
  3. If it is not an IDO, then the DLR calls into the language binder (in our case, the C# runtime binder) to bind the operation. The C# runtime binder will bind the action, and will return an expression tree representing the result of the bind.
  4. Once step 2 or 3 has happened, the resulting expression tree is merged into the caching mechanism so that any subsequent calls can run against the cache instead of being rebound.
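
Step 2 can be illustrated with an object that takes care of its own binding. One caveat on naming: in the released .NET Framework 4 the interface is called IDynamicMetaObjectProvider rather than IDynamicObject, and System.Dynamic.DynamicObject is a convenience base class that implements it. The sketch below uses that base class; PropertyBag and PropertyBagDemo are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Members are looked up in a dictionary, so the C# runtime binder never
// needs to reflect over this type for them.
public sealed class PropertyBag : DynamicObject
{
    private readonly Dictionary<string, object> _values =
        new Dictionary<string, object>();

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return _values.TryGetValue(binder.Name, out result);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        _values[binder.Name] = value;
        return true;
    }
}

public static class PropertyBagDemo
{
    public static string Run()
    {
        dynamic bag = new PropertyBag();
        bag.Title = "Process Name"; // handled by TrySetMember
        return bag.Title;           // handled by TryGetMember
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "Process Name"
    }
}
```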

Jim gives a great description of points 1, 2, and 4, which deal with the DLR specifics, so I'm going to elaborate on what happens in step 3.

The C# runtime binder

The C# runtime binder uses Reflection to populate its internal symbol table to determine what to bind to. Each of the C# specific actions encodes the type of the binding, along with extra information that allows us to determine how to bind the action.

For example, if the argument is known at compile time to have a static type, then that type will be marked in the C# action, and will be used as the type of the argument during runtime binding. If it is known at compile time to be typed dynamic (ie it is a variable of type dynamic, or is an expression that returns dynamic), then the runtime binder will use reflection to determine its runtime type and use that type as the type of the argument.

The runtime binder populates its symbol table as needed. For instance, in our example, we were calling the method Foo. The runtime binder will load all members named Foo on the type of the receiver into the symbol table.

It then populates the necessary conversions for each of the argument types. Since we may need to coerce the arguments to types that match the method calls (using user-defined conversions as necessary), the binder loads those conversions into the symbol table as well.

It then performs overload resolution exactly like the static compiler does. That means that we get the exact same semantics as the static compiler. It also means that we get the same error semantics and messages - a failed binding at runtime results in an exception being thrown, which encapsulates the error message that you would have gotten at compile time.

It then takes the result of overload resolution and generates an expression tree that represents the result, and returns that back to the DLR.

A summary

So that's a brief summary of what the dynamic pipeline looks like. Of course, I've glossed over a lot of the details, but I'll be covering those details in my future posts. Until next time, some questions to ponder:

What happens when the receiver is known statically but the arguments are dynamic? What happens if the methods we're trying to bind against are private? What about operators - how does resolution work on them?

These, and more, I'll aim to address in my subsequent posts.

As always, happy coding!


The cat's out of the bag! Hours ago at PDC '08, I got to watch Anders unveil the new C# 4.0 language features that we've been working on. This unveiling was accompanied by some fantastic demos of our work in action. Even though I've been working on this stuff for the past year, I was still completely blown away by his demos.

I've gotten the chance to work on quite a few of those features, so over the next while, I'll be sharing with you all some of the things that we're doing, and will be anxiously awaiting your feedback.

So first off, for those of you who haven't heard, here are the language features that we're working on, and a brief description of them.

  1. Dynamic binding. We've introduced a new type, dynamic, which behaves much like object, but allows the operations performed on your object to be bound at runtime instead of compile time.
  2. Named and Optional parameters. You can now specify default values for your parameters, allowing them to be optionally specified at the call site. We've also added the ability for your arguments to be passed by name, so that you can specify exactly which arguments you want to give, and refrain from specifying the rest (assuming they're optional).
  3. COM interop features. We've done quite a bit of work to improve COM interop. These include:
    • No ref for COM calls. For all COM calls that take ref arguments, you can specify an argument without a ref, and the compiler will generate a local for you and generate a ref to that local as the argument.
    • No PIA. We have introduced the ability to deploy your applications which use Primary Interop Assemblies (PIAs) without referencing the actual PIA at runtime. This allows compiling against them, but not needing to ship them with your application.
    • Implicit dynamic for COM types. We now give you the option of turning all objects returned from COM into dynamics so that you can perform late bound calls off of them instead of having to cast the result in order to make it useful.
  4. Variance. We've introduced covariance and contravariance into the language for interface types and delegate types. For those looking for an excellent in-depth description of variance, visit Eric Lippert's blog.
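
As a quick taste of items 2 and 4, here is a small sketch. Cs4FeaturesDemo, Greet and Run are names made up for illustration, and the COM features are left out because they need a COM library to run against.

```csharp
using System;
using System.Collections.Generic;

public static class Cs4FeaturesDemo
{
    // Optional parameters: greeting and punctuation have default values.
    public static string Greet(string name, string greeting = "Hello", string punctuation = "!")
    {
        return greeting + ", " + name + punctuation;
    }

    public static string Run()
    {
        string plain = Greet("Ada");                    // both defaults used
        string asked = Greet("Ada", punctuation: "?");  // named argument skips 'greeting'

        // Covariance: a sequence of strings can be used as a sequence of objects.
        IEnumerable<string> strings = new List<string> { "x", "y" };
        IEnumerable<object> objects = strings;

        // Contravariance: a handler for any object can be used as a handler for strings.
        int handled = 0;
        Action<object> countAny = o => handled++;
        Action<string> countString = countAny;
        foreach (object o in objects) { countString((string)o); }

        return plain + " " + asked + " " + handled;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "Hello, Ada! Hello, Ada? 2"
    }
}
```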

All of these features are available in our CTP release, which is available here.

Stay tuned for more on these topics, including samples and detailed descriptions of what happens behind the scenes. I'll also discuss some of the design considerations that we took, and would love to get your feedback on those as well.
