
Peterhal - Last day at Microsoft

I've been promising myself an extended vacation for a long time now, but working at Microsoft has been so much fun I've kept postponing it. Well, this is it - my last day at Microsoft.

 

Feel free to email me at 'hotweird' at hotmail.

Posted by peterhal | 0 Comments

What Makes a Good Programmer?

I just read two Salon articles about Scott Rosenberg's new book Dreaming in Code. His thesis is that "programming is hard", and he uses the development of the Chandler project to frame the discussion. Scott identifies a common problem in large software development - inconsistent terminology use across the software development team. At the same time he compliments the members of the development team as being talented programmers. These two ideas struck me as incongruous. How can a bunch of "ace programmers" end up with such a basic problem? Should programmers encountering this problem really be deserving of the accolade "ace"? And what skills would I expect from an "ace" programmer anyhow?

Before I go any further, I want to make it clear that I don't want to talk down Scott or the Chandler team. Scott makes a number of great points that reflect my own experiences. Virtually every large software project I've seen has had problems akin to those Scott describes. Especially every large project I've worked on. I'm certainly as guilty of these difficulties as any programmer. But the examples that Scott describes are good grist for discussion about software development.

When people talk about talented programmers they are usually referring to their ability to write correct code, which solves problems quickly and elegantly. I suspect that that is what Scott was implying by describing the Chandler team as ace programmers. Raw coding skills vary widely across individuals. When it comes down to it some programmers are just more talented than others at writing code. Large team projects - projects with more than 3 programmers and schedules of more than a few months - require a whole set of skills beyond raw coding. On large teams, excellent coding skills are a prerequisite. Alone, they qualify you as a beginner, not an ace.

From Beginner to Good

For me, the mark of a good team programmer is someone who does not suggest rewriting everything from scratch. Scott makes the very valid point that programmers like to program - and that they don't like to understand code written by others. Truer words were never spoken. Time and time again I've seen smart novice programmers suggest rewriting code produced by other programmers. I've noticed an inverse relationship between the esteem I hold for a programmer and the number of times I've heard them suggest a rewrite of a large software project.

Good team programmers are capable of and willing to work with a codebase that is too large for a single person to rewrite within the project's schedule. They recognize that the existing codebase has shortcomings and they have the maturity and discipline to know that attempting to 'fix' every one of those shortcomings will almost certainly kill the project. Instead they identify those areas which must be improved on to deliver the project, and they have the skills to make significant changes to a large and ugly (because every large piece of software is ugly - but that's another story) codebase without causing the whole thing to destabilize.

And From Good To Great

If the mark of a good team programmer is the ability to work (usually grudgingly) with the code of others, then the mark of a great team programmer is the ability to produce code that other programmers on the team will gladly use. Programmers are the most fickle sort, and if you can produce code that other, less skilled, coders will use without suggesting a rewrite - then you've elevated yourself to the top of the programming heap. Great programmers produce code that is so good that it will prevent the cross-programmer problems that plague most large software projects.

 

Posted by peterhal | 20 Comments

Code Review: Double Checked Locking Code

A friend of mine recently sent me some code to review:

 

Hi Peter,

Do you have any suggestions on how to clean up this code?

In particular the creation of a particular singleton type doesn't look very pretty in this model:

 

private DoubleCheckedLock<SingletonClass> myInstance = new DoubleCheckedLock<SingletonClass>();

public SingletonClass Current {

    get {

        return this.myInstance.GetSingleton(

            delegate() {

                return this.CreateSingleton(OrderContext, "third");

            }

        );

    }

}

 

I’m sharing my comments on the code, as it demonstrates some important design principles. Here’s the original code:

 

public class DoubleCheckedLock<T> where T : class

{

    private volatile T myInstance;

    private static readonly object lockObject = new object();

 

    public T GetSingleton(CreateObject c) {

        if (this.myInstance == null) {

            lock (lockObject) {

                if (this.myInstance == null) {

                    this.myInstance = c();

                }

            }

        }

        Debug.Assert(this.myInstance != null);

        return this.myInstance;

    }

 

    public delegate T CreateObject();

 

}

 

public abstract class DoubleCheckedLock2<T> where T : class

{

    protected abstract T CreateObject();

    private DoubleCheckedLock<T> myInstance = new DoubleCheckedLock<T>();

 

    public T Instance {

        get {

            return this.myInstance.GetSingleton(this.CreateObject);

        }

    }

}

 

And here are my comments:

 

  • The name DoubleCheckedLock describes the implementation. Names should reflect how the client will use a symbol, not how something is implemented. For clients, the class implements a lazily created, thread safe singleton object. Aka, a Singleton object that does the right thing. Naming the class 'Singleton' will make client code read much better.

 

  • I'm not a fan of public nested types, so I'd recommend moving the CreateObject delegate to a top level type. That's a style choice, but I've found it to wear well in practice.

 

  • Names of types, including delegate types, should be nouns, not verbs or verb phrases. CreateObject should be renamed Creator.

 

  • Passing in the creator delegate is necessary for this pattern so don't sweat it. However, passing in the delegate to the 'GetSingleton' method allows for some surprising behavior. It allows different calls to 'GetSingleton' to supply different values which will surely cause confusion. This confusion can be designed out by passing in the creator delegate as an argument to the Singleton constructor.

 

  • Once the GetSingleton parameter is removed, it can become a property named 'Value'. Again, this makes your client code read better.

 

  • Add the 'sealed' and 'readonly' modifiers unless you are specifically expecting to use a class as a base or modify a field after construction. The additional modifiers self document the code, and make you think through your design more fully.

 

  • The DoubleCheckedLock2 abstract base class is a classic example of inheriting for containment - representing a 'has a' relationship between the derived class and the DoubleCheckedLock2 base class. Inheritance should always represent an 'is a' relationship, never a 'has a' relationship. You will save yourself many headaches by avoiding base classes which force you to use your one precious base class to represent a 'has a' relationship. Get rid of the DoubleCheckedLock2 class. The fact that it has such a crummy name, and that there is no good name to give this thing is a clear indication that something is wrong with it.
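The surprising behavior mentioned in the fourth comment can be sketched as follows. The class is the original DoubleCheckedLock from the mail, trimmed down to what the demo needs; the SurpriseDemo class and its TwoCalls helper are hypothetical names made up for this illustration.

```csharp
using System;

// The original class from the mail, trimmed to what the demo needs.
public class DoubleCheckedLock<T> where T : class
{
    public delegate T CreateObject();

    private volatile T myInstance;
    private static readonly object lockObject = new object();

    public T GetSingleton(CreateObject c)
    {
        if (this.myInstance == null)
        {
            lock (lockObject)
            {
                if (this.myInstance == null)
                {
                    this.myInstance = c();
                }
            }
        }
        return this.myInstance;
    }
}

public static class SurpriseDemo
{
    // Hypothetical helper: two call sites hand in two different creators.
    public static string[] TwoCalls()
    {
        DoubleCheckedLock<string> holder = new DoubleCheckedLock<string>();
        string first = holder.GetSingleton(delegate() { return "first"; });
        // This creator is silently ignored because the instance already exists.
        string second = holder.GetSingleton(delegate() { return "second"; });
        return new string[] { first, second };
    }

    public static void Main()
    {
        string[] results = TwoCalls();
        Console.WriteLine(results[0]); // first
        Console.WriteLine(results[1]); // first
    }
}
```

The second caller never gets the object it asked for, and nothing tells it so. Taking the creator once, in the constructor, removes the possibility.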

 

Now the client code looks like this:

 

sealed class Client {

    private readonly Singleton<ExpensiveResource> expensiveValue;

    Client() {

        this.expensiveValue =

                new Singleton<ExpensiveResource>(

                    delegate()

                    {

                        return new ExpensiveResource();

                    }

                );

    }

 

    public ExpensiveResource ExpensiveValue {

        get { return this.expensiveValue.Value; }

    }

}

 

Which is pretty damn sexy, if I do say so myself.

 

Here's the implementation:

 

public delegate T Creator<T>();

 

public sealed class Singleton<T> where T : class {

    private static readonly object lockObject = new object();

    private volatile T myInstance;

    private Creator<T> creator;    // not readonly: set to null after first use

 

    public Singleton(Creator<T> creator) {

        Debug.Assert(creator != null);

        this.creator = creator;

    }

 

    public T Value {

        get {

            if (this.myInstance == null) {

                lock (lockObject) {

                    if (this.myInstance == null) {

                        this.myInstance = this.creator();

                        this.creator = null;    // allow creator object to be GC-ed as soon as possible

                    }

                }

            }

            Debug.Assert(this.myInstance != null);

            return this.myInstance;

        }

    }

}

 

Happy coding,

Peter

Posted by peterhal | 4 Comments

New F# Release.

Don Syme and the F# team have just released a new version of F#. Check it out here: http://blogs.msdn.com/dsyme/archive/2006/11/30/f-1-1-13-now-available.aspx

 

Peter

Posted by peterhal | 0 Comments

What Do Programmers Really Do Anyway? The data is in!

The smart folks over in the MSR-Human Interactions in Programming team have done some interesting research into the question of "What Do Programmers Really Do Anyway?". I'm more than a little pleased to see that the data is consistent with the assertions I made in my previous blog posts.

Some of their results are here:

http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&id=994

http://research.microsoft.com/hip/papers/Ko2007BugFixing.pdf

Their team's web site is:

http://research.microsoft.com/hip/

I'm looking forward to more great work from these folks,

Peter

Posted by peterhal | 0 Comments

C# Automatically Implemented Properties - My Video Debut

Hey Folks,

Charlie Calvert videotaped our discussion on some of the new C# 3.0 features. Find it here: http://wm.microsoft.com/ms/msdn/visualcsharp/peter_hallam_2006_11/PeterHallam01.wmv.

Cheers,

Peter

Posted by peterhal | 2 Comments

What Do Programmers Really Do Anyway? (aka Part 2 of the Yardstick saga)

Way back in 2002 when we started working on Whidbey, I captured my thoughts on the direction we should take for C# and Visual Studio in two large emails. In the first email The Yardstick I spent a lot of time saying that you must evaluate features against the amount of time they save versus the amount of time they cost. A big saving in an infrequent task is not worth even a small cost in a frequent task, and more importantly, even a small saving in a frequent task will have a much more significant effect than a huge saving in an infrequent task.
 
This follow-up email captures my thoughts on the question "Where should we as tool developers focus to provide the most benefit for the programmers who use our tools?" To answer this question we must first answer the question:
What do professional programmers really do with their time anyway?
Here is the result of my observation way back in August of 2002, and after reviewing it, I find that it still rings true today. I'd be delighted to hear how your experiences as programmers stack up...
 
 
 
So after taking out the time spent playing Halo and consuming mass quantities of caffeine, here's what I do all day:
Design Code
Write New Code
Understand Existing Code
Modify Existing Code
Verify Existing Code Still Works
Let me expand on each of these tasks, to explain precisely what I mean.
 
Design Code
 
Design involves analyzing a new problem, and mapping out the broad flow of code which will be used to solve the problem. To date the #1 tool for this activity is a big whiteboard (note the 8' x 4' monster on my North wall) and 1 to 3 engaged brains focused on the problem. There are some tools which attempt to formalize the design process using visual designers to view and edit the relationships in your code. My whiteboard is pretty effective at this task however, and I have yet to really want to even investigate a formal tool for an informal process of brainstorming and design.
 
Write New Code
 
For me, writing new code means typing in code in a virgin source file and getting it to a compilable state. I'll also include writing a new method from scratch in this category. I do not include adding code to an existing method, or copying code from an existing method and modifying the copy to do something new. I'll leave those activities in the modifying existing code task. Intellisense is THE power tool for writing new code. It has easily increased the speed at which a programmer can write new code by a factor of 2x to 5x. Visual designers (like the Winforms and Data designers) and code wizards are useful for getting some common coding patterns started.
 
Understanding Existing Code
 
Understanding existing code means taking a look at some code to understand precisely what is going on. Answering questions like: Exactly what are the inputs to this method? What is the output of this piece of code? Exactly what do the callstack and data look like when this particular code path gets hit? Why is this code doing what it should? Why does this code not do what I want it to? Note that this includes understanding code that I have source for (usually written by me, or someone in my organization) but it also includes understanding the details of libraries (like the BCL and WinForms) that I don't have source for. Lastly this also includes understanding the overall structure of an existing library or application. Answering questions like: What are the main data structures and how do they interact?
The primary tool used in this activity is the editor - think 'lots of time staring at source code'. Code outlining (collapse to definitions) is a great feature in this area. A lot of time is spent tracing up and down call stacks, trying to understand the flow of the bits. In theory class view and symbolic navigation (goto definition and goto reference) would be very useful, but I still find myself using find in files as my primary navigation tool for source code. The Object Browser, ildasm and Help are the primary tools for understanding libraries of managed code. Looking at the code spit out by Visual Designers is a great way to understand how the underlying code libraries work. Perhaps the most useful tool is the debugger. The 'find the bug' part of debugging fits entirely in the understanding code activity. Symbolic debuggers aid greatly in understanding code flow, but perhaps more importantly the debugger shows the shape of the real data structures that your code is operating on. Still, most of understanding code happens without the debugger, and relies on the editor.
 
Modifying Existing Code
 
Modifying code is related to writing new code, but it is different enough to call out separately. Modifying existing code is either bug fixing or adding new features. When fixing a (simple) bug in a method there is some existing behavior in the code which must be preserved. The coder must ensure that the fix doesn't break any of the existing non-buggy behavior. Adding new features to code is different from writing new code in that it never involves just adding code. New feature work almost always starts with redesigning the existing code base so that it is sympathetic to the new feature area. Only once the code has been refactored can the adding of new code begin. Once the new code has been written it must still be hooked into the existing code, which requires some modification of existing code. As an example, I am currently working on adding generics to the C# compiler. This requires a fundamental change in the data structures used by the compiler which I've been working on for most of the last week, and will continue well into the next. When I'm done with this change I will have changed about 5% of the lines of code in the compiler (some 4,500 lines of code) while adding exactly no new code. Only once the refactoring is complete can the real work of adding the individual parts of generics into the compiler begin. Not all features require this dramatic a refactoring, but it is not atypical either. Currently the tools used to modify existing code include intellisense and find in files. Visual designers can also help here, provided that the code was authored in the designer and the designer is not confused by all that grungy real world code which you've added to your project.
 
Verifying Existing Code
 
Verifying existing code means writing and running test suites to ensure that recent bug fixes haven't caused regressions. It also means stepping through recently changed code in the debugger to verify the new behavior. Industry thought leaders have recently made a big fuss about regression testing, and have expounded testing frameworks like JUnit. It has been my experience that regression testing always pays off, but that is another mail entirely.
 
Priorities
 
Of the above tasks, VS is the primary tool for three: writing new code, understanding existing code, and modifying existing code. The real question is "What percentage of time do real developers spend in each of these three activities?". The answer may surprise you, so think hard before continuing.
 
My answers are:
New Code:                         2%
Modifying Existing Code:     20%
Understanding Code:         78%
Now, I fit solidly in the Einstein user profile. I code all day every day. I'm a C++ power user. I read x86 assembly natively and can decipher the raw bytes of machine code in a pinch. I've been working on the same code base for 3.5 years, and will likely continue to work on that code base for another 3.5 years. A better question is, what do these numbers look like for Elvis. Now, many, many moons ago I used to be Elvis. I had a spanking new 386 on my desk. My primary tool for writing code was a pencil. Browsing code involved a printer and a highlighter. Yes, yes the year was 1988, and I'm writing a Futures and Options accounting system for Citibank in DBASE III+ as an intern... Oh my god I'm wearing the most god-awful blue pinstripe suit.
 
So what are Elvis's numbers like:
New Code:                         5%
Modifying Existing Code:     25%
Understanding Code:         70%
No, I am not making this up.
 
Let me explain where these numbers come from. First: Why is 5 times more time spent modifying code than writing new code? The answer is that new code becomes old code almost instantly. Write some new code. Go for coffee. All of a sudden you've got old code. Brand spanking new code reflects at most only the initial design; however, most design doesn't happen up front. Most development projects use the iterative development methodology. Design, code, test, repeat. Repeat a lot. Only the coding in the first iteration qualifies as all new code. After the first iteration coding quickly shifts to be more and more modifying rather than new coding. Also, almost all code changes made while bug fixing fall into the modifying code category. Look at VS: our stabilization (aka bug fixing) milestones are as long as our new feature milestones. Modifying code consumes much more of a professional developer's time than writing new code.
 
Secondly, why does understanding code take 3 times more of a developer's time than modifying code? The answer here is that before modifying code, you must first understand what it does. This is true of any refactoring of existing code - you must understand the behavior of the code so that you can guarantee that the refactoring didn't change anything unintended. When debugging, much more time is spent understanding the problem than actually fixing it, and once you've fixed it, you need to understand the new code to ensure that the fix was valid. And lastly, even when coding new code, you never start from scratch. You will be calling existing code to do most of your work: either user written code or a library supplied by Microsoft or a third party for which no source is available. Before calling this existing code you must understand it in precise detail. When writing my first XML enabled app, I spent much more time figuring out the details of the XML class libraries than I did actually writing code. When adding new features you must understand the existing features so that you can reuse where appropriate. Understanding code is by far the activity at which professional developers spend most of their time.
 
Visual Studio is not Focused on Real Coders
 
I recently asked a couple of PM friends of mine (who will remain nameless to protect the innocent) where they figured developers spent their time. Their estimates ranked writing and modifying code above understanding code. Looking at the feature list for Whidbey I think that this is consistent with many folks across the division. It sounds reasonable that writing code is the primary activity for folks who write code for a living but it is in fact very far from the truth. I think there are several reasons why our current focus is on writing code rather than understanding code.
 
Many of our new features are designed by folks who write small demo apps. Let's face it, PMs aren't professional developers. They need to come up with code snippets and examples which are trimmed down for presentations and conferences. This results in features which demo well, but will in fact provide little or even negative benefit to the real coding task of understanding code. Staring at code for hours just doesn't demo well.
Our usability studies are run over a timeline of 3 - 4 hours. This includes problem definition, problem solution and post mortem. The last 2 usability studies I saw were basically "write this self contained dumbed down piece of code from scratch". This does not even remotely resemble real world professional coding. The last time I had a coding project like that I was in college. Early in college. A much more representative task would be to send a coder at an existing piece of code that they'd never seen, that was undocumented, badly written, badly architected and had several bugs. Then tell them to add a new feature while maintaining the existing behavior as much as possible. It may be difficult to get anything useful out of a short study, but it would be much more representative of real professional development. Even in the usability studies that I have seen, the users have spent a ton of time in help trying to understand our class libraries (aka understanding existing code).
Many of our ideas for new features are a response to questions from users on our newsgroups. Again, this skews the data towards the new user, writing their first program. Most of our questions on the newsgroups come from new users who have just installed the product. Well, guess what? No matter how good the product is they will always ask 'dumb' questions while they try and get their head around this monster new product they've just installed. Once they've gotten a little familiar with the product their usage of the product will change drastically. The things which they spend time on in the first few weeks will be dramatically different from the things that they spend their time on after they have learned the basics of the product. We really don't get any good feedback from newsgroups on the real day to day usability of our product.
 
In The Yardstick, I outlined a system for evaluating features. You count up the time saved and compare that against the time cost to give a net time saved. Let's run a couple of examples here, using Elvis's numbers from above:
 
Example 1:
 
Time Saved: 10% of new code time
Time Cost: 1% of understanding code time
 
This looks like a pretty attractive feature. Saving 10%, costing 1% looks good. But now factor in the fact that much more time is spent understanding code than writing new code and try again:
 
Real Time Saved: 10% of new code time = 0.5% of Total Dev Time
Real Time Cost: 1% of understanding code time = 0.7% of Total Dev Time
 
Net Time Saved: -0.2% of Total Dev Time
 
Now we realize that even this small sacrifice of understanding code results in a net loss in productivity for our customers.
 
Example 2:
 
Time Saved: 10% of understanding code time
Time Cost: 10% of writing new code time
 
At first glance this looks like a wash, no saving, but factor in the relative time again:
 
Real Time Saved: 10% of understanding code time = 7% of Total Dev Time
Real Time Cost: 10% of writing new code time = 0.5% of Total Dev Time
 
Net Time Saved: 6.5% of Total Dev Time
 
Again, because much more time is spent in the understanding code task this feature is in fact a significant usability win.
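The arithmetic in the two examples can be sketched in a few lines of C#. The Yardstick class, the NetSaved method and the two constants are hypothetical names made up for this illustration; the shares are the Elvis numbers given earlier (5% new code, 70% understanding code).

```csharp
using System;

public static class Yardstick
{
    // Share of total developer time per activity (the Elvis numbers).
    public const double NewCodeShare = 0.05;
    public const double UnderstandShare = 0.70;

    // Net fraction of total dev time saved by a feature that saves
    // savedFraction of one activity and costs costFraction of another.
    public static double NetSaved(
        double savedFraction, double savedShare,
        double costFraction, double costShare)
    {
        return savedFraction * savedShare - costFraction * costShare;
    }

    public static void Main()
    {
        // Example 1: saves 10% of new code time, costs 1% of understanding time.
        double example1 = NetSaved(0.10, NewCodeShare, 0.01, UnderstandShare);
        // Example 2: saves 10% of understanding time, costs 10% of new code time.
        double example2 = NetSaved(0.10, UnderstandShare, 0.10, NewCodeShare);

        Console.WriteLine("Example 1 net saved: {0}", example1);
        Console.WriteLine("Example 2 net saved: {0}", example2);
    }
}
```

Example 1 comes out at roughly -0.002 (a 0.2% loss) and Example 2 at roughly 0.065 (a 6.5% gain), matching the figures above up to floating point rounding.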
 
So now the question comes: where should we spend our resources to improve VS in ways which will most benefit our customers? Clearly we should focus our efforts on making it easier for users to understand existing code, but the conclusions are in fact much more dramatic than that. If we could reduce the amount of time spent understanding code by a mere 10%, that would save the user more time than a 100% reduction in the amount of time spent writing new code. Think about what that really means for a moment. Suppose we spent a ton of work making intellisense, designers and wizards so good that writing new code took no time at all. Zero time. The ESP coding interface. That would still have less developer impact than a 10% reduction in the amount of time developers spend understanding the code base they are working in. The other conclusion to draw from the above is that any new feature which impedes the developer's primary focus of understanding code will almost certainly be a net usability loss. In fact I'll go one step further and say that cutting existing features which impede the user's ability to understand existing code will result in more productive programmers.
 
 
Peter
Posted by peterhal | 22 Comments

C# Stumper: Why does this code not compile?

Hey folks,

First off, I want to apologize for not having any activity on my blog for a while. I just got back from a wonderful 3 week vacation in Spain. Now that I'm back, rested and limber, here's a twisted piece of C# code which is guaranteed to turn your brain inside out.

Why does the compiler (correctly) give an error message on the override in the following code?

abstract class A<T>
{
    public abstract T getT();

    class B : A<B>
    {
        public override B getT()
        {
            throw new System.Exception("The method or operation is not implemented.");
        }
    }
}

I got this code from a coworker, and I must confess that the first time I saw it I was convinced that the behaviour was a compiler bug. It turns out that the compiler correctly diagnoses the problem, though even after seeing the error message it took me a while to figure out what's going on. I'll post a discussion in a couple of days.

Any Takers?

Peter
C# Guy

Posted by peterhal | 16 Comments

Many C# Questions: Switching on non-constant values.

I finally decided to play with the style settings on my blog. As you may have guessed, I'm a bit of a newbie when it comes to websites and blogging. Let me know what you think of the new look. Last week's posting generated some great comments. Tzagotta asks:

 

Why are constant expressions required in case labels? I see a switch statement conceptually the same as a series of chained ifs/else ifs.

 

This is an interesting question. It seems reasonable that you should be able to switch on an expression of any type, and have case labels which can take any value – even values which are not compile time constants. A simple example would look something like this:

 

class Program {

    static void Main() {

        object myFirstObject = ...;

        object mySecondObject = ...;

 

        object obj = ...;

        switch (obj) {

        case myFirstObject:  ...; break;

        case mySecondObject: ...; break;

        default:             ...; break;

        }

    }

}

 

which would be equivalent to this code:

 

class Program {

    static void Main() {

        object myFirstObject = ...;

        object mySecondObject = ...;

 

        object obj = ...;

        if  (obj == myFirstObject) {

           ...

        } else if (obj == mySecondObject) {

           ...

        } else {

           ...

        }

    }

}

 

At first glance this seems like a great idea – the switch is slightly more compact and readable than the chained ifs, both desirable characteristics of a programming language. If the construct could easily be limited to this simple example it might be a reasonable language extension to consider. Unfortunately, things are not as simple as they appear at first glance…

 

Firstly, what happens when myFirstObject is equal to mySecondObject? Well that depends on which case label came first in your switch. The order of the case labels becomes significant in determining which block of code gets executed. Because the case label expressions are not constant the compiler cannot verify that the values of the case labels are distinct, so this is a possibility which must be catered to. This runs counter to most programmers’ intuition about the switch statement in a couple of ways. Most programmers would be surprised to learn that changing the order of their case blocks changed the meaning of their program. To turn it around, it would be surprising if the expression being switched on was equal to an expression in a case label, but control didn’t go to that label.

 

Secondly, allowing non-constant expressions as case labels lets in some significantly less desirable code as well. For example, this code would also be legal:

 

        switch (obj) {

        case myFirstObject:  ...; break;

        case mySecondObject: ...; break;

        case EraseMyHardDrive(): ...; break;

        default:             ...; break;

        }

In C#, any non-constant expression can yield side effects. There is currently no notion of a non-constant expression which does not have side effects in C#. Seeing code like this will leave the reader scratching their head wondering when their hard drive will get erased.

 

And lastly, programmers coming from a C/C++ background will expect that the execution of a switch statement will be fast and that it will take about the same time to reach any particular case label. When the case labels are constant integral values (or strings), the compiler can do some great optimizations to make the execution speed of the construct match the coder’s expectation. With non-constant case labels, getting to the 100th case label would take significantly more time than getting to the 10th case label. Again, this would be surprising to the programmer.
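The first point, that label order would become significant, can be sketched with the chained ifs that such a switch would be equivalent to. The CaseOrderDemo class and its two methods are hypothetical names made up for this illustration.

```csharp
using System;

public static class CaseOrderDemo
{
    // Chained ifs standing in for the hypothetical non-constant switch,
    // with the 'first' label tested before the 'second' label.
    public static string FirstThenSecond(object obj, object first, object second)
    {
        if (obj == first) return "first";
        else if (obj == second) return "second";
        else return "default";
    }

    // The same logic with the two 'case labels' swapped.
    public static string SecondThenFirst(object obj, object first, object second)
    {
        if (obj == second) return "second";
        else if (obj == first) return "first";
        else return "default";
    }

    public static void Main()
    {
        object shared = new object();
        // Both labels refer to the same object, so both comparisons succeed
        // and whichever test comes first wins.
        Console.WriteLine(FirstThenSecond(shared, shared, shared)); // first
        Console.WriteLine(SecondThenFirst(shared, shared, shared)); // second
    }
}
```

With constant labels the compiler rejects duplicates at compile time, so this situation cannot arise in a real switch.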

 

Peter

C# Guy

Posted by peterhal | 1 Comments

Many Questions: Switch On Enum

Just a quick one this week: 

 

Why is it that you cannot use enum constants in a switch statement's cases without first casting them to type int?

 

Often you will want to use enum constants as case labels in switch statements. Sometimes, the compiler will complain and require a cast to int on each case label. This will look something like this:

 

    enum Color { Red, Green, Blue };

 

    int i = ...;

    switch (i)

    {

    // cast required! Aarg!

    case (int)Color.Red:    break;

    case (int)Color.Green:  break;

    case (int)Color.Blue:   break;

    }

 

The confusion stems from a subtle difference between C++ and C#. In C++, values of enum type are implicitly convertible to int. In C#, conversions between an enum type and its underlying type are explicit and require a cast. To maintain consistency, the requirement for the cast carries over into the use of enums in switch statements.

 

However, you can use expressions of enum type as case labels without a cast, but only in a type safe way. The rule is that the type of the expression being switched on, the ‘governing type’ of the switch statement in C# language spec terminology, must match the type of the expressions in the case labels.

 

For example:

 

    enum Color { Red, Green, Blue };

 

    Color c = ...;

    switch (c) // Governing type is ‘Color’ not ‘int’ so ...

    {

    // ... no cast required

    case Color.Red:    break;

    case Color.Green:  break;

    case Color.Blue:   break;

    }
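Putting the two forms side by side, the following is a small self-contained sketch. The class and method names (`EnumSwitchDemo`, `ToInt`, `Describe`, `DescribeInt`) are made up for this illustration.

```csharp
using System;

enum Color { Red, Green, Blue }

static class EnumSwitchDemo
{
    public static int ToInt(Color c)
    {
        // int i = c;    // would not compile: no implicit conversion from Color to int
        return (int)c;   // the explicit cast is required
    }

    // Governing type is Color, so the case labels need no cast.
    public static string Describe(Color c)
    {
        switch (c)
        {
            case Color.Red:   return "red";
            case Color.Green: return "green";
            case Color.Blue:  return "blue";
            default:          return "unknown";
        }
    }

    // Governing type is int, so each enum label must be cast to int.
    public static string DescribeInt(int i)
    {
        switch (i)
        {
            case (int)Color.Red:   return "red";
            case (int)Color.Green: return "green";
            case (int)Color.Blue:  return "blue";
            default:               return "unknown";
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe(Color.Green));
        Console.WriteLine(DescribeInt(ToInt(Color.Blue)));
    }
}
```

In both methods the case labels are still compile-time constants. The only difference is whether their type matches the governing type of the switch.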

 

 

One of the design goals for enums in C# was to treat them as first class types that were truly distinct from their underlying type. This is one of the subtle ways that this decision manifests itself in the language.

 

Peter

C# Guy

Posted by peterhal | 14 Comments

Many Questions: Generics Variance

One of the main benefits of the addition of generics to C# is the ability to easily create strongly typed collections using types in the System.Collections.Generic namespace. For example, you can create a variable of type List<int>, and the compiler will check all accesses to the variable – ensuring that only ints are added to the collection. This is a big usability improvement over the untyped collections available in version 1 of C#.

 

Unfortunately, strongly typed collections have drawbacks of their own. For example, suppose you have a strongly typed List<object> and you want to append all the elements from a List<int> to your List<object>. You would like to be able to write code like this:

 

            List<int> ints = new List<int>();

            ints.Add(1);

            ints.Add(10);

            ints.Add(42);

            List<object> objects = new List<object>();

 

            // doesn’t compile: ‘ints’ is not an IEnumerable<object>

            objects.AddRange(ints);

 

In this case, you would like to treat a List<int>, which is also an IEnumerable<int>, as an IEnumerable<object>. This seems like a reasonable thing to do – as int is convertible to object. It is very similar to being able to treat a string[] as an object[], as you can do today. If you find yourself in this situation, the feature you are looking for is called generics variance – treating an instantiation of a generic type (in this case IEnumerable<int>) as a different instantiation of that same type (in this case IEnumerable<object>).
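For comparison, the array conversion mentioned above can be sketched as follows. The class and method names are made up for this illustration. Note that array covariance applies only to reference types, so the analogy with int is loose: a string[] can be treated as an object[], but an int[] cannot.

```csharp
using System;

static class ArrayCovarianceDemo
{
    // Accepts any array of reference-type elements thanks to array covariance.
    public static int CountElements(object[] items)
    {
        return items.Length;
    }

    // Writing through the object[] view is checked at run time: storing a
    // boxed int into what is really a string[] throws.
    public static bool StoreFails(object[] items)
    {
        try
        {
            items[0] = 42;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    static void Main()
    {
        string[] names = { "Ada", "Grace" };
        object[] objects = names;   // allowed: string[] is convertible to object[]

        Console.WriteLine(CountElements(objects));
        Console.WriteLine(StoreFails(objects));
    }
}
```

The run-time check on writes is the price arrays pay for this convenience, and it hints at why variance for generic types is not a trivial feature to add.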

 

C# does not currently support variance for generic types, so when you encounter cases like this you will need to work around the problem in your code. There are a couple of techniques you can use. For the simplest cases, such as a single method like AddRange in the example above, you can declare a simple helper method to do the conversion for you. For example, you could write this method:

 

    // Simple workaround for single method

    // Variance in one direction only

    public static void Add<S, D>(List<S> source, List<D> destination)

        where S : D

    {

        foreach (S sourceElement in source)

            destination.Add(sourceElement);

    }

 

    ...

    // does compile

    Add<int, object>(ints, objects);

 

This example shows some characteristics of a simple variance workaround. The helper method takes two type parameters, one for the source and one for the destination, and the source type parameter S is constrained to the destination type parameter D. This means that the List<> being read from must contain elements which are convertible to the element type of the List<> being inserted into. In the call above, this lets the compiler check that int is convertible to object. Constraining a type parameter to derive from another type parameter is called a ‘naked type parameter constraint’.
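As a rough end-to-end sketch, here is the helper wrapped in a complete program. The enclosing class `VarianceHelperDemo` and the `CopyInts` method are hypothetical names added for this illustration. The `Add` method is the one shown above.

```csharp
using System;
using System.Collections.Generic;

static class VarianceHelperDemo
{
    // The helper from above: S must be convertible to D.
    public static void Add<S, D>(List<S> source, List<D> destination)
        where S : D
    {
        foreach (S sourceElement in source)
            destination.Add(sourceElement);
    }

    // Builds a List<int> and appends its elements to a List<object>.
    public static List<object> CopyInts()
    {
        List<int> ints = new List<int>();
        ints.Add(1);
        ints.Add(10);
        ints.Add(42);

        List<object> objects = new List<object>();
        Add<int, object>(ints, objects);      // int satisfies the constraint 'S : D'

        // Add<object, int>(objects, ints);   // would not compile: object is not
        //                                    // convertible to int, so the constraint fails
        return objects;
    }

    static void Main()
    {
        foreach (object o in CopyInts())
            Console.WriteLine(o);
    }
}
```

The commented-out call shows the "one direction only" property: the constraint admits copying from the more specific element type to the more general one, and rejects the reverse at compile time.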

 

Defining a single method to work around variance problems is not too bad. Unfortunately, variance issues can become quite complex quite quickly. The next level of complexity is when you want to treat an interface of one instantiation as an interface of another instantiation. For example, you have an IEnumerable<int>, and you want to pass it to a method which only takes an IEnumerable<object>. Again, this makes some sense because you can think of an IEnumerable<object> as a sequence of objects, and an IEnumerable<int> is a sequence of ints. Since ints are objects, a sequence of ints should be treatable as a sequence of objects. For example:

 

        static void PrintObjects(IEnumerable<object> objects)

        {

            foreach (object o in objects)