Why are anonymous types generic?

Suppose you use an anonymous type in C#:

var x = new { A = "hello", B = 123.456 };

Ever taken a look at what code is generated for that thing? If you crack open the assembly with ILDASM or some other tool, you'll see this mess in the top-level type definitions:

.class '<>f__AnonymousType0`2'<'<A>j__TPar','<B>j__TPar'>

What the heck? Let's clean that up a bit. We've mangled the names so that you are guaranteed that you cannot possibly accidentally use this thing "as is" from C#. Turning the mangled names back into regular names, and giving you the declaration and some of the body of the class in C#, that would look like:

[CompilerGenerated]
internal sealed class Anon0<TA, TB>
{
    private readonly TA a;
    private readonly TB b;
    public TA A { get { return this.a; } }
    public TB B { get { return this.b; } }   
    public Anon0(TA a, TB b)
    { this.a = a; this.b = b; }
    // plus implementations of Equals, GetHashCode and ToString
}

And then at the usage site, that is compiled as:

var x = new Anon0<string, double>("hello", 123.456);

Again, what the heck? Why isn't this generated as something perfectly straightforward, like:

[CompilerGenerated]
internal sealed class Anon0
{
    private readonly string a;
    private readonly double b;
    public string A { get { return this.a; } }
    public double B { get { return this.b; } }   
    public Anon0(string a, double b)
    { this.a = a; this.b = b; }
    // plus implementations of Equals, GetHashCode and ToString
}

Good question. Consider the following.

Suppose you have a library assembly, not written by you, that contains the following types:

public class B
{
    protected class P {}
}

Now, in your source code you have:

class D1 : B
{
    void M() { var x = new { P = new B.P() }; }
}

class D2 : B
{
    void M() { var x = new { P = new B.P() }; }
}

We need to generate an anonymous type, or types, somewhere. Suppose we decide that we want the two anonymous types - which have the same types and the same property names - to unify into one type. (We desire anonymous types that are structurally identical to unify within an assembly because that enables scenarios where multiple methods use generic type inference to infer the same anonymous type; you want to be able to pass instances of that anonymous type around between such methods. Perhaps I'll do an example of that in the new year.)
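
Here is a minimal sketch of that scenario, not a definitive treatment. The names UnifyDemo, ListOf, MakeInOneMethod and MakeInAnotherMethod are hypothetical, chosen only for illustration. Two methods build structurally identical anonymous objects, and a generic method infers the anonymous type as its type argument:

```csharp
using System;
using System.Collections.Generic;

static class UnifyDemo
{
    // Generic type inference lets a method use an anonymous type as T,
    // even though the type has no name you can write down.
    public static List<T> ListOf<T>(T first)
    {
        return new List<T> { first };
    }

    // Two separate methods, each with its own anonymous object creation.
    public static object MakeInOneMethod()
    {
        return new { A = "hello", B = 123.456 };
    }

    public static object MakeInAnotherMethod()
    {
        return new { A = "goodbye", B = 0.5 };
    }

    // Same property names and types in the same order, same assembly:
    // both objects are instances of one anonymous type.
    public static bool SameType()
    {
        return MakeInOneMethod().GetType() == MakeInAnotherMethod().GetType();
    }

    public static void Main()
    {
        // T is inferred to be the anonymous type of the argument...
        var list = ListOf(new { A = "hello", B = 123.456 });
        // ...and this Add compiles only because the second expression
        // has that very same type.
        list.Add(new { A = "goodbye", B = 0.5 });

        Console.WriteLine(list.Count);  // 2
        Console.WriteLine(SameType());  // True
    }
}
```

If the two anonymous types did not unify, the call to Add would not compile, because the list's element type would be a different type from the one being added.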

Where do we generate that type? How about inside D1:

class D1 : B
{
    [CompilerGenerated]
    ??? sealed class Anon0 { public P P { get { ... } } ... }
    void M() { var x = new { P = new B.P() }; }
}

What is the desired accessibility of Anon0? It cannot be private or protected, because then D2 cannot see it. It cannot be either public or internal, because then you'd have a public/internal type with a public property that exposes a protected type, which is illegal. (Nor can it be either "protected and internal" or "protected or internal" by similar logic.) It cannot have any accessibility! Therefore the anonymous type cannot go in D1. Obviously by identical logic it cannot go in D2. It cannot go in B; B lives in a library assembly that you did not write. The only remaining place it can go is in the global namespace. But at the top level an internal type cannot refer to P, a protected type. P is only accessible inside B or a class derived from B.

But we can put the anonymous type at the top level if it never actually refers to P. If we make generic class Anon0<TP> and construct it with P for TP, then P only ever appears inside D1 and D2, and yet the types unify as desired.
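
The following is a hand-written sketch of that arrangement, not the compiler's actual output. It assumes, for the sake of a single compilable file, that B sits in the same assembly rather than in a separate library; GenericAnonDemo and SameConstructedType are hypothetical names added for illustration. The top-level class Anon0<TP> never mentions P, and P is named only inside the derived classes:

```csharp
using System;

// Stands in for the library class; in the scenario above it would live
// in a different assembly.
public class B
{
    protected class P { }
}

// Hand-written stand-in for the compiler-generated type. It is top-level
// and internal, and its declaration never refers to P.
internal sealed class Anon0<TP>
{
    private readonly TP p;
    public TP P { get { return this.p; } }
    public Anon0(TP p) { this.p = p; }
}

class D1 : B
{
    // B.P is named only here, inside a class derived from B.
    public object M() { return new Anon0<B.P>(new B.P()); }
}

class D2 : B
{
    public object M() { return new Anon0<B.P>(new B.P()); }
}

static class GenericAnonDemo
{
    // Both methods construct the same closed generic type.
    public static bool SameConstructedType()
    {
        return new D1().M().GetType() == new D2().M().GetType();
    }

    static void Main()
    {
        Console.WriteLine(SameConstructedType()); // True
    }
}
```

The methods return object only so that the result can be inspected from outside D1 and D2; the constructed type Anon0<B.P> itself cannot be named anywhere that P is inaccessible.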

Rather than coming up with some weird heuristic that determined when anonymous types needed to be generic, and making them normally typed otherwise, we simply decided to embrace the general solution. Anonymous types are always generated as generic types even when doing so is not strictly necessary. We did extensive performance testing to ensure that this choice did not adversely impact realistic scenarios, and as it turned out, the CLR is really quite buff when it comes to construction of generic types with lots of type parameters.
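
You can observe this at run time without ILDASM. The sketch below uses reflection on the example from the top of the article; ReflectionDemo and AnonymousTypeOfExample are hypothetical names, and the comments describe what Microsoft's C# compiler is expected to produce, not something the language specification promises:

```csharp
using System;

static class ReflectionDemo
{
    // Returns the runtime type of the anonymous object from the article.
    public static Type AnonymousTypeOfExample()
    {
        var x = new { A = "hello", B = 123.456 };
        return x.GetType();
    }

    static void Main()
    {
        Type t = AnonymousTypeOfExample();

        // Generic even though nothing here requires it.
        Console.WriteLine(t.IsGenericType);  // True

        // The property types show up as type arguments:
        // System.String, then System.Double.
        foreach (Type arg in t.GetGenericArguments())
        {
            Console.WriteLine(arg);
        }

        // The mangled name; its exact spelling is a compiler detail.
        Console.WriteLine(t.Name);
    }
}
```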

And with that, I'm off for the rest of the year. Air travel is too expensive this year, so I'm going to miss my traditional family Boxing Day celebration, but I'm sure it'll be delightful to spend some time in Seattle for the holidays. I hope you all have a safe and festive holiday season, and we'll see you for more fabulous adventures in 2011.

  • > We've mangled the names so that you are guaranteed that you cannot possibly accidentally use this thing "as is" from C#.

    And here was I, thinking this was just part of an ongoing campaign to increase C# literacy by giving out lessons via error messages. (along with the Expression<Func<...>> notation and the usage of fully qualified class names)

    BTW, I think C# errors should have levels, but not like warnings, but like courses. I got some level 300 errors recently trying to put a facade around IQueryable, and it didn't even have anonymous types! Should I suggest that via connect? ;-)

  • Does this mean that on the (perhaps rare) occasion of having two anonymous types with the same property names but not the same type, you need to generate fewer types?

  • Why are anonymous types, eh... anonymous? Until now I have never found that particularly useful.

    The useful part lies in the creation on the fly, mostly within LINQ. But why not allow me to name them on that very spot? I wouldn't mind specifying the access modifiers too!

  • Ferdinand: Just use a regular class for that. You'll get the full flexibility of regular classes and be able to define more closely in which namespace it lies, its access modifiers, its constructors and so on. Either the on-the-fly syntax for "named" types would have to include every single detail that the real class/struct definitions do, and then what's the point, or you risk running into a wall. And besides, as soon as you pass these types to other methods or create them in many places, now you'd have to merge the extra metadata from the two places or risk creating incompatible types.

    You can create and set up ordinary classes in interesting ways with object initializers, which has almost the same syntax as the anonymous type creation syntax, just add the type name after "new".

  • Jesper, what you are saying is the consequence of anonymous classes being anonymous, but not the reason for their anonymity. I just can't think of a good reason why these classes must be anonymous. They seem pretty simple classes. Not like anonymous methods, that have to deal with closures. It seems like a waste of opportunity that you have a strong type, which has a name, but you can't use its name.

    (the article explains that they are in fact not so simple, but even then the resulting type could be given a name).

  • @Jesper: This echoes Jon Skeet's suggestion (from his talk "If I Ruled the World") of "nonymous types" that would be defined easily, like anonymous ones, but be global. Their hash code and equals would work like those of anonymous types: delegate to the members.

  • Hi,

    I've been reading here for a while, but I've only recently taken the plunge from VB to C#, so forgive the slightly naive question. I guess it means that in the following, x and y are different / incompatible types (Anon0<string, double> vs. Anon1<double, string>)?

    var x = new { A = "hello", B = 123.456 };

    var y = new { B = 123.456, A = "hello" };

    Cheers,

    Mike

  • For those who want to do more with anonymous types, there's always Tuple, where you specify the members' types instead of their names. The problem with Tuples is that there's no language support for them, making them rather cumbersome to use. One of the features I'm hoping for in the next version of C# is support for tuple packing and unpacking to make use of anonymous types (in general, not the kind in today's article) much nicer.

  • @Gabe.

    Yeah Tuples help a little. But they're very limited in two ways:

    Type inference is not applied to constructors (I'd *really* love them to fix that) and you cannot specify the names. Both of these hurt readability.

    Also, if tuples were baked into the language (à la F#) I would have preferred that they were structs (there being no serious escape analysis in the CLR at the moment), as accidental allocation on hot paths is a killer, but I appreciate I'm an edge case there.

  • @Shuggy: I agree about that - tuples should be as light as possible and thus should be structs. In my implementation of Tuples (used for a project in .net 3.5), they are structs, and have a factory Tuple.Create<T>() just for type inference.

  • ShuggyCoUk: Yes, the reason that tuples are so cumbersome is that you have to construct them with Tuple.Create to get type inferencing and that you have to use .Item1 and .Item2 to get their members. Adding a literal tuple syntax would fix the construction problem and adding unpacking would mean that you rarely ever have to type the member names, essentially solving both problems. Perhaps some merging of tuples with anonymous types would give you what you want, but it would take some thinking to get right.

    However, I'm not so sure I'd want them to be structs. For one project I was profiling I noticed that tuples were on a hot path so I decided to speed it up by making them structs (I copied the code out of Reflector and changed "class" to "struct"). I broke even on 2-tuples, but changing 3-tuples to structs made the code even slower. I didn't try 1-tuples because I don't use them and I figured there's no point in investigating anything bigger. Apparently for my application the copying overhead outweighed the GC overhead.

  • Have a happy holiday Mr. Lippert. :)

  • @Gabe: It all depends on how they're constructed and what they contain.

    If you've got code like this (using the hypothetical new syntax):

    int x, y, z = (1, 2, 3);

    Then allocating an object would be very bad.

    If you've got code like this however:

    Tuple<long, long, long> SomeMethod();

    Then a class would mean the return value is 32 (or 64)-bit while a struct would mean it is 192 bit. Pass the tuple around a bit and your code is suddenly slow because of all this copying around.

  • @Mike C: they are different and incompatible, but that is not an artifact of the specific implementation described here. It is simply the way the language itself is defined:

    "Within the same program, two anonymous object initializers that specify a sequence of properties of the same names and compile-time types _in the same order_ will produce instances of the same anonymous type. "

  • This description doesn't make much sense to me. The CLR doesn't have a notion of type accessibility (only visibility and AFAICT that is a fairly meaningless concept from the CLR point of view).

    If you compile your example, disassemble it with ILDASM, replace the type parameter with "class [lib]B/P" and reassemble it, you'll see that the resulting assembly verifies just fine.
