What's the difference, part one: Generics are not templates

Because I'm a geek, I enjoy learning about the sometimes-subtle differences between easily-confused things. For example:

  • I'm still not super-clear in my head on the differences between a hub, router and switch and how it relates to the gnomes that live inside of each.
  • Hunks of minerals found in nature are rocks; as soon as you put them in a garden or build a bridge out of them, suddenly they become stones.
  • When a pig hits 120 pounds, it's a hog.

I thought I might do an occasional series on easily confounded concepts in programming language design. 

Here’s a question I get fairly often:

public class C
{
  public static void DoIt<T>(T t)
  {
    ReallyDoIt(t);
  }
  private static void ReallyDoIt(string s)
  {
    System.Console.WriteLine("string");
  }
  private static void ReallyDoIt<T>(T t)
  {
    System.Console.WriteLine("everything else");
  }
}

What happens when you call C.DoIt<string>? Many people expect “string” to be printed, when in fact “everything else” is always printed, no matter what T is.

The C# specification says that when you have a choice between calling ReallyDoIt<string>(string) and ReallyDoIt(string) – that is, when the choice is between two methods that have identical signatures, but one gets that signature via generic substitution – then we pick the “natural” signature over the “substituted” signature. Why don’t we do that in this case?

Because that’s not the choice that is presented. If you had said

ReallyDoIt("hello world");

then we would pick the “natural” version. But you didn’t pass something known to the compiler to be a string. You passed something known to be a T, an unconstrained type parameter, and hence it could be anything. So, the overload resolution algorithm reasons, is there a method that can always take anything? Yes, there is.
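
The two situations can be put side by side in a small sketch. The class name GenericDemo and the helper DoItWithString are made up for illustration, and the methods return the text instead of printing it so that the result is easy to check; otherwise this mirrors the example above.

```csharp
using System;

public static class GenericDemo
{
    // The argument is only known to be a T here, so ReallyDoIt(string)
    // is not even applicable; the generic overload is chosen at compile time.
    public static string DoIt<T>(T t)
    {
        return ReallyDoIt(t);
    }

    // Hypothetical helper: the argument is statically known to be a string,
    // so both overloads apply and the non-generic ("natural") one wins.
    public static string DoItWithString(string s)
    {
        return ReallyDoIt(s);
    }

    private static string ReallyDoIt(string s)
    {
        return "string";
    }

    private static string ReallyDoIt<T>(T t)
    {
        return "everything else";
    }

    public static void Main()
    {
        Console.WriteLine(DoIt<string>("hello"));   // everything else
        Console.WriteLine(DoIt(42));                // everything else
        Console.WriteLine(DoItWithString("hello")); // string
    }
}
```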

This illustrates that generics in C# are not like templates in C++. You can think of templates as a fancy-pants search-and-replace mechanism. When you say DoIt<string> in a template, the compiler conceptually searches out all uses of “T”, replaces them with “string”, and then compiles the resulting source code. Overload resolution proceeds with the substituted type arguments known, and the generated code then reflects the results of that overload resolution.

That’s not how generic types work; generic types are, well, generic. We do the overload resolution once and bake in the result. We do not change it at runtime when someone, possibly in an entirely different assembly, uses string as a type argument to the method. The IL we’ve generated for the generic type already has the method it's going to call picked out. The jitter does not say “well, I happen to know that if we asked the C# compiler to execute right now with this additional information then it would have picked a different overload. Let me rewrite the generated code to ignore the code that the C# compiler originally generated...” The jitter knows nothing about the rules of C#.

Essentially, the case above is no different from this:

public class C
{
  public static void DoIt(object t)
  {
    ReallyDoIt(t);
  }
  private static void ReallyDoIt(string s)
  {
    System.Console.WriteLine("string");
  }
  private static void ReallyDoIt(object t)
  {
    System.Console.WriteLine("everything else");
  }
}

When the compiler generates the code for the call to ReallyDoIt, it picks the object version because that’s the best it can do. If someone calls this with a string, then it still goes to the object version.

Now, if you do want overload resolution to be re-executed at runtime based on the runtime types of the arguments, we can do that for you; that’s what the new “dynamic” feature does in C# 4.0. Just replace “object” with “dynamic” and when you make a call involving that object, we’ll run the overload resolution algorithm at runtime and dynamically spit code that calls the method that the compiler would have picked, had it known all the runtime types at compile time.
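
As a rough sketch of that, assuming C# 4.0 or later: the class name DynamicDemo is made up for illustration, only the parameter of DoIt is changed to dynamic, and the methods return the text rather than printing it so the outcome is easy to check.

```csharp
using System;

public static class DynamicDemo
{
    // Because t is dynamic, the call below is bound at runtime,
    // using the runtime type of the argument.
    public static string DoIt(dynamic t)
    {
        return ReallyDoIt(t);
    }

    private static string ReallyDoIt(string s)
    {
        return "string";
    }

    private static string ReallyDoIt(object t)
    {
        return "everything else";
    }

    public static void Main()
    {
        Console.WriteLine(DoIt("hello")); // string
        Console.WriteLine(DoIt(42));      // everything else
    }
}
```

The price is that the binding work moves to runtime, so a call that cannot be resolved fails when it executes rather than when it is compiled.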

  • 'Why not: "C++ has other confusing areas. For instance, C++ uses the C# ~destructor syntax, but the behavior is (as we know) very different. This is very confusing for C# programmers."'

    Uh, because C++ was written first, and C#'s syntax was clearly based on C++, not vice-versa. C++ could not have been written differently in order to avoid confusion with C#, as C# syntax had not been invented yet. It has something to do with the linearity of time, cause and effect, and other related concepts.

  • Karellen, the comment you quoted had nothing to do with decisions made at the time the language was authored. It had everything to do with the experience of a person who has worked in one of the languages and sees the other for the first time.

    Given recent (past 5 year) trends, I am willing to bet that there are significantly more C# programmers who have never analyzed a single line of C++ code than there are C++ programmers who have never analyzed a single line of C# code.

  • And I think I know the answer Eric would give to Karellen:  Contrary to (still, unfortunately) popular belief, the C# language designers were not attempting to replicate the functionality of C++ or even use it as a model.  They were creating a new language, period, and any resemblance to any other language is purely coincidental.

    C# generics have the same syntax as Java generics, which work similarly.  So what might be confusing to one class of users is perfectly natural to others.  The bottom line is that the world at large does not necessarily see everything the same way you do - most of us have learned to deal with that.

  • Ben Voigt [C++ MVP] said: "With templates, I get efficient array iteration with no bounds checking.  With generics, I get virtual calls to MoveNext and Current."  

    I wouldn't assume that. If the JIT can figure out statically what the type is, it may in theory use static calls instead of virtual, and it may even do some inlining. And if it doesn't do it today, it may do in a future version. So the architecture of generics doesn't rule out optimisation. On the contrary, by capturing a high-level description of our intentions, it potentially has *greater* scope for optimisation, by doing it later, when the maximum amount of information is available.

    (I don't use Java much in anger, but I believe Sun has been very aggressive with this kind of thing in the 'server' flavour of their VM.)

    Re: the syntax debate, regardless of the (very different) details of how generics/templates work in Java, C# and C++, there is one thing they all have in common. They all need a way to specify a list of type parameters or arguments: <T1, T2> or <int, double> - so naturally they all use the same syntax for that basic idea. It would have been perverse, given C++ is the most widely used generic system and is their closest syntactical ancestor language, for them to use different syntaxes for a particular small piece of the puzzle that happens to be common to them all, even though the rest of it differs. I wouldn't expect C# to invent a different symbol for addition because its operator overloading works differently. Also note that neither C# nor Java uses the keyword 'template' to introduce a declaration, so they don't pretend to be precisely the same as templates.

  • @James Miles - That's interesting (to me, anyway). I've boiled it down to:

    static void Main()
    {
       Do(() => { throw new Exception(); });
    }

    static void Do(Action a)
    {
       Console.WriteLine("Expected");
    }

    static void Do(Func<string> d)
    {
       Console.WriteLine("Unexpected (but actual)");
    }

    In other words, a lambda has a throw making the end unreachable, and that makes it match Func<ANYTHING> better than Action, which does seem strange.

    Even more strangely, you can get your expected behaviour by sticking a 'return;' at the end of the lambda:

    Do(() => { throw new Exception(); return; });

    Though I bet Eric will tell us that there's some other situation where we'd be amazed if it *didn't* behave like this! :)

  • This also helps me clarify overload resolution with respect to named parameters in C#4.0

    In particular, I've been scratching my head over this code bit :

    http://msmvps.com/blogs/jon_skeet/archive/2009/07/03/evil-code-of-the-day.aspx

    If you realize that overload resolution happens at compile time, along with the specific overload resolution rule for this case as specified in the C# spec, it's crystal clear. Thanks for the post!

  • "Even more strangely, you can get your expected behaviour by sticking a 'return;'"

    Daniel Earwicker - Yes it is strange isn't it ;)

  • I think you've missed the core difference between a template and a generic -- at least for C++ people.  The parameter of a template can be anything -- a number, a type, a global variable, etc.  And it can apply to a free function or a class.  This allows us to do computations at compile time.  See boost and loki for examples of the power this allows us -- inline LL parsers, simple lambda functions, and precompiled regular expressions for just a few examples.  These things are not possible using generics -- they require other language features.  They aren't necessarily better in all cases, but they are certainly different at a more fundamental level than pointed out in the article.

  • Joel makes a good point -- C++ templates are themselves a pure-functional Turing-complete language. A C++ compiler can compute anything, but you cannot tell whether the C++ compiler will be able to compile a program without running it (here's a proof: http://www.netvor.sk/~umage/wtf/C++%20Templates%20are%20Turing%20Complete.pdf).

    So while some might enjoy the flexibility of a language-within-a-language that templates offer, others dislike the unmaintainability and undecipherable error messages that accompany it. I think C# generics strike a happy medium between C++ templates and Java generics.

  • > I'm still not super-clear in my head on the differences between a hub, router and switch and how it relates to the gnomes that live inside of each.

    You have a right to be confused. Despite the (completely valid) definitions given above, they keep changing what different bits of equipment do.

    So, theoretically, Hubs are layer 1 (hardware) level devices, Switches are layer 2, and Routers are layer 3. Which doesn't really account for Layer 3 Switches, which are like Routers but faster. Once upon a time, there were just Hubs and Routers. Then someone came up with Switches, which were designed to make networks more efficient by only sending data where it was needed. Then 'Switch' became a marketing term and at that point you can kiss any technical validity goodbye.

    Regarding the gnomes: Hub gnomes have been lobotomised, Router gnomes are a little slow, but have huge memories. Switch gnomes are hyperactive and schizophrenic.

  • @Joel Redman - I think Eric does explain that essential difference, in this quote:

    "When you say DoIt<string> in a template, the compiler conceptually searches out all uses of “T”, replaces them with “string”, and then compiles the resulting source code."

    The great value of CLR generics is that they allow us to define a generic type or method and expose it to other languages, without those languages needing to reparse the code of the language in which the generic entity was defined. So C++/CLI, C#, VB.NET, F# and countless other languages can share the same generic libraries.

    And although I think C++ compile time metaprogramming is great, like many things in C++ it is mostly great relative to the other limitations of C++. std::tr1::shared_ptr is great if you don't have GC! And similarly, C++ doesn't have a standard API for type reflection or code generation.

    The CLR has an extremely rich and powerful model for doing reflection and code generation at runtime, and the "wizards" in the C# world frequently come up with new ways to perform "magic" using this, and (perhaps surprisingly) still maintaining static type safety and high performance. It allows them to effectively extend the language, and blur the boundary between compilation and interpretation, much as templates do for C++.

    So despite C# lacking the full power of templates, it is no less of a playground for advanced library authors who enjoy generating and picking through incomprehensible error messages.

  • @ Daniel

    He does say that, but completely misses the major implications of it.  The template is fundamentally more powerful than the C# generic, but at the same time, it cannot be used cross-language.  It isn't better or worse.

    Furthermore, shared_ptr is more suitable for C++ applications than mark and sweep garbage collection since it is completely synchronous.  Every scope-bound object in the system has a predictable lifetime, and when it goes out of scope or gets deleted, clean up is done immediately.  While this may not be quite as efficient overall as the C# model, for many applications it is preferable to be predictable.  Real time apps and OS's come to mind.  Shared_ptr is a compromise in this direction, and certainly has its place.  Again, for some apps, better, for others worse.

    That's why we like .NET.  It allows us to use the paradigms we need to get the job done.

  • "C# generics have the same syntax as Java generics, which work similarly." - uh? Java generics and C# generics work completely differently!

    Java generics are purely a compile time illusion: there is no such thing, at runtime, as an ArrayList<String> - just an ArrayList. The compiler keeps track of the compile time types and gives you some helpful error messages if you bypass type-safety, but you can bypass the error messages - eg by casting your ArrayList<String> to an ArrayList and then to an ArrayList<Object>, and put in something other than strings - and they will be *accepted* at runtime because the runtime isn't even aware that ArrayList was generic in the first place!

    Also, Java allows types to be partially unbound, so ArrayList<?> is also a legal type which you can have instances of and perform operations on.

    C# does not have any concept of List<?> at compile time except as an argument to typeof(). Not only that, but it doesn't have a concept of List<?> at runtime, either. Any given instance of List<T> has a *particular* T that it applies to. Attempting to cast that list to a List of a different T would fail. Attempting to put any kind of object other than String into a List<String> would be impossible (because to do so you'd have to write code that would fail with a class cast exception before it even got to the Add() call).

    Java and C# generics are just as different as C# generics are from C++ templates. They use the same syntax because they have more in common than different. And that's as it should be. Sure, it's confusing when you encounter one of the comparatively obscure things that they do differently. But that's outweighed by the fact that you can get a general sense of the code based on what they are doing the same.

    A halfway decent programmer learns a new language, initially, by learning how to map its syntax to concepts he or she is familiar with from other languages, ANYWAY. Even if C# had decided to say that generics should be expressed as List~T~, C++ programmers coming to C# would still mentally map that to List<T> to start with to be able to get the basic idea of what it's for. And only AFTER that get a deep enough understanding to grasp the subtle differences.

  • "They were creating a new language, period, and any resemblance to any other language is purely coincidental."

    Come on. We all know other C-style languages influenced C#. They could've chosen the VB syntax for generics, but they didn't. It's not purely coincidental; if it were, C# wouldn't look the way it does.

  • @author:

    Templates like in C++ are a bit more than fancy-pants search&replace you know :)

    I'm just writing to ask you:

    What has stopped the C#/CLR designers from actually allowing generic methods to be specialized? Given method<T>(T arg), method(string arg) and method(int arg), the compiler knows about ALL possible overloads. The easiest way would be to emit a prologue for method<T>(T) that checks "if arg is string" and "if arg is int" and then calls the appropriate 'overload' at runtime, even if someone calls method((object)string). As it stands, programmers must do that by hand every time they want any specialization-like features :/

    IMHO it is not the C# compiler but rather the CLR that should perform such a lookup and choose the tightest match at runtime, but with all the other languages on the platform, I can understand the option taken there. But why not in C#? You rarely have more than 1-6 specializations, so the performance hit would not be that big.

    @ComeOn

    Agreed with that. Its very name and symbol claim that. An early marketing joke was that C# is actually C++++ (two rows of ++). The evolution of the language shows the heavy impact of changes made to the independently evolving C++ and Java. Heh, I even remember that when C# was born, it was claimed that it was being created in such a way that existing C++ and Java programmers would not notice much difference and could start coding in C# right away with little fuss. Maybe it's still written somewhere on Microsoft's pages?
