Insomnia and being a workaholic is an interesting combination. It is amazing how much work can be accomplished in the eight hours before everyone else comes to work. J

Anyways, I spent some time working on specifying generics in C++ yesterday so I figured I'd write about that today. Perhaps the most important message regarding generics is that they are not templates. That is evident in the C++ language design as it supports both generics and templates. At the PDC, I heard comments such as "generics are templates done right". This, sadly, is a misinformed opinion that too many people share.

Generics and templates have a minor overlap in functionality. They both make it possible to create parameterized types which make it possible to create type safe collections. That's where the similarity stops. First, let's look at some features templates allow and how they are interpreted. I'll assume that you know the syntax for templates.

  • Templates are instantiated at compile-time with the source code.
  • Templates are type safe.
  • Templates allow user-defined specialization.
  • Templates allow non-type parameters.
  • Templates use "lazy structural constraints".
  • Templates support mix-ins.

Templates are instantiated at compile-time. This is huge. Anyone who really wants to understand the limitations of either generics or templates needs to know this. This means that the same template instantiated in two different assemblies actually have different types. The CLR includes the strong-name of the assembly in the type, and thus [A]vector<int> is different from [B]vector<int> even if they were instantiated from the same template. A template is always emitted to an assembly after it has been specialized. So, really a template disappears after compilation. It is not possible to instantiate a template from another assembly. Rather, to instantiate a template, the programmer needs the source code for the template.

Templates are type safe. Templates were designed to replace what many people were using macros for. Templates are fully type checked by the compiler. In no way is there any textual substitution or macro-like behavior in templates. Templates are indeed verifiable as long as the implementation of the template does not use unverifiable features.

Templates allow user-defined specialization. First, let me explain the difference between specializations and instantiations. Consider the following code:

      Collection<int> a;
      Collection<int> b;
      Collection<double> c;
      Collection<X> d;

Here, there are four instantiations of a template but only three specializations. The first two variables share the same specialization. When defining a template, every template has a "primary template". This is the most general template that the compiler will use unless there is a better explicit specialization or partial specialization of the template. The usefulness of user-defined specializations cannot be understated – it allows the programmer to choose a different implementation for different template arguments. For instance, if templates were used in a math library, this allows the programmer to create different implementations for integer and floating-point calculations.

Without user-defined specializations, the compiler creates all specializations from the primary template. These specializations can still generate fairly different code from each other. For instance, one specialization may inline all function calls involving template parameters. Another specialization may not inline the same function calls.

Templates allow non-type parameters. Non-type parameter like integers or template template parameters allow templates to have significant expressive power. Constant values in specializations are known by the optimizer and therefore can be passed into the code via a number of data flow analyses such as constant propagation and copy propagation. The resulting code is far more efficient than one that must rely on accessing variables.

The combination of specialization and non-type parameters have actually enabled an entire programming paradigm in C++ known as template meta-programming. While it is entirely possible to go overboard with the capabilities templates afford, there are numerous useful techniques.

Templates use "lazy structural constraints". What happens when a template relies on a function and it's not there? When writing the template, the developer can call member functions on the template parameter, use operators, or call functions that use the template parameter. The definition of the template is kept by the compiler until later needed for a specialization. When the compiler creates a specialization, if a function call or an operator has no meaning for a particular parameter, then a compile-time error occurs. In short, the constraint for a template parameter is that it supports a particular operation (i.e. it has a function with a suitable overload or it has a valid operator). Template constraints are enforced at specialization rather than at definition.

I use the phrase "lazy structural constraint" with much liberty. The notion I am trying to get across is that the constraint is enforced lazily because compiler diagnostics appear at specialization. They are structural constraints because they require some kind of support from a type parameter that is not necessarily dependent on a subtype relationship.

Templates support mix-ins. A class template can inherit from a type parameter. This enables a number of programming patterns, such as mix-ins and policy based programming. Generics do not support directly inheriting from a type parameter.

Now, with that brief summary of templates out of the way, let's turn to a brief summary of generics.

  • Generics are instantiated at run-time by the CLR.
  • Generics are also type safe.
  • Generics are cross-language.
  • Generics do not allow user-defined specialization.
  • Generics do not allow non-type parameters.
  • Generics use subtype constraints.

Generics are instantiated at run-time by the CLR. Unlike templates, a generic defined in source code is emitted into MSIL as a generic. The compiler does not specialize generics. A generic type thus has one assembly to which it belongs. A programmer who wants to create an instantiation of a generic must identify which assembly the generic comes from (either via importing metadata or using the current assembly). When the runtime specializes a generic, it creates one specialization for all ref classes. Each value type will have its own specialization.

Generics are also type safe. Generics were designed to be verifiable (meaning that the MSIL could be proved type safe). Like templates, generics are only unverifiable if they use unverifiable features.

Generics are cross-language. By far the biggest advantage to templates is that producers of cross-language frameworks need to use generics instead of templates. While generics are not Common Language Specification (CLS) compliant now, it is expected that they will be at some point in the future.

Generics do not allow user-defined specialization. As generics were designed as a runtime service, the designers of generics felt that specialization was tied too much to the semantics of a particular language. Thus, when writing a generic, it is only possible to write it once. This is like only being allowed to write a primary template.

Generics do not allow non-type parameters. The primary design goal for generics was creating parameterized collections. Most collections do not require non-type parameters, and thus the designers of generics did not include this feature.

Generics use subtype constraints. This is the big one. It is the mechanism by which generics are implemented on the CLR. First, look at the following code:

      generic<typename T>
      ref class R {
        void f(T t) {
          t->g();
        }
      };

How do we know that the call to g in the function f is valid? With templates, we'd check at specialization. With generics, it's up to the runtime to determine that this is valid. In order to verify that a generic class is valid, the runtime needs more information. Thus, we must fix the above definition as follows:

      generic<typename T>
      where T : IG
      ref class R {
        void f(T t) {
          t->g();
        }
      };

With generics, overload resolution is done at the point of definition. Thus, the call to g is done by looking for the name g in the constraints for T. As there is only one constraint, g must be a member of IG. The g function is called through the interface rather than directly on the variable. Consider the following example:

      interface class IMethod {
        void f();
      };
 
      ref struct R : IMethod {
        virtual void g() = IMethod::f {
          System::Console::WriteLine("R::g");
        }
 
        void f() {
          System::Console::WriteLine("R::f");
        }
      };
 
      generic<typename X>
      where X : IMethod
      void G(X x) {
        x->f();
      }
 
      template<typename X>
      void T(X x) {
        x->f();
      }
 
      int main() {
        R^ r = gcnew R;

        G(r);
        T(r);
      }

With generics, the call to f is done through the interface IMethod. With templates, the call to f is done directly on the class R. Thus, the output of this program is:

      R::g
      R::f

Explicit overrides (used for explicit interface implementation in other languages) are not the only way this difference could be surfaced. Function overloading will change too. Overloads within a generic only consider the constraints as possible arguments, whereas with templates the compiler waits until specialization so it only needs to do overload resolution with the real type.

An unfortunate consequence of subtype constraints with interfaces is that not all useful functions in a class can be contractually guaranteed through an interface. An interface only demands virtual functions. So non-virtual functions, static functions, static conversion functions, static operators, and constructors cannot be used on a generic type parameter. As the CLS requires operators and conversions to be static member functions, generics cannot make use of operator overloading on generic type parameters. This means that a type used in a sorted collection needs to implement a specific interface rather than simply provide the less-than operator. That is a significant drawback if you're used to templates, and this is why the Whidbey version of the frameworks is updating all the built-in types to implement IComparable<T>.

Now with all of that background, here is a comparison of templates and generics:

  Generics Templates
Constraint mechanism Subtype constraints Lazy structural constraints
Allows explicit specialization No Yes
Allows partial specialization No Yes
Type identity of specialization Globally unique Unique to each assembly
Cross language facility Yes No
Allowed parameters Ref class, value class only All types and non-type
Name lookup and binding At definition, to constraints At specialization, to type

Certainly, there will be evangelists for either option. The best option for C++ is to support both mechanisms. Having both templates and generics satisfies anyone who believes one is better than the other. Of course, any pragmatic programmer will realize that having both gives the programmer the ability to use the right feature to solve the problem at hand. Both generics and templates have shortcomings, but using both features together actually yields an even more expressive language.

A very compelling scenario is using templates to create highly efficient data structures, but exposing that type at the assembly boundary through a generic interface. This is similar to a factory pattern that uses private types that inherit from public interfaces. Using this pattern, a specialized C++ collection class can take advantage of frameworks APIs that use the interface and allows other languages to make use of the type through the interface.

This technique is exactly how the STL.NET collection library will be implemented. The collections will be C++ templates that employ the STL design of iterators and separate algorithms. The collections will implement generic interfaces such as IList<T>. I think that the potential for combining templates and generics is great, and we're just starting to scratch the surface. Much of uses for combining templates and generics were driven by Anson Tsao, Martyn Lovell, and Eric Niebler while they were investigating STL.NET.

Hopefully, this was a lucid (after all I am not sleeping) explanation of the fundamental differences between generics and templates. As both features are significant, I very well could write a dozen more pages. As I did with handles, I will later spend time writing about the design rationale behind both generics and templates.