Implementing the virtual method pattern in C#, Part Three

Implementing the virtual method pattern in C#, Part Three

Rate This
  • Comments 18

(This is part three of a three-part series; part one is here; part two is here.)

Last time we saw how you could emulate virtual methods in a language that only had static methods by creating fields of delegate type, and then choosing what delegates go into the fields. However, this is not very space-efficient. Suppose there were a hundred virtual methods on Animal instead of two. That means that every class derived from Animal has a hundred fields, and in most of them, those fields are exactly the same, all the time. You make three hundred giraffes and each one of them will have exactly the same delegates in those hundred fields. This seems redundant and wasteful.

What the CLR actually does is it bundles those up into a virtual function dispatch table, or "vtable" for short. A vtable is a collection of delegates; the purpose of the vtable is to answer the question "if a virtual method was invoked on an object of this runtime type, which method should be called?"

sealed class VTable
{
  public readonly Func<Animal, string> Complain;
  public readonly Func<Animal, string> MakeNoise;
  public VTable(Func<Animal, string> complain, Func<Animal, string> makeNoise)
  {
    this.Complain = complain;
    this.MakeNoise = makeNoise;
  }
}

Suppose we want to construct vtables for Giraffe, Cat and Dog to answer the questions "if these methods were invoked on an object of compile-time type Animal but runtime type Giraffe/Cat/Dog, which methods should actually be called?" That's easy enough:

static VTable GiraffeVTable = new VTable(Giraffe.Complain, Animal.MakeNoise);
static VTable CatVTable = new VTable(Cat.Complain, Cat.MakeNoise);
static VTable DogVTable = new VTable(Dog.Complain, Animal.MakeNoise);

And now we can just have every class have a reference to the appropriate vtable. We rewrite the classes again as follows:

abstract class Animal
{
  public VTable VTable;
  public static string MakeNoise(Animal _this)
  {
    return "";
  }
  // No factory; Animal is abstract.
}
class Giraffe : Animal
{
  private static VTable GiraffeVTable = new VTable(Giraffe.Complain, Animal.MakeNoise);
  public bool SoreThroat { get; set; }
  public static string Complain(Animal _this)
  {
    return (_this as Giraffe).SoreThroat ? "What a pain in the neck!" : "No complaints today.";
  }
  public static Giraffe Construct()
  {
    // There are no more instance constructors; this just allocates the memory.
    Giraffe giraffe = new Giraffe();
    // Ensure that virtual method fields are initialized before other code is run.
    giraffe.VTable = Giraffe.VTable;
    // Now do the rest of the initialization that the constructor would have done.
  }
}
class Cat : Animal
{
  private static VTable CatVTable = new VTable(Cat.Complain, Cat.MakeNoise);
  public bool Hungry { get; set; }
  public static string Complain(Animal _this)
  {
    return (_this as Cat).Hungry ? "GIVE ME THAT TUNA!" : "I HATE YOU ALL!";
  }
  public static string MakeNoise(Animal _this)
  {
    return "MEOW MEOW MEOW MEOW MEOW MEOW";
  }
  public static Cat Construct()
  {
    Cat cat = new Cat();
    cat.VTable = Cat.VTable;
    // Now do the rest of the initialization that the constructor would have done.
  }
}
class Dog : Animal
{
  private static VTable DogVTable = new VTable(Dog.Complain, Animal.MakeNoise);
  public bool Small { get; set; }
  public static string Complain(Animal _this)
  {
    return "Our regressive state tax code is... SQUIRREL!";
  }
  public static string MakeNoise(Dog _this)  // Remember, we forgot to say "override"
  {
    return _this.Small ? "yip" : "WOOF";
  }
  public static Dog Construct()
  {
    Dog dog = new Dog();
    dog.VTable = Dog.VTable;
    // Now do the rest of the initialization that the constructor would have done.
  }
}

And now we have to rewrite the call sites again:

string s;
Animal animal = Giraffe.Create();
s = animal.VTable.Complain(animal);
s = animal.VTable.MakeNoise(animal);
animal = Cat.Create();
s = animal.VTable.Complain(animal);
s = animal.VTable.MakeNoise(animal);
Dog dog = Dog.Create();
animal = dog;
s = animal.VTable.Complain();
s = animal.VTable.MakeNoise();
s = Dog.MakeNoise(dog);

That is basically how the virtual method pattern is implemented behind the scenes. The C# compiler analyzes every method that is declared as abstract, virtual or override and figures out which "vtable slot" that method corresponds to; it then emits metadata that has enough information to allow the runtime to lay out a vtable. For every call site, the C# compiler works out whether to call an instance method directly or do an indirection through the vtable, and emits the code accordingly. When the object is allocated at runtime, the CLR consults the metadata to figure out which vtable to plug into the hidden vtable field of each object. The vtable is of course not actually a class, and its fields are not actually delegates; the CLR has a less heavyweight way to represent a reference to a method internally. But the concept is the same.

Obviously I've omitted a great many details here. For example, it is not legal to invoke a virtual or instance method with a null receiver, but it is legal for a static method to be called with a null first argument. By rights, when rewriting the call sites to go from instance or virtual calls to static calls, I should have been putting in null checks as well. I've also not at all described how "base" calls work (though perhaps you can figure it out from this sketch). I've also not said how this works for boxed value types that override virtual methods of object, though again, perhaps you can figure it out. I've ignored issues of security and accessibility. I've omitted describing what happens when a class creates a "new virtual" method. And I've said nothing about how interfaces work; the CLR uses special techniques to solve interface overriding problems that I might discuss in a later blog.

The real takeaway here is that by choosing to embed the "virtual and instance calls" pattern in the language, you get to take advantage of its power without having to go through all the pain of declaring those fields and setting up those vtables yourself. The compiler and CLR teams get to experience that pain for you; you're welcome.

You also get safety! The pattern as implemented here depends on everyone following the rules of the pattern carefully. For instance, what stops you from calling giraffe.VTable.Complain(dog)?  Nothing, that's what. The C# compiler ensures that it is impossible to actually call a virtual method with a "this" that does not match the type expected by the function, but in our pattern-based implementation of virtual methods that error is not trapped until the conversion fails at runtime, if it is caught at all.

(This is part three of a three-part series; part one is here; part two is here.)

  • interestingly if you take your previous part's code and instead transform it so that you only have one type, but allow changing (a copy of) the vtable after the empty constructor you have something broadly like prototype based inheritance.

  • Are there any differences in this mapping compared the mapping that allows you to emulate C++ virtual methods in C using structs and function pointers?

    C++ requires you to think about how to deal with vtables in a world with multiple inheritance. Aside from that, there's no particularly interesting difference between this sketch and how you'd do something similar in C. -- Eric

    Well, except using delegates instead of function pointers.

    Right; C# does not allow you to manipulate references to methods directly; it requires that they be wrapped up in a delegate. However if we so chose we could add "function pointer types" to C# and generate "calli" instructions to invoke them; it seems unlikely that this feature will be implemented any time soon though. -- Eric

  • > sealed class VTable

    > {

    >   public readonly Func<Animal, string> Complain;

    >   public readonly Func<Animal, string> MakeNoise;

    >   public AnimalVTable(Func<Animal, string> complain, Func<Animal, string> makeNoise)

    >   {

    >     this.Complain = complain;

    >     this.MakeNoise = makeNoise;

    >   }

    > }

    I think your classes name should be AnimalVTable.

  • > For example, it is not legal to invoke a virtual or instance method with a null receiver, but it is legal for a static method to be called with a null first argument

    It is legal in the CLR to call (but not callvirt) a non-virtual method with a null 'this'; the method runs until it tries to dereference the this pointer, at which point a NullReferenceException is thrown.

  • How does the VTable deal with other functions (other than Complain and MakeNoise, for example).  Suppose I had a Tiger class, to which I added a Bite() function, which wasn't in Animal or any of the other classes, what would the VTable look like?  Or is that a discussion for another day?

  • Please do discuss calling interface methods in another post.

  • > It is legal in the CLR to call (but not callvirt) a non-virtual method with a null 'this'; the method runs until it tries to dereference the this pointer, at which point a NullReferenceException is thrown.

    C# emits callvirt in the vast majority of cases actually. (Including non-virtual calls) I believe the only exceptions are static and calls to base class methods.

    This ensures a null-reference exception happens at the "appropriate" point. (I assume)

  • Here's another vote for the interfaces post.  I've always wordered how you could make multiple interfaces map onto the same methods in a generic fashion.

  • -> "the CLR uses special techniques to solve interface overriding problems that I might discuss in a later blog"

    Please do so, as they are more interesting then simple virual methods,  I can see how to make them work, but not in a fast way...

  • I'm not sure what Eric means by "interface overriding problems" - the most common one is name clashes, which is simple at the runtime level since an interface method is (or should be?) called by a vtable slot [1] - but he could also mean multipily inherrited interfaces, aka. the dreaded diamond, which causes the issue of how to implement interface references and calling through an interface, at least one of interface downcasting (safe upcasting is always going to be expensive), interface reference equality, or method invocation seems to need to be more complicated, and I'm curious what they did.

    I feel it's likely that they would want interface references to be binary equal to class references to make GC tracing more efficient, but I can't see how method calls work then, I don't think it's likely they do an interface vtable lookup at the call site. If they did it the C++ way of offsetting the reference to point to an extra vtable, then you have to deal with a lot of the problems of multiple-inheritance again. I'd love to see that post (series!)

    [1] At least at execution time, type safety complicates this at load time if the types cross assemblies.

  • John: If Tiger.Bite isn't a virtual function, it doesn't go into the vtable -- its calls are all just converted to regular static function calls. If Bite is added as a virtual function by Tiger, it goes into another slot in Tiger's vtable.

    In this example, though, it's not really possible because VTable is a sealed class. If this were a more realistic example, AnimalVTable would derive from ObjectVTable, and TigerVTable would derive from AnimalVTable:

    public class TigerVTable : AnimalVTable

    {

     public readonly Func<Tiger, string> Bite;

     public TigerVTable(Func<Animal, string> complain, Func<Animal, string> makeNoise, Func<Tiger, string> Bite)

       : base(complain, makeNoise)

     {

       this.Bite = bite;

     }

    }

  • Yet another vote for the internals of interfaces, please!

    Besides C++, other similar examples:

    "Linux maintains tables of registered device drivers as part of its interfaces with them. These tables include pointers to routines and information that support the interface with that class of devices."

    -- tldp.org/.../drivers.html 8.4 Interfacing Device Drivers with the Kernel

    i.e. a type of device is represented by a table of function pointers (a vtable). The significance of each slot in that table is determined by the category of device (character, block), which is like an abstract base class.

    JavaScript: the first paragraph of this post describes very well this situation:

    var make = function()

     return {

       foo: function() { ... },

       bar: function() { ... }

     };

    }

    So each call to make() allocates an object with its own copy of all the method slots. Wasteful! How about:

    var p = {

     foo: function() { ... },

     bar: function() { ... }

    };

    var make = function() {

     return Object.create(p);

    };

    Now we allocate one object (the "prototype") with all the method definitions, and subsequent calls to make() will return an object that inherits the methods of the prototype.

  • @Gabe: Right, I was assuming that a VTable class specific to Tiger, containing a Bite method would be generated, though I knew that was impossible because VTable is sealed.  Perhaps there's a reason for that and Eric will explain it in a future post in the series.

  • I guess that for interfaces each implementing class fills out an appropriate vtable. Right?

  • "C++ requires you to think about how to deal with vtables in a world with multiple inheritance."

    Speaking of multiple inheritance (or, multiple _something_ anyway), is this series going to cover how interfaces work?

Page 1 of 2 (18 items) 12