My MSIL Wishlist

The Microsoft Intermediate Language (MSIL), now officially called the Common Intermediate Language (CIL, not to be confused with the CLI, the Common Language Infrastructure), is the byte-code specification for the .NET Framework. The official specification is Standard ECMA-335 (Common Language Infrastructure). MSIL is essentially a very simple stack-based assembly language, which the .NET runtime compiles to native machine code as needed. This process is known as JIT (Just-In-Time) compilation.

MSIL is quite easy and fun to generate at run-time from C# (or any other .NET language, for that matter) using the Reflection API, specifically the System.Reflection.Emit namespace. There are also numerous articles on generating MSIL at CodeProject.com, which I recommend reading if you want to learn more.

I am interested in MSIL because I am developing an optimizing compiler for my programming language, Cat. Cat is a statically typed, functional, stack-based language that resembles a mix of Forth and ML.

Multiple Return Values

Several .NET languages, such as Forth .NET and IronPython, let a function return multiple values. In a stack-based language like Cat, you can define and use such a function as follows:
  define f { 1 2 }
  define g { f + }
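
To make the semantics concrete, here is a minimal sketch in Python (not Cat itself) that models each Cat function as something that transforms a shared stack. Representing the stack as a Python list and writing "+" as a function named add are assumptions made purely for illustration.

```python
# Toy model: every function receives the current stack and mutates it in place.

def f(stack):
    # define f { 1 2 }: push two values, i.e. "return" both of them
    stack.append(1)
    stack.append(2)

def add(stack):
    # "+": pop two values and push their sum
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def g(stack):
    # define g { f + }: run f, then add the two values it left behind
    f(stack)
    add(stack)

stack = []
g(stack)
print(stack)  # [3]
```

The point is that f leaves two values behind and g consumes both, with no wrapper object in between.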

My initial expectation was that the MSIL code would look like this:

  f:
    ldc.i4.1
    ldc.i4.2
    ret
  g:
    call f
    add
    ret

However, this is not possible in MSIL, since a method can return at most one value: the ret opcode returns a single value from the evaluation stack. The best solution I could come up with was to wrap both values in an object and unwrap them afterwards. Without going into the gruesome details, you need to generate the following pseudo-code:

  class Tmp {
    public Tmp(int32 arg0, int32 arg1)
    {
      field0 = arg0;
      field1 = arg1;
    }
    public int32 field0;
    public int32 field1;
  }

  f:
    ldc.i4.1
    ldc.i4.2
    newobj Tmp(int32, int32)
    ret
  g:
    .locals (Tmp tmp)
    call f
    stloc tmp
    ldloc tmp
    ldfld field0
    ldloc tmp
    ldfld field1
    add
    ret

This demonstrates another problem: ldfld pops the object from the evaluation stack, so I have to store the object in a temporary local variable and reload it for each field access.
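
The wrap-and-unwrap sequence can be traced with a toy interpreter. This is only a rough Python model of the pseudo-code above, not real MSIL: the opcode spellings, the run function, and the rule that every function here returns exactly one value are simplifying assumptions.

```python
from collections import namedtuple

# Stand-in for the generated wrapper class from the pseudo-code.
Tmp = namedtuple("Tmp", ["field0", "field1"])

def run(code, functions):
    """Interpret (opcode, operand) pairs and return the single result value."""
    stack, local_vars = [], {}
    for op, arg in code:
        if op == "ldc":
            stack.append(arg)
        elif op == "newobj":
            # Pops both constructor arguments (top of stack is arg1).
            arg1, arg0 = stack.pop(), stack.pop()
            stack.append(Tmp(arg0, arg1))
        elif op == "call":
            stack.append(run(functions[arg], functions))
        elif op == "stloc":
            local_vars[arg] = stack.pop()
        elif op == "ldloc":
            stack.append(local_vars[arg])
        elif op == "ldfld":
            obj = stack.pop()  # the object is consumed here
            stack.append(getattr(obj, arg))
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "ret":
            # Simplification: exactly one value may be returned.
            if len(stack) != 1:
                raise ValueError("ret needs exactly one value on the stack")
            return stack.pop()
    raise ValueError("fell off the end of the function without ret")

FUNCTIONS = {
    "f": [("ldc", 1), ("ldc", 2), ("newobj", None), ("ret", None)],
    "g": [("call", "f"), ("stloc", "tmp"),
          ("ldloc", "tmp"), ("ldfld", "field0"),
          ("ldloc", "tmp"), ("ldfld", "field1"),
          ("add", None), ("ret", None)],
}

print(run(FUNCTIONS["g"], FUNCTIONS))  # 3
```

Running the naive version of f (ldc, ldc, ret with no newobj) through the same model raises the ValueError, which mirrors why the wrapper is needed in the first place.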

The Ldfld Instruction

The ldfld instruction pops an object from the evaluation stack and pushes the value of a named field. The first minor quibble I had with ldfld was that I had to name all of my fields rather than leave them unnamed. Records or classes with unnamed fields, or tuples as they are usually called, are commonplace in many languages.

The bigger problem I had was that ldfld doesn't leave the object on top of the evaluation stack. If the object is temporary, as in my previous example, I don't want to declare a local variable just to keep it around. There are likely good reasons for this design decision in MSIL; it probably makes sense the way it is for most languages most of the time. However, an alternative instruction that leaves the object on the stack would help me generate byte-code.
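
As a rough illustration of the difference, the following Python sketch models the real ldfld behaviour next to the kind of variant I am wishing for. The name ldfld_keep is made up for this example and is not a real opcode; the stack is again just a Python list.

```python
from types import SimpleNamespace

def ldfld(stack, name):
    # Actual behaviour: pop the object, push the value of the field.
    obj = stack.pop()
    stack.append(getattr(obj, name))

def ldfld_keep(stack, name):
    # Hypothetical variant (not a real opcode): push the field value
    # beneath the object, so the object stays on top of the stack.
    obj = stack.pop()
    stack.append(getattr(obj, name))
    stack.append(obj)

tmp = SimpleNamespace(field0=1, field1=2)

stack = [tmp]
ldfld_keep(stack, "field0")  # stack is now: 1, tmp
ldfld(stack, "field1")       # stack is now: 1, 2
print(stack)  # [1, 2]
```

With such a variant, both fields can be unpacked without storing the object in a local first.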

No Swap Instruction

MSIL has no swap instruction. It may be that such an instruction would have significantly complicated the byte-code verifier and hurt JIT compilation performance. I am only guessing, though; I don't really know enough about the internals of the .NET Framework to say. I am not the first person to bemoan this fact, however: Valer Bocan, who implemented Forth .NET, also posted a thoughtful article at CodeProject.com about this and other possible enhancements to MSIL. In all fairness, stack-based languages are not the MSIL designers' primary concern. Well, at least not until Cat takes over the world ;-)

Not only is swap an important and fundamental operation in many stack-based and assembly languages, but I could also have used it to avoid declaring a temporary variable. One solution would have looked like this:

  ...
  g:
    call f
    dup
    ldfld field0
    swap
    ldfld field1
    add
    ret

Notice that there are fewer instructions and no temporary variable declaration. From my point of view, this can only be a good thing.
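
The effect of swap can be checked with another small Python model. Here swap is the hypothetical instruction from the listing above, and swap_with_locals mimics the usual workaround of going through two temporaries (stloc a, stloc b, ldloc a, ldloc b). All function names are illustrative assumptions, and the stack is once more a plain list.

```python
from types import SimpleNamespace

def swap(stack):
    # Hypothetical opcode: exchange the top two entries of the stack.
    stack[-1], stack[-2] = stack[-2], stack[-1]

def swap_with_locals(stack):
    # Same effect using two temporaries, mirroring
    # stloc a / stloc b / ldloc a / ldloc b.
    a = stack.pop()
    b = stack.pop()
    stack.append(a)
    stack.append(b)

def ldfld(stack, name):
    # Pops the object and pushes the value of the named field.
    obj = stack.pop()
    stack.append(getattr(obj, name))

def g_body(stack, swap_impl):
    # Body of g from the listing above, starting right after "call f".
    stack.append(stack[-1])                   # dup
    ldfld(stack, "field0")
    swap_impl(stack)
    ldfld(stack, "field1")
    stack.append(stack.pop() + stack.pop())   # add

tmp = SimpleNamespace(field0=1, field1=2)
for impl in (swap, swap_with_locals):
    stack = [tmp]
    g_body(stack, impl)
    print(stack)  # [3] in both cases
```

Both versions produce the same result; the difference is that the workaround needs two extra locals and four instructions where a single swap would do.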

In general, though, it is a real joy for me to have free access to such an easy-to-use byte-code generation framework. The things you can do with MSIL and the reflection framework are limitless.