Paddling Upstream

  • My MSIL Wishlist

    The Microsoft Intermediate Language (MSIL), or, as it is now officially called, the Common Intermediate Language (CIL, not to be confused with the CLI, which stands for Common Language Infrastructure), is the byte-code specification for the .NET Framework. The official specification is Standard ECMA-335 (Common Language Infrastructure). MSIL is essentially a very simple stack-based assembly language, which the .NET Framework compiles to native machine code as needed. This process is known as JIT (just-in-time) compilation.

    MSIL is quite easy and fun to generate at run time from C# (or any .NET-enabled language, for that matter) using the Reflection.Emit API. There are also numerous articles on generating MSIL at CodeProject.com, which I recommend reading if you want to learn more.

    My interest in the MSIL is due to the fact that I am developing an optimizing compiler for my programming language, Cat. Cat is a statically typed functional stack-based language that resembles a mix between Forth and ML.

    Multiple Return Values

    Several .NET languages, such as Forth.NET and IronPython, support functions that return multiple values. In a stack-based language like Cat, you can use a function that returns multiple values as follows:
      define f { 1 2 }
      define g { f + }
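    In other words, a Cat function receives the stack and may leave any number of results on it. The following C++ sketch is a hypothetical model of that behaviour, not the Cat interpreter (which is written in C#): the stack is a std::vector<int>, and the functions f and g mirror the two definitions above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a Cat-style evaluation stack (not the real interpreter).
using Stack = std::vector<int>;

// Pops and returns the top value; asserts that the stack is not empty.
int pop(Stack& s) {
    assert(!s.empty());
    int v = s.back();
    s.pop_back();
    return v;
}

// define f { 1 2 } : leaves two values on the stack, i.e. "returns" two results.
void f(Stack& s) {
    s.push_back(1);
    s.push_back(2);
}

// define g { f + } : calls f, then adds the two values that f left behind.
void g(Stack& s) {
    f(s);
    int b = pop(s);
    int a = pop(s);
    s.push_back(a + b);
}
```

    Running g on an empty stack leaves the single value 3 on it.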

    The initial expectation is that the MSIL code would look like:

      f:
        ldc_i4_1
        ldc_i4_2
        ret
      g:
        call f
        add
        ret

    However, this is not possible in MSIL, since the ret opcode returns at most one value from the stack. Instead, the best solution I could come up with was to wrap both values in an object and then unwrap them afterwards. Without going into the gruesome details, you need to generate the following pseudo-code:

      class Tmp {
        public Tmp(int arg0, int arg1)
        {
          field0 = arg0;
          field1 = arg1;
        }
        public int field0;
        public int field1;
      }

      f:                          // returns a Tmp
        ldc_i4_1
        ldc_i4_2
        newobj Tmp::.ctor
        ret
      g:
        .locals (Tmp tmp)         // temporary local variable
        call f
        stloc tmp
        ldloc tmp
        ldfld Tmp::field0
        ldloc tmp
        ldfld Tmp::field1
        add
        ret

    This demonstrates another problem: ldfld pops the object from the evaluation stack, so I have to store the object in a temporary local variable and load it again before each field access.
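    For comparison, the same wrap-and-unwrap workaround looks like this in C++. It is only an analogy, assuming a struct named Tmp as in the pseudo-code: f packs both results into one return value (the counterpart of newobj followed by ret), and g keeps the returned object in a named local, which plays the role of the IL temporary.

```cpp
#include <cassert>

// Wrapper holding the two values that f "returns" (mirrors the Tmp class above).
struct Tmp {
    int field0;
    int field1;
};

// Analogue of f: wraps both results in a single return value.
Tmp f() {
    return Tmp{1, 2};
}

// Analogue of g: the local 'tmp' corresponds to the IL temporary (stloc/ldloc).
int g() {
    Tmp tmp = f();
    return tmp.field0 + tmp.field1;
}
```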

    The Ldfld Instruction

    The ldfld instruction pops an object from the evaluation stack and pushes the value of one of its named fields. My first, minor quibble with ldfld was that I had to name all of my fields rather than leave them unnamed. Records or classes with unnamed fields, or tuples as they are usually called, are commonplace in many languages.

    The bigger problem I had was that ldfld doesn't leave the object on top of the evaluation stack. If the object is only a temporary, as in my previous example, I don't want to keep it around in a local variable. There are likely good reasons for this design decision in the MSIL, and it probably makes sense the way it is for most languages most of the time. However, an alternative instruction that loads a field while leaving the object on the stack would help me generate byte-code.
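    To show what such an instruction would buy, here is a small C++ model of an evaluation stack. It is a sketch under stated assumptions: Obj is a hypothetical two-field object standing in for Tmp, ldfld pops the object as the real instruction does, and ldfld_keep is a made-up variant (not an MSIL opcode) that pushes the field value and then puts the object back on top.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical two-field object standing in for the Tmp class.
struct Obj {
    int field0;
    int field1;
};

// A stack slot holds either an int or an object (copied by value in this model;
// the real evaluation stack holds object references).
using Value = std::variant<int, Obj>;
using Stack = std::vector<Value>;

Value pop(Stack& s) {
    assert(!s.empty());
    Value v = s.back();
    s.pop_back();
    return v;
}

// Reads field 0 or field 1 of an object.
int field(const Obj& o, int index) {
    return index == 0 ? o.field0 : o.field1;
}

// Real ldfld semantics: pop the object, push the field value.
void ldfld(Stack& s, int index) {
    Obj o = std::get<Obj>(pop(s));
    s.push_back(field(o, index));
}

// Hypothetical ldfld_keep: push the field value, then put the object back on top.
void ldfld_keep(Stack& s, int index) {
    Obj o = std::get<Obj>(pop(s));
    s.push_back(field(o, index));
    s.push_back(o);
}

// add: pop two ints, push their sum.
void add(Stack& s) {
    int b = std::get<int>(pop(s));
    int a = std::get<int>(pop(s));
    s.push_back(a + b);
}
```

    With ldfld_keep, g needs no local at all: starting from the object returned by f, the sequence ldfld_keep field0, ldfld field1, add leaves 1 + 2 = 3 on the stack.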

    No Swap Instruction

    The MSIL has no swap instruction. It may be that such an instruction would have significantly complicated the byte-code verifier and hurt JIT compilation performance. I am only guessing, though; I don't know enough about the internals of the .NET Framework to say for sure. I am not the first person to bemoan this fact, however: Valer Bocan, who implemented Forth .NET, also posted a thoughtful article at CodeProject.com about this and other possible enhancements to the MSIL. In all fairness, stack-based languages are not the MSIL designers' primary concern. Well, at least not until Cat takes over the world ;-)

    Not only is swap an important and fundamental operation in many stack-based and assembly languages, but I could also have used it to avoid declaring a temporary variable. One solution would have looked like this:

      ...
      g:
        call f
        dup
        ldfld Tmp::field0
        swap                      // hypothetical: not an MSIL opcode
        ldfld Tmp::field1
        add
        ret

    Notice the fewer instructions and the lack of a temporary variable declaration. From my point of view, at least, this is "a good thing".
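    Both instruction sequences can be checked against a small C++ model of the evaluation stack. This is a sketch, not how the CLR works: Obj is a hypothetical stand-in for Tmp, each helper is named after the instruction it imitates (dup_top for dup, and swap_top for the swap that MSIL lacks), and the object returned by f is passed in directly.

```cpp
#include <cassert>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical two-field object standing in for Tmp.
struct Obj {
    int field0;
    int field1;
};

using Value = std::variant<int, Obj>;
using Stack = std::vector<Value>;

Value pop(Stack& s) {
    assert(!s.empty());
    Value v = s.back();
    s.pop_back();
    return v;
}

// dup: push a copy of the top slot.
void dup_top(Stack& s) {
    assert(!s.empty());
    Value top = s.back();
    s.push_back(top);
}

// swap: exchange the top two slots (the instruction MSIL lacks).
void swap_top(Stack& s) {
    assert(s.size() >= 2);
    std::swap(s[s.size() - 1], s[s.size() - 2]);
}

// ldfld: pop the object, push field 0 or field 1.
void ldfld(Stack& s, int index) {
    Obj o = std::get<Obj>(pop(s));
    s.push_back(index == 0 ? o.field0 : o.field1);
}

// add: pop two ints, push their sum.
void add(Stack& s) {
    int b = std::get<int>(pop(s));
    int a = std::get<int>(pop(s));
    s.push_back(a + b);
}

// The version that needs a temporary local.
int with_local(const Obj& from_f) {
    Stack s;
    s.push_back(from_f);   // call f
    Value tmp = pop(s);    // stloc tmp
    s.push_back(tmp);      // ldloc tmp
    ldfld(s, 0);           // ldfld field0
    s.push_back(tmp);      // ldloc tmp
    ldfld(s, 1);           // ldfld field1
    add(s);                // add
    return std::get<int>(pop(s));
}

// The version using dup and the hypothetical swap.
int with_swap(const Obj& from_f) {
    Stack s;
    s.push_back(from_f);   // call f
    dup_top(s);            // dup
    ldfld(s, 0);           // ldfld field0
    swap_top(s);           // swap
    ldfld(s, 1);           // ldfld field1
    add(s);                // add
    return std::get<int>(pop(s));
}
```

    Both functions return 3 for the object produced by f; the swap version simply gets there without a local slot.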

    In general, though, it is a real joy for me to have access to such an easy-to-use byte-code generation framework for free. The things you can do with MSIL and the reflection framework are limitless.

  • The Cat Programming Language

    When I am not at work, writing multimedia SDK samples and documentation, I am at home working on my side project: the Cat programming language. I have just recently released a nearly stable version (0.9.9) of the interpreter written in C# at http://www.cat-language.com/download.html. I am hoping that people will find it useful or interesting, and share their feedback with me.

    Cat is a bit of an odd duck in that it is both a functional programming language (like Scheme, Haskell, and ML) and a stack-based language (like Forth and PostScript). This combination isn't unique (other examples include Joy and Factor); however, Cat also has an optional type system. The current interpreter doesn't do type checking, but a formal description is available at http://www.cat-language.com/semantics.html. I plan on implementing the type checker in version 2.0.

    I've released Cat into the public domain, so you can use it for any purpose, commercial or otherwise, without restriction (and, of course, without warranty). I should also make clear that this has nothing to do with my employer, Microsoft; it is simply something that I do on my own.

    Please feel free to share with me your thoughts, suggestions or questions! 

  • Formally Understanding Concurrency

    A colleague of mine is interested in the formal basis of concurrent programming, so I decided to gather several resources and post them here. I should start by saying that much of the research in concurrent programming is based on the pi-calculus.

    Hopefully this provides a few useful and interesting entry points into the study of concurrency in computer science.

  • Taming Variable Sized Structures in C++

    I am currently working on the documentation for the Windows Media Format SDK, and I was confronted with some APIs in the DRM Client Extended API that accept variable-sized structs. Writing clean memory allocation code for variable-sized structs can be a painful endeavour, and I am a lazy, lazy man. For my own purposes I developed a data structure that simplifies working with structs that have a variable-sized field, when that size is a constant known at compile time.

    The following code demonstrates how you can write a simple template wrapper around an arbitrary variable-sized struct (one whose last field is variable-sized).

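    // Stand-ins for the Windows typedefs, so the sample compiles without <windows.h>.
    typedef unsigned int DWORD;
    typedef unsigned char BYTE;
    
    // This is an actual struct from the WMF SDK
    struct WMDRM_IMPORT_CONTENT_KEY
    {
        DWORD dwVersion;
        DWORD cbStructSize;
        DWORD dwIVKeyType;
        DWORD cbIVKey;
        DWORD dwContentKeyType;
        DWORD cbContentKey;
        BYTE rgbKeyData[ 1 ];
    };
    
    // This is a somewhat generic template wrapper around a variable-sized struct
    // whose last member is a one-element array that really holds FieldSize_N bytes.
    template<typename Struct, int FieldSize_N> 
    struct VarStruct 
    {   
        Struct* operator->() { return &(data_union.as_struct); }    
        static const int field_size = FieldSize_N;
        // sizeof(Struct) already counts the one-element array (plus any padding),
        // so only the remaining FieldSize_N - 1 bytes are added.
        static const int struct_size = sizeof(Struct) + FieldSize_N - 1;
    
    private:
        
        // The union gives the byte buffer the alignment of Struct and enough
        // room for the whole variable-sized field.
        union
        {
            Struct as_struct;
            BYTE as_array[struct_size];
        } 
        data_union;
    };
    
    // Here is some mundane usage. 
    int main()
    {
        VarStruct<WMDRM_IMPORT_CONTENT_KEY, 42> vs;
        for (int i = 0; i < 42; ++i)
        {
            vs->rgbKeyData[i] = static_cast<BYTE>(i);
        }
        return 0;
    }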
    

    I hope this code is self-explanatory, and can come in useful. Feel free to fire any questions my way.

    The contents of this post are licensed under the Creative Commons Attribution-NonCommercial-NoDerivs license. The opinions in this post are my own and are not necessarily the views of my employer, Microsoft.

  • HD DVD Documentation Online

    In my role as a documentation writer at Microsoft I recently completed and posted a new release of documentation for developing interactive HD DVD content, also known as iHD. You can find the documentation at: http://msdn.microsoft.com/library/en-us/HDDVDATK/htm/hddvdprogrammingguide.asp.

    An interesting and informal resource for authoring iHD content can be found on MSDN at Peter Torr's blog.

  • Parsing Windows Media Services 9 Log Files

    This posting is provided "AS IS" with no warranties, and confers no rights.

    Windows Media Services (WMS) logging is a very broad and complex subject, but it is covered in depth by the Microsoft white paper on the Logging Model for Windows Media Services 9 Series. What can be tricky about WMS logs is that they contain multiple kinds of log entries: some represent cache hits, some cache misses, some server data, and so on. This means that naively counting hits will give false results, so I strongly recommend reading the white paper before embarking on any expedition to parse WMS logs.
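    Because one log mixes several kinds of entries, counting code has to look at field values rather than just counting lines. The following C++ sketch assumes the log follows the W3C extended format (a "#Fields:" directive naming space-separated columns, with other "#" lines treated as directives). The field name and value passed to the hypothetical count_matching function are placeholders; which entries you should actually count is exactly what the white paper explains.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits a line on whitespace (this also discards a trailing '\r').
std::vector<std::string> split(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> parts;
    std::string token;
    while (in >> token) parts.push_back(token);
    return parts;
}

// Counts entries whose column 'field' equals 'value'. Column positions come
// from the most recent "#Fields:" directive; other '#' lines are skipped.
std::size_t count_matching(std::istream& log, const std::string& field,
                           const std::string& value) {
    const std::string directive = "#Fields:";
    std::vector<std::string> fields;
    std::size_t count = 0;
    std::string line;
    while (std::getline(log, line)) {
        if (line.empty()) continue;
        if (line[0] == '#') {
            if (line.compare(0, directive.size(), directive) == 0)
                fields = split(line.substr(directive.size()));
            continue;
        }
        std::vector<std::string> entry = split(line);
        for (std::size_t i = 0; i < fields.size() && i < entry.size(); ++i) {
            if (fields[i] == field && entry[i] == value) {
                ++count;
                break;
            }
        }
    }
    return count;
}
```

    Entries that appear before any "#Fields:" directive are ignored, since their columns cannot be identified.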

    I am not an expert in parsing log files or in Windows Media Services logs, but Microsoft does provide a free power tool called LogParser that is useful for slicing and dicing your Media Services logs. LogParser is the Swiss Army knife of logging tools and has entire sites dedicated to it by its fans (http://www.logparser.com, the unofficial LogParser site).

    I have provided the following links to help you gather information on WMS logging and LogParser, so that you are better equipped to parse your log files.

    I hope you find these links useful!


© 2009 Microsoft Corporation. All rights reserved.