First, hats off to the Mono team.  I think they've done a great job at writing some great software, but also at proving that evil MS really does know how to produce a truly common language runtime that can be properly standardized and ported to other platforms and languages.

Now for the meat of my blog.  Working on the AMD64 JIT for the runtime is really fun, but it has a few downsides.  You we are the 'second' JIT that MS has written.  As such the knee-jerk reaction of testers is not to consult the spec and see if what we're doing is different, but acceptable.  Instead they blindly compare it to the first JIT (the x86 JIT) and if it's different and it breaks their test, it must be wrong.  As we all know, 2 different implementations of the same spec are going to behave differently.

The biggest difference I see is that the AMD64 JIT has borrowed a lot of code and optimizations from the native AMD64 compiler.  Because of this, we perform some optimizations that are 100% legal according to the CLR spec, but that are certainly unexpected by those who have been using the x86 JIT for a while.  Almost all of these examples stem from end users not writing proper thread safe code.

Here's a real example from System.Collections.Hashtable:

private bucket[] buckets;

public virtual bool ContainsKey(Object key) {
    if (key == null) {
        throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
    }

    uint seed;
    uint incr;
    // Take a snapshot of buckets, in case another thread resizes table
    bucket[] lbuckets = buckets;
    uint hashcode = InitHash(key, lbuckets.Length, out seed, out incr);
    int  ntry = 0;
            
    bucket b;
    int bucketNumber = (int) (seed % (uint)lbuckets.Length);            
    do {
        b = lbuckets[bucketNumber];
        if (b.key == null) {
            return false;
        }
        if (((b.hash_coll &  0x7FFFFFFF) == hashcode) && 
            KeyEquals (b.key, key))
            return true;
        bucketNumber = (int) (((long)bucketNumber + incr)% (uint)lbuckets.Length);                              
    } while (b.hash_coll < 0 && ++ntry < lbuckets.Length);
    return false;
}

I added the emphasis.  Notice the comment that indicates the code is assuming that this.buckets will change on another thread. With the x86 JIT, simply copying the member into a local was enough.  However, the AMD64 JIT applied some simple copy-propagation to this code and realized that since the this pointer was already enregistered, it was easy to access this.buckets as a simple indirection. So the AMD64 JIT completely ignores the local copy. At this point some of you might be thinking that was an illegal optimization.  It's not.  Go read the specs and study your compiler theory.

There is exactly one thing needed to correct this method.  Its the volatile keyword.  By simply marking the buckets member as being volatile, then the JIT cannot add or remove reads or writes to it, thus it cannot copy-prop.

As a side note, I believe this code really is fine even if the buckets array does get bigger on another thread.  It would only have a problem it buckets got smaller after bucketNumber was calculated, but before it was used to index into the array.  Don't quote me on this, but I vaguely remember that Hashtable never does resize smaller, so this really is a total non-issue, but still a good example.

--Grant