Wow, two programming/C# entries in a single day...

Before I joined the C# team, I was the test lead for the C++ compiler for a number of years. We would periodically get customer comments that "the compiler was broken", and upon further investigation, we would usually find that it was a bug in the program. There was usually a good correlation between the programmer's experience and where they looked first: those with more experience normally suspected their own code first, and only after careful research would they consider the compiler (and they were usually right at that point).

One of the nice things about C# not having pointers is that it's much harder to accomplish bad things ("Try to imagine all life as you know it stopping instantaneously, and every molecule in your body exploding at the speed of light". Shame on you if you don't recognize the quote). If you're playing the interop game, you're back in the pointer-world of sharp sticks, and you can easily create the otherwise elusive "Execution Engine Error".

Last week, I upgraded to build 30730 of VS and the runtime. (This means "third year, seventh month, and 30th day", and is also known in Microsoft parlance as the "Julian Date", even though it isn't a Julian date. This replaced our previous scheme (also not a Julian date) that we used on VS 2002 and 2003, which replaced the scheme we used in VS6 (also not a Julian date). As far back as I remember, our numbers had always been called Julian dates but never were. An ideal dating system increases monotonically by 1 (so you can tell how far apart builds are) and is easy to convert to human-readable dates (so you know when the build was created), but that's not really possible, so at least we've finally settled on something where you know when the build was made, and it works for more than a couple of years (previous versions broke badly when confronted with the long dev cycle of VS 2002). It's a testament to the understandability of the previous schemes that I don't remember what they were, but I do know that many people ran little JDate applications on their desktops so they knew what jdate to use for today. But I digress.)

I got the new build on, and nothing broke (a nice occurrence), rebuilt, and ran my app. It worked fine in most areas, but when I tried to use one function, I got a null reference exception. Of course, I initially thought my code was bad, but a little debugging narrowed the problem down to an innocuous-looking function:

    private void CheckType<T>(DBObject node, List<int> list)
    {
        if (node is T)
        {
            if (node.Checked)
            {
                list.Add(node.ID);
            }
        }
    }

In my app, I have a treeview with different node types in it, and I need to gather all the checked nodes of a given type into a list so I can persist it. This function is called for each node and each type of node, and it fills in the list.
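
To make the calling pattern concrete, here is a rough sketch of how a function like the one above might be driven over a tree. Only `CheckType<T>`, `DBObject`, `Checked`, and `ID` come from my code; the `Children` collection, the `TableNode` and `ViewNode` types, and the `Walk` and `Collect` helpers are made up for illustration, so treat this as one plausible shape rather than what the app actually does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the app's node base class. Only Checked and ID
// appear in the original function; Children is assumed for the tree walk.
public class DBObject
{
    public int ID { get; set; }
    public bool Checked { get; set; }
    public List<DBObject> Children { get; } = new List<DBObject>();
}

// Hypothetical node types, invented for this illustration.
public class TableNode : DBObject { }
public class ViewNode : DBObject { }

public static class CheckedNodes
{
    // Same logic as the original function: record the ID of a checked node of type T.
    private static void CheckType<T>(DBObject node, List<int> list)
    {
        if (node is T && node.Checked)
        {
            list.Add(node.ID);
        }
    }

    // Visit the node and all of its descendants, adding matches to list.
    private static void Walk<T>(DBObject node, List<int> list)
    {
        CheckType<T>(node, list);
        foreach (DBObject child in node.Children)
        {
            Walk<T>(child, list);
        }
    }

    // One pass over the tree per node type; the caller persists the returned IDs.
    public static List<int> Collect<T>(DBObject root)
    {
        var list = new List<int>();
        Walk<T>(root, list);
        return list;
    }
}
```

With this shape, `CheckedNodes.Collect<TableNode>(root)` followed by `CheckedNodes.Collect<ViewNode>(root)` gives one list of IDs per node type.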

All the parameters were correct when passed in, but once inside the function, list was nowhere to be found, and calling list.Add() caused problems. Since this code worked before and the debugger couldn't find list, I started to suspect a code generation problem. Further investigation showed that even if list.Add() was never called, the program would blow up at some future point.

I just finished a session with one of the CLR guys to try to find the root cause and get a small repro case (small repro cases are the holy grail of tracking code generation issues). He knew that there had been some changes in JITting generic methods when one of the parameters was a MarshalByRef type, and we were able to create a small project that throws an Execution Engine Error at will. That will allow us to find the problem and get it fixed.

The moral of the story - and I'm sure if you've read this far you're expecting a moral - is that while it's usually your code that has the problem, sometimes it's the underlying system that has issues, so don't be too trusting...