July, 2008

  • Kirill Osenkov

    Software Development Meme

    • 2 Comments

    John tagged me in a software development meme - a sort of a game where you tell stuff about yourself and then pass on to other people. My moral values tell me it's not a chain letter so I guess it's OK to post :) Besides, it's fun and I'd be curious to learn more about others in the blogosphere.

    How old were you when you started programming?

    16.

    How did you get started in programming?

    My parents got me a book about BASIC and I got hooked up immediately. I only got a computer years later, so I was writing my first programs on paper (poor kid) and imagining how I'd be typing them in on a new shiny gray keyboard with loud clicking sounds... (people start having tears here). I stayed late in school computer class after classes and wrote code. What surprises me is that I guess today I'm no less fascinated by programming as back then...

    I wonder if I still can get my first program right:

    10 PRINT "What is your name?"
    20 INPUT A$
    30 PRINT "Hello " & A$

    I wonder if it "compiles".

    What was your first language?

    GWBASIC. It had an awesome IDE, boasting install size of about 40K, xcopy deployment, 100% keyboard support, lightning-fast responsiveness and an ultra-modern REPL loop similar to F# Interactive. Moreover it stored the entire program in memory - not every IDE can afford that nowadays.

    What was the first real program you wrote?

    There is a belief that every young russian-speaking programmer enthusiast HAS TO implement his/her own clone of Norton Commander and his/her clone of Tetris. This is if you're wondering where all those Volkov Commanders, Dos Navigators and FARs come from.

    Of course, I was no exception to that rule:

    DiskGuide

    Introducing Disk Guide - a file manager for Windows 95, written during 1998-1999 using VB5! If you're adventurous enough, you can give it a spin: http://www.osenkov.com/projects/diskguide/DiskGuide.rar - believe it or not, I'm still using it pretty regularly, after almost 10 years. It has a couple of gotchas (e.g. it chooses to bypass Recycle Bin when deleting files, so be careful), but I haven't seen any major bugs in it during the last several years. There are a lot of "features" though, so be warned (it's not a bug, it's a feature!).

    And here's the obligatory Tetris (VB6):

    Tetris

    Which reminds me. We had a hallway conversation with a colleague recently who is a developer on VB IDE team - he had asserted to be able to implement Tetris from scratch in 15 minutes. So we took a recent build of Visual Studio, a stopwatch - and of course it took him longer than 15 minutes. It took him whole 35 minutes.

    What languages have you used since you started programming?

    GWBASIC, LOGO, QuickBasic, TurboBasic (yeah!!!), VBDOS 4.0, VB5, VB6, TurboPascal, Borland Pascal, C++ (just enough to form a strong opinion about this language), VB.NET 7.0, Haskell (my world goes upside down), Prolog (I thought my world went upside down with Haskell??!), VHDL, ... drumroll... C# 1.0 (here it goes baby!!!), C# 1.1, C# 2.0, VB 8, C# 3.0 and lately some C# 4.0 and VB 10.

    What was your first professional programming gig?

    Ehmm... what is a gig? *goes-to-look-up* Ohh, I see.

    I wrote a dynamic geometry educational software package in VB6. Some pretty serious stuff: http://dg.osenkov.com

    If you knew then what you know now, would you have started programming?

    Hell yeah.

    If there is one thing you learned along the way that you would tell new developers, what would it be?

    People, please strive to write clean code - learn, learn, learn - always seek ways to improve, rewrite and refactor your code - be perfectionist about the code. It's the journey that's important, not the destination.

    What’s the most fun you’ve ever had... programming?

    Definitely implementing the dynamic geometry project - first feeling of control over your code, first results, first rewarding successes and implemented features - that was fun. By the way I shipped it, I can't believe it.

    Over to you

    I herewith tag Mr. Campbell, Mr. McNamara, Mr. Carpenter, Mr. Maino and Mr. Smith.

  • Kirill Osenkov

    A curious subtlety about how CLR does interface dispatch on array types

    • 15 Comments

    This post is about some internal CLR magic that usually makes accessing .NET arrays through an interface up to 100 times faster (if you're an optimist; or accessing arrays up to 100 times slower in the worst case, if you're a pessimist). I'm usually very curious about how this universe works under the hood - and finding out about the internals of the CLR for me is like discovering laws of physics :) So if you ever find yourself measuring the timing of array access through generic interfaces in performance critical code, this post is probably worth taking a look at.

    Anyway, here's the story. A customer (Robert Kieboom from CityGIS) has reported an intermittent performance decrease when accessing arrays through the IList<T> interface. Specifically, for some weird reason, same codepath started executing 100 times slower after a call to some seemingly unrelated code. Huge thanks to Robert for providing this precise and exact repro (I hope he doesn't mind if I share it):

    class Program
    {
        public static int AddValues(IList<int> list)
        {
            int add = 0;
            /*foreach ( int value in list )
              add += value;*/
            for (int i = 0; i < list.Count; i++)
                add += list[i];
            return add;
        }
    
        static void TestAddArray(int[] items)
        {
            Stopwatch timer = Stopwatch.StartNew();
            AddValues(items);
            timer.Stop();
            Console.WriteLine("Array: {0} ms.", timer.ElapsedMilliseconds);
        }
    
        static void TestAddIList(int[] items)
        {
            IList<int> cl = new List<int>(items);
    
            Stopwatch timer = Stopwatch.StartNew();
            AddValues(cl);
            timer.Stop();
    
            Console.WriteLine("IList: {0} ms.", timer.ElapsedMilliseconds);
        }
    
        static void Main(string[] args)
        {
            int[] items = new int[1000000];
            for (int i = 0; i < items.Length; i++)
                items[i] = i - (items.Length / 2);
    
            TestAddArray(items); // fast
            TestAddArray(items); // fast
    
            TestAddIList(items); // something weird happens??!
    
            TestAddArray(items); // slow
            TestAddArray(items); // slow
        }
    }

    Here's what this program does. First we call TestAddArray a couple of times that uses an array disguised as an IList<int> - both calls are very fast, each about 108ms on my machine.

    Then we call TestAddIList that uses a List<int> disguised as an IList<int> - this call is even faster, 21 ms.

    Now - watch this - both subsequent calls to TestAddArray take 3638ms and 3677ms respectively - about 33 times slower!

    If you remove the call to TestAddIList(items), all four calls will be equally fast. What's going on? The same codepath becomes slower after some random call? We were puzzled. After Eric couldn't find any explanation for this from the C# compiler standpoint (we were producing correct IL), it was clear that the issue is deeper than IL and has to do with the execution engine itself. So I routed the bug to the CLR team - and pretty soon we had a reply from them.

    It turns out, CLR has some optimizations on how they do virtual dispatch on a generic interface, if the runtime type is an array. As I'm afraid to talk nonsense outside of my area of expertise, I'll just quote the bug description from the CLR folks:

    The problem is rooted in the fact that IList<T>::get_Item is an interface invocation on an array type (in the example an int[]).  Because arrays are very common, and interface dispatch through interfaces like IList<T> is uncommon, the runtime saves significant space by having special logic for dispatching through generic interfaces like these. 

    Our interface dispatch logic has an important optimization where FOR EACH CALL SITE, it checks one particular target explicitly, and if that fails, does slower hash table lookup, and if that lookup fails, fall back to an  expensive lookup.  Thus for calls sites that tend to go to one destination it is very fast, and call sites that go to many targets, you get good, but not as great performance.

    As it turns out, there is a problem in the special logic for dispatch logic for arrays (but not for ordinary interfaces) that cause the hash table lookup to fail.    What this means is either interface dispatch FOR ARRAY TYPES  is very fast (if the explict check works), or very slow (when it does not). 

    In the fast case, the interface calls to 'get_Item and get_count' (used in the [] operator) call with at 'this' pointer of int[] first, which means this is the value that is used for the 'fast check'.   The second time we call it with List<int>, not int[]. However since there is no bug associated with normal (not array) dispatch, everything works well.

    However when the benchmarks are reversed, List<int> is the first 'this' that dispatches through 'get_Item' and 'get_Count' call sites so they get the benefit of the fast check. When the first benchmark is run, it now dispatches using an int[] as the 'this' pointer, and because of the bug, the hash table lookup always fails and so only the very slow lookup mechanism is used.

    Workarounds:

    1) In performance critical code, ensure that if arrays are passes as IList<T> (or super Interfaces), that T[] is always called first.  There is a good chance that this approach is impractical, but if there are only a few such critical paths then it is possibly useful. 

    2) Always use List<T> where you would have used T[]. By the way, if you can avoid using Collection<T> that is probably a good thing, as it is a wrapper that in most cases does not add much value.

    3) In any performance critical paths, special case the array case with code like

                 method(IList<T> aIList)

                 {

                        T[] asArray = aList as T[]

                        if (asArray != null)

                        {

                                  // logic on arrays

                        }

                        else

                        {

                                  // logic on aList

                        }

    Option 3 has the advantage that it is SIGNIFICANTLY faster than interface dispatch will ever be because the array check is done only once (and then lots of operations are done), rather than effectively being done on every interface invocation. I would not recommend any work-around that does not have some positive effect on your code base.  I could believe that options 2 and 3 MAY be a superior solution in the long term anyway, so I would suggest investigating them.

    This explains why calling arrays through generic interfaces might under certain conditions become considerably slower (or, alternatively, why these calls are in many cases considerably faster than the general case). In any case, virtual dispatch is slower than calling array members directly, so if you have a chance, just work with arrays directly or use generic collections like List<T>.

    Regarding the future fate of this particular CLR optimization failing in some cases, we are considering whether we shall fix it for the next release of the .NET framework. However, as always: no promises here folks! If you do hit this problem (which is pretty unlikely), just use the workarounds for now - thankfully they are easy and reliable.

    I hope this was insightful and many folks have found this interesting - I'd like to give credit to everyone who reported, reproduced and investigated this issue (including, among others, Robert Kieboom, Arie Leeuwesteijn, Erik Meijer, Eric Lippert, Vance Morrison and Jan Kotas) - and I hope they don't mind me sharing this with the rest of the .NET world.

Page 1 of 1 (2 items)