• Kirill Osenkov

    New Year's resolutions v2.0.0.9


    Well, I've been tagged in a chain-letter blogging game again. This time, Chris "I-like-to-put-ugly-monsters-on-the-front-covers-of-my-books-to-at-least-partially-distract-readers-from-great-content" Smith tagged me in his New Year's resolutions post. It's February, but better late than never. So here goes:

    Make sure VS 2010 rocks!

    I'll try to do my part and make sure that our language service works as expected and the features are pleasant to work with. I'll also try to make sure other teams don't miss obvious bugs (yes, Editor, Shell and Project System, I'm looking at you!) Given the fact that I keep finding crashes in the XAML designer, I'll keep an eye on them as well. Oh and the debugger, of course.

    Learn MEF

    I plan to read the MEF sources and actually play with it. It's the 21st century out there; nowadays you *need* a dependency injection/component framework.

    Learn DLR

    Especially DLR hosting. Would be fun to build an expression evaluator/function graph plotter into my Live Geometry Silverlight app.

    Read more of the product code

    I definitely should read more of the VS source, especially since more and more of it is being rewritten in managed code. We're building a managed compiler and rewriting the language service in managed code, and it would be great to follow the API design there. It's super important to get the API surface right. Maybe I'll start playing with it early on and build some sample apps/add-ins to see how the API feels.

    Read more blogs

    and catch up on all those starred and flagged items. I plan to read all of Cyrus, all of Wes and all of Eric, for a start. I need to catch up on Jon's early posts as well. Also, I still hope Wes starts blogging again. Same for Dustin (although I do understand how busy Dustin is in his new role...)


    Continue ignoring it. At least I'll be honest with myself. I will still exercise regularly. Once in three months is regular, isn't it?


    Is there anything else that I've missed? :-P

  • Kirill Osenkov



    In my recent post about coding styles one particular thing provoked the majority of feedback and discussions: the ForEach extension method on IEnumerable<T>. Justin Etheredge has a good post about this method here. StackOverflow.com also has a good question: Why is there not a ForEach extension method on the IEnumerable interface?

    Note: If you’d like this method to be added to .NET 4.0, go vote here: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=279093

    Recently I also went ahead and logged a suggestion against the BCL team to add this method in .NET 4.0, and got some feedback from Melitta Andersen and Justin van Patten. Essentially, the BCL team’s considerations boil down to what we’ve discussed in my recent post:

    • it encourages state mutation
    • it’s hard to set a breakpoint inside the lambda/delegate body (and other debugging issues)
    • it’s not clear where to place this method (it’s not exactly part of LINQ, it’s just a helper on IEnumerable<T>), also it doesn’t allow to chain calls

    Mads Torgersen (representing the language design team) was also reluctant to add this method because of similar concerns (functional impurity, etc.). In my previous post, I myself enumerated various downsides of using this method.

    And still I think we should add it.

    My thinking is the following. The ultimate purpose of the BCL is to help avoid code duplication by providing a common, reusable set of functionality, so that people don't have to reinvent the wheel by writing their own collections, sorters, etc. A lot of people will use ForEach anyway, and if we don't provide it in the framework, they will have to re-implement it in every new project. Also, by providing the ForEach method out of the box, we're not forcing anyone to actually use it - people will still have the choice and be warned about the downsides of ForEach. It's just that when they use it anyway (and this happens a lot), they will be able to consume a ready-made one. The usefulness of having it (in my opinion) far outweighs the downsides of using it inappropriately.

    ForEach looks really good with very simple snippets, such as:

    myStrings.ForEach(Console.WriteLine);

    Some developers forget that you can use this shorter method group syntax instead of:

    myStrings.ForEach(s => Console.WriteLine(s));

    Another ForEach advantage is that it allows you to extract the body of the loop into a separate place and reuse it by just calling into it.

    Also, given the fact that List<T> already has it, it seems unfair that IEnumerable<T> doesn’t. This is an unnecessary limitation (that’s what I think).

    Chris Tavares says:

    I suspect the reason that this didn't exist before is that you can't use it from VB. VB only supports lambda expressions, while the ForEach method requires a lambda *statement*.

    Well, the good news is that we are introducing statement lambdas in VB 10.0, so this shouldn't be an issue at all.

    I’m also pretty sure that the debugging difficulties of ForEach can be overcome with tooling support, such as [DebuggerStepThrough] and “Step Into Specific”. Debugging problems are not a language or library problem per se; they are a tooling problem, and tooling should always be fixed/improved to satisfy languages and libraries.
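
To illustrate the attribute half of that tooling support, here is a sketch (EnumerableExtensions is a made-up class name, not an actual BCL type): marking the helper with [DebuggerStepThrough] makes the debugger skip its frames, so stepping lands directly in the caller's lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class EnumerableExtensions
{
    // [DebuggerStepThrough] tells the debugger not to stop inside this helper,
    // so F11 goes straight into the lambda body passed by the caller.
    [DebuggerStepThrough]
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
            action(item);
    }
}

public static class Demo
{
    public static void Main()
    {
        IEnumerable<string> myStrings = new[] { "a", "b", "c" };
        myStrings.ForEach(Console.WriteLine); // prints a, b, c on separate lines
    }
}
```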

    A lot of people are asking for the ForEach extension method – this is probably one of the most wanted pieces of API.

    My questions for you folks are:

    1. What do you think? Should this method be added to BCL?
    2. If yes, where? System.Linq.Enumerable? System.Collections.Generic.Extensions? Anywhere else?

    There are also variations on this extension method:

    • returning IEnumerable<T> to allow the ForEach calls to chain
    • accepting an Action<T, int> where the second parameter is the index of an item
    • Kevin’s Apply method
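
Sketched out (the class name and shapes below are illustrative, not a real API), the first two variations might look like this:

```csharp
using System;
using System.Collections.Generic;

public static class ForEachVariations
{
    // Chaining variation: yields the items back so further calls can follow.
    // Note that yield return makes this lazy - nothing runs until the result
    // is actually enumerated.
    public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
        {
            action(item);
            yield return item;
        }
    }

    // Indexed variation: the second parameter of the action is the item's index.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T, int> action)
    {
        int index = 0;
        foreach (var item in source)
            action(item, index++);
    }
}
```

Note the subtle trap in the chaining variation: because it uses yield return, the side effects don't happen until something enumerates the result - one more argument for keeping only the simple eager overload.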

    If you ask me, only the simplest overload should be added, because Select is a better choice for chaining calls. Also this would encourage people to use the method in the simplest scenarios, where the downsides of doing so are negligible.

    Finally, one more argument: googling “foreach extension method” yields 1,590,000 results, which I think is a pretty good indication that this feature is in high demand.

  • Kirill Osenkov

    How to disable optimizations during debugging


    Sooner or later you may run into a situation where you need to evaluate a local variable under debugger and all you get is this:

    "Cannot obtain value of local or argument 'whatever' as it is not available at this instruction pointer, possibly because it has been optimized away."

    Well, it turns out there are two different tricks to solve such problems:


    Shawn Burke blogs about How to disable optimizations when debugging Reference Source. In a nutshell, you need to:

    1. Start VS with the Environment Variable COMPLUS_ZapDisable=1
    2. Disable the VS Hosting Process (.vshost.exe) before you start debugging
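
For step 1, the environment variable has to be set in the console that launches Visual Studio, so that the debuggee inherits it. A hypothetical Windows command-prompt session (the VS path and solution name are just examples):

```shell
:: launch VS with native-image (NGen) loading disabled, forcing the JIT
set COMPLUS_ZapDisable=1
"%ProgramFiles%\Microsoft Visual Studio 9.0\Common7\IDE\devenv.exe" MySolution.sln

:: step 2 is done in the IDE: Project Properties > Debug >
:: uncheck "Enable the Visual Studio hosting process"
```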


    Another tip comes from our VB IDE dev Jared Parsons: Disabling JIT optimizations while debugging. Essentially, Jared suggests creating an .ini file with the same name as the application's .exe (e.g. MyApp.ini next to MyApp.exe) with the following content:

    [.NET Framework Debugging Control]
    GenerateTrackingInfo=1
    AllowOptimize=0
    He also points to the MSDN article http://msdn.microsoft.com/en-us/library/9dd8z24x.aspx (Making an Image Easier to Debug).

    To be frank, this tip didn't work for me for some reason, but I guess it's still worth mentioning.

    Hope this helps!

  • Kirill Osenkov

    Call Hierarchy Navigation in Visual Studio 2010


    We're currently designing a new IDE feature named Call Hierarchy. Essentially, it allows you to find places where a given method is called, which is similar to how Find All References currently works. However, unlike Find All References, Call Hierarchy provides a deeper understanding of the code and more detailed information about the calls.


    You can invoke the Call Hierarchy toolwindow by right-clicking on a method, property or constructor name in the code editor and choosing View Call Hierarchy from the context menu:


    Tool window

    A toolwindow will appear docked on the bottom of the Visual Studio window:


    You can expand the node for the method to see information about it: incoming calls to the method ("Calls To") and outgoing calls ("Calls From"):


    Here's how it works. A method (or a property, or a constructor) is displayed as a root in the treeview. You can expand the node to get a list of "search categories" - things you want to find. Four search categories are currently supported:

    1. Calls To - "incoming" calls to this member
    2. Calls From - "outgoing" calls mentioned in this member's body
    3. Overrides - available only for abstract or virtual members
    4. Implements - finds places where an interface member is implemented

    When you expand a search node (such as Calls To 'GetCallableMethods'), a solution-wide search is started in the background and the results appear under the Calls To folder. You can click on a result, and the details will appear in the Details list view on the right hand side.

    The Details list view shows all the exact call sites and locations in code where GetCallableMethods is called from GenerateXsdForComplexTypes. We see that the method is being called only once, the line of code is shown, as well as file name and position in the file. Double-clicking on that call site will navigate to it in the code editor.

    The advantage of Call Hierarchy compared to Find All References is that it lets you explore and drill multiple levels deep into the call graph (find a caller's caller, and so on). Call Hierarchy also has a deeper, finer-grained understanding of the source code - while Find All References just finds the symbols, Call Hierarchy differentiates abstract and virtual methods, interface implementations, actual calls from delegate creation expressions, etc. It also works like a scratch pad: you can add any member as another root-level item in the Call Hierarchy tool window and have several members displayed there at once. Finally, the Details Pane gives information about the concrete call sites if a method is called several times in the body of the calling method.


    In the toolbar you can select the scope of the search: search in currently opened file only, current project or the entire solution.

    The Refresh button re-fills the treeview in case the original source code was modified.

    If a root node of the treeview is selected, the "Delete Root" button will remove it from the treeview. You can also add any member as a new root in the treeview by right-clicking on it and choosing the command from the context menu:


    or adding it from the source code as described in the beginning.

    Finally, the Toggle Details Pane button shows or hides the details pane.

    Some design issues and implementation details

    Although the feature is already implemented (if you have the Visual Studio 2010 CTP, you can already play with Call Hierarchy), we're still not quite happy with the current UI design, usability and user experience.

    For example, one issue that we're seeing is that it takes 2 mouse clicks and 2 mouse moves to invoke the Find All References search, but it takes 4 mouse clicks and 4 mouse moves to get the callers list for a given method (1 click - menu invocation, 1 click - menu item selection, 1 click - expand the treeview node for the method, 1 click - expand the "Calls To" folder). Although the search itself will be slightly faster than Find All References, the perceived complexity of invoking the feature is something we definitely want to improve. We want this feature to be at least as good and usable as Find All References, but also provide additional benefits, otherwise people will just keep using Find All References and never touch the new feature.

    I think I'll stop for now and see what kind of feedback you guys might have about this. In the next blog post I plan to share more of our current thinking and what we'd like to change. For now, I'd be really interested to know what you think and if you have any suggestions or ideas. Now it's not too late, and we can change the feature based on your feedback.

  • Kirill Osenkov

    Reflections on coding style


    I've been monitoring my coding style for quite a while and I noticed that I often prefer chaining method calls to normal control flow. Now I think it might not always be such a good idea.

    To give you an example, today I had a task of reflecting over constants in a class and stuffing them into a dictionary. Here's a sample to demonstrate my original approach. In this sample I replaced the actual domain model with a fake colors domain and the task is to implement two methods: FillColors and OutputColors.

    using System;
    using System.Collections.Generic;

    static class Program
    {
        public const string Red = "#FF0000";
        public const string Green = "#00FF00";
        public const string Blue = "#0000FF";

        static void Main(string[] args)
        {
            var dictionary = new Dictionary<string, string>();
            FillColors(dictionary.Add);
            OutputColors(dictionary);
        }

        static void FillColors(Action<string, string> add)
        {
            var fields = typeof(Program).GetFields();
            fields.ForEach(f => add(f.Name, f.GetRawConstantValue().ToString()));
        }

        static void OutputColors(Dictionary<string, string> dictionary)
        {
            dictionary.ForEach(p => Console.WriteLine(p.Key + " = " + p.Value));
        }

        static void ForEach<T>(this IEnumerable<T> list, Action<T> action)
        {
            foreach (var item in list)
                action(item);
        }
    }
    First, several interesting comments about the code.

    • constants are public because otherwise Type.GetFields won't find them. For the life of me, I couldn't figure out how the overload of GetFields that accepts BindingFlags works. I didn't want to spend more than 2 minutes on that, so I just made the constants public.
    • there are no more access modifiers, because members are private by default and top-level types are internal by default if the access modifier is missing. Jon Skeet wrote in his C# in Depth book that he prefers omitting the modifier if it corresponds with the default visibility and I really like this style. Makes the programs shorter. Short programs are especially useful for us testers, where we need to find a minimal repro for bugs that we report.
    • FillColors accepts a delegate "add" which I like calling "the collector". It could have accepted the entire dictionary, but I only use the add functionality so to minimize coupling I only require what I actually consume. An alternative here would be to implement an iterator using yield return, but this would require returning a list of pairs of values, and Tuples were not in our TFS branch yet at the moment of writing :) Another reason for "the collector" pattern is that yield return doesn't compose well - flattening lists like Comega does is still not in C# as of 4.0. Contrary to that, passing a delegate (Continuation-Passing Style) composes nicely and is fully supported starting from C# 1.0.
    • OutputColors doesn't accept IEnumerable<KeyValuePair<string, string>> because I was too lazy to spell that out, and it is not relevant to the current discussion :)
    • Also OutputColors doesn't accept Action<string> and hardcodes Console.WriteLine because of the KISS/YAGNI principle.
    • I don't know why KISS/YAGNI didn't apply to the FillColors method. It just didn't.
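
As an aside, the GetFields(BindingFlags) overload from the first bullet is less scary once you remember that constants are static fields. A sketch (the Colors class is a made-up example): ask for Static fields, both public and non-public.

```csharp
using System;
using System.Reflection;

class Colors
{
    const string Red = "#FF0000";        // private by default
    public const string Green = "#00FF00";
}

class Demo
{
    static void Main()
    {
        // Constants are static, so request Static | Public | NonPublic
        // (add BindingFlags.FlattenHierarchy to also see constants from base types).
        FieldInfo[] fields = typeof(Colors).GetFields(
            BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);

        foreach (var f in fields)
            Console.WriteLine(f.Name + " = " + f.GetRawConstantValue());
    }
}
```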

    But I digress. Let's go back to the actual topic of me writing one long line:

    fields.ForEach(f => add(f.Name, f.GetRawConstantValue().ToString()));

    instead of:

    foreach (var f in fields)
        add(f.Name, f.GetRawConstantValue().ToString());

    Here are several observations that favor foreach over ForEach:

    ForEach is not in the library

    As of .NET 3.5 SP1, ForEach is not in the library, so I had to implement it myself. For .NET 4.0, I've logged a suggestion with Mads to add it to the BCL, but I'm not sure if we're adding it. A lot of people seem to add it to their own libraries, so I think it would be useful to have regardless of the coding style it encourages. I'm sure it is useful at least sometimes.

    ForEach hides state mutation

    Eric Lippert told me that he's not a great fan of ForEach because it conceals state mutation - it takes an eager (non-lazy) control flow statement that mutates state and hides it behind a functional-looking call.

    You can't put a breakpoint inside ForEach "body"

    If you use foreach, you can put a breakpoint on the add call, whereas with ForEach you can't easily break inside the lambda body. This is not a language problem but rather a tooling issue, and still I don't see tools (yes, debugger, I'm looking at you!) becoming comfortable enough in this scenario anytime soon.

    Stack is polluted

    In the first case (ForEach), two additional frames are on the stack which looks kinda ugly during debugging:


    vs. the original:


    This doesn't exactly help the debugging story.

    Performance is worse

    Although I know that micro-managing performance is a bad thing to do, we still create an anonymous method (metadata bloat) and add two extra calls on the stack (the JIT will probably inline them, but who knows). So the danger is that in critical codepaths (bottlenecks), performance will degrade compared to plain control flow (which the JIT compiles into a couple of nice, fast JMP instructions).
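
To put rough numbers on that worry, here's a minimal Stopwatch micro-benchmark sketch (with all the usual caveats about micro-benchmarks - warm-up, JIT, and machine noise all matter):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class PerfSketch
{
    public static void ForEach<T>(this IEnumerable<T> list, Action<T> action)
    {
        foreach (var item in list)
            action(item);
    }

    static void Main()
    {
        var data = new List<int>();
        for (int i = 0; i < 1000000; i++)
            data.Add(i);

        // Cast to IEnumerable<int> so the extension method is measured,
        // not the List<T>.ForEach instance method.
        IEnumerable<int> seq = data;

        long sum = 0;
        var sw = Stopwatch.StartNew();
        foreach (var n in seq) sum += n;
        Console.WriteLine("foreach: " + sw.ElapsedMilliseconds + " ms (sum " + sum + ")");

        sum = 0;
        sw.Restart();
        seq.ForEach(n => sum += n);
        Console.WriteLine("ForEach: " + sw.ElapsedMilliseconds + " ms (sum " + sum + ")");
    }
}
```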

    Readability suffers

    Readability is probably worse because ForEach is not as familiar as foreach. Also, lines of code tend to grow long with the first approach, and I like to keep my lines under 80 characters.

    ForEach is NOT functional style

    I'd like to stress that I'm still a huge fan of functional style, immutability, composability, first-class functions, etc. The ForEach style described in this blog-post is NOT functional style, although it definitely looks like it.

    1. It is NOT immutable
    2. It is NOT idempotent
    3. It does not chain as Kevin mentions
    4. It is NOT lazy and doesn't have deferred execution like many LINQ operators which are truly functional

    We had a hallway conversation about this coding style topic with Chris Smith from the F# team today, and even he (a functional programming guy and a huge F# fan) admitted that C# code is sometimes better off using C#'s native constructs instead of borrowing from other (albeit awesome) languages like F#.


    So I guess I'm going back to using good old control flow statements more, as they deserve it. They have been around for 30 years for a good reason, and I don't think they're going away anytime soon.

  • Kirill Osenkov




    Goodbye productivity...

  • Kirill Osenkov

    New IDE features in VS 2010


    We have announced a list of new IDE features for managed languages in Visual Studio 2010:


  • Kirill Osenkov

    F# No Longer Vaporware


    Chris Smith from the F# team has recently posted some "interesting" news:


  • Kirill Osenkov

    Webcast on Visual Studio Tips & Tricks


    Mike Benkovich from http://benkotips.com and my fellow C# QA team member Eric Maino have talked me into doing a webcast on Visual Studio Tips & Tricks. Mike was the host, and Eric and I were showing VS features that we like and use often in our day-to-day coding.

    Here's a list of tips that Eric and I were showing:


    And here's the webcast itself:


    Hope you like it!

  • Kirill Osenkov

    How to debug crashes and hangs


    At my job on the C# IDE QA team I've learned some useful things about debugging in Visual Studio, which I'd like to summarize in this post. Although the screenshots were made using Visual Studio 2008 SP1, this pretty much applies to other versions of VS as well.

    Rich debugging support

    When you develop your C# application and hit F5, the target process (your program) gets started, and then the Visual Studio process attaches the debugger to the process where your code is running. This way, you can break into the debugger and VS will provide you with all sorts of rich debugging support - current statement highlighting, call stack, watches, locals, immediate window, Edit-and-Continue and so on.

    More importantly, if your application throws an exception or crashes, the debugger will intercept that and provide you with all the information about the exception.

    As a side note, in Visual Studio there is a way to run your code without attaching the debugger - the shortcut is Ctrl+F5. Try throwing an exception in your code when using F5 and Ctrl+F5 to feel the difference.

    throw null;

    By the way, my favorite way to artificially throw an exception is throw null; - I just love the fact that it throws a NullReferenceException because it can't find the exception object, and nevertheless does exactly what I want it to do :)
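
A quick sketch to see it in action:

```csharp
using System;

class ThrowNullDemo
{
    static void Main()
    {
        try
        {
            throw null; // there is no exception object to throw, so the
                        // runtime raises a NullReferenceException instead
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("Caught NullReferenceException, as expected");
        }
    }
}
```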

    Crashes and the Watson dialog

    What if a program crashes or throws an exception, which you don't have source code for? Moreover, you didn't start the program using F5, but the operating system launched the process. I remember that before coming to Microsoft, the only thing I could do about some application crashing was to express my disappointment about the fact (usually in Russian). Now I don't feel helpless anymore, because I've learned a couple of tricks. As an example for this we'll crash Visual Studio itself and then debug the crash.

    How to crash Visual Studio?

    Viacheslav Ivanov reported an interesting crashing bug in our language service recently. Save all your work and then paste this code in a C# Console Application and change 'object' to 'int':

    using System;

    static class Program
    {
        static void Main()
        {
            ITest<object> test;
            test.Test((object /* and now change the argument type to "int" */ i) => { });
        }
    }

    public interface ITest<T> { }

    public static class Extensions
    {
        public static void Test<T, I>(this ITest<T> test, Action<ITest<I>> action) { }
    }

    What you will see is the Watson dialog:


    Given this chance, I'd highly recommend that everyone click "Send Error Report" if you ever see this dialog for Visual Studio. Many people click "Don't Send", and frankly I don't understand why not. There is no personal information being sent, and we don't want your personal information anyway, honest. What we want is a call stack and a minidump, if possible, so if you want us to fix the product and make it more stable, you will greatly help us by sending the Error Report. By the way, we usually fix most (if not all) crashes that come in through this dialog, so the chances that we'll fix a crash you report this way are actually pretty high. For example, we've already fixed the bug mentioned above, and it works just fine in current builds.

    Attaching a debugger

    So what can you do if an application crashes or hangs? You can attach the debugger to a running process, even if it has already crashed. The code is still being executed (the main thread of the crashed application is usually pumping messages for the error dialog). You can either choose "Debug" on the Watson dialog, or (what I usually do) start a new instance of Visual Studio and attach to the process manually, without dismissing the Watson dialog.

    Note: if the debuggee process crashes and you attach the debugger after the fact, you'll have to break into the debugger manually by pushing the "Pause" button. If the debugger was already attached at the moment of the crash, it will offer you the choice to break or continue.

    You can attach the Visual Studio debugger to a running process by choosing Tools | Attach To Process (Ctrl+Alt+P):


    You will see the Attach to process dialog:

    An interesting thing to note here is the Attach to: selection. Since Visual Studio is a mixed-mode managed/native application, to get call stacks for both the managed and native parts you'd want to attach to both (by clicking Select...):


    If you select both Managed and Native, you'll get richer debugging information about your callstacks - this is recommended.

    Note: if you want to enable mixed-mode debugging (managed + native) for an application that you have source code for, go to the project properties of the startup project and, on the Debug tab, select "Enable unmanaged code debugging". The debugger will then automatically attach in mixed mode.

    Finally, select the process you'd like to attach to from the list (in our example, devenv.exe) and click Attach. Notice that the debugger's own process is not shown in the list - that's why you don't see two devenv.exe entries.

    Remote Debugging

    What many people don't know is that VS can be used to debug a process running on another machine on the network. To do this, you just need to start the Visual Studio Remote Debugging Monitor on the machine where the process you want to debug is running:


    The remote debugging monitor will listen for debugger connections on the other machine, and you'll be able to attach the debugger using the Transport and Qualifier fields on the Attach to process dialog:


    You can find more detailed information about Remote Debugging in MSDN and on the internet.

    Set Enable Just My Code to false

    One very important option in Visual Studio is "Enable Just My Code", which is set to true by default. To be able to see more information on the callstacks instead of just "Non-user code", you need to go to Tools | Options and disable this option:


    I usually do this right after I install Visual Studio, so that I can always debug into "not my code".

    Other interesting options on this page are:

    • Enable .NET Framework source stepping - in case you'd like to step into the .NET framework source code
    • Enable Source Server support
    • Step over properties and operators (Managed only) - won't step into property getters, operators, etc. - this is new in VS 2008 SP1
    • Require source files to exactly match the original version - in case you don't have the source files for the exact .pdb symbol file, but still have a close version of the source code

    Break on first chance exceptions

    One very useful option in the debugger is the ability to break whenever a first-chance exception is thrown. A first-chance exception is an exception that might be caught by the program itself later in some surrounding catch block. First-chance exceptions are usually non-fatal and handled (or swallowed) by the program, so they might not even be visible to the end user during normal execution. However, if a first-chance exception is not handled in the code and bubbles up to the CLR/OS, it becomes a crash.
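
Here's a minimal sketch of a first-chance exception that the end user never sees. With breaking on CLR exceptions enabled, the debugger stops right at the int.Parse line, with the throwing frame still on the stack:

```csharp
using System;

class FirstChanceDemo
{
    static void Main()
    {
        try
        {
            // This throws a FormatException - a first-chance exception,
            // because the catch block below handles it.
            int.Parse("not a number");
        }
        catch (FormatException)
        {
            // swallowed: invisible during normal execution
        }
        Console.WriteLine("Program continues normally");
    }
}
```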

    So, to break on first-chance exceptions, you can go to Debug | Exceptions to invoke the Exceptions dialog:


    Here you can put a checkmark on Common Language Runtime Exceptions for the debugger to break every time a managed exception is thrown. This way you will see more hidden exceptions, some of them originating deep in the .NET framework class library. Sometimes there are so many first-chance exceptions that you can become overwhelmed, but they are incredibly useful for getting to the root cause of a problem, because the debugger shows exactly where the exception originated, preserving the original surrounding context. Another great advantage of breaking on first-chance exceptions is that the call stack has not been unwound yet and the problem frame is still on the stack.

    Debugging tool windows

    OK, so now that we know how to attach a debugger and how to set the debugger options, let's see what happens after we've attached a debugger. For our exercise, you can open two instances of Visual Studio, attach the second instance's debugger to the first one, and then crash the first one using the lambda-expression crash code above. Instead of the Watson dialog on the debuggee process, you will see the following window in the debugger:


    Now you have the chance to break and see where the exception happened. Continue is useful if the exception is actually a first-chance exception and you'd like to pass it on to user code to handle it. Let's hit Break and examine what tool windows are available under debugger.

    Processes window

    All the debugger windows are available from menu Debug | Windows. The Processes window shows the list of processes that the debugger is currently attached to. A nice trick is that you can actually attach to multiple processes at the same time. Then you can use this window to switch the "current" debuggee process and all other tool windows will update to show the content of the "current" process.


    Note: on our team, we use this window a lot, because our tests run out-of-process. Our test process starts Visual Studio in a separate process and automates it using DTE, remoting and other technologies. Once there is a failure in the test, it's useful to attach to both the test process and the Visual Studio process under test. The most fun part is when I was debugging a crash in the debugger, and attached the debugger to the debugger process that is attached to some other process. Now if the debugger debugger crashes on you on the same bug, then you're in trouble ;) Sometimes I definitely should blog more about the fun debugging stories from my day-to-day job. But I digress.


    Threads window

    As we all know, processes have multiple threads. If you break into a debuggee process, you will most likely see a list of threads that were active at the moment you broke. The main thread is the green one - this is the UI thread of your application (in our example, Visual Studio). The main thread executes the application's message loop and pumps the windows messages for the application's UI. If a message box or a dialog is shown, you will see that dialog's message loop on the main thread.


    To switch threads, double-click the thread name that you're interested in. In most of the cases you'll be interested in the main thread. But if you start your own threads, give them a name so that they are easy to find in the threads list.
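
Naming a thread is a one-liner (the thread body and name below are made up):

```csharp
using System;
using System.Threading;

class ThreadNamingDemo
{
    static void Main()
    {
        var worker = new Thread(() => Thread.Sleep(100));
        worker.Name = "Background indexer"; // this is the name shown in the Threads window
        worker.Start();
        worker.Join();
        Console.WriteLine("Finished: " + worker.Name);
    }
}
```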

    Call stack

    Every thread has a call stack. The call stack is probably the most important thing you'd like to know about a crash: what sequence of function calls led to it? If the program is hanging, you'd like to know which function it is hanging in and how it got there. Oftentimes, just glancing at a call stack tells you what's going on. "Is this callstack familiar?" or "Who is on the callstack?" is probably the question we ask most often during the bug triage process.

    Anyway, here's the call stack window:


    In our example we see that the C# language service module is on the stack (.dlls and .exes are called "modules" in the debugging world).

    However, instead of function names from cslangsvc.dll, we see the addresses of the procedures in memory. This is because the symbols for the cslangsvc.dll module are not loaded. We'll look at how to load symbols in a moment.


    Modules window

    The Modules window shows a list of .dlls and .exes loaded into the debuggee process:


    There are multiple ways to load symbols for a given module. You can right-click on a module for a list of options:


    Symbols for modules are stored in .pdb files, which are produced with every debug build of a binary. A .pdb file contains a mapping from the compiled binary back to the original source code, so that the debugger can display rich information (function names, source code locations, etc.) for the binary being debugged. Without symbols, debugging is only possible at the assembly level (the disassembly and registers windows); you can't map the binaries back to the source code.


    A very useful dialog is Tools | Options | Debugging | Symbols:


    Here you can set the paths where the debugger looks for .pdb files. Normally, the .pdb files sit right next to the binaries, in which case they are found and loaded automatically. Loading symbols takes some time, so Visual Studio supports caching symbol files to a directory. Also, if you don't want all the symbols for all the binaries loaded (it can take a while), you can check "Search the above locations only when symbols are loaded manually". "Load symbols from Microsoft symbol servers" provides symbols for Microsoft products, such as Windows and Visual Studio. You can load symbols from here, or from the Modules or Call Stack windows, by right-clicking a module and choosing Load Symbols From. Since we're debugging into Visual Studio, the symbols are located on the Microsoft symbol servers:


    When we load public symbols, the following little dialog is shown:


    After we've loaded the symbols for cslangsvc.dll we notice that the Call Stack window is now much more informative:


    Given this call stack, any of our developers will now easily be able to pinpoint the problem. In our example, we see that the problem happens when we try to show the Smart Tag for the Generate Method refactoring:


    As you can see, there are more modules for which the symbols haven't been loaded yet. You can load symbols for those modules by right-clicking their stack frames and choosing Load Symbols From. Loading all the symbols is usually recommended for more detailed information.

    That is why the call stack is so precious in bug reports. To save the call stack, press Ctrl+A and then Ctrl+C in the Call Stack window to copy the whole stack to the clipboard. When you click Send Error Report in the Watson dialog, the call stack is included in the error report.

    Also, you can see that the call stack is not very useful without the symbols - it's the symbols plus the call stack that provide rich information about the crash.


    Another very useful piece of information you can provide about a crashed process is a memory dump (heap dump, minidump) - basically a snapshot of the memory state of the crashed process. To create a minidump, go to Debug | Save Dump As. Visual Studio will offer to save the dump to a location on disk, with an option to save just the Minidump or the Minidump with Heap:


    Minidump with Heap saves more information than just the minidump, but takes a whole lot more disk space (for Visual Studio, possibly hundreds of megabytes, depending on the process working set size at the time of the crash).

    Those are the three components that we usually send to our developers during crash bug reporting:

    1. Call stack
    2. Symbols
    3. Minidump with heap

    In most cases, they are able to understand the problem given this information. A list of repro steps is usually even better, because then they can reproduce the problem themselves and enable first-chance exceptions to break right in the problem area.

    Debugging hangs

    If an application hangs, you can attach the debugger to it and hit Break All to see which thread and which call stack are blocking execution. Usually, looking at the call stack gives you some clues about what is hanging the application. Otherwise, debugging hangs is no different from debugging crashes.

    Microsoft shares customer pain

    Finally, I'd recommend everyone to watch this video:


    In this post, I've talked about some things worth knowing for effective debugging sessions:

    • Turning off "Enable just my code"
    • Attaching the debugger using Tools | Attach To Process
    • Selecting Managed, Native for mixed-mode debugging or Enable unmanaged code debugging
    • Attaching to multiple processes
    • Selecting processes and threads
    • Breaking on first-chance exceptions using the Debug | Exceptions dialog
    • Picking the right thread
    • Loading symbols
    • Viewing the call stack
    • Saving the minidump file

    Do let me know if you have feedback or corrections about this information.

    kick it on DotNetKicks.com
  • Kirill Osenkov

    A one-line program to count lines of code


    I wanted to sum the total lines of code in the files in a given folder. I thought that writing my own program to do this would be faster than looking for one on the internet, so here's what I came up with (one statement broken across multiple lines to fit into your blog reader):

    using System;
    using System.IO;
    using System.Linq;
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(
                Directory.GetFiles(
                        Environment.CurrentDirectory, "*",
                        string.Join(" ", args).Contains("/s")
                            ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly)
                    .Select(f => File.ReadAllLines(f).Length)
                    .Sum());
        }
    }

    Just name the executable loc.exe and put it into your PATH - you're good to go. Input "loc" in the command prompt to get the total number of LOC in the current directory, and "loc /s" to do recursive search.

    Please note that the way I wrote this program is not very good for debugging, because you can't put a breakpoint on sub-statements (technically speaking, this program consists of only one statement). In production code, I would rather write something like this:

    string path = Environment.CurrentDirectory;
    string pattern = "*";
    string commandLine = string.Join(" ", args);
    SearchOption searchRecursively = commandLine.Contains("/s")
        ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
    string[] files = Directory.GetFiles(path, pattern, searchRecursively);
    IEnumerable<int> lengths = files.Select(f => File.ReadAllLines(f).Length);
    int totalLOC = lengths.Sum();

    because it better conveys the step-by-step actions and allows you to put a breakpoint on any step. However, for my little program, my sense of style guided me to use the former notation.

    As another side note, the "production version" doesn't use var for type inference. I think it improves readability in this case.
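    For comparison, here is the same sequence written with var - a variant I'm adding purely to illustrate the tradeoff. The types are all unambiguous to the compiler; the question is whether they are unambiguous to the reader:

```csharp
using System;
using System.IO;
using System.Linq;

class LocWithVar
{
    static void Main(string[] args)
    {
        // Same logic as the explicit version, with var everywhere
        var path = Environment.CurrentDirectory;
        var pattern = "*";
        var commandLine = string.Join(" ", args);
        var searchOption = commandLine.Contains("/s")
            ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
        var files = Directory.GetFiles(path, pattern, searchOption);
        var lengths = files.Select(f => File.ReadAllLines(f).Length);
        var totalLOC = lengths.Sum();
        Console.WriteLine(totalLOC);
    }
}
```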

  • Kirill Osenkov

    Links about Visual Studio 2010 and C# 4.0


    A while ago we announced Visual Studio 2010 and C# 4.0. In case you'd like to catch up and read articles or watch videos about the upcoming new features, I wanted to gather some links here.


    TechEd EMEA 2008 (Barcelona)

    C# 4.0 Language

    As you probably know by now, C# 4 is mostly focused on interoperability: better COM interop, better interop with dynamic languages (like Python), better interop with the browser (Silverlight, JavaScript) and better Office interop. Briefly, the new language features are:

    1. Dynamic - "late binding" in C# - a way to delay binding and code execution to runtime
    2. Variance - generic co- and contravariance
    3. Named and Optional parameters - better parity with VB, no more "ref missing", more readable code with named parameters
    4. COM interop support - omitting "ref" for COM calls, NoPIA (embedding Primary Interop Assemblies directly into your assembly to ease deployment) and some others
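    To make a couple of these concrete, here is a small sketch of my own (not official sample code) showing named/optional parameters, dynamic, and generic covariance:

```csharp
using System;
using System.Collections.Generic;

class CSharp4Sketch
{
    // Named and optional parameters: 'overwrite' and 'backup' have defaults,
    // so callers can omit them or pass them by name.
    static void Save(string path, bool overwrite = false, bool backup = true)
    {
        Console.WriteLine("{0}: overwrite={1}, backup={2}", path, overwrite, backup);
    }

    static void Main()
    {
        Save("a.txt", backup: false); // skip 'overwrite', name 'backup'

        // dynamic: member lookup is deferred until runtime
        dynamic s = "hello";
        Console.WriteLine(s.Length); // resolved at runtime, prints 5

        // Covariance: IEnumerable<string> now converts to IEnumerable<object>
        IEnumerable<object> items = new List<string> { "a", "b" };
        foreach (var item in items)
        {
            Console.WriteLine(item);
        }
    }
}
```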

    What to read about C# 4 language features? Well, first and foremost, there is always Charlie's blog that accumulates all C# news:

    C# compiler developer's blogs are a terrific technical resource - highly recommended:

    C# IDE improvements

    As far as I can tell, we don't yet have a comprehensive resource about the new features in the C# IDE. Well, let me spill the beans and briefly mention them here, before I dive into the details in my upcoming posts.

    1. Call Hierarchy - for a method or property, lets you view Calls To (callers), Calls From (callees), Overrides (if applicable) and Implementations (if applicable).
    2. Quick Symbol Search - a very simple dialog with a textbox and a listbox that allows you to type a symbol's name or part of it (any type or member) and hit Enter to navigate to it quickly. Ctrl+, is the shortcut that you can reconfigure if necessary.
    3. Generate From Usage - just as we could generate a method stub by pressing Ctrl+. on a Smart Tag under an undefined method call, now you can generate a new class for an undefined class, a constructor for an unknown constructor, a property, an enum, etc.
    4. Highlight References - when a cursor is on a symbol, all the references to this symbol are highlighted in the currently opened file.
    5. Many other miscellaneous improvements, such as a better background compiler ("Live Squiggles"), XML doc rename refactoring, etc.

    .NET Framework 4.0

    Justin van Patten has a great overview of the upcoming new features in .NET 4.0. It is worth mentioning that 4.0 will ship a new CLR version 4.0 (2.0, 3.0 and 3.5 all ran on CLR version 2.0). Most notably, .NET will introduce tuples, code contracts, parallel extensions, variance annotations and a whole lot more.
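    For example, a tuple (one of the new BCL additions; this is my own minimal sketch) looks like this:

```csharp
using System;

class TupleSketch
{
    static void Main()
    {
        // System.Tuple is new in .NET 4.0 - before that you had to
        // define your own pair/triple types
        Tuple<int, string> pair = Tuple.Create(1, "one");
        Console.WriteLine("{0} -> {1}", pair.Item1, pair.Item2); // prints "1 -> one"
    }
}
```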

    Other Visual Studio improvements

    Of course, I'm only scratching the surface of what the next Visual Studio will bring - many teams across the division worked hard and implemented a lot of cool stuff (e.g. Architecture Explorer, Sequence Diagrams, etc). I'm only mentioning some features that I personally find really interesting as a member of the C# team.

    • New Editor - the Visual Studio code editor has been fully rewritten from scratch using C#, WPF and MEF. It shims the old editor interfaces and pretends to behave just like the old editor, but it also exposes a nice new managed API that you can program against. A huge advantage of the New Editor is flexibility and extensibility - you can do really weird things with it, given the fact that it's WPF and allows any UIElement to be placed on it.
    • WPF Shell - the Visual Studio UI (Shell, menus, toolwindows, toolbars) has been rewritten using WPF as well, to allow customization. Not all of the UI has been converted, so there are some "islands of hWnd" floating in WinFormsHost, which will be converted in future versions.
    • Back-in-time debugger - the Debugger team introduced a new Historical Debugger, which records the program's execution and allows you to rollback the program's state (callstack, variables, etc.) to an earlier point in time.

    In my next blog posts, I will start talking more about the upcoming new features (especially the Call Hierarchy one - since I was the one who tested it).

    kick it on DotNetKicks.com

  • Kirill Osenkov



    Static analysis tools allow us to measure code quality and better understand the design and architecture of software. There are things about source code that are not visible to the eye in our day-to-day work: dependencies between components, cyclomatic complexity of methods, hidden and indirect dependencies. There is a danger that a method or a type becomes overly complex and turns into a maintenance nightmare without us even noticing.

    Today’s post is about NDepend, a static analysis tool developed by Patrick Smacchia. NDepend measures software quality by extracting metrics, gathering and reporting statistics and visualizing architecture.

    Here’s what NDepend does:

    • It analyzes .NET assemblies and provides all conceivable information about them, both in a report and interactively
    • It reports various metrics about assemblies, namespaces, types and members
    • It treats the source code as a database and provides a powerful Code Querying Language (CQL) that allows you to write custom SQL-like queries against your codebase
    • It allows navigating the codebase using links, references, usage graphs, call graphs, etc.
    • Unlike some other static analysis tools, NDepend provides rich interactive visualizations of your architecture, that allow you to visually understand layering, distribution of code, code sizes, relationships and dependencies
    • NDepend has an extensive set of Queries and Constraints (compare with FxCop rules) that run against compiled binaries and provide suggestions on how to improve your code (“Top 10 complex methods that need refactoring”, “All methods that shouldn’t be public and can be made internal”, etc.)
    • You can run all these queries to get a full analysis report about your software

    Since both Patrick and I like static analysis and are interested in measuring software quality, Patrick suggested that I try out NDepend on one of my projects and share my experiences - and I gladly agreed. I'd heard about NDepend long ago; even before I joined Microsoft, I blogged about static analysis tools and publicly promised to take a look at NDepend someday, since it seemed a very interesting tool with unique capabilities. Well, now that Patrick has contacted me himself :), there could be no better opportunity to download NDepend and give it a try. Here are my first impressions.

    I went to www.ndepend.com and started looking around. I liked the site, and the fact that there were screenshots and some nice 3-minute overview videos. Seconds after clicking the big download button, NDepend was on my desktop. It comes with two executables - the command-line NDepend.Console.exe (for build automation) and the UI, VisualNDepend.exe. Of course, I started VisualNDepend first, and it presented me with a nice start page:


    The next thing is to create a new NDepend project and add the assemblies to it that you want to analyze.

    1. The first thing you have to do is create a project and add the .dlls. Make sure the .pdb files are next to them, so that NDepend can locate the source files and provide additional goodness. If the .pdb files point to a non-existent source location, you can use the source file rebasing feature in Project Properties -> Analysis.
    2. I loaded my 6-assembly StructuredEditor project and it automatically displayed the dependent .NET framework .dlls:
    3. The next step was easy to find as well: the Big Green Run Button to start the analysis:

    After the analysis ran, it displayed a very comprehensive report about my project. Even glancing over this report already provided a lot of useful information to me:

    1. I was pleased to see that my overall code quality is pretty good :)
    2. It was good to know some quick metrics (number of lines of code, comments, IL instructions, assemblies, types, abstract classes, interfaces, value types, exception classes, attribute classes, delegate classes, percentage of public types and methods, etc.)
    3. It provided a nice visual view of all assemblies, types and members sorted by code size:
    4. I was pleased to see my code in the green area of Abstractness vs. Instability (green is good!):
    5. NDepend visualized the assembly reference graph for me:
      This is really great - I had this graph in mind before, but I'd never seen a tool draw it for me automatically; it's as if it was reading the architecture from my thoughts.
    6. Finally, it provided a long list of CQL Queries and Constraints (you have to see for yourself, I won’t post it here)
    7. Just reading the rules can be as useful as reading Framework Design Guidelines. Just for example, I learned this:
      // <Name>Attribute classes should be sealed</Name>
      WARN IF Count > 0 IN SELECT TYPES WHERE IsAttributeClass
      AND !IsSealed AND !IsAbstract AND IsPublic AND !IsInFrameworkAssembly
      // The .NET Framework class library provides methods for retrieving custom attributes. 
      // By default, these methods search the attribute inheritance hierarchy; for example 
      // System.Attribute.GetCustomAttribute searches for the specified attribute type, or 
      // any attribute type that extends the specified attribute type. Sealing the attribute 
      // eliminates the search through the inheritance hierarchy, and can improve performance.
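    Following that rule is a one-word change. A minimal sketch (the attribute name here is made up for illustration):

```csharp
using System;

// Sealing the attribute lets Attribute.GetCustomAttribute skip the
// search through the inheritance hierarchy, as the CQL comment explains.
[AttributeUsage(AttributeTargets.Class)]
sealed class SampleMarkerAttribute : Attribute
{
}

[SampleMarker]
class Annotated
{
}
```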

    After examining the report, I came back to the main application to find that it presents all the information contained in the report, but interactively:


    I have to say that the UI looks great, but at first it seems rather overwhelming - I felt like I was sitting in a Boeing cockpit. But hey - you can't expect a complex static analysis tool to have a notepad-like interface - it has to expose its features somehow. Surprisingly, you get used to the UI really quickly, and I found my way around easily.

    All you have are the following things:

    1. Class Browser – displays all assemblies, namespaces, types and members
    2. Metrics – displays a visual map on various levels (types, members, namespaces, etc)
    3. Dependency matrix – helps you analyze dependencies between two code elements
    4. Dependency graph – draws a DAG (remember I was writing about DAG and Topological Sorting?)
    5. CQL Queries – a library of predefined queries over your codebase (and you can write your own!)
    6. Info – information about the selection (updated with focus changes)

    Hopefully this provides a basic overview of NDepend - what it's for, what it does, and its main features. I was very impressed with the tool, really liked it, and look forward to using it more for better architectural insights and better code quality.

  • Kirill Osenkov

    Book review: Essential C# 3.0 by Mark Michaelis


    Sometimes I'm asked what book I'd recommend for learning C#. Until recently, I was hesitant and didn't know what to answer. It depends, among other things, on the following factors:

    • Whether the learner has existing programming background
    • Whether he or she wants to learn the language, or the .NET framework as well
    • Whether the learner is already familiar with C# 1.0 (2.0) and wants to learn primarily about C# 3.0 features, such as LINQ

    Well, now I have a good answer for the majority of cases. Mark Michaelis, a C# MVP, wrote a great book called Essential C# 3.0. It requires pretty much no programming background, so beginners can ramp up easily. One thing that's immediately apparent about Essential C# 3.0 is that the book is very easy to read: you literally swallow page after page, freely and effortlessly. This is one of the major reasons I recommend this book to beginners who ask me where to start.

    The first several chapters introduce basic programming concepts - variables, statements, control flow, methods, and continue gracefully to the .NET type system, classes and objects. It's nice to know that even when you're learning the basics, you're learning the latest version of a modern language and an industry standard.

    The book continues to do a great job at the intermediate level, with chapters about classes, inheritance, interfaces and value types. A great asset is a special chapter called Well-Formed Types, which collects a series of useful information and best practices, such as overriding object members, referencing other assemblies, generating XML comments, finalization and garbage collection etc.

    C# 2.0 features are covered well in a dedicated chapter about Generics and also in other parts of the book (iterators and yield return, anonymous methods).

    Of course, given the book's title, C# 3.0 features and LINQ are introduced and explained well in the following chapters: Delegates and Lambda Expressions, Collection Interfaces with Standard Query Operators, Query Expressions. Several other chapters show more aspects of using the C# language: Events, Exception Handling, Building Custom Collections.

    Finally, an advanced C# user will find a lot of interesting things here as well. I was particularly attracted by a great chapter on Reflection and Attributes, and the two "icing-on-the-cake" chapters: Multithreading and Multithreading patterns. I've learned a lot of new things for myself here. Also, throughout the rest of the book, special "Advanced Topic" fragments provide a lot of value to an experienced reader. For example, I've learned that the new operator generates newobj IL instruction for reference types and initobj for value types - I didn't think about this distinction before.

    To sum up: a great book about C# for every reader - beginner, intermediate or advanced. It's not only about the language - the CLR is also covered pretty well (IL, garbage collection, JIT, AppDomains, reflection, threading). The book doesn't cover any of the applied libraries (such as ASP.NET, ADO.NET, WPF or Silverlight), and even without them it's already more than 700 pages thick. I also really love the formatting, fonts, spacing and illustrations - very ".NET-style".

    A great read and highly recommended.

  • Kirill Osenkov

    Live Geometry project updated to Silverlight 2 final release


    Finally I had time to continue working on my Live Geometry project on CodePlex. While I was busy with other stuff, Silverlight 2 was officially released and the rest of the world has already upgraded from Beta 2 to the release version. Now I'm catching up: I've updated the source code for both the Silverlight (online) and WPF (desktop) editions, so it should build fine. As usual, you can preview the Silverlight (online) edition at:



    As you can see, I started implementing the coordinate grid and axes. I'm planning to implement pan/zoom functionality by dragging the coordinate system and using the mouse wheel. I also need a toggle button to show/hide the grid and axes, and numbers (-2, -1, 0, 1, 2, ...) along the coordinate axes. Then I can start working on plotting function graphs and charts.

    I've also updated the desktop (WPF) version, although I didn't have time to make it look beautiful:


    The beauty of WPF is composability. The toolbar icons use the same logic for drawing points/lines/circles as the main canvas, so the icons are vector graphics and can be scaled if I ever want to make the toolbar buttons larger. Also, the buttons in the toolbar are generated automatically using reflection and WPF data binding: when I add a class for a new tool, I don't need to do anything else - reflection picks it up and creates a toolbar button for it.

    Also, the good news is that the WPF edition shares the same source code with the Silverlight version. In only two or three places did I have to use #if SILVERLIGHT compiler directives, mostly because WPF's VisualTreeHelper.HitTest is called VisualTreeHelper.FindElementsInHostCoordinates in Silverlight (because of slightly different semantics). Andy Beaulieu blogs about this breaking change here: http://www.andybeaulieu.com/Default.aspx?tabid=67&EntryID=95

  • Kirill Osenkov

    Kirill's Visual Studio Tips


    I decided to compile a small list of Visual Studio features that I use a lot.

    Code Editor

    1. Code Snippets for ‘prop’, ‘foreach’, ‘if’, ‘class’, ‘interface’, Tab-Tab completion
    2. Surround with (snippets) - try, if
    3. Ctrl+K,D – FormatDocument
    4. Ctrl+Left/Right arrows in code editor – jump between words faster
    5. Ctrl+K,C – Ctrl+K,U – comment/uncomment selection
    6. Dynamic Snippets in the XML Editor - I bet a lot of people don't know about this one


    Navigation

    1. F12 (Go To Definition) – position the caret on an identifier and hit F12
    2. Shift+F12 (Find All References) – then hit F8 to step through the results in the Find Symbol results window
    3. The "back" thumb button on the Microsoft Explorer mouse navigates back (under the C# profile)
    4. Right-click on a document Tab – Copy Full Path, Open Containing Folder


    Refactoring

    1. Position the caret on an identifier and hit F2 (Refactor -> Rename)
    2. Extract Method (Context Menu)

    Code generation

    1. Generate Method Stub (Smart Tag) - Ctrl+. and enter
    2. WindowsContextMenuButton + O + ENTER + A – quick way to organize, remove and sort usings


    Debugging

    1. Debug -> Exceptions (allows breaking on a first-chance exception)
    2. Right-click on a breakpoint brings up a context-menu which not everyone knows about ;-)
    3. Tools -> Options -> Debugging -> Enable just my code
    4. Attach to process -> Cross Platform Debugging (Managed, Native)
    5. .load sos, !clrstack, !dumpheap, !gcroot, etc.

    Solution Explorer

    1. Tools -> Options -> Projects and Solutions -> Track Active Item in Solution Explorer
    2. Right-click on an item in Solution Explorer – Open Folder in Windows Explorer
    3. Right-click on a project -> Unload -> edit .csproj file in XML

    Macros and automation:

    1. Coding productivity: macros, shortcuts and snippets
    2. How to map a Visual Studio macro to a keyboard shortcut?

    P.S. Sara Ford has much much more in her famous Did you know series.

    kick it on DotNetKicks.com
  • Kirill Osenkov

    An algorithm to solve a matrix game


    There is a nice game called Enlight from FeejoSoft. I have it on my PocketPC phone and for the life of me I just couldn't get past level 33. So yesterday I finally sat down and implemented an algorithm to solve this game automatically.

    Of course, I used C# and Visual Studio (as a tester, I used our latest internal build of Visual Studio 2010, although it will run just fine on 2008 as well).

    The game

    The rules are really simple. There is a 10x10 boolean matrix, and you can invert cells. When you invert a cell, all of its neighbors (usually four) are inverted with it. Given this operation, you are required to nullify the matrix (turn off all the lights):

    FeejoSoft Enlight
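    The toggle rule is easy to express in code; here's a minimal sketch (my own helper, not the game's source):

```csharp
// Inverting a cell also inverts its up/down/left/right neighbors;
// cells on the border simply have fewer neighbors.
static void Invert(bool[,] field, int row, int col)
{
    int rows = field.GetLength(0), cols = field.GetLength(1);
    int[] dr = { 0, -1, 1, 0, 0 }; // the cell itself plus its 4 neighbors
    int[] dc = { 0, 0, 0, -1, 1 };
    for (int i = 0; i < 5; i++)
    {
        int r = row + dr[i], c = col + dc[i];
        if (r >= 0 && r < rows && c >= 0 && c < cols)
        {
            field[r, c] = !field[r, c];
        }
    }
}
```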

    The game is freeware and you can use your favorite search engine to find and download it (e.g. from freewareppc.com, no responsibility for links). If you'd like to run it in an emulator, here are the steps:

    1. On a machine with ActiveSync, install the game executable.
    2. Use dir /s setup*.cab to find the .cab file (usually it will unpack to \Windows\WindowsMobile\TapIsland)
    3. Open Visual Studio and create a new SmartPhone application
    4. Run (F5) to launch the emulator
    5. In the emulator menu, click File -> Configure -> Shared path and specify the path to the .cab file from step 2
    6. Open File Explorer in the emulator and go to the Storage card folder
    7. Install the game by running the .cab file from there
    8. Run the game!

    The algorithm

    The idea I used is commonly applied to such problems (e.g. the eight queens puzzle (http://en.wikipedia.org/wiki/8_queens) is usually solved using depth-first search with backtracking). I encourage you to visit the Wikipedia article - it is a very interesting read.

    We'll visit every cell, column by column. Every cell has its next cell: the cell below it in the next row, or the first cell of the next column to the right. For brute force, you would recurse to visit the next cell, then come back, toggle the current cell and recurse again. This way every cell is tried in both states. However, this would yield 2^100 iterations. We can trim the search considerably by not recursing into knowingly invalid boards, pruning the "phase space" of the problem significantly. Here's the complete code for the solver:

    namespace MatrixGame
    {
        public class Solver
        {
            private Field sourceField;
            private Field solution = new Field();
            private Field currentSolution = new Field();
            private int steps = 0;

            private Solver(Field field)
            {
                this.sourceField = new Field(field);
            }

            public static Field FindSolution(Field field)
            {
                Solver solver = new Solver(field);
                solver.Search(new FieldCoordinates());
                return solver.solution;
            }

            /// <summary>
            /// Main algorithm
            /// </summary>
            void Search(FieldCoordinates current)
            {
                steps++;

                // we traverse the cells column by column,
                // left to right, top to bottom
                var next = current.GetNextCell(sourceField);

                // we've reached the end of the field
                if (next.Column == sourceField.Width)
                {
                    if (LeftCellIsEmpty(current))
                    {
                        CheckSolutionCandidate();
                    }
                    return;
                }

                // pruning - only continue searching if this is a plausible board so far
                if (current.Column == 0 || LeftCellIsEmpty(current))
                {
                    Search(next);
                }

                // let's invert the current cell and search again
                // (InvertCell, not shown here, toggles the cell and its neighbors
                // and records the move in currentSolution)
                InvertCell(current);
                if (current.Column == 0 || LeftCellIsEmpty(current))
                {
                    Search(next);
                }
                InvertCell(current); // undo the move when backtracking
            }

            /// <summary>
            /// If the cell to the left of the current one is not empty,
            /// the current board cannot be a solution -
            /// no point continuing down this branch.
            /// Prune it and backtrack to another branch.
            /// </summary>
            bool LeftCellIsEmpty(FieldCoordinates current)
            {
                return !sourceField[current.Row, current.Column - 1];
            }

            /// <summary>
            /// We've traversed all elements -
            /// let's check if what we've come up with is actually a solution.
            /// We checked almost all cells in LeftCellIsEmpty;
            /// let's just check the last (rightmost) column.
            /// </summary>
            void CheckSolutionCandidate()
            {
                if (sourceField.GetColumn(sourceField.Width - 1).IsNull())
                {
                    SolutionFound();
                }
            }

            /// <summary>
            /// We got it! Do something with the solution!
            /// </summary>
            void SolutionFound()
            {
                solution = new Field(currentSolution);
            }
        }
    }

    The solution program

    Here's what I came up with. On the left board, you set up the level as you see it in the game (the picture shows the notorious Level 33!). Left-click to toggle a cell; drag the mouse to "draw" across adjacent cells.

    On the right, you have a chance to play this level, however, the orange dots show which cells you need to click to turn off the lights. If you feel honest, you can turn off the orange dots using the checkbox.


    You can download the full source code here:


    The research

    First of all, I was amazed at how effective the pruning is. Not recursing into knowingly invalid boards brings us down from 2^100 iterations to just 93,183 executions of the Search method. On my machine this is so fast that I can recalculate the hint solution in real time as the user clicks on the left board.

    Also, building this project raised more questions than it answered. I'm sure there is a whole science behind this seemingly simple domain - Conway's Game of Life is just around the corner, not to mention Minesweeper, cellular automata, even-odd logic, 0,1-matrices, constraint programming, etc. It is fascinating what complex solutions exist just to invert a single pixel on the board:


    What is the pattern here? Is the solution always unique? Why is there no solution for some configurations? What if you apply the solution to the solution? What about an unlimited two-dimensional plane? Which single-cell source boards have a solution? Which don't? What is special about a board consisting of such cells? What is the ratio of yellow cells for such boards? These are all interesting questions I'd love to think about, if only someone would do my main job for me :)

    The challenge

    As usual, better/faster/shorter solutions are welcome. I'd be especially curious to see a solution in F# or Prolog. Feel free to reuse my UI code.

  • Kirill Osenkov

    Too much type information, or welcome back System.Object and boxing


    We all know that generics are good - they promote code reuse, static type checking by the compiler, increase runtime performance, allow more flexible OOP designs, lay the foundation for LINQ, help the IDE to provide more helpful IntelliSense and have tons and tons of other vital advantages. "var" is another good feature, which (unlike "object"), also helps to preserve full static type information.

    However, I recently hit a rare case where I had too much static type information in my code, so I had to use System.Object (and boxing) to get the desired effect. I had a method that used reflection to set a property on an object, similar to this:

    static void SetProperty(object f)
    {
        Type type = f.GetType();
        PropertyInfo property = type.GetProperty("Bar");
        property.SetValue(f, 1, new object[0]);
    }

    I also had a struct like this:

    struct Foo
    {
        public int Bar { get; set; }
    }

    Now, I tried to set the Bar property on an instance of the struct:

    static void Main(string[] args)
    {
        var f = new Foo();
        SetProperty(f);
        Foo foo = (Foo)f;
        Console.WriteLine(foo.Bar);
    }

    It didn't work! It printed out 0! I was puzzled. And then I realized what was happening. Since Foo is a struct, and f (thanks to var!) is also statically known to be a struct, the compiler passes a copy of the struct by value to the SetProperty method. That copy is modified, but the original f is not.

    One simple change and it started working fine:

    static void Main(string[] args)
    {
        object f = new Foo();
        SetProperty(f);
        Foo foo = (Foo)f;
        Console.WriteLine(foo.Bar);
    }

    I changed var to object, so the struct was boxed into an object on the heap, the reference to this same object was passed to the SetProperty method, the method set the property on the boxed instance, and (Foo) unboxed the same modified instance - the code now prints 1 and everything is OK again.

    "var" provided too much type information to the compiler - it avoided boxing, and knew that the variable is a struct, so I lost the modified value. After casting to object, we hid the extra information from the compiler and got the uniform behavior for both value types and reference types.

    In my original code where I encountered this peculiar behavior (a custom deserializer that reads XML and uses reflection to set properties on objects), I was so focused on making it work with all types that I forgot some of them can be value types. Since I had everything strongly typed with generics, type inference, vars and other modern goodness, the kind, hardworking compiler preserved all the type information for me and avoided boxing where I was expecting reference-type behavior. Thankfully, unit tests revealed the error 10 minutes after it was introduced (I definitely need to post about the usefulness of unit tests and TDD in the future), so it was a quick fix to box the instance into object before filling its properties.

    It was an amusing experience.

  • Kirill Osenkov

    More comments on generics


    Here are some more unsorted thoughts on generics to continue this post (which has some interesting comments too).

    Preserving types with generic type inference

    To reiterate, instead of having a method like:

    void Process(object instance)

    one can write:

    void Process<T>(T instance)

    and thus preserve the static type information about the parameter. One can still call this method like Process(stuff) without having to explicitly specify the generic type parameter: Process<StuffType>(stuff).
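    A minimal sketch of this in action (Stuff and FancyStuff are names invented for illustration):

```csharp
using System;

public class Stuff { }
public class FancyStuff : Stuff { }

public static class Demo
{
    // The object-typed version would "truncate" the return type;
    // the generic version lets whatever type goes in come back out.
    public static T Process<T>(T instance) where T : Stuff
    {
        // ... do the processing ...
        return instance;
    }

    public static void Main()
    {
        // No explicit type argument needed - the compiler infers T = FancyStuff,
        // so the caller keeps the derived static type.
        FancyStuff processed = Demo.Process(new FancyStuff());
        Console.WriteLine(processed.GetType().Name); // FancyStuff
    }
}
```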

    Casting by example

    This technique can be useful (among others) in one specific situation - where we can't specify the generic type parameter, because it is of an anonymous type. We can still call a generic method and "specify" the anonymous type parameter using type inference (more precisely, the smart C# compiler will do it for us). I blogged more about this here: http://kirillosenkov.blogspot.com/2008/01/how-to-create-generic-list-of-anonymous.html
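    A short sketch of the idea (CreateListOfType is a made-up helper name): we can't spell the name of an anonymous type, but an example instance lets inference do it for us.

```csharp
using System;
using System.Collections.Generic;

public static class ListMaker
{
    // The example parameter exists only so the compiler can infer T.
    public static List<T> CreateListOfType<T>(T example)
    {
        return new List<T>();
    }

    public static void Main()
    {
        var item = new { Name = "Foo", Value = 42 };
        var list = CreateListOfType(item); // a List<> of the anonymous type
        // Anonymous types with the same shape unify within an assembly,
        // so this Add compiles:
        list.Add(new { Name = "Bar", Value = 43 });
        Console.WriteLine(list.Count); // 1
    }
}
```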

    Static data is not shared among constructed generic types

    Jacob Carpenter reminds us of this important (and useful!) fact: http://jacobcarpenter.wordpress.com/2008/07/21/c-reminder-of-the-day/
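    To restate the fact in code (PerType is an invented name for illustration): each constructed generic type gets its own copy of every static field.

```csharp
using System;

public class PerType<T>
{
    // One copy of this field exists per *constructed* type:
    // PerType<int> and PerType<string> each get their own.
    public static int Count;
}

public static class StaticsDemo
{
    public static void Main()
    {
        PerType<int>.Count++;
        PerType<int>.Count++;
        PerType<string>.Count++;
        Console.WriteLine(PerType<int>.Count);    // 2
        Console.WriteLine(PerType<string>.Count); // 1
    }
}
```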

    You can constrain a generic type parameter on a type to be its derived type

    Darren Clark mentions another nice trick with generics:

    public class MyBase<T> where T: MyBase<T>

    Here, we essentially limit T to be a type derived from MyBase<T>.
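    One thing this buys you (Widget and Configure are hypothetical names, just a sketch of the pattern): the base class can declare members that return the *derived* type, so callers don't need casts.

```csharp
using System;

// T is constrained to derive from MyBase<T>, so members of the base
// class can be typed as the derived type.
public class MyBase<T> where T : MyBase<T>
{
    public T Configure()
    {
        // ... shared setup ...
        // Safe for well-behaved T (i.e. class Widget : MyBase<Widget>).
        return (T)this;
    }
}

public class Widget : MyBase<Widget>
{
    public string Name;
}

public static class CrtpDemo
{
    public static void Main()
    {
        // Configure() returns Widget, not MyBase<Widget> - no cast at the call site.
        Widget w = new Widget().Configure();
        w.Name = "configured";
        Console.WriteLine(w.Name);
    }
}
```

    Note the constraint can't fully enforce the pattern - nothing stops someone from writing class Evil : MyBase<Widget> - but it makes the intended usage type-check cleanly.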

    Limits of type inference

    The C# 3.0 compiler is much smarter than the 2.0 one when it comes to type inference. Jon Skeet describes this really well in his book C# in Depth. However, there is still room for improvement - the type inference engine could be made even smarter to deduce types from generic constraints. I hit this limitation recently when I tried to implement the topological sorting algorithm as a mix-in (using extension methods and generic type inference). Paste this program and it will compile fine; however, if you remove the generic arguments in the call to TopologicalSort in Main(), it will fail to compile:

    using System.Collections.Generic;

    // Generalized algorithms and data-structures - I want to reuse them by "mixing-in"
    interface IDependencyNode<TNode, TNodeList>
        where TNode : IDependencyNode<TNode, TNodeList>
        where TNodeList : IDependencyList<TNode>
    {
        // TNodeList Dependencies { get; } <-- that's why I need TNodeList
    }

    interface IDependencyList<TNode> : IEnumerable<TNode> { }

    static class Extensions
    {
        public static IEnumerable<TNode> TopologicalSort<TNode, TNodeList>(this TNodeList nodes)
            where TNode : IDependencyNode<TNode, TNodeList>
            where TNodeList : IDependencyList<TNode>
        {
            return null; // algorithm goes here
        }
    }

    // Mixing-in to my concrete world of Figures and FigureLists
    // I basically get the implementation of topological sort for free
    // without inheriting from any classes
    class Figure : IDependencyNode<Figure, FigureList> { }
    class FigureList : List<Figure>, IDependencyList<Figure> { }

    class Program
    {
        static void Main(string[] args)
        {
            FigureList list = new FigureList();
            // wouldn't it be sweet if we could infer the type arguments here??
            // list.TopologicalSort(); // doesn't compile
            list.TopologicalSort<Figure, FigureList>(); // compiles fine
        }
    }

    In this case, one could potentially figure out the types Figure and FigureList (at least, it is doable by a human :), but then the type inference algorithm would become even more complex than it is now (in fact, I suspect it would have to become as powerful as a typical Prolog solver, because it would require unification). The C# compiler team certainly has higher-priority tasks right now than implementing Prolog inside the C# compiler.

    Finally, when I look at the code above, it is too complex, difficult to understand and clumsy. One shouldn't pay such a high price for the flexibility of mix-ins. There is a much simpler and more elegant solution to this problem, which I hope to come back to later.

  • Kirill Osenkov

    Why is a comparison of a value type with null a warning?


    A reader (Petar Petrov) asked me a question recently which I didn't quite know how to answer:

    Why is a comparison of a value type against null a warning? I definitely think it should be a compiler error.

    So I asked the C# compiler team and here's the explanation (please welcome today's special guest Eric Lippert):

    Why is it legal?

    It's legal because the lifted comparison operator is applicable. If you are comparing an int to null then the comparison operator that takes two int?s is applicable.

    Why is it a warning?

    It's a warning because the comparison always results in "false".

    Why is that not an error?

    Let's turn that around -- why should it be an error?  Why should any comparison which the compiler knows the answer to be an error?  That is, if you think this should be an error, then why shouldn't
    if (123 == 456)
    be an error?  Or for that matter, why shouldn't
    if (false)
    be an error?

    Three reasons why none of these things should be errors:
    First, the argument that this is work for me. The spec is complicated enough already and the implementation is divergent from the spec enough already; let's not be adding even more special cases that I can then get wrong for you.

    Second, the argument from design. By design, C# is an "enough rope" programming language -- we do not try to constrain you to writing only meaningful programs. Rather, we let you write almost any program, and then give you warnings when it looks like you might be entangling yourself in the rope we gave you. If you don't like that, choose a language that gives you less flexibility.  (This is part of the impetus behind the push towards more declarative programming languages; declarative programming languages are less likely to contain senseless commands because they consist of descriptions of how things are desired to be.)

    Third, the argument about generated code. We do not disallow statements like if (12 == 13) { whatever... } because not all code is typed in by humans. Some of it is generated by machines, and machines often follow the same rigid rules generating code that compilers do consuming it. Do we really want to put the burden upon machine-generated-code providers that they must jump through the same constant-folding hoops that the C# compiler does in order to avoid compiler errors? Do we want to make machine-generated code not only have to be syntactically and grammatically correct C# code, but also _clever_ and _well-written_ C# code? I don't think we do; I think that makes the job of both the code producer and the compiler writer harder without any corresponding gain in safety or productivity.
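    To see the lifted comparison Eric describes in action, here is a minimal snippet (invented for illustration): comparing an int to null compiles via the int? operator, produces warning CS0472, and always evaluates to false.

```csharp
using System;

public static class NullComparison
{
    public static bool IsNull(int x)
    {
        // warning CS0472: the result of the expression is always 'false'
        // since a value of type 'int' is never equal to 'null' of type 'int?'
        return x == null;
    }

    public static void Main()
    {
        Console.WriteLine(NullComparison.IsNull(0)); // False - even for default(int)
    }
}
```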


  • Kirill Osenkov

    How I started to really understand generics


    First off, a little explanation for the post title: not that I didn't understand generics before. I'd had a pretty good understanding of how they work, how they are implemented in the CLR, what facilities are provided by the compiler, etc.

    It's just that recently I started seeing some patterns and analogies that I hadn't seen before or just didn't pay enough attention to. This is a little fuzzy feeling but I'll still try and describe these insights, hopefully the reader will understand what I mean. These insights have helped me to think more clearly and have a better understanding of how to use generics in a richer way.

    For a start, let's consider a method called Process that processes command line arguments and modifies the argument list by removing the arguments that it could process:

    public ArgumentList Process(ArgumentList source)

    This design is not very good because it mutates the source variable, and I read somewhere that a method should either be immutable and return something or mutate stuff and return void, but not both at the same time. I think it was in Martin Fowler's Refactoring, but I'm not sure.

    But that's not the thing I wanted to talk about today. This design has another serious flaw: it essentially truncates its return type to be ArgumentList, so by calling this method we basically "seal" the output type to be ArgumentList:

    CustomArgumentList customList = new CustomArgumentList();
    ArgumentList processed = Process(customList);

    Even if we originally had a CustomArgumentList, once we pass it into our method, it "truncates" the type and all we have left on the output is the basic ArgumentList. We lose some type information and cannot enjoy the benefits of static typing and polymorphism at its fullest.

    So what have generics to do with all this? Well, generics allow us to "unseal" the return type of our method and to preserve the full static type information available:

    public T Process<T>(T source)
        where T : ArgumentList

    This way we can call Process on our CustomArgumentList and receive a CustomArgumentList back:

    CustomArgumentList customList = new CustomArgumentList();
    CustomArgumentList processed = Process(customList);

    See, type information is not sealed anymore and can freely "flow" from the input type to the output type. Generics enabled us to use polymorphism and preserve full static typing where it wasn't possible before. It might be obvious for many of you, but it was a wow-effect discovery for me. Note how we used type inference to make a call to Process<T> look exactly like a call to Process.

    Another observation here is that when you're talking about covariance and contravariance of generics, you can divide type usages into two groups: "in" and "out". Thus, we can have return type covariance and input parameter contravariance. Let's notice how "out" type usages essentially truncate the type information and limit ("seal") it to one base type. Generics are there to "unseal" this limitation and to allow for derived types to be returned without losing type information.

    I hope this wasn't too fuzzy and handwavy. It's just these little discoveries make me very excited and I'd like to share the way I see these things.

    Let's consider another insight: what's the difference between:

    1. class MyType<T> : IEquatable<T> and
    2. class MyType<T> where T: IEquatable<T>

    Update: I'd like to explain in a little bit more detail, what's going on here. In the first line, we're declaring that the type MyType<T> implements the IEquatable<T> interface (complies with the IEquatable<T> contract). Simpler said, MyType<T> is IEquatable<T> (can be compared with objects of type T). The <T> part reads "of T", which means MyType has a method Equals that accepts a parameter of type T. Hence, the first line imposes the constraint on the type MyType<T> itself.

    Now the second line imposes the constraint on the type parameter, not on the type itself. Now the type parameter that you specify must be a type that has an Equals method that accepts a parameter of type T (objects of type T can be compared with other objects of type T). I would imagine second line could be useful more often than the first line.

    Also, another line is possible:

    class MyType<T> : IEquatable<MyType<T>>

    This line would mean that MyType can be compared with itself (and, independently of this fact, use some other type T).
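    A small sketch contrasting the two declarations (Wrapper and Finder are hypothetical types made up for this example):

```csharp
using System;

// 1. MyType<T> : IEquatable<T> - the type *itself* promises it can be
//    compared with objects of type T.
public class Wrapper<T> : IEquatable<T>
{
    public T Value;

    public bool Equals(T other)
    {
        return object.Equals(Value, other);
    }
}

// 2. where T : IEquatable<T> - the constraint is on the *type parameter*:
//    T must be comparable with itself, which lets us call a strongly typed
//    Equals instead of object.Equals.
public class Finder<T> where T : IEquatable<T>
{
    public bool Contains(T[] items, T target)
    {
        foreach (T item in items)
            if (item.Equals(target)) // IEquatable<T>.Equals(T), no boxing for structs
                return true;
        return false;
    }
}

public static class EquatableDemo
{
    public static void Main()
    {
        var w = new Wrapper<int> { Value = 5 };
        Console.WriteLine(w.Equals(5)); // True - a Wrapper<int> compared with an int

        var f = new Finder<int>(); // OK: int implements IEquatable<int>
        Console.WriteLine(f.Contains(new[] { 1, 2, 3 }, 2)); // True
    }
}
```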

    Thinking about this difference made another aspect of generics clearer for me: with generics, there are always at least two different types involved: the actual type and the generic type parameters.

    Armed with this wisdom, let's move to a third mental exercise (again, it might all be simple and obvious for some readers, in this case I apologize and am very honored to have such readers). Is the following thing valid?

    class MyGenericType<T> : T

    It turns out it's not valid - we cannot inherit from a generic type parameter (Update: in the original post I wrote that this would lead to multiple inheritance - that's not true). This illustrates my previous points - with generics, there are always at least two different types involved, and we shouldn't mix them arbitrarily.

    Finally, to at least partially save this post from being a complete theoretical disaster, I'd like to share some code I wrote recently. Suppose you're implementing the Command design pattern and have various concrete Commands: CutCommand, CopyCommand, CloseAllDocumentsCommand, etc. Now the requirement is to support unified exception throwing and catching functionality - every command should throw a generic exception InvalidOperationException<T>, where T is the type of the command. You might find this weird (who needs generic exceptions?), but in my real project at work we have some very clever exception classification logic which requires that each exception have a different type. Basically, each exception should know what type threw it. Based on this strongly typed exception source information, we do some magic to greatly simplify logging and classify exceptions into buckets based on their root cause. Anyway, here's the source code:

    class Command
    {
        public virtual void Execute() { }
    }

    class InvalidOperationException<T> : InvalidOperationException
        where T : Command
    {
        public InvalidOperationException(string message) : base(message) { }
        // some specific information about
        // the command type T that threw this exception
    }

    static class CommandExtensions
    {
        public static void ThrowInvalidOperationException<TCommand>(
            this TCommand command, string message)
            where TCommand : Command
        {
            throw new InvalidOperationException<TCommand>(message);
        }
    }

    class CopyCommand : Command
    {
        public override void Execute()
        {
            // after something went wrong:
            this.ThrowInvalidOperationException("Something went wrong");
        }
    }

    class CutCommand : Command
    {
        public override void Execute()
        {
            // after something went wrong:
            this.ThrowInvalidOperationException("Something else went wrong");
        }
    }

    Note how two seemingly identical calls to ThrowInvalidOperationException will, in fact, throw different exceptions: the call in CopyCommand will throw an InvalidOperationException<CopyCommand>, while the same call from CutCommand will throw an InvalidOperationException<CutCommand>.

    Here (attention!) we infer the type of the command from the type of the "this" reference. If you try calling the method without the "this." part, the compiler will fail to resolve the call, because extension methods can only be invoked through an explicit receiver.
    So this is a situation where extension methods and generic type inference play nicely together to enable what I think is a very interesting trick. We preserve type information all the way from where an exception is thrown, "remember" the type that threw the exception, and still have full type information available when we catch and process it. A nice example of how generics help type information "flow" from one part of your code to other. This is also in line with my first example where we see how type information "flows" from input to output parameter of a method.

    Note: for those of you who spent some time with Prolog, don't you have a déjà vu of how Prolog allows values to "flow" between predicates? In Prolog every parameter to a predicate (method) can be "in" or "out" - input or output. Prolog infers which parameter is which depending on its usage elsewhere and inside the predicate body. I don't know if this analogy is valid at all, but I feel that C# compiler in a certain way does its type inference in a similar manner.

  • Kirill Osenkov

    Representing dependencies in code


    People have to deal with some sort of dependencies in the applications they develop more often than you would suspect. In this post I'll talk about some common algorithms and data structures for working with dependencies that might turn out helpful in real-life programming tasks.

    To better understand what kind of dependencies I'm talking about, let's consider a couple of examples first.

    Example 1: A Spreadsheet Application

    In Excel every cell can contain a formula which depends on other cells:


    Here we have the value in C5 depending on values in B2 and D3. A couple of things to note:

    1. Whenever a value in B2 changes, the value in C5 is recalculated automatically based on the formula.
    2. A cell can have multiple dependencies (i.e. cells that this cell depends upon)
    3. Cells can build dependency chains and trees (whenever a cell is changed, the entire subtree is recalculated)
    4. Cells cannot have circular dependencies. If you try to create a circular dependency, Excel will show a message box and draw a nice blue arrow to indicate the cycle in the dependency graph.

    Example 2: Visual Studio Project System

    If you look at a solution in Visual Studio with multiple projects, those projects will most probably reference other projects. When you build the solution, Visual Studio somehow figures out the correct order in which to build the projects, and always guarantees that all of a project's dependencies are built before the project itself.

    The same things apply here:

    1. Whenever a project changes, VS figures out that its dependent projects will need to be rebuilt during the next build.
    2. A project can reference multiple projects
    3. Projects can build dependency chains and trees
    4. Projects cannot have circular references.

    Example 3: Dynamic Geometry

    In a dynamic geometry application, like the one I'm building as a hobby project, geometric figures depend on other figures during their construction. For instance, you need two points to draw a line - and this line will depend on the two points.

    1. As you move a point, the line will redraw to pass through this point
    2. A figure can depend on multiple figures (a parallel line needs a point and a line to be constructed)
    3. Same as above
    4. Same as above

    Why is this important?

    It is important to be aware of such things as dependencies in your domain model and to be explicit about them in your code. Clearly expressed intent in code is goodness that is incredibly useful for those who write the code, who read the code and who use the code.

    I've seen a case where a program naively relied on the order in which objects were created, and failed to save them if the order was changed. It gave a message box like "Please rearrange your objects because I can't figure out in which order to save them, because they have complex interdependencies". Well, the dependency scheme was exactly like in the three examples above, so the application could very well handle the dependencies itself.

    Can you imagine Visual Studio giving you a message box saying: "Please rearrange the projects in your solution because I can't figure out in what order to build them"? That's why it's important - thinking a little bit about such things can make your code clean, smart, robust and the application easy to use.

    Data structure: DAG (Directed Acyclic Graph)

    A data structure that is commonly used to model such dependencies is called DAG. Wikipedia does a great job describing it, so I will just highlight some points here. First, and foremost, every tree is a DAG:


    But in a tree, a node cannot depend on more than one parent (like single inheritance in C#). In a DAG, a node can very well have more than one parent:


    This "diamond" shape is well known to C++ programmers - multiple inheritance is a great example of a DAG. Wikipedia makes a great point that a DAG "flows" in one direction (there can be no cycles).

    Each node in a DAG has a list of dependencies ("parents") and a list of dependents ("children"):

    interface IDagNode
    {
        IDagNodeList Dependencies { get; }
        IDagNodeList Dependents { get; }
    }

    You don't have to store Dependents - they can be recalculated from the Dependencies every time - but storing them greatly simplifies many algorithms, and it also makes the data structure symmetric.

    With a tree, you clearly know where the top (root) and the bottom (leaves) are. A DAG, on the other hand, is symmetric - it can have many "roots" and many "leaves". Our diamond, for example, has one root (above) and one leaf (below) - but if you draw the arrows in the opposite direction, a root becomes a leaf, and a leaf becomes a root. Officially, a "root" is called a source, and a "leaf" is called a sink.

    Let's try to give some definitions:

    • Node A directly depends on node B if A.Dependencies.Contains(B).
    • Node A depends on node B if A directly depends on B or A directly depends on some C, which in turn depends on B (recursive definition).
    • A node is a source if it doesn't depend on any other node.
    • A node is a sink if no other node depends on it.

    Note that a non-empty DAG always has at least one source and at least one sink. Finding sources and sinks is easy - iterate over all nodes and select those that have their Dependencies/Dependents collection empty.
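    Here is a sketch of that selection (DagNode is a hypothetical concrete node, mirroring the IDagNode shape above; the names are invented):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DagNode
{
    public string Name;
    public List<DagNode> Dependencies = new List<DagNode>();
    public List<DagNode> Dependents = new List<DagNode>();

    // Record the edge on both ends to keep the structure symmetric.
    public void DependOn(DagNode other)
    {
        Dependencies.Add(other);
        other.Dependents.Add(this);
    }
}

public static class DagExtensions
{
    // A source depends on nothing; a sink has no dependents.
    public static IEnumerable<DagNode> Sources(this IEnumerable<DagNode> nodes)
    {
        return nodes.Where(n => n.Dependencies.Count == 0);
    }

    public static IEnumerable<DagNode> Sinks(this IEnumerable<DagNode> nodes)
    {
        return nodes.Where(n => n.Dependents.Count == 0);
    }
}
```

    For the diamond above, Sources() yields its single root and Sinks() its single leaf.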

    Algorithm: Topological Sort

    So what's the use of a data structure if we don't have a useful algorithm that comes with it? Well, it turns out, we do. Every DAG can be linearized (serialized) into a sequence, where a node only depends on nodes that come before it. This special sequence is called topological sort of the DAG.

    Thus, to figure out the build order in which we want our Visual Studio projects to be built, we just take all of our projects and sort them topologically with regard to their references.

    The algorithm is actually pretty simple:

    1. Before visiting a node, check if we have already visited it (e.g. using a hashtable or by marking the node) - don't visit it a second time.
    2. For every node that we visit, first visit all the dependencies recursively.
    3. After visiting the children, append the node being visited to the end of the results list.

    After visiting all the nodes, the results list will contain all the nodes sorted topologically.
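    The three steps translate almost directly into code. Here is a small self-contained sketch (Node is a made-up minimal type, not the actual Figure class from my application):

```csharp
using System;
using System.Collections.Generic;

public class Node
{
    public string Name;
    public List<Node> Dependencies = new List<Node>();
}

public static class TopologicalSorter
{
    public static List<Node> TopologicalSort(IEnumerable<Node> nodes)
    {
        var result = new List<Node>();
        var visited = new HashSet<Node>();
        foreach (Node node in nodes)
            Visit(node, visited, result);
        return result;
    }

    static void Visit(Node node, HashSet<Node> visited, List<Node> result)
    {
        // 1. Don't visit a node a second time.
        if (!visited.Add(node))
            return;
        // 2. Visit all the dependencies first, recursively.
        foreach (Node dependency in node.Dependencies)
            Visit(dependency, visited, result);
        // 3. Append the node after its dependencies.
        result.Add(node);
    }
}
```

    Every node ends up after all of its dependencies - exactly the property a build order (or a recalculation order) needs. A production version would also detect cycles, e.g. by tracking the nodes on the current recursion path.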

    In my dynamic geometry application I use topological sorting to recalculate the coordinates of figures in the correct order - if I recalculated a line before the point it depends on, the line would use the point's old coordinates and appear off-base. If you're interested, here's the source code.

    The problem with my implementation is that the algorithm is too tied to the nature of the objects being sorted (figures). In the future, I'll look into making this algorithm generic and reusable in other projects.

    Finally, I'll be interested in any examples of DAGs that you come across. I'm collecting such examples, so I'll be grateful if you tell me about your experiences with DAGs and topological sorting and where you encountered this remarkable data structure.

  • Kirill Osenkov

    Software Development Meme


    John tagged me in a software development meme - a sort of a game where you tell stuff about yourself and then pass on to other people. My moral values tell me it's not a chain letter so I guess it's OK to post :) Besides, it's fun and I'd be curious to learn more about others in the blogosphere.

    How old were you when you started programming?


    How did you get started in programming?

    My parents got me a book about BASIC and I got hooked immediately. I only got a computer years later, so I was writing my first programs on paper (poor kid) and imagining how I'd type them in on a shiny new gray keyboard with loud clicking sounds... (people start having tears here). I stayed late in the school computer lab after classes and wrote code. What surprises me is that today I'm no less fascinated by programming than I was back then...

    I wonder if I still can get my first program right:

    10 PRINT "What is your name?"
    20 INPUT A$
    30 PRINT "Hello " & A$

    I wonder if it "compiles".

    What was your first language?

    GWBASIC. It had an awesome IDE, boasting an install size of about 40K, xcopy deployment, 100% keyboard support, lightning-fast responsiveness and an ultra-modern REPL similar to F# Interactive. Moreover, it stored the entire program in memory - not every IDE can afford that nowadays.

    What was the first real program you wrote?

    There is a belief that every young Russian-speaking programming enthusiast HAS TO implement his/her own clone of Norton Commander and his/her own clone of Tetris. This is in case you're wondering where all those Volkov Commanders, Dos Navigators and FARs come from.

    Of course, I was no exception to that rule:


    Introducing Disk Guide - a file manager for Windows 95, written during 1998-1999 using VB5! If you're adventurous enough, you can give it a spin: http://www.osenkov.com/projects/diskguide/DiskGuide.rar - believe it or not, I'm still using it pretty regularly, after almost 10 years. It has a couple of gotchas (e.g. it chooses to bypass Recycle Bin when deleting files, so be careful), but I haven't seen any major bugs in it during the last several years. There are a lot of "features" though, so be warned (it's not a bug, it's a feature!).

    And here's the obligatory Tetris (VB6):


    Which reminds me: we had a hallway conversation recently with a colleague who is a developer on the VB IDE team - he claimed he could implement Tetris from scratch in 15 minutes. So we took a recent build of Visual Studio and a stopwatch - and of course it took him longer than 15 minutes. It took him a whole 35 minutes.

    What languages have you used since you started programming?

    GWBASIC, LOGO, QuickBasic, TurboBasic (yeah!!!), VBDOS 4.0, VB5, VB6, TurboPascal, Borland Pascal, C++ (just enough to form a strong opinion about this language), VB.NET 7.0, Haskell (my world goes upside down), Prolog (I thought my world went upside down with Haskell??!), VHDL, ... drumroll... C# 1.0 (here it goes baby!!!), C# 1.1, C# 2.0, VB 8, C# 3.0 and lately some C# 4.0 and VB 10.

    What was your first professional programming gig?

    Ehmm... what is a gig? *goes-to-look-up* Ohh, I see.

    I wrote a dynamic geometry educational software package in VB6. Some pretty serious stuff: http://dg.osenkov.com

    If you knew then what you know now, would you have started programming?

    Hell yeah.

    If there is one thing you learned along the way that you would tell new developers, what would it be?

    People, please strive to write clean code - learn, learn, learn - always seek ways to improve, rewrite and refactor your code - be a perfectionist about the code. It's the journey that's important, not the destination.

    What’s the most fun you’ve ever had... programming?

    Definitely implementing the dynamic geometry project - the first feeling of control over your code, the first results, the first rewarding successes and implemented features - that was fun. By the way, I shipped it - I still can't believe it.

    Over to you

    I herewith tag Mr. Campbell, Mr. McNamara, Mr. Carpenter, Mr. Maino and Mr. Smith.

  • Kirill Osenkov

    A curious subtlety about how CLR does interface dispatch on array types


    This post is about some internal CLR magic that usually makes accessing .NET arrays through an interface up to 100 times faster (if you're an optimist; or accessing arrays up to 100 times slower in the worst case, if you're a pessimist). I'm usually very curious about how this universe works under the hood - and finding out about the internals of the CLR for me is like discovering laws of physics :) So if you ever find yourself measuring the timing of array access through generic interfaces in performance critical code, this post is probably worth taking a look at.

    Anyway, here's the story. A customer (Robert Kieboom from CityGIS) reported an intermittent performance decrease when accessing arrays through the IList<int> interface. Specifically, for some weird reason, the same codepath started executing 100 times slower after a call to some seemingly unrelated code. Huge thanks to Robert for providing this precise repro (I hope he doesn't mind if I share it):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        public static int AddValues(IList<int> list)
        {
            int add = 0;
            /*foreach (int value in list)
                add += value;*/
            for (int i = 0; i < list.Count; i++)
                add += list[i];
            return add;
        }

        static void TestAddArray(int[] items)
        {
            IList<int> list = items; // the array disguised as an IList<int>
            Stopwatch timer = Stopwatch.StartNew();
            AddValues(list);
            Console.WriteLine("Array: {0} ms.", timer.ElapsedMilliseconds);
        }

        static void TestAddIList(int[] items)
        {
            IList<int> cl = new List<int>(items);
            Stopwatch timer = Stopwatch.StartNew();
            AddValues(cl);
            Console.WriteLine("IList: {0} ms.", timer.ElapsedMilliseconds);
        }

        static void Main(string[] args)
        {
            int[] items = new int[1000000];
            for (int i = 0; i < items.Length; i++)
                items[i] = i - (items.Length / 2);

            TestAddArray(items); // fast
            TestAddArray(items); // fast
            TestAddIList(items); // something weird happens??!
            TestAddArray(items); // slow
            TestAddArray(items); // slow
        }
    }

    Here's what this program does. First we call TestAddArray a couple of times; it uses an array disguised as an IList<int>. Both calls are very fast - each about 108ms on my machine.

    Then we call TestAddIList that uses a List<int> disguised as an IList<int> - this call is even faster, 21 ms.

    Now - watch this - both subsequent calls to TestAddArray take 3638ms and 3677ms respectively - about 33 times slower!

    If you remove the call to TestAddIList(items), all four calls will be equally fast. What's going on? The same codepath becomes slower after some random call? We were puzzled. After Eric couldn't find any explanation for this from the C# compiler standpoint (we were producing correct IL), it was clear that the issue is deeper than IL and has to do with the execution engine itself. So I routed the bug to the CLR team - and pretty soon we had a reply from them.

    It turns out, CLR has some optimizations on how they do virtual dispatch on a generic interface, if the runtime type is an array. As I'm afraid to talk nonsense outside of my area of expertise, I'll just quote the bug description from the CLR folks:

    The problem is rooted in the fact that IList<T>::get_Item is an interface invocation on an array type (in the example an int[]).  Because arrays are very common, and interface dispatch through interfaces like IList<T> is uncommon, the runtime saves significant space by having special logic for dispatching through generic interfaces like these. 

    Our interface dispatch logic has an important optimization where FOR EACH CALL SITE, it checks one particular target explicitly, and if that fails, does a slower hash table lookup, and if that lookup fails, falls back to an expensive lookup. Thus call sites that tend to go to one destination are very fast, and for call sites that go to many targets you get good, but not as great, performance.

    As it turns out, there is a problem in the special dispatch logic for arrays (but not for ordinary interfaces) that causes the hash table lookup to fail. What this means is that interface dispatch FOR ARRAY TYPES is either very fast (if the explicit check works) or very slow (when it does not).

    In the fast case, the interface calls to 'get_Item' and 'get_Count' (used in the [] operator) are made with a 'this' pointer of int[] first, which means this is the value used for the 'fast check'. The second time we call them with List<int>, not int[]. However, since there is no bug associated with normal (non-array) dispatch, everything works well.

    However, when the benchmarks are reversed, List<int> is the first 'this' that dispatches through the 'get_Item' and 'get_Count' call sites, so they get the benefit of the fast check. When the first benchmark is run, it now dispatches using an int[] as the 'this' pointer, and because of the bug, the hash table lookup always fails, so only the very slow lookup mechanism is used.


    There are a few possible work-arounds:

    1) In performance-critical code, ensure that if arrays are passed as IList<T> (or its superinterfaces), T[] is always called first. There is a good chance that this approach is impractical, but if there are only a few such critical paths then it is possibly useful.

    2) Always use List<T> where you would have used T[]. By the way, if you can avoid using Collection<T> that is probably a good thing, as it is a wrapper that in most cases does not add much value.

    3) In any performance-critical paths, special-case the array case with code like

                 void Method(IList<T> aList)
                 {
                     T[] asArray = aList as T[];

                     if (asArray != null)
                     {
                         // logic on arrays
                     }
                     else
                     {
                         // logic on aList
                     }
                 }

    Option 3 has the advantage that it is SIGNIFICANTLY faster than interface dispatch will ever be because the array check is done only once (and then lots of operations are done), rather than effectively being done on every interface invocation. I would not recommend any work-around that does not have some positive effect on your code base.  I could believe that options 2 and 3 MAY be a superior solution in the long term anyway, so I would suggest investigating them.
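    As a concrete illustration of option 3, here is what a summing helper that special-cases the array might look like. The Sum helper and its class are hypothetical examples of mine, not code from the bug report:

```csharp
using System.Collections.Generic;

static class ArraySum
{
    // Pays the type test once up front; the array branch then runs a loop
    // completely free of interface dispatch.
    public static long Sum(IList<int> items)
    {
        int[] asArray = items as int[];
        long sum = 0;
        if (asArray != null)
        {
            // logic on arrays: direct element access, no virtual calls
            for (int i = 0; i < asArray.Length; i++)
                sum += asArray[i];
        }
        else
        {
            // logic on the IList<int>: interface dispatch per element
            for (int i = 0; i < items.Count; i++)
                sum += items[i];
        }
        return sum;
    }
}
```

    Callers still pass any IList<int>, but arrays take the fast branch: ArraySum.Sum(new[] { 1, 2, 3 }) and ArraySum.Sum(new List<int> { 1, 2, 3 }) both return 6, with only the latter going through interface dispatch.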

    This explains why calling arrays through generic interfaces might under certain conditions become considerably slower (or, alternatively, why these calls are in many cases considerably faster than the general case). In any case, virtual dispatch is slower than calling array members directly, so if you have a chance, just work with arrays directly or use generic collections like List<T>.

    Regarding the future of this particular CLR optimization and the cases where it fails, we are considering whether to fix it for the next release of the .NET Framework. However, as always: no promises here, folks! If you do hit this problem (which is pretty unlikely), just use the workarounds for now - thankfully they are easy and reliable.

    I hope this was insightful and many folks have found this interesting - I'd like to give credit to everyone who reported, reproduced and investigated this issue (including, among others, Robert Kieboom, Arie Leeuwesteijn, Erik Meijer, Eric Lippert, Vance Morrison and Jan Kotas) - and I hope they don't mind me sharing this with the rest of the .NET world.

  • Kirill Osenkov

    Live Geometry with Silverlight 2


    I'm happy to announce a project I started on CodePlex: http://codeplex.com/DynamicGeometry

    Live preview at: http://geometry.osenkov.com

    Dynamic geometry

    In a nutshell, it's an interactive designer for ruler-and-compass constructions - it lets you plot points, connect them with lines, construct circles, intersection points, parallel lines - everything you usually do with "plain" geometry. But then the magic part comes in: you can drag the points, and the entire construction is recalculated immediately - this is why it's called dynamic geometry. The way it works is that the program doesn't store the coordinates of figures - instead, it stores the algorithm of their construction. Every time you move a point, the entire construction is rebuilt on the fly from the point's new coordinates using the algorithm stored in memory - that's why I like calling it "CAD with lazy evaluation".
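    To illustrate the "store the construction, not the coordinates" idea, here is a simplified sketch - these types are illustrative and not the actual DynamicGeometry API:

```csharp
using System;

// Figures know how to recompute themselves from their dependencies.
interface IFigure
{
    void Recalculate();
}

// A free point stores the only "real" coordinates in the construction.
class FreePoint : IFigure
{
    public double X, Y;
    public void Recalculate() { } // free points depend on nothing
}

// A midpoint stores no coordinates of its own - only which two points
// it was constructed from; its position is derived on demand.
class Midpoint : IFigure
{
    readonly FreePoint a, b;
    public double X, Y;

    public Midpoint(FreePoint a, FreePoint b) { this.a = a; this.b = b; }

    public void Recalculate()
    {
        X = (a.X + b.X) / 2;
        Y = (a.Y + b.Y) / 2;
    }
}
```

    When the user drags a FreePoint, the app just updates its X and Y and then calls Recalculate on every dependent figure in dependency order - the construction "replays" itself from the new input.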

    The actual program is available online

    The application is deployed live at http://geometry.osenkov.com - you'll need Silverlight 2 Beta 2 to view this site. At first, the UI might not seem very intuitive - it takes a while to figure out how to use the toolbar and the construction tools - but I hope an interested reader can tackle this challenge. The trick is to know the rules of how the toolbox buttons work. Plotting points is easy - just select the point tool and click anywhere. Dragging points is easy too - select the arrow (cursor) tool and drag the points. Now, to construct a segment between two points, you select the segment tool, click (and release) the first point, and then click (and release) the second point. See, you specify that this segment depends on two points - every time you drag one of the points with the arrow tool, the segment will update itself accordingly.

    The source code is available online too!

    I used Visual Studio 2008, C# 3.0 and Silverlight 2 Beta 2. I have to say, those are awesome technologies and a pleasure to use (at least for me).

    I split the code into several projects. DynamicGeometry is a class library that provides a framework for defining figures and their interaction. My goal was to allow for extensibility - so that you can just plug in a new kind of figure and automatically consume all the centralized goodies: serialization, painting, dependency tracking, the transaction system, interaction, etc. The code for almost every figure fits on one screen of text, so I rarely have to scroll when editing these files. SilverlightDG is the Silverlight 2 front-end (the UI), and DG is the WPF front-end. I try to maintain the two projects - WPF and Silverlight - so that they build from the same sources. So far so good :)

    Implementation details

    The project is still in its early Mickey Mouse stages and only some very basic functionality is there - however, there is already something that is worth noting:

    • an extensible type hierarchy to model geometric figures - all starts with IFigure and FigureBase
    • a State and Strategy pattern implementations to enable interactive mouse input from the user - different toolbox buttons provide different strategies to handle mouse input
    • a pretty decent transaction system, which enables nested transactions, Rollback, delayed execution, record/playback and, of course, unlimited Undo/Redo
    • a dependency tracking mechanism that lets you express dependencies between figures (e.g. a midpoint depends on two existing points) - I use topological sorting to order the figure DAG by its dependencies: we first recalculate the basic figures, and then their dependents
    • a toolbar that took me many painful hours to design - I highly respect UI designers and think that getting the UI right is more difficult than implementing the rest of the actual functionality. I hope I didn't do too badly, although I'm definitely not a great UI designer.
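    The dependency-ordered recalculation mentioned above can be sketched with a depth-first topological sort. This is an illustrative implementation of mine, not the project's actual code, and it assumes the figure graph is a DAG (no cycle detection):

```csharp
using System;
using System.Collections.Generic;

static class DependencySorter
{
    // Orders nodes so that every node appears after all of its dependencies -
    // exactly what a dynamic geometry engine needs before recalculating.
    public static List<T> TopologicalSort<T>(
        IEnumerable<T> nodes, Func<T, IEnumerable<T>> dependenciesOf)
    {
        var result = new List<T>();
        var visited = new HashSet<T>();
        foreach (var node in nodes)
            Visit(node, dependenciesOf, visited, result);
        return result;
    }

    static void Visit<T>(T node, Func<T, IEnumerable<T>> dependenciesOf,
        HashSet<T> visited, List<T> result)
    {
        if (!visited.Add(node))
            return;                          // already placed in the result
        foreach (var dependency in dependenciesOf(node))
            Visit(dependency, dependenciesOf, visited, result);
        result.Add(node);                    // dependencies always come first
    }
}
```

    After sorting, walking the result list and calling Recalculate on each figure guarantees that a midpoint, say, is only recomputed after both points it depends on.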

    Silverlight 2 Beta 2

    Most probably by this time you already know what Silverlight is all about - if not, www.silverlight.net is a great place to start. I personally believe that Silverlight is our liberation from the old web story, and I'm very optimistic about this technology. While Silverlight 1.0 only allowed JavaScript programmability, Silverlight 2 gives you the full experience of .NET and managed languages - C#, VB, IronPython, IronRuby and others (did I forget any?). I think Silverlight's advantage over Adobe's technologies is this rich programming model - the fact that you can use .NET and C# 3.0 for the web client is what makes me so excited about it.

    Targeting both Silverlight and WPF from same source

    If you look closer at my source code, you will notice that I have two projects - the WPF one and the Silverlight one - including the same .cs files. Because WPF and Silverlight use different runtimes and core libraries, the project formats are incompatible - i.e. I can't reference a Silverlight class library from a WPF application. The workaround is relatively easy - just create two separate .csproj files in the same folder that reference the same sources, and you're fine.
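    A minimal sketch of what the shared-source setup looks like - the file and folder names here are illustrative, not the repository's actual layout. Both project files contain an item group like this, pointing at the same files on disk:

```xml
<!-- In both DG.csproj (WPF) and SilverlightDG.csproj (Silverlight):
     the same .cs files are listed, so one edit updates both builds. -->
<ItemGroup>
  <Compile Include="..\DynamicGeometry\IFigure.cs" />
  <Compile Include="..\DynamicGeometry\FigureBase.cs" />
  <Compile Include="..\DynamicGeometry\Midpoint.cs" />
</ItemGroup>
```

    The cost is that every new file has to be added to both projects by hand, but in return a single set of sources stays compilable against both frameworks.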


    By the way, CodePlex is a quite awesome site for open-source projects - it provides version control (I love TFS and think it's excellent), item tracking, forums, wiki, downloads, stats - all for free and without annoying ads. Codeplex is one of the products that still make me proud to be working at Microsoft.

    DG 1.0 - the inspiration

    The project that I'm describing in this post is actually based on an earlier one that I implemented 7-8 years ago - here's a blog post about my original dynamic geometry project: http://kirillosenkov.blogspot.com/2007/12/dg-10-free-dynamic-geometry-software.html

    I'd like to reach parity with the first version - however, this is going to be tricky, because it has a lot of different goodies - from analytical expressions to the ability to create hyperlinks and compose complex documents. Interestingly enough, I implemented DG 1.0 in VB6 - and though I think VB6 was an awesome environment at the time (its Edit-and-Continue is still better than the one we have in Visual Studio 2008), I feel so much better and more comfortable using C# 3.0 - it just lets me express what I want about my program, in the way that I want.

    I hope some of you have found this interesting. I'd be happy to hear any feedback or advice on this.

