• Kirill Osenkov

    A one-line program to count lines of code

    • 10 Comments

    I wanted to sum the total lines of code in files in a given folder. I thought that writing my own program to do this would be faster than looking for it on the internet, so here's what I came up with (1 line broken into 7 lines to fit into your blog reader):

    using System;
    using System.Linq;
    using System.IO;
    
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(
                Directory.GetFiles(
                    Environment.CurrentDirectory, "*", 
                    string.Join(" ", args).Contains("/s") 
                        ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly)
                .Select(f => File.ReadAllLines(f).Length)
                .Sum());
        }
    }

    Just name the executable loc.exe and put it into your PATH - you're good to go. Type "loc" at the command prompt to get the total number of LOC in the current directory, and "loc /s" to search recursively.

    Please note that the way I wrote this program is not very good for debugging, because you can't put a breakpoint on substatements (technically speaking, this program consists of only one statement). In production code, I would rather write something like this:

    string path = Environment.CurrentDirectory;
    string pattern = "*";
    string commandLine = string.Join(" ", args);
    SearchOption searchRecursively = commandLine.Contains("/s") 
                ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
    string[] files = Directory.GetFiles(path, pattern, searchRecursively);
    IEnumerable<int> lengths = files.Select(f => File.ReadAllLines(f).Length);
    int totalLOC = lengths.Sum();
    Console.WriteLine(totalLOC);

    because it better conveys the step-by-step actions and allows you to put a breakpoint on any step. However, for my little program, my sense of style guided me to use the former notation.

    As another side note, the "production version" doesn't use var for type inference. I think spelling out the types improves readability in this case.

  • Kirill Osenkov

    Links about Visual Studio 2010 and C# 4.0

    • 8 Comments

    A while ago we announced Visual Studio 2010 and C# 4.0. In case you'd like to catch up and read articles or watch videos about the upcoming new features, I wanted to gather some links here.

    PDC

    TechEd EMEA 2008 (Barcelona)

    C# 4.0 Language

    As you probably know by now, C# 4 is mostly focused on interoperability: better COM interop, better interop with dynamic languages (like Python), better interop with the browser (Silverlight, JavaScript) and better Office interop. Briefly, the new language features are listed below (a short illustrative sketch follows the list):

    1. Dynamic - "late binding" in C# - a way to delay binding and code execution to runtime
    2. Variance - generic co- and contravariance
    3. Named and Optional parameters - better parity with VB, no more "ref missing", more readable code with named parameters
    4. COM interop support - omitting "ref" for COM calls, NoPIA (embedding Primary Interop Assemblies directly into your assembly to ease deployment) and some others
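
    To make these more concrete, here is a minimal sketch of the first three features in C# 4 syntax (the Calculator class and Print method are made up for illustration; only the language features themselves come from the list above):

    using System;
    using System.Collections.Generic;

    class Calculator
    {
        public int Add(int a, int b) { return a + b; }
    }

    class CSharp4Sketch
    {
        // Named and optional parameters: callers can omit trailing arguments.
        static void Print(int copies = 1, bool collate = true, string printer = "default")
        {
            Console.WriteLine("{0} copies on {1}", copies, printer);
        }

        static void Main()
        {
            Print(printer: "PDF"); // named argument; copies and collate use defaults

            dynamic calc = new Calculator(); // "dynamic": binding is delayed until runtime
            int sum = calc.Add(2, 3);        // resolved by the runtime binder
            Console.WriteLine(sum);

            // Variance: IEnumerable<out T> is covariant in C# 4
            IEnumerable<string> strings = new[] { "one", "two" };
            IEnumerable<object> objects = strings; // legal thanks to covariance
        }
    }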

    What should you read about the C# 4 language features? Well, first and foremost, there is always Charlie's blog, which accumulates all C# news.

    The C# compiler developers' blogs are a terrific technical resource - highly recommended.

    C# IDE improvements

    As far as I can tell, we don't yet have a comprehensive resource about the new features in the C# IDE. Well, let me spill the beans and briefly mention them here, before I dive into the details in my upcoming posts.

    1. Call Hierarchy - for a method or property, lets you view Calls To (callers), Calls From (callees), Overrides (if applicable) and Implementations (if applicable).
    2. Quick Symbol Search - a very simple dialog with a textbox and a listbox that allows you to type a symbol's name or part of it (any type or member) and hit Enter to navigate to it quickly. Ctrl+, is the shortcut that you can reconfigure if necessary.
    3. Generate From Usage - just like we had Generate Method Stub by pressing Ctrl+. on a smart tag under an undefined method call, now you can generate a new class for an undefined class, a constructor for an unknown constructor, a property, an enum, etc.
    4. Highlight References - when a cursor is on a symbol, all the references to this symbol are highlighted in the currently opened file.
    5. Many other miscellaneous improvements, such as a better background compiler ("Live Squiggles"), XML doc rename refactoring, etc.

    .NET Framework 4.0

    Justin van Patten has a great overview of upcoming new features in .NET 4.0. It is worth mentioning that .NET 4.0 will ship with a new CLR version 4.0 (.NET 2.0, 3.0 and 3.5 all ran on CLR 2.0). Most notably, .NET will introduce tuples, code contracts, parallel extensions, variance annotations and a whole lot more.
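
    As a taste of two of those additions, here is a small sketch using the System.Tuple and Parallel.For APIs that ship in .NET 4.0 (the surrounding program is made up for illustration):

    using System;
    using System.Threading.Tasks;

    class Net4Sketch
    {
        static void Main()
        {
            // Tuples: System.Tuple is new in .NET 4.0
            var point = Tuple.Create(3, 4);
            Console.WriteLine("{0}, {1}", point.Item1, point.Item2);

            // Parallel extensions: a data-parallel loop
            Parallel.For(0, 10, i => Console.WriteLine(i * i));
        }
    }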

    Other Visual Studio improvements

    Of course, I'm only scratching the surface of what the next Visual Studio will bring - many teams across the division worked hard and implemented a lot of cool stuff (e.g. Architecture Explorer, Sequence Diagrams, etc). I'm only mentioning some features that I personally find really interesting as a member of the C# team.

    • New Editor - the Visual Studio code editor has been fully rewritten from scratch using C#, WPF and MEF. It shims the old editor interfaces and pretends to behave just like the old editor, but it also exposes a nice new managed API that you can program against. A huge advantage of the New Editor is flexibility and extensibility - you can do really weird things with it, given the fact that it's WPF and allows any UIElement to be placed on it.
    • WPF Shell - the Visual Studio UI (Shell, menus, toolwindows, toolbars) has been rewritten using WPF as well, to allow customization. Not all of the UI has been converted, so there are some "islands of hWnd" floating in WinFormsHost, which will be converted in future versions.
    • Back-in-time debugger - the Debugger team introduced a new Historical Debugger, which records the program's execution and allows you to rollback the program's state (callstack, variables, etc.) to an earlier point in time.

    In my next blog posts, I will start talking more about the upcoming new features (especially the Call Hierarchy one - since I was the one who tested it).


  • Kirill Osenkov

    NDepend

    • 4 Comments

    Static analysis tools allow us to measure code quality and better understand the design and architecture of software. There are things about source code that are not visible to the eye in our day-to-day work: dependencies between components, cyclomatic complexity of methods, hidden and indirect dependencies. There is a danger that a method or a type will become overly complex and turn into a maintenance nightmare without us even noticing it.

    Today’s post is about NDepend, a static analysis tool developed by Patrick Smacchia. NDepend measures software quality by extracting metrics, gathering and reporting statistics and visualizing architecture.

    Here’s what NDepend does:

    • It analyzes .NET assemblies and provides every conceivable piece of information about them, both in a report and interactively
    • It reports various metrics about assemblies, namespaces, types and members
    • It treats the source code as a database and provides a powerful Code Query Language (CQL) that allows you to write custom SQL-like queries against your codebase
    • It allows navigating the codebase using links, references, usage graphs, call graphs, etc.
    • Unlike some other static analysis tools, NDepend provides rich interactive visualizations of your architecture that allow you to visually understand layering, distribution of code, code sizes, relationships and dependencies
    • NDepend has an extensive set of Queries and Constraints (compare with FxCop rules) that run against compiled binaries and provide suggestions on how to improve your code (“Top 10 complex methods that need refactoring”, “All methods that shouldn’t be public and can be made internal”, etc.)
    • You can run all these queries to get a full analysis report about your software

    Since both Patrick and I like static analysis and are interested in measuring software quality, Patrick suggested that I try out NDepend on one of my projects and share my experiences - and I gladly agreed. I had heard about NDepend long ago; even before I joined Microsoft I blogged about static analysis tools and publicly promised to take a look at NDepend someday, because it seemed like a very interesting tool with unique capabilities. Well, since Patrick recently contacted me himself :), there could be no better opportunity to download NDepend and give it a try. Here I'll share my first impressions.

    I went to www.ndepend.com and started to look around. I liked the site and the fact that there were screenshots and some nice 3-minute overview videos. Seconds after clicking the big download button, NDepend was on my desktop. It comes with two executables - the command-line NDepend.Console.exe (for build automation) and the UI, VisualNDepend.exe. Of course, I started VisualNDepend first, and it presented me with a nice start page:

    [screenshot: the VisualNDepend start page]

    The next thing is to create a new NDepend project and add the assemblies that you want to analyze.

    1. The first thing you have to do is create a project and add the .dlls. Make sure the .pdb files are next to them, so that NDepend can locate the source files and provide additional goodness. If the .pdb files point to a source location that no longer exists, you can use the source file rebasing feature in Project Properties -> Analysis.
    2. I loaded my 6-assembly StructuredEditor project and it automatically displayed the dependent .NET framework .dlls:
      [screenshot: the project's assemblies and their .NET framework dependencies]
    3. The next step was easy to find as well: the Big Green Run Button to start the analysis:
      [screenshot: the Run button that starts the analysis]

    After the analysis ran, it displayed a very comprehensive report about my project. Even glancing over this report gave me a lot of useful information:

    1. I was pleased to see that my overall code quality is pretty good :)
    2. It was good to know some quick metrics (number of lines of code, comments, IL instructions, assemblies, types, abstract classes, interfaces, value types, exception classes, attribute classes, delegate classes, percentage of public types and methods, etc.)
    3. It provided a nice visual view of all assemblies, types and members sorted by code size:
      [screenshot: assemblies, types and members sorted by code size]
    4. I was pleased to see my code in the green area of Abstractness vs. Instability (green is good!):
      [chart: Abstractness vs. Instability]
    5. NDepend visualized the assembly reference graph for me:
      [diagram: the assembly reference graph]
      This is really great, since I had this graph in mind before, but I had never seen a tool draw this graph for me automatically – it's like it was reading the architecture from my thoughts.
    6. Finally, it provided a long list of CQL Queries and Constraints (you have to see for yourself, I won’t post it here)
    7. Just reading the rules can be as useful as reading the Framework Design Guidelines. For example, I learned this:
      // <Name>Attribute classes should be sealed</Name>
      WARN IF Count > 0 IN SELECT TYPES WHERE IsAttributeClass
      AND !IsSealed AND !IsAbstract AND IsPublic AND !IsInFrameworkAssembly
      // The .NET Framework class library provides methods for retrieving custom attributes. 
      // By default, these methods search the attribute inheritance hierarchy; for example 
      // System.Attribute.GetCustomAttribute searches for the specified attribute type, or 
      // any attribute type that extends the specified attribute type. Sealing the attribute 
      // eliminates the search through the inheritance hierarchy, and can improve performance.

    After examining the report, I came back to the main application and found that it presents all the information contained in the report, but in an interactive form:

    [screenshot: the interactive VisualNDepend main window]

    I have to say that the UI looks great, but at first it seems rather overwhelming - I felt like I was sitting in a Boeing cockpit. But hey - you can't expect a complex static analysis tool to have a notepad-like interface; it has to expose its features somehow. Surprisingly, I got used to the UI really quickly and found my way around easily.

    Essentially, you work with the following things:

    1. Class Browser – displays all assemblies, namespaces, types and members
    2. Metrics – displays a visual map on various levels (types, members, namespaces, etc.)
    3. Dependency matrix – helps you analyze dependencies between two code elements
    4. Dependency graph – draws a DAG (remember I was writing about DAGs and topological sorting?)
    5. CQL Queries – a library of predefined queries over your codebase (and you can write your own!)
    6. Info – information about the selection (updated with focus changes)

    Hopefully this provides a basic overview of NDepend - what it's for, what it does and its main features. I was very impressed with the tool, really liked it, and am looking forward to using it more for better architectural insights and better code quality.

  • Kirill Osenkov

    Book review: Essential C# 3.0 by Mark Michaelis

    • 1 Comment

    Sometimes I'm asked what book I'd recommend for learning C#. Until recently, I was hesitant and didn't know what to answer. It depends, among other things, on the following factors:

    • Whether the learner has existing programming background
    • Whether he or she wants to learn the language, or the .NET framework as well
    • Whether the learner is already familiar with C# 1.0 (2.0) and wants to learn primarily about C# 3.0 features, such as LINQ

    Well, now I have a good answer for the majority of cases. Mark Michaelis, a C# MVP, wrote a great book called Essential C# 3.0. It requires pretty much no programming background, so beginners can ramp up easily. One thing you notice immediately about Essential C# 3.0 is that the book is very easy to read. You swallow page after page freely and effortlessly. This is one of the major reasons why I recommend this book to beginners who ask me where to start.

    The first several chapters introduce basic programming concepts - variables, statements, control flow, methods, and continue gracefully to the .NET type system, classes and objects. It's nice to know that even when you're learning the basics, you're learning the latest version of a modern language and an industry standard.

    The book continues to do a great job at the intermediate level, with chapters about classes, inheritance, interfaces and value types. A great asset is a special chapter called Well-Formed Types, which collects a lot of useful information and best practices, such as overriding object members, referencing other assemblies, generating XML comments, finalization and garbage collection, etc.

    C# 2.0 features are covered well in a dedicated chapter about Generics and also in other parts of the book (iterators and yield return, anonymous methods).

    Of course, given the book's title, C# 3.0 features and LINQ are introduced and explained well in the following chapters: Delegates and Lambda Expressions, Collection Interfaces with Standard Query Operators, Query Expressions. Several other chapters show more aspects of using the C# language: Events, Exception Handling, Building Custom Collections.

    Finally, an advanced C# user will find a lot of interesting things here as well. I was particularly attracted by a great chapter on Reflection and Attributes, and the two "icing-on-the-cake" chapters: Multithreading and Multithreading Patterns. I learned a lot of new things here. Also, throughout the rest of the book, special "Advanced Topic" fragments provide a lot of value to an experienced reader. For example, I learned that the new operator generates the newobj IL instruction for reference types and initobj for value types - I hadn't thought about this distinction before.

    To sum up, a great book about C# for every reader - beginner, intermediate or advanced. It's not only about the language - the CLR is also covered pretty well (IL, garbage collection, JIT, AppDomains, reflection, threading). The book doesn't cover any of the applied libraries (such as ASP.NET, ADO.NET, WPF or Silverlight), but even so it's already more than 700 pages thick. I also really love the formatting, fonts, spacing and illustrations - very ".NET-style".

    A great read and highly recommended.

  • Kirill Osenkov

    Live Geometry project updated to Silverlight 2 final release

    • 3 Comments

    Finally I had time again to continue working on my Live Geometry project on CodePlex. While I was busy with other stuff, Silverlight 2 officially shipped, and the rest of the world has already upgraded from Beta 2 to the release version. Now I'm catching up: I've updated the source code for both the Silverlight (online) and WPF (desktop) editions, so it should build fine. As usual, you can preview the Silverlight (online) edition at:

    http://geometry.osenkov.com

    [screenshot: the Silverlight edition showing the coordinate grid and axes]

    As you can see, I started implementing the coordinate grid and axes. I'm thinking of implementing pan/zoom by dragging the coordinate system and using the mouse wheel. I also need a toggle button to show/hide the grid and axes, and numbers (-2, -1, 0, 1, 2, ...) along the coordinate axes. Then I can start working on plotting function graphs and charts.

    I've also updated the desktop (WPF) version, although I didn't have time to make it look beautiful:

    [screenshot: the WPF desktop version]

    The beauty of WPF is composability. The toolbar icons use the same logic for drawing points/lines/circles as the main canvas. This way the icons are vector graphics and can be scaled if I want to make the toolbar buttons larger in the future. Also, the buttons in the toolbar are generated automatically using reflection and data binding - when I add a class for a new tool, I don't need to do anything else; reflection will pick it up and create a toolbar button for it.
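
    Here is a rough sketch of the reflection part of that idea - the Tool base class, MakeIcon method and PopulateToolbar helper are hypothetical names for illustration, not the actual Live Geometry API:

    using System;
    using System.Linq;
    using System.Reflection;
    using System.Windows;
    using System.Windows.Controls;

    abstract class Tool
    {
        // Draws a vector icon using the same primitives as the main canvas.
        public abstract UIElement MakeIcon();
    }

    static class ToolbarBuilder
    {
        public static void PopulateToolbar(ToolBar toolbar)
        {
            // Find every concrete Tool subclass in this assembly...
            var toolTypes = Assembly.GetExecutingAssembly()
                .GetTypes()
                .Where(t => typeof(Tool).IsAssignableFrom(t) && !t.IsAbstract);

            // ...and create a toolbar button for each one.
            foreach (Type type in toolTypes)
            {
                var tool = (Tool)Activator.CreateInstance(type);
                toolbar.Items.Add(new Button { Content = tool.MakeIcon(), Tag = tool });
            }
        }
    }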

    Also, the good news is that the WPF edition shares the same source code with the Silverlight version. In only two or three places did I have to use #if SILVERLIGHT compiler directives, mostly because WPF's VisualTreeHelper.HitTest is called VisualTreeHelper.FindElementsInHostCoordinates in Silverlight (because of slightly different semantics). Andy Beaulieu blogs about this breaking change here: http://www.andybeaulieu.com/Default.aspx?tabid=67&EntryID=95
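
    For illustration, here is roughly what such a fork can look like - a sketch, not the actual Live Geometry code (the ElementsAt helper is a made-up name; the VisualTreeHelper members are the real WPF and Silverlight APIs):

    using System.Collections.Generic;
    using System.Windows;
    using System.Windows.Media;

    static class HitTesting
    {
    #if SILVERLIGHT
        public static IEnumerable<UIElement> ElementsAt(Point point, UIElement root)
        {
            // Silverlight's replacement for WPF's HitTest
            return VisualTreeHelper.FindElementsInHostCoordinates(point, root);
        }
    #else
        public static IEnumerable<UIElement> ElementsAt(Point point, Visual root)
        {
            var results = new List<UIElement>();
            VisualTreeHelper.HitTest(
                root,
                null, // no filter callback
                hit =>
                {
                    var element = hit.VisualHit as UIElement;
                    if (element != null)
                    {
                        results.Add(element);
                    }
                    return HitTestResultBehavior.Continue;
                },
                new PointHitTestParameters(point));
            return results;
        }
    #endif
    }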

  • Kirill Osenkov

    Kirill's Visual Studio Tips

    • 10 Comments

    I decided to compile a small list of Visual Studio features that I use a lot.

    Code Editor

    1. Code Snippets for ‘prop’, ‘foreach’, ‘if’, ‘class’, ‘interface’, Tab-Tab completion
    2. Surround with (snippets) - try, if
    3. Ctrl+K,D – FormatDocument
    4. Ctrl+Left/Right arrows in code editor – jump between words faster
    5. Ctrl+K,C – Ctrl+K,U – comment/uncomment selection
    6. Dynamic Snippets in the XML Editor - I bet a lot of people don't know about this one

    Navigation

    1. F12 (Go To Definition) – position the caret on an identifier and hit F12
    2. Shift+F12 (Find All References) – then hit F8 to step through the results in the Find Symbol results window
    3. Clicking the "back" thumb button on a Microsoft Explorer mouse navigates back (under the C# profile)
    4. Right-click on a document Tab – Copy Full Path, Open Containing Folder

    Refactoring

    1. Position the caret on an identifier and hit F2 (Refactor -> Rename)
    2. Extract Method (Context Menu)

    Code generation

    1. Generate Method Stub (Smart Tag) - Ctrl+. and enter
    2. WindowsContextMenuButton + O + ENTER + A – quick way to organize, remove and sort usings

    Debugging

    1. Debug -> Exceptions (allows breaking on a first-chance exception)
    2. Right-click on a breakpoint brings up a context-menu which not everyone knows about ;-)
    3. Tools -> Options -> Debugging -> Enable just my code
    4. Attach to process -> Cross Platform Debugging (Managed, Native)
    5. .load sos, !clrstack, !dumpheap, !gcroot, etc.

    Solution Explorer

    1. Tools -> Options -> Projects and Solutions -> Track Active Item in Solution Explorer
    2. Right-click on an item in Solution Explorer – Open Folder in Windows Explorer
    3. Right-click on a project -> Unload -> edit .csproj file in XML

    Macros and automation

    1. Coding productivity: macros, shortcuts and snippets
    2. How to map a Visual Studio macro to a keyboard shortcut?

    P.S. Sara Ford has much, much more in her famous "Did you know..." series.

  • Kirill Osenkov

    An algorithm to solve a matrix game

    • 2 Comments

    There is a nice game called Enlight from FeejoSoft. I have it on my PocketPC phone and for the life of me I just couldn't get past level 33. So yesterday I finally sat down and implemented an algorithm to solve this game automatically.

    Of course, I used C# and Visual Studio (as a tester, I used our latest internal build of Visual Studio 2010, although it will run just fine on 2008 as well).

    The game

    The rules are really simple. There is a 10x10 boolean matrix, and you can invert cells. When you invert a cell, all of its neighbors (usually four) are inverted with it. Given this operation, you are required to nullify the matrix (turn off all the lights):

    [screenshot: FeejoSoft Enlight]

    The game is freeware and you can use your favorite search engine to find and download it (e.g. from freewareppc.com, no responsibility for links). If you'd like to run it in an emulator, here are the steps:

    1. On a machine with ActiveSync, install the game executable.
    2. Use dir /s setup*.cab to find the .cab file (usually it will unpack to \Windows\WindowsMobile\TapIsland)
    3. Open Visual Studio and create a new SmartPhone application
    4. Run (F5) to launch the emulator
    5. In the emulator menu, click File -> Configure -> Shared path and specify the path to the .cab file from step 2
    6. Open File Explorer in the emulator and go to the Storage card folder
    7. Install the game by running the .cab file from there
    8. Run the game!

    The algorithm

    The idea I used is commonly applied to such problems (e.g. the eight queens puzzle, http://en.wikipedia.org/wiki/8_queens, is usually solved using depth-first search with backtracking). I encourage you to visit this Wikipedia article - it is a very interesting read.

    We'll visit every cell, column by column. Every cell has its next cell: either the cell in the next row of the same column, or the first cell of the next column to the right. For brute force, you would recurse to visit the next cell, then come back, toggle the current cell and recurse again. This way every cell is tried in both states. However, this would yield 2^100 iterations. We can trim the search down considerably if we don't recurse into boards already known to be invalid. This prunes the "phase space" of the problem significantly. Here's the complete code for the solver:

    namespace MatrixGame
    {
        public class Solver
        {
            private Field sourceField;
            private Field solution = new Field();
            private Field currentSolution = new Field();
    
            private Solver(Field field)
            {
                this.sourceField = new Field(field);
            }
    
            public static Field FindSolution(Field field)
            {
                Solver solver = new Solver(field);
                solver.Search(FieldCoordinates.StartingPoint);
                return solver.solution;
            }
    
            private int steps = 0;
    
            /// <summary>
            /// Main algorithm
            /// </summary>
            void Search(FieldCoordinates current)
            {
                steps++;
    
                // we traverse the cells column by column, 
                // left to right, top to bottom
                var next = current.GetNextCell(sourceField);
    
                // we've reached the end of the field
                if (next.Column == sourceField.Width)
                {
                    if (LeftCellIsEmpty(current))
                    {
                        CheckSolutionCandidate();
                    }
                    return;
                }
    
                // pruning - only continue searching if this is a plausible board so far
                if (current.Column == 0 || LeftCellIsEmpty(current))
                {
                    Search(next);
                }
    
                // let's invert the current cell and search again
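            // (note: no explicit undo is needed - inverting is its own
            // inverse, and each recursive call tries both states of its cell)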
                currentSolution.InvertCell(current);
                sourceField.Invert5Cross(current);
    
                if (current.Column == 0 || LeftCellIsEmpty(current))
                {
                    Search(next);
                }
            }
    
            /// <summary>
            /// If the cell to the left from current is not empty,
            /// the current board cannot be a solution - 
            /// no point to continue down this branch.
            /// Prune this and backtrack to another branch.
            /// </summary>
            bool LeftCellIsEmpty(FieldCoordinates current)
            {
                return !sourceField[current.Row, current.Column - 1];
            }
    
            /// <summary>
            /// We've traversed all elements - 
            /// let's check if what we've come up with is actually a solution
            /// we checked almost all cells in LeftCellIsEmpty
            /// let's just check the last (rightmost) column
            /// </summary>
            void CheckSolutionCandidate()
            {
                if (sourceField.GetColumn(sourceField.Width - 1).IsNull())
                {
                    SolutionFound();
                }
            }
    
            /// <summary>
            /// We got it! Do something with the solution!
            /// </summary>
            void SolutionFound()
            {
                solution = new Field(currentSolution);
            }
        }
    }

    The solution program

    Here's what I came up with. On the left board, you set up the level like you see in the game (the picture shows The Notorious Level 33!). Left-click to toggle a cell, drag the mouse to "draw" adjacent cells.

    On the right, you have a chance to play this level; however, the orange dots show which cells you need to click to turn off the lights. If you're feeling honest, you can turn off the orange dots using the checkbox.

    [screenshot: the solver showing level 33 and the hint dots]

    You can download the full source code here:

    http://code.msdn.microsoft.com/EnlightGameSolver

    The research

    First of all, I was amazed at how effective the pruning is. Not descending into boards already known to be invalid brings the number of times the Search method executes down from 2^100 to just 93183. On my machine this is so fast that I manage to recalculate the hint solution in real time, as the user clicks in the left board.

    Also, building this project raised more questions than it answered. I'm sure there is a whole science behind this seemingly simple domain - Conway's Game of Life is just around the corner, not to mention Minesweeper, cellular automata, even-odd logic, 0,1-matrices, constraint programming, etc. It is fascinating that such complex solutions exist to invert a single pixel on the board:

    [screenshot: the solution pattern that inverts a single cell]

    What is the pattern here? Is the solution always unique? Why is there no solution for some configurations? What if you apply the solution to the solution? What about an unlimited two-dimensional plane? Which single-cell source boards have a solution? Which don't? What is special about a board consisting of such cells? What is the ratio of yellow cells for such boards? These are all interesting questions I'd love to think about, if only someone would do my main job for me :)

    The challenge

    As usual, better/faster/shorter solutions are welcome. I'd be especially curious to see a solution in F# or Prolog. Feel free to reuse my UI code.

  • Kirill Osenkov

    Too much type information, or welcome back System.Object and boxing

    • 22 Comments

    We all know that generics are good - they promote code reuse and static type checking by the compiler, increase runtime performance, allow more flexible OOP designs, lay the foundation for LINQ, help the IDE provide more helpful IntelliSense, and have tons of other vital advantages. "var" is another good feature which (unlike "object") also helps to preserve full static type information.

    However, I recently hit a rare case where I had too much static type information about my code, so I had to use System.Object (and boxing) to get the desired effect. I had a method that used reflection to set a property on an object, similar to this:

    static void SetProperty(object f)
    {
        Type type = f.GetType();
        PropertyInfo property = type.GetProperty("Bar");
        property.SetValue(f, 1, new object[0]);
    }

    I also had a struct like this:

    struct Foo
    {
        public int Bar { get; set; }
    }

    Now, I tried to set the Bar property on an instance of the struct:

    static void Main(string[] args)
    {
        var f = new Foo();
        SetProperty(f);
        Foo foo = (Foo)f;
        Console.WriteLine(foo.Bar);
    }

    It didn't work! It printed out 0! I was puzzled. And then I realized what was happening. Since Foo is a struct, and f (thanks to var!) is statically known to be that struct, the compiler boxes a copy of f at the call site and passes the copy to the SetProperty method. That boxed copy is modified, but the original f is not.

    One simple change and it started working fine:

    static void Main(string[] args)
    {
        object f = new Foo();
        SetProperty(f);
        Foo foo = (Foo)f;
        Console.WriteLine(foo.Bar);
    }

    I changed var to object, so the struct was boxed into an object on the heap right away; a reference to this same object was passed to the SetProperty method, the method set the property on the boxed instance, and the (Foo) cast unboxed the same modified instance - the code now prints out 1 and everything is OK again.

    "var" provided too much type information to the compiler - it avoided boxing, and knew that the variable is a struct, so I lost the modified value. After casting to object, we hid the extra information from the compiler and got the uniform behavior for both value types and reference types.

    In my original code where I encountered this peculiar behavior (a custom deserializer that reads XML and uses reflection to set properties on objects), I was so focused on making it work with all types that I forgot some of them could be value types. Since I had everything strongly typed with generics, type inference, vars and other modern goodness, the kind, hardworking compiler preserved all the type information for me and avoided boxing where I was expecting reference-type behavior. Thankfully, unit tests revealed the error 10 minutes after it was introduced (I definitely need to post about the usefulness of unit tests and TDD in the future), so it was a quick fix to box the instance into object before filling its properties.

    It was an amusing experience.

  • Kirill Osenkov

    More comments on generics

    • 9 Comments

    Here are some more unsorted thoughts on generics to continue this post (which has some interesting comments too).

    Preserving types with generic type inference

    To reiterate, instead of having a method like:

    void Process(object instance)
    {
    }

    one can write:

    void Process<T>(T instance)
    {
    }

    and thus preserve the static type information about the parameter. One can still call this method like Process(stuff) without having to explicitly specify the generic type parameter: Process<StuffType>(stuff).

    Casting by example

    This technique can be useful (among others) in one specific situation - where we can't specify the generic type parameter, because it is of an anonymous type. We can still call a generic method and "specify" the anonymous type parameter using type inference (more precisely, the smart C# compiler will do it for us). I blogged more about this here: http://kirillosenkov.blogspot.com/2008/01/how-to-create-generic-list-of-anonymous.html

    Static data is not shared among constructed generic types

    Jacob Carpenter reminds us of this important (and useful!) fact: http://jacobcarpenter.wordpress.com/2008/07/21/c-reminder-of-the-day/
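
    A minimal sketch of what this means in practice (the Counter class is made up for illustration): each closed constructed type gets its own copy of a static field.

    class Counter<T>
    {
        public static int Count;
    }

    class StaticsDemo
    {
        static void Main()
        {
            Counter<int>.Count++;
            Counter<int>.Count++;
            Counter<string>.Count++;

            // Prints "2 1": Counter<int> and Counter<string> are distinct
            // constructed types, so they don't share the static Count field.
            System.Console.WriteLine("{0} {1}",
                Counter<int>.Count, Counter<string>.Count);
        }
    }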

    You can constrain a generic type parameter on a type to be its derived type

    Darren Clark mentions another nice trick with generics:

    public class MyBase<T> where T: MyBase<T>

    Here, we essentially limit T to be a type derived from MyBase<T>.
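
    Here's a small sketch of how such a constraint is typically used (the names are made up for illustration) - the base class can refer to the concrete derived type statically:

    public class MyBase<T> where T : MyBase<T>
    {
        // The base class can expose members statically typed as the derived type.
        public T Self { get { return (T)this; } }
    }

    public class Concrete : MyBase<Concrete>
    {
        public void DoSomething() { }
    }

    class CrtpDemo
    {
        static void Main()
        {
            var c = new Concrete();
            c.Self.DoSomething(); // Self is a Concrete, not just a MyBase<Concrete>
        }
    }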

    Limits of type inference

    The C# 3.0 compiler is much smarter than the 2.0 one when it comes to type inference. Jon Skeet describes this really well in his book C# in Depth. However, there is still room for future improvement - the type inference engine could be made even smarter to deduce types from generic constraints. I hit this limitation recently when I tried to implement the topological sorting algorithm as a mix-in (using extension methods and generic type inference). Paste this program and it will compile fine; however, if you remove the generic arguments in the call to TopologicalSort in Main(), it will fail to compile:

    using System.Collections.Generic;
    
    // Generalized algorithms and data-structures - I want to reuse them by "mixing-in"
    interface IDependencyNode<TNode, TNodeList>
        where TNode : IDependencyNode<TNode, TNodeList>
        where TNodeList : IDependencyList<TNode>
    {
        // TNodeList Dependencies { get; } <-- that's why I need TNodeList
    }
    
    interface IDependencyList<TNode> : IEnumerable<TNode> { }
    
    static class Extensions
    {
        public static IEnumerable<TNode> TopologicalSort<TNode, TNodeList>(this TNodeList nodes)
            where TNode : IDependencyNode<TNode, TNodeList>
            where TNodeList : IDependencyList<TNode>
        {
            return null; // algorithm goes here
        }
    }
    
    // Mixing-in to my concrete world of Figures and FigureLists
    // I basically get the implementation of topological sort for free
    // without inheriting from any classes
    class Figure : IDependencyNode<Figure, FigureList> { }
    
    class FigureList : List<Figure>, IDependencyList<Figure> { }
    
    class Program
    {
        static void Main(string[] args)
        {
            FigureList list = new FigureList();
            // wouldn't it be sweet if we could infer the type arguments here??
            // list.TopologicalSort(); // doesn't compile
            list.TopologicalSort<Figure, FigureList>(); // compiles fine
        }
    }

    In this case, one could potentially figure out the types Figure and FigureList (at least, it is doable by a human :), but then the type inference algorithm would become even more complex than it is now (in fact, I suspect it would become as powerful as a typical Prolog solver, because it would require unification). The C# compiler team certainly has higher-priority tasks right now than implementing Prolog in the C# compiler.

    Finally, when I look at the code above, it is too complex, difficult to understand and clumsy. One shouldn't pay such a high price for the flexibility of mix-ins. There is a much simpler and more elegant solution to the problem, which I hope to come back to later.

  • Kirill Osenkov

    Why is a comparison of a value type with null a warning?

    • 4 Comments

    A reader (Petar Petrov) asked me a question recently which I didn't quite know how to answer:

    Why a comparison of a value type against null is a warning? I definitely think it should be a compiler error.

    So I asked the C# compiler team and here's the explanation (please welcome today's special guest Eric Lippert):

    Why is it legal?


    It's legal because the lifted comparison operator is applicable. If you are comparing an int to null then the comparison operator that takes two int?s is applicable.

    Why is it a warning?

    It's a warning because the comparison always results in "false".
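
    A minimal repro - compiling something like this produces warning CS0472:

    class Program
    {
        static void Main()
        {
            int i = 123;

            // warning CS0472: The result of the expression is always 'false'
            // since a value of type 'int' is never equal to 'null' of type 'int?'
            if (i == null)
            {
                System.Console.WriteLine("never reached");
            }
        }
    }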

    Why is that not an error?

    Let's turn that around -- why should it be an error?  Why should any comparison which the compiler knows the answer to be an error?  That is, if you think this should be an error, then why shouldn't
    if (123 == 456)
    be an error?  Or for that matter, why shouldn't
    if (false)
    be an error?

    Three reasons why none of these things should be errors:
    First, the argument that this is work for me. The spec is complicated enough already and the implementation is divergent from the spec enough already; let's not be adding even more special cases that I can then get wrong for you.

    Second, the argument from design. By design, C# is an "enough rope" programming language -- we do not try to constrain you to writing only meaningful programs. Rather, we let you write almost any program, and then give you warnings when it looks like you might be entangling yourself in the rope we gave you. If you don't like that, choose a language that gives you less flexibility.  (This is part of the impetus behind the push towards more declarative programming languages; declarative programming languages are less likely to contain senseless commands because they consist of descriptions of how things are desired to be.)

    Third, the argument about generated code. We do not disallow statements like if (12 == 13) { whatever... } because not all code is typed in by humans. Some of it is generated by machines, and machines often follow the same rigid rules generating code that compilers do consuming it. Do we really want to put the burden upon machine-generated-code providers that they must jump through the same constant-folding hoops that the C# compiler does in order to avoid compiler errors? Do we want to make machine-generated code not only have to be syntactically and grammatically correct C# code, but also _clever_ and _well-written_ C# code? I don't think we do; I think that makes the job of both the code producer and the compiler writer harder without any corresponding gain in safety or productivity.

    Eric

  • Kirill Osenkov

    How I started to really understand generics

    • 16 Comments

    First off, a little explanation for the post title: it's not that I didn't understand generics before. I had a pretty good understanding of how they work, how they are implemented in the CLR, what facilities are provided by the compiler, etc.

    It's just that recently I started seeing some patterns and analogies that I hadn't seen before or just hadn't paid enough attention to. It's a little fuzzy feeling, but I'll still try to describe these insights - hopefully the reader will understand what I mean. They have helped me think more clearly and understand better how to use generics in a richer way.

    For a start, let's consider a method called Process that processes command line arguments and modifies the argument list by removing the arguments that it could process:

    public ArgumentList Process(ArgumentList source)
    {
    
    }

    This design is not very good because it mutates the source variable, and I read somewhere that a method should either be a side-effect-free query that returns something, or a command that mutates state and returns void, but not both at the same time (command-query separation). I think it was in Martin Fowler's Refactoring, but I'm not sure.

    But that's not what I wanted to talk about today. This design has another serious flaw: it essentially truncates its return type to ArgumentList, so by calling this method we basically "seal" the output type to be ArgumentList:

    CustomArgumentList customList = new CustomArgumentList();
    ArgumentList processed = Process(customList);

    Even if we originally had a CustomArgumentList, once we pass it into our method, it "truncates" the type and all we have left on the output is the basic ArgumentList. We lose some type information and cannot enjoy the benefits of static typing and polymorphism at its fullest.

    So what do generics have to do with all this? Well, generics allow us to "unseal" the return type of our method and preserve the full static type information:

    public T Process<T>(T source)
        where T : ArgumentList
    {
    }

    This way we can call Process on our CustomArgumentList and receive a CustomArgumentList back:

    CustomArgumentList customList = new CustomArgumentList();
    CustomArgumentList processed = Process(customList);

    See, type information is not sealed anymore and can freely "flow" from the input type to the output type. Generics enabled us to use polymorphism and preserve full static typing where it wasn't possible before. It might be obvious for many of you, but it was a wow-effect discovery for me. Note how we used type inference to make a call to Process<T> look exactly like a call to Process.

    Another observation here: when you're talking about covariance and contravariance of generics, you can divide type usages into two groups: "in" and "out". Thus, we can have return type covariance and input parameter contravariance. Notice how "out" type usages essentially truncate the type information and limit ("seal") it to one base type. Generics are there to "unseal" this limitation and to allow derived types to be returned without losing type information.

    I hope this wasn't too fuzzy and handwavy. It's just these little discoveries make me very excited and I'd like to share the way I see these things.

    Let's consider another insight: what's the difference between:

    1. class MyType<T> : IEquatable<T> and
    2. class MyType<T> where T: IEquatable<T>

    Update: I'd like to explain in a little more detail what's going on here. In the first line, we're declaring that the type MyType<T> implements the IEquatable<T> interface (complies with the IEquatable<T> contract). Put simply, MyType<T> is IEquatable<T> (can be compared with objects of type T). The <T> part reads "of T", which means MyType has an Equals method that accepts a parameter of type T. Hence, the first line imposes the constraint on the type MyType<T> itself.

    The second line, on the other hand, imposes the constraint on the type parameter, not on the type itself: the type argument you specify must be a type that has an Equals method accepting a parameter of type T (objects of type T can be compared with other objects of type T). I would imagine the second line could be useful more often than the first.
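
    A quick sketch of the difference (the member bodies are made up for illustration):

    using System;

    // Line 1: the type itself promises it can be compared with objects of type T.
    class MyType1<T> : IEquatable<T>
    {
        public bool Equals(T other)
        {
            return false; // real comparison logic would go here
        }
    }

    // Line 2: the constraint is on T, not on the type itself -
    // the type argument must be comparable with other Ts.
    class MyType2<T> where T : IEquatable<T>
    {
        public bool Contains(T[] items, T value)
        {
            foreach (T item in items)
            {
                if (item.Equals(value)) // available thanks to the constraint
                {
                    return true;
                }
            }
            return false;
        }
    }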

    Also, another line is possible:

    class MyType<T> : IEquatable<MyType<T>>

    This line would mean that MyType<T> can be compared with other MyType<T> instances (and, independently of this fact, use some other type T).

    Thinking about this difference made another aspect of generics clearer for me: with generics, there are always at least two different types involved: the actual type and the generic type parameters.

    Armed with this wisdom, let's move to a third mental exercise (again, it might all be simple and obvious for some readers, in this case I apologize and am very honored to have such readers). Is the following thing valid?

    class MyGenericType<T> : T
    {
    }

    It turns out it's not valid - we cannot inherit from a generic type parameter (Update: in the original post I wrote that this would lead to multiple inheritance - that's not true). This illustrates my previous points - with generics, there are always at least two different types involved, and we shouldn't mix them arbitrarily.

    Finally, to at least partially save this post from being a complete theoretical disaster, I'd like to share some code I wrote recently. Suppose you're implementing the Command design pattern and have various concrete commands: CutCommand, CopyCommand, a CloseAllDocuments command, etc. Now the requirement is to support unified exception throwing and catching functionality - every command should throw a generic exception InvalidOperationException<T>, where T is the type of the command. You might find this weird (who needs generic exceptions?), but in my real project at work we have some very clever exception classification logic which requires that each exception have a different type. Basically, each exception should know what type threw it. Based on this strongly typed exception-source information, we do some magic to greatly simplify logging and to classify exceptions into buckets based on their root cause. Anyway, here's the source code:

    class Command
    {
        public virtual void Execute() { }
    }
    
    class InvalidOperationException<T> : InvalidOperationException
        where T : Command
    {
        public InvalidOperationException(string message) : base(message) { }
        // some specific information about
        // the command type T that threw this exception
    }
    
    static class CommandExtensions
    {
        public static void ThrowInvalidOperationException<TCommand>(
            this TCommand command, string message) 
            where TCommand : Command
        {
            throw new InvalidOperationException<TCommand>(message);
        }
    }
    
    class CopyCommand : Command
    {
        public override void Execute()
        {
            // after something went wrong:
            this.ThrowInvalidOperationException("Something went wrong");
        }
    }
    
    class CutCommand : Command
    {
        public override void Execute()
        {
            // after something went wrong:
            this.ThrowInvalidOperationException("Something else went wrong");
        }
    }

    Note how two seemingly identical calls to ThrowInvalidOperationException will, in fact, throw different exceptions. The call in CopyCommand will throw an InvalidOperationException<CopyCommand>:

    [screenshot: the debugger showing InvalidOperationException<CopyCommand>]

    while the same call from CutCommand will throw an InvalidOperationException<CutCommand>:

    [screenshot: the debugger showing InvalidOperationException<CutCommand>]

    Here (attention!) we infer the type of the command from the type of the "this" reference. If you try calling it without the "this." part, the compiler will fail to resolve the call:

    [screenshot: the compile error when "this." is omitted]

    So this is a situation where extension methods and generic type inference play nicely together to enable what I think is a very interesting trick. We preserve type information all the way from where an exception is thrown, "remember" the type that threw the exception, and still have full type information available when we catch and process it. A nice example of how generics help type information "flow" from one part of your code to another. This is also in line with my first example, where type information "flows" from the input parameter of a method to its output.

    Note: for those of you who have spent some time with Prolog, doesn't this give you a déjà vu of how Prolog allows values to "flow" between predicates? In Prolog, every parameter of a predicate (method) can be "in" or "out" - input or output. Prolog infers which parameter is which depending on its usage elsewhere and inside the predicate body. I don't know if this analogy is valid at all, but I feel that, in a certain way, the C# compiler does its type inference in a similar manner.

  • Kirill Osenkov

    Representing dependencies in code

    • 6 Comments

    More often than you might suspect, people have to deal with some sort of dependencies in the applications they develop. In this post I'll talk about some common algorithms and data structures for working with dependencies that might turn out helpful in real-life programming tasks.

    To better understand what kind of dependencies I'm talking about, let's consider a couple of examples first.

    Example 1: A Spreadsheet Application

    In Excel every cell can contain a formula which depends on other cells:

    [screenshot: an Excel formula in C5 that references B2 and D3]

    Here we have the value in C5 depending on values in B2 and D3. A couple of things to note:

    1. Whenever a value in B2 changes, the value in C5 is recalculated automatically based on the formula.
    2. A cell can have multiple dependencies (i.e. cells that this cell depends upon)
    3. Cells can build dependency chains and trees (whenever a cell is changed, the entire subtree is recalculated)
    4. Cells cannot have circular dependencies. If you try to create a circular dependency, Excel will show a message box and draw a nice blue arrow to indicate a cycle in the dependency graph:
      [screenshot: Excel's circular dependency warning with the blue arrow]

    Example 2: Visual Studio Project System

    If you look at a solution in Visual Studio with multiple projects, those projects will most probably reference other projects. When you build the solution, Visual Studio somehow figures out the correct order in which to build the projects, and always guarantees that all the dependencies are built before it builds any given project.

    The same things are worth noting here:

    1. Whenever a project changes, VS figures out that its dependent projects will need to be rebuilt during the next build.
    2. A project can reference multiple projects
    3. Projects can build dependency chains and trees
    4. Projects cannot have circular references.

    Example 3: Dynamic Geometry

    In a dynamic geometry application, like the one I'm building as a hobby project, geometric figures depend on other figures during their construction. For instance, you need two points to draw a line - and this line will depend on the two points.

    1. As you move a point, the line will redraw to pass through this point
    2. A figure can depend on multiple figures (a parallel line needs a point and a line to be constructed)
    3. Same as above
    4. Same as above

    Why is this important?

    It is important to be aware of such things as dependencies in your domain model and to be explicit about them in your code. Clearly expressed intent in code is goodness that is incredibly useful for those who write the code, who read the code and who use the code.

    I've seen a case where a program naively relied on the order in which objects were created, and failed to save them if the order was changed. It gave a message box like "Please rearrange your objects because I can't figure out in which order to save them, because they have complex interdependencies". Well, the dependency scheme was exactly like in the three examples above, so the application could very well handle the dependencies itself.

    Can you imagine Visual Studio giving you a message box saying: "Please rearrange the projects in your solution because I can't figure out in what order to build them"? That's why it's important - thinking a little bit about such things can make your code clean, smart, robust and the application easy to use.

    Data structure: DAG (Directed Acyclic Graph)

    A data structure that is commonly used to model such dependencies is called a DAG. Wikipedia does a great job describing it, so I will just highlight some points here. First and foremost, every tree is a DAG:

    [diagram: a tree]

    But in a tree, a node cannot depend on more than one parent (like single inheritance in C#). In a DAG, a node can very well have more than one parent:

    [diagram: a "diamond" shaped DAG]

    This "diamond" shape is well known to C++ programmers - multiple inheritance is a great example of a DAG. Wikipedia makes a great point that a DAG "flows" in one direction (there can be no cycles).

    Each node in a DAG has a list of dependencies ("parents") and a list of dependents ("children"):

    interface IDagNode
    {
        IDagNodeList Dependencies { get; }
        IDagNodeList Dependents { get; }
    }

    You don't have to store Dependents - you could recalculate them every time instead - but storing them greatly simplifies many algorithms and also makes the data structure symmetric.

    With a tree, you clearly know where the top (root) and the bottom (leaves) are. A DAG, on the other hand, is symmetric - it can have many "roots" and many "leaves". Our diamond, for example, has one root (above) and one leaf (below) - but if you draw the arrows in the opposite direction, a root becomes a leaf, and a leaf becomes a root. Officially, a "root" is called a source, and a "leaf" is called a sink.

    Let's try to give some definitions:

    • Node A directly depends on node B if A.Dependencies.Contains(B).
    • Node A depends on node B if A directly depends on B or A directly depends on some C, which in turn depends on B (recursive definition).
    • A node is a source if it doesn't depend on any other node.
    • A node is a sink if no other node depends on it.

    Note that a non-empty DAG always has at least one source and at least one sink. Finding sources and sinks is easy - iterate over all nodes and select those that have their Dependencies/Dependents collection empty.

    Algorithm: Topological Sort

    So what's the use of a data structure if we don't have a useful algorithm that comes with it? Well, it turns out, we do. Every DAG can be linearized (serialized) into a sequence, where a node only depends on nodes that come before it. This special sequence is called topological sort of the DAG.

    Thus, to figure out the build order in which we want our Visual Studio projects to be built, we just take all of our projects and sort them topologically with regard to their references.

    The algorithm is actually pretty simple:

    1. Before visiting a node, check if we have already visited it (e.g. using a hashtable or by marking the node) - don't visit it a second time.
    2. For every node that we visit, first visit all of its dependencies recursively.
    3. After visiting the dependencies, append the node being visited to the end of the results list.

    After visiting all the nodes, the results list will contain all the nodes sorted topologically.
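
    Here is a generic sketch of this algorithm in C# (my own illustration, not the Live Geometry code); the dependencies of a node are supplied as a delegate so the sort isn't tied to any particular node type:

    using System;
    using System.Collections.Generic;

    static class Dag
    {
        public static List<T> TopologicalSort<T>(
            IEnumerable<T> nodes, Func<T, IEnumerable<T>> dependencies)
        {
            var result = new List<T>();
            var visited = new HashSet<T>();
            foreach (T node in nodes)
            {
                Visit(node, dependencies, visited, result);
            }
            return result;
        }

        static void Visit<T>(
            T node,
            Func<T, IEnumerable<T>> dependencies,
            HashSet<T> visited,
            List<T> result)
        {
            if (!visited.Add(node))
            {
                return; // already visited - don't visit it a second time
            }
            foreach (T dependency in dependencies(node))
            {
                Visit(dependency, dependencies, visited, result); // dependencies first
            }
            result.Add(node); // a node is appended after all of its dependencies
        }
    }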

    In my plane geometry application I use topological sorting to recalculate the coordinates of figures in the correct order - if I recalculated a line before the point it depends on is moved, the line would use the point's old coordinates and appear off-base. If you're interested, here's the source code.

    The problem with my implementation is that the algorithm is too tied to the nature of the objects being sorted (figures). In the future, I'll look into making this algorithm generic and reusable in other projects.

    Finally, I'll be interested in any examples of DAGs that you come across. I'm collecting such examples, so I'll be grateful if you tell me about your experiences with DAGs and topological sorting and where you encountered this remarkable data structure.

  • Kirill Osenkov

    Software Development Meme

    • 2 Comments

    John tagged me in a software development meme - a sort of game where you tell stuff about yourself and then pass it on to other people. My moral values tell me it's not a chain letter, so I guess it's OK to post :) Besides, it's fun, and I'd be curious to learn more about others in the blogosphere.

    How old were you when you started programming?

    16.

    How did you get started in programming?

    My parents got me a book about BASIC and I got hooked immediately. I only got a computer years later, so I was writing my first programs on paper (poor kid) and imagining how I'd be typing them in on a shiny new gray keyboard with loud clicking sounds... (people start having tears here). I stayed late in the school computer class after classes and wrote code. What surprises me is that today I'm no less fascinated by programming than back then...

    I wonder if I still can get my first program right:

    10 PRINT "What is your name?"
    20 INPUT A$
    30 PRINT "Hello " & A$

    I wonder if it "compiles".

    What was your first language?

    GWBASIC. It had an awesome IDE, boasting an install size of about 40K, xcopy deployment, 100% keyboard support, lightning-fast responsiveness and an ultra-modern REPL similar to F# Interactive. Moreover, it stored the entire program in memory - not every IDE can afford that nowadays.

    What was the first real program you wrote?

    There is a belief that every young Russian-speaking programming enthusiast HAS TO implement his/her own clone of Norton Commander and his/her own clone of Tetris. This is if you're wondering where all those Volkov Commanders, Dos Navigators and FARs come from.

    Of course, I was no exception to that rule:

    DiskGuide

    Introducing Disk Guide - a file manager for Windows 95, written during 1998-1999 in VB5! If you're adventurous enough, you can give it a spin: http://www.osenkov.com/projects/diskguide/DiskGuide.rar - believe it or not, I'm still using it pretty regularly after almost 10 years. It has a couple of gotchas (e.g. it bypasses the Recycle Bin when deleting files, so be careful), but I haven't seen any major bugs in it during the last several years. There are a lot of "features" though, so be warned (it's not a bug, it's a feature!).

    And here's the obligatory Tetris (VB6):

    Tetris

    Which reminds me. We recently had a hallway conversation with a colleague who is a developer on the VB IDE team - he asserted that he could implement Tetris from scratch in 15 minutes. So we took a recent build of Visual Studio and a stopwatch - and of course it took him longer than 15 minutes. It took him a whole 35 minutes.

    What languages have you used since you started programming?

    GWBASIC, LOGO, QuickBasic, TurboBasic (yeah!!!), VBDOS 4.0, VB5, VB6, TurboPascal, Borland Pascal, C++ (just enough to form a strong opinion about this language), VB.NET 7.0, Haskell (my world goes upside down), Prolog (I thought my world went upside down with Haskell??!), VHDL, ... drumroll... C# 1.0 (here it goes baby!!!), C# 1.1, C# 2.0, VB 8, C# 3.0 and lately some C# 4.0 and VB 10.

    What was your first professional programming gig?

    Ehmm... what is a gig? *goes-to-look-up* Ohh, I see.

    I wrote a dynamic geometry educational software package in VB6. Some pretty serious stuff: http://dg.osenkov.com

    If you knew then what you know now, would you have started programming?

    Hell yeah.

    If there is one thing you learned along the way that you would tell new developers, what would it be?

    People, please strive to write clean code - learn, learn, learn - always seek ways to improve, rewrite and refactor your code - be a perfectionist about the code. It's the journey that's important, not the destination.

    What’s the most fun you’ve ever had... programming?

    Definitely implementing the dynamic geometry project - the first feeling of control over your code, the first results, the first rewarding successes and implemented features - that was fun. By the way, I shipped it - I still can't believe it.

    Over to you

    I hereby tag Mr. Campbell, Mr. McNamara, Mr. Carpenter, Mr. Maino and Mr. Smith.

  • Kirill Osenkov

    A curious subtlety about how CLR does interface dispatch on array types

    • 15 Comments

    This post is about some internal CLR magic that usually makes accessing .NET arrays through an interface up to 100 times faster (if you're an optimist) - or, if you're a pessimist, up to 100 times slower in the worst case. I'm usually very curious about how this universe works under the hood - and finding out about the internals of the CLR is, for me, like discovering the laws of physics :) So if you ever find yourself measuring the timing of array access through generic interfaces in performance-critical code, this post is probably worth a look.

    Anyway, here's the story. A customer (Robert Kieboom from CityGIS) reported an intermittent performance decrease when accessing arrays through the IList<T> interface. Specifically, for some weird reason, the same codepath started executing 100 times slower after a call to some seemingly unrelated code. Huge thanks to Robert for providing this precise repro (I hope he doesn't mind if I share it):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        public static int AddValues(IList<int> list)
        {
            int add = 0;
            /*foreach ( int value in list )
              add += value;*/
            for (int i = 0; i < list.Count; i++)
                add += list[i];
            return add;
        }
    
        static void TestAddArray(int[] items)
        {
            Stopwatch timer = Stopwatch.StartNew();
            AddValues(items);
            timer.Stop();
            Console.WriteLine("Array: {0} ms.", timer.ElapsedMilliseconds);
        }
    
        static void TestAddIList(int[] items)
        {
            IList<int> cl = new List<int>(items);
    
            Stopwatch timer = Stopwatch.StartNew();
            AddValues(cl);
            timer.Stop();
    
            Console.WriteLine("IList: {0} ms.", timer.ElapsedMilliseconds);
        }
    
        static void Main(string[] args)
        {
            int[] items = new int[1000000];
            for (int i = 0; i < items.Length; i++)
                items[i] = i - (items.Length / 2);
    
            TestAddArray(items); // fast
            TestAddArray(items); // fast
    
            TestAddIList(items); // something weird happens??!
    
            TestAddArray(items); // slow
            TestAddArray(items); // slow
        }
    }

    Here's what this program does. First we call TestAddArray twice, passing an array disguised as an IList<int> - both calls are very fast, each taking about 108ms on my machine.

    Then we call TestAddIList, which passes a List<int> disguised as an IList<int> - this call is even faster, 21 ms.

    Now - watch this - both subsequent calls to TestAddArray take 3638ms and 3677ms respectively - about 33 times slower!

    If you remove the call to TestAddIList(items), all four calls are equally fast. What's going on? The same codepath becomes slower after some seemingly unrelated call? We were puzzled. After Eric couldn't find any explanation from the C# compiler standpoint (we were producing correct IL), it was clear that the issue lay deeper than IL and had to do with the execution engine itself. So I routed the bug to the CLR team - and pretty soon we had a reply from them.

    It turns out the CLR has some optimizations for how it does virtual dispatch on a generic interface when the runtime type is an array. As I'm afraid of talking nonsense outside my area of expertise, I'll just quote the bug description from the CLR folks:

    The problem is rooted in the fact that IList<T>::get_Item is an interface invocation on an array type (in the example an int[]).  Because arrays are very common, and interface dispatch through interfaces like IList<T> is uncommon, the runtime saves significant space by having special logic for dispatching through generic interfaces like these. 

    Our interface dispatch logic has an important optimization where, FOR EACH CALL SITE, it checks one particular target explicitly; if that fails, it does a slower hash table lookup, and if that lookup fails, it falls back to an expensive lookup. Thus call sites that tend to go to one destination are very fast, and call sites that go to many targets get good, but not as great, performance.

    As it turns out, there is a problem in the special dispatch logic for arrays (but not for ordinary interfaces) that causes the hash table lookup to fail. What this means is that interface dispatch FOR ARRAY TYPES is either very fast (if the explicit check works) or very slow (when it does not).

    In the fast case, the call sites for get_Item and get_Count (used by the indexer and the Count property) are first invoked with a 'this' pointer of type int[], which means that this is the value used for the 'fast check'. The second time we call them with a List<int>, not an int[]; however, since there is no bug associated with normal (non-array) dispatch, everything works well.

    However, when the benchmarks are reversed, List<int> is the first 'this' that dispatches through the get_Item and get_Count call sites, so it gets the benefit of the fast check. When the first benchmark is then run, it dispatches using an int[] as the 'this' pointer, and because of the bug the hash table lookup always fails, so only the very slow lookup mechanism is used.

    Workarounds:

    1) In performance critical code, ensure that if arrays are passed as IList<T> (or its super-interfaces), T[] is always called first. There is a good chance that this approach is impractical, but if there are only a few such critical paths then it is possibly useful.

    2) Always use List<T> where you would have used T[]. By the way, if you can avoid using Collection<T>, that is probably a good thing, as it is a wrapper that in most cases does not add much value.

    3) In any performance critical paths, special-case arrays with code like this:

    static void Method<T>(IList<T> aList)
    {
        T[] asArray = aList as T[];
        if (asArray != null)
        {
            // logic on arrays
        }
        else
        {
            // logic on aList
        }
    }

    Option 3 has the advantage that it is SIGNIFICANTLY faster than interface dispatch will ever be, because the array check is done only once (and then lots of operations are done), rather than effectively being done on every interface invocation. I would not recommend any workaround that does not have some positive effect on your code base. I could believe that options 2 and 3 MAY be a superior solution in the long term anyway, so I would suggest investigating them.

    This explains why calling array members through generic interfaces might, under certain conditions, become considerably slower (or, alternatively, why these calls are in many cases considerably faster than the general case). In any case, virtual dispatch is slower than calling array members directly, so if you have a chance, just work with arrays directly or use generic collections like List<T>.
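
    For illustration, here's what workaround 3 might look like applied to the AddValues method from the repro above (just a sketch, not the actual fix from the bug report):

    public static int AddValues(IList<int> list)
    {
        // Check for the array case once, so the hot loop below uses
        // direct array access instead of per-element interface dispatch.
        int[] asArray = list as int[];
        int add = 0;
        if (asArray != null)
        {
            for (int i = 0; i < asArray.Length; i++)
                add += asArray[i];
        }
        else
        {
            for (int i = 0; i < list.Count; i++)
                add += list[i];
        }
        return add;
    }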

    As for the future fate of this particular CLR optimization failing in some cases, we are considering fixing it for the next release of the .NET Framework. However, as always: no promises here, folks! If you do hit this problem (which is pretty unlikely), just use the workarounds for now - thankfully they are easy and reliable.

    I hope this was insightful and that many folks found it interesting - I'd like to give credit to everyone who reported, reproduced and investigated this issue (including, among others, Robert Kieboom, Arie Leeuwesteijn, Erik Meijer, Eric Lippert, Vance Morrison and Jan Kotas) - and I hope they don't mind me sharing this with the rest of the .NET world.

  • Kirill Osenkov

    Live Geometry with Silverlight 2

    • 24 Comments

    I'm happy to announce a project I started on CodePlex: http://codeplex.com/DynamicGeometry

    Live preview at: http://geometry.osenkov.com

    Dynamic geometry

    In a nutshell, it's an interactive designer for ruler-and-compass constructions - it lets you plot points, connect them with lines, and construct circles, intersection points, parallel lines - everything you usually do in "plain" geometry. But then the magic part comes in - you can drag the points and the entire construction is recalculated immediately - this is why they call it dynamic geometry. The way it works is that the program doesn't store the coordinates of the figures - instead, it stores the algorithm of their construction. Every time you move a point, the entire construction is rebuilt on the fly from the point's new coordinates using the algorithm stored in memory - that's why I like calling it "CAD with lazy evaluation".
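
    As a toy illustration of "storing the algorithm, not the coordinates" (all names here are invented for the example - this is not the project's actual API):

    class Point
    {
        public double X, Y;
    }

    class Midpoint
    {
        private readonly Point a, b;
        public Midpoint(Point a, Point b) { this.a = a; this.b = b; }

        // No stored coordinates: the position is recomputed from the
        // parent points on every access, so dragging them "just works".
        public double X { get { return (a.X + b.X) / 2; } }
        public double Y { get { return (a.Y + b.Y) / 2; } }
    }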

    The actual program available online

    The application is deployed live at http://geometry.osenkov.com - you'll need Silverlight 2 Beta 2 to view the site. At first the UI might not seem very intuitive - it takes a while to figure out how to use the toolbar and construct figures - but I hope an interested reader can tackle this challenge. The trick is to know the rules by which the toolbox buttons work. Plotting points is easy - just select the point tool and click anywhere. Dragging points is easy too - select the arrow (cursor) tool and drag the points. Now, to construct a segment between two points, you select the segment tool, click (and release) the first point, and then click (and release) the second point. See, you specify that this segment depends on two points - and every time you drag one of the points using the drag tool, the segment updates itself accordingly.

    The source code is available online too!

    I used Visual Studio 2008, C# 3.0 and Silverlight 2 Beta 2. I have to say, those are awesome technologies and a pleasure to use (at least for me).

    I split the code into several projects. DynamicGeometry is a class library that provides a framework for defining figures and their interaction. My goal was to allow for extensibility - so that you can just plug in a new kind of figure and automatically consume all the centralized goodies - serialization, painting, dependency tracking, the transaction system, interaction, etc. The code for almost every figure fits on one screen - so I rarely have to scroll when editing these files. SilverlightDG is the Silverlight 2 front-end (the UI), and DG is the WPF front-end. I try to maintain the two projects - WPF and Silverlight - so that they build from the same sources. So far so good :)

    Implementation details

    The project is still in its early Mickey Mouse stages and only some very basic functionality is there - however, there is already something that is worth noting:

    • an extensible type hierarchy to model geometric figures - it all starts with IFigure and FigureBase
    • State and Strategy pattern implementations to enable interactive mouse input from the user - different toolbox buttons provide different strategies for handling mouse input
    • a pretty decent transaction system, which enables nested transactions, Rollback, delayed execution, record/playback and, of course, unlimited Undo/Redo
    • a dependency tracking mechanism that allows expressing dependencies between figures (e.g. a midpoint depends on two existing points) - I employ topological sorting to sort the figure DAG by its dependencies, because we first want to recalculate the basic figures, and then their dependents
    • a toolbar that took me many painful hours to design - I highly respect UI designers and think that getting the UI right is more difficult than implementing the rest of the actual functionality. I hope I didn't do too badly, although I'm definitely not a great UI designer.

    Silverlight 2 Beta 2

    Most probably, by this time you already know what Silverlight is all about - if not, www.silverlight.net is a great place to start. I personally believe that Silverlight is our liberation from the old web story, and I am very optimistic about this technology. While Silverlight 1.0 only allowed JavaScript programmability, Silverlight 2 gives you the full experience of .NET and managed languages - C#, VB, IronPython, IronRuby and others (did I forget something?). I think Silverlight's advantage over Adobe's technologies is this rich programming model - the fact that you can use .NET and C# 3.0 for the web client just makes me so excited about it.

    Targeting both Silverlight and WPF from same source

    If you look closer at my source code, you will notice that I have two projects - the WPF one and the Silverlight one - including the same .cs files. Because WPF and Silverlight use different runtimes, the project formats are incompatible - i.e. I can't include a Silverlight class library in a WPF application. The workaround is relatively easy - just create two separate .csproj files in the same folder that reference the same sources - and you're fine.

    CodePlex

    By the way, CodePlex is quite an awesome site for open-source projects - it provides version control (I love TFS and think it's excellent), item tracking, forums, a wiki, downloads, stats - all for free and without annoying ads. CodePlex is one of the products that still makes me proud to be working at Microsoft.

    DG 1.0 - the inspiration

    The project that I'm describing in this post is actually based on an earlier one that I implemented 7-8 years ago - here's a blog post about my original dynamic geometry project: http://kirillosenkov.blogspot.com/2007/12/dg-10-free-dynamic-geometry-software.html

    I'd like to reach parity with the first version - however, this is going to be tricky, because it has a lot of different goodies - from analytical expressions to the ability to create hyperlinks and compose complex documents. Interestingly enough, I implemented DG 1.0 in VB6 - and though I think VB6 was an awesome environment for its time (its Edit-and-Continue is still better than the one we have in Visual Studio 2008), I feel much better and more comfortable using C# 3.0 - it just lets me express what I want about my program, in the way that I want.

    I hope some of you have found this interesting. I'd be happy to hear any feedback or advice on this.

    Thanks!

  • Kirill Osenkov

    MEF: Microsoft's Managed Extensibility Framework

    • 2 Comments

    Krzysztof Cwalina announced that Microsoft has released a CTP of its Managed Extensibility Framework (MEF).

    MEF is basically a dependency injection framework that lets you provide extensibility for your applications (an add-in/plug-in framework). It's also a very good way to componentize your application's architecture and make components/layers pluggable and replaceable. Here's a very good overview of what MEF actually is, with nice examples and excellent comments from readers: http://blogs.msdn.com/kcwalina/archive/2008/04/25/MEF.aspx

    Go grab it here: http://code.msdn.microsoft.com/mef. The download contains MEF itself (ComponentModel.dll) and several interesting samples.
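
    To give you a rough feel for the programming model, here's a tiny sketch of the attribute-based model (written against the API shape as it later stabilized - the CTP's exact types and namespaces may differ):

    using System;
    using System.Collections.Generic;
    using System.ComponentModel.Composition;
    using System.ComponentModel.Composition.Hosting;

    public interface IPlugin { string Name { get; } }

    [Export(typeof(IPlugin))]
    public class HelloPlugin : IPlugin
    {
        public string Name { get { return "Hello"; } }
    }

    public class Host
    {
        [ImportMany]
        public IEnumerable<IPlugin> Plugins { get; set; }

        public void Run()
        {
            // Discover exports in this assembly and satisfy the imports above.
            var catalog = new AssemblyCatalog(typeof(Host).Assembly);
            var container = new CompositionContainer(catalog);
            container.ComposeParts(this);

            foreach (IPlugin plugin in Plugins)
                Console.WriteLine(plugin.Name);
        }
    }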

  • Kirill Osenkov

    Testing

    • 10 Comments

    If you're interested in testing or would like to learn more about it, here are a couple of links:

    • MSDN recently announced Tester Center: http://msdn.com/testercenter - this is a centralized resource about testing from Microsoft. Quote:
      "The Microsoft Tester Center showcases the test discipline as an integral part of the application lifecycle, describes test roles and responsibilities, and promotes the test investments required to deliver high-quality software."
    • Our QA team has a blog about testing called CSharpening: http://blogs.msdn.com/csharpening/ Unfortunately, the content there is not updated very frequently, because we're busy testing Visual Studio 2008 SP1 and the first new features of Visual Studio "10". Yours truly intends to post testing-related articles there.

    Some facts about my job as a tester at Microsoft (to clarify some common misconceptions):

    • Officially my position is called SDET - Software Design Engineer in Test.
    • We develop all of our test code using most recent builds of Visual Studio "10" and C# 4.0.
    • We do almost no manual testing - almost all of our tests are automated, which means they can run on a lab machine unattended.
    • We run our tests on a variety of operating systems, languages and VS SKUs (e.g. Visual C# Express 2008 on Japanese Vista or VSTS on Windows Server 2008 with Visual Studio 2005 already installed side-by-side).
    • The Visual Studio Managed Languages team has about 30000 functional C# and VB tests, which run overnight for about 12 hours on 5 to 10 lab machines (in parallel).
    • C# tests are written in C#, VB tests are written in VB, F# tests are written in F#.
    • We have a lot of automated UI tests - where the mouse moves automatically and the keyboard "types" on its own - it's fun to sit and watch a test creating a C# project, opening files, typing, using refactorings, navigating, etc.
    • We prefer testing on the component level, bypassing the UI.
    • Our developers write unit-tests, while we concentrate on the functional testing and end-to-end scenarios.
    • We reach more than 70% code coverage with our tests - this is a required minimum for our team.
    • We use TFS (Team Foundation Server) for source control and work item tracking.
    • Many of our tests use a new data-driven test framework where our test scripts are written in XML and are interpreted by a script-like engine.
    • Most Visual Studio tests are out-of-process - our tests attach to a Visual Studio process and control it using Remoting.
    • We have a "bug-bash" day about once a month where the entire team just uses Visual Studio and logs bugs.
    • We also have AppWeeks, where we form groups and build applications using our product.

    Please let me know if you'd like to know more. Thanks!

  • Kirill Osenkov

    Algorithms in C#: Connected Component Labeling

    • 11 Comments

    I have to admit, I'm not very good at interviews. For some reason my mind isn't trained well for sorting linked lists or balancing binary trees on a whiteboard. I have no problem designing 600+ type hierarchies and building complex object-oriented frameworks, but typical interview questions were never an area where I shone. Maybe it's because I like to think slowly and take my time, maybe it's for some other reason :) Anyway, just out of the blue I decided to implement an algorithm in C#, with the goal of improving my algorithm skills and enjoying the power and expressiveness of my most favorite language and platform.

    As it usually goes, I first sat down and spent three hours coding the whole thing, and only afterwards did some online research, which revealed the algorithm's official name:

    Connected component labeling

    The goal is, given a 2D matrix of colors, to find all adjacent areas of the same color, similar to the flood-fill algorithm used in Paint.

    Standard (existing) solution - Union-Find

    Online research revealed that there is a standard algorithm for this that heavily employs Union-Find. The union-find approach starts with a fully disjoint matrix, where every element is its own set, and then iteratively joins (unions) neighboring sets together. The secret of Union-Find is twofold:

    • Extremely fast (constant-time) union of two sets
    • Extremely fast (almost constant-time) determining whether two points belong to the same set or not

    I remember that back at university I even knew how to prove that the complexity of Union-Find is bounded by the inverse Ackermann function, which, believe me, grows very slowly. The important thing that I carried away for the future is that for all practical purposes you can safely consider it O(1).
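
    For reference, here's a minimal sketch of a disjoint-set (union-find) structure with path compression and union by rank - the two tricks behind those near-constant bounds:

    class UnionFind
    {
        private readonly int[] parent;
        private readonly int[] rank;

        public UnionFind(int count)
        {
            parent = new int[count];
            rank = new int[count];
            for (int i = 0; i < count; i++)
                parent[i] = i; // every element starts as its own set
        }

        public int Find(int x)
        {
            if (parent[x] != x)
                parent[x] = Find(parent[x]); // path compression
            return parent[x];
        }

        public void Union(int a, int b)
        {
            int rootA = Find(a);
            int rootB = Find(b);
            if (rootA == rootB)
                return;
            // union by rank: attach the shallower tree under the deeper one
            if (rank[rootA] < rank[rootB])
                parent[rootA] = rootB;
            else if (rank[rootA] > rank[rootB])
                parent[rootB] = rootA;
            else { parent[rootB] = rootA; rank[rootA]++; }
        }
    }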

    My solution

    I was somewhat proud to have implemented a different approach. If you have seen it anywhere before, I wouldn't be surprised, so please bear with me. The idea is to split the "image" into horizontal stripes ("spans") of the same color. Then scan all the rows of the image row by row, top to bottom, and for each span in a row, attach the span to the neighboring spans of the previous row. If a span touches more than one existing component, join them into one. If it doesn't touch any pixels of the same color, create a new component. This way, all horizontal spans fall into three generations: 0, 1 and 2 (like in the .NET garbage collector!). Generation 0 is the current row, Generation 1 is the row above it, and Generation 2 is all the rows above those two. As we finish processing a row, Generation 1 is added to Generation 2 and Generation 0 becomes Generation 1.

    MSDN Code Gallery

    I published the source code at http://code.msdn.microsoft.com/ConnectedComponents. MSDN Code Gallery is an awesome new website where you can publish your code samples and source code. I discovered that I'm not the only one to publish algorithms in C#: see for example, A* path finding.

    Performance challenge

    I'd be very curious to measure the performance of my current code. I know it isn't optimal, because there is at least one place where I could do considerably better. For the profiling wizards among you, I won't reveal where it is for now. For the tuning wizards: if my algorithm takes 100% of the time, how much faster can you make it? I myself expect (but don't guarantee) to be able to get around 5-10% faster. Can anyone do better?

    Also, I'd be curious how my algorithm performs compared to the classical union-find implementation. Anyone interested in implementing it to see which does better? (I hope I won't regret posting this when someone shares an implementation that is 100 times faster than mine...)

    Cleaner code challenge

    I dare to hope that my code is more or less clean and more or less well-factored. However, I've learned one lesson in life - whenever I start thinking that I'm good, life immediately proves otherwise. That's why I welcome any improvement suggestions. I understand that on Mickey Mouse-sized projects like this it doesn't really matter that much whether the code is clean, but anyway. The design could be better, the use of .NET and C# could be better, the distribution of responsibilities could be better, my OOP kung-fu could be better. I'd like to use little projects like this to learn to code better.

    Language challenge

    Care to implement it in F# or, say, Nemerle? This is your chance to show the world that C# is not the only .NET language! Of course, non-.NET languages are highly welcome as well. I'd personally be interested in, for example, Haskell, Lisp, Prolog, Scala, Nemerle, Boo, (Iron)Ruby, (Iron)Python, Smalltalk, Comega, SpecSharp, SML.NET and so on.

    Silverlight challenge

    Have you installed Silverlight 2 beta? Need an idea what to implement? ;-)

  • Kirill Osenkov

    How to determine the .NET installation directory and CLR version

    • 0 Comments

    I recently discovered the System.Runtime.InteropServices.RuntimeEnvironment class, especially RuntimeEnvironment.GetRuntimeDirectory().

    Also, RuntimeEnvironment.GetSystemVersion() gets the version of the CLR that is running the current process.
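
    A tiny sample puts the two together (the outputs in the comments are examples - they will vary by machine and CLR version):

    using System;
    using System.Runtime.InteropServices;

    class Program
    {
        static void Main()
        {
            // e.g. C:\Windows\Microsoft.NET\Framework\v2.0.50727\
            Console.WriteLine(RuntimeEnvironment.GetRuntimeDirectory());
            // e.g. v2.0.50727 - the CLR version running the current process
            Console.WriteLine(RuntimeEnvironment.GetSystemVersion());
        }
    }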

  • Kirill Osenkov

    How to map a Visual Studio macro to a keyboard shortcut?

    • 3 Comments

    The complete answer is here: http://msdn.microsoft.com/en-us/library/a0003t62.aspx

    Excerpt:

    1. Choose Options on the Tools menu to display the Options dialog box.
    2. In the Environment folder, click Keyboard.
    3. In the Show commands containing box, type "macros". Once you do this, all commands beginning with the word "macros" appear in the commands list.
    4. Scroll down the list to your macro.
    5. Click the Press shortcut key(s) box and type a key combination, such as CTRL+SHIFT+ALT+A. This will be the keyboard shortcut that executes the macro. You can use a different key sequence if you prefer.
    6. Click Assign and then click OK. Your macro is now bound to that keyboard shortcut.
  • Kirill Osenkov

    More blogging goodness from C# IDE QA

    • 0 Comments

    I'm very happy to announce that two of my team members have started blogging.

    Eric Maino (http://blogs.msdn.com/eric) is our infrastructure expert and, as Charlie quite correctly puts it, one of QA's finest engineers. Also, Eric has the misfortune of having been appointed my own personal dedicated mentor, which means he answers all sorts of lame questions I come up with and otherwise helps me ramp up as a new Microsoftie. He's doing a terrific job. If you remember Mr. Wolfe from Pulp Fiction - he is the guy who solves problems quickly and effectively whenever you're in despair :)

    Daniel Rathbone (http://blogs.msdn.com/rathblog) is another great peer of mine on the team, known for his organized way of thinking and working. Besides organizing information around himself, Dan also organizes the Weekly-Friday-4:30-PM-Visual-Studio-Languages-Drink-Of-The-Week, where the Managed Languages folks get to taste various kinds of alcohol from all around the planet. And no, we don't write production code after or during Drink of the Week.

     

    P.S. When you read Eric's post about Disposing your Finalizers, I want you all to know that I was the one who wrote the bad code. There, I admit it. Poor Eric spent a lot of time debugging before he found the problem that I caused.

     

    I will understand, learn and use the Dispose pattern correctly.

    I will understand, learn and use the Dispose pattern correctly.

    I will understand, learn and use the Dispose pattern correctly.

    I will understand, learn and use the Dispose pattern correctly.

    I will understand, learn and use the Dispose pattern correctly.

    I will understand, learn and use the Dispose pattern correctly.

    I will understand, learn and use the Dispose pattern correctly.

    I will understand, learn and use the Dispose pattern correctly.

    ...

  • Kirill Osenkov

    Book review: Jon Skeet's C# in Depth

    • 3 Comments

    If you share some of my interests (like writing clean code, or, even better, writing clean code in C#), you'll already know and love Jon Skeet's Coding Blog. If you enjoy Jon's blog as much as I do, then there's some really good news: Jon has actually written a book about C# called C# in Depth (http://csharpindepth.com). Manning Publications was even kind enough to provide me with a free copy of the book, which is greatly appreciated.

    Now, this is not your usual C# book that just rewrites the spec in a human-readable form. C# in Depth is far from being a boring description of numeric types and for/while/do/break/continue constructs. It shows the most interesting and shiny aspects of the language and provides interesting background and insight on why each feature was introduced and what the best way to use it is. Moreover, Jon does a great job of comparing how to achieve the same results in C# 1.0, 2.0 and 3.0 - and I have to admit, the 3.0 version always looks aesthetically pleasing and way better than the earlier versions. Mastering lambdas, anonymous types & Co. is very important for developing a good programming style, and the book teaches exactly that - how to write clean, elegant, beautiful code given all the power of the C# language.

    I really enjoyed the book and I highly recommend it. Thanks Jon!

  • Kirill Osenkov

    Coding productivity: macros, shortcuts and snippets

    • 19 Comments

    Visual Studio macros are a fantastic productivity booster, and one that is often underestimated. It's so easy to record a macro for a repetitive action and then just play it back. Even better, map the macro to a keyboard shortcut. I'll share a couple of examples.

    InsertCurlies

    If you open up Visual Studio and type in this code:

    Void1

    How many keystrokes did you need? I've got 10 (including holding the Shift key once). That's because I have this macro mapped to Shift+Enter:

    Sub InsertCurlies()
        DTE.ActiveDocument.Selection.NewLine()
        DTE.ActiveDocument.Selection.Text = "{"
        DTE.ActiveDocument.Selection.NewLine()
        DTE.ActiveDocument.Selection.Text = "}"
        DTE.ActiveDocument.Selection.LineUp()
        DTE.ActiveDocument.Selection.NewLine()
    End Sub

    So I just typed in void Foo() and hit Shift+Enter to insert a pair of curlies and place the cursor inside. Remarkably, I've noticed that with this macro I now almost never have to hit the curly brace keys on my keyboard. Readers from Germany will especially appreciate this macro, because on German keyboard layouts you have to press Right-Alt and the curly key, which really takes some time to get used to.

    This macro is also useful for converting an auto-property to a regular property: select the semicolon and hit Shift+Enter:

    ConvertAutopropToProp1

    Try it out!

    ConvertFieldToAutoProp

    Suppose you have a field which you'd like to convert to an auto-implemented property:

    ConvertFieldToAutoprop1

    And when you click the menu item, you get:

    ConvertFieldToAutoprop2

    How did I do it? First, here's the macro:

        Sub ConvertFieldToAutoprop()
            DTE.ActiveDocument.Selection.StartOfLine( _
                vsStartOfLineOptions.vsStartOfLineOptionsFirstText)
            DTE.ActiveDocument.Selection.EndOfLine(True)
    
            Dim fieldtext As String = DTE.ActiveDocument.Selection.Text
    
            If fieldtext.StartsWith("protected") _
                Or fieldtext.StartsWith("internal") _
                Or fieldtext.StartsWith("private") Then
                fieldtext = fieldtext.Replace("protected internal", "public")
                fieldtext = fieldtext.Replace("protected", "public")
                fieldtext = fieldtext.Replace("internal", "public")
                fieldtext = fieldtext.Replace("private", "public")
            ElseIf Not fieldtext.StartsWith("public") Then
                fieldtext = "public " + fieldtext
            End If
    
            fieldtext = fieldtext.Replace(";", " { get; set; }")
    
            DTE.ActiveDocument.Selection.Text = fieldtext
        End Sub
    

    Then just add the macro command to the Refactor context menu or any other place. This may seem like no big deal, but I recently had to convert fields to auto-properties in 50+ files. I really learned to appreciate this macro.

    gs code snippet

    This is a very small but useful snippet: gs expands to { get; set; }

    <?xml version="1.0" encoding="utf-8" ?>
    <CodeSnippets  xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
        <CodeSnippet Format="1.0.0">
            <Header>
                <Title>gs</Title>
                <Shortcut>gs</Shortcut>
                <Description>Code snippet for { get; set; }</Description>
                <Author>Microsoft Corporation</Author>
                <SnippetTypes>
                    <SnippetType>Expansion</SnippetType>
                </SnippetTypes>
            </Header>
            <Snippet>
                <Code Language="csharp"><![CDATA[{ get; set; }$end$]]>
                </Code>
            </Snippet>
        </CodeSnippet>
    </CodeSnippets>

    I usually use the prop snippet to create auto-implemented properties, but gs is useful in some cases as well.

    I hope this has inspired you to run some personal usability experiments and identify everyday actions that you can optimize using macros, snippets and shortcuts. I would love to hear about your personal tips and tricks as well.


  • Kirill Osenkov

    Did you know? Delegates to extension methods

    • 2 Comments

    I didn't know about this until recently: you can take a delegate to an extension method as if it were an instance method: http://csharpindepth.com/ViewNote.aspx?NoteID=92 The delegate's target will be set to the instance. Nice!
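
    Here's a quick illustration (a minimal sketch - the Shout extension method is made up for the example):

    using System;

    static class StringExtensions
    {
        public static string Shout(this string s)
        {
            return s.ToUpper() + "!";
        }
    }

    class Program
    {
        static void Main()
        {
            string greeting = "hello";
            // Take a delegate to the extension method as if it were an
            // instance method - 'greeting' becomes the delegate's target.
            Func<string> shout = greeting.Shout;
            Console.WriteLine(shout()); // prints HELLO!
        }
    }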

  • Kirill Osenkov

    Some links

    • 1 Comments

    Several interesting blog links I'd like to share.

    • Charlie Calvert - Charlie's blog is where the C# community meets the C# team. We are proud that the communication is bidirectional - not only are we sharing information with the community, but we are also actively listening to feedback (see for example Jon Skeet's wish list for C# 4.0 or Vladimir Reshetnikov's findings about the C# compiler and spec). I highly recommend the Future Focus series - please let us know what you think about our plans for the next version of C# and Visual Studio. You have a unique opportunity to influence the C# 4.0 features before they are set in stone.
    • Vladimir Reshetnikov - if you like digging deep into the C# grammar, the spec and the dark corners of our compiler, this blog is for you. Vladimir is also known as "nikov".
    • Jacob Carpenter - Jacob finds interesting ways to use C# language features to better express a program in the language. See also some recent posts on generic type parameter chaining (applied to implement pattern matching in C# as a fluent API).
    • Chris Smith - Chris sits in the office across the hallway and works on the F# QA team. He posts about the F# language and has a lot of nice samples to learn from. This is the right guy to ask, among other things, about pattern matching, using the type system to check units of measure, and Project Euler.
    • Jon Skeet - Jon had a great series of posts recently about the most-wanted new features in C# 4.0.

    The order of this list is randomized and highly non-deterministic.
