Fabulous Adventures In Coding

Eric Lippert's Blog

  • The correct answer is "no"

    No technology today. I have not done a post on relationship advice in ages!

    Compare and contrast these two conversations:

    ******

    Version One:

    Alice: Thanks for having lunch with me. I suppose you know what I want to talk about.

    Eric: Yeah, I do. I think you shouldn't jump to conclusions solely on the basis of all the rumours that are going around. There are any number of perfectly innocent reasons why Bob and Eve could have been seen together at Chez Franch together last month.

    Alice: What?!

    Eric: Huh?

    Alice: First she intercepts my email, then she steals my boyfriend?!? Passive attacker my eye!

    This conversation, already off to a bad start, is not going to get any better.

    ******

    Version Two:

    Alice: Thanks for having lunch with me. I suppose you know what I want to talk about.

    Eric: No, I do not.

    Alice: I was hurt that you didn't invite me to the book signing.

    Eric: I did invite you to the book signing. Check your spam filter.

    Alice: [boot laptop, search email...] Ah. Yes you did invite me. And I see your new book is called "Make Money Fast With Nigerian Viagra"; did anyone actually get the invitation?

    Eric: Hmm, that might explain the low turnout.

    ******

    Important human relationship safety tip: the correct answer to "do you know what I am thinking?" is "no". Remember, there are at least two thick slabs of bone between your brain and everyone else's brain. Those thick slabs of bone impede telepathy.

  • The void is invariant

    [UPDATES below] 

    A while back I described a kind of variance that we’ve supported since C# 2.0. When assigning a method group to a delegate type, such that both the selected method and the delegate target agree that their return type is a reference type, then the conversion is allowed to be covariant. That is, you can say:

    Giraffe GetGiraffe() { … }

    Func<Animal> f = GetGiraffe;

    This works logically because anyone who calls f must be able to handle any animal that comes back. The actual method claims to only return animals, and in fact, makes the stronger claim to only return giraffes.

    This works out in the CLR because the bits that make up a reference to an instance of Giraffe are exactly the same bits that make up a reference to that Giraffe interpreted as an instance of Animal. We can allow this magical conversion to happen because the CLR guarantees that it will all just work out without going in there and having to futz around with the bits.

    This is why this trick only works with reference types. A method that returns, say, a double cannot be converted via a covariant conversion to a delegate type that expects the method to return an object. Somewhere there would have to be code emitted that takes the returned double and boxes it to object; the bits of a double and the bits of a reference to an object boxing a double are completely different.

    But why doesn’t this trick work with void types? Here we have a method that returns some sort of success or failure code. Maybe we don’t care what it returns.

    static bool DoSomething(bool b)
    {
      if (b) return DoTheThing();
      else return DoTheOtherThing();
    }

    Action<bool> action = DoSomething;

    This doesn’t work. Why not? The caller of the action is not even going to use the returned value, so it doesn’t matter one bit what it is! Shouldn’t “void” be considered a supertype of all possible types for the purposes of covariant return type conversions from method groups to delegate types?

    No, and I’ll tell you why.

    Consider what happens when you do this:

    bool x = DoSomething(true);

    We spit out IL that does the following:

    (1) put true on the IL stack – the stack gets one deeper
    (2) call DoSomething – the argument is removed from the stack and the return value is placed on the stack.  Net, the stack stays the same size as before
    (3) stuff whatever on top of the stack into local variable x – the stack now returns to its original depth.

    Now consider what happens when you do this:

    DoSomething(true);

    We spit out IL that does the first two steps as before. But we cannot stop there! There is now a bool on the IL stack which needs to be removed. We generate a pop instruction to represent the fact that the returned bool has been discarded.

    Now consider what happens when you do this:

    action(true);

    The compiler believes that action is a void-returning method, so it does not generate a pop instruction. If we allowed you to stuff DoSomething into the action, then we would be allowing you to misalign the IL stack!

    But didn’t I say “the stack is an implementation detail?” Yes, but that’s a different stack. The CLI specification describes a “virtual machine” which passes around arguments and returned values on a stack. An implementation of the CLI is required to make something that behaves like the specified machine, but it is not required to do so in any particular manner. It is not required to use the million-bytes-per-thread stack supplied to each thread by the operating system as its implementation of the IL stack; that’s a convenient structure to use, of course, but it’s an implementation detail that it does so.

    (As an aside: when we implemented the script engines, we also first specified our own private stack-based virtual machine. When we implemented it, we decided to put the information about “return addresses” – that is, “what code do I run next?” on the system stack, but we put arguments and return values of script functions in a stack-shaped block of memory that we allocated on our own. This made building the JScript garbage collector easier.)

    In practice, the jitter uses the system stack for some things and registers for other things. Return values are actually often sent back in a register, not on the stack. But that implementation detail doesn’t help us out when deciding what the conversion rules are; we have to assume that the implementation can do no more than what the CLI specification says. Had the CLI specification said “the returned value of any function is passed back in a ‘virtual register’” rather than having it pushed onto the stack, then we could have made void-returning delegates compatible with functions that returned anything. You can always just ignore the value in the register. But that’s not what the CLI specified, so that’s not what we can do.

    [UPDATE]

    A number of people have asked in the comments why we do not simply generate a helper method that does what you want. That is, when you say

    Action<bool> action = DoSomething;

    realize that as

    static void DoSomethingHelper(bool b)
    {
       bool result = DoSomething(b); // result is ignored
    }
    ...
    Action<bool> action = DoSomethingHelper;

    We could do that. But where would you like the line to be drawn? Should you be able to assign a reference to a method that returns an int to a Func<Nullable<int>>? We could spit a helper method that converts the int to a nullable int. What about Func<double>? We could spit a helper method that converts the int to a double. What about Func<object>? We could spit a helper method that boxes the int, unexpectedly allocating memory off the heap every time you call it. What about a Func<Foo> where there is a user-defined implicit conversion from int to Foo?

    We could be spitting arbitrarily complex fixer-upper methods that would seamlessly "do what you meant to say", and we have to stop somewhere. The exact semantics of what we do and do not fix up would have to be designed, specified, implemented, tested, documented, shipped to customers and maintained forever. Those are costs. Plus, every time we add a new conversion rule to the language we add breaking changes. The costs of those breaking changes to our customers have to be factored in.

    But more fundamentally, one of the design principles of C# is "if you say something wrong then we tell you rather than trying to guess what you meant". JScript is deliberately a "muddle on through and do the best you can" language; C# is not. If what you want to do is make a delegate to a helper method then you express that intention by going right ahead and making that method.

  • Iterators at the Summer Games

    Ed "Scripting Guy" Wilson was kind enough to ask me to be a guest commentator at this years Summer Scripting Games, which have just completed.

    I've been working on a series for this blog about some unusual cases in the design of the "iterator block" feature in C# 2.0; this bit from my commentary is going to be germane to that series. I thought I'd post it here as an appetizer before we get into the main courses of the series. The series on iterators will probably run throughout July.

    The problem for event ten of the scripting games was to write a script which changes the priority of every new process that has a particular name.

    ***********

    There’s an odd thing that you learn when working on developer tools: The people who design and build the tools are often not the experts on the actual real-world use of those tools. I could tell you anything you want to know about the VBScript parser or the code generator or the runtime library, but I’m no expert on writing actual scripts that solve real problems. This is why I was both intrigued and a bit worried when the Scripting Guys approached me and asked if I’d like to be a guest commentator for the 2009 Summer Scripting Games.

    I wrote this script the same way most scripters approach a technical problem that they don’t immediately know how to solve; I searched the Internet for keywords from the problem domain to see what I could come up with. Of course, I already knew about our MSDN documentation, I had a (very) basic understanding of WMI, and I knew that the Scripting Guys had a massive repository of handy scripts.

    My initial naïve thought was that I would have to go with a polling solution; sit there in a loop, querying the process table every couple of seconds, waiting for new processes to pop up. Fortunately, my Internet searches quickly led me to discover that process startup events can be treated as an endless collection of objects returned by a WMI query.

    That got me thinking about the powerful isomorphism between events and collections.

    A collection typically uses a “pull” model — the consumer asks for each item in the collection one at a time as needed, and the call returns when the item is available. Events typically work on a “push” model — the consumer registers a method that gets called every time the event fires. But not necessarily; the WMI provider implements events on a “pull” model. The event is realized as a collection of “event objects.” It can be queried like any other collection. Asking for the next event object that matches the query simply blocks until it is available.

    Similarly, collection iterators could be implemented on a “push” model. They could call a method whenever the next item in the collection becomes available. The next version of the CLR framework is likely to have standard interfaces that represent “observable collections”, that is, collections that “push” data to you, like events do. The ability to treat events as collections and collections as events can lead to some interesting and powerful coding patterns.

    ***********

    The rest of the solution and analysis is here.

    Have a good weekend!

  • Mmm, Curry

    A recent comment asked why Haskell programmers sometimes write C# lambdas in this style:

    Func<int, Func<int, int>> add = x=>y=>x+y;

    which is then invoked as

    sum = add(2)(3);

    because of course the first invocation returns a function that adds two, which is then invoked with three. Why do that instead of the more straightforward

    Func<int, int, int> add = (x,y)=>x+y;

    and invoke it as

    sum = add(2,3);

    ???

    I was going to write a short article on that when I remembered that Wes already had done a better job at it than I likely would. Read this first before you read on.

    Welcome back. I would add to that two interesting facts.

    First, that the operation of rewriting an n-parameter function as a bunch of single-parameter functions to achieve “partial application” is called “currying” in honor of Haskell Curry, the logician after whom the programming languages Haskell and Curry are named.

    And second, that it is a little-known fact that there is an ugly but sometimes useful way to “curry away” the first parameter of an existing function using reflection.

    For example, suppose you've got

    public class C {
      public static int M(T t, int x) { whatever }
    }

    In C#, to curry away the first parameter you could simply say

    T t = something;
    Func<int, int> d = x=>C.M(t, x);

    Which is of course a syntactic sugar for creating a hidden nested class with a field t and a method that takes an int, blah blah blah, lots of boring code spit out to do that.

    If you need to curry away the first parameter of a method and are using some managed language that does not have anonymous functions, or you are writing a compiler but you don’t feel like spitting a whole bunch of helper methods like the C# compiler does, you can also do this directly with reflection:

    T t = something;
    MethodInfo methodInfo = typeof(C).GetMethod("M", BindingFlags.Static | BindingFlags.Public);
    Func<int, int> d = (Func<int, int>) Delegate.CreateDelegate(typeof(Func<int, int>), t, methodInfo);

    This works in the .NET framework 2.0 release or higher, and, unfortunately, requires that T be a reference type.

    Notice that this kind of currying is essentially the same currying that happens when you make an extension method into a delegate. If static method C.M is an extension method extending type T then in C# you can say

    Func<int, int> d = t.M;

    and we essentially curry away the first argument to the extension method when creating the delegate, using tricks similar to the reflection trick. (We pass the managed address of the extension method to a constructor rather than passing the method info to the factory, but, basically its the same thing behind the scenes.) If you try this on an extension of a value type, you'll get an error. Sree has just posted an analysis of why this has to be a reference type.

  • It Already Is A Scripting Language

    My recent post about the possibility of considering maybe someday perhaps adding "top level" methods to C# in order to better enable "scripty" scenarios generated a surprising amount of immediate emphatic pushback. Check out the comments to see what I mean.

    Two things immediately come to mind.

    First off, the suggestion made by a significant number of the commenters is "instead of allowing top-level methods, strengthen the using directive." That is, if you said "using System.Math;" then all the static members of that class could be used without qualification.

    Though that is a perfectly reasonable idea that is requested frequently, it does not actually address the problem that top-level methods solve. A better "using" makes things easier for the developer writing the call. The point of top-level methods for scripty scenarios is to make it easier on the developer writing the declaration. The point is to eliminate the "ritual" of declaring an unnecessary class solely to act as a container for code.

    Second, and more generally, I am surprised by this pushback because of course C# already is a scripting language, and has had this feature for almost a decade. Does this code fragment look familiar?

    <%@ Page Language="C#" %>   
    <script runat="server">   
      void Page_Load(object sender, EventArgs e) 
      {
          // whatever
      }
    </script>
    ...

    Where's the class? Where are the using directives that allow "EventArgs" to be used without qualification? Where's the code that adds this method group to the event's delegate? All the ritual seems to have been eliminated somehow. That sure looks like a "top-level method" to me.

    Of course we know that behind the scenes it isn't any such thing. ASP.NET does textual transformations on the page to generate a class file. And of course, we recommend that you use the "code behind" technique to make a file that actually contains the methods in the context of an explicit (partial) page class, to emphasize that yes, this is a class-based approach to server-side processing.

    But you certainly do not have to use "code behind". If you're an ASP traditionalist (like me!) and would rather see C# as a "scripting language" that "scripts" the creation of content on a web server, you go right ahead. You can put your "top level" methods in "script" blocks and your "top level" statements in "<% %>" blocks and you'll thereby avoid the ritual of having to be explicit about the containers for those code elements. The ASP.NET code automatically reorganizes your "script" code into something that the C# compiler can deal with, adding all the boring boilerplate code that has to be there to keep the compiler happy.

    But consider now the burden placed upon the developers of ASP.NET by the language design. Those guys cannot simply parse out the C# text and hand it to the C# compiler. They've got to have a solid understanding of what the legal code container topologies are, and jump through hoops -- not particularly difficult hoops, but hoops nevertheless -- to generate a class that can actually be compiled and executed.

    This same burden is placed upon every developer who would like to expose the ability to add execution of end-user-supplied code to their application, whether that's in the form of adding extensibility via scripting, or by enabling evaluation of user-supplied expressions (such as queries). It's a burden placed on developers of productivity tools, like Jon Skeet's "snippet compiler", or LINQPad. It's a burden on developers who wish to experiment with REPL-like approaches to rapid development of prototypes, test cases, and so on.

    I am not particularly excited about the convenience of the ability to save five characters by eliding the "Math." when trying to calculate a cosine. The exciting value props are that we might be able to lower the cost of building tools that extend the power of the C# language into third-party applications, and at the same time enable new ways to rapidly experiment and develop high-quality code.

    Of course we do not want to wreck the existing value props of the language in doing so; as I said last time, we also believe that there is huge value in the language design naturally leading professional, large-scale application developers towards building well-factored component-based programs.

    Like all design decisions, when we're faced with a number of competing, compelling, valuable and noncompossible ideas, we've got to find a workable compromise. We don't do that except by considering all the possibilites, which is what we're doing in this case.

  • Why Doesn't C# Implement "Top Level" Methods?

    C# requires that every method be in some class, even if it is a static method in a static class in the global namespace. Other languages allow "top level" functions. A recent stackoverflow post asks why that is.

    I am asked "why doesn't C# implement feature X?" all the time. The answer is always the same: because no one ever designed, specified, implemented, tested, documented and shipped that feature. All six of those things are necessary to make a feature happen. All of them cost huge amounts of time, effort and money. Features are not cheap, and we try very hard to make sure that we are only shipping those features which give the best possible benefits to our users given our constrained time, effort and money budgets.

    I understand that such a general answer probably does not address the specific question.

    In this particular case, the clear user benefit was in the past not large enough to justify the complications to the language which would ensue. By restricting how different language entities nest inside each other we (1) restrict legal programs to be in a common, easily understood style, and (2) make it possible to define "identifier lookup" rules which are comprehensible, specifiable, implementable, testable and documentable.

    By restricting method bodies to always be inside a struct or class, we make it easier to reason about the meaning of an unqualified identifier used in an invocation context; such a thing is always an invocable member of the current type (or a base type). 

    Now, JScript.NET has this feature. (And in fact, JScript.NET goes even further; you can have program statements "at the top level" too.) A reasonable question is "why is this feature good for JScript but bad for C#?"

    First off, I reject the premise that the feature is "bad" for C#. The feature might well be good for C#, just not good enough compared to its costs (and to the opportunity cost of doing that feature instead of a more valuable feature.) The feature might become good enough for C# if its costs are lowered, or if the compelling benefit to customers becomes higher.

    Second, the question assumes that the feature is good for JScript.NET. Why is it good for JScript.NET?

    It's good for JScript.NET because JScript.NET was designed to be a "scripty" language as well as a "large-scale development" language. "JScript classic"'s original design as a scripting language requires that "a one-line program actually be one line". If your intention is to make a language that allows for rapid development of short, simple scripts by novice developers then you want to minimize the amount of "ritual incantations" that must happen in every program. In JScript you do not want to have to start with a bunch of using clauses and define a class and then put stuff in the class and have a Main routine and blah blah blah, all this ritual just to get Hello World running.

    C# was designed to be a large-scale application development language geared towards pro devs from day one; it was never intended to be a scripting language. It's design therefore encourages enforcing the immediate organization of even small chunks of code into components. C# is a component-oriented language. We therefore want to encourage programming in a component-based style and discourage features that work against that style.

    This is changing. "REPL" languages like F#, long popular in academia, are increasing in popularity in industry. There's a renewed interest in "scripty" application programmability via tools like Visual Studio Tools for Applications. These forces cause us to re-evaluate whether "a one line program is one line" is a sensible goal for hypothetical future versions of C#. Hitherto it has been an explicit non-goal of the language design.

    (As always, whenever I discuss the hypothetical "next version of C#", keep in mind that we have not announced any next version, that it might never happen, and that it is utterly premature to think about feature sets or schedules. All speculation about future versions of unannounced products should be taken as "for entertainment purposes only" musings, not as promises about future offerings.)

    We are therefore considering adding this feature to a hypothetical future version of C#, in order to better support "scripty" scenarios and REPL evaluation. When the existence of powerful new tools is predicated upon the existence of language features, that is points towards getting the language features done.

    UPDATE: More thoughts on considerations motivating this potential change here.

  • Use your legs, not your back

    In C# you can "lift", "raise" and "hoist", and they all mean different things.

    To "lift" an operator is to take an operator that operates on non-nullable value types, and create from it a similar operator that operates on nullable value types. (We are a little bit inconsistent in exactly how we use the word "lifted", which I documented here.)

    For example, if you have

     public static Complex operator +(Complex x, Complex y) { ... }

     then we automatically generate a lifted operator for you that basically does this:

     public static Complex? operator +(Complex? x, Complex? y) 
     {
       return (x == null || y == null) ?
        (Complex?) null : 
        (Complex?) (x.Value + y.Value);
     }

    "Raising" by contrast refers to events -- not to exceptions, which are of course "thrown". Another common term for raising an event is "firing". Given that it makes sense to standardize on one or the other, the usage committee people felt that between "raising" and "firing", they'd pick the less bellicose-sounding one. Which is maybe a bit silly, but if you've got to pick one, then I suppose that's as good a criterion as any.

    Finally, "hoisting" is what we call it when the compiler emits a field for what looks like a local variable, because that local variable is in fact a captured outer variable of an anonymous function (or a local of an iterator block). When you have:

    class C
    {
      void M()
      {
        int x = 123;
        Func<int, int> f = y=>x+y;
    ...

    then we rewrite that as if you'd written something like:

    class C
    {
      private class Locals
      {
        public int x;
        public int Method(int y) { return this.x + y }
      }
      void M()
      {
        Locals locals = new Locals();
        locals.x = 123;
        Func<int, int> f = locals.Method;
    ...

     see, local "x" has been "hoisted" up and out of its declaration space.

    Even after a number of years on the compiler team, I still misuse "raise", "lift" and "hoist" in casual conversation; that they have such similar meanings in English and dissimilar meanings as jargon is unfortunate, but usually doesn't result in too much confusion.

  • Making it easier

    I read an article in a technology column on MSNBC a while back, the upshot of which was “I have umpteen-dozen passwords I’ve got to have memorized these days; I thought technology was supposed to make my life easier!”

    Really?

    First of all, let’s leave aside the obvious fact that our column writer has a technology-driven standard of living which includes affordable access to varieties of food, drink, shelter, clothing, medicine, travel, and entertainment all of which were beyond the wildest dreams of the crowned heads of Europe less than two centuries ago, much less the peasants. Technology does seem to have made lives a lot easier all around. But I rather want to address the actual claim: is the purpose of technology to make your life easier?

    I don’t think it is. I think the purpose of technology is to make specific tasks easier.

    Surprise! Let’s take a task like “communicate with a person in Australia”. Suppose you live in, say, London. In the early 1800’s, the best way to get a message to a colleague in Australia was to write the message out with your goose quill pen, making multiple copies. Wrap each up in oilcloth, the most waterproof substance of the time. Find several convenient Royal Navy ships or merchant ships going to Australia by different routes and at different times; Napoleon might sink several of them but one of them will probably get through. Entrust your waterproofed messages to the captains of those ships, and then wait the several months it takes to get the message around either the Cape or the Horn and the reply back to you.

    I take this opportunity to point out that this already requires an extremely high level of technology. Safely navigating a vessel from England to Australia is not an easy task under the best of circumstances. But obviously we can do better now; the 1800’s solution strikes me as somewhat more arduous and time-consuming as a whole than having to remember your email password.

    The task of communicating with Australia is now much easier, thanks to improved technology. And because we have increased the scope of what is feasible, we can take advantage of new capabilities; in the 1800’s, real-time arbitraging of price differences between Australian and European commodity market derivatives was not anyone’s job description, but I’m sure that there are people pursuing this highly complex task today. The technology enables us to find new ways to complicate our lives, ways that were never possible before because we were too busy spending effort on solving other problems.

    I think about this kind of thing a lot in the context of programming language design.

    When we add a new feature to the language, we almost always make a specific task easier. But in doing so, we also almost always make the language as a whole more complex and therefore harder to learn. We make it more likely that there will be a communications divide between those who know how the new feature works and those who do not, making it harder for everyone to read and understand each other’s code.

    This is one of the major reasons why new features start with “-100 points”, as I’ve often said. Coming up with features that make specific tasks easier is, well, easy. But that’s not enough; the feature has to make things so much better that it justifies the additional complexity added to the language.

    ******

    Wikimedia Commons photo by Ted “Rufus” Ross.

  • What does the optimize switch do?

    I was asked recently exactly what optimizations the C# compiler performs when you specify the optimize switch. Before I answer that, I want to make sure that something is perfectly clear. The compiler’s “usage” string is not lying when it says:

    /debug[+|-]     Emit debugging information
    /optimize[+|-]  Enable optimizations

    Emitting debug information and optimizing the generated IL are orthogonal; they have no effect on each other at all (*). The usual thing to do is to have debug on and optimizations off, or vice versa, but the other two combinations are perfectly legal. (And while I’m at it, turning debug information generation on does not also do /d:DEBUG; that’s also orthogonal. Again, the sensible thing to do is to keep /debug and /d:DEBUG in sync, but you do not have to; you might want to debug the optimized version without assertions and we’re not going to stop you.)

    From this point on, when I say “emitting” I mean “producing metadata”, and when I say “generating” I mean “producing IL”.

    So, first off, there are some optimizations that the compiler always performs, regardless of the optimize flag.

    The language semantics require us to do constant folding because we need to detect that this is an error:

    switch(x) {
    case 2:
    case 1+1:

    The compiler will replace usages of constant fields with the constant, but does not perform any more sophisticated constant propagation, even though the compiler has a reachability analyzer.

    Speaking of which, there are two kinds of reachability analysis we perform for dead code elimination. First, there is the “according to spec” reachability. The specification calls out that reachability analysis consumes compile-time constants. This is the reachability analysis we use to give “unreachable code” warnings, to determine whether local variables are definitely assigned on all code paths, and to determine whether the end point of a non-void method is reachable. If this first pass determines that code is unreachable then we never generate IL for it. For example, this is optimized away:

    if (false) M();

    But if the expression involves non-compile-time constants then this first form of reachability analysis would tell you that the call to N here is reachable, and therefore is a definite assignment error:

    int x = M();
    int y;
    if (x * 0 != 0) N(y);

    If you fixed that up so that y was definitely assigned, then the first-pass reachability analyzer would NOT trim this code because it still thinks that the call is reachable.

    But clearly you and I know that the call is not reachable, because we know more about arithmetic than the specified reachability analyzer. We perform a second pass of reachability analysis during code generation which is much smarter about these things. We can do that because the second pass only affects codegen, it does not affect language semantics, error reporting, and so on.

    The existence of that second pass implies that we do a simple arithmetic optimizations on expressions which are only partially constant. For example, if you have a method M that returns an integer, then code like

    if (M() * 0 == 0) N();

    can be generated as though you’d written just:

    M();
    N();

    We have lots of simple number and type algebra optimizers that look for things like adding zero to an integer, multiplying integers by one, using “null” as an argument to “is” or “as”, concatenation of literal strings, and so on. The expression optimizations always happen, whether you’ve set the optimize flag or not; whether basic blocks that are determined to be unreachable after those optimizations are trimmed depends on the flag.

    We also perform some small optimizations on some call sites and null checks. (Though not as many as we could.) For example, suppose you have a non-virtual instance method M on a reference type C, and a method GetC that returns a C. If you say GetC().M() then we generate a callvirt instruction to call M. Why? Because it is legal to generate a callvirt for an instance method on a reference type, and callvirt automatically inserts a null check. The non-virtual call instruction does not do a null check; we'd have to generate extra code to check whether GetC returns null. So that's a small code size optimization. But we can do even better than that; if you have (new C()).M(), we generate a call instruction because we know that the result of the "new" operator is never null. That gives us a small time optimization because we can skip the nanosecond it takes to do the null check. Like I said, it's a small optimization.

    The /optimize flag does not change a huge amount of our emitting and generation logic. We try to always generate straightforward, verifiable code and then rely upon the jitter to do the heavy lifting of optimizations when it generates the real machine code. But we will do some simple optimizations with that flag set. For example, with the flag set:

    * Expressions which are determined to be only useful for their side effects are turned into code that merely produces the side effects.

    * We omit generating code for things like int foo = 0; because we know that the memory allocator will initialize fields to default values.

    * We omit emitting and generating empty static class constructors. (Which typically happens if the static constructor set all the fields to their default value and the previous optimization eliminated all of them.)

    * We omit emitting a field for any hoisted locals that are unused in an iterator block. (This includes that case where the local in question is used only inside an anonymous function in the iterator block, in which case it is going to become hoisted into a field of the closure class for the anonymous function. No need to hoist it twice if we don’t need to.)

    * We attempt to minimize the number of local variable and temporary slots allocated. For example, if you have:

    for (int i = …)  {…}
    for (int j = …) {…}

    then the compiler could generate code to re-use the local variable storage reserved for i when j comes along.

    * Also, if you have a local which is never used at all, then there is no storage allocated for it if the flag is set.

    * Similarly, the compiler is more aggressive about re-using the unnamed temporary slots sometimes used to store results of subexpression calculations.

    * Also, with the flag set the compiler is more aggressive about generating code that throws away “temporary” values quickly for things like controlling variables of switch statements, the condition in an “if” statement, the value being returned, and so on. In the non-optimized build these values are treated as unnamed local variables, loaded from and stored to specific locations. In the optimized build they can often be just kept on the stack proper.

    * We eliminate pretty much all of the “breakpoint convenience” no-ops.

    * If a try block is empty then clearly the catch blocks are not reachable and can be trimmed. (Finally blocks of empty tries are preserved as protected regions because they have unusual behaviour when faced with certain exceptions; see the comments for details.) 

    * If we have an instruction which branches to LABEL1, and the instruction at LABEL1 branches to LABEL2, then we rewrite the first instruction as a branch straight to LABEL2. Same with branches that go to returns.

    * We look for “branch over a branch” situations. For example, here we go to LABEL1 if condition is false, otherwise we go to LABEL2.

    brfalse condition, LABEL1
    br LABEL2
    LABEL1: somecode

    Since we are simply branching over another branch, we can rewrite this as simply "if condition is true, go to LABEL2":

    brtrue condition, LABEL2
    somecode

    * We look for “branch to nop” situations. If a branch goes to a nop then you can make it branch to the instruction after the nop.

    * We look for “branch to next” situations; if a branch goes to the next instruction then you can eliminate it.

    * We look for two return instructions in a row; this happens sometimes and obviously we can turn it into a single return instruction.

    That’s pretty much it. These are very straightforward optimizations; there’s no inlining of IL, no loop unrolling, no interprocedural analysis whatsoever. We let the jitter team worry about optimizing the heck out of the code when it is actually spit into machine code; that’s the place where you can get real wins.

    *******

    (*) A small lie. There is some interaction between the two when we generate the attributes that describe whether the assembly is Edit’n’Continuable during debugging.

  • “Out Of Memory” Does Not Refer to Physical Memory

    Inside Eric's head I started programming on x86 machines during a period of large and rapid change in the memory management strategies enabled by the Intel processors. The pain of having to know the difference between “extended memory” and “expanded memory” has faded with time, fortunately, along with my memory of the exact difference.

    As a result of that early experience, I am occasionally surprised by the fact that many professional programmers seem to have ideas about memory management that haven’t been true since before the “80286 protected mode” days.

    For example, I occasionally get the question “I got an ‘out of memory’ error but I checked and the machine has plenty of RAM, what’s up with that?”

    Imagine, thinking that the amount of memory you have in your machine is relevant when you run out of it! How charming! :-)

    The problem, I think, with most approaches to describing modern virtual memory management is that they start with assuming the DOS world – that “memory” equals RAM, aka “physical memory”, and that “virtual memory” is just a clever trick to make the physical memory seem bigger. Though historically that is how virtual memory evolved on Windows, and is a reasonable approach, that’s not how I personally conceptualize virtual memory management.

    So, a quick sketch of my somewhat backwards conceptualization of virtual memory. But first a caveat. The modern Windows memory management system is far more complex and interesting than this brief sketch, which is intended to give the flavour of virtual memory management systems in general and some mental tools for thinking clearly about what the relationship between storage and addressing is. It is not by any means a tutorial on the real memory manager. (For more details on how it actually works, try this MSDN article.)

    I’m going to start by assuming that you understand two concepts that need no additional explanation: the operating system manages processes, and the operating system manages files on disk.

    Each process can have as much data storage as it wants. It asks the operating system to create for it a certain amount of data storage, and the operating system does so.

    Now, already I am sure that myths and preconceptions are starting to crowd in. Surely the process cannot ask for “as much as it wants”. Surely the 32 bit process can only ask for 2 GB, tops. Or surely the 32 bit process can only ask for as much data storage as there is RAM. Neither of those assumptions are true. The amount of data storage reserved for a process is only limited by the amount of space that the operating system can get on the disk. (*)

    This is the key point: the data storage that we call “process memory” is in my opinion best visualized as a massive file on disk.

    So, suppose the 32 bit process requires huge amounts of storage, and it asks for storage many times. Perhaps it requires a total of 5 GB of storage. The operating system finds enough disk space for 5GB in files and tells the process that sure, the storage is available. How does the process then write to that storage? The process only has 32 bit pointers, but uniquely identifying every byte in 5GB worth of storage would require at least 33 bits.

    Solving that problem is where things start to get a bit tricky.

    The 5GB of storage is split up into chunks, typically 4KB each, called “pages”. The operating system gives the process a 4GB “virtual address space” – over a million pages - which can be addressed by a 32 bit pointer. The process then tells the operating system which pages from the 5GB of on-disk storage should be “mapped” into the 32 bit address space. (How? Here’s a page where Raymond Chen gives an example of how to allocate a 4GB chunk and map a portion of it.)

    Once the mapping is done then the operating system knows that when the process #98 attempts to use pointer 0x12340000 in its address space, that this corresponds to, say, the byte at the beginning of page #2477, and the operating system knows where that page is stored on disk. When that pointer is read from or written to, the operating system can figure out what byte of the disk storage is referred to, and do the appropriate read or write operation.

    An “out of memory” error almost never happens because there’s not enough storage available; as we’ve seen, storage is disk space, and disks are huge these days. Rather, an “out of memory” error happens because the process is unable to find a large enough section of contiguous unused pages in its virtual address space to do the requested mapping.

    Half (or, in some cases, a quarter) of the 4GB address space is reserved for the operating system to store it’s process-specific data. Of the remaining “user” half of the address space, significant amounts of it are taken up by the EXE and DLL files that make up the application’s code. Even if there is enough space in total, there might not be an unmapped “hole” in the address space large enough to meet the process’s needs.

    The process can deal with this situation by attempting to identify portions of the virtual address space that no longer need to be mapped, “unmap” them, and then map them to some other pages in the storage file. If the 32 bit process is designed to handle massive multi-GB data storages, obviously that’s what its got to do. Typically such programs are doing video processing or some such thing, and can safely and easily re-map big chunks of the address space to some other part of the “memory file”.

    But what if it isn’t? What if the process is a much more normal, well-behaved process that just wants a few hundred million bytes of storage? If such a process is just ticking along normally, and it then tries to allocate some massive string, the operating system will almost certainly be able to provide the disk space. But how will the process map the massive string’s pages into address space?

    If by chance there isn’t enough contiguous address space then the process will be unable to obtain a pointer to that data, and it is effectively useless. In that case the process issues an “out of memory” error. Which is a misnomer, these days. It really should be an “unable to find enough contiguous address space” error; there’s plenty of memory because memory equals disk space.

    I haven’t yet mentioned RAM. RAM can be seen as merely a performance optimization. Accessing data in RAM, where the information is stored in electric fields that propagate at close to the speed of light is much faster than accessing data on disk, where information is stored in enormous, heavy ferrous metal molecules that move at close to the speed of my Miata. (**)

    The operating system keeps track of what pages of storage from which processes are being accessed most frequently, and makes a copy of them in RAM, to get the speed increase. When a process accesses a pointer corresponding to a page that is not currently cached in RAM, the operating system does a “page fault”, goes out to the disk, and makes a copy of the page from disk to RAM, making the reasonable assumption that it’s about to be accessed again some time soon.

    The operating system is also very smart about sharing read-only resources. If two processes both load the same page of code from the same DLL, then the operating system can share the RAM cache between the two processes. Since the code is presumably not going to be changed by either process, it's perfectly sensible to save the duplicate page of RAM by sharing it.

    But even with clever sharing, eventually this caching system is going to run out of RAM. When that happens, the operating system makes a guess about which pages are least likely to be accessed again soon, writes them out to disk if they’ve changed, and frees up that RAM to read in something that is more likely to be accessed again soon.

    When the operating system guesses incorrectly, or, more likely, when there simply is not enough RAM to store all the frequently-accessed pages in all the running processes, then the machine starts “thrashing”. The operating system spends all of its time writing and reading the expensive disk storage, the disk runs constantly, and you don’t get any work done.

    This also means that "running out of RAM" seldom(***) results in an “out of memory” error. Instead of an error, it results in bad performance because the full cost of the fact that storage is actually on disk suddenly becomes relevant.

    Another way of looking at this is that the total amount of virtual memory your program consumes is really not hugely relevant to its performance. What is relevant is not the total amount of virtual memory consumed, but rather, (1) how much of that memory is not shared with other processes, (2) how big the "working set" of commonly-used pages is, and (3) whether the working sets of all active processes are larger than available RAM.

    By now it should be clear why “out of memory” errors usually have nothing to do with how much physical memory you have, or how even how much storage is available. It’s almost always about the address space, which on 32 bit Windows is relatively small and easily fragmented.

    And of course, many of these problems effectively go away on 64 bit Windows, where the address space is billions of times larger and therefore much harder to fragment. (The problem of thrashing of course still occurs if physical memory is smaller than total working set, no matter how big the address space gets.)

    This way of conceptualizing virtual memory is completely backwards from how it is usually conceived. Usually it’s conceived as storage being a chunk of physical memory, and that the contents of physical memory are swapped out to disk when physical memory gets too full. But I much prefer to think of storage as being a chunk of disk storage, and physical memory being a smart caching mechanism that makes the disk look faster. Maybe I’m crazy, but that helps me understand it better.

    *************

    (*) OK, I lied. 32 bit Windows limits the total amount of process storage on disk to 16 TB, and 64 bit Windows limits it to 256 TB. But there is no reason why a single process could not allocate multiple GB of that if there’s enough disk space.

    (**) Numerous electrical engineers pointed out to me that of course the individual electrons do not move fast at all; it's the field that moves so fast. I've updated the text; I hope you're all happy with it now.

    (***) It is possible in some virtual memory systems to mark a page as “the performance of this page is so crucial that it must always remain in RAM”. If there are more such pages than there are pages of RAM available, then you could get an “out of memory” error from not having enough RAM. But this is a much more rare occurrence than running out of address space.

  • Fabulous Adventures In Russian

    Peter the Fabulous? I am pleased to announce that a bunch of our Russian MVPs who (1) are awesome people and (2) apparently have some free time on their hands, have started translating this blog into Russian, of all things. So if you want some Фантастичные приключения на русском, check it out!

    http://blogs.msdn.com/ruericlippert/default.aspx

    Thanks to all the Russian MVPs and particularly to developer evangelist Gaidar Magdanurov for organizing and overseeing this effort.

  • Alas, Smith and Jones

    We have a feature in C# which allows you to declare a "friend assembly". If assembly Smith says that assembly Jones is its friend, then code in Jones is allowed to see "internal" types of Smith as though they were public(*). It's a pretty handy feature for building a "family" of assemblies that share common implementation details.

    The compiler enforces one particularly interesting rule about the naming of friendly assemblies. If assembly Smith is a strong-named assembly, and Smith says that assembly Jones is its friend, then Jones must also be strong-named. If, however, Smith is not strong-named, then Jones need not be strong-named either.

    I'm occasionally asked "what's up with that?"

    When you call a method in Smith and pass in some data, you are essentially trusting Smith to (1) make proper use of your data; the data might contain sensitive information that you don't want Smith to misuse, and (2) take some action on your behalf. In the non-computer world an entity which you share secrets with and then empower to take actions on your behalf is called an "attorney" (**).  That got me thinking, which usually spells trouble.

    You want to hire an attorney. You go to Smith & Jones, Esqs. You meet with Smith, the trustworthy-looking man on the left of this photo:

    alassmithandjones

    Scenario (1):

    You check Smith's ID. It really is Mel Smith. You are contemplating giving Smith a secret, and empowering Smith to act on your behalf with the knowledge of that secret. Smith says "I share my secrets with my partner Griff Rhys Jones, whose identity you may now verify, here is Jones and here is Jones' ID. Jones will also be acting on your behalf. Jones has full access to all secrets which are internal to this organization."

    You decide that you trust both Smith and Jones, so you share your secrets with Smith. A series of wacky sketch comedy hijinks then ensues.

    (Smith and Jones are both strong-named assemblies. That is, they have "ID" you can verify. Smith's internals are visible to Jones. Everything is fine.)

    Scenario (2):

    You check the ID. It is Mel Smith. Smith says "By the way, I share my internal secrets with everyone in the world who claims their name is Jones, I don't bother to check IDs, I just trust 'em, and you should too! Everyone named Jones is trustworthy! Oh, and that includes my partner over here, Jones. Good old Jones! No, you can't check Jonesie's ID. Doesn't have any ID, that Jones."

    This is ludicrous. Obviously this breaks the chain of trust that you showed you were looking for when you checked Smith's ID in the first place.

    (The compiler keeps you from getting into this terrible situation by preventing Smith from doing that; Smith, a strong-named assembly, can only state that it is sharing his internal details with another strong-named assembly. Smith cannot share its internals with any assembly named Jones, a weakly-named assembly in this scenario. We restrict Smith from exposing internals to weakly-named assemblies so that Smith does not accidentally create this security hole or accidentally mislead you into believing that Smith's internals are in any way hidden from partially-trusted code.)

    Scenario (3):

    You don't bother to check Smith's ID. In fact, you give your secrets and power of attorney to anyone named Smith, regardless of who they actually are or where they came from. The first Smith you pick off the street says "By the way, I'll give my internal secrets to anyone named Jones". 

    Do you care?  Why would you?  You're already giving secrets away to anyone named Smith! If a con artist wants your secrets, he can pretend to be Jones and take them from Smith, or he can pretend to be Smith and get them from you directly, but either way, the problem is that you are giving away secrets and power of attorney to strangers off the street.

    (Smith and Jones are both weakly named assemblies, Smith exposes internals to Jones. This is perfectly legal, because if you don't care enough about who you're talking to to check IDs then what's the point of the security system preventing those random strangers from talking to each other?)

    **************
    (*) Fully trusted assemblies of course can always use reflection to see private and internal details of other assemblies; that is what "full trust" means. And in the new CLR security model, two assemblies that have the same grant set can see each other's private and internal details via reflection. But friend assemblies are the only mechanism that allow compile time support for peering at the internal details of another assembly.

    (**) or "delegate", which obviously I want to avoid because that means something technical in C#.

     

  • Bug Psychology

    Fixing bugs is hard.

    For the purposes of this posting, I’m talking about those really “crisp” bugs -- those flaws which are entirely due to a failure on the developer’s part to correctly implement some mechanistic calculation or ensure some postcondition is met. I’m not talking about oops, we just found out that the product name sounds like a rude word in Urdu, or the specification wasn’t quite right so we changed it or the code wasn’t adequately robust in the face of a buggy caller. I mean those bugs where you were asked to compute some value and you just plain get the result wrong for some valid inputs.

    Let me give you an example.

    The first bug I ever fixed at Microsoft as a full-time employee was one of those. To understand the context of the bug, start by reading this post from the early days of FAIC, and then come back.

    Welcome back, I hope you enjoyed that little trip down memory lane as much as I did.

    Now that you understand how a VT_DATE is stored, that explains this bizarre behaviour in VBScript:

    print DateDiff("h", #12/31/1899 18:00#, #12/30/1899 6:00#) / 24
    print DateDiff("h", #12/31/1899 18:00#, #12/29/1899 6:00#) / 24

    This prints –1.5 and –2.5, as you’d expect. There’s a day and a half between 6 AM December 30th and 6 PM December 31st, and two and a half days between the other two dates. This is perfectly understandable. But if you just subtract the dates:

    print #12/31/1899 18:00# - #12/30/1899 6:00#
    print #12/31/1899 18:00# - #12/29/1899 6:00#

    You get 1.5 and 3, not 1.5 and 2.5. Because of the bizarre date format that VT_DATE chooses, when you convert dates to numbers, you cannot safely subtract them if they straddle the magic zero date. That’s why you need the helpful “DateDiff”, “DateAdd” and so on, methods.

    The bug I was assigned was that testing had discovered a particular pair of dates which DateDiff was not subtracting correctly. I took a look at the source code for one of the helper methods that DateDiff used to do one of the computations it needed along the way. To my fresh-out-of-college eyes it looked something like this:

    if (frob(x) > 0 && blarg(y)) return x – y;
    else if (frob(x) < blarg(y) && blah_blah(x) > 0 || blah_de_blah_blah_blah(x,y)) return frob(x) – x + y + 1;
    else if…

    There were seven such cases.

    My urge was to dive right in and add an eighth special case that fixed the bug. But my ability to get it right in the face of all this complexity concerned me. It seemed like this was an awfully complicated function already for what it was trying to do.

    I researched the history of the code a bit and discovered that in fact variations on this bug had been entered… seven times. Each special case in the code corresponded to a particular bug that had been “fixed”, a term I use guardedly in this case. A great many of those “fixes” had actually introduced new bugs, regressing existing correct behaviour, which then in turn were “fixed” by adding special cases on top of the broken special cases that had been added to “fix” previous bugs.

    I decided that this coding horror would end here. I deleted all the code (all seven lines of it! I was bold!) and started over.

    Deep breath.

    Spec the code requirements first. Then design the code to meet the spec. Then write the code to the design.

    Spec:

    * Input: two doubles representing dates in VT_DATE format.
    * VT_DATE format: signed integer portion of double is number of days since 12/30/1899, unsigned fractional part is portion of day gone by.
    * For example: –1.75 = 12/29/1899, 6 PM.
    * Output: double containing number of days, possibly fractional, between two dates.  Differences due to daylight savings time, and so on, to be ignored.

    Design strategy:

    * Problem: Some doubles cannot simply be subtracted because negative dates are not absolute offsets from epoch time
    * Therefore, convert all dates to a more sensible date format which can be simply subtracted.

    Code:

    double DateDiffHelper(double vtdate1, double vtdate2)
    {
      return SensibleDate(vtdate2) – SensibleDate(vtdate1);
    }
    double SensibleDate(double vtdate)
    {
      // negative dates like –2.75 mean “go back two days, then forward .75 days”:
      // Transform that into –1.25, meaning “go back 1.25 days”.
      return DatePart(vtdate) + TimePart(vtdate);
    }

    I already had helper methods DatePart and TimePart, so I was done. The new code was shorter, far more readable, generated smaller, faster machine code and most important, was clearly correct. No special cases; no bugs.

    It’s not that my coworkers were dummies. Far from it. These were smart people. But computer geek psychology is such that it is very easy to narrow-focus on the immediately wrong thing, and try to tweak it until it does the right thing.

    When faced with these sorts of “crisp” bugs, I try to restrain myself from diving right in. Rather, I try to psychoanalyze the person – who is, of course, usually my past self – who caused the bug. I ask myself “how was the person who wrote the buggy code fooled into thinking it was correct?” Did they not have a clear specification of what the method was supposed to do? Was it misleading? Did they have a clear plan for how to proceed? If so, where did it go wrong?

    If there never was either a spec or a plan, then for all you know the whole thing might only be working by sheer accident. There could be any number of design flaws in the thing that just haven’t come to light yet. Editing such a beast means adding unknown to unknown. which seldom leads to good results. Sometimes coming up with a new spec, a new plan and scrapping an existing bug farm is the best way to proceed.

    For many years after that, I would ask how to implement DateDiffHelper as my technical question for fresh-out-of-college candidates that I was interviewing for the scripting dev team. I reasoned that if that was the sort of problem I was given on my first day in the office, then maybe that would be a reasonable question to ask a candidate.

    When you ask the same question over and over again, you really get to see the massive difference in aptitude between candidates. I had some candidates who just picked up a marker, wrote a solution straight out on the board, wrote down the test cases they’d use to verify it, mentally ran a few of the tests in their head, and then we’d have another half hour to chat about the weather. And I had some candidates who tried earnestly to write the version using special cases, despite my specifically telling them “you might consider transforming this bad format into something more pleasant to work with”, after they got stuck on the third special case. I’d point out a bug and immediately they’d write down code for another special case, rather than stopping to think about the fact that they’d just written buggy code three times already and told me it was correct three times.

  • When Five Hundred Posts You Reach

    … look this good you will not. But man, I tell you, the memory goes.

    In the July 1985 issue of Fantasy and Science Fiction, Isaac Asimov wrote:

    Yesterday I sat down to write my 321st essay for Fantasy and Science Fiction. […] It went swimmingly. I was pleased at the ease with which I worked out its construction. It practically wrote itself and I scarcely had to look anything up. I whistled while I worked. And then, when I reached the last page and launched into my climactic paragraphs, I thought to myself: Why does this suddenly sound familiar to me? […] Hoping earnestly that my memory had misfired, I looked it up. It turned out to be essay #182, first published in the December 1973 issue. There it was. That earlier essay was essentially what I had just written.

    The other day I wrote a blog post. It went great. It was strongly opinionated and about an interesting topic. I did a web search to make sure that I had a particular fact straight and… I found my own previous article on the subject. Apparently the late Dr. Asimov and I have at least one thing in common. (I also have something in common with writer Neil Gaiman, but that’s another story.)

    It seemed to me looking back that I started off with a huge backlog of old emails answering real user questions which I could use for blog articles, and indeed, when I got started there, I was posting a couple a day. And then over the years I both got busy with other things (writing and editing books, for example) and ran out of canned emails to turn into blog posts, so my rate of writing dropped off. And then lately it seems like I’ve gotten more of a groove back and am doing a couple a week.

    I decided to see whether my memory was correct by computing the twenty-post moving average of number of days between successive posts: (Yes, I am a geek, but you already knew that.)

    blogchart

    Yep, I think in this case my memory was correct.

    Five hundred posts from me and six thousand comments from you all in less than six years seems like rather a lot. I want to take this opportunity to say thanks to everyone who has given me great feedback, interesting questions and technical and moral support over the years. You rock.

    Enough chit-chat. On to more fabulous adventures!

  • What Would Tufte Do?

    What is this a chart of? I'll post the answer tomorrow.

    mysterious chart

    UPDATE: Someone has already correctly deduced the answer. (And man, that was fast!) So don't read the comments if you don't want spoilers.

     

More Posts Next page »

This Blog

Syndication


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker