What does the optimize switch do?


I was asked recently exactly what optimizations the C# compiler performs when you specify the optimize switch. Before I answer that, I want to make sure that something is perfectly clear. The compiler’s “usage” string is not lying when it says:

/debug[+|-]     Emit debugging information
/optimize[+|-]  Enable optimizations

Emitting debug information and optimizing the generated IL are orthogonal; they have no effect on each other at all (*). The usual thing to do is to have debug on and optimizations off, or vice versa, but the other two combinations are perfectly legal. (And while I’m at it, turning debug information generation on does not also do /d:DEBUG; that’s also orthogonal. Again, the sensible thing to do is to keep /debug and /d:DEBUG in sync, but you do not have to; you might want to debug the optimized version without assertions and we’re not going to stop you.)
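
All four combinations are legal. For instance (illustrative command lines, using the flag spellings from the usage string above):

csc /debug+ /optimize- /d:DEBUG Program.cs     (the typical debug build)
csc /debug- /optimize+ Program.cs              (the typical release build)
csc /debug+ /optimize+ Program.cs              (debugging the optimized version, without assertions)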

From this point on, when I say “emitting” I mean “producing metadata”, and when I say “generating” I mean “producing IL”.

So, first off, there are some optimizations that the compiler always performs, regardless of the optimize flag.

The language semantics require us to do constant folding because we need to detect that this is an error:

switch(x) {
case 2:
case 1+1: // error: the constant expression 1+1 folds to 2, a duplicate case label
}

The compiler will replace usages of constant fields with the constant, but does not perform any more sophisticated constant propagation, even though the compiler has a reachability analyzer.
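For example (an illustrative sketch; the names are mine):

public const int Size = 4;

int area = Size * Size;   // generated as though you’d written area = 16

int n = 4;                // an ordinary local, not a constant field
int area2 = n * n;        // no propagation; generated as a real multiplication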

Speaking of which, there are two kinds of reachability analysis we perform for dead code elimination. First, there is the “according to spec” reachability. The specification calls out that reachability analysis consumes compile-time constants. This is the reachability analysis we use to give “unreachable code” warnings, to determine whether local variables are definitely assigned on all code paths, and to determine whether the end point of a non-void method is reachable. If this first pass determines that code is unreachable then we never generate IL for it. For example, this is optimized away:

if (false) M();

But if the expression involves non-compile-time constants then this first form of reachability analysis would tell you that the call to N here is reachable, and therefore is a definite assignment error:

int x = M();
int y;
if (x * 0 != 0) N(y);

If you fixed that up so that y was definitely assigned, then the first-pass reachability analyzer would NOT trim this code because it still thinks that the call is reachable.
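That is, after the fix the call still gets generated (a sketch):

int x = M();
int y = 0;                // now definitely assigned
if (x * 0 != 0) N(y);     // the first pass still considers this call reachable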

But clearly you and I know that the call is not reachable, because we know more about arithmetic than the specified reachability analyzer. We perform a second pass of reachability analysis during code generation which is much smarter about these things. We can do that because the second pass only affects codegen, it does not affect language semantics, error reporting, and so on.

The existence of that second pass implies that we do simple arithmetic optimizations on expressions which are only partially constant. For example, if you have a method M that returns an integer, then code like

if (M() * 0 == 0) N();

can be generated as though you’d written just:

M();
N();

We have lots of simple number and type algebra optimizers that look for things like adding zero to an integer, multiplying integers by one, using “null” as an argument to “is” or “as”, concatenation of literal strings, and so on. The expression optimizations always happen, whether you’ve set the optimize flag or not; whether basic blocks that are determined to be unreachable after those optimizations are trimmed depends on the flag.
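A few illustrative cases (my examples, not an exhaustive list):

int x = M();
int a = x + 0;                   // generated as just a load of x
int b = x * 1;                   // likewise
string s = "Hello, " + "world";  // folded to the single literal "Hello, world"
bool t = null is string;         // folded to false (the compiler warns the result is always false)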

We also perform some small optimizations on some call sites and null checks. (Though not as many as we could.) For example, suppose you have a non-virtual instance method M on a reference type C, and a method GetC that returns a C. If you say GetC().M() then we generate a callvirt instruction to call M. Why? Because it is legal to generate a callvirt for an instance method on a reference type, and callvirt automatically inserts a null check. The non-virtual call instruction does not do a null check; we'd have to generate extra code to check whether GetC returns null. So that's a small code size optimization. But we can do even better than that; if you have (new C()).M(), we generate a call instruction because we know that the result of the "new" operator is never null. That gives us a small time optimization because we can skip the nanosecond it takes to do the null check. Like I said, it's a small optimization.
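To sketch the difference (assuming a class C with a non-virtual instance method M and a method GetC returning a C, as above):

GetC().M();     // IL: call GetC, then callvirt C::M (callvirt null-checks the receiver for free)
(new C()).M();  // IL: newobj C::.ctor, then call C::M ("new" never yields null, so the cheaper call suffices)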

The /optimize flag does not change a huge amount of our emitting and generation logic. We try to always generate straightforward, verifiable code and then rely upon the jitter to do the heavy lifting of optimizations when it generates the real machine code. But we will do some simple optimizations with that flag set. For example, with the flag set:

  • Expressions which are determined to be only useful for their side effects are turned into code that merely produces the side effects.
  • We omit generating code for things like int foo = 0; because we know that the memory allocator will initialize fields to default values.
  • We omit emitting and generating empty static class constructors. (This typically happens when the static constructor sets all the fields to their default values and the previous optimization eliminated all of those assignments.)
  • We omit emitting a field for any hoisted locals that are unused in an iterator block. (This includes the case where the local in question is used only inside an anonymous function in the iterator block, in which case it is going to become hoisted into a field of the closure class for the anonymous function. No need to hoist it twice if we don’t need to.)
  • We attempt to minimize the number of local variable and temporary slots allocated. For example, if you have:

for (int i = …)  {…}
for (int i = …) {…}

then the compiler could generate code to re-use the local variable storage reserved for i when the second i comes along. (We eschew this optimization if the locals have different names because then it gets hard to emit sensible debug info, which we still want to do even for the optimized build. However, the jitter is free to perform this optimization if it wishes to.)

  • Also, if you have a local which is never used at all, then there is no storage allocated for it if the flag is set.
  • Similarly, the compiler is more aggressive about re-using the unnamed temporary slots sometimes used to store results of subexpression calculations.
  • Also, with the flag set the compiler is more aggressive about generating code that throws away “temporary” values quickly for things like controlling variables of switch statements, the condition in an “if” statement, the value being returned, and so on. In the non-optimized build these values are treated as unnamed local variables, loaded from and stored to specific locations. In the optimized build they can often be just kept on the stack proper.
  • We eliminate pretty much all of the “breakpoint convenience” no-ops.
  • If a try block is empty then clearly the catch blocks are not reachable and can be trimmed. (Finally blocks of empty tries are preserved as protected regions because they have unusual behaviour when faced with certain exceptions; see the comments for details.)
  • If we have an instruction which branches to LABEL1, and the instruction at LABEL1 branches to LABEL2, then we rewrite the first instruction as a branch straight to LABEL2. Same with branches that go to returns. (A sketch follows this list.)
  • We look for “branch over a branch” situations. For example, here we go to LABEL1 if condition is false, otherwise we go to LABEL2.

    brfalse condition, LABEL1
    br LABEL2
    LABEL1: somecode

    Since we are simply branching over another branch, we can rewrite this as simply "if condition is true, go to LABEL2":

    brtrue condition, LABEL2
    somecode
  • We look for “branch to nop” situations. If a branch goes to a nop then you can make it branch to the instruction after the nop.
  • We look for “branch to next” situations; if a branch goes to the next instruction then you can eliminate it.
  • We look for two return instructions in a row; this happens sometimes and obviously we can turn it into a single return instruction.
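
To sketch the branch-chaining rewrite mentioned above (labels illustrative):

br LABEL1
somecode
LABEL1: br LABEL2

Since the branch target is itself just an unconditional branch, the first instruction is rewritten:

br LABEL2
somecode
LABEL1: br LABEL2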

That’s pretty much it. These are very straightforward optimizations; there’s no inlining of IL, no loop unrolling, no interprocedural analysis whatsoever. We let the jitter team worry about optimizing the heck out of the code when it is actually spit into machine code; that’s the place where you can get real wins.


(*) A small lie. There is some interaction between the two when we generate the attributes that describe whether the assembly is Edit’n’Continuable during debugging.

  • "because we know that the result of the "new" operator is never null" - well, there is a corner-case... - see: http://stackoverflow.com/questions/194484#194671

  • Re: null corner case

    That's certainly an unexpected (but logical) case, but Eric started with "suppose you have a non-virtual instance method M on a reference type C". The optimisation should still be valid, because Nullable<T> is not a reference type. :)

    How I wish there were support for non-nullable reference types in C#. I've tried Spec# out for a while, which does support it (with the obvious syntax of T! for a non-nullable T), and it works brilliantly. Everywhere I normally insert an "if (foo == null) throw new ArgumentNullException("foo");" I could specify that foo is a non-nullable parameter instead, and never have to worry about it again.

    On a related note, why are structs required to have a public parameterless constructor instead of requiring that the programmer initializes the structs himself? If there wasn't a way for everyone to create a struct with all its fields set to default values, I could sort of implement non-nullable types by having a light-weight struct that encapsulates a single reference, with a constructor that rejects null values for the field.

  • "The non-virtual call instruction does not do a null check; we'd have to generate extra code to check whether GetC returns null."

    What happens if the null check is omitted? The IL is then invalid?

    I encourage you to read the specification when you have questions like that. But because I am in a charitable mood, I'll reproduce the relevant line of the CIL spec, Partition III, here:

    "For a typical use of the call instruction, verification checks that (a) method refers to a valid methodref or methoddef token; (b) the types of the objects on the stack are consistent with the types expected by the method call, and (c) the method is accessible from the callsite, and (d) the method is not abstract"

    So, no, this would not be a verification issue. Why do you imagine that this would be a verification issue? There's no way that type safety can be compromised by passing a null "this", is there? callvirt needs to crash if "this" is null because then there is no virtual function table from which to determine the correct call, but if a non-virtual method call doesn't dereference "this" then it is perfectly verifiable to call it with a null "this". -- Eric

  • @marc.gravell

    It's not the new operator that returns null; new can never return null. It's the box operator that returns null: boxing a Nullable<T> value has special semantics in the runtime, which boxes any structure whose HasValue field is set to false to a null reference.
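
    A short illustration of that boxing behaviour (sketch):

    int? n = null;                        // Nullable<int> with HasValue == false
    object o = n;                         // the box instruction produces a null reference
    System.Console.WriteLine(o == null);  // prints True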

    @Joren

    Unless non-nullable references are implemented by the .NET team and throughout the framework, that non-nullable struct you try to implement would only be a clever trick for null checking. You haven't solved or reduced the possibility of a NullReferenceException in any way, you've just moved the throwing of it to another place, maybe a more convenient one.

    Also, a question: you want structures not to have a public parameterless constructor, but then what happens when such a structure is used as a field of an object? How can it be initialized?

    Because the default value is not a valid value for the structure, as you stated.

    You immediately need more (and more complicated) semantics in the CLR itself to support such structures: the semantics of structures being in a valid or invalid state (initialized or not initialized with a valid value). And having those semantics means the non-null-reference issue is solved already, as it's a special case of valid values for a structure :) (take "structure" not as struct, but in the general sense: value, reference, etc.).

  • "why are structs required to have a public parameterless constructor instead of requiring that the programmer initializes the structs himself"

    structs are *not* allowed to have parameterless constructors; this is a good thing, since structs can be created in their 'blank' state by either default(T) or new T[]. This will not call a constructor, so it avoids confusion.

    If you really need a constructor that does something similar consider having a private one that takes object and pass null to it.
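
    A hypothetical sketch of that workaround (names mine):

    struct NonNull
    {
        private readonly string value;
        private NonNull(object ignored) { this.value = "initialized"; }  // stands in for a parameterless constructor
        public static NonNull Create() { return new NonNull(null); }
    }
    // Note: default(NonNull) and new NonNull[1] still produce the blank state,
    // which is exactly Joren's complaint.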

  • Eric, are you also planning to mention emitting the Debuggable attribute with different values?

    This, of course, affects the JIT a lot. Or are you just planning a new post?

  • @Pop Catalin

    You're absolutely right about the CLR needing support for it if it were to be this way. Obviously this is not a simple change to make now, and I don't think anyone should bother (if people would be making changes to the CLR type system, I'd rather have true non-nullable reference types instead of subtly different structs) but I'm wondering why it wasn't designed like this in the first place.

    It is obvious that it is necessary for the CLR to be allowed to construct a struct with its fields initialized to defaults, if you want to be able to declare a variable of any struct type without initializing the variable.

    But I don't see why that is necessary. What is the point of allowing you to declare but not initialize a struct variable? Why would you allow the use of unassigned variables at all?

    @ShuggyCoUk

    >structs are *not* allowed to have parameterless constructors

    That's not true. You are not allowed to define parameterless constructors, but all structs have them implicitly.

    >If you really need a constructor that does something similar consider having a private one that takes object and pass null to it.

    How would that help me prevent the creation of a struct with its fields initialized to default values?

  • @ Joren's "why are structs required to have a public parameterless constructor"

    Allow me to restate the core point already suggested by Pop Catalin and ShuggyCoUk. This goes back to Eric's recent posts about the essential difference between value and reference types, doesn't it? Structs are value types, so even having a reference to one requires an initialized instance, hence the requirement of a parameterless constructor. Null reference semantics is a luxury reserved for reference types, by the nature of the difference between value and reference. This is why Nullable<> must exist, i.e., why null reference semantics can't be given to value types inherently.

  • @Joren "What is the point of allowing you to declare but not initialize a struct variable?"

    I think you wanted the subjunctive, i.e., what *would* be the point of that, since it's hypothetical. In fact, by the nature of value types, declaration of a struct variable entails initialization. Cf Eric's recent posts:

    http://blogs.msdn.com/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx

    http://blogs.msdn.com/ericlippert/archive/2009/05/04/the-stack-is-an-implementation-detail-part-two.aspx

    You're asking to "prevent the creation of a struct with its fields initialized to default values".  This would be a breaking change to fundamental idioms.  To take a concrete example, consider an array of value types vs an array of reference types:

    var a = new MyValueType[1];

    var b = new MyReferenceType[1];

    Each array's item is initialized to the default value appropriate to its type. For b[0] that's simply null, but a[0] holds an instance of MyValueType, right from the get-go. Where is that instance going to come from, if not a default constructor?

  • > To generate non-static methods we need to generate them on some class. What class? We only have expression trees and statement trees; what we need are declaration trees.

    Eric, I was specifically talking about Expression.CompileToMethod(MethodBuilder) - http://msdn.microsoft.com/en-us/library/dd728224(VS.100).aspx - which already gets the MethodBuilder for a particular TypeBuilder. So I can produce the "declaration tree" using TypeBuilder/MethodBuilder as needed, and then "fill in" the methods with Expression.CompileToMethod.

    The problem is that:

    1) It explicitly checks that MethodBuilder is for a static method, and throws otherwise (not sure why, since I don't see why the generated code would be any different - it would just ignore the implicit receiver).

    2) The Expression API does not provide any way to reference "this" in the tree.

    Admittedly, I might well be trying to use this for a purpose it wasn't even remotely intended for, and the limitations I'm running into aren't really limitations for the real scenario. But I'm genuinely curious who this overload is intended for, then. Is it there just so that we can compile expression trees to AppDomains other than the current one?

  • [quote]The existence of that second pass implies that we do a simple arithmetic optimizations on expressions which are only partially constant. For example, if you have a method M that returns an integer, then code like

    if (M() * 0 == 0) N();

    can be generated as though you’d written just:

    M();

    N(); [/quote]

    Enter Mr. Evil Programmer (disregard any syntax errors, I haven't done operator overloading for a long long time.):

    public static int operator * ( Foo foo , int multiplier ) { return 1; }

    in a certain class returned by M( ).

    But M returns an integer, by assumption. If M() doesn't return an integer then we do not do this optimization. -- Eric

    I believe you made a typo saying that; otherwise, it's a serious bug. I am going to try anyway. :)

  • Thank you, Eric.  Sorry for not doing my homework. I thought the answer would be more complicated and was not sure where to look. To complement what you said, the C# 3.0 Specification says in 7.4.4 Function member invocation:

    The value of E is checked to be valid. If the value of E is null, a System.NullReferenceException is thrown and no further steps are executed.
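
    In code, that rule is what makes a call on a null receiver throw even when the method never touches "this" (a sketch, reusing C and M from the post):

    C c = null;
    c.M();    // compiled as callvirt; the receiver is checked and a NullReferenceException is thrown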

  • @Mark Knell

    >I think you wanted the subjunctive, i.e., what *would* be the point of that, since it's hypothetical.  In fact, by the nature of value types, declaration of a struct variable entails initialization.

    You're right, but I meant "What is the point of allowing you to declare but not *personally* initialize a struct variable?".

    Your array example illustrates the point sufficiently though. I hadn't thought of that.

    >This is why Nullable<> must exist, i.e., why null reference semantics can't be given to value types inherently.

    I never wanted null for structs. I was talking about not allowing you to specify storage of a struct type without personally guaranteeing that the storage is initialized at some point, similar to how that happens for local variables.

    You've shown that arrays make this inconvenient, so there is really no point in thinking about it further.

    @Tanveer Badar

    But Eric required M to return an integer, not just any Foo. You can't overload the int * int operator, so the optimization works.

  • What about things like common subexpression elimination and loop invariant hoisting? Those seem like things that the JITter could do, but would be applicable to all environments so the compiler could do it.

    Very few "optimizations" are guaranteed to always be improvements. Both of your examples trade increased use of temporary slots for decreased time: the classic space vs time tradeoff. Some CPUs have huge numbers of available registers, so generating more temporaries is usually a win.

    But on CPUs with a small number of available registers (a large number of our users use computers with x86 architecture!) sometimes generating more temporaries means that you need to move things that would have otherwise always been in registers onto the stack, and then back into registers again later. If the total cost of those data moves is larger than the cost of doing the calculation multiple times, then this optimization just made the codegen worse.

     We do not know what the register architecture is going to be when we generate the IL; therefore we leave decisions like that up to the jitter. -- Eric
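
    A sketch of the tradeoff being described (illustrative; the C# compiler emits the repeated subexpression as written, and the jitter decides whether hoisting pays):

    int a = M(), b = M();
    int x = (a * b) + 1;
    int y = (a * b) + 2;   // a * b is computed twice as written

    int t = a * b;         // the hand-hoisted form needs an extra temporary slot
    int x2 = t + 1;
    int y2 = t + 2;        // a win only if t stays in a register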
