Compound Assignment, Part One

Compound Assignment, Part One

Rate This
  • Comments 28

When people try to explain the compound assignment operators += –= *= /= %= <<= >>= &= |= ^= to new C# programmers they usually say something like “x += 10; is just a short way of writing x = x + 10;”. Now, though that is undoubtedly true for a local variable x of type int, that’s not the whole story, not by far. There are actually many subtle details to the compound assignment operators that you might not appreciate at first glance.

First off, suppose the expression on the left hand side has a side effect or is expensive to call. You only want it to happen once:

class C
{
  private int f;
  private int P { get; set; }
  private static C s = new C();
  private static C M()
  { 
    Console.WriteLine("Hello");
    return s;
  }
  private struct Evil
  {
      public int f; // Mutable value type with a public field, evil!
      public int P { get; set; }
  }
  private static Evil[] evil = new Evil[2000];
  private static Evil[] N()
  {
    Console.WriteLine("Badness");
    return evil;
  }

If somewhere inside C you have M().f += 10; then you only want M’s side effect to happen once. This is not the same as M().f = M().f + 10;

What is it the same as then? How about this:

C receiver = M();
receiver.f = receiver.f + 10;

Is that right? It seems to be, but suppose we make it a bit more complicated. Suppose we have N()[123].f += 10; . Is this then

Evil receiver = N()[123];
receiver.f = receiver.f + 10;

Clearly not.  We've made a copy of the contents of variable N()[123] and we are now mutating the variable containing the copy but we need to be mutating the original.

Once more we see how much pure concentrated evil mutable value types are!

To express the real semantics concisely we need a feature that C# does not have, namely, “ref locals”. C# has ref-typed parameters, but not ref-typed locals. When you make a ref-typed parameter essentially you are saying “this parameter is an alias for this variable”:

    void N(ref int x) { x = 10; }
    …
    N(ref M().f);

That says “Evaluate the expression as a variable and then make the variable x refer to the same storage location as the variable”. Suppose we had the ability to do that with locals instead of just parameters. That is, we can make a local variable that is an alias for a (possibly non-local) variable. Then M().f += 10 would be equivalent to:

ref int variable = ref M().f;
variable = variable + 10;

And thus the side effect of M only happens once. Similarly  N()[123].f += 10; where the array is of mutable value type becomes

ref int variable = ref N()[123].f;
variable = variable + 10;

and the side effect of N only happens once, and we mutate the field of the correct variable.

C# does not have the “ref local” feature though we could implement it if we wanted to; the CLR supports it. I think we have higher priorities though.

What if instead of a variable we modified a property?

M().P += 10;

You might again think that this is a a syntactic sugar for

C receiver = M();
receiver.P = receiver.P + 10;

which is of course a syntactic sugar for:

C receiver = M();
receiver.set_P(receiver.get_P() + 10);

Again, we only want the side effect exercised once, though of course we have to call two different methods for the getter and the setter; that’s unavoidable.

But again, we have a problem if the receiver is a variable of value type. If we have  N()[123].P += 10; then we have to generate

ref Evil receiver = ref N()[123];
receiver.set_P(receiver.get_P() + 10);

So that we make sure that the mutable value type property we're invoking is on the right variable.

Similarly if we had an indexer defined on C:

M()[X()] += 10;

Now we have to keep track of both the receiver and the index to make sure they are not evaluated twice. That’s the same as:

C receiver = M();
int index = X();
receiver[index] = receiver[index] + 10;

and of course just as with properties, those too are just syntactic sugars for calls, and again, we need to make sure we get the refness right if the receiver is a variable of a mutable value type.

And similarly with += –= on events, though of course those are different because they are syntactic sugars for event add and remove methods.

Anyway, I don’t think I need to further belabour the point that side effects are only computed once and that determining the correct location to mutate is not as easy as you might think.

Another interesting aspect of the predefined compound operators is that if necessary, a cast – an allegedly “explicit” conversion – is inserted implicitly on your behalf. If you say

short s = 123;
s += 10;

then that is not analyzed as s = s + 10 because short plus int is int, so the assignment is bad. This is actually analyzed as

s = (short)(s + 10);

so that if the result overflows a short, it is automatically cut back down to size for you.

A final subtle point is that for the predefined operators if the assignment without the compounded operation would not have been legal, then the compound assignment is not legal either. If you say

int i = 10;
short s = 123;
s += i;

then that’s not legal because s = i is not legal.

Those design details are interesting in of themselves; next time we’ll see how some of these subtleties affect some proposed extensions to the language.

  • I discovered that

    short s = 123;

    s += 10;

    is treated as

    s = (short)(s + 10);

    a few years back.

    I ended up submitting a bug regarding Mono's handling of compound assignments in some cases when the type was smaller than int:

    lists.ximian.com/.../051214.html

    It was fixed some time ago, but there were some interesting comments on it:

    +We do have the extra conversions as specified in the spec, but they

    +happen too late in our pipeline.

    +

    +The problem is that we resolve target as an LValue before we consider

    +the allowed explicit conversions.   This is a bit of a chicken and egg

    +problem in our code, because the way we consider if the special rules

    +for implicit conversions that apply during compound assignment depend

    +on the lvalue being resolved (the "target" value in the source code).

    +

    +Not easy.

  • On a somewhat related matter:

    When you have a Indexer property on a class that returns a struct, I am often inclined to write:

    MyObject[i].f += 1;  

    This won't work, it won't even compile. For the right reasons, i suppose.  C# doesn't allow methods to return refs, and i am aware that i should not hold my breath until you do. But it would be fabulous to understand why that must be so...

  • Ferdinand: Are you asking why methods can't returns refs? It's because you could end up returning a reference to a local variable, which is on the stack. Once your method returns, though, that local variable ceases to exist. Do you understand why now?

    It's actually a fairly common bug in C code for a function to return a  pointer to a local variable. It often leads to unusual results because the value can stay there on the stack for some time before being overwritten by something else.

    ====

    Ah, but it is legal in the CLR for a method to return a ref. Now, it should not be verifiable for a method to return a ref to a local that is going away, as you note. But it is possible to determine when a method *definitely* returns a ref to a local that is going away, or *possibly* returns a ref to a local that is going away, and make that code not verifiable.

    Obviously it is easy to determine if a method definitely returns a ref to its own local; just do a flow analysis and see where the address that is being returned on the stack came from. The more complicated scenario is:


    struct S { public int f; }
    ref int M()
    {
      S s = new S(); 
      return ref N(ref s);
    }
    ref int N(ref S s) { return ref s.f; }

    See the problem? M indirectly returns a reference to s.f, but s.f is on the stack of the call to M! N is fine; the verifier knows that even if s.f is on the stack, it is on the stack *lower* than N. The verifier would have to be written to handle this situation and disallow it.

    - Eric

  • @Gabe: Given the CLR type system, you could restrict ref-typed C# expressions (return or otherwise) to the typesafe subset fairly easily: in particular, if C is an arbitray reference typed (including arrays) expr and F is a access of a value typed field the following form is safe: "ref" C ("." F | "[" expr "]")*. Notably, returning ref arguments should be illegal - otherwise this is permissible:

    ref int Inner(ref int a) { return a; }

    ref int Outer(int a) { return Inner(ref a); }

    Which returns a ref to an argument which has been popped off the stack. Of course, Inner could be defined in a C++/CLI assembly, but it would not be typesafe (/clr:safe) in that case.

    OT: One of the more annoying bugs in Adobe's Actionscript 3 compiler is it naively expanding "e += v" to "e = e + v", duplicating arbitrary expressions on the LHS. No matter how unreasonable I feel your hatred of mutable structs is, Eric, I'm continuously thankful you can make a compiler that *works correctly*.

  • Just as I thought! Once you get it, it's fabulous!

  • Nice article.. though i could only understand whole of it by going through it at least twice. :)

    And sorry but your comment confused me even more. :(

    I'm hoping you will explain it (as  you've said) in your update.

  • Eric, not as long as a week ago, we are debating the miss of the "ref" keyword also for inline declarations.

    The problem on a desktop platform maybe is not important, but we are talking around the .Net Micro Framework. On that environment the memory usage and the performance is very important. That's because I have realized that the only way to alter a structure is to pass it as "ref" on a dedicated function. Why there's no other way?

    Well, it was only a curiosity. I never missed that gap, also because structs scare me a lot.

    OK, nothing else. Just for your info.

    Cheers

  • Is the "ref local" feature available through reflection? How could "N()[123].f += 10" be executed efficiently through reflection?

  • @Rising - I struggled with the concept of "ref local" until I thought about it in terms of a call to a method:

    static void CompoundAdd(ref int variable, int i)

    {  variable = variable + i;  }

    N()[123].f += 10; would be replaced with:

    CompoundAdd(ref C.N()[123].f, 10);

    Basically, the "ref local" feature would allow you to inline the CompoundAdd method.

  • Don't worry mutable structs. I still love you.

  • If mutable structs would require a ref local for fields, why do properties work for the same issue? Wouldn't they have the same problem?

  • @Eric

    > Ah, but it is legal in the CLR for a method to return a ref. Now, it should not be verifiable for a method to return a ref to a local that is going away, as you note. But it is possible to determine when a method *definitely* returns a ref to a local that is going away, or *possibly* returns a ref to a local that is going away, and make that code not verifiable.

    Interestingly enough, if looking at this strictly from Ecma-335 perspective, it's legal for a method to be declared as returning a ref in verifiable code, but it's not legal to call one. P III, 1.8.1.2.1 "Verification types":

    "A method can be defined as returning a managed pointer, but calls upon such methods are not verifiable. When returning byrefs, verification is done at the return site, not at the call site.

    [Rationale: Some uses of returning a managed pointer are perfectly verifiable (e.g., returning a reference to a field in an object); but some not (e.g., returning a pointer to a local variable of the called method). Tracking this in the general case is a burden, and therefore not included in this standard. end rationale]"

    Now in practice, .NET extends CLI verification rules such that calling methods like that is verifiable, but the body of the method may or may not be depending on where the ref comes from. However, this is an extension, not part of the standard, so conforming assemblies may not rely on it.

    In fact, .NET rule for this is actually subtly incompatible with CLI verification rule, because, according to the latter, a byref-returning method that is never called would always be verifiable - which will not be the case with .NET.

  • I second Chad's motion for an "??=" operator. It's weird that C# has this inconsistency (that almost any operator except ?? can participate in compound assignment.) Of course I would also like a conditional dot operator: loyc-etc.blogspot.com/.../i-want-conditional-dot-operator.html - oh well.

Page 2 of 2 (28 items) 12