Compound Assignment, Part One

When people try to explain the compound assignment operators += -= *= /= %= <<= >>= &= |= ^= to new C# programmers they usually say something like “x += 10; is just a short way of writing x = x + 10;”. Now, though that is undoubtedly true for a local variable x of type int, that’s not the whole story, not by far. There are actually many subtle details to the compound assignment operators that you might not appreciate at first glance.

First off, suppose the expression on the left hand side has a side effect or is expensive to call. You only want it to happen once:

using System;

class C
{
  private int f;
  private int P { get; set; }
  private static C s = new C();
  private static C M()
  {
    Console.WriteLine("Hello");
    return s;
  }
  private struct Evil
  {
    public int f; // Mutable value type with a public field, evil!
    public int P { get; set; }
  }
  private static Evil[] evil = new Evil[2000];
  private static Evil[] N()
  {
    Console.WriteLine("Badness");
    return evil;
  }
  // The statements discussed below are assumed to live in methods of C.
}

If somewhere inside C you have M().f += 10; then you only want M’s side effect to happen once. This is not the same as M().f = M().f + 10;
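The difference can be made visible by counting calls. This is a minimal sketch using a simplified standalone C; the Calls counter and the SideEffectDemo class are illustrative additions, not part of the class above.

```csharp
using System;

class C
{
    public int f;
    public static int Calls;                 // hypothetical counter that makes the side effect observable
    private static readonly C s = new C();

    public static C M()
    {
        Calls++;
        Console.WriteLine("Hello");
        return s;
    }
}

static class SideEffectDemo
{
    // Number of times M() runs for the compound form.
    public static int CompoundCalls()
    {
        C.Calls = 0;
        C.M().f += 10;
        return C.Calls;
    }

    // Number of times M() runs for the naive expansion.
    public static int ExpandedCalls()
    {
        C.Calls = 0;
        C.M().f = C.M().f + 10;
        return C.Calls;
    }

    static void Main()
    {
        Console.WriteLine(CompoundCalls());  // 1
        Console.WriteLine(ExpandedCalls());  // 2
    }
}
```

The compound form prints "Hello" once; the expansion prints it twice.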

What is it the same as then? How about this:

C receiver = M();
receiver.f = receiver.f + 10;

Is that right? It seems to be, but suppose we make it a bit more complicated. Suppose we have N()[123].f += 10; Is this then

Evil receiver = N()[123];
receiver.f = receiver.f + 10;

Clearly not. We've made a copy of the contents of variable N()[123] and we are now mutating the variable containing the copy, but we need to be mutating the original.

Once more we see how much pure concentrated evil mutable value types are!
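Here is a small sketch of the copy problem, assuming a standalone Evil struct and a side-effect-free N(); the CopyDemo class and its method names are made up for the illustration.

```csharp
using System;

struct Evil
{
    public int f;
}

static class CopyDemo
{
    static readonly Evil[] evil = new Evil[2000];

    static Evil[] N() { return evil; }

    // Captures the element in a local: the local is a copy, so the array is untouched.
    public static int ViaCopy()
    {
        evil[123].f = 0;
        Evil receiver = N()[123];
        receiver.f = receiver.f + 10;
        return evil[123].f;
    }

    // Compound assignment mutates the array element itself.
    public static int ViaCompound()
    {
        evil[123].f = 0;
        N()[123].f += 10;
        return evil[123].f;
    }

    static void Main()
    {
        Console.WriteLine(ViaCopy());      // 0
        Console.WriteLine(ViaCompound());  // 10
    }
}
```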

To express the real semantics concisely we need a feature that C# does not have, namely, “ref locals”. C# has ref-typed parameters, but not ref-typed locals. When you make a ref-typed parameter essentially you are saying “this parameter is an alias for this variable”:

    void N(ref int x) { x = 10; }
    …
    N(ref M().f);

That says “Evaluate the expression as a variable and then make the variable x refer to the same storage location as the variable”. Suppose we had the ability to do that with locals instead of just parameters. That is, we can make a local variable that is an alias for a (possibly non-local) variable. Then M().f += 10 would be equivalent to:

ref int variable = ref M().f;
variable = variable + 10;

And thus the side effect of M only happens once. Similarly N()[123].f += 10; where the array is of mutable value type becomes

ref int variable = ref N()[123].f;
variable = variable + 10;

and the side effect of N only happens once, and we mutate the field of the correct variable.

C# does not have the “ref local” feature though we could implement it if we wanted to; the CLR supports it. I think we have higher priorities though.
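Ref locals did arrive later, in C# 7.0, so on a compiler of that version or newer the hypothetical expansion can be tried almost verbatim. The sketch below assumes such a compiler; the Calls counter and the RefLocalDemo class are illustrative additions, and Evil is a standalone copy of the struct above.

```csharp
using System;

struct Evil
{
    public int f;
}

static class RefLocalDemo
{
    public static int Calls;                       // hypothetical counter for N()'s side effect
    static readonly Evil[] evil = new Evil[2000];

    static Evil[] N()
    {
        Calls++;
        return evil;
    }

    // Spells out N()[123].f += 10 with a ref local (requires C# 7.0 or later).
    public static int Run()
    {
        evil[123].f = 0;
        Calls = 0;
        ref int variable = ref N()[123].f;   // alias for the field inside the array element
        variable = variable + 10;
        return evil[123].f;
    }

    static void Main()
    {
        Console.WriteLine(Run());    // 10: the element in the array was mutated
        Console.WriteLine(Calls);    // 1: N() ran only once
    }
}
```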

What if instead of a variable we modified a property?

M().P += 10;

You might again think that this is a syntactic sugar for

C receiver = M();
receiver.P = receiver.P + 10;

which is of course a syntactic sugar for:

C receiver = M();
receiver.set_P(receiver.get_P() + 10);

Again, we only want the side effect exercised once, though of course we have to call two different methods for the getter and the setter; that’s unavoidable.

But again, we have a problem if the receiver is a variable of value type. If we have N()[123].P += 10; then we have to generate

ref Evil receiver = ref N()[123];
receiver.set_P(receiver.get_P() + 10);

so that we make sure that the property accessors of the mutable value type are invoked on the right variable.
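The observable behaviour can be checked with a sketch like the following, which assumes a standalone Evil struct with an auto-property; PropertyDemo and the Calls counter are names invented for the illustration.

```csharp
using System;

struct Evil
{
    public int P { get; set; }
}

static class PropertyDemo
{
    public static int Calls;                       // hypothetical counter for N()'s side effect
    static readonly Evil[] evil = new Evil[2000];

    static Evil[] N()
    {
        Calls++;
        return evil;
    }

    // The getter and setter both run against the array element, not a copy of it.
    public static int Run()
    {
        evil[123].P = 0;
        Calls = 0;
        N()[123].P += 10;
        return evil[123].P;
    }

    static void Main()
    {
        Console.WriteLine(Run());    // 10
        Console.WriteLine(Calls);    // 1
    }
}
```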

Similarly if we had an indexer defined on C:

M()[X()] += 10;

Now we have to keep track of both the receiver and the index to make sure they are not evaluated twice. That’s the same as:

C receiver = M();
int index = X();
receiver[index] = receiver[index] + 10;

and of course just as with properties, those too are just syntactic sugars for calls, and again, we need to make sure we get the refness right if the receiver is a variable of a mutable value type.
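As a sketch of the indexer case, the following assumes a C with an int indexer backed by an array and an X() that returns a fixed index. The counters, the backing array and the IndexerDemo class are all illustrative assumptions.

```csharp
using System;

class C
{
    private readonly int[] items = new int[10];   // hypothetical backing store

    public int this[int index]
    {
        get { return items[index]; }
        set { items[index] = value; }
    }
}

static class IndexerDemo
{
    public static int ReceiverCalls;
    public static int IndexCalls;
    static readonly C s = new C();

    static C M() { ReceiverCalls++; return s; }
    static int X() { IndexCalls++; return 3; }

    // Receiver and index are each evaluated once; the getter and the setter are each called once.
    public static int Run()
    {
        ReceiverCalls = 0;
        IndexCalls = 0;
        s[3] = 0;
        M()[X()] += 10;
        return s[3];
    }

    static void Main()
    {
        Console.WriteLine(Run());           // 10
        Console.WriteLine(ReceiverCalls);   // 1
        Console.WriteLine(IndexCalls);      // 1
    }
}
```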

And similarly with += -= on events, though of course those are different because they are syntactic sugars for event add and remove methods.

Anyway, I don’t think I need to further belabour the point that side effects are only computed once and that determining the correct location to mutate is not as easy as you might think.

Another interesting aspect of the predefined compound operators is that if necessary, a cast – an allegedly “explicit” conversion – is inserted implicitly on your behalf. If you say

short s = 123;
s += 10;

then that is not analyzed as s = s + 10 because short plus int is int, so the assignment is bad. This is actually analyzed as

s = (short)(s + 10);

so that if the result overflows a short, it is automatically cut back down to size for you.
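A short sketch of that narrowing follows. The unchecked block is there to pin the behaviour down regardless of compiler settings; in a checked context the inserted conversion would throw on overflow. NarrowingDemo and AddTen are invented names.

```csharp
using System;

static class NarrowingDemo
{
    public static short AddTen(short s)
    {
        unchecked
        {
            s += 10;   // analyzed as s = (short)(s + 10)
        }
        return s;
    }

    static void Main()
    {
        Console.WriteLine(AddTen(123));             // 133
        Console.WriteLine(AddTen(short.MaxValue));  // -32759: 32777 does not fit in a short and wraps
    }
}
```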

A final subtle point is that for the predefined operators if the assignment without the compounded operation would not have been legal, then the compound assignment is not legal either. If you say

int i = 10;
short s = 123;
s += i;

then that’s not legal because s = i is not legal.
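If the narrowing really is intended, the conversion has to be written out by hand. A minimal sketch, with ExplicitCastDemo and Add as invented names:

```csharp
using System;

static class ExplicitCastDemo
{
    public static short Add(short s, int i)
    {
        // s += i;            // does not compile: int is not implicitly convertible to short
        s = (short)(s + i);   // the explicit cast states the intent to narrow
        return s;
    }

    static void Main()
    {
        Console.WriteLine(Add(123, 10));   // 133
    }
}
```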

Those design details are interesting in and of themselves; next time we’ll see how some of these subtleties affect some proposed extensions to the language.

  • Forgive my ignorance, but how is

    ref int variable = ref M().f;
    variable = variable + 10;

    not equivalent to

    C receiver = M();
    receiver.f = receiver.f + 10;

  • I also didn't understand

  • I was going to ask the same thing as Patrick...

  • @Patrick: The resolution of receiver.f may have side effects, which would be run twice in your latter example, but only once in the former.

  • @Dave: Could you give an example of how the resolution of a field can have side effects?

  • @Eric - Another fascinating article - I never realized compound assignment was so complicated.  Keep up the great posts.

    I was a little confused by the first example also.  Why use the term "ref locals"?  Wouldn't "f" normally be referred to as a "field" of class C?

  • The only such situation I can think of is if f is not a field but a property, in which case nothing gets run twice - the getter runs once and the setter runs once.

  • @Patrick

    Say f is a property.

    receiver.f will call receiver.f.get()

    Which could do:

    get
    {
      FirePropertyAccessedEvent();
      return value;
    }

    In the first case I think this would be fired once, in the second case it would be fired twice.

  • @Random832

    You're right - it would only be run once - so ignore my previous post :)

  • private property P { get; set; }

    Should be:

    private int P { get; set; }

    Whoops, that was a silly editing error. Thanks. -- Eric

  • The last point (conversions) is interesting as it relates to the behaviour of dynamic. I'd be interested in knowing what readers would expect the result of this to be:

       dynamic d = (byte) 100;
       Console.WriteLine(d.GetType()); // Prints System.Byte
       d += 30;
       Console.WriteLine(d); // ?

    On a different matter, I make it a habit to guess what the second parts of "teaser" posts like this are going to contain. I suspect other readers do too. In this case I'm going to guess at tuple support:

       Tuple<int, int> tuple = Tuple.Create(10, 20);
       tuple += Tuple.Create(20, 30);

    That's more through a *hope* that tuples are going to get some love soon than any real insight though...

  • Apparently a number of people have been confused as to why there's a need for "ref locals" rather than simply capturing the receiver. I apologize; it was clear in my head but I neglected to explain it. The reason is of course the bane of our existence: mutable value types. Capturing the receiver turns a variable of value type into a different variable of value type, and therefore can change which copy is mutated. I'll update the text to explain this. Apologies for the unclarity. -- Eric

  • "that’s not legal because s = i is not legal." Not so. DateTime d = DateTime.UtcNow; TimeSpan s = new TimeSpan(1,0,0); d += s;

    I suspect that the _real_ reason it isn't legal is because the s += 10 case is some kind of special rule for integer literal expressions, similar to the rule that allows s = 10 itself.

    I was missing a word in the text; I should have noted that this rule only applies to the "predefined" operators that are built in to the language. It does not apply to user-defined operators, like that defined between DateTime and TimeSpan. Thanks for the note. -- Eric

  • "Capturing the receiver turns a variable of value type into a different variable of value type, and therefore can change which copy is mutated."

    What this doesn't clarify is when can you A) have side effects and B) not have already made a copy that won't outlive the expression.

    The answer is when you've got an expression with side effects that returns a reference type, which you then proceed to assign to a field within a [possibly multiply nested] valuetype field* of that object. Which suggests an obvious solution - capture that intermediate 'receiver'. If you use an expression with side effects to get an index to an array element, capture that too.

    *I am loosely defining array elements to be "fields" for the purpose of this.

    Sure, that would work. I note that I'm looking for a concise way to specify the rule. And of course, that concise thing is in practice what we actually emit during IL generation; we make a reference to the variable and put the ref on the stack for later. -- Eric

    **this ignores that the expression may involve a method that returns a value type by reference, but it's not clear that C# supports this any more than it does "ref locals", other than the one used internally for multidimensional arrays.

    Right, the feature is not generally supported but there are a small number of special cases where we take advantage of it. I wrote a prototype of C# a few years ago which did support this feature generally and it worked quite nicely, but I don't think it will make it into the language proper for quite some time, if ever. -- Eric

  • I hope that the proposed ??= operator we talked about a while ago will be discussed in your next article!

    x ??= y would be 'simply' defined as 'x = x ?? y'.

    I use that pattern somewhat often for properties, like so:

    private T _data = null;
    public T Data
    {
       get { return _data = _data ?? new T(); }
    }
