Ref returns and ref locals

"Ref returns" are the subject of another great question from StackOverflow that I thought I might share with a larger audience.

Ever since C# 1.0 you've been able to create an "alias" to a variable by passing a "ref to a variable" to certain methods:

static void M(ref int x)
{
    x = 123;
}
...
int y = 456;
M(ref y);

Despite their different names, "x" and "y" are now aliases for each other; they both refer to the same storage location. When x is changed, y changes too because they are the same thing. Basically, "ref" parameters allow you to pass around variables as variables rather than as values. This is a sometimes-confusing feature (because it is easy to confuse "reference types" with "ref" aliases to variables), but it is generally a pretty well-understood and frequently-used feature.
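
The confusion between reference types and "ref" aliases is easy to demonstrate. What follows is only a minimal sketch; the class and method names are invented for illustration. It contrasts a reference-type argument passed by value with the same kind of argument passed by ref:

```csharp
using System;
using System.Text;

// Hypothetical demo class; none of these names come from the article.
public static class RefVersusReferenceType
{
    // sb is a copy of the caller's reference. Both refer to the same
    // StringBuilder object, but they are two separate variables.
    static void ReplaceByValue(StringBuilder sb)
    {
        sb.Append("!");                      // mutates the shared object
        sb = new StringBuilder("replaced");  // rebinds only the local copy
    }

    // sb is an alias for the caller's variable itself.
    static void ReplaceByRef(ref StringBuilder sb)
    {
        sb = new StringBuilder("replaced");  // rebinds the caller's variable
    }

    public static string Demo()
    {
        StringBuilder first = new StringBuilder("original");
        ReplaceByValue(first);

        StringBuilder second = new StringBuilder("original");
        ReplaceByRef(ref second);

        return first + " / " + second;       // "original! / replaced"
    }
}
```

Calling Console.WriteLine(RefVersusReferenceType.Demo()) prints "original! / replaced": the by-value call could change the object but not the caller's variable, while the by-ref call replaced the variable's contents outright.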

However, it is a little-known fact that the CLR type system supports additional usages of "ref", though C# does not. The CLR type system also allows methods to return refs to variables, and allows local variables to be aliases for other variables. The CLR type system, however, does not allow fields that are aliases to other variables. Similarly, arrays may not contain managed references to other variables. Both fields and arrays containing refs are illegal because making them legal would overly complicate the garbage collection story. (I also note that the "managed reference to variable" types are not convertible to object, and therefore may not be used as type arguments to generic types or methods. For details, see the CLI specification, Partition I, Section 8.2.1.1, "Managed pointers and related types".)

As you might expect, it is entirely possible to create a version of C# which supports both these features. You could then do things like

static ref int Max(ref int x, ref int y)
{
  if (x > y)
    return ref x;
  else
    return ref y;
}

Why do this? It is quite different than a conventional "Max" which returns the larger of two values. This returns the larger variable itself, which can then be modified:

int a = 123;
int b = 456;
ref int c = ref Max(ref a, ref b);
c += 100;
Console.WriteLine(b); // 556!

Kinda neat! This would also mean that ref-returning methods could be the left-hand side of an assignment -- we don't need the local "c":

int a = 123;
int b = 456;
Max(ref a, ref b) += 100;
Console.WriteLine(b); // 556!

Syntactically, "ref" is a strong marker that something weird is going on. Every time the word "ref" appears before a variable usage, it means "I am now making some other thing an alias for this variable". Every time it appears before a declaration, it means "this thing must be initialized with a variable marked with ref".

I know empirically that it is possible to build a version of C# that supports these features because I have done so in order to test-drive the possible feature. Advanced programmers (particularly people porting unmanaged C++ code) often ask us for more C++-like ability to do things with references without having to get out the big hammer of actually using pointers and pinning memory all over the place. By using managed references you get these benefits without paying the cost of screwing up your garbage collection performance.

We have considered this feature, and actually implemented enough of it to show to other internal teams to get their feedback. However, at this time, based on our research, we believe that the feature does not have broad enough appeal or compelling enough usage cases to make it into a real supported mainstream language feature. We have other higher priorities and a limited amount of time and effort available, so we're not going to do this feature any time soon.

Also, doing it properly would require some changes to the CLR. Right now the CLR treats ref-returning methods as legal but unverifiable because we do not have a detector that detects and outlaws this situation:

static ref int M1(ref int x)
{
  return ref x;
}

static ref int M2()
{
  int y = 123;
  return ref M1(ref y); // Trouble!
}

static int M3()
{
  ref int z = ref M2();
  return z;
}

M3 returns the contents of M2's local variable, but the lifetime of that variable has ended! It is possible to write a detector that determines uses of ref-returns that clearly do not violate stack safety. We could write such a detector, and if the detector could not prove that lifetime safety rules were met then we would not allow the usage of ref returns in that part of the program. It is not a huge amount of dev work to do so, but it is a lot of burden on the testing teams to make sure that we've really got all the cases. It's just another thing that increases the cost of the feature to the point where right now the benefits do not outweigh the costs.

If we implemented this feature some day, would you use it? For what? Do you have a really good usage case that could not easily be done some other way? If so, please leave a comment. The more information we have from real customers about why they want features like this, the more likely it will make it into the product someday. It's a cute little feature and I'd like to be able to get it to customers somehow if there is sufficient interest. However, we also know that "ref" parameters are one of the most misunderstood and confusing features, particularly for novice programmers, so we don't necessarily want to add more confusing features to the language unless they really pay their own way.

  • It seems I have misread pete.d's example; operand never aliases a loop-local variable. This doesn't change my argument, though.

  • I would use it for a custom implementation of an array of structs, when the normal implementation is troubled. An implementation of Binary Decision Diagrams on a 32-bit architecture would be a real-world case. As far as BDDs can be seen as real world, that is. Besides that, I have never even felt the desire for ref locals or ref returns, and overall life is better without them. It would be just one more dangerous pit that Jr Programmer could fall into.

  • "But the aliasing ref local operand is declared outside the loop, which opens the question about what it refers to after the loop has terminated, given that it pointed to a loop-local variable."

    I'm concerned that two different readers do not seem to have understood the code example I posted.  The ref local does _not_ refer to the foreach variable (i.e. the "loop-local variable").  I agree that would be a problem, and it's the same problem as if a ref return value aliased a local variable from a called method that has returned.

    In my example, there are two ways "operand" is used: "operand = ref <foo>" and "operand = <foo>". Only the former case assigns the alias. The latter case (which is what's used with the loop-local variable) would dereference the alias and the assignment is made to the variable the ref local is aliasing, not the ref local itself (due to the lack of "ref" on the RHS of the assignment). This is consistent with the proposed syntax in Eric's article (perhaps that highlights yet another problematic aspect: making it clear in code whether one is creating a new alias, or using the existing one…with real pointers, the "*" accomplishes that, but in Eric's examples it's implicit according to usage, which can lead to misunderstandings).

  • Please do not put this in C# or VB.  It's applicable in only a small number of cases.  It's rarely used in C++.

  • Another thought: if we can have ref return values, can we also have "ref ref" method parameters? What about "ref ref locals and return values"?

    In C/C++, you can add as many levels of indirection as you like. Indeed, due to the explicitness of reference types in C/C++, it's quite common to have two levels of indirection, and three levels isn't exactly uncommon (e.g. pointer to a pointer to an array of pointers).

    It seems to me that C# currently strikes an effective balance between usefulness and simplicity. The "ref returns/locals" feature is way down at the bottom of the list of things that I as a programmer would appreciate seeing added to the language. It seems like it introduces a whole host of problems (including offering new, exciting ways for a programmer to make their code much more confusing), while addressing real-world, important utility issues in the language in only a tiny percentage of scenarios.

    We came to the same conclusion -- there are some narrow scenarios in which these techniques are extremely helpful, but they are sufficiently uncommon that we didn't want to take on the cost. If hypothetically we were to do something like this feature then we would probably not support multiple levels of refness. -- Eric

  • Ah, sorry pete. Chalk that one up to far too confusing a syntax for me then, clearly I didn't read the method properly.

  • Yeah, when I saw two different people misread the code, it was apparent that there was yet another issue with ref locals: the syntax (at least that presented here) is confusing!

    Presumably in the hypothetical case where this feature was implemented, a less-confusing syntax could be contrived (maybe requiring a keyword for dereferencing the alias, a la "*" but C#-ish). Somehow, I suspect it's something the C# language team doesn't have to worry about for quite a while. :)

  • >> Another thought: if we can have ref return values, can we also have "ref ref" method parameters? What about "ref ref locals and return values"?

    I don't think it's reasonable to treat "ref" as analogous to pointers in C++ - they're much more like references (&) in that they are bind-once with no ability to rebind, and implicitly dereferenced. And C++ doesn't have references-to-references either.

    Also, there's no good way to do so while keeping managed pointers (which is what "ref" is) verifiable. The moment you make a "ref ref", you create a possibility for a ref to local to escape the scope of that local. And if refs are no longer verifiable, then how are they different from unmanaged pointers?

    On the other hand, you can have as many levels of indirection as you want in C# already - int*** is a perfectly legal C# type.

  • Of course the most obvious use is foreach:

    foreach(ref var i in list)
        i++;

    This is already possible in C++/CLI. It would be nice to have it in C# as well.

  • If it were only allowed in an "unsafe" context then it would be less likely to be abused and the verifier wouldn't need to be improved. Using "unsafe" for other possibly confusing/complex operations supported by the CLR, yet not supported by C#, could also lower their cost of implementation and support. Things like uninitialized locals, (ref int)obj unboxing, etc.

  • I would probably "remove" this feature with static code review tools from the teams I am going to lead. I've been on several C++ projects where the lack of readability caused by this feature cost more than we could ever have gained from it.

    In my opinion languages should aim at gathering related information so that for each algorithm you have one locus of knowledge. This feature would distribute the knowledge of what happens to a variable, making it very hard to reason about the code.

    Entire paradigms, like DCI (by the grandfather of MVC, Trygve Reenskaug, supported by other noteworthies), are built on the idea of keeping knowledge of a functionality located at one place in the code.

  • Assuming the implementation would extend to lambda functions, the feature would allow an implementation of Algol 60 call by name.

  • Not sure if I missed it in other comments.

    But the very common usage I would have for that feature is the following code

    double value;
    if (!map.TryGet(someKey, out value))
    {
        value = map[someKey] = computeSemiExpensiveValue(...);
    }

    without ref return values, the hash lookup is done twice
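
The double lookup described in this comment can be made visible by counting hash computations. This is only a sketch under assumptions: it uses the standard Dictionary<TKey, TValue>.TryGetValue in place of the commenter's TryGet, and CountingComparer, DoubleLookup and ComputeSemiExpensiveValue are hypothetical names invented here.

```csharp
using System.Collections.Generic;

// Hypothetical helper: counts how often the dictionary hashes a key.
public sealed class CountingComparer : IEqualityComparer<string>
{
    public int HashCalls;
    public bool Equals(string a, string b) { return string.Equals(a, b); }
    public int GetHashCode(string s) { HashCalls++; return s.GetHashCode(); }
}

public static class DoubleLookup
{
    // Stand-in for the commenter's computeSemiExpensiveValue(...).
    static double ComputeSemiExpensiveValue(string key) { return key.Length * 1.5; }

    public static int HashCallsOnMiss()
    {
        var comparer = new CountingComparer();
        var map = new Dictionary<string, double>(comparer);
        map["warmUp"] = 0.0;     // make sure the hash table is allocated
        comparer.HashCalls = 0;  // count only the calls made below

        double value;
        if (!map.TryGetValue("someKey", out value))                        // hashes the key once
        {
            value = map["someKey"] = ComputeSemiExpensiveValue("someKey"); // hashes it again
        }
        return comparer.HashCalls;
    }
}
```

On the Dictionary implementations I am aware of, HashCallsOnMiss() returns 2: one hash for the failed lookup and one for the insert. A ref-returning lookup would let the caller write into the slot found by the first probe instead of probing a second time.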

  • Don't do it! Please don't screw our brains!

    your method:

    static ref int Max(ref int x, ref int y)
    {
     if (x > y)
       return ref x;
     else
       return ref y;
    }

    can be emulated by :

    static void Max(ref int x, ref int y, out int z)
    {
     if (x > y)
       z = x;
     else
       z = y;
    }

    creating new rules for ref read only properties, or ref covariance/contravariance will make my head explode! ;)

    I don't understand how you believe that your "Max" and my "Max" are anything alike. Your Max doesn't even need the arguments to be ref; you've just written the standard implementation of "Max" with an out parameter. -- Eric

  • @dmihailescu:  Your function does not emulate the original function at all.  The original function returns a reference to the larger variable that can be used to modify that variable.  Your function just places the value of the highest variable into a third output variable, which is really no better than just returning the value.

    I don't really understand why people are so opposed to this feature being included in the language.  If you can't wrap your head around it, you don't have to use it!  Just like pointers, sockets, strings, arrays or any other programming feature that some arbitrary programmers can't wrap their heads around.  But that doesn't mean that the feature wouldn't benefit programmers who *can* wrap their heads around it.
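
The difference the last two replies point out can be checked with the out-parameter version alone, which compiles in C# as it stands. A minimal sketch, with the wrapper class MaxByOut and the method Demo invented for illustration:

```csharp
// Hypothetical wrapper class; Max is the out-parameter version from the comment above.
public static class MaxByOut
{
    // z receives a copy of the larger value, not an alias to the larger variable.
    static void Max(ref int x, ref int y, out int z)
    {
        if (x > y)
            z = x;
        else
            z = y;
    }

    public static int Demo()
    {
        int a = 123;
        int b = 456;
        int c;
        Max(ref a, ref b, out c);
        c += 100;   // changes only c, which is now 556
        return b;   // b is still 456
    }
}
```

Demo() returns 456, whereas the ref-returning Max in the article would leave b at 556, because there c is an alias for b rather than a copy of its value.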
