Ref returns and ref locals

"Ref returns" are the subject of another great question from StackOverflow that I thought I might share with a larger audience.

Ever since C# 1.0 you've been able to create an "alias" to a variable by passing a "ref to a variable" to certain methods:

static void M(ref int x)
{
    x = 123;
}
...
int y = 456;
M(ref y);

Despite their different names, "x" and "y" are aliases for each other for the duration of the call; they both refer to the same storage location. When x is changed, y changes too because they are the same thing. Basically, "ref" parameters allow you to pass around variables as variables rather than as values. This is a sometimes-confusing feature (because it is easy to confuse "reference types" with "ref" aliases to variables), but it is generally a pretty well-understood and frequently-used feature.
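To make the aliasing concrete, here is a minimal runnable sketch that contrasts a "ref" parameter with an ordinary by-value parameter. The class name RefParameterDemo and the helper methods N, ViaRef and ViaValue are made up for this illustration; only M comes from the snippet above.

```csharp
using System;

static class RefParameterDemo
{
    // x is an alias for whatever variable the caller passes with "ref".
    static void M(ref int x)
    {
        x = 123;
    }

    // Hypothetical contrast: a by-value parameter is a copy,
    // so assigning to it leaves the caller's variable alone.
    static void N(int x)
    {
        x = 123;
    }

    public static int ViaRef()
    {
        int y = 456;
        M(ref y);   // x and y are the same storage location during the call
        return y;
    }

    public static int ViaValue()
    {
        int y = 456;
        N(y);       // only the copy is modified
        return y;
    }

    static void Main()
    {
        Console.WriteLine(ViaRef());   // 123
        Console.WriteLine(ViaValue()); // 456
    }
}
```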

However, it is a little-known fact that the CLR type system supports additional usages of "ref", though C# does not. The CLR type system also allows methods to return refs to variables, and allows local variables to be aliases for other variables. The CLR type system does not, however, allow fields that are aliases to other variables. Similarly, arrays may not contain managed references to other variables. Both fields and arrays containing refs are illegal because making them legal would overly complicate the garbage collection story. (I also note that the "managed reference to variable" types are not convertible to object, and therefore may not be used as type arguments to generic types or methods. For details, see the CLI specification, Partition I, Section 8.2.1.1, "Managed pointers and related types".)

As you might expect, it is entirely possible to create a version of C# which supports both these features. You could then do things like

static ref int Max(ref int x, ref int y)
{
  if (x > y)
    return ref x;
  else
    return ref y;
}

Why do this? It is quite different than a conventional "Max" which returns the larger of two values. This returns the larger variable itself, which can then be modified:

int a = 123;
int b = 456;
ref int c = ref Max(ref a, ref b);
c += 100;
Console.WriteLine(b); // 556!

Kinda neat! This would also mean that ref-returning methods could be the left-hand side of an assignment -- we don't need the local "c":

int a = 123;
int b = 456;
Max(ref a, ref b) += 100;
Console.WriteLine(b); // 556!
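For readers who want to try this: C# 7.0 and later did eventually ship ref returns and ref locals with essentially this syntax, so the two fragments above can be run as written on a recent compiler. The sketch below assumes such a compiler; the class name RefReturnDemo and the wrapper methods ViaRefLocal and ViaDirectAssignment are made up for this illustration.

```csharp
using System;

static class RefReturnDemo
{
    // Returns an alias to whichever argument variable holds the larger value.
    static ref int Max(ref int x, ref int y)
    {
        if (x > y)
            return ref x;
        else
            return ref y;
    }

    public static int ViaRefLocal()
    {
        int a = 123;
        int b = 456;
        ref int c = ref Max(ref a, ref b); // c is now an alias for b
        c += 100;
        return b;
    }

    public static int ViaDirectAssignment()
    {
        int a = 123;
        int b = 456;
        Max(ref a, ref b) += 100; // the call itself is the assignment target
        return b;
    }

    static void Main()
    {
        Console.WriteLine(ViaRefLocal());         // 556
        Console.WriteLine(ViaDirectAssignment()); // 556
    }
}
```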

Syntactically, "ref" is a strong marker that something weird is going on. Every time the word "ref" appears before a variable usage, it means "I am now making some other thing an alias for this variable". Every time it appears before a declaration, it means "this thing must be initialized with a variable marked with ref".

I know empirically that it is possible to build a version of C# that supports these features because I have done so in order to test-drive the possible feature. Advanced programmers (particularly people porting unmanaged C++ code) often ask us for more C++-like ability to do things with references without having to get out the big hammer of actually using pointers and pinning memory all over the place. By using managed references you get these benefits without paying the cost of screwing up your garbage collection performance.

We have considered this feature, and actually implemented enough of it to show to other internal teams to get their feedback. However at this time based on our research we believe that the feature does not have broad enough appeal or compelling usage cases to make it into a real supported mainstream language feature. We have other higher priorities and a limited amount of time and effort available, so we're not going to do this feature any time soon.

Also, doing it properly would require some changes to the CLR. Right now the CLR treats ref-returning methods as legal but unverifiable because we do not have a detector that detects and outlaws this situation:

static ref int M1(ref int x)
{
  return ref x;
}

static ref int M2()
{
  int y = 123;
  return ref M1(ref y); // Trouble!
}

static int M3()
{
  ref int z = ref M2();
  return z;
}

M3 returns the contents of M2's local variable, but the lifetime of that variable has ended! It is possible to write a detector that determines uses of ref-returns that clearly do not violate stack safety. We could write such a detector, and if the detector could not prove that lifetime safety rules were met then we would not allow the usage of ref returns in that part of the program. It is not a huge amount of dev work to do so, but it is a lot of burden on the testing teams to make sure that we've really got all the cases. It's just another thing that increases the cost of the feature to the point where right now the benefits do not outweigh the costs.
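As a rough illustration of what "clearly safe" can look like, the sketch below shows two ref returns whose targets are guaranteed to outlive the returning method. It assumes a C# 7.0 or later compiler, which enforces compile-time rules of this general kind and rejects the M2 pattern above. The class name SafeRefReturnDemo and the methods ElementAt, Identity and Demo are made up for this illustration.

```csharp
using System;

static class SafeRefReturnDemo
{
    // Safe: the alias points into a heap-allocated array,
    // whose storage outlives this method's stack frame.
    static ref int ElementAt(int[] items, int index)
    {
        return ref items[index];
    }

    // Safe: the alias was supplied by the caller, so the variable
    // it refers to is still alive when the alias is handed back.
    static ref int Identity(ref int x)
    {
        return ref x;
    }

    public static int Demo()
    {
        int[] data = { 1, 2, 3 };
        ref int slot = ref ElementAt(data, 1);
        slot = 20;                 // writes data[1]

        int local = 5;
        Identity(ref local) += 1;  // fine: local is alive for the whole statement

        // What is not allowed is returning Identity(ref local) from this
        // method by ref, because local dies when Demo returns.
        return data[1] + local;
    }

    static void Main()
    {
        Console.WriteLine(Demo()); // 26
    }
}
```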

If we implemented this feature some day, would you use it? For what? Do you have a really good usage case that could not easily be done some other way? If so, please leave a comment. The more information we have from real customers about why they want features like this, the more likely it will make it into the product someday. It's a cute little feature and I'd like to be able to get it to customers somehow if there is sufficient interest. However, we also know that "ref" parameters are among the most misunderstood and confusing features, particularly for novice programmers, so we don't necessarily want to add more confusing features to the language unless they really pay their own way.

  • @nonoitall

    I beg to disagree. The out z in my function is the equivalent of the return ref in the original function.

    As one who has done C++ and C++/CLI, I'll take the simplicity and cohesion of C# any day over the C++ syntax and concepts. The only place where C++ is a must is native code or asm blocks; the rest can be emulated by .NET constructs.

  • @dmihailescu: Actually, nonoitall has a point here. Look at the lines "Max(ref a, ref b) += 100;" and especially "ref int c = ref Max(ref a, ref b);" in the original post. In both cases, you get a *variable* that will be aliased to another by the Max method (its "identity" is "switched" for you). This is not the same as putting a *value* in z, and then working with z. In Eric's code, you will continue to work with either a or b and you don't know which until the Max() method completes.

  • @dmihailescu:  Yeah, the original function *returns* a reference to A or B that can be used by the caller to modify said variable.  Your function accepts a third reference as *input*, but simply places a value into the variable that it points to.  It does not return a reference that the caller can use to modify one of the two original variables.  There's really no reason for the first two parameters in your function to be refs because the function only reads the values that they point to, and does nothing with the references themselves.

  • @Alan

    You are right. My z will be an alias to another variable on the stack, not a reference to an existing variable.

    I was thinking about reference objects not value types. My bad!

  • How about for databinding w/in C#:

    public class BoundMember<T>
    {
     private ref T _GetValueRef = ref default(T);
     private ref T _SetValueRef = ref default(T);
     private T _GetValue;
     private T _SetValue;

     public BoundMember(ref T value, BindingMode mode = BindingMode.TwoWay)
     {
       Mode = mode;
       _GetValue = value;
       _SetValue = value;
       if(mode != BindingMode.OneWayToSource)
          _GetValueRef = ref _GetValue;
       if(mode == BindingMode.TwoWay || mode == BindingMode.OneWayToSource)
         _SetValueRef = ref _SetValue;
     }

     public T Value
     {
       get{return _GetValue;}
       set{_SetValue = value;}
     }

     public BindingMode Mode{get;set;}
    }

    public enum BindingMode
    {
     TwoWay,
     OneWay,
     OneWayToSource
    }

    Maybe even add syntactic sugar:

    int x = 0;
    var boundX :-: x; <=> new BoundMember(ref x);
    var boundX1W :- x; <=> new BoundMember(ref x, BindingMode.OneWay);
    var boundX1W2S -: x; <=> new BoundMember(ref x, BindingMode.OneWayToSource);

    What do you guys think?

  • I would not use it since I have never felt the need for this. At the moment, I can't think of a case where I would want it - out and ref work perfectly for me when I need to do something fancy.

    Indeed, I would be glad to not see this feature, as I fear it would invite people to invent new "clever" (= bad and unintelligible) coding patterns.

  • I believe you summed up the answer to your question in "Compound Assignment, Part One" ( Tue, Mar 29 2011 2:24 PM ):

    "we are now mutating the variable containing the copy but we need to be mutating the original."

    I tend to use return by reference when I want:

    * to return an lvalue.  a[5] = 0;  // a.operator[](5) = 0

    * to return this; i.e., to construct a filter chain.  See "method chaining".

    * to act on a singleton.

    * to act on someone else's storage, without having to know that their storage is stack, heap, or TEXT, or that their storage is actually *someone else's* storage, or that the storage is in some complicated container (so I have to be able to consume any random iterator or handle type), or that the storage is in some packed container (like an array).

    * to mutate someone else's consts.  "Yeah...  *You're* not allowed to change it..."

    * to act on a lazily constructed value type without forcing its entire construction at copy to stack.

    * to act on just-in-time data without forcing redundant just-in-time loads to recur at every call in a complex expression tree.

    * to get as close to an lvalue reference optimization as I can.  (I.e.  re-use the shallowest caller's return slot in their stack frame as the return slot for each callee all the way down to the deepest callee.)

    * recently, to avoid repeatedly copying multi-MB frames of video to the stack in recursive de-noising operations.  (Which, thanks to a 3rd party library writer are all value types.)  (This is more important on phones and tablets...)

    Admittedly, some of these overlap.  And, yes, anything can be worked around.  (Your Turing complete language is no more powerful than *my* Turing complete language...)  But, once one has understood the value of writing functions as filters on someone else's data, rather agnostically on how that data's stored, where that data's stored, or what operations the caller is allowed to perform on the data, it's rather cruel to take that away...  So, yeah, I think references would be a good idea for C#.

    Of course, sometimes, I'd really, really, really, like to know if I've been called with an lvalue or rvalue reference.  See thbecker.net/.../section_01.html for a reasonable explanation.

  • @Jason Lind - your example has ref fields in a class, which isn't allowed in the CLR. But ultimately a reference to a "variable" of any kind can be treated as two operations, a getter and a setter. And thanks to lambdas you can express operations very neatly, and they can refer to local variables, so you can already simulate that kind of binding very easily, see smellegantcode.wordpress.com/.../pointers-to-value-types-in-c

  • @Eric Towers - Based on the examples you're giving (e.g. realtime video processing), you're not really describing C#'s sweet spot. If safe-mode C# absorbed all the features it needed to replace C as a "portable assembly language", it would be no different from unsafe-mode C#.

    Although I confess that as a recovering long-term C++ user, I've found it shamefully entertaining to learn about rvalue references. But I'm going to keep it as pure entertainment. I think it's going to work best that way for me!

  • It seems a little confusing to me that we already have ref returns in array indexers:

    Consider the following example:

    struct S
    {
      public int i;
      public void Increment() {i++;}
    }

    S[] s = new S[1];
    s[0].i++; // s[0].i is now 1, because s[0] yields a managed pointer to the interior array element
    IList<S> si = s;
    si[0].i++; // compile-time error, because we can't modify a temporary value
    si[0].Increment(); // compiles, but increments a temporary copy

    It's clear why we have ref returns on arrays (it improves performance), but why do we have ref returns in arrays and not in BCL collections? It seems like we could add this behavior to existing collections without adding any new features to the C# compiler at all.

    P.S. Actually I can't find the specification section that describes returning a "managed reference" from the array indexer. I know that the array indexer is implemented with its own IL instruction called ldelema, but I can't find the official rationale for it.

  • Actually, upon further consideration I find myself liking the variadic-delegate-callthrough approach more, and the idea of "heap ref" parameters less.  I would suggest something like the following syntax:

    int foo {ref {
     do_some_stuff();
     ref return x;
     do_more_stuff();
    }}

    The "ref return" statement could use any legitimate lvalue (including automatic variables of the property, if desired) but the routine would have to be written in such a way that execution could not escape the function except via an Exception, without performing exactly one "ref return".  What would be necessary to make this work would be a means of declaring a function which could accept a delegate consisting of a fixed-type ref parameter and an arbitrary number of additional ref parameters, and be callable using its fixed parameters and the appropriate number of additional ref parameters to satisfy the delegate.  For example:

    void foo(long thing1, String thing2, ActionByRef<ref int, ...> theAction, ...)
    {
     do_some_stuff();
     theAction(x, ...);
     do_more_stuff();
    }

    might expand as:

    void foo(long thing1, String thing2, ActionByRef<ref int, ref T1, ref T2, ref T3, ref T4, ref T5> theAction,
       ref T1 r1, ref T2 r2, ref T3 r3, ref T4 r4, ref T5 r5)
    {
     do_some_stuff();
     theAction(x, r1, r2, r3, r4, r5);
     do_more_stuff();
    }

    if there needed to be five reference parameters.  If one could have a function that could expand itself as needed with the appropriate number of reference parameters, one could achieve the benefits of being able to return references, and some more besides.  Among other things, one could wrap the "ref return" in a Try/Catch/Finally block, allowing check-out/check-in semantics to be enforced even in the presence of exceptions.

    Incidentally, while it might in some cases be nice to allow for arbitrary-expanded value parameters, I don't think restricting the expanded area to reference parameters would be a problem.  Even if the function one wanted to call would expect reference parameters, the compiler which was generating the code to use a reference property could generate an appropriate static wrapper function to do the conversion.  The stack required to call a function of N parameters of which M were reference properties would be O(N*M).  If all the stack items in question are variable references (8 bytes for x64), the limit even for M=N=30 would be about 4K of stack space.  If the items were 200-byte value types, though, things could get ugly.

  • I would use it! I need it right now

    Can you give us more details? -- Eric

  • I feel like it would be useful to have ref returns if you could combine this with ref parameters for operators as well. It would allow doing math operations for example on complex types (Matrices) without having to worry about it copying the thing so many times.

    This can of course be inlined by hand, but inlining so many adds and multiplies begs for a better way to do it.

    Working with XNA in C# 4, it can be a burden to do Vector / Matrix math when worrying about performance as well, especially with very complex matrix hierarchies and blended animations using 100's or 1000's of matrix multiplies per model (with many, many value copies as well).

  • Since I see a lot of people saying "Don't add this!" I feel compelled to add my vote to the "yes" side.

    I think of C# as being a better C++. I would love nothing better than to stop using C++ and use C# exclusively, because the safety guarantees, rich BCL, improved intellisense, shorter code, faster compile times (etc.) of C# are invaluable. Unfortunately, I still have to use C++ extensively because C# is not fast enough, especially on the .NET Compact Framework, where it is sometimes more than 10 times slower than equivalent C++ code (see www.codeproject.com/.../BenchmarkCppVsDotNet.aspx ).

    As mentioned earlier, ref-types could let you get a reference to a value in a Dictionary and modify that value with a single lookup, and ref-types would allow you to write more concise code in some cases (and I agree with Yaron Minsky about the value of concision, see queue.acm.org/detail.cfm ), such as the stated example that you want to modify the result of Max(ref x, ref y).

    Even if this is never allowed in C#, surely it should be officially supported by the CLR standards (not just MS' implementation) so that compilers for 3rd-party languages can "safely" include the feature. (Same argument goes for covariant return types.)

    If it were implemented I would use it today, and would have used it last night.

    I desire to access the STL/CLR but cannot do so in C# because all the interfaces have prototypes with TValue% return types.  Yet, given the knowledge that all C# reference types and boxed value types are on the managed heap and that the STL/CLR is using 'by-value' semantics, it is safe to convert these to TValue%. The only missing piece is for the C# compiler to recognize this and do its magic, recognizing that the C# implementation

    public  TValue  get_ref() { return _container[_bias]; }

    satisfies the interface prototype

    public  TValue%  get_ref();

    In fact the MSDN documentation here

    msdn.microsoft.com/.../bb302608.aspx

    states that this will happen, but it doesn't.

    Pieter
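One comment above observes that array element access already behaves like a ref return, while the IList<T> indexer hands back a copy. The following minimal sketch shows that difference on the same array. The Counter struct, the class name ArrayVersusListDemo and its methods are made up for this illustration and are not taken from the comment.

```csharp
using System;
using System.Collections.Generic;

struct Counter
{
    public int Value;
    public void Increment() { Value++; }
}

static class ArrayVersusListDemo
{
    public static int ThroughArray()
    {
        Counter[] counters = new Counter[1];
        counters[0].Increment();   // operates on the array element in place
        return counters[0].Value;
    }

    public static int ThroughIList()
    {
        Counter[] counters = new Counter[1];
        IList<Counter> list = counters;
        list[0].Increment();       // the indexer returns a copy; only the copy changes
        return counters[0].Value;
    }

    static void Main()
    {
        Console.WriteLine(ThroughArray()); // 1
        Console.WriteLine(ThroughIList()); // 0
    }
}
```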
