Ref returns and ref locals

"Ref returns" are the subject of another great question from StackOverflow that I thought I might share with a larger audience.

Ever since C# 1.0 you've been able to create an "alias" to a variable by passing a "ref to a variable" to certain methods:

static void M(ref int x)
{
    x = 123;
}
...
int y = 456;
M(ref y);

Despite their different names, "x" and "y" are now aliases for each other; they both refer to the same storage location. When x is changed, y changes too because they are the same thing. Basically, "ref" parameters allow you to pass around variables as variables rather than as values. This is a sometimes-confusing feature (because it is easy to confuse "reference types" with "ref" aliases to variables), but it is generally a pretty well-understood and frequently-used feature.

However, it is a little-known fact that the CLR type system supports additional usages of "ref", though C# does not. The CLR type system also allows methods to return refs to variables, and allows local variables to be aliases for other variables. The CLR type system does not, however, allow fields that are aliases to other variables. Similarly, arrays may not contain managed references to other variables. Both fields and arrays containing refs are illegal because making them legal would overly complicate the garbage collection story. (I also note that the "managed reference to variable" types are not convertible to object, and therefore may not be used as type arguments to generic types or methods. For details, see the CLI specification, Partition I, Section 8.2.1.1, "Managed pointers and related types".)

As you might expect, it is entirely possible to create a version of C# which supports both these features. You could then do things like

static ref int Max(ref int x, ref int y)
{
  if (x > y)
    return ref x;
  else
    return ref y;
}

Why do this? It is quite different than a conventional "Max" which returns the larger of two values. This returns the larger variable itself, which can then be modified:

int a = 123;
int b = 456;
ref int c = ref Max(ref a, ref b);
c += 100;
Console.WriteLine(b); // 556!

Kinda neat! This would also mean that ref-returning methods could be the left-hand side of an assignment -- we don't need the local "c":

int a = 123;
int b = 456;
Max(ref a, ref b) += 100;
Console.WriteLine(b); // 556!
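
A note for readers who want to try this: ref returns and ref locals did eventually ship in C# 7.0, well after this post was written, so the two snippets above can be combined into a small program. The following is only a sketch under that assumption; the class name RefMaxDemo and the Main wrapper are scaffolding added for illustration and are not part of the prototype described here.

```csharp
using System;

static class RefMaxDemo
{
    // Returns an alias to whichever argument variable holds the larger value.
    public static ref int Max(ref int x, ref int y)
    {
        if (x > y)
            return ref x;
        else
            return ref y;
    }

    static void Main()
    {
        int a = 123;
        int b = 456;

        // A ref local: c is another name for the variable b.
        ref int c = ref Max(ref a, ref b);
        c += 100;
        Console.WriteLine(b); // 556

        // The call itself denotes a variable, so it can be assigned to directly.
        Max(ref a, ref b) += 100;
        Console.WriteLine(b); // 656
        Console.WriteLine(a); // 123
    }
}
```

Note that the second call adds another 100 to b because this sketch reuses the same variables rather than starting over as the two separate snippets do.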

Syntactically, 'ref' is a strong marker that something weird is going on. Every time the word "ref" appears before a variable usage, it means "I am now making some other thing an alias for this variable". Every time it appears before a declaration, it means "this thing must be initialized with a variable marked with ref".

I know empirically that it is possible to build a version of C# that supports these features because I have done so in order to test-drive the possible feature. Advanced programmers (particularly people porting unmanaged C++ code) often ask us for more C++-like ability to do things with references without having to get out the big hammer of actually using pointers and pinning memory all over the place. By using managed references you get these benefits without paying the cost of screwing up your garbage collection performance.

We have considered this feature, and actually implemented enough of it to show to other internal teams to get their feedback. However, at this time, based on our research, we believe that the feature does not have broad enough appeal or compelling usage cases to make it into a real supported mainstream language feature. We have other higher priorities and a limited amount of time and effort available, so we're not going to do this feature any time soon.

Also, doing it properly would require some changes to the CLR. Right now the CLR treats ref-returning methods as legal but unverifiable because we do not have a detector that detects and outlaws this situation:

static ref int M1(ref int x)
{
  return ref x;
}

static ref int M2()
{
  int y = 123;
  return ref M1(ref y); // Trouble!
}

static int M3()
{
  ref int z = ref M2();
  return z;
}

M3 returns the contents of M2's local variable, but the lifetime of that variable has ended! It is possible to write a detector that determines uses of ref-returns that clearly do not violate stack safety. We could write such a detector, and if the detector could not prove that lifetime safety rules were met then we would not allow the usage of ref returns in that part of the program. It is not a huge amount of dev work to do so, but it is a lot of burden on the testing teams to make sure that we've really got all the cases. It's just another thing that increases the cost of the feature to the point where right now the benefits do not outweigh the costs.
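
For comparison, the ref returns that later shipped in C# 7.0 take roughly the conservative approach described above: the compiler only accepts a ref return when it can show the referenced variable outlives the method. The sketch below shows two shapes that are accepted (a ref into heap storage and a ref parameter handed straight back) and, commented out, the shape that is expected to be rejected. The names SafeRefReturns, FirstNegative, Identity and Dangling are made up for this illustration.

```csharp
using System;

static class SafeRefReturns
{
    // Safe: the array element lives on the heap, so the returned alias
    // stays valid after this method returns.
    public static ref int FirstNegative(int[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] < 0)
                return ref values[i];
        }
        throw new InvalidOperationException("No negative element.");
    }

    // Safe: returning a ref parameter just hands the caller's alias back.
    public static ref int Identity(ref int x)
    {
        return ref x;
    }

    // Not safe, and expected to be rejected at compile time if uncommented,
    // because y stops existing when Dangling returns:
    //
    // public static ref int Dangling()
    // {
    //     int y = 123;
    //     return ref Identity(ref y);
    // }

    static void Main()
    {
        int[] data = { 3, -7, 5 };
        FirstNegative(data) = 0;      // overwrite the element in place
        Console.WriteLine(data[1]);   // 0
    }
}
```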

If we implemented this feature some day, would you use it? For what? Do you have a really good usage case that could not easily be done some other way? If so, please leave a comment. The more information we have from real customers about why they want features like this, the more likely it will make it into the product someday. It's a cute little feature and I'd like to be able to get it to customers somehow if there is sufficient interest. However, we also know that "ref" parameters are one of the most misunderstood and confusing features, particularly for novice programmers, so we don't necessarily want to add more confusing features to the language unless they really pay their own way.

  • I agree with most of the other comments here - I don't think I'd use this if it were implemented, and I don't think it's really necessary.

  • I was under the impression that the general trend was moving away from mutable types and operations. I've seen ref abused quite thoroughly; mainly by those that don't (and still don't) understand the difference between reference and value types. But that's still not a good reason for rejecting a feature I'll admit.

    I'm more interested in the 'message' such a feature would send. "We're encouraging side-effects".

  • I have wanted a subset of this occasionally - ref returns of array elements or fields transitively of a ref type (eg "ref Ref.Struct.Struct" would be ok) would be quite useful, and (even unreturnable) local ref variables would be nice, but I wouldn't know how useful until I used this. Remember, for better or worse, in a lot of cases the alternative is public mutable fields.

  • This would definitely open up the way to some nice performance improvements in collection indexers. For example, right now if you want to increment a value in a dictionary, you effectively have to look up the same key twice. This can be extremely expensive in performance-critical areas. With ref variables though, you could just return a ref on the first lookup and increment the variable that it refers to. Yes, refs are a little trickier to use than normal variables, but it certainly wouldn't be the most complex feature to grace the C# language. (I mean, it already has *unmanaged* references and pointers. It seems a little backwards that *managed* references would be omitted.)
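
The single-lookup increment described in this comment can be sketched with an API that arrived much later. This assumes .NET 6 or newer, where CollectionsMarshal.GetValueRefOrAddDefault (in System.Runtime.InteropServices) returns a ref to the dictionary's value slot; the WordCounter class and its Count method are hypothetical names used only for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

static class WordCounter
{
    public static Dictionary<string, int> Count(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            // One hash lookup: get a ref to the value slot, adding a
            // default (0) entry first if the key is not present yet.
            ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, word, out _);
            slot++;
        }
        return counts;
    }

    static void Main()
    {
        var counts = Count(new[] { "a", "b", "a" });
        Console.WriteLine(counts["a"]); // 2
        Console.WriteLine(counts["b"]); // 1
    }
}
```

The usual caveat applies: the ref is only meaningful while the dictionary is left unmodified, so it should be used immediately and not held across further inserts or removals.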

  • Although I see its uses, I don't know if this is something I'd use, because people may have trouble maintaining the code. However, if it were up to me, I'd add this feature anyway.

  • I wrote something like this recently:

    public class SomeClass
    {
        bool processed1;
        bool processed5;
        bool processed8;

        AdditionalData additionalData1/* = ... */;
        AdditionalData additionalData5/* = ... */;
        AdditionalData additionalData8/* = ... */;

        ProcessedData processedData1;
        ProcessedData processedData5;
        ProcessedData processedData8;

        public void ProcessSomething(ThingKind thingKind, ThingData data)
        {
            AdditionalData additionalData;
            TypedReference processedDataField;
            TypedReference processedField;

            // Initialization: pick the fields that belong to this kind of thing
            switch (thingKind)
            {
                case ThingKind.Thing1:
                    additionalData = additionalData1;
                    processedDataField = __makeref(processedData1);
                    processedField = __makeref(processed1);
                    break;
                case ThingKind.Thing5:
                    additionalData = additionalData5;
                    processedDataField = __makeref(processedData5);
                    processedField = __makeref(processed5);
                    break;
                case ThingKind.Thing8:
                    additionalData = additionalData8;
                    processedDataField = __makeref(processedData8);
                    processedField = __makeref(processed8);
                    break;
                default:
                    throw new NotSupportedException();
            }

            // Actual processing...
            // Throw exceptions if there are problems
            // Then finish by writing through the references
            __refvalue(processedDataField, ProcessedData) = /* something */;
            __refvalue(processedField, bool) = true;
        }
    }

    This code would benefit from official C# "ref locals"…

    (Of course, this specific method could be implemented in other ways not requiring the use of references, but most of them would end up requiring a lot more code (arrays, reflection, delegates…).

    Calling an external (private) method in each case label would, however, work perfectly in this simple case, but would not work for more complex ones.)

    I felt a little shameful using those "famous" undocumented keywords, but to me the code feels much cleaner than any other way I could have ended up with.
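
For readers comparing this with the feature as it later shipped: under the assumption of C# 7.0 or newer, the same "pick a field, then write through it" pattern can be sketched with ref-returning helper methods and ref locals instead of TypedReference. This is an illustrative rewrite, not the commenter's code: SomeClassWithRefs, ProcessedFlag, ProcessedDataSlot, IsProcessed and ResultFor are invented names, string stands in for the ProcessedData and ThingData types, and the "processing" is a placeholder.

```csharp
using System;

public enum ThingKind { Thing1, Thing5, Thing8 }

public class SomeClassWithRefs
{
    bool processed1, processed5, processed8;

    // string is a stand-in for the ProcessedData type in the comment above.
    string processedData1 = "", processedData5 = "", processedData8 = "";

    // Returns an alias to the flag field that belongs to the given kind.
    ref bool ProcessedFlag(ThingKind kind)
    {
        switch (kind)
        {
            case ThingKind.Thing1: return ref processed1;
            case ThingKind.Thing5: return ref processed5;
            case ThingKind.Thing8: return ref processed8;
            default: throw new NotSupportedException();
        }
    }

    // Returns an alias to the result field that belongs to the given kind.
    ref string ProcessedDataSlot(ThingKind kind)
    {
        switch (kind)
        {
            case ThingKind.Thing1: return ref processedData1;
            case ThingKind.Thing5: return ref processedData5;
            case ThingKind.Thing8: return ref processedData8;
            default: throw new NotSupportedException();
        }
    }

    public void ProcessSomething(ThingKind kind, string data)
    {
        // Ref locals: aliases to the chosen fields, fixed at initialization.
        ref string processedData = ref ProcessedDataSlot(kind);
        ref bool processed = ref ProcessedFlag(kind);

        processedData = data.ToUpperInvariant(); // placeholder "processing"
        processed = true;
    }

    public bool IsProcessed(ThingKind kind) => ProcessedFlag(kind);
    public string ResultFor(ThingKind kind) => ProcessedDataSlot(kind);

    static void Main()
    {
        var c = new SomeClassWithRefs();
        c.ProcessSomething(ThingKind.Thing5, "abc");
        Console.WriteLine(c.IsProcessed(ThingKind.Thing5)); // True
        Console.WriteLine(c.ResultFor(ThingKind.Thing5));   // ABC
    }
}
```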

    Other than this, "ref properties", as mentioned by Sam, would be quite a useful feature.

    The point of regular properties usually is to encapsulate a field, and do additional processing (e.g. range checking, …) before setting the value.

    However, one may sometimes not need to verify what is written to the encapsulated field, but return an *efficient* reference to said field instead.

    Being able to write something like this might be interesting:

    public ref byte this[bool b, int i] { ref get { return (b ? array1 : array2)[i - 50]; } }

    (I chose to write "ref get", as it emphasizes the fact that it is part of a "ref property")

    Additionally, considering you propose to support expressions such as:

    Max(ref a, ref b) += 100;

    Am I right assuming that this means expressions such as this one:

    (boolean ? ref int1 : ref int2) = 12; // would also work ? (Because this would be great to have sometimes…)

    Anyway, I would really like to have such a feature in the language, but I admit I wouldn't use it everyday.
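
Both ideas in this comment have rough counterparts in later versions of the language. As a sketch, assuming C# 7.2 or newer: an indexer can return by ref (written with "ref" on the return type and the returned expression, not as "ref get"), and a conditional expression can select between two variables by ref and be assigned to. The TwoBuffers class and the variable names are illustrative only.

```csharp
using System;

public class TwoBuffers
{
    readonly byte[] array1 = new byte[4];
    readonly byte[] array2 = new byte[4];

    // Ref-returning indexer: hands back an alias to the element itself,
    // so no setter is needed (or allowed).
    public ref byte this[bool first, int i] => ref (first ? array1 : array2)[i];

    static void Main()
    {
        var buffers = new TwoBuffers();
        buffers[true, 2] = 42;                 // assign through the returned ref
        buffers[true, 2] += 1;                 // read-modify-write in place
        Console.WriteLine(buffers[true, 2]);   // 43
        Console.WriteLine(buffers[false, 2]);  // 0

        int int1 = 0, int2 = 0;
        bool pickFirst = false;

        // Conditional ref expression used as an assignment target.
        (pickFirst ? ref int1 : ref int2) = 12;
        Console.WriteLine(int1); // 0
        Console.WriteLine(int2); // 12
    }
}
```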

  • It is common in C++ classes to have methods which return CValueType& (for read-write) or const CValueType& (for read-only). And I did want that when I was learning C#. But now I am a C# guy and I changed my mind.

    Firstly, returning CValueType& somewhat breaks encapsulation. It makes the field modifiable to any value by other code at any time without notifying about it. It does not matter for a simple case like List<>, but matters in the case of Collection<T> derivatives which may need to do some action on change.

    Secondly, as long as CValueType& (ref return) is supported, we'd want const CValueType& (const ref return) to avoid arbitrary modification, which means introducing the tedious const-correctness thing of C++ to C#.

    Thirdly, it adds another way to implement read-only and read-write properties.

    So, I second the opinion that we should let the compiler/jitter optimize away the value type overhead instead.
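
The "const ref return" concern raised here has a later counterpart worth illustrating: assuming C# 7.2 or newer, a member can be declared "ref readonly", which returns an alias without a copy but does not let callers assign through it. The sketch below contrasts the two; Point3D, Shape, CenterRef and CenterView are hypothetical names, and this is not a claim about how the prototype discussed in the post behaved.

```csharp
using System;

public readonly struct Point3D
{
    public readonly double X, Y, Z;
    public Point3D(double x, double y, double z) { X = x; Y = y; Z = z; }
}

public class Shape
{
    Point3D center = new Point3D(1, 2, 3);

    // Read-write alias: callers can overwrite the field without the
    // class being notified, which is the encapsulation worry above.
    public ref Point3D CenterRef => ref center;

    // Read-only alias: no copy is made, but callers cannot assign through it.
    public ref readonly Point3D CenterView => ref center;

    static void Main()
    {
        var shape = new Shape();
        shape.CenterRef = new Point3D(4, 5, 6);        // allowed
        ref readonly Point3D view = ref shape.CenterView;
        Console.WriteLine(view.X);                     // 4
        // shape.CenterView = new Point3D(0, 0, 0);    // would not compile
    }
}
```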

  • I don't mind the feature a great deal. I can't see myself using it much if at all, but it's plenty clear what's going on; with all the refs around the place, few people would ever be caught by surprise.

    And I have to say.. I do actually like Fabian's example above:

    (boolean ? ref int1 : ref int2) = 12;

    Despite that, for performance reasons I imagine the compiler would ideally turn that into the equivalent if/else without references.

  • Please don't add this. As someone who's worked on commercial .Net systems since .Net v1, I cannot imagine a case where I'd need this; neither would I wish to maintain such a system.

  • Well, as long as you're doing the survey… :)

    Honestly, this is a feature I can do without. However, if it existed I would probably use it occasionally. Oddly enough, not the ref return type so much as the ref locals.

    In fact, just today I was cleaning up some argument parsing code where I was thinking it would be nice to be able to do something like:

     string hasoperandvalue = null, otherwithopvalue = null;
     ref string operand = null;
     foreach (string arg in args)
     {
       if (operand != null)
       {
         operand = arg;
         continue;
       }
       switch (arg)
       {
         case "someflag":
           someflag = true;
           break;
         case "hasoperand":
           operand = ref hasoperandvalue;
           break;
         case "otherwithop":
           operand = ref otherwithopvalue;
           break;
       }
     }

    That sort of thing. The general idea being that I've got code that will want to modify some variable, based on some prior condition, and I want to leave the modifying code general-purpose.

    The above isn't the only example of this coming up, but I have to admit, it doesn't come up very often. The other thing is that while the above is perceived by me as more elegant, I suspect that for at least some others it would just make it harder to understand the code. Indirection has always been a bear for many programmers.

    Workarounds include:

     • Using an interface with a property to describe the variable that needs updating. This is overkill unless you're already dealing with somewhat complex types, but in that case it can work fine.

     • Using anonymous methods to capture and set the variables you want to refer to. Code-wise, this actually doesn't look too bad, but it still has the same sort of indirection-caused code-confusion potential as other ways of aliasing.

     • Just do the damn assignment at the point where you can tell which variable to assign. In the above example that means that you need to iterate over the collection by index, so that within the loop you can retrieve more than one value (i.e. the current value tells you the switch, but then you need to advance to the next one before continuing in the loop), with of course range-checking to make sure you don't go out of the indexed collection index range if the arguments provided were incorrect (i.e. missing argument at the end).

    (You can also use a "foreach" loop in that last scenario if you maintain the "prev" value, and do the switch on that one instead of the current one, but that's arguably at least as confusing a way to write the loop as using indirection of some sort).

    Anyway, the bottom line is that while I occasionally do find myself a bit wistful about not having ref locals, fact is the code is never really all that much worse without it, and frankly in terms of readability and maintainability, I'd say it's arguable that the code is _better_ without (C# already offers me plenty of opportunities to be too clever for my own good :) ).

    Interesting example; thanks for posting it. In my prototype this would not be allowed because we ensured that ref locals are always initialized to a valid reference to a variable. There was no "null ref". -- Eric
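
To make the "no null ref" point concrete, here is one way the argument-parsing idea could be written against the feature as it later shipped. It assumes C# 7.3 or newer, where a ref local can be re-pointed with "= ref" but still has to alias a real variable from the moment it is declared. The dummy variable "unused", the expectOperand flag, and the ArgParser and Parse names are additions made for this sketch and are not in the comment's code.

```csharp
using System;

static class ArgParser
{
    public static (string hasOperand, string otherWithOp) Parse(string[] args)
    {
        string hasOperandValue = "", otherWithOpValue = "";
        string unused = "";                 // dummy target: there is no "null ref"
        ref string operand = ref unused;    // must alias a real variable from the start
        bool expectOperand = false;         // replaces the "operand != null" test

        foreach (string arg in args)
        {
            if (expectOperand)
            {
                operand = arg;              // writes through the alias
                expectOperand = false;
                continue;
            }
            switch (arg)
            {
                case "hasoperand":
                    operand = ref hasOperandValue;   // re-point the alias
                    expectOperand = true;
                    break;
                case "otherwithop":
                    operand = ref otherWithOpValue;
                    expectOperand = true;
                    break;
            }
        }
        return (hasOperandValue, otherWithOpValue);
    }

    static void Main()
    {
        var result = Parse(new[] { "hasoperand", "x", "someflag", "otherwithop", "y" });
        Console.WriteLine(result.hasOperand);   // x
        Console.WriteLine(result.otherWithOp);  // y
    }
}
```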

  • (and yes, I forgot to set "operand" to null before continuing in the code above…I hope that does not detract from the comprehensibility of the example :) )

  • Pete.d's example actually shows a lovely boundary case. By working on the foreach variable you have either:

     • Broken the (user) immutability of the variable.

     • Added yet another confusing bug trap, a la trapping them with closures.

     • Made the compiler have to spot this case and warn/refuse.

    Admittedly they seem to be trying to change the semantics of the variable in a hypothetical vNext, but nonetheless, as it stands, I see no way this could end well.

  • I'd love this feature. Reference handling is very useful for multimedia apps.

  • I have wanted something similar in the past, when trying to modify fields inside structs inside property-reads (such as the value of a given entry in a dictionary). Obviously, there are other ways of getting the same effect, but they aren't always as clean.

  • Although I usually like every low-level feature, I would not like to see ref locals in C#. Looking at pete.d's example, one can easily spot a lot of issues with this feature: Observe that the example relies - as most usages would - on the ability to change the variable that the reference is an alias to. But the aliasing ref local operand is declared outside the loop, which opens the question about what it refers to after the loop has terminated, given that it pointed to a loop-local variable. From the CLR point of view, this is safe, but from the C# point of view it isn't. It also concerns details like whether a loop-local variable lives in the same location in different loop iterations - this is normally true, but a simple lambda referring to the variable can change that.

    I think this shows that introducing ref locals in this way opens a can of worms which should remain closed.

    Init-once ref locals are another issue though - these would resemble C++ references rather than C++ pointers (% rather than interior_ptr if you speak C++/CLI, & rather than * if not). However, this would require a special treatment of variable initialization which C# doesn't have right now ("type local = expr;" is currently equivalent to "type local; local = expr;" - implicitly typed locals aside); I'd prefer not to change the current intuitive semantics.

     

    Indeed, you make an excellent point which I did not call out in my sketch of the feature. In the prototype I wrote up I ensured that ref locals were "init only". There are certainly pros and cons of both ways. -- Eric

     

    Also note that such ref local variables are of rather limited use as long as ref returns are not introduced as well, which have their own issues. Nevertheless, I somewhat agree with nonoitall regarding his comment on collection indexers; I have always found the STL indexers with their ref-returning behaviour a bit superior to the .NET approach with completely distinct getters and setters. However, I doubt that this small use case warrants such a major language change.

    I am glad that C++/CLI offers full support for managed pointers via % and interior_ptr; I have found this absolutely useful at times, especially when doing pointer arithmetic in arrays without needing to pin them. However, this is not really the C# way of doing things, so all in all I'd prefer not seeing managed pointer support in C# be extended beyond the current state of affairs (ref and out arguments).

    The only thing that slightly worries me is that this means that methods may be uncallable for C#; if I want to call a managed-pointer-returning method right now in C# 4.0, I simply get "'Class.Method()' not supported by the language". This is a bit sad, given that C# has always strived to expose every CLR feature, but I do not see a way around this.
