Ref returns and ref locals

"Ref returns" are the subject of another great question from StackOverflow that I thought I might share with a larger audience.

Ever since C# 1.0 you've been able to create an "alias" to a variable by passing a "ref to a variable" to certain methods:

static void M(ref int x)
{
    x = 123;
}
...
int y = 456;
M(ref y);

Despite their different names, "x" and "y" are now aliases for each other; they both refer to the same storage location. When x is changed, y changes too because they are the same thing. Basically, "ref" parameters allow you to pass around variables as variables rather than as values. This is a sometimes-confusing feature (because it is easy to confuse "reference types" with "ref" aliases to variables), but it is generally a pretty well-understood and frequently-used feature.
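
To make the "reference types" versus "ref" distinction concrete, here is a minimal sketch. The class and method names (RefVersusReferenceType, ReassignByValue, ReassignByRef) are made up for illustration; only StringBuilder and Console come from the framework.

```csharp
using System;
using System.Text;

static class RefVersusReferenceType
{
    // sb holds a copy of the caller's reference; reassigning the
    // parameter does not affect the caller's variable.
    public static void ReassignByValue(StringBuilder sb)
    {
        sb = new StringBuilder("replaced");
    }

    // sb is an alias for the caller's variable; reassigning the
    // parameter reassigns that variable.
    public static void ReassignByRef(ref StringBuilder sb)
    {
        sb = new StringBuilder("replaced");
    }

    static void Main()
    {
        StringBuilder text = new StringBuilder("original");

        ReassignByValue(text);
        Console.WriteLine(text); // original

        ReassignByRef(ref text);
        Console.WriteLine(text); // replaced
    }
}
```

StringBuilder is a reference type in both calls; only the second call aliases the variable "text" itself.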

However, it is a little-known fact that the CLR type system supports additional usages of "ref", though C# does not. The CLR type system also allows methods to return refs to variables, and allows local variables to be aliases for other variables. The CLR type system does not, however, allow fields that are aliases to other variables. Similarly, arrays may not contain managed references to other variables. Both fields and arrays containing refs are illegal because making them legal would overly complicate the garbage collection story. (I also note that the "managed reference to variable" types are not convertible to object, and therefore may not be used as type arguments to generic types or methods. For details, see the CLI specification, Partition I, Section 8.2.1.1, "Managed pointers and related types".)

As you might expect, it is entirely possible to create a version of C# which supports both these features. You could then do things like:

static ref int Max(ref int x, ref int y)
{
  if (x > y)
    return ref x;
  else
    return ref y;
}

Why do this? It is quite different than a conventional "Max" which returns the larger of two values. This returns the larger variable itself, which can then be modified:

int a = 123;
int b = 456;
ref int c = ref Max(ref a, ref b);
c += 100;
Console.WriteLine(b); // 556!

Kinda neat! This would also mean that ref-returning methods could be the left-hand side of an assignment -- we don't need the local "c":

int a = 123;
int b = 456;
Max(ref a, ref b) += 100;
Console.WriteLine(b); // 556!
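
For anyone who wants to try both forms together, here is a self-contained sketch. It assumes a compiler that accepts the ref-return and ref-local syntax shown above (C# 7.0 and later compilers accept syntax of this shape); the class name RefMaxDemo is made up for illustration.

```csharp
using System;

static class RefMaxDemo
{
    // Returns an alias to whichever argument variable holds the larger value.
    public static ref int Max(ref int x, ref int y)
    {
        if (x > y)
            return ref x;
        else
            return ref y;
    }

    static void Main()
    {
        int a = 123;
        int b = 456;

        // Form 1: capture the returned alias in a ref local.
        ref int c = ref Max(ref a, ref b);
        c += 100;
        Console.WriteLine(b); // 556

        // Form 2: use the call directly as the target of an assignment.
        Max(ref a, ref b) += 100;
        Console.WriteLine(b); // 656
        Console.WriteLine(a); // 123
    }
}
```

In both forms it is the variable b that changes, not a copy of its value; a is never touched.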

Syntactically, 'ref' is a strong marker that something weird is going on. Every time the word "ref" appears before a variable usage, it means "I am now making some other thing an alias for this variable". Every time it appears before a declaration, it means "this thing must be initialized with a variable marked with ref".

I know empirically that it is possible to build a version of C# that supports these features because I have done so in order to test-drive the possible feature. Advanced programmers (particularly people porting unmanaged C++ code) often ask us for more C++-like ability to do things with references without having to get out the big hammer of actually using pointers and pinning memory all over the place. By using managed references you get these benefits without paying the cost of screwing up your garbage collection performance.

We have considered this feature, and actually implemented enough of it to show to other internal teams to get their feedback. However, based on our research, we believe at this time that the feature does not have broad enough appeal or compelling usage cases to make it into a real supported mainstream language feature. We have other higher priorities and a limited amount of time and effort available, so we're not going to do this feature any time soon.

Also, doing it properly would require some changes to the CLR. Right now the CLR treats ref-returning methods as legal but unverifiable because we do not have a detector that detects and outlaws this situation:

static ref int M1(ref int x)
{
  return ref x;
}

static ref int M2()
{
  int y = 123;
  return ref M1(ref y); // Trouble!
}

static int M3()
{
  ref int z = ref M2();
  return z;
}

M3 returns the contents of M2's local variable, but the lifetime of that variable has ended! It is possible to write a detector that determines uses of ref-returns that clearly do not violate stack safety. We could write such a detector, and if the detector could not prove that lifetime safety rules were met then we would not allow the usage of ref returns in that part of the program. It is not a huge amount of dev work to do so, but it is a lot of burden on the testing teams to make sure that we've really got all the cases. It's just another thing that increases the cost of the feature to the point where right now the benefits do not outweigh the costs.

If we implemented this feature some day, would you use it? For what? Do you have a really good usage case that could not easily be done some other way? If so, please leave a comment. The more information we have from real customers about why they want features like this, the more likely it will make it into the product someday. It's a cute little feature and I'd like to be able to get it to customers somehow if there is sufficient interest. However, we also know that "ref" parameters are one of the most misunderstood and confusing features, particularly for novice programmers, so we don't necessarily want to add more confusing features to the language unless they really pay their own way.

  • I don't really see myself ever using something like this. I'm sure others would find it useful, but I imagine the majority would not.

  • How would such a detector work?

    Specifically, how would it detect this as invalid but not _also_ detect the same M1 and M3 as invalid if  M2 were:

    class intbox { public int value; }
    ref int M2()
    {
      intbox y = new intbox { value = 123 };
      return ref M1(ref y.value);
    }

    Or if it's M2 that would be detected as invalid, what if M1 returned an (unrelated) intbox.value?

    If it's conservative enough that it will reject it even with either [or both] of these changes, what cases _will_ it accept?

    Good questions. When researching this prototype we did a sketch of how such a detector might work. In the scenario you describe there is no problem because no ref to y is ever passed to M1, and therefore M1 cannot possibly return a ref to y. y.value is a variable on the heap somewhere, so it is perfectly safe to pass around arbitrarily. The actually dangerous scenario is

    struct intbox { public int value; } // NOW A STRUCT
    ref int M2()
    {
      intbox y = new intbox { value = 123 };
      return ref M1(ref y);  // Now M1 takes a ref intbox.
    }

    M1 might be returning a ref to y.value, which is on the stack and about to die.

    The detector would have to keep track of what local variables of value type were being passed by ref, and if the returns coming back could possibly be interior to those locals. If you had:

    ref double M2() // now returns a ref double
    {
      intbox y = new intbox { value = 123 };
      return ref M1(ref y);  // Now M1 takes a ref intbox and returns a ref double
    }

    then no problem; there's no way M1 could be returning a ref double that came out of the storage of y, because intbox doesn't have any field of type double.

    Basically you just need to do a little local flow analysis on every ref local, and see if any ref return can possibly be returning the local or a portion of it. It's not that hard a problem. -- Eric
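
The safe, heap-based scenario from this exchange can be sketched as a complete program. This is an illustration only: it assumes a compiler that accepts the ref-return syntax discussed in the article, IntBox mirrors the commenter's class-typed intbox, and HeapRefDemo is a made-up class name.

```csharp
using System;

// A class, so its field lives on the heap rather than in M2's stack frame.
class IntBox
{
    public int value;
}

static class HeapRefDemo
{
    public static ref int M1(ref int x)
    {
        return ref x;
    }

    public static ref int M2()
    {
        IntBox y = new IntBox { value = 123 };
        // The ref passed to M1 refers to a heap field, not to the local y,
        // so the returned alias stays valid after M2 returns.
        return ref M1(ref y.value);
    }

    public static int M3()
    {
        ref int z = ref M2();
        return z;
    }

    static void Main()
    {
        Console.WriteLine(M3()); // 123
    }
}
```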

  • A little off-topic...

    On a desktop system, rich in resources, I'd cut the "ref" accessor. I don't see any good reason to keep it, especially now that the world is moving toward async and immutability is becoming more important.

    Anyway, C# can also run on the compact and micro frameworks, where resources are like water in the desert.

    Now, consider an array of structs (e.g. Point) and a loop that translates all of them by an offset (also a Point):

    for (int i = 0; i < N; i++)
    {
      pt[i].X += offset.X;
      pt[i].Y += offset.Y;
    }

    Well, in this trivial case the ref is important, and it would be important even if I were able to use "inline". That is because the loop performs poorly: it has to access the indexer twice.

    If I add this helper:

    static void Adder(ref Point pt, ref Point offset)
    {
      pt.X += offset.X;
      pt.Y += offset.Y;
    }

    performance improves a lot, because there is only one indexing operation and nothing is copied.

    My question is: would it be valuable to be able to take an inline "ref" to a struct in an array without having to write a separate function?

    Thanks a lot.
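
If ref locals of the kind the article describes were available, the loop in this comment might be written as below. A sketch only: the Point struct with public X and Y fields, the Translate method and the TranslateDemo class are hypothetical names for illustration, and it assumes a compiler that accepts the ref local syntax. It makes no claim about measured performance.

```csharp
using System;

// Hypothetical mutable struct standing in for the commenter's Point.
struct Point
{
    public int X;
    public int Y;
}

static class TranslateDemo
{
    public static void Translate(Point[] pt, Point offset)
    {
        for (int i = 0; i < pt.Length; i++)
        {
            // Index once, then work through an alias to the array element.
            ref Point p = ref pt[i];
            p.X += offset.X;
            p.Y += offset.Y;
        }
    }

    static void Main()
    {
        Point[] pt =
        {
            new Point { X = 1, Y = 2 },
            new Point { X = 3, Y = 4 }
        };

        Translate(pt, new Point { X = 10, Y = 20 });

        Console.WriteLine(pt[0].X + "," + pt[0].Y); // 11,22
        Console.WriteLine(pt[1].X + "," + pt[1].Y); // 13,24
    }
}
```

The alias p refers to the element stored in the array, so no separate helper method is needed and no Point is copied out and written back.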

  • Please don't add this to C#.  I don't want to maintain code that uses it.

    I hear you, but I should point out that we get that feedback from customers for pretty much every single feature we propose adding. People basically say "Well I will know how to use this feature correctly, but my idiot coworkers are going to mess it all up and then I'm going to have to clean up their godawful code, so please don't give those bozos any more power." We got that feedback for generics, LINQ, dynamic, async, you name it.

    We take very seriously the fact that features can be misused; we want C# to be a "pit of quality" language, where the language naturally leads you to write the high-quality solution and you really have to climb out of the pit to write something low-quality. But we also trust our customers to follow good practices and to learn how a powerful tool works before they start building with it.

    What scares me about this feature is that it makes it easier to write programs that have lots of variable aliasing in them. Aliasing is hard on the compiler because it greatly complicates analysis. And if the compiler is having a hard time, humans are going to have a hard time as well. But if hypothetically we did this -- and like I said, we're probably not going to -- it's not like we're going to go down the C/C++ road and allow you to pass back references to dead variables. The feature will still be memory-safe. -- Eric

  • I'm with Jeff; I think maintaining this on methods might be hard.

    However, I wouldn't mind it on properties; it would be nice to be able to return a struct (such as Point) and you can change a property of that struct without having to copy the struct to a local, change the local and then set it back to the original property.

  • A passionate plea to NOT add this.

    @Mario, there are a number of patterns/use cases where "ref" parameters really make good sense. For example, if you are implementing an immutable system and need to update the caller with multiple new instances. Also, it is a very handy paradigm when initializing read-only fields and you want to factor this out of the constructor body itself. [although I wish the definition of readonly was changed...but after 5 years, I have given up even asking]

  • I can see how it appeals to those who like tricky unreadable code, but we're not counting every byte in application as much as we did back when C was invented.

    I prefer having readable code and letting the compiler/Jitter work out the "tricks".

    I find that this feature would quickly result in bugs because I feel that it doesn't follow the principle of least surprise.

  • This sure would reduce the readability/maintainability of code. But that doesn't mean that it should not be implemented. Experts who need this kind of power may use it.

  • For the life of me I can't think of any uses for this in my day to day usage. I can't even think of uses for it in extreme performance scenarios when I would be willing to tolerate the conceptual complexity incurred.

    I'd therefore go with the 'no' option unless someone could point out some compelling use cases I think would benefit me :)

    The hit to the reflection/generic layer would also be quite unpleasant (especially since you don't even have the ultimate (if costly) fallback of treating it as a (possibly boxed) object).

  • Please don't add this, whenever I see a ref I'm very suspicious of what is going on.

    If you need C++ features use managed C++

  • I wonder what would be the practical differences between ref types as you discuss, and a generic class...

    public class ValueRef<T> { public T Value; } /* Untested. Ctor etc omitted. */

    The Max function mooted would instead accept two ValueRef<int> object references and return one of those. There's only one copy of the value inside, as long as all accesses are via x.Value.

    Granted, this isn't as tidy as putting the word 'ref' next to the type name, and I'm also side-stepping the question of what practical uses such an object has.

    (I've not tested any of this or really thought it through. Please be nice.)

    billpg

    So how do you make a ValueRef<int> to the tenth element of an integer array, say?

    Your idea is not so farfetched though. Something I deliberately did not mention in this article is that something like the ValueRef type you propose actually exists! It is called TypedReference and it is a Very Special Type. It is used only for obscure interop scenarios where you need to be able to pass around a reference to a variable whose type is not known at compile time. This is a subject for another day. -- Eric

  • I agree with Jeff.

    I can't think of any situations where I would actually want this.  In the strange event that I need something like this, I'll maybe use something like Eric's ref class ( stackoverflow.com/.../2982037 ).  If someone trying to port C++ code hits an issue that requires something like this, I'd prefer solutions that avoid introducing extra syntax; it only serves to add more ways for other people to make code less maintainable.  One of the things I like about all the Marshalling functionality is that it's mostly off to the side and ignorable until actually needed.

    I do see Sam's point and *have* encountered situations where it would have made my code slightly simpler, but every time I've hit such a situation I was able to work around it very easily.  I think supporting even that much would add more problems than it would solve.

  • Eric, I hope you'll blog about TypedReference. Not long ago, I had to write the equivalent of htmlTextWriter._attrList[index].value = someValue using Reflection. Because _attrList is an array of RenderAttribute structures, I had to get and set the entire array element just to set the value field. It seems like using FieldInfo.SetValueDirect could have made this a little more efficient.

  • Meh.

    As a long (long, long) time C++ user, I tend to avoid aliases. They're very difficult for the compiler/optimizer to reason with, not to mention humans. If I had a dollar for every bug... (including compiler bugs; I found and developed minimal repros for a couple dozen from Borland, a handful from GCC, and countless ones from Microsoft - no offense).

    Your Max() example triggers neurons in my brain associated with C preprocessor macros - that's what it mentally feels like. Maybe a better example would show how this would be useful, but I can't think of any case where I'd use this (however, I stayed up all night with my sick 23-month-old, so my brain isn't exactly 100% at the moment).

    I'd rather pass everything by value, even going so far as suggesting a Python-esque multiple-return-value syntax:

     (resultA, resultB) = Func();

    so that the "out" keyword is no longer necessary. Though I guess "ref" could still be used if somebody *really* had to pass a large mutable value type (not something I've ever seen recommended. Or something I've ever done after my first week in C#.). You could even treat this as a syntax-only change, converting additional return values to reference parameters under the covers.

    This would be nudging the language in the opposite direction of the "ref return" idea, but to my mind it would result in more clear code.

  • > doing it properly would require some changes to the CLR. Right now the CLR treats ref-returning methods as legal but unverifiable

    In fact, the story is more subtle than that. While Ecma-335 does say that any return by reference is unverifiable, the .NET implementation of the spec does a more stringent analysis. In particular, it is verifiable to return a managed pointer to a field of a reference type (i.e. ldflda immediately followed by ret).

    VC++ actually implements such checks during compilation if compiling with /clr:safe. So:

    ref class Foo
    {
    public:
       int x;
       int% GetX() { return x; }
       int% GetY(int% y) { return y; }
    };

    GetX() will compile successfully and produce verifiable code, but the compiler will bark on GetY().
