References and Pointers, Part One


Writing code in C# is really all about the programmatic manipulation of values. A value is either of a value type, like an integer or a decimal, or it's a reference to an instance of a reference type, like a string or an exception. Values you manipulate always have a storage location that stores the value; those storage locations are called "variables". Often in a C# program you manipulate the values by describing which variable you're interested in.

In C# there are three basic operations you can do to variables:

* Read a value from a variable
* Write a value to a variable
* Make an alias to a variable

The first two are straightforward. The last one is accomplished by the "ref" and/or "out" keywords:

void M(ref int x)
{
    x = 123;
}
...
int y = 456;
M(ref y);

The "ref y" means "make x an alias to the variable y". (I wish that the original designers of C# had chosen "alias" or some other word that is less confusing than "ref", since many C# programmers confuse the "ref" in "ref int x" with reference types. But we're stuck with it now.) While inside M, the variable x is just another name for the variable y; they are two names for the same storage location.
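To see the aliasing from the caller's side, here is a minimal sketch. The AliasDemo class and the Swap helper are hypothetical names chosen for illustration; they are not part of the example above.

```csharp
using System;

static class AliasDemo
{
    // a and b are aliases for the caller's variables, so assigning to
    // them writes directly to the caller's storage locations.
    public static void Swap(ref int a, ref int b)
    {
        int temp = a;
        a = b;
        b = temp;
    }

    public static void Main()
    {
        int first = 1;
        int second = 2;
        Swap(ref first, ref second);
        Console.WriteLine($"{first} {second}"); // prints "2 1"
    }
}
```

Without the "ref" modifiers, Swap would only exchange its own local copies and the caller's variables would be unchanged.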

There's a fourth operation you can do to a variable in C# that is not used very often because it requires unsafe code. You can take the address of a fixed variable and put that address in a pointer.

unsafe void M(int* x)
{
    *x = 123;
}
...
int y = 456;
M(&y);

The purpose of a pointer is to manipulate a variable itself as data, rather than manipulating the value of that variable as data. If x is a pointer then *x is the associated variable.

Clearly pointers are very similar to references, and in fact references are implemented behind the scenes with a special kind of pointer. However, you can do things with pointers that you cannot do with references. For example, this doesn't do anything useful:

int Difference(ref double x, ref double y)
{
    return y - x;
}
...
double[] array = whatever;
difference = Difference(ref array[5], ref array[15]);

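That's illegal: it just takes the difference of the two double values and tries to implicitly convert it to an int, which the compiler rejects. Even if it compiled, it would tell you nothing about where the variables live. But with pointers you can actually figure out how far apart in memory the two variables are: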

unsafe int Difference(double* x, double* y)
{
    return (int)(y - x); // pointer subtraction yields a long, so cast explicitly
}
...
double[] array = whatever;
fixed(double* p1 = &array[5])
  fixed(double* p2 = &array[15])
    difference = Difference(p1, p2); // 10 doubles apart

You can do arithmetic on pointers, but you can't do that on refs because in C# there is no way to say to a ref "I want to manipulate the storage location itself, rather than its contents". With pointers, again, the pointer represents the storage location itself; dereferencing the pointer with * gives you access to the variable that lets you get or set the value of that storage location.
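As a rough sketch of what pointer arithmetic looks like in practice, the following walks an array by advancing a pointer one element at a time. The PointerWalk class and its method names are hypothetical, and the code assumes the project is compiled with unsafe code enabled.

```csharp
using System;

static class PointerWalk
{
    // Sums 'count' doubles starting at 'start' by moving the pointer
    // forward. Nothing verifies that 'count' stays inside the array;
    // that is entirely the caller's responsibility.
    public static unsafe double Sum(double* start, int count)
    {
        double total = 0.0;
        for (double* p = start; p < start + count; p++)
        {
            total += *p; // dereference to read the variable p refers to
        }
        return total;
    }

    public static double SumArray(double[] values)
    {
        unsafe
        {
            // Fix the array so the garbage collector cannot move it
            // while we hold a pointer to its first element.
            fixed (double* first = values)
            {
                return Sum(first, values.Length);
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumArray(new double[] { 1.0, 2.0, 3.0 })); // prints "6"
    }
}
```

Note that "p++" advances by one element (eight bytes for a double), not by one byte; pointer arithmetic is always scaled by the size of the pointed-to type.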

Similarly, you can compare a pointer to null, but you can't compare a ref to null. Comparing the aliased variable to null just checks whether the contents of that variable are null; there is no such thing as a "null ref".
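A small sketch of the null comparison, assuming unsafe code is enabled; the NullCheck class and TryStore method are hypothetical names used only for illustration.

```csharp
using System;

static class NullCheck
{
    // A pointer parameter may be null, so the callee can test for that
    // before dereferencing. A "ref int" parameter offers no such test.
    public static unsafe bool TryStore(int* target, int value)
    {
        if (target == null)
        {
            return false; // there is no variable to write to
        }
        *target = value;
        return true;
    }

    public static unsafe void Main()
    {
        int y = 456;
        bool stored = TryStore(&y, 123);
        bool skipped = TryStore(null, 123);
        Console.WriteLine($"{stored} {y} {skipped}"); // prints "True 123 False"
    }
}
```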

Another thing you can do with pointers is treat them as arrays; you can't do that with refs:

unsafe double N(double* x)
{
  return x[10];
}
...
double[] array = whatever;
fixed(double* p1 = &array[5])
  q = N(p1); // returns array[15];

All of this is of course fraught with peril. We make you mark the code as "unsafe" for a reason; it is not safe to do any of this stuff. When you use pointers directly you are turning off the safety system and taking responsibility yourself for ensuring that every operation on a pointer is one that makes sense. For example, suppose we had passed interior pointers from two different arrays into Difference above. What would have happened? The result would not have been sensible; it doesn't make any sense to ask how many items are between two elements of two different arrays. It only makes sense to ask that question within one array. Suppose in the code above we had passed the address of array[5] and the array only had 7 elements. What happens when we try to get the fifteenth element? The managed safety system is turned off, so you would not get an array-index-out-of-bounds exception with pointers; you just get garbage or a crashed runtime.

Furthermore, note that the array has to be "fixed" before you can take an interior pointer to it. Fixing an array tells the garbage collector "someone has an interior pointer to this thing; do not move it during compaction until it is unfixed". That causes all kinds of problems. First, it can really mess up the ability of the GC to efficiently manage memory, because now there is a chunk of memory it is not allowed to move. And second, again, you are responsible for doing things safely; if you leave a copy of the pointer lying around and dereference it after the fixed statement has completed then there is no guarantee that the array is still there! You could be dereferencing any old thing.

It's a bit unfortunate that it is such a pain to use interior pointers in an array in C#, because doing so is often useful. We have many situations in the compiler where we would like to pass around locations of variables that are interior to arrays, compare their locations, and so on. Do we have to use unsafe code and fix the array in place to do so? Fortunately no!

Next time: How to make a safe interior pointer to an array that you can still treat as a pointer, more or less.

  • would msdn.microsoft.com/.../system.reflection.emit.opcodes.ldelema.aspx have anything to do with it?

  • @Shuggy: ldelema is what happens already when you call e.g. "Foo(ref a[0])". It's not fundamentally different from obtaining a ref to an object field. It doesn't really help you, though, because of the restrictions on the type itself (you cannot do arithmetic on managed pointers - they become unmanaged when you do that; and this is a CLR restriction, not a C# one).

  • struct Pointer<T> { T[] array; int index; } ?

    You have anticipated my denouement. -- Eric

    ---

    double[] array = whatever;
    fixed(double* p1 = &array[5])
     fixed(double* p2 = &array[15])
       difference = Difference(p1, p2); // 10 doubles apart

    In this code of yours, you fix both array[5] and array[15]. Is this redundant and only done because it is good form to fix everything you take a pointer to, or does the second fixed actually do anything? Also, shouldn't you just fix the array, as such?

    double[] array = whatever;
    fixed (double* p = array) {
     double* p1 = p + 5;
     double* p2 = p + 15;
     difference = Difference(p1, p2);
    }

    Either way is fine. I don't have strong opinions one way or the other. -- Eric

  • "Either way is fine." Oh, good. I thought I was doing it wrong!

  • /me chirps C++ iterators.

    /me curses configurator for doing semantically equivalent with Pointer<T>.

  • And this is why I feel the C memory model is simpler to learn, though harder to use (correctly, all the time). Every variable is a value, but values can have operations to get non-variable values (pointers, arrays). Although you weren't talking about reference types, I've found several people who find it difficult to understand how differently "int a = 1, b = a;" and "Stream a = File.Open("foo.txt"), b = a;" work, and variable refs are doubly confusing, especially with refs to ref types. C++ confuses it slightly with refs and now r-value refs, but those can be explained functionally once pointers are understood (I know they are fairly different in *usage* and *purpose*, but in *function* they are quite similar). Again, to be clear, I'm not recommending that C# should have used C's model, but I am recommending that programmers learn C, at least after they learn basic structured programming: vars, assignments, branches, loops and procedures.

  • @Pavel

    I was assuming usage with interior pointers in some limited context useful to Eric in common compiler scenarios. Obviously the Pointer<T> option is far more sensible as a general solution; I just wondered if it was going in a radically different direction.

  • @Simon

    Another difference is that one *has* to learn the C memory model (or there will be trouble), but the same isn't true for C#. You might write less efficient code in C#, but it's unlikely to set the app on fire.

  • @Scott: I mentioned that with "harder to use", so I agree about C, but I'm not so sure about "unlikely" to set the app on fire. I've found that people who learned Java, C#, etc. first find it difficult to understand how references work; for example, they expect that assigning to a variable initialized from a property will change the property value.

  • >> And this is why I feel the C memory model is simpler to learn, though harder to use (correctly, all the time). Every variable is a value, but values can have operations to get non-variable values (pointers, arrays). I've found several people who find it difficult to understand how differently "int a = 1, b = a;" and "Stream a = File.Open("foo.txt"), b = a;" work

    What you said applies equally to C# object references, however. The fact that you write "Stream a" rather than "Stream* a" is purely a syntactic thing - it's just that C# only has the notion of type of reference, and none for type of object itself, for reference types. The equivalent in C would be declaring all your structs as "typedef struct { ... } *Stream" - then you'd also write "Stream a =  File_Open("foo.txt"); b = a;" etc with same effect.

    So it's not really a "memory model" thing. C++/CLI has the same memory model as far as CLI types are concerned, but it has explicit syntax for object references ("Stream^ s").

    >> C++ confuses it slightly with refs and now r-value refs, but those can be explained functionally once pointers are understood

    But so can be C# "ref" and "out". Indeed, so far as I can see, semantically they're identical to C++ references in all respects, except for where they can be used.

  • "Make an alias to a variable"

    Why isn't this also considered making an alias?

    object a = new Object();

    object alias = a;

    Thanks

    /Jonas

  • @Jonas: because variable "alias" is distinct from variable "a" (they represent different storage locations). If you, say do this:

       alias = null;

    then variable "a" does not also become "null", and therefore "alias" is not an alias for "a".

    The _values_ of those two distinct variables are the same, yes, and refer to the same object. But it's a different thing.

  • I just found this this evening and it's wonderful even for VB. I'm a former development engineer from Digital and I refuse to use C.

    This is the only thing that did not translate:

    Public Overrides Function Equals(ByRef x As Object) As Boolean
        If x Is Nothing Then
            Return Me.source Is Nothing
        End If
        Dim ptr = TryCast(x, System.Nullable(Of ArrayPtr(Of T)))
        If Not ptr.HasValue Then
            Return False
        End If
        Return Me = ptr.Value
    End Function

    Could you help me a bit?

    Renee

  • @Pavel: If the restriction on pointer arithmetic using managed pointers is inherited from the CLR, and not C# specific, please explain how interior_ptr works.  msdn.microsoft.com/.../y0fh545k.aspx says "An interior pointer is permitted to perform the same set of operations as native pointers, including comparison and pointer arithmetic."

  • Love the post :) good explanation of it all!
