Dead C Hacks: Reallocating or Changing an object's type in place in .NET? [Brian Grunkemeyer]

I was recently asked if there was a way to reallocate an object on top of another one, or to change the type of an object at runtime. This must have been a somewhat common practice in native C or C++ programs, perhaps something that C++'s placement new facilitated. I'm not a big fan of the idea in general, and it can't be done in the CLR. Here was my response, with additional ramblings:

The only way to change the type of an object is to find the call to "new A()" and replace it with "new B()". In some cases you can make this easier by providing factory methods, so all your client code goes through a wrapper you provide and you allocate the appropriate type yourself on their behalf. But fundamentally we don't have anything like C++'s placement new to allocate B's on top of A's. That would be difficult to implement: if a B is larger than an A, we have most likely already used the space right after the A for other data structures.
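A minimal sketch of the factory approach, assuming hypothetical classes A and B and a hypothetical AFactory.PreferB switch (none of these names come from the original question). Nothing is retyped in place; the factory only decides what future allocations produce.

```csharp
using System;

// Hypothetical types standing in for the "A" and "B" of the text.
public class A
{
    public virtual string Describe() => "A";
}

public class B : A
{
    public override string Describe() => "B";
}

// Hypothetical factory: client code calls AFactory.Create() instead of "new A()",
// so the decision about which concrete type to allocate lives in one place.
public static class AFactory
{
    // Hypothetical switch; real code might read this from configuration.
    public static bool PreferB { get; set; }

    // Objects created earlier keep their original type; only new allocations change.
    public static A Create() => PreferB ? new B() : new A();
}
```

An object handed out before the switch flips stays an A for its whole life; only callers that come back to the factory afterwards receive a B.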

A similar approach we've taken to replacing an allocated object with another one is Array.Resize<T>, which we could only design properly once we added generics to the language in Whidbey (generics provided much-needed syntactic sugar, helping you avoid a cast). See the code below, and note how it uses the ref parameter; you might find a technique like this useful. However, a ref parameter only updates the single reference passed at that call site. If other references point to the same object, they will still point to the old object, not the new one.

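// Member of System.Array (BCL source); Environment.GetResourceString is internal to mscorlib.
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public static void Resize<T>(ref T[] array, int newSize) {
   if (newSize < 0)
       throw new ArgumentOutOfRangeException("newSize", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));

   // A null array is treated as empty: just allocate a fresh one for the caller.
   if (array == null) {
       array = new T[newSize];
       return;
   }
   // Allocate a new array, copy the overlapping prefix, then rebind the caller's reference.
   if (array.Length != newSize) {
       T[] newArray = new T[newSize];
       Array.Copy(array, 0, newArray, 0, array.Length > newSize ? newSize : array.Length);
       array = newArray;
   }
}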
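To make the ref caveat concrete, here is a small sketch (the ResizeAliasingDemo name is made up for illustration) that resizes through one variable while a second variable still refers to the original array.

```csharp
using System;

public static class ResizeAliasingDemo
{
    // Resizes through one reference and reports what each reference sees afterwards.
    public static (int resizedLength, int aliasLength, bool sameObject) Run()
    {
        int[] data = { 1, 2, 3 };
        int[] alias = data;          // a second reference to the same array object

        // Allocates a new 5-element array, copies 1, 2, 3 into it,
        // and rebinds only the 'data' variable that was passed by ref.
        Array.Resize(ref data, 5);

        // 'alias' still refers to the original, untouched 3-element array.
        return (data.Length, alias.Length, ReferenceEquals(data, alias));
    }
}
```

Run() returns (5, 3, false): the variable passed by ref sees the new 5-element array, while alias still sees the old 3-element one.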

We had to design Array.Resize<T> like this because there's no way to reallocate an object _in place_ in the GC heap. Even if there were, with constraints such as "same type, different length," it could lead to security vulnerabilities. Imagine a race where one thread reads the length of an array as 1 million elements and starts indexing into that array, while a second thread reallocates that array to 0 elements and then starts allocating other objects in the newly freed memory. Or, if you changed an object's type to another type of the same size, you could theoretically hit a race where another thread is in the middle of a method call when the memory its "this" pointer refers to suddenly becomes a different type, possibly meaning its virtual method calls dispatch into another type's code! So even if we did expose this functionality, it would be very dangerous to use. I suspect we'll never build anything along these lines as a general-purpose tool. (I once experimented with a technique like this as a possible perf optimization for StringBuilder, to shrink a String instance, but our GC architect was very, very upset. And the code probably didn't work correctly on IA64 due to that processor's weak memory model.)

Along similar lines, we have also added Interlocked.CompareExchange<T> (finally!). One interesting restriction is that we can't make this work atomically for arbitrarily sized value types, so we constrained T to be a reference type (note the "where T : class" constraint below).

[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)] 
public static T CompareExchange<T>(ref T location1, T value, T comparand) where T : class;
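A typical use of the generic overload is a lock-free copy-and-swap loop over an immutable object. This is a sketch assuming a hypothetical immutable Settings class and SettingsHolder wrapper; it also uses Volatile.Read<T>, which arrived later (.NET Framework 4.5) than the release discussed here.

```csharp
using System;
using System.Threading;

// Hypothetical immutable snapshot type; instances are never modified after construction.
public sealed class Settings
{
    public Settings(int version) { Version = version; }
    public int Version { get; }
}

public static class SettingsHolder
{
    private static Settings _current = new Settings(0);

    public static Settings Current => Volatile.Read(ref _current);

    // Lock-free copy-and-swap: build a replacement from a snapshot, then publish it
    // only if no other thread has replaced _current in the meantime; otherwise retry.
    public static Settings BumpVersion()
    {
        while (true)
        {
            Settings snapshot = Volatile.Read(ref _current);
            Settings updated = new Settings(snapshot.Version + 1);

            // Generic overload (T inferred as Settings): returns whatever was in
            // _current before the call, with no casts to or from object.
            Settings previous = Interlocked.CompareExchange(ref _current, updated, snapshot);
            if (ReferenceEquals(previous, snapshot))
                return updated;
        }
    }
}
```

Because every update allocates a fresh Settings object, a reader holding an old snapshot never sees it change underneath it, which is the same property that keeps Array.Resize safe.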

Another interesting technique is to allocate memory on the stack or in a block of unmovable memory (memory mapped from a file, a pinned byte[], or something similar), take the address of that memory, and cast it to a pointer to a value type. We use this a lot for loading our NLS+ data tables (character sorting information, etc.). While this doesn't let you reallocate anything in place, it will save you some memory copies from time to time, and it is a generally useful technique if you like C#'s "unsafe" code. (The word "unsafe" bothers me, because our more complicated P/Invoke code is also unsafe even though it doesn't have to use the unsafe keyword. Still, the keyword is a useful flag that forces someone to review your code in more detail later.)
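A sketch of the pinned byte[] variant, assuming a hypothetical 8-byte TableHeader layout (it is not the actual NLS+ table format). It must be compiled with unsafe code enabled (the /unsafe compiler switch, or AllowUnsafeBlocks in the project file). The struct is read in native byte order, so the field values depend on the machine's endianness.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical fixed-layout header, standing in for a record in a data table.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TableHeader
{
    public ushort Version;
    public ushort EntryCount;
    public int DataOffset;
}

public static unsafe class TableReader
{
    // Reads fields straight out of the byte[] through a typed pointer,
    // without deserializing the buffer into a separate object first.
    public static (ushort version, ushort entries, int offset) ReadHeader(byte[] buffer)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (buffer.Length < sizeof(TableHeader))
            throw new ArgumentException("Buffer is smaller than a TableHeader.", nameof(buffer));

        // Pin the array so the GC cannot move it while we hold a raw pointer.
        // 'header' is only valid inside this fixed block.
        fixed (byte* raw = buffer)
        {
            // Reinterpret the first 8 bytes as a TableHeader; the buffer is not copied.
            TableHeader* header = (TableHeader*)raw;
            return (header->Version, header->EntryCount, header->DataOffset);
        }
    }
}
```

On a little-endian machine such as x86 or x64, the bytes { 1, 0, 3, 0, 16, 0, 0, 0 } read back as version 1, 3 entries, and offset 16.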

The complication with pointing into a fixed block of memory is controlling that memory's lifetime: you must make sure the memory stays around until every object using it has been freed. If you don't, you're back to all the dangling-pointer problems you may run into in native code, and these problems will be slightly more insidious to debug if you have interesting finalizers, since the finalizer thread will actively inject race conditions into your poorly written code.

The biggest limitation is arrays. Unlike in C, our arrays are objects and must be allocated in the GC heap, so to put data into a managed byte[], you must copy it. Since 1999 I've wanted us to explore byte[] and String representations whose data can live in another heap, but frankly I don't think we can do the String optimization now. (Perhaps one day we'll spend some effort on the byte[] problem, but probably not for quite a while.) Perversely, C#'s fixed statement on Strings locked us into an implementation where the character data is a (somewhat) constant offset from the pointer to the String, meaning that enabling fast pointer manipulation of Strings for external users prohibits us from doing the most elaborate pointer-related tricks with Strings internally.
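One way to control the lifetime is to make a single object own the pinned memory and release it deterministically. This sketch uses GCHandle with GCHandleType.Pinned; the PinnedBuffer type is hypothetical, not a framework class.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical owner type: ties the lifetime of a pinned byte[] to one object,
// so raw addresses into it stay valid until Dispose (or the finalizer) runs.
public sealed class PinnedBuffer : IDisposable
{
    private readonly byte[] _data;
    private GCHandle _handle;   // not readonly: Free() resets the handle in place

    public PinnedBuffer(int size)
    {
        _data = new byte[size];
        // A pinned handle stops the GC from moving _data until the handle is freed.
        _handle = GCHandle.Alloc(_data, GCHandleType.Pinned);
    }

    public int Length => _data.Length;

    public byte this[int index] => _data[index];

    // Address of element 0. Any pointer derived from it dangles once Dispose runs.
    public IntPtr Address
    {
        get
        {
            if (!_handle.IsAllocated) throw new ObjectDisposedException(nameof(PinnedBuffer));
            return _handle.AddrOfPinnedObject();
        }
    }

    public void Dispose()
    {
        if (_handle.IsAllocated) _handle.Free();
        GC.SuppressFinalize(this);
    }

    // Safety net if Dispose is never called. This runs on the finalizer thread, so
    // code still holding an Address at that point faces the finalizer race in the text.
    ~PinnedBuffer()
    {
        if (_handle.IsAllocated) _handle.Free();
    }
}
```

Wrapping the buffer in a using block keeps the pinned window as short as possible; long-lived pins also make it harder for the GC to compact the heap.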

I hope this helps clarify some of how the CLR is designed and the limitations that design imposes. Note that most of these restrictions also applied in C and C++, but there you could simply break the rules and your app would mostly work, most of the time, on some platforms. Hopefully our added discipline (or handicaps) will make debugging a vastly simpler task.

  • "Perversely, C#'s fixed statement on Strings locked us into an implementation where the character data is a (somewhat) constant offset from the pointer to the String, meaning that enabling fast pointer manipulation of Strings for external users prohibits us from doing the most elaborate pointer-related tricks with Strings internally."

    As a big fan of CLR optimization, I hate to see this sort of thing. It would be nice if there were a way for an application to tell the CLR that it's not doing anything dangerous with strings. For code running with less than full trust, this could be assumed implicitly.
  • Unfortunately, there is no easy way of doing this. Essentially, you could imagine implementing this as a subclass of String; let's tentatively call it NonGCHeapString. (Whether it is a subclass or just an implementation detail of String is not important for this discussion.) If your component doesn't use the fixed keyword on Strings, then we could probably use this NonGCHeapString class. However, say you're building an application that loads plugins written by others. You start using NonGCHeapString, but then you pass one to a newly loaded plugin, and that plugin uses the fixed statement on the String. Suddenly we can't do anything.

    One possible fix would be to change the codegen for the fixed statement to emit a call to a helper method that takes a String instance and returns a string known to be in the GC heap. Or, more directly, we could change how the fixed statement gets the pointer from a String, using a method call on the String instance itself instead of the current pattern (the String's address plus RuntimeHelpers.OffsetToStringData).

    However, this means there would be some older binaries (compiled against V1 or V1.1, and depending on the timing of the codegen change, possibly V2 as well) that we couldn't use. I don't know how common the fixed statement is on Strings, but this is a little worrisome. I'm of the mindset that people will recompile their apps and move to Whidbey once it comes out, but not everyone can do this easily or quickly in all circumstances.
  • "I don't know how common the fixed statement is on Strings, but this is a little worrisome."

    I've never seen it used outside of examples of how to corrupt the intern table. While use of "fixed" is merely uncommon, using it on strings seems to me extremely rare.

    I agree that there is probably a real application out there that uses fixed on strings for something important, so whatever you come up with would have to handle that situation.

    Still, there are enough possible optimizations to strings that it merits investigation. For example, assuming a string is basically an int32 length plus a wchar*, Substring could be optimized by pointing into the source string's characters and using a shorter length. There are GC considerations (the new string would probably need to hold a reference to the original), but it's still a very beneficial thing :)
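The Substring idea from the last comment can be sketched as a small view type. StringSlice is hypothetical and lives outside String itself, so it sidesteps the fixed-statement problem discussed in the comments rather than solving it. As the comment notes, holding a reference to the source means even a tiny slice keeps the entire original string alive.

```csharp
using System;

// Hypothetical view type: keeps a reference to the source string (so the GC keeps it
// alive) plus an offset and a length, instead of copying characters into a new string.
public readonly struct StringSlice
{
    private readonly string _source;
    private readonly int _start;

    public StringSlice(string source, int start, int length)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if ((uint)start > (uint)source.Length || (uint)length > (uint)(source.Length - start))
            throw new ArgumentOutOfRangeException(nameof(start));
        _source = source;
        _start = start;
        Length = length;
    }

    public int Length { get; }

    public char this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Length) throw new ArgumentOutOfRangeException(nameof(index));
            return _source[_start + index];
        }
    }

    // O(1): narrows the window over the same source string; no characters are copied.
    public StringSlice Slice(int start, int length)
    {
        if ((uint)start > (uint)Length || (uint)length > (uint)(Length - start))
            throw new ArgumentOutOfRangeException(nameof(start));
        return new StringSlice(_source, _start + start, length);
    }

    // Copies only when a real String is actually needed.
    public override string ToString() =>
        _source == null ? string.Empty : _source.Substring(_start, Length);
}
```

Later versions of .NET ship a similar shape as ReadOnlyMemory<char>, which you can obtain with string.AsMemory(start, length).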