A few days ago on news:comp.lang.c++.moderated, Nicola Musatti wrote:

  • As for GC, pure implementations exist.
    [that add no new extensions to ISO C++]

Not for a pure definition of "pure," they don't. :-)

To explain why C++ pointers are insufficient (unless their semantics were to be changed at least a little, which would mean breaking existing code), consider two counterexamples:

1. Not for a compacting GC. Certainly a bald pointer can't point directly to an object that moves around in memory, because C++ pointers are required to be stable, to always have the same value while pointing to the same object. Changing the semantics of a pointer to make it track will break lots of code, starting with set<T*>, because such tracking pointers cannot be ordered (their values will after all be changed arbitrarily at unpredictable times by the GC). There are also other restrictions, but that's one of the most noticeable. [Aside: Such a tracking pointerlike abstraction is needed, and is provided in C++/CLI. It just can't be spelled * without fundamentally scuttling ISO C++ conformance, is all.]

2. Not for a non-compacting GC, either. This case can be got a lot closer, but even Great Circle / Boehm style collectors impose restrictions that break some conforming C++ programs. In particular, they restrict, if only slightly, the operations that Standard C++ allows on pointers. Consider the following well-formed ISO C++ program with well-defined semantics:

  int* pi = new int(42); // line 1
  pi = (int*)((int)pi ^ 0xaaaaaaaa);

  // ... do other work ...

  pi = (int*)((int)pi ^ 0xaaaaaaaa);
  cout << *pi; // perfectly ok, prints "42", won't crash
  delete pi; // ok

Add-on GCs can't see such disguised pointers, and are liable to reclaim the memory allocated in line 1 before its later use, resulting in an attempt to access freed memory. Boom.

This isn't perverse or theoretical, by the way. Consider "two-way pointers" as one example of a well-known implementation technique where two pointers are XOR'd together like this for a perfectly reasonable and legal use. In particular, a motivation behind two-way pointers is that you can have a more space-efficient doubly linked list if you store only one (not two) pointer's worth of storage in each node. But how can the list still be traversable in both directions? The idea is that each node stores, not a pointer to one other node, but a pointer to the previous node XOR'd with a pointer to the next node. To traverse the list in either direction, at each node you get a pointer to the next node by simply XORing the current node's two-way pointer value with the address of the last node you visited, which yields the address of the next node you want to visit. For more details, see:

  "Running Circles Round You, Logically"
  by Steve Dewhurst
  C/C++ Users Journal (20, 6), June 2002

I don't think the article is available online, alas, but Steve's website has some source code demonstrating the technique.

This perfectly standards-conforming and useful technique won't work correctly with any GC implementation I know of that does not extend the language so that pointers can retain their full standard meaning.

Steve's technique works perfectly fine and unbroken, however, under C++/CLI. It works because C++/CLI preserves exactly the full semantics of * pointers without any limitations. To do so, C++/CLI needed to add a new abstraction for GC semantics instead of pretending that raw pointers are by themselves a complete solution for safe use in a GC environment (they aren't, only because they were never designed to be).

For more about the design motivations behind the ^ declarator (aka a "handle"), see also Brandon Bray's excellent blog entry Behind the Design: Handles posted earlier today.