This blog entry investigates issues concerning the deterministic garbage collection strategy that is employed in Microsoft Dynamics AX 2012. The AX garbage collector (GC) is compared to the Microsoft .NET Framework GC, in both strategy and performance.
The basic strategy of the AX 2012 GC is the same as it was even before AX 4. AX uses a deterministic GC to manage the lifetime of X++ objects and table buffers. The GC is deterministic in the sense that objects are guaranteed to be disposed from the C++ memory space at the earliest possible moment after they are no longer needed.
The GC employs a reference counting algorithm that is enforced by the X++ interpreter. Whenever a reference is made to an object, the reference count is increased. Whenever a reference is removed the reference count is decreased. A reference can be removed when either (a) another value is assigned to a variable that had been holding the reference or (b) the variable that holds the reference drops out of scope.
Below is a diagrammed example of the reference count functionality in the AX GC.
The first line of X++ job code constructs an instance of MyClass and assigns the object reference to the stack frame variable mc. In heap memory on the right, the blue circle represents the MyClass object. The first blue circle has one incoming arrow which represents the reference to the object. The first blue circle is labeled ‘1’ to show the count of references to the object so far.
Next, the assignment mc1=mc creates a second reference to the blue circle object. Now the blue circle labeled ‘2’ shows the new total of two references to the object. (Note the arrow labeled ‘mc2’ should say ‘mc1’.)
Next, the assignment mc1 =null removes one of the references, making the state again like the blue circle labeled ‘1’.
Finally the job ends and the stack frame holding the variable mc goes out of scope. The blue circle labeled ‘0’ is marked with a symbol to indicate the AX GC removes the object from the heap at this moment.
Figure 1 Object deletion through reference counting.
Unfortunately there are more complex scenarios that reveal the weakness of the reference count strategy. Scenarios that involve reference loops require more operations by the GC. Consider the following case of two objects that refer to each other:
Figure 2 Mutually dependent objects, requiring loopcount
In this case the objects should be removed because there are no external references to either object. To handle reference loops, the GC also maintains a loop count. The loop count indicates the number of loops in which the object currently takes part. When the difference of the reference count and the loop count is 0, the memory occupied by the object is removed.
The algorithm works across the client/server boundary as illustrated below:
Figure 3 Object graph spanning the client server boundaries.
The algorithm guarantees that storage is reclaimed as soon as it is possible to do so, and the application programmer has no hooks into this. It is not possible to implement a method that is called when the object is reclaimed (akin to implementing the IDisposable interface in .NET). It is a common misconception that in AX the finalize method can be used for this purpose, yet this method has no special meaning to the AX system.
In this distributed scenario the loop/ref information is maintained though the exchange of suitable messages over the network, causing considerable network traffic and performance degradation.
The AX GC uses a similar algorithm for table buffers. However, this algorithm is simpler for the GC because table buffers cannot hold references to each other, and there is no loop count to maintain.
Microsoft .NET Framework GC
The garbage collection strategy used in .NET is very different from what we just described. It relies on a mark-and-sweep algorithm where the set of objects are examined periodically to check if references are held to them. When no references are found to a given object, the object is removed. This strategy emphasizes that no guarantees are made about whether or when an unreferenced object might be removed. A complex set of criteria are considered before the .NET GC is invoked, and the GC itself has discretions. This is why the .NET garbage collection strategy is said to be non-deterministic.
As an interesting aside, the .NET garbage collection comes in two different flavors. One GC is used on the server side, and the other is used for client side applications. The server GC is optimized for throughput on multiprocessor machines, and the client workstation flavor is optimized for machines with one processor. If you are interested in the differences between the two flavors, the following webpage is a good place to get educated: http://msdn.microsoft.com/en-us/library/s973858.aspx.
The basic problem with the AX GC algorithm described above is that sometimes it scales poorly. The GC performs poorly in the face of an increasing number of object having loops in the object graph. The reason is that any assignment to an object may require extensive analysis to update the loop counts for all the objects in the structure. Plus this can cause significant network traffic when the objects are distributed.
Customers and internal Microsoft developers have seen a few issues that have been caused by the weakness of the AX GC strategy. The scenarios always involve structures that are self-referential containing many objects. Typically a programmer writes code that generates structures in which the objects are a monotonously increasing function of the number of records in a database table. This may work fine for test purposes, but might fail in production cases.
There are workarounds that can be applied to reduce the problem. One workaround is to manage the reference loops through lookups in suitable structures. For example, instead of a standard direct reference, the code could store a GUID, and have an auxiliary structure to map the GUIDs onto references.
Performance Graph of Experimental Results
To illustrate the problems with the AX GC, we have provided the contents of an X++ .XPO file at the end of this blog. You can import the .XPO into your test AX system. This code sample was deliberately designed to demonstrate the worst case behavior, and it does not represent code that is likely to be used in real-life applications.
The X++ sample builds a circular linked list of elements. After the list is built it is dismantled by taking each element out of the loop one at a time. Each time a link is removed from the list, the AX GC must rescan the whole remaining list. In this case all objects are created and destroyed on the same tier, namely the client tier.
The following graph reports the wall clock time that the AX GC consumed as we varied the number of elements our circular linked list. The test was run on a standard laptop computer. The run consumed one core 100%.
Figure 4 Performance of Deterministic GC relative to .NET GC
The x-axis is the number of elements in the circular list. The y-axis is the time spent in milliseconds on the algorithm in Appendix A.
The steeply curved green line is the performance of the deterministic AX GC. The performance degraded exponentially as the number of link nodes increased linearly.
The completely horizontal blue line is the performance achieved when the X++ code is translated into MSIL using the AX 2012 feature. For technical reasons it was not possible to sample the data with resolution of more than 1 second. The execution never managed to register a reading of more than 1 second, even when the number of nodes exceeded 350,000. Note that the x-axis extends only to 30,500.
AX GC Problems Stack Up
The AX deterministic GC has one other flaw. Its traversal of the object graph is recursive. This means that it (in very rare situations) can run out of C++ stack space, which invariably results in a crash on the tier doing the garbage collection (either the client or the server). We have not seen this in production code, but it is a potential problem, even for graphs that do not contain loops.
The weakness of the AX GC is serious enough that the exponential performance degradation shown by our experiment can occur even in less severe X++ samples. The example used in this test was chosen to show the problems in the AX 2012 GC. The example consists of object allocation and de-allocation for loops, plus some glue to make it run. Our X++ sample can be seen as pathological in that it will produce an extreme performance benefit from being converted into MSIL. Use of the MSIL feature can be a workaround in a few cases. However, most realistic X++ code in production AX applications includes frequent reads and writes of data with the database, and those parts of your X++ code do not run any faster in MSIL mode.
The following X++ source code contains two classes:
Please find the X++ source code attached.