In the previous three posts, we doubled CLRProfiler's file loading speed through profile-guided optimization in three simple steps. Now let's look at reducing CLRProfiler's memory consumption, which makes it more useful for real-world applications.
I created a 10 GB profile using a performance test. The test program creates 19.7 million managed objects averaging 195 bytes each, for a total of 3.85 GB of memory, with 32 garbage collections. CLRProfiler loads the file in 83 seconds with a 208 MB private working set.
Using the DumpHeap -stat command from SOS, we can easily see what is really consuming memory in CLRProfiler:
      MT    Count    TotalSize Class Name
002f4cc4        2      1040028 CLRProfiler.TimePos
618ff9ac    16048      1371112 System.String
618eebd4        2      4194336 System.UInt16
61902938    40227      4754996 System.Int32
002f2190  7870007    188880168 CLRProfiler.SampleObjectTable+SampleObject
The most expensive data type in memory is SampleObjectTable::SampleObject. There are 7.87 million instances, occupying 24 bytes each. The SampleObject class has three integer fields and one pointer. These four fields should consume only 16 bytes, but on a 32-bit machine the CLR adds 4 bytes for the method table pointer and 4 more bytes for the sync block index.
internal class SampleObject
{
    internal int typeIndex;
    internal int changeTickIndex;
    internal int origAllocTickIndex;
    internal SampleObject prev;

    internal SampleObject(int typeIndex, int changeTickIndex, int origAllocTickIndex, SampleObject prev)
    {
        this.typeIndex = typeIndex;
        this.changeTickIndex = changeTickIndex;
        this.origAllocTickIndex = origAllocTickIndex;
        this.prev = prev;
    }
}
If we store SampleObjects in an array, we can convert the pointer to the previous sample object into an index into that array. We can then treat each sample as a plain value and pack the values together in one big array, removing the 8-byte per-object overhead. In most cases, typeIndex, changeTickIndex, and origAllocTickIndex are small enough to be stored as 16-bit integers instead of 32-bit integers. The last field, prev, which references the previous SampleObject, can be quite large depending on the program being profiled. But normally the current object and the previous object are not far apart, so the difference between their indexes also fits in a 16-bit integer. To reduce the impact on other code that uses SampleObject, we need to provide a method to reconstruct a SampleObject given an index:
/// Create SampleObject when given an index into storage
internal SampleObject GetSampleObject(int index)
{
    UInt16[] chunk = m_sampleChunks[index / SampleObjectChunkSize] as UInt16[];
    int p = index % SampleObjectChunkSize;
    UInt16 w0 = chunk[p];
    SampleObject obj;
    if ((w0 & bit_small) != 0)
    {
        // Small record: every field fits in one 16-bit word
        if ((w0 & bit_noprev) != 0)
            obj = new SampleObject(w0 & 0x3FFF, chunk[p + 1], chunk[p + 2], 0);
        else
            obj = new SampleObject(w0 & 0x3FFF, chunk[p + 1], chunk[p + 2], index - chunk[p + 3]);
    }
    else
    {
        // Large record: every field is split across two 16-bit words
        obj = new SampleObject(
            (((int) chunk[p    ]) << 16) + chunk[p + 1],
            (((int) chunk[p + 2]) << 16) + chunk[p + 3],
            (((int) chunk[p + 4]) << 16) + chunk[p + 5],
            index - (((int) chunk[p + 6] << 16) + chunk[p + 7]));
    }
    return obj;
}
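GetSampleObject only covers the reading side. The writing side might look roughly like the sketch below, which is an illustration and not CLRProfiler's actual code. The class name PackedSamples, the flag values 0x8000 and 0x4000, the single List<ushort> standing in for the chunked storage, and the convention that index 0 is reserved to mean "no previous sample" are all assumptions made for the example; field values are assumed to be non-negative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the packed layout; names and flag values are assumptions.
static class PackedSamples
{
    const ushort BitSmall  = 0x8000; // assumed flag: fields are stored as 16-bit words
    const ushort BitNoPrev = 0x4000; // assumed flag: small record without a prev word

    // Appends one record and returns the index of its first word.
    // prev == 0 means "no previous sample" (word 0 is reserved by the caller).
    public static int Append(List<ushort> words, int typeIndex, int changeTick, int allocTick, int prev)
    {
        int index = words.Count;
        int delta = index - prev; // distance back to the previous record
        bool fits = typeIndex <= 0x3FFF && changeTick <= 0xFFFF && allocTick <= 0xFFFF;

        if (fits && (prev == 0 || delta <= 0xFFFF))
        {
            // Small record: 3 words, plus a 4th word for the delta when prev exists.
            int flags = BitSmall | (prev == 0 ? BitNoPrev : 0);
            words.Add((ushort)(flags | typeIndex));
            words.Add((ushort)changeTick);
            words.Add((ushort)allocTick);
            if (prev != 0)
                words.Add((ushort)delta);
        }
        else
        {
            // Large record: 8 words, high half first, then low half of each value.
            foreach (int v in new[] { typeIndex, changeTick, allocTick, delta })
            {
                words.Add((ushort)(v >> 16));
                words.Add((ushort)(v & 0xFFFF));
            }
        }
        return index;
    }

    // Mirrors the decoding logic of GetSampleObject, returning a tuple instead of an object.
    public static (int typeIndex, int changeTick, int allocTick, int prev) Read(List<ushort> w, int p)
    {
        ushort w0 = w[p];
        if ((w0 & BitSmall) != 0)
        {
            int prev = (w0 & BitNoPrev) != 0 ? 0 : p - w[p + 3];
            return (w0 & 0x3FFF, w[p + 1], w[p + 2], prev);
        }
        return ((w[p] << 16) + w[p + 1],
                (w[p + 2] << 16) + w[p + 3],
                (w[p + 4] << 16) + w[p + 5],
                p - ((w[p + 6] << 16) + w[p + 7]));
    }
}
```

Under these assumptions a typical record takes 3 or 4 words (6 or 8 bytes) and only the rare oversized record takes 8 words (16 bytes). Because the large form stores the high half of typeIndex in the first word, a non-negative typeIndex never sets the assumed BitSmall flag there, so the two layouts cannot be confused.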
Here is what DumpHeap -stat shows after the change:
      MT    Count    TotalSize Class Name
609bf9ac    11663      1242188 System.String
00584a80        4      1808060 CLRProfiler.TimePos
609c2938    40176      5176228 System.Int32
609aebd4      935     84447264 System.UInt16
The 188.8 MB of SampleObject instances is replaced by an 80.2 MB increase in UInt16 storage (42% of the original size). The 7.87 million SampleObjects are now packed into 16-bit integer arrays. The savings will be larger on 64-bit machines, where object headers and pointers are twice as big.
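The figures can be cross-checked from the two DumpHeap listings. The small sketch below does the arithmetic; it assumes that the whole growth in System.UInt16 storage comes from the packed samples, and the class name SavingsEstimate is made up for the example.

```csharp
using System;

// Hypothetical helper: constants are copied from the two DumpHeap -stat listings.
static class SavingsEstimate
{
    public const long SampleCount  = 7870007;   // SampleObject instances before the change
    public const long BytesBefore  = 188880168; // SampleObject total size before
    public const long UInt16Before = 4194336;   // System.UInt16 total size before
    public const long UInt16After  = 84447264;  // System.UInt16 total size after

    // Growth in UInt16 storage, assumed to be entirely packed sample data.
    public static long PackedBytes => UInt16After - UInt16Before;

    public static double BytesPerSampleBefore => (double)BytesBefore / SampleCount;
    public static double BytesPerSampleAfter  => (double)PackedBytes / SampleCount;

    // Packed size as a fraction of the original SampleObject size.
    public static double Ratio => (double)PackedBytes / BytesBefore;
}
```

With these inputs, each sample drops from 24 bytes to a little over 10 bytes on average, which is where the 42% figure comes from.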