Recently, I encountered an OOM issue in our load runs. It took me a while to repro it and created a full dump. But once we got the dump, it was pretty straightforward to find the "leak".
1. Is this a native memory leak or .NET memory leak?
0:072> !address -summary
-------------------- Usage SUMMARY -------------------------- TotSize ( KB) Pct(Tots) Pct(Busy) Usage 390e8000 ( 934816) : 44.58% 65.56% : RegionUsageIsVAD 28f79000 ( 671204) : 32.01% 00.00% : RegionUsageFree 9c2f000 ( 159932) : 07.63% 11.22% : RegionUsageImage 30cd000 ( 49972) : 02.38% 03.50% : RegionUsageStack 49000 ( 292) : 00.01% 00.02% : RegionUsageTeb 11246000 ( 280856) : 13.39% 19.70% : RegionUsageHeap 0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap 1000 ( 4) : 00.00% 00.00% : RegionUsagePeb 1000 ( 4) : 00.00% 00.00% : RegionUsageProcessParametrs 2000 ( 8) : 00.00% 00.00% : RegionUsageEnvironmentBlock Tot: 7fff0000 (2097088 KB) Busy: 57077000 (1425884 KB)
-------------------- Type SUMMARY -------------------------- TotSize ( KB) Pct(Tots) Usage 28f79000 ( 671204) : 32.01% : <free> 9eb7000 ( 162524) : 07.75% : MEM_IMAGE 1c9c000 ( 29296) : 01.40% : MEM_MAPPED 4b524000 ( 1234064) : 58.85% : MEM_PRIVATE
-------------------- State SUMMARY -------------------------- TotSize ( KB) Pct(Tots) Usage 407c5000 ( 1056532) : 50.38% : MEM_COMMIT 28f79000 ( 671204) : 32.01% : MEM_FREE 168b2000 ( 369352) : 17.61% : MEM_RESERVE
RegionUsageIsVAD is big and VirtualAlloc goes here (e.g., .NET heap).
0:072> !EEHeap -gcNumber of GC Heaps: 4------------------------------...------------------------------GC Heap Size: Size: 0x2611db9c (638704540) bytes.
Ok, .NET memory usage is pretty high at 630 MB.
2. Which .NET objects are we leaking
0:072> !dumpheap -stat------------------------------Heap 0total 0 objects------------------------------Heap 1total 0 objects------------------------------Heap 2total 0 objects------------------------------Heap 3total 0 objects------------------------------total 0 objectsStatistics: MT Count TotalSize Class Name7aef7c48 1 12 System.Collections.Generic.SortedList`2+KeyList[[System.String, mscorlib],[System.Object, mscorlib]]...
79b57ee8 84541 117150884 System.Object[]1d6af048 15013412 420375536 Microsoft.VisualStudio.TestTools.Common.TimerResult
Ok, there are 15 millions of Microsoft.VisualStudio.TestTools.Common.TimerResult objects. !gcroot confirmed they are rooted.
3. Where are we leaking it?
Use "!dumpheap -mt" and "!do" to examine a few of them:
0:072> !dumpheap -mt 1d6af048 ------------------------------Heap 0 Address MT Size009aa8e8 1d6af048 28 009aabf8 1d6af048 28
...
0133f0e8 1d6af048 28 0133f104 1d6af048 28
072> !dumpobj 0133f0e8 Name: Microsoft.VisualStudio.TestTools.Common.TimerResultMethodTable: 1d6af048EEClass: 1e969b88Size: 28(0x1c) bytesFile: D:\WINDOWS\assembly\GAC_MSIL\Microsoft.VisualStudio.QualityTools.Common\10.0.0.0__b03f5f7f11d50a3a\Microsoft.VisualStudio.QualityTools.Common.dllFields: MT Field Offset Type VT Attr Value Name79b9d470 40003d2 4 System.String 0 instance 08aa3eb4 m_name79ba5af0 40003d3 8 System.DateTime 1 instance 0133f0f0 m_startTime79ba4a94 40003d4 10 System.TimeSpan 1 instance 0133f0f8 m_duration0:072> !dumpobj -nofields 08aa3eb4 Name: System.StringMethodTable: 79b9d470EEClass: 798d5ec0Size: 58(0x3a) bytesFile: D:\WINDOWS\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dllString: _TWI_GetRandomWorkItem
Almost all TimerResult have the same name "_TWI_GetRandomWorkItem". And this is enough for us to track down the "leak". It turns out that in our test code, we have a poorly written algorithm to get a random sequence of items from a list. And it might leads to a "infinite" loop and keeps calling an method that calls TestContext.BeginTimer/TestContext.EndTimer. So each call will create a new Timer Result object. This will eventually cause the app to crash due to OutOfMemoryException.
We can also drill down further on these 15 million objects. Are they stored in a few collections? Yes, 4 big object[] are holding about 11 million of them.
0:022> !dumpheap -mt 79b57ee8 -min 1000000------------------------------Heap 0 Address MT Size11863098 79b57ee8 16777232 total 0 objects------------------------------Heap 1 Address MT Size129d3cf8 79b57ee8 30433544 58fc0038 79b57ee8 33554448 total 0 objects------------------------------Heap 2 Address MT Size54420038 79b57ee8 33554448 total 0 objects------------------------------Heap 3 Address MT Sizetotal 0 objects------------------------------total 0 objectsStatistics: MT Count TotalSize Class Name79b57ee8 4 114319672 System.Object[]Total 4 objects
Spot check some entries in these huge object[] and verified that they are the same TimerResult objects with the same name _TWI_GetRandomWorkItem:
0:022> !dumparray -start 10000 -length 2 -details 11863098 Name: Microsoft.VisualStudio.TestTools.Common.TimerResult[]MethodTable: 79b57ee8EEClass: 798d69a8Size: 16777232(0x1000010) bytesArray: Rank 1, Number of elements 4194304, Type CLASSElement Methodtable: 1d6af048[10000] 0c7fab28 Name: Microsoft.VisualStudio.TestTools.Common.TimerResult MethodTable: 1d6af048 EEClass: 1e969b88 Size: 28(0x1c) bytes File: D:\WINDOWS\assembly\GAC_MSIL\Microsoft.VisualStudio.QualityTools.Common\10.0.0.0__b03f5f7f11d50a3a\Microsoft.VisualStudio.QualityTools.Common.dll Fields: MT Field Offset Type VT Attr Value Name 79b9d470 40003d2 4 System.String 0 instance 08aa3eb4 m_name 79ba5af0 40003d3 8 System.DateTime 1 instance 0c7fab30 m_startTime 79ba4a94 40003d4 10 System.TimeSpan 1 instance 0c7fab38 m_duration[10001] 0c7fab44 Name: Microsoft.VisualStudio.TestTools.Common.TimerResult MethodTable: 1d6af048 EEClass: 1e969b88 Size: 28(0x1c) bytes File: D:\WINDOWS\assembly\GAC_MSIL\Microsoft.VisualStudio.QualityTools.Common\10.0.0.0__b03f5f7f11d50a3a\Microsoft.VisualStudio.QualityTools.Common.dll Fields: MT Field Offset Type VT Attr Value Name 79b9d470 40003d2 4 System.String 0 instance 08aa3eb4 m_name 79ba5af0 40003d3 8 System.DateTime 1 instance 0c7fab4c m_startTime 79ba4a94 40003d4 10 System.TimeSpan 1 instance 0c7fab54 m_duration
0:022> !do -nofields 08aa3eb4 Name: System.StringMethodTable: 79b9d470EEClass: 798d5ec0Size: 58(0x3a) bytesFile: D:\WINDOWS\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dllString: _TWI_GetRandomWorkItem
Anyway, we fixed the issue by switching to a better algorithm of picking a random sequence of items from a collection.
Have a good day.
Jiange