CLR internals, Rotor code explanation, CLR debugging tips, trivial debugging notes, .NET programming pitfalls, and blah, blah, blah...
-
Some time ago I saw a problem from a partner team at Microsoft: an InvalidOperationException was thrown from WeakReference.IsAlive. WeakReference wraps a weak GC handle implemented in the CLR's Execution Engine (GC handles are also exposed through System.Runtime.InteropServices.GCHandle, which supports not only weak handles but other handle types too). A weak GC handle is allocated and assigned to the WeakReference object when the WeakReference object is created. As described by Jeffrey Richter, the weak GC handle contains a pointer to an object; if the object is collected by the GC, the handle is cleared to NULL. Most of the time WeakReference.IsAlive returns true or false to indicate whether the tracked object is alive. The check is based on whether the underlying GC handle contains a non-NULL pointer. Similarly, WeakReference.get_Target returns a valid object reference or null. But after the WeakReference object itself becomes unrooted and is finalized, the underlying GC handle is destroyed, and any call to IsAlive or get_Target on the WeakReference object throws InvalidOperationException in V1.X.
How could any method be called on a finalized object? Well, if an object O has a WeakReference field WR, then when O becomes unrooted and there are no other roots for WR, both objects are considered dead and are put into the F-reachable queue for finalization (see also Jeffrey's article). Since there is no guarantee about the order of finalizers (things are a little different for critical finalizers), WR may already be finalized when O's finalizer executes. Thus, inside O's finalizer or after O is resurrected, code could call methods on the finalized WR. In the example I mentioned at the beginning, the problem was that some object's finalizer was calling IsAlive on its WeakReference field.
The guideline for finalization says not to use any finalizable field in a finalizer, so it's fair for WeakReference's properties to throw an exception if they are called in a finalizer. But it might be hard for people to understand why IsAlive needs to throw. After all, it's only used to check status and doesn't need to access the tracked object if it's already collected. I think the reasoning is that IsAlive is meant to check whether the underlying GC handle tracks a live object, and if the GC handle is already gone because the WeakReference was finalized, we can't answer the question.
The most interesting part is the history of this design decision. In V1.X, both IsAlive and get_Target throw an exception after the WeakReference object is finalized. In Beta2 of V2.0, IsAlive still throws, but get_Target doesn't; it returns null after finalization. After Beta2, we made a change so that IsAlive doesn't throw either; it returns false after finalization. This was partly because too many people call WeakReference.IsAlive in finalizers, and it is not a really dangerous thing to do. Note that WeakReference.set_Target always throws after finalization.
So on the official release build of CLR V2.0, you can safely use WeakReference in a finalizer. But it is still good practice not to use finalizable objects in a finalizer, including WeakReference.
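To make the post-Beta2 behavior concrete, here is a toy model in standard C++. It is not CLR code: WeakHandle, WeakReferenceModel, CollectTarget and Finalize are hypothetical names invented for this sketch, and the real GC handle table works differently. It only mirrors the rules described above: after finalization IsAlive returns false, get_Target returns null, and set_Target throws.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a weak GC handle slot. The "GC" clears the
// pointer once the tracked object is collected.
struct WeakHandle {
    const void* target;
};

// Hypothetical model of WeakReference as described for the V2.0 release.
class WeakReferenceModel {
public:
    explicit WeakReferenceModel(const void* target)
        : handle_(new WeakHandle{target}) {}
    ~WeakReferenceModel() { delete handle_; }
    WeakReferenceModel(const WeakReferenceModel&) = delete;
    WeakReferenceModel& operator=(const WeakReferenceModel&) = delete;

    // Simulates the GC collecting the tracked object: the slot is cleared.
    void CollectTarget() {
        if (handle_ != nullptr) handle_->target = nullptr;
    }

    // Simulates the WeakReference's own finalizer: the handle is destroyed.
    void Finalize() {
        delete handle_;
        handle_ = nullptr;
    }

    // V2.0 rule: answers false instead of throwing once the handle is gone.
    bool IsAlive() const {
        return handle_ != nullptr && handle_->target != nullptr;
    }

    // V2.0 rule: returns null after finalization.
    const void* GetTarget() const {
        return handle_ != nullptr ? handle_->target : nullptr;
    }

    // Setting the target still fails after finalization.
    void SetTarget(const void* target) {
        if (handle_ == nullptr)
            throw std::logic_error("handle already destroyed");
        handle_->target = target;
    }

private:
    WeakHandle* handle_;
};
```

In the V1.X model, IsAlive and GetTarget would throw in the `handle_ == nullptr` case instead of returning false or null.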
|
-
I got an email asking me to explain the !Threads output in detail. I think this is a good question and a good topic for another installment in the series.
Here is an example I'll use for this post:
0:055> !threads
ThreadCount: 202
UnstartedThread: 95
BackgroundThread: 1
PendingThread: 0
DeadThread: 47
    ID     ThreadOBJ  State     PreEmptive GC GC Alloc Context      Domain     Lock Count APT Exception
  0 0xed0  0x0014f260 0x2000020 Enabled       0x00000000:0x00000000 0x00149aa0 1          Ukn
  1 0xa3c  0x00157d28 0x2001220 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn (Finalizer)
XXX 0      0x00166378 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
  4 0x12cc 0x00166540 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          STA
  5 0x12dc 0x00166708 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
  3 0xe7c  0x00175b70 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00175d38 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00175f00 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x001760c8 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00176290 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn System.InvalidOperationException
XXX 0      0x00176458 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00176620 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x001767e8 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
 12 0x7e0  0x001769b0 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
 13 0x15e8 0x00178008 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
 14 0x4d0  0x001781d0 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00178398 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00178560 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00178728 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x001788f0 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00178ab8 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00178c80 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00178e48 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
 21 0x14f0 0x00179010 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
 22 0x1708 0x001791d8 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
 23 0x11f8 0x001793a0 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
 24 0x224  0x00179568 0x2001020 Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00179730 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x001798f8 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
XXX 0      0x00179ac0 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn System.InvalidOperationException
XXX 0      0x00179c88 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn System.InvalidOperationException
XXX 0      0x00179e50 0x1820    Enabled       0x00000000:0x00000000 0x00149aa0 0          Ukn
...
First, !Threads gives some statistics about the Thread Store.
ThreadCount: total number of C++ Thread objects in the Thread Store.
UnstartedThread: number of C++ Thread objects marked as unstarted. Recall from my previous blog that when a user creates a C# Thread object, CLR creates an "unstarted" C++ Thread object. When Thread.Start is called on the C# object, CLR creates an OS thread and removes the "unstarted" flag from the C++ Thread object.
BackgroundThread: number of C++ Threads (and the corresponding OS threads) considered to be background threads. Being background simply means CLR won't wait for the thread when shutting down. Threads created explicitly through System.Threading.Thread.Start are foreground threads by default, whereas threads wandering into CLR from the unmanaged world are background threads by default (Rotor: SetupThread in vm/threads.cpp calls SetBackground(TRUE)). However, whether a thread is background can be changed through the IsBackground property of the C# Thread object.
PendingThread: if an OS thread has been created but its ThreadProc hasn't yet reached the point where it decrements the unstarted counter in the Thread Store, the thread is considered pending. The number of such threads should be quite low.
DeadThread: number of C++ Thread objects whose OS threads are already dead but which have not been deleted yet.
In Rotor, all five numbers are stored as fields of the ThreadStore object (vm/threads.h).
Then comes a table of all C++ Thread objects in the Thread Store. Let me explain each field.
The first column doesn't have a header. It is the thread ID assigned by the debugger, purely for readability while debugging. Because these numbers exist only in the debugger process, not in the debuggee, the number for a given thread in a live session may differ from the one you see when debugging a dump taken from that same session. For a "dead" or "unstarted" thread, this column is "XXX".
ID: the thread ID assigned by the OS. It remains consistent across debugger sessions, but the OS could recycle it.
ThreadOBJ: address of the C++ Thread object. You can see the contents of the object with "dt mscorwks!Thread <address>" if you have symbols for mscorwks.dll.
State: one of the most important fields in the table. In Rotor, it is the C++ Thread's m_State field. It is a combination of bit masks indicating what status the Thread is currently in. All possible states (bit masks) are defined as enum ThreadState in vm/Threads. We already covered several states such as TS_Background, TS_Unstarted, and TS_Dead. Other states include TS_AbortRequested (an abort has been requested for this thread), TS_AbortInitiated (the abort process has already started for this thread), TS_GCSuspendPending (GC is trying to suspend this thread), and so on.
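Since State is a bit mask, it can help to decode a value by hand or with a small helper. The sketch below is a standalone illustration, not part of SOS. The mask values are my recollection of a few entries of Rotor's ThreadState enum and should be treated as assumptions to verify against your own copy of the source; they are at least consistent with the sample above, where dead entries show 0x1820. DecodeThreadState and kKnownFlags are hypothetical names, and bits the table does not cover are reported as unknown rather than guessed.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct StateFlag {
    std::uint32_t mask;
    const char* name;
};

// ASSUMED subset of the ThreadState bit masks (check against vm/threads.h).
static const StateFlag kKnownFlags[] = {
    {0x00000020u, "TS_LegalToJoin"},
    {0x00000200u, "TS_Background"},
    {0x00000400u, "TS_Unstarted"},
    {0x00000800u, "TS_Dead"},
    {0x00001000u, "TS_WeOwn"},
};

// Splits a State value into the names of the known flags that are set.
// Any leftover bits are appended as a single "unknown(0x...)" entry.
std::vector<std::string> DecodeThreadState(std::uint32_t state) {
    std::vector<std::string> names;
    std::uint32_t remaining = state;
    for (const StateFlag& flag : kKnownFlags) {
        if ((state & flag.mask) != 0) {
            names.push_back(flag.name);
            remaining &= ~flag.mask;
        }
    }
    if (remaining != 0) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "unknown(0x%x)",
                      static_cast<unsigned>(remaining));
        names.push_back(buf);
    }
    return names;
}
```

Under these assumed values, DecodeThreadState(0x1820) yields TS_LegalToJoin, TS_Dead and TS_WeOwn, which matches the "XXX" rows in the table.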
PreEmptive GC: also very important. In Rotor, this is the m_fPreemptiveGCDisabled field of the C++ Thread class. It indicates which GC mode the thread is in: "Enabled" in the table means the thread is in preemptive mode, where GC can preempt this thread at any time; "Disabled" means the thread is in cooperative mode, where GC has to wait for the thread to give up its current work (the work involves GC objects, so the thread can't allow GC to move the objects around). When the thread is executing managed code (the current IP is in managed code), it is always in cooperative mode; when the thread is in the Execution Engine (unmanaged code), EE code can choose to stay in either mode and can switch modes at any time; when a thread is outside of CLR (e.g., calling into native code through interop), it is always in preemptive mode.
GC Alloc Context: the allocation context GC might use when it allocates objects for this thread. In Rotor, it is m_alloc_context in the C++ Thread object.
Domain: the AppDomain the thread is currently in (Rotor: m_pDomain field of the C++ Thread class). You can use !DumpDomain or "dt mscorwks!AppDomain" to dump details of the domain. A thread can only be in one domain at a time, but it can switch into different domains. Special marks are put on the thread's stack when it transitions to another domain.
Lock Count: how many locks this thread has taken (Rotor: m_dwLockCount field of the C++ Thread class). The locks it tracks include managed monitors (taken by lock(obj) in C#), BCL's ReaderWriterLock, and certain locks inside CLR's unmanaged code.
APT: COM apartment of the thread: single-threaded apartment (STA), multithreaded apartment (MTA), or unknown (Ukn).
Exception: the last managed exception thrown from this thread. It is saved in a GC handle in the C++ Thread object (Rotor: m_LastThrownObjectHandle).
The last column also indicates whether the thread is one of the special threads. However, !Threads only recognizes a limited set of special thread types for this field: the Finalizer thread, GC thread, Threadpool Worker thread, and Threadpool Completion Port thread. Special threads that don't have a C++ Thread object (special threads that never need to run managed code, such as the debugger helper thread and server GC threads) cannot be displayed here. In Whidbey, a "-special" option was added to the !Threads command, which shows all special threads in the process as a separate list. Here is a sample output:
0:007> !threads -special
ThreadCount: 4
UnstartedThread: 0
BackgroundThread: 3
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
    ID OSID ThreadOBJ State  PreEmptive GC GC Alloc Context  Domain   Lock Count APT Exception
  0  1  828 0029a030  a020   Disabled      06907c38:069081d4 0f59e038 2          MTA
  4  2 16fc 0029e980  b220   Enabled       0690424c:069061d4 0021f4a8 0          MTA (Finalizer)
  5  3 1c1c 002e71e8  1220   Enabled       028f20f8:028f3f94 0021f4a8 0          Ukn
  6  4 1244 0f6fa778  80a220 Enabled       00000000:00000000 0021f4a8 0          MTA (Threadpool Completion Port)

    OSID Special thread type
  1  e20 DbgHelper
  2 1e1c GC
  3 1ed4 GC
  4 16fc Finalizer
  5 1c1c ADUnloadHelper
  6 1244 Timer
This posting is provided "AS IS" with no warranties, and confers no rights.
|
-
With the knowledge from my previous blog, we can avoid some mistakes in .NET programming.
A C++ Thread is very resource heavy. It is associated with a lot of dynamically allocated memory and some OS handles, so it had better be cleaned up as soon as possible after its corresponding OS thread dies. The C++ Thread class has a reference count. For the object to be deleted, the ref count has to drop to 0 (Rotor: Thread::DecExternalCount in vm\threads.cpp). One interesting point is that the C# Thread object keeps a reference to its associated C++ Thread, so a live C# Thread object can keep its C++ Thread from being deleted even if the OS thread is already dead. (On the other hand, the C++ Thread also has a reference to the C# Thread, but it breaks the cycle when its own ref count drops to 1.) Because the C# Thread is a managed object, its lifetime is mostly determined by users. In addition, the C# Thread class has a finalizer, so its lifetime is extended by at least one GC. So if user code caches C# Thread objects or has ill-behaved finalizers (in another blog entry, I mentioned that a misbehaving finalizer on one object can prevent all other objects' finalizers from running), "dead" C++ Thread objects may accumulate over time and a "memory leak" will be observed.
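The lifetime rule above can be sketched with a toy reference-counted object in standard C++. This is only a model of the idea, not Rotor code: NativeThread stands in for the C++ Thread, ManagedThreadProxy for the C# Thread, and all member names are hypothetical. The proxy's destructor plays the role of the C# Thread's finalizer releasing the last reference.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's C++ Thread object.
class NativeThread {
public:
    // liveCounter lets callers observe when the object is really deleted.
    explicit NativeThread(int* liveCounter)
        : refCount_(1), osThreadAlive_(true), live_(liveCounter) {
        ++*live_;
    }

    void AddRef() { ++refCount_; }

    // Drops one reference; returns true if this call deleted the object.
    bool Release() {
        assert(refCount_ > 0);
        if (--refCount_ == 0) {
            delete this;
            return true;
        }
        return false;
    }

    // The OS thread exited; the object is now "dead" but still allocated.
    void OnOsThreadExit() { osThreadAlive_ = false; }

    bool IsDead() const { return !osThreadAlive_; }
    int RefCount() const { return refCount_; }

private:
    ~NativeThread() { --*live_; }  // only reachable through Release()

    int refCount_;
    bool osThreadAlive_;
    int* live_;
};

// Hypothetical stand-in for the C# Thread object: it holds one reference
// for as long as it exists, just as a cached C# Thread would.
class ManagedThreadProxy {
public:
    explicit ManagedThreadProxy(NativeThread* native) : native_(native) {
        native_->AddRef();
    }
    ~ManagedThreadProxy() { native_->Release(); }
    ManagedThreadProxy(const ManagedThreadProxy&) = delete;
    ManagedThreadProxy& operator=(const ManagedThreadProxy&) = delete;

private:
    NativeThread* native_;
};
```

If the runtime releases its own reference after the OS thread exits while a proxy is still alive, the NativeThread stays allocated with a ref count of 1 and only goes away when the proxy is destroyed.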
I have an example here to demonstrate the problem and how to debug it using windbg + SOS. In this process, there are 202 C++ Thread objects, 160 of which are "dead", meaning their associated OS threads are dead. The total number of threads in the Thread Store and the numbers of dead and unstarted threads are shown in the "!threads" output. For a "live" thread, the OS and debugger thread IDs are printed for the entry; for a "dead" thread, "XXX" is shown at the beginning of the line:
0:043> !threads
ThreadCount: 202
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 160
    ID     ThreadOBJ  State     PreEmptive GC GC Alloc Context      Domain     Lock Count APT Exception
  0 0x1138 0x0015a298 0x20      Enabled       0x00000000:0x00000000 0x00149ac8 1          Ukn
  1 0x1148 0x00152530 0x1220    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn (Finalizer)
  3 0x114c 0x00177548 0x2001020 Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
  4 0x1150 0x00177878 0x2001020 Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
  5 0x1154 0x00177c08 0x2001020 Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
…
 42 0x11e4 0x00180460 0x2001020 Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
 22 0x11e8 0x00180838 0x2001020 Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
XXX 0      0x00180c10 0x1820    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
XXX 0      0x00180fe8 0x1820    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
XXX 0      0x001813c0 0x1820    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
XXX 0      0x00181750 0x1820    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
XXX 0      0x00181b28 0x1820    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
XXX 0      0x00181f00 0x1820    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
XXX 0      0x001822d8 0x1820    Enabled       0x00000000:0x00000000 0x00149ac8 0          Ukn
… //continue with a huge list
Now I want to find out why all the "dead" C++ Thread objects are still around. First I can check the ref count if I have symbols for mscorwks.
//0x00180c10 is a dead ThreadOBJ I picked from !Threads output
0:043> dt mscorwks!Thread 0x00180c10 m_ExternalRefCount
   +0x0cc m_ExternalRefCount : 1
Since the ref count is 1, if this C++ Thread object has a C# Thread object associated with it, the C# object must hold the last reference. I can verify whether that is the case by checking the C++ object's m_ExposedObject field. It is a weak GC handle (an unmovable pointer to a GC reference that is not counted as a root of the GC object), so dereferencing it gives the managed object. As mentioned before, the C++ Thread object also has a strong handle (m_StrongHndToExposedObject field) to the C# object, but it already cleared the strong handle when the ref count dropped to 1, to avoid a circular reference.
0:043> dt mscorwks!Thread 0x00180c10 m_ExposedObject
   +0x0c0 m_ExposedObject : 0x00a71054
0:043> dp 0x00a71054 l1
00a71054  00c5c714
0:043> !do 00c5c714
Name: System.Threading.Thread
MethodTable 0x79bb8384
EEClass 0x79bb85b0
Size 60(0x3c) bytes
GC Generation: 0
mdToken: 0x020000eb (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
FieldDesc*: 0x79bb8614
MT         Field     Offset Type         Attr     Value      Name
0x79bb8384 0x4000330 0x4    CLASS        instance 0x00000000 m_Context
0x79bb8384 0x4000331 0x8    CLASS        instance 0x00000000 m_LogicalCallContext
0x79bb8384 0x4000332 0xc    CLASS        instance 0x00000000 m_IllogicalCallContext
0x79bb8384 0x4000333 0x10   CLASS        instance 0x00000000 m_Name
0x79bb8384 0x4000334 0x14   CLASS        instance 0x00000000 m_ExceptionStateInfo
0x79bb8384 0x4000335 0x18   CLASS        instance 0x00000000 m_Delegate
0x79bb8384 0x4000336 0x1c   CLASS        instance 0x00000000 m_PrincipalSlot
0x79bb8384 0x4000337 0x20   CLASS        instance 0x00000000 m_ThreadStatics
0x79bb8384 0x4000338 0x24   CLASS        instance 0x00000000 m_ThreadStaticsBits
0x79bb8384 0x4000339 0x28   CLASS        instance 0x00000000 m_CurrentCulture
0x79bb8384 0x400033a 0x2c   CLASS        instance 0x00000000 m_CurrentUICulture
0x79bb8384 0x400033b 0x30   System.Int32 instance 2          m_Priority
0x79bb8384 0x400033c 0x34   System.Int32 instance 1575952    DONT_USE_InternalThread
0x79bb8384 0x400033d 0      CLASS        shared   static     m_LocalDataStoreMgr
    >> Domain:Value 0x00149ac8:0x00c05338 <<
Then I want to check the roots of the C# Thread object to see what keeps it alive:
0:043> !gcroot 00c5c714
Scan Thread 0 (0x1138)
ESP:12f69c:Root:0xc5b3f4(System.Object[])->0xc5c714(System.Threading.Thread)
…
So there is an array that keeps a reference to a "dead" C# Thread. This looks interesting. I can check all other C# Thread objects in the process using the !DumpHeap command. !DumpHeap can dump objects in the GC heap for a particular type specified by the "-type" option:
0:043> !DumpHeap -type System.Threading.Thread
Address    MT         Size Gen
0x00c054c8 0x79bb8384 60   1   System.Threading.Thread
0x00c5b730 0x79bc81d4 28   2   System.Threading.ThreadStart
0x00c5b74c 0x79bb8384 60   2   System.Threading.Thread
0x00c5b7bc 0x79bc81d4 28   2   System.Threading.ThreadStart
0x00c5b7d8 0x79bb8384 60   2   System.Threading.Thread
0x00c5b820 0x79bc81d4 28   2   System.Threading.ThreadStart
0x00c5b83c 0x79bb8384 60   2   System.Threading.Thread
0x00c5b884 0x79bc81d4 28   2   System.Threading.ThreadStart
0x00c5b8a0 0x79bb8384 60   2   System.Threading.Thread
0x00c5b8e8 0x79bc81d4 28   2   System.Threading.ThreadStart
0x00c5b904 0x79bb8384 60   2   System.Threading.Thread
0x00c5b94c 0x79bc81d4 28   2   System.Threading.ThreadStart
0x00c5b968 0x79bb8384 60   2   System.Threading.Thread
0x00c5b9b0 0x79bc81d4 28   2   System.Threading.ThreadStart
… //long list
total 401 objects
Because !DumpHeap matches types by string, it also dumps ThreadStart objects. Every C# Thread object created by user code has a ThreadStart object (a C# Thread created by System.Threading.Thread.CurrentThread may not have one), so they show up in pairs. Among the 401 objects, 200 are C# Thread objects, which roughly matches the number of C++ Thread objects (the numbers don't have to be the same because not every C++ Thread object has a C# counterpart). The generation of most C# Thread objects is 2, meaning they have already survived at least 2 GCs. When I track the roots of those C# Thread objects, they all point to the array. In this case, we need to look closely at the source to see whether it is necessary to cache all the C# Thread objects in an array.
Another related topic is that CLR relies on the DLL_THREAD_DETACH notification to mscorwks.dll's DllMain (Rotor: EEDllMain in vm\ceemain.cpp) to know that an OS thread is dead and to detach the related C++ Thread. Using the TerminateThread API is already notoriously bad in unmanaged programming; here we see another reason not to call it in managed code. If TerminateThread is called on a managed OS thread, then among other bad effects (e.g., back-out code not being executed), CLR will not get the thread detach notification. Because the C++ Thread object holds references to addresses on the OS thread's stack, failing to detach it from the OS thread will cause crashes at random places.
|
-
If you use SOS’s !Threads command during debugging a lot, you should be familiar with such output:
0:003> !threads
PDB symbol for mscorwks.dll not loaded
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorwks.dll"
ThreadCount: 12
UnstartedThread: 5
BackgroundThread: 1
PendingThread: 0
DeadThread: 5
    ID    ThreadOBJ  State  PreEmptive GC GC Alloc Context      Domain    Lock Count APT Exception
  0 0xb74 0x0014f230 0x20   Enabled       0x00000000:0x00000000 x00149aa8 1          Ukn
  2 0xb58 0x00157cf8 0x1220 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn (Finalizer)
XXX 0     0x001665f0 0x1820 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016d348 0x1400 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016d510 0x1820 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016d9d0 0x1400 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016db98 0x1820 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016e248 0x1400 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016e410 0x1820 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016e740 0x1400 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016e908 0x1820 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
XXX 0     0x0016ec98 0x1400 Enabled       0x00000000:0x00000000 x00149aa8 0          Ukn
Have you ever wondered what exactly is in the list? Why doesn't the number of threads listed here match the number of "real" threads in the process (in the example above, I only have 4 threads in the process, but !Threads shows 12)? Why do some "real" threads have entries here while others don't? Maybe they are managed System.Threading.Thread objects (but the number in the list might not match the number of Thread objects either)? The answers to those questions are tied to how CLR implements threads and manages thread information. CLR provides classes in the System.Threading namespace as its threading APIs. As you know, threading is implemented using native threads of the underlying OS (Windows, in CLR's case). CLR just piggybacks managed threads on native OS threads. Users use BCL System.Threading.Thread objects (I'll call them C# Threads below) to control managed threads, just as a thread HANDLE is used to control Windows native threads. If you have ever checked the contents of a C# Thread object, you will have found that it is quite small (here I use the SOS !DumpObj command; you could also see it using Visual Studio or other managed debuggers):
0:003> !DumpObj 0x00c03248
Name: System.Threading.Thread
MethodTable 0x79bb8384
EEClass 0x79bb85b0
Size 60(0x3c) bytes
GC Generation: 0
mdToken: 0x020000eb (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
FieldDesc*: 0x79bb8614
MT         Field     Offset Type         Attr     Value      Name
0x79bb8384 0x4000330 0x4    CLASS        instance 0x00000000 m_Context
0x79bb8384 0x4000331 0x8    CLASS        instance 0x00000000 m_LogicalCallContext
0x79bb8384 0x4000332 0xc    CLASS        instance 0x00000000 m_IllogicalCallContext
0x79bb8384 0x4000333 0x10   CLASS        instance 0x00000000 m_Name
0x79bb8384 0x4000334 0x14   CLASS        instance 0x00000000 m_ExceptionStateInfo
0x79bb8384 0x4000335 0x18   CLASS        instance 0x00000000 m_Delegate
0x79bb8384 0x4000336 0x1c   CLASS        instance 0x00000000 m_PrincipalSlot
0x79bb8384 0x4000337 0x20   CLASS        instance 0x00000000 m_ThreadStatics
0x79bb8384 0x4000338 0x24   CLASS        instance 0x00000000 m_ThreadStaticsBits
0x79bb8384 0x4000339 0x28   CLASS        instance 0x00000000 m_CurrentCulture
0x79bb8384 0x400033a 0x2c   CLASS        instance 0x00000000 m_CurrentUICulture
0x79bb8384 0x400033b 0x30   System.Int32 instance 2          m_Priority
0x79bb8384 0x400033c 0x34   System.Int32 instance 1498008    DONT_USE_InternalThread
0x79bb8384 0x400033d 0      CLASS        shared   static     m_LocalDataStoreMgr
CLR actually needs much more information about a thread than the fields of the C# Thread class provide, and that information is needed in CLR's unmanaged part, where it's not easy to access managed objects. So inside the Execution Engine (so far the Execution Engine is still written in unmanaged code), there is an unmanaged C++ class, also called Thread (let me call it the C++ Thread), that keeps all information for a native OS thread (OS thread). In Rotor, this class is defined in vm\threads.h. CLR needs to create a C++ Thread object for every OS thread the EE knows of, that is, every thread which has ever run managed code. Such a thread is either (a) created explicitly by users through C# Thread.Start, (b) an unmanaged OS thread that has visited the managed world, e.g., through interop, or (c) a special OS thread in CLR that might run managed code. For case a, when a user creates a C# Thread object, CLR creates a C++ Thread object, links it to the C# Thread object, and marks it as unstarted (Rotor: SetupUnstartedThread in vm\threads.cpp). Once the Start method is called on the C# Thread object, CLR creates an OS thread; in the OS thread's ThreadProc (Rotor: ThreadNative::KickOffThread in vm\ComSynchronizable.cpp), CLR saves the C++ Thread object's address in the OS thread's TLS (Thread Local Storage) and marks the thread as started (Rotor: ThreadStore::TransferStartedThread in vm\threads.cpp). For case b, at the entry point where an unmanaged OS thread enters the managed world, CLR creates a C++ Thread object and also uses TLS to associate it with the OS thread (Rotor: SetupThread in vm\threads.cpp). Case c is similar to case b, except that CLR might set up the C++ Thread earlier. In any case, the C++ Thread object is the primary place where CLR stores information about a managed thread. When CLR needs to access the information, it just fetches the object from the OS thread's TLS. In that sense, the C# Thread is more like a managed proxy for the C++ Thread.
In fact, for cases b and c, no C# Thread object is created unless users call System.Threading.Thread.CurrentThread. The C++ Thread tracks its corresponding C# Thread using a GC handle (m_ExposedObject field in the C++ Thread object); the C# Thread tracks its C++ Thread using a native pointer (DONT_USE_InternalThread field in the C# Thread). You can verify this circular reference in the debugger if you have symbols for mscorwks.dll:
0:003> !do 0x00c03248
Name: System.Threading.Thread
…
MT         Field     Offset Type         Attr     Value …
0x79bb8384 0x400033c 0x34   System.Int32 instance 1498008 DONT_USE_InternalThread
…
0:013> dt mscorwks!Thread 0n1498008
…
   +0x0c0 m_ExposedObject : 0x00a710e4
…
0:013> dp 0x00a710e4 l1
00a710e4  00c03248
CLR uses a data structure called the Thread Store (Rotor: ThreadStore in vm\threads.h) to keep all C++ Threads. What you see in the output of the !Threads command is actually the list of C++ Threads in the Thread Store. The "ThreadOBJ" field is the address of a C++ Thread object. The other fields are important information about the C++ Thread, including the OS thread ID of the corresponding OS thread. Here you can see that not every live OS thread has a C++ Thread, which is explained by the fact that CLR only creates a C++ Thread for an OS thread that has run managed code. Meanwhile, you may also find that some C++ Threads don't have an OS thread associated with them (the first column shows "XXX"). They are either (a) unstarted, (b) failed to start, or (c) used to represent an OS thread that was once live but is now dead. For case (a), CLR waits until C# Thread.Start is called and creates an OS thread for it then; for (b) and (c), the C++ Thread objects will be deleted some time later.
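The bookkeeping described above can be pictured with a small model in standard C++. It is a sketch under simplifying assumptions, not the Rotor ThreadStore: ThreadStoreModel, ThreadEntry and EntryState are hypothetical names, an OS thread ID of 0 simply means "no OS thread", and the real store tracks far more state. It shows why the list can contain more entries than there are live OS threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class EntryState { Unstarted, Running, Dead };

// One entry per C++ Thread object; osThreadId == 0 means no OS thread,
// which is what !Threads prints as "XXX".
struct ThreadEntry {
    unsigned osThreadId;
    EntryState state;
};

// Hypothetical, heavily simplified model of the Thread Store.
class ThreadStoreModel {
public:
    // Creating a C# Thread adds an unstarted entry with no OS thread yet.
    std::size_t AddUnstarted() {
        entries_.push_back({0u, EntryState::Unstarted});
        return entries_.size() - 1;
    }

    // Thread.Start: an OS thread now backs the entry.
    void Start(std::size_t index, unsigned osThreadId) {
        assert(index < entries_.size());
        entries_[index] = {osThreadId, EntryState::Running};
    }

    // The OS thread exited, but the entry stays until it is deleted.
    void MarkDead(std::size_t index) {
        assert(index < entries_.size());
        entries_[index] = {0u, EntryState::Dead};
    }

    std::size_t Total() const { return entries_.size(); }

    std::size_t Count(EntryState state) const {
        std::size_t n = 0;
        for (const ThreadEntry& e : entries_) {
            if (e.state == state) ++n;
        }
        return n;
    }

    bool HasOsThread(std::size_t index) const {
        assert(index < entries_.size());
        return entries_[index].osThreadId != 0;
    }

private:
    std::vector<ThreadEntry> entries_;
};
```

With three entries where one was never started, one is running and one has exited, Total() is 3 while only one entry has an OS thread behind it.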
|
-
Question: How many threads does a typical managed process have when it just starts to run?
Answer: regardless of how many threads the user creates, there are at least 3 threads in a common managed process after CLR starts up: a main thread, which starts CLR and runs the user's Main method; the CLR debugger helper thread, which provides debugging services for interop debuggers like Visual Studio; and the finalizer thread, which runs finalizers for unreachable objects. Depending on what the program does, CLR might create more threads to perform special tasks.
Sometimes it is important to know what "special" threads CLR creates so that we can better understand the implicit impact of our managed programs. Here is a list of the most common special threads:
1. Finalizer thread. This thread runs finalizers for "dead" objects. It is created when the GC heap is initialized during EE startup. In Rotor, the thread proc for the thread is GCHeap::FinalizerThreadStart in vm\gcee.cpp. Because GC is nondeterministic and finalizers are executed on a separate thread, you can't predict exactly when an object will be finalized. Because there is only one thread to run all finalizers, if one finalizer is blocked, no other finalizers can run. So taking any lock in a finalizer is discouraged. Also see Maoni Stephens's blog for details about the finalizer thread.
2. Debugger helper thread. As its name suggests, this thread helps debuggers (mixed-mode and managed debuggers, but not purely unmanaged debuggers like windbg) get information about the managed process and execute certain debugging operations. The thread is created when EE initializes the debugger during startup. In Rotor, the thread proc for this thread is DebuggerRCThread::ThreadProcStatic (debug\ee\Rcthread.cpp). Also see Mike Stall's blog about the impact of this helper thread.
3. Concurrent GC thread (doesn't exist in Rotor). As explained in Maoni's and Chris Lyon's blogs, concurrent GC is a special GC mode which allows garbage to be collected while managed threads are running. To achieve this, CLR creates a thread to perform GC concurrently with user threads. The thread is only created when CLR decides to do a concurrent GC (even when concurrent GC mode is on, not every GC is concurrent; read Maoni's blog for details) and is recycled when there is no concurrent GC work to do.
4. Server GC threads (don't exist in Rotor). Maoni and Chris also explained Server GC mode, where on a multiprocessor machine CLR creates one GC heap for each CPU and one thread to do GC for each heap. When Server GC mode is enabled, server GC threads are created at EE startup when the GC heaps are initialized.
5. App Domain unload helper thread. In CLR V1.X, when a thread requests to unload an App Domain and the thread is in that App Domain itself, it needs to create a worker thread to do the unloading work. The worker thread dies once the target AD is unloaded. In Rotor, the thread starts with UnloadThreadWorker.ThreadStart (bcl\system\Appdomain.cs). In Whidbey, all AD unload work is performed on a special thread regardless of whether the requesting thread is in the domain being unloaded. The helper thread is created when the first non-default App Domain is created (the default domain is never unloaded) and stays alive from then on. Also see Chris Brumme's blog for details about AD unload.
6. Threadpool threads. Depending on how a program uses the CLR threadpool, CLR might create threads of a variety of types. For some thread types there is only one thread. For other types, the number of threads is related to the number of CPUs, the workload, and some user-configurable settings. The thread types include wait threads (threads that perform asynchronous waits; could be more than one); worker threads (threads that execute user work items; could be more than one); completion port threads (threads that wait for completion port IO in Windows; could be more than one; don't exist in Rotor); the gate thread (a thread that helps monitor the status of completion port threads and worker threads; only one); and the timer thread (a thread that manages the timer queue; only one).
|
-
I changed the program in the previous post to use the new Whidbey syntax.
using namespace System;

ref class RefT
{
public:
    RefT ()  {Console::WriteLine ("RefT::RefT");}
    ~RefT () {Console::WriteLine ("RefT::~RefT");}
    !RefT () {Console::WriteLine ("RefT::!RefT");}
};

value class ValueT
{
    //constructor is not allowed for value type
    //destructor is not allowed for value type
};

int main()
{
    {
        //1. finalizer will be called asynchronously
        RefT ^ rrt = gcnew RefT;
    }

    {
        //2. Dispose is called at "delete" and finalizer is suppressed
        RefT ^ rrt = gcnew RefT;
        delete rrt;
    }

    {
        //3. Dispose is called at end of the block and finalizer is suppressed
        RefT rt;
    }
    {
        ValueT vt;
    }
    {
        ValueT ^ pvt = gcnew ValueT;
    }

    {
        ValueT * pvt = new ValueT;
    }
    return 0;
}
The first thing to notice is that __gc and __value are replaced by ref and value, which is definitely clearer. Then RefT can now have another method, !RefT. What does this new method do and how is it related to the other ones? Having checked the IL generated by the Whidbey cl, I found RefT is translated into something like:
class RefT : IDisposable
{
    //constructor
    RefT ()
    {
        Console.WriteLine ("RefT::RefT");
    }

    //Dispose methods
    Dispose (bool disposing)
    {
        if (disposing) {
            ~RefT();
        }
        else {
            try {
                !RefT();
            }
            finally {
                Object.Finalize();
            }
        }
    }

    Dispose ()
    {
        Dispose (true);
        GC.SuppressFinalize (this);
    }

    //finalizer
    Finalize ()
    {
        Dispose (false);
    }

    //body of Dispose and Finalize
    ~RefT ()
    {
        Console.WriteLine ("RefT::~RefT");
    }

    !RefT ()
    {
        Console.WriteLine ("RefT::!RefT");
    }
}
Basically, a reference type with a destructor (the method that starts with ~) implements the IDisposable interface, and its destructor becomes the Dispose method. We can also define a finalizer for a reference type using the “!” syntax. According to my tests, the two are independent of each other. If I define only the “~” function and not the “!” one, the class has only Dispose methods and no Finalize is generated, although Dispose still calls SuppressFinalize. If I define only the “!” method without the “~” one, I get a compiler warning and a class which has a finalizer but doesn't implement IDisposable.
Other new things include using a tracking handle (“^”) instead of a pointer to refer to a GC type, and gcnew to indicate that the memory is allocated in the managed heap.
All the new designs make CLR concepts (reference/value type, Finalize, Dispose, managed heap) first-class citizens in C++. As a CLR team member, I think this is much clearer than special annotations or mapping new concepts to existing features which have different semantics. However, from a C++ user's point of view, the changes seem to draw C++ closer to C#. I'm not sure everyone will love them.
Back to finalization: for tracking handles, things are still similar to pointers in V1.X. If delete is not called on a tracking handle (like the first block in main), the finalizer will run some time in the future and GC will collect the object. If "delete" is called (like the 2nd block in main), Dispose (the ~ function) is called and the finalizer is suppressed; the object will still be GCed later.
The 3rd block in main is the most interesting one: we can create a reference type object “on stack” (of course, the object is still in the heap; we just save a reference on the stack), and this “stack object” has the traditional C++ destruction semantics: when the variable goes out of scope, its destructor (Dispose method) is called automatically if it has one. The generated IL for block 3 looks something like this:
RefT r = new RefT ();
try
{
}
finally
{
    ((IDisposable)r).Dispose ();
}
This looks almost the same as C#'s using statement. Implementing C++'s automatic destruction with the Dispose pattern is a brilliant idea. I just worry that the new “stack object” syntax will create new confusion about where the object really lives. I tend to think of the stack object as a holder, similar to auto_ptr, which holds a reference to an object in the heap and deletes the object when it goes out of scope. At the beginning I did wonder whether this holder tracks ownership of the object (like auto_ptr) or does ref counting (like a traditional smart pointer) to make sure that if multiple holders reference the same object, only the last one disposes it. But there seems to be no way to make two holders point to the same object. E.g., the code below doesn't compile; the compiler complains that “operator=” isn't available for RefT. Even if you define operator= for RefT, it only applies to the object in the heap, not to the holder on the stack:
RefT rt1;
{
    RefT rt2;
    rt1 = rt2;
}
//I assumed rt1 would reference a disposed object here
After all, this “holder” is more syntactic sugar than a real smart pointer class, so it's easy to guarantee that two holders never represent the same object in the heap. My worry might be unnecessary, but it's an example of how the syntax could confuse people.
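To make the "holder" intuition concrete, here is a minimal sketch in standard C++ (not C++/CLI). `Tracked`, `Holder` and `runScopedBlock` are hypothetical names invented for this illustration; the Whidbey compiler does not generate such a class, it emits the try/finally shown above. The sketch only models the two properties discussed here: the object lives in the heap and is destroyed when the holder leaves scope, and two holders cannot be made to refer to the same object because copying is disabled.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a ref class: it records its lifetime events
// in a caller-supplied log instead of printing them.
class Tracked {
public:
    explicit Tracked(std::vector<std::string>& log) : log_(log) {
        log_.push_back("ctor");
    }
    ~Tracked() { log_.push_back("dtor"); }   // plays the role of Dispose

private:
    std::vector<std::string>& log_;
};

// Hypothetical holder: owns one heap object and destroys it at scope exit.
template <typename T>
class Holder {
public:
    explicit Holder(T* obj) : obj_(obj) {}
    ~Holder() { delete obj_; }

    // No copying: two holders can never refer to the same heap object.
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;

    T& get() { return *obj_; }

private:
    T* obj_;
};

// "rt1 = rt2" is rejected at compile time, as in the C++/CLI example.
static_assert(!std::is_copy_assignable<Holder<Tracked>>::value,
              "holders must not be assignable");

// Mirrors block 3 of main: create a "stack object" and leave the scope.
std::vector<std::string> runScopedBlock() {
    std::vector<std::string> log;
    {
        Holder<Tracked> rt(new Tracked(log));
        log.push_back("in scope");
    }   // rt goes out of scope here, so the heap object is destroyed
    log.push_back("after scope");
    return log;
}
```

Calling `runScopedBlock()` yields the events in the order construction, use, destruction, code after the block, which is the deterministic ordering the "stack object" syntax gives a ref class.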
With so many changes, it's no longer appropriate to call it managed extensions for C++. People now mostly refer to the new language as C++/CLI. I really feel sorry for those who have to rewrite their C++ code for the .NET platform again and again. But I think Whidbey lays down a foundation which could last for generations.
|
-
As a C++ fan, I'm a long-time admirer of deterministic finalization. I think the introduction of garbage collection to C-style languages by Java and .NET is a huge improvement. However, I found the loss of deterministic destructors almost unacceptable when I first entered the Java/.NET world. Of course I'm used to it now, but it's still quite confusing to me that C# uses C++ destructor syntax for finalizers. And in the managed extensions for C++, the destructor becomes something totally different for managed data types. I bet a lot of experienced C++ developers make the mistake of using a finalizer as if it were a destructor when they first try .NET. So when I read about the new changes in the Whidbey version of C++ on Stan Lippman's blog, I was very excited and couldn't wait to give it a try.
But before we look into the new features, let's go over how the old version of managed C++ handles destructors. I wrote this simple program:
#using <mscorlib.dll>

using namespace System;

__gc class RefT
{
public:
    RefT () {Console::WriteLine ("RefT::RefT");}
    ~RefT () {Console::WriteLine ("RefT::~RefT");}
};

__value class ValueT
{
    ValueT () {Console::WriteLine ("ValueT::ValueT");}
    //destructor is not allowed for value type
};

int main()
{
    {
        //1. auto-generated finalizer will be called asynchronously
        RefT * prt = new RefT;
    }

    {
        //2. Dispose is called at "delete" and finalizer will be suppressed
        RefT * prt = new RefT;
        delete prt;
    }

    {
        ValueT vt;
    }
    {
        //value type can't be created in GC heap
        ValueT * pvt = __nogc new ValueT;
        delete pvt;
    }
    return 0;
}
I compiled it with the V1.1 C++ compiler and checked the generated IL code using ildasm. RefT is compiled to something like this:
class RefT
{
    RefT ()
    {
        Console.WriteLine ("RefT::RefT");
    }

    void Finalize ()
    {
        Console.WriteLine ("RefT::~RefT");
    }

    void __dtor ()
    {
        GC.SuppressFinalize (this);
        Finalize ();
    }
}
Here we can see that the C++ destructor is mapped to the CLR finalizer, and a method “__dtor” is added that calls SuppressFinalize and then the finalizer.
In main, the first block creates a RefT object in the heap and leaves it as garbage. Sometime later, the finalizer (~RefT) will run and the object will be collected. In the second block, we “delete” the object. This is translated into a call to __dtor in IL. So “delete” acts more like the Dispose method recommended by the IDisposable pattern: the object is not freed, but its contents are disposed and the finalizer won't run on the object later.
|
-
One day I was debugging a problem where a Watson dialog popped up on a process. What surprised me was that on the stack where Watson was triggered, there was an unmanaged C++ function with a try-catch(...) block. To my understanding, this block should catch any user-mode exception thrown in Windows, including exceptions from a RaiseException call (e.g., C++ exceptions), AVs, stack overflows, etc. How could an exception escape such a block and become unhandled (so that Watson showed up)? I found the exception was a debug break. On X86, it is triggered by opcode 0xCC, or “int 3”. When I debugged into VCRT's EH code, I found that catch (...) deliberately lets a debug break go. It does make sense: a debug break is meant to stop in the debugger, so source code should never handle it. I just never realized it before.
Another interesting part is where this debug break came from, since the code of the process never calls DebugBreak. After I debugged more, the problem turned out to be the bug I mentioned in my previous blog entry: a premature GC issue. Managed code passed a Delegate to unmanaged code without telling GC to extend its lifetime. When unmanaged code called the callback, the managed Delegate object had already been collected, so unmanaged code called into garbage memory. The memory happened to be filled with 0xCC, so when the process tried to execute it as code, it fired int 3, and then Watson kicked in.
This posting is provided "AS IS" with no warranties, and confers no rights.
|
-
There is a bug in the program below; see if you can catch it.
Test.cs (compiled to DelegateExample.exe):
using System;
using System.Threading;
using System.Runtime.InteropServices;

class Test
{
    delegate uint ThreadProc (IntPtr arg);

    private uint m;

    public Test (uint n)
    {
        m = n;
    }

    uint Reflect (IntPtr arg)
    {
        Console.WriteLine (m);
        return m;
    }

    static void Main ()
    {
        Test t = new Test (1);
        ThreadProc tp = new ThreadProc (t.Reflect);
        NewThread (tp);
        Thread.Sleep (1000);
    }

    [DllImport("UsingCallback")]
    static extern void NewThread (ThreadProc proc);
}
UsingCallback.cpp (compiled to UsingCallback.dll):
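#include <windows.h>

// extern "C" and dllexport make the function visible to PInvoke
// (this assumes the DLL is built without a separate .def file)
extern "C" __declspec(dllexport) void __stdcall NewThread (LPTHREAD_START_ROUTINE cb)
{
    DWORD id = 0;
    CreateThread (NULL, 0, cb, NULL, 0, &id);
}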
Yes, here is the problem: in the cs file, the managed code passes a Delegate object to unmanaged code, which will create a new thread to call the delegate. Since unmanaged code has no way to tell the CLR how it plans to use the object, from the CLR's point of view there are no live roots for the Delegate object after the line “NewThread (tp);”. Thus the object is eligible to be garbage collected after the call, even though the new thread might not have started yet. So it's possible for the Delegate to become trash before unmanaged code invokes it, which causes unspecified failures. A fix is to add GC.KeepAlive before Main returns:
Test t = new Test (1);
ThreadProc tp = new ThreadProc (t.Reflect);
NewThread (tp);
Thread.Sleep (1000);
GC.KeepAlive (tp);
One annoying thing about this kind of bug is that GC is nondeterministic: the program could work just fine 99% of the time and only crash under stress. Another thing that makes the problem hard to find is that it's not very intuitive from looking at the source. You might know the theory that when a variable is not used anymore, it will not be reported as a live root; but it takes quite some time to figure out at which point which variable is dead, and there's no way to verify whether the CLR agrees with your analysis.
I'll show you how to use SOS.dll to check the internal data structure used by the CLR to determine variable lifetime. When the JIT compiles code, it generates variable liveness information (GC info) for each method and saves it along with the machine code of the method. When a GC happens, it checks the GC info for every method on the stack to find out which variables are alive and uses the live variables as object roots. GC info is highly compacted, but SOS.dll has the “!GCInfo” command to crack it and show it in a human-readable way. This approach only works at the assembly level, so stop reading if you are not interested in assembly language. :)
I compiled test.cs above using Visual Studio as a "Debug" build, and launched the program under Windbg. After the Main method is JITted, I can disassemble the generated native code of the method using the "!SOS.u" command:
0:000> !u 02f00058
Will print '>>> ' at address: 02f00058
Normal JIT generated code
[DEFAULT] Void Test.Main()
Begin 02f00058, size 60
>>> 02f00058 55 push ebp
02f00059 8bec mov ebp,esp
02f0005b 83ec08 sub esp,0x8
02f0005e 57 push edi
02f0005f 56 push esi
02f00060 53 push ebx
02f00061 33ff xor edi,edi
02f00063 33db xor ebx,ebx
02f00065 b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02f0006a e8a91fbcfd call 00ac2018 (JitHelp: nc)
02f0006f 8bf0 mov esi,eax
02f00071 8bce mov ecx,esi
02f00073 ba01000000 mov edx,0x1
02f00078 ff152051ad00 call dword ptr [00ad5120] (Test..ctor)
02f0007e 8bfe mov edi,esi
02f00080 b9c451ad00 mov ecx,0xad51c4 (MT: Test/ThreadProc)
02f00085 e88e1fbcfd call 00ac2018 (JitHelp: nc)
02f0008a 8bf0 mov esi,eax
02f0008c 689350ad00 push 0xad5093
02f00091 8bd7 mov edx,edi
02f00093 8bce mov ecx,esi
02f00095 ff152452ad00 call dword ptr [00ad5224] (Test/ThreadProc..ctor)
02f0009b 8bde mov ebx,esi
02f0009d 8bcb mov ecx,ebx
02f0009f ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)
02f000a5 b9e8030000 mov ecx,0x3e8
02f000aa ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02f000b0 90 nop
02f000b1 5b pop ebx
02f000b2 5e pop esi
02f000b3 5f pop edi
02f000b4 8be5 mov esp,ebp
02f000b6 5d pop ebp
02f000b7 c3 ret
SOS's "!u" is similar to Windbg's "u", but it shows more data because managed code is self-describing. For example, for those indirect calls, if the target is a managed function, "!u" can tell us the function name.
To check its GC info, we first need to get the method's MethodDesc (method descriptor, the CLR's data structure that keeps all information about one method). We can use "!ip2md" to find the MethodDesc from any instruction pointer in the method:
0:000> !ip2md 02f00058
MethodDesc: 0x00ad50a8
Jitted by normal JIT
Method Name : [DEFAULT] Void Test.Main()
MethodTable ad50e8
Module: 151ad0
mdToken: 06000003 (D:\projects\DelegateExample\bin\Debug\DelegateExample.exe)
Flags : 10
Method VA : 02f00058
This command shows some important information about the method. To check GC info, we only need to pass the MethodDesc pointer itself to “!GCInfo”:
0:000> !gcinfo 0x00ad50a8
Normal JIT generated code
Method info block:
    method size = 0060
    prolog size = 9
    epilog size = 7
    epilog count = 1
    epilog end = yes
    saved reg. mask = 000F
    ebp frame = yes
    fully interruptible=yes
    double align = no
    security check = no
    exception handlers = no
    local alloc = no
    edit & continue = yes
    varargs = no
    argument count = 0
    stack frame size = 2
    untracked count = 0
    var ptr tab count = 0
    epilog at 0059
60 E5 C0 45 |

Pointer table:
F0 7B | 000B reg EDI becoming live
5A | 000D reg EBX becoming live
F0 42 | 0017 reg EAX becoming live
72 | 0019 reg ESI becoming live
4A | 001B reg ECX becoming live
F0 03 | 0026 reg EAX becoming dead
08 | 0026 reg ECX becoming dead
F0 44 | 0032 reg EAX becoming live
30 | 0032 reg ESI becoming dead
72 | 0034 reg ESI becoming live
57 | 003B reg EDX becoming live
4A | 003D reg ECX becoming live
06 | 0043 reg EAX becoming dead
08 | 0043 reg ECX becoming dead
10 | 0043 reg EDX becoming dead
4C | 0047 reg ECX becoming live
0E | 004D reg ECX becoming dead
30 | 004D reg ESI becoming dead
F1 1B | 0060 reg EBX becoming dead
38 | 0060 reg EDI becoming dead
FF |
The output of the command has 2 sections. The first is the method info block, which contains basic information about the JITted code, such as the size of the method and the size of the prolog. The second is the pointer table, on which I'll spend most of the time. The pointer table describes the lifetime of every GC reference inside the method. It has 3 columns. The first is the byte encoding, which we don't need to care about. The second is the offset in the JITted code; e.g., this method's code starts at 02f00058, so 000B means the instruction at 02f00058+B = 02f00063, “xor ebx,ebx”. The third tells us how the lifetime of a GC pointer changes at that instruction; e.g., “000B reg EDI becoming live” means that starting from 02f00063, register EDI is a live root. Similarly, we can see EDI becomes dead at offset 60, the end of the method (0060 reg EDI becoming dead). So whatever variable EDI is used to store, its lifetime runs from the beginning to the end of the method. To understand the pointer table, it's better to interleave the table with the JITted code. In Whidbey, SOS has "!u -gcinfo" to do the job; but for Everett, I have to put them together manually and show the result along with the source code:
02f00058 55 push ebp
02f00059 8bec mov ebp,esp
02f0005b 83ec08 sub esp,0x8
02f0005e 57 push edi
02f0005f 56 push esi
02f00060 53 push ebx
02f00061 33ff xor edi,edi
GCInfo: 000B reg EDI becoming live
02f00063 33db xor ebx,ebx
GCInfo: 000D reg EBX becoming live

Test t = new Test (1);

02f00065 b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02f0006a e8a91fbcfd call 00ac2018 (JitHelp: nc)
GCInfo: 0017 reg EAX becoming live
02f0006f 8bf0 mov esi,eax
GCInfo: 0019 reg ESI becoming live
02f00071 8bce mov ecx,esi
GCInfo: 001B reg ECX becoming live
02f00073 ba01000000 mov edx,0x1
02f00078 ff152051ad00 call dword ptr [00ad5120] (Test..ctor)
GCInfo: 0026 reg EAX becoming dead
GCInfo: 0026 reg ECX becoming dead
02f0007e 8bfe mov edi,esi

ThreadProc tp = new ThreadProc (t.Reflect);

02f00080 b9c451ad00 mov ecx,0xad51c4 (MT: Test/ThreadProc)
02f00085 e88e1fbcfd call 00ac2018 (JitHelp: nc)
GCInfo: 0032 reg EAX becoming live
GCInfo: 0032 reg ESI becoming dead
02f0008a 8bf0 mov esi,eax
GCInfo: 0034 reg ESI becoming live
02f0008c 689350ad00 push 0xad5093
02f00091 8bd7 mov edx,edi
GCInfo: 003B reg EDX becoming live
02f00093 8bce mov ecx,esi
GCInfo: 003D reg ECX becoming live
02f00095 ff152452ad00 call dword ptr [00ad5224] (Test/ThreadProc..ctor)
GCInfo: 0043 reg EAX becoming dead
GCInfo: 0043 reg ECX becoming dead
GCInfo: 0043 reg EDX becoming dead

NewThread (tp);

02f0009b 8bde mov ebx,esi
02f0009d 8bcb mov ecx,ebx
GCInfo: 0047 reg ECX becoming live
02f0009f ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)
GCInfo: 004D reg ECX becoming dead
GCInfo: 004D reg ESI becoming dead

Thread.Sleep (1000);

02f000a5 b9e8030000 mov ecx,0x3e8
02f000aa ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02f000b0 90 nop
02f000b1 5b pop ebx
02f000b2 5e pop esi
02f000b3 5f pop edi
02f000b4 8be5 mov esp,ebp
02f000b6 5d pop ebp
02f000b7 c3 ret
GCInfo: 0060 reg EBX becoming dead
GCInfo: 0060 reg EDI becoming dead
Let's assume a thread triggers a GC while another thread is executing the code at 02f0009d. From the table, we know that at this time registers ESI, EBX, and EDI will be reported to GC as live roots. With some disassembly, we can see that at this moment both ESI and EBX contain a reference to the Delegate tp, and EDI is variable t. It might surprise you that both EBX and EDI stay alive until the function ends. That means variables t and tp are actually alive for the whole function, so the code has no bug?!
The tricky point is that a variable is eligible to be dead once it's not used anymore. However, it's up to the JIT to determine whether it really wants to report the variable as dead. In fact, for debuggable code, the JIT extends the lifetime of every variable to the end of the function.
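The reading of the debug-build pointer table above can be replayed mechanically. The following is a rough sketch in standard C++, not anything the CLR or SOS actually runs: `LifetimeEvent`, `debugPointerTable` and `liveRegistersAt` are hypothetical names, the rows are copied from the !gcinfo output, and the rule "apply every event whose offset is at or before the current instruction" is my interpretation of how the table was read in the text.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// One row of the pointer table: at `offset` a register starts or stops
// holding a GC reference.
struct LifetimeEvent {
    std::uint32_t offset;
    std::string reg;
    bool becomesLive;
};

// The debug-build pointer table for Test.Main, copied from the !gcinfo output.
std::vector<LifetimeEvent> debugPointerTable() {
    return {
        {0x0B, "EDI", true},  {0x0D, "EBX", true},  {0x17, "EAX", true},
        {0x19, "ESI", true},  {0x1B, "ECX", true},  {0x26, "EAX", false},
        {0x26, "ECX", false}, {0x32, "EAX", true},  {0x32, "ESI", false},
        {0x34, "ESI", true},  {0x3B, "EDX", true},  {0x3D, "ECX", true},
        {0x43, "EAX", false}, {0x43, "ECX", false}, {0x43, "EDX", false},
        {0x47, "ECX", true},  {0x4D, "ECX", false}, {0x4D, "ESI", false},
        {0x60, "EBX", false}, {0x60, "EDI", false},
    };
}

// Replays every event at or before the instruction being executed and
// returns the registers that are reported as live at that point.
std::set<std::string> liveRegistersAt(const std::vector<LifetimeEvent>& table,
                                      std::uint32_t methodStart,
                                      std::uint32_t instructionAddress) {
    const std::uint32_t offset = instructionAddress - methodStart;
    std::set<std::string> live;
    for (const LifetimeEvent& e : table) {
        if (e.offset > offset) break;   // the table is sorted by offset
        if (e.becomesLive) {
            live.insert(e.reg);
        } else {
            live.erase(e.reg);
        }
    }
    return live;
}
```

For the instruction at 02f0009d (offset 0x45) the replay gives EBX, EDI and ESI, matching the analysis above; for 02f000a5, right after the call to NewThread returns, EBX and EDI are still reported, which is why the debug build hides the bug.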
To prove the premature GC bug really exists in the sample, I compiled it as a "Release" build and repeated all the steps above:
0:000> !u 02df0058
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorwks.dll"
Will print '>>> ' at address: 02df0058
Normal JIT generated code
[DEFAULT] Void Test.Main()
Begin 02df0058, size 46
>>> 02df0058 57 push edi
02df0059 56 push esi
02df005a b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02df005f e8b41fcdfd call 00ac2018 (JitHelp: nc)
02df0064 8bf0 mov esi,eax
02df0066 c7460401000000 mov dword ptr [esi+0x4],0x1
02df006d b9bc51ad00 mov ecx,0xad51bc (MT: Test/ThreadProc)
02df0072 e8a11fcdfd call 00ac2018 (JitHelp: nc)
02df0077 8bf8 mov edi,eax
02df0079 689350ad00 push 0xad5093
02df007e 8bd6 mov edx,esi
02df0080 8bcf mov ecx,edi
02df0082 ff151c52ad00 call dword ptr [00ad521c] (Test/ThreadProc..ctor)
02df0088 8bcf mov ecx,edi
02df008a ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)
02df0090 b9e8030000 mov ecx,0x3e8
02df0095 ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02df009b 5e pop esi
02df009c 5f pop edi
02df009d c3 ret

0:000> !ip2md 02df0058
MethodDesc: 0x00ad50a8
Jitted by normal JIT
Method Name : [DEFAULT] Void Test.Main()
MethodTable ad50e8
Module: 151ad0
mdToken: 06000003 (D:\projects\DelegateExample\bin\Release\DelegateExample.exe)
Flags : 10
Method VA : 02df0058

0:000> !gcinfo 0x00ad50a8
Normal JIT generated code
Method info block:
    method size = 0046
    prolog size = 2
    epilog size = 3
    epilog count = 1
    epilog end = yes
    saved reg. mask = 0003
    ebp frame = no
    fully interruptible=no
    double align = no
    security check = no
    exception handlers = no
    local alloc = no
    edit & continue = no
    varargs = no
    argument count = 0
    stack frame size = 0
    untracked count = 0
    var ptr tab count = 0
    epilog at 0043
46 21 |

Pointer table:
A9 | 001F call 0 [ ESI ]
07 | 0026 push
CB | 0030 call 1 [ EDI ]
FF |
Here is the mixed source code, native code and pointer table:

02df0058 57 push edi
02df0059 56 push esi

Test t = new Test (1);

02df005a b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02df005f e8b41fcdfd call 00ac2018 (JitHelp: nc)
02df0064 8bf0 mov esi,eax
02df0066 c7460401000000 mov dword ptr [esi+0x4],0x1

ThreadProc tp = new ThreadProc (t.Reflect);

02df006d b9bc51ad00 mov ecx,0xad51bc (MT: Test/ThreadProc)
02df0072 e8a11fcdfd call 00ac2018 (JitHelp: nc)
GCInfo: 001F call 0 [ ESI ]
02df0077 8bf8 mov edi,eax
02df0079 689350ad00 push 0xad5093
GCInfo: 0026 push
02df007e 8bd6 mov edx,esi
02df0080 8bcf mov ecx,edi
02df0082 ff151c52ad00 call dword ptr [00ad521c] (Test/ThreadProc..ctor)
GCInfo: 0030 call 1 [ EDI ]

NewThread (tp);

02df0088 8bcf mov ecx,edi
02df008a ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)

Thread.Sleep (1000);

02df0090 b9e8030000 mov ecx,0x3e8
02df0095 ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02df009b 5e pop esi
02df009c 5f pop edi
02df009d c3 ret
In the release build, the JITted code is smaller, and the pointer table in the GC info is much smaller. I need to explain some new syntax in this pointer table: “call 0 [ESI]” means the method calls a function with 0 arguments, and ESI is a live variable at this point; "push" just indicates a change of the stack, which is important information for GC to unwind this frame, but doesn't affect pointer liveness.
One interesting thing about this new pointer table is that it only reports GC references when the method calls into other methods. What if a GC happens somewhere else? For example, at 02df0079, EDI contains object tp and ESI contains object t. If a GC happens while a thread is executing 02df0079 but the GC info doesn't report those two variables, will they be collected? The answer is that a GC can't happen at that place. There is an important field in the method info block called "fully interruptible". In the debug version this field is true for the method, but in the release version it is false. A fully interruptible method is one where GC can stop a thread (and perform a collection) at any point while the thread is executing the method. If a method is not fully interruptible, GC can't start at an arbitrary point while a thread is executing it. It has to wait until the thread returns from a call (e.g., returns from ThreadProc's constructor back to Main) or until the method calls into another method which allows GC to happen (calls into unmanaged code via PInvoke always allow GC to happen). That's why we only need to report variables at calls. For performance reasons, trivial methods like Test.Main are usually not fully interruptible in a release build.
With all this knowledge, we now know that if a GC happens when a thread is just returning from the call for "new ThreadProc" at instruction 02df0072, GC knows ESI (variable t) is a live root; if a GC happens when a thread is just returning from the call to ThreadProc's constructor at 02df0082, GC knows EDI (variable tp) is a live root. Variable t is not reported this time (but it's kept alive by object tp). However, if a GC happens when a thread returns from the call to NewThread at 02df008a, neither t nor tp is reported, so they could be collected by GC. If the new thread hasn't started by then, it will have trouble when it uses the objects.
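The release-build lookup can be modelled the same way. This is only an illustrative sketch in standard C++ with hypothetical names (`CallSiteTable`, `releaseCallSites`, `rootsAtReturn`); it assumes, as described above, that for a method that is not fully interruptible the GC only consults the entry recorded for the call the thread is returning from, and that a call site without an entry reports no roots.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Offset of the return address within the method -> registers reported
// as live roots at that call site.
using CallSiteTable = std::map<std::uint32_t, std::set<std::string>>;

// The release-build pointer table for Test.Main, copied from !gcinfo
// (the "push" row is omitted because it carries no liveness information).
CallSiteTable releaseCallSites() {
    return {
        {0x1F, {"ESI"}},   // return from the allocation helper for ThreadProc
        {0x30, {"EDI"}},   // return from Test/ThreadProc..ctor
    };
}

// Returns the roots reported when a thread is returning to `returnAddress`.
// A call site with no entry in the table reports nothing.
std::set<std::string> rootsAtReturn(const CallSiteTable& table,
                                    std::uint32_t methodStart,
                                    std::uint32_t returnAddress) {
    const auto it = table.find(returnAddress - methodStart);
    if (it == table.end()) {
        return {};
    }
    return it->second;
}
```

With the method starting at 02df0058, the return addresses 02df0077 and 02df0088 map to ESI and EDI respectively, while 02df0090 (the return from NewThread) maps to nothing, which is exactly the window in which tp can be collected.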
So next time you suspect there might be a premature GC bug, you could try "!SOS.GCInfo" to see exactly what the CLR thinks about a GC object's lifetime.
|
-
I didn't realize I've stopped blogging for 1 year. What a shame! Fortunately I didn’t waste the time: we shipped Whidbey Beta1 and Beta2 in the past year! Now with Beta2 out of the door, I have more spare time for blogging. :)
Today I want to talk about some interesting facts about Timer in the CLR. There is an example of how to use a timer in MSDN: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemthreadingtimerclasstopic.asp
This sample starts a timer and does certain things after the timer has fired a certain number of times, like killing the timer. However, the sample has a bug which causes trouble in stress scenarios. To demonstrate the problem, I made a little change to the code:
using System;
using System.Threading;

class TimerExample
{
    static void Main()
    {
        AutoResetEvent autoEvent = new AutoResetEvent(false);
        StatusChecker statusChecker = new StatusChecker(100);

        // Create the delegate that invokes methods for the timer.
        TimerCallback timerDelegate = new TimerCallback(statusChecker.CheckStatus);

        Console.WriteLine("{0} Creating timer.\n", DateTime.Now.ToString("h:mm:ss.fff"));
        Timer stateTimer = new Timer(timerDelegate, autoEvent, 0, 10);

        // start another thread to post work items to thread pool
        Thread t = new Thread (new ThreadStart (PostWorkItem));
        t.Start ();

        // When autoEvent signals, dispose of
        // the timer.
        autoEvent.WaitOne();
        stateTimer.Dispose();
        Console.WriteLine("\nDestroying timer.");
    }

    // a Thread proc which keeps posting work items to thread pool
    static void PostWorkItem ()
    {
        // Post some user work items to thread pool
        for (int i = 0; i < 1000; i++)
        {
            ThreadPool.QueueUserWorkItem (new WaitCallback (WorkItem));
            Thread.Sleep (10);
        }
    }

    // A no-op work item for thread pool
    static void WorkItem (object o)
    {
        Thread.Sleep (500);
    }
}

class StatusChecker
{
    int invokeCount, maxCount;

    public StatusChecker(int count)
    {
        invokeCount = 0;
        maxCount = count;
    }

    // This method is called by the timer delegate.
    public void CheckStatus(Object stateInfo)
    {
        Console.WriteLine("Checking status " + (++invokeCount));

        if(invokeCount == maxCount)
        {
            //signal Main.
            AutoResetEvent autoEvent = (AutoResetEvent)stateInfo;
            autoEvent.Set();
        }
    }
}
Basically I added another thread that keeps posting work items to the threadpool, but the rest is still expected to behave the same: when the timer fires for the 100th time, it should set an event so that the main thread stops the timer.
In one of 5 runs on my machine, I got output like this:
5:48:07.625 Creating timer.

Checking status 1
Checking status 2
Checking status 3
Checking status 4
…
Checking status 93
Checking status 94
Checking status 95
Checking status 96
Checking status 97
Checking status 98
Checking status 102
Checking status 99
Checking status 103
Checking status 104
Checking status 105
…
Checking status 698
Checking status 700
Checking status 701
Checking status 703
Checking status 703
Checking status 704
Checking status 705
…
^C
It seems that invokeCount never hits 100, so the program doesn't stop, and some other sequences in the output appear to be out of order.
How does this happen? First we need to understand how timers are implemented in the CLR: who executes the timer callbacks?
One simple idea would be to put all timers in a queue and have a dedicated thread do something like this (pseudo code):
while (true)
{
    foreach (Timer t in timer queue)
    {
        if (t.TimeToFire ())
        {
            t.InvokeCallback ();
        }
    }
    Sleep(MinimumInterval);
}
However, with this logic one lengthy timer callback would block all other timers. In the CLR, we do have a timer queue and a dedicated timer thread. However, the only job of the timer thread is to maintain the timer queue. When a timer needs to fire, the timer thread queues a work item to the threadpool; then one of the threadpool's worker threads picks up the work item and invokes the timer callback.
In Rotor's source, the timer thread's logic is in vm/Win32threadpool.cpp. The thread proc is ThreadpoolMgr::TimerThreadStart, and ThreadpoolMgr::FireTimers does most of the interesting work. The pseudo code looks like:
while (true)
{
    foreach (Timer t in timer queue)
    {
        if (t.TimeToFire ())
        {
            // put a work item to thread pool
            // to call timer callback on t once
            WorkItem work = CallTimerCallbackOnce (t);
            ThreadPool.QueueWorkItem (work);
        }
    }
    // MinimumInterval is the minimum of the next firing intervals
    // for all timers in the queue
    Sleep(MinimumInterval);
}
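To make one pass of that loop concrete, here is a small single-threaded sketch in standard C++. It is not Rotor's code: `Timer`, `fireTimers` and the integer work queue are hypothetical stand-ins, and the rescheduling rule (next firing = now + period) is an assumption made for the illustration. The point it shows is that the timer thread only queues work and computes how long to sleep; it never runs a callback itself.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <limits>
#include <vector>

// Hypothetical timer record; times are in milliseconds.
struct Timer {
    int id;
    std::uint64_t nextFire;   // absolute time of the next firing
    std::uint64_t period;     // interval between firings
};

// One pass of the timer thread: for every due timer, queue a work item
// (here just the timer id) and reschedule it. Returns how long the timer
// thread may sleep before the next timer is due.
std::uint64_t fireTimers(std::vector<Timer>& timers,
                         std::uint64_t now,
                         std::deque<int>& workQueue) {
    std::uint64_t wait = std::numeric_limits<std::uint64_t>::max();
    for (Timer& t : timers) {
        if (t.nextFire <= now) {
            workQueue.push_back(t.id);    // the callback runs later, on a worker thread
            t.nextFire = now + t.period;  // assumed rescheduling policy
        }
        wait = std::min(wait, t.nextFire - now);
    }
    return wait;
}
```

Because the pass only appends to `workQueue`, a slow callback cannot delay other timers; the cost is that nothing here orders or serializes the callbacks once worker threads pick them up.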
The timer thread only guarantees that timer callback requests are put into a queue in the thread pool (ThreadpoolMgr::QueueUserWorkItem) in the order in which the timers fire. But timer callbacks are not called in a serialized way. If a timer fires twice and there is more than one worker thread in the thread pool, there's no guarantee that the first callback will finish before the next callback starts. Therefore, it's not thread safe for timer callbacks to access shared data without locking. That's why the example in MSDN breaks: when CheckStatus is executed on multiple threads, it's possible that "if(invokeCount == maxCount)" will never be satisfied. Changing the code to this would make it more robust:
public void CheckStatus(Object stateInfo)
{
    int count = Interlocked.Increment (ref invokeCount);
    Console.WriteLine("Checking status " + count);

    if(count == maxCount)
    …
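The same idea can be tried outside the CLR. The sketch below is a standard C++ analogue, not the MSDN sample: `StatusChecker` is re-imagined with `std::atomic`, `runCallbacks` is a hypothetical driver that plays the role of several thread pool workers, and a counter stands in for `autoEvent.Set()`. Because each call gets its own unique count from the atomic increment, exactly one call observes `count == maxCount`, however the threads interleave.

```cpp
#include <atomic>
#include <thread>
#include <vector>

class StatusChecker {
public:
    explicit StatusChecker(int maxCount) : maxCount_(maxCount) {}

    // Safe to call from several threads at once.
    void checkStatus() {
        // fetch_add returns the previous value, so +1 is this call's own count.
        const int count = invokeCount_.fetch_add(1) + 1;
        if (count == maxCount_) {
            signals_.fetch_add(1);   // stands in for autoEvent.Set()
        }
    }

    int signals() const { return signals_.load(); }
    int invocations() const { return invokeCount_.load(); }

private:
    const int maxCount_;
    std::atomic<int> invokeCount_{0};
    std::atomic<int> signals_{0};
};

// Hypothetical driver: `threads` workers each invoke the callback
// `callsPerThread` times, then the number of signals is returned.
int runCallbacks(StatusChecker& checker, int threads, int callsPerThread) {
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&checker, callsPerThread] {
            for (int j = 0; j < callsPerThread; ++j) {
                checker.checkStatus();
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return checker.signals();
}
```

With a plain `++invokeCount` two workers could read the same value or skip past the target, which is the failure shown in the output earlier; with the atomic increment the threshold is hit once and only once.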
Another interesting thing about the timer implementation is that when a client thread creates a new timer, it doesn't insert the timer into the timer queue directly. Instead, it queues a user APC to the timer thread (see ThreadpoolMgr::CreateTimerQueueTimer and InsertNewTimer). A similar thing is done for updating (ThreadpoolMgr::ChangeTimerQueueTimer and UpdateTimer) and deleting a timer (ThreadpoolMgr::DeleteTimerQueueTimer and DeregisterTimer). That way, client threads don't need to synchronize to access the shared timer queue. After all, the timer thread is sleeping (alertable) most of the time.
PS: to make the race happen more easily, I made more tweaks to the MSDN sample than just adding the threadpool work items; they should be obvious to you. ;)
|
-
I've been quiet for 3 months and probably won't have much time for blogging for the next several months. Today I got a chance to update my post OutOfMemoryException and Pinning to correct a mistake pointed out by our GC architect Patrick Dussud. I also want to remind everyone that Michael Stanton posted a great article in April about how to traverse the GC heap using psscor.dll. This technique is quite useful for understanding the internals of the CLR's GC. Talking about the GC heap, I want to tell another GC heap corruption story.
My test buddy Chris Lyon debugged a customer program with me recently and found a very interesting user bug. He already posted the problem in an internal blog, and I think it's worth sharing with the public community too. The problem is in this code:
SomeStruct[] arr = new SomeStruct[SomeSize];
GCHandle handle = GCHandle.Alloc(Marshal.SizeOf(typeof(SomeStruct)) * SomeSize, GCHandleType.Pinned);
IntPtr fixedAddr = handle.AddrOfPinnedObject ();
...
//pass fixedAddr to unmanaged code to fill the array
This code tries to create a pinned GC handle for an array and then pass the array's address to unmanaged code to fill in. The code seems to assume GCHandle.Alloc has semantics similar to malloc: tell it how much memory we need and it will allocate it for us. But that's not true. GCHandle.Alloc only allocates a (pointer-sized) GC handle for a given object; it does not allocate memory for the object. The first argument of this method is supposed to be the Object which the handle will point to. This program passes an Int32 (the size of the array in bytes) to the method, but the compiler won't give us any error because of the nice, handy auto-boxing feature. At run time, CLR will box the integer into a heap object and allocate a pinned handle for this integer object, and then the address of this object will be passed to unmanaged code. The unmanaged code will copy SomeSize * sizeof(SomeStruct) bytes of data into the 4-byte boxed integer and overwrite other objects in the heap. We could just relax and wait for the program to crash.
This example looks quite trivial, but it shows how easily the GC heap can be corrupted by user error with the help of unmanaged code. One suggestion: watch out for any IntPtr you pass to unmanaged code!
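For comparison, here is a minimal sketch of what the code probably meant to do: pin the array object itself, so that AddrOfPinnedObject returns the address of the array's first element. The fields of SomeStruct, the element count of 16, and the use of Marshal.WriteInt32 as a stand-in for the unmanaged call are all assumptions made for illustration; they are not from the customer's program.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical blittable struct; the original post does not show its fields.
[StructLayout(LayoutKind.Sequential)]
public struct SomeStruct
{
    public int Id;
    public int Value;
}

public static class PinnedArrayDemo
{
    public const int SomeSize = 16; // assumed element count

    public static SomeStruct[] FillPinned()
    {
        SomeStruct[] arr = new SomeStruct[SomeSize];

        // Pin the array object itself, not a boxed integer holding its size.
        GCHandle handle = GCHandle.Alloc(arr, GCHandleType.Pinned);
        try
        {
            IntPtr fixedAddr = handle.AddrOfPinnedObject();

            // Stand-in for the unmanaged call: write the first int field of element 0.
            Marshal.WriteInt32(fixedAddr, 42);
        }
        finally
        {
            // Release the handle as soon as unmanaged code is done with the buffer.
            handle.Free();
        }
        return arr;
    }

    public static void Main()
    {
        Console.WriteLine(FillPinned()[0].Id);
    }
}
```

Freeing the handle in a finally block keeps the pin short even if the call fails, which also matters for heap fragmentation.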
This posting is provided "AS IS" with no warranties, and confers no rights.
|
-
Objects in CLR are usually managed by the runtime in the GC heap; user code does not have direct access to the objects. CLR's reliability and type safety rely heavily on this fact. But CLR also supports InterOp features like COM InterOp, IJW and PInvoke to allow unmanaged code to touch managed objects directly. It's very dangerous to use those InterOp techniques without caution. The potential problems include:
- Security. Unmanaged code could bypass the runtime to access any controlled resources and change any state of a managed program. That's why calling unmanaged code requires very high security permission.
- Reliability. Unmanaged code could access managed objects (and thus the GC heap) directly. So it could corrupt the GC heap easily, which is a fatal error for CLR. Ideally, in a pure managed world, all bugs in user code should result in an exception or error code; but with mistakes in an InterOp operation, you could find the program just crashes with no clues. Because the corruption happens deep inside CLR, the problem would be very hard for users to debug.
InterOp is quite a complicated topic, so it's better for programmers to have a deep understanding before writing such code. I would highly recommend Adam Nathan's book “.NET and COM – The Complete Interoperability Guide”, which is the bible of CLR InterOp.
Among the three basic InterOp techs, PInvoke seems to be much simpler than the other two. Most of the time it requires only declaring and calling, like using any managed function. But I want to remind people of some gotchas of PInvoke, because:
- PInvoke calls into a pure unmanaged DLL, which has almost no self-description data for CLR. So there's not much validation CLR could perform at compile time and run time. (Performance is another reason for little validation at run time.)
- It's so easy to use that sometimes people forget the danger built into its unmanaged nature. :)
The first topic I want to cover is string immutability in PInvoke. In one of his wonderful blog posts, Chris Brumme talked about why immutability is important for strings, especially interned strings. He gave an example of how a string's contents could be changed in place with a PInvoke call:
using System;
using System.Runtime.InteropServices;

public class Class1
{
    static void Main(string[] args)
    {
        String computerName = "strings are always immutable";
        String otherString = "strings are always immutable";
        int len = computerName.Length;
        GetComputerName(computerName, ref len);
        Console.WriteLine(otherString);
    }

    [DllImport("kernel32", CharSet=CharSet.Unicode)]
    static extern bool GetComputerName(
        [MarshalAs (UnmanagedType.LPWStr)] string name,
        ref int len);
}
This example demos that when you change contents of one interned string, another string might be changed unintentionally. I want to add one more point: string's hash code is calculated using its contents, so changing a string's contents would change its hash code, which violates the invariant that an object's hash code should not change over its lifetime. It's very hard to know if the string you are modifying is used as a hash key anywhere in CLR or user's code. As a matter of fact, in V1.0 and V1.1 CLR stores all interned strings in a hash table across AppDomains. So if an interned string's contents are changed, it could mess up the underlying hash table, and cause CLR to crash mysteriously during later AppDomain loading/unloading.
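The hash-code point can be illustrated without touching a real string. The sketch below uses a hypothetical MutableKey class (not part of the CLR or of Chris's example) whose hash code depends on its contents, standing in for a string that unmanaged code modifies in place. Once the contents change, a table that filed the key under its old hash code can no longer find it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on mutable contents,
// standing in for a string that unmanaged code modifies in place.
public sealed class MutableKey
{
    public int Contents;

    public MutableKey(int contents) { Contents = contents; }

    public override int GetHashCode() { return Contents; }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Contents == Contents;
    }
}

public static class HashKeyDemo
{
    public static bool LookupAfterMutation()
    {
        Dictionary<MutableKey, string> table = new Dictionary<MutableKey, string>();
        MutableKey key = new MutableKey(1);
        table[key] = "interned";

        // The contents, and therefore the hash code, change in place.
        key.Contents = 2;

        // The entry was filed under the old hash code, so even the very
        // same key object no longer finds it.
        return table.ContainsKey(key);
    }

    public static void Main()
    {
        Console.WriteLine(LookupAfterMutation());
    }
}
```

The entry is still in the table but unreachable by lookup, which is the same kind of damage a modified interned string could do to a hash table keyed on it.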
As mentioned in Chris's blog and Adam's book, the solution is to declare a PInvoke to use StringBuilder for any string parameters it might modify. A common pitfall here is that people might believe that if an unmanaged function requires an LPTSTR as a parameter, the interface should be declared to use StringBuilder in managed code, and that if it's an LPCTSTR (or any other constant character pointer), it's safe to declare it as a String. This is not 100% correct because unmanaged code could always cast a constant pointer to a mutable one. For example, Windows GDI's DrawText API is declared as:
int DrawText(
HDC hDC, // handle to DC
LPCTSTR lpString, // text to draw
int nCount, // text length
LPRECT lpRect, // formatting dimensions
UINT uFormat // text-drawing options
);
The lpString here looks to be constant, but the MSDN description says:
lpString
[in] Pointer to the string that specifies the text to be drawn. If the nCount parameter is –1, the string must be null-terminated.
If uFormat includes DT_MODIFYSTRING, the function could add up to four additional characters to this string. The buffer containing the string should be large enough to accommodate these extra characters.
In Appendix E of his book, Adam suggests the managed declaration of this function could be:
[DllImport("User32", CharSet=CharSet.Auto)]
static extern int DrawText (IntPtr hDC, string lpString, int nCount, ref RECT lpRect, uint uFormat);
This is right most of the time, but a usage like this could cause a problem:
string text = "Hello world!";
DrawText (hDC, text, text.Length, ref rect, DT_END_ELLIPSIS | DT_MODIFYSTRING);
If the function tries to append some additional characters to the string, it will write out of range of this string object and overwrite other objects in the managed heap, which would corrupt the heap and might crash CLR eventually.
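One way to avoid the overrun is to declare the string parameter as a StringBuilder and give it spare capacity. The following is only a sketch under stated assumptions: the StringBuilder-based DrawText declaration and the MakeModifiableBuffer helper are illustrative and not taken from Adam's book, and the four extra characters come from the MSDN text quoted above. The DT_* values are the ones defined in the Windows headers.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class DrawTextInterop
{
    [StructLayout(LayoutKind.Sequential)]
    public struct RECT
    {
        public int Left, Top, Right, Bottom;
    }

    public const uint DT_END_ELLIPSIS = 0x00008000;
    public const uint DT_MODIFYSTRING = 0x00010000;

    // StringBuilder tells the marshaler that the callee may write to the buffer.
    [DllImport("user32", CharSet = CharSet.Auto)]
    public static extern int DrawText(IntPtr hDC, StringBuilder lpString, int nCount,
                                      ref RECT lpRect, uint uFormat);

    // Hypothetical helper: copy the text into a buffer with room for the
    // up-to-four extra characters that DT_MODIFYSTRING may add.
    public static StringBuilder MakeModifiableBuffer(string text)
    {
        return new StringBuilder(text, text.Length + 4);
    }

    public static void Main()
    {
        StringBuilder buffer = MakeModifiableBuffer("Hello world!");
        Console.WriteLine(buffer.Capacity);

        // With a real device context (Windows only) the call would look like:
        // DrawText(hDC, buffer, buffer.Length, ref rect, DT_END_ELLIPSIS | DT_MODIFYSTRING);
    }
}
```

After such a call, buffer.ToString() would return whatever text the function left in the buffer, so the caller sees the modification instead of having it scribbled over a neighboring object.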
The point I want to make is that PInvoke might look simple, but it's still dangerous (because all unmanaged code is dangerous for CLR :)). Currently CLR can't check the correctness of a PInvoke signature automatically. Instead, we have some run-time debugging tools like Customer Debug Probes (CDPs) for some particular checks around PInvoke. Adam's blog has very good coverage of them. In the long run we may be able to do some analysis using static tools, but there's no specific plan at this time. So to avoid any PInvoke problem, users need to understand it very well and read the documentation of the unmanaged function carefully. If you are experiencing any unexplainable crash in a .NET program, it might be worth checking all the PInvokes in the code.
|
-
An example of FCall
My friend Joel Pobar had a great post demonstrating how to add new code to Rotor which exposes more EE (Execution Engine) internal information to the managed world. This is a very good example that covers both BCL and EE, and how the two parts interact with each other. As shown in this example, BCL code could call into EE through a special type of method called “FCall”, like this:
FCIMPL1(MethodBody *, COMMember::GetMethodBody, MethodDesc **ppMethod)
{
    MethodDesc* pMethod = *ppMethod;
    METHODBODYREF MethodBodyObj = NULL;

    HELPER_METHOD_FRAME_BEGIN_RET_0();
    GCPROTECT_BEGIN(MethodBodyObj);

    TypeHandle thMethodBody(g_Mscorlib.FetchClass(CLASS__METHOD_BODY));
    MethodBodyObj = (METHODBODYREF)AllocateObject(thMethodBody.GetMethodTable());

    Module* pModule = pMethod->GetModule();
    COR_ILMETHOD_DECODER MethodILHeader(pMethod->GetILHeader(), pModule->GetMDImport(), TRUE);
    MethodBodyObj->maxStackSize = MethodILHeader.GetMaxStack();

    GCPROTECT_END();
    HELPER_METHOD_POLL();
    HELPER_METHOD_FRAME_END();

    return (MethodBody*)OBJECTREFToObject(MethodBodyObj);
}
FCIMPLEND
As I said before, I'd try to explain some Rotor code in my blog, so let me start by analyzing a small problem in that piece of code. I won't call it a bug because it happens to be harmless here, but it does violate some CLR coding rules.
Some basic bricks of an FCall
First let's take a glance at those amazing macros:
- FCIMPL1/FCIMPLEND: defined in vm/fcall.h. They just tweak the calling convention between managed code (BCL) and unmanaged code (EE). You can see different flavors of FCIMPL, which serve calls with different argument numbers or types (to match the argument passing and enregistering rules in the managed world).
- HELPER_METHOD_FRAME_BEGIN_RET_0/HELPER_METHOD_FRAME_END: defined in vm/fcall.h. For operations like GC (Garbage Collection) and EH (Exception Handling) to work correctly, some frames have to be set up on the stack, especially at the boundary between the managed part and the unmanaged part. Frames are a topic that could take a blog entry by itself, so I won't cover it too much here. What you need to know now is that, for performance reasons, an FCall doesn't set up a frame by default. So when an FCall wants to throw an exception or allow a GC to happen, it has to set up a HelperMethodFrame (vm/frames.h) first. This job is done by the HELPER_METHOD_FRAME_BEGIN* macros. HELPER_METHOD_FRAME_END tears the frame down from the stack. All GCs and exception throwing have to happen in the range guarded by this frame. In the code above, some operations could trigger a GC (at least AllocateObject could do so; I'm not sure about the others, and it would be a hard job to trace into each code path to find out whether a GC could happen), so a HelperMethodFrame has to be established before that call.
- GCPROTECT_BEGIN/GCPROTECT_END: defined in vm/frames.h. Similar to HELPER_METHOD_FRAME_BEGIN*, GCPROTECT_BEGIN is used to set up a GCFrame (vm/frames.h) and GCPROTECT_END pops the frame. When a GC happens, it needs to find all object references on the stack to trace which objects in the managed heap are still alive, and when it moves an object (to compact the heap) it needs to update the references on the stack with the new location of the object. For managed code, the JIT generates all the information needed by GC. But in the unmanaged part of CLR, whatever code holds references to managed objects is responsible for reporting those references itself. A GCFrame serves this purpose. If an unmanaged method pushes a GCFrame onto the stack, the frame will report the protected reference (the argument to GCPROTECT_BEGIN) during GC. In our example, MethodBodyObj is an object reference, so we set up a GCFrame for it.
- HELPER_METHOD_POLL: defined in vm/fcall.h. This macro is meant to do a GC poll within the range of a HelperMethodFrame. GC polling is another complicated thing I don't want to talk about here. Basically it allows a GC to happen in another thread. Without a poll, another thread that wants to perform a GC might be blocked, and thus all managed threads in the application will be blocked.
- OBJECTREFToObject: defined in vm/vars.hpp. It's used to take a pure object pointer out of an ObjectRef. An ObjectRef is a naked pointer in the free build, but a wrapper with some very useful checking in the debug build.
The problem
1. GC hole. In COMMember::GetMethodBody, a HELPER_METHOD_POLL is put after GCPROTECT_END. As I explained above, GCPROTECT_END pops the GC frame which is protecting the object reference, but HELPER_METHOD_POLL allows a GC to happen in another thread. So there is a chance (although very small) that after the GC frame is popped, another thread performs a GC. In such a GC, MethodBodyObj won't be reported. So GC might not know the object referenced by MethodBodyObj is still alive (if there's no other reference to the object) and collect it; or (if there are other references) GC might move the object but not update MethodBodyObj with the new address, so MethodBodyObj would hold a “stale” object pointer. In either case, COMMember::GetMethodBody will return a bogus object and the program might crash later in an unexpected way. We call this kind of error a “GC hole” in CLR. They are hard to detect because GC is non-deterministic. The funny thing is that nothing bad would happen in this method because HELPER_METHOD_POLL is actually defined as a no-op in this version:
// This is the fastest way to do a GC poll if you have already erected a HelperMethodFrame
// #define HELPER_METHOD_POLL() { __helperframe.Poll(); INDEBUG(__fCallCheck.SetDidPoll()); }
#define HELPER_METHOD_POLL() { }
I don't know why we use an empty macro for HELPER_METHOD_POLL, but I'm sure it's supposed to be the version that is commented out. In later versions we may uncomment the above line to make HELPER_METHOD_POLL take effect. So although this FCall doesn't cause any trouble for now, it might later. The corrected version should be:
HELPER_METHOD_FRAME_BEGIN_RET_0();
GCPROTECT_BEGIN(MethodBodyObj);
...
HELPER_METHOD_POLL();
GCPROTECT_END();
HELPER_METHOD_FRAME_END();
2. If you dig deep into the code of HELPER_METHOD_FRAME_BEGIN*, you will find that those macros do a GC poll themselves. So unless the FCall does some very time-consuming work, there's no need for another poll. Thus a refined version of our sample would be:
HELPER_METHOD_FRAME_BEGIN_RET_0();
GCPROTECT_BEGIN(MethodBodyObj);
...
GCPROTECT_END();
HELPER_METHOD_FRAME_END();
3. Because setting up a HelperMethodFrame usually means the code wants to allow GC, for convenience we have versions of HELPER_METHOD_FRAME_BEGIN* that protect object references. Then a GCFrame is not needed. So the FCall could be written this way:
HELPER_METHOD_FRAME_BEGIN_RET_1(MethodBodyObj);
...
HELPER_METHOD_FRAME_END();
I hope you already got a taste of how FCall works in CLR by reading this blog. Actually most of the things I talked about here can be found in the comments at the beginning of vm/fcall.h. And if you want to see more examples of FCall, just search for FCIMPL in the vm directory; you will get plenty of them. Then you will see how CLR builds a beautiful object-oriented world in an old-fashioned and kinda dirty way.
|
-
I've seen people call the OS's ExitThread in managed applications via PInvoke to exit a managed thread, like this:
[DllImport("Kernel32.dll")]
public static extern void ExitThread(int exitCode);

public static void Run()
{
    ...
    // calling OS's ExitThread to exit the current thread
    ExitThread(0);
}

public static void Main()
{
    ThreadStart threadStart = new ThreadStart(Run);
    Thread thread = new Thread(threadStart);
    thread.Start();
    ...
}
I guess when unmanaged code is ported to managed code, people tend to translate every system call to a PInvoke. But because CLR provides another layer over the OS, some low-level system calls don't make sense in the managed world. ExitThread is one of them, because managed threads are not equivalent to the OS's native threads. The above code is wrong for 2 reasons:
- CLR has some cleanup work (like stack unwinding) to do when a managed thread exits. CLR knows a managed thread is exiting when the thread procedure (like Run above) returns or a ThreadAbortException is thrown. Calling the OS's ExitThread (or even worse, TerminateThread) would bypass all backout code on the stack (such as destructors and finally blocks) and leave the program in an unspecified state.
- There is no guarantee about how CLR maps a managed/logical thread to an OS/physical thread. For example, several managed threads could be mapped to one OS thread, so calling ExitThread in one managed thread might kill other managed threads unintentionally.
IMHO, calling the system's ExitThread in a managed program is almost always a mistake. Instead, we could just return from the thread procedure or call Thread.Abort.
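As a minimal sketch of the managed alternative, the thread below ends simply by returning from its thread procedure, which lets CLR unwind the stack and run the finally block. The class name and the CleanupRan flag are hypothetical and exist only to make the effect observable.

```csharp
using System;
using System.Threading;

public static class ThreadExitDemo
{
    // Hypothetical flag, used only to show that the backout code ran.
    public static volatile bool CleanupRan;

    public static void Run()
    {
        try
        {
            // ... do the thread's work ...

            // Leave the thread by returning from the thread procedure
            // instead of calling the OS's ExitThread.
            return;
        }
        finally
        {
            // Backout code still runs because CLR unwinds the stack normally.
            CleanupRan = true;
        }
    }

    public static void Main()
    {
        Thread thread = new Thread(new ThreadStart(Run));
        thread.Start();
        thread.Join();
        Console.WriteLine(CleanupRan);
    }
}
```

Had Run called ExitThread through PInvoke instead of returning, the finally block would have been skipped along with the rest of CLR's thread cleanup.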
|
-
As you all know, in CLR memory management is done by the garbage collector (GC). When the GC can't find memory in the preallocated memory chunk (the GC heap) for new objects and can't reserve enough memory from the OS to expand the GC heap, it throws OutOfMemoryException (OOM).
The problem
From time to time, I've heard complaints about OOM: people analyze code and monitor memory usage, and find out that sometimes their .NET applications throw OOM when there's enough free memory. In most cases I've seen, the problems are:
- The virtual address space of the process is fragmented. This is usually caused by some unmanaged components in the application. This issue has existed in the unmanaged world for a long time, but it could hit GC hard. The GC heap is managed in units of segments, whose size is 16MB for the workstation version and 32MB for the server version in V1.0 and V1.1. That means when CLR needs to expand the GC heap, it has to find 32MB of consecutive free virtual memory for a server application. Usually this is not a problem in a system with 2GB of address space for user mode. But if there are some unmanaged DLLs in the application manipulating virtual memory carelessly, the virtual address space could be divided into small blocks of free and reserved memory. Thus GC would fail to find a big enough piece of free memory although the total free memory is enough. This kind of problem can be found by looking through the whole virtual address space to see which block is reserved by which component.
- The GC heap itself is fragmented, meaning GC can't allocate objects in already reserved segments which actually have enough free space inside. I want to focus on this problem in this blog.
A glance at the GC heap
Usually the managed heap shouldn't suffer from fragmentation because the heap is compacted during GC. Below is an oversimplified model of CLR's GC heap:
|---------|
|free     |
|_________|
|Object B |
|         |
|_________|
|Object A |
|_________|
| ...     |

|---------|
|free     |
|_________|
|Object C |
|_________|
|Object B |
|         |
|_________|
|Object A |
|_________|
| ...     |

|---------|
|Object C | (marked)
|_________|
|Object B |
|         |
|_________|
|Object A | (marked)
|_________|
| ...     |
- After GC, the heap is compacted: live (reachable) objects are relocated and dead (unreachable) objects are swept out.
|---------|
|free |
|_________|
|Object C |
|_________|
|Object A |
|_________|
| ... |
Free space in GC heap
In the above model, you can see that GC actually does a good job of defragmenting the heap. Free space is always at the top of the heap and available for new allocation. But in real production, free space could reside among allocated objects. That is because:
- Sometimes GC could choose not to compact part of the heap when it's not necessary. Since relocating all objects could be expensive, GC might avoid doing so under some conditions. In that case, GC will keep a list of free space in the heap for future compaction. This won't cause heap fragmentation because GC has full control over the free space. GC could fill up those blocks anytime later when necessary.
- Pinned objects are not movable. So if a pinned object survives a GC, it could create a block of free space, like this:
before GC:                          after GC:
|---------|                         |---------|
|Object C | (pinned, reachable)     |Object C | (pinned)
|_________|                         |_________|
|Object B | (unreachable)           | free    |
|         |                         |         |
|_________|                         |_________|
|Object A | (reachable)             |Object A |
|_________|                         |_________|
| ...     |                         | ...     |
How pinning could fragment GC heap
If an application keeps pinning objects in this pattern: pin a new object, do some allocation, pin another object, do some allocation ... and all pinned objects remain pinned for a long time, a lot of free space will be created, as shown below:
|---------|
|free     |
|_________|
|Pinned 1 |
|_________|
|Object A |
|_________|
| ...     |

|---------|
|free     |
|_________|
|Pinned 2 |
|_________|
| ...     |
|_________|
|Pinned 1 |
|_________|
|Object A |
|_________|
| ...     |

|_________|
|Pinned n |
|_________|
| ...     |
|_________|
|Pinned 2 |
|_________|
| ...     |
|_________|
|Pinned 1 |
|_________|
|Object A |
|_________|
| ...     |

|_________|
|Pinned n |
|_________|
| free    |
|_________|
|Pinned 2 |
|_________|
| free    |
|_________|
|Pinned 1 |
|_________|
| free    |
|_________|
| ...     |
Such a process could create a GC heap with a lot of free slots. Those free slots are partially reused for allocation, but when they are too small or when their remainder is too small, GC can't use them as long as the objects are pinned. This would prevent GC from using the heap efficiently and might cause OOM eventually.
One thing that makes the situation worse is that although a developer may not use pinned objects directly, some .NET libraries use them under the hood, for example for asynchronous I/O. In V1.0 and V1.1 the buffer passed to Socket.BeginReceive is pinned by the library so that unmanaged code can access the buffer. Consider a socket server application which handles thousands of socket requests per second, where each request could take several minutes because of a slow connection: the GC heap could become heavily fragmented because of the large number of pinned objects and the long time some of them stay pinned, and then OOM could happen.
How to diagnose the problem
To determine if GC heap is fragmented, SOS is the best tool. Sos.dll is a debugger extension shipped with .NET framework which could check some underlying data structure in CLR. For example, “DumpHeap” could traverse GC heap and dump every object in the heap like this:
0:000>!dumpheap
Address MT Size
00a71000 0015cde8 12 Free
00a7100c 0015cde8 12 Free
00a71018 0015cde8 12 Free
00a71024 5ba58328 68
00a71068 5ba58380 68
00a710ac 5ba58430 68
00a710f0 5ba5dba4 68
...
00a91000 5ba88bd8 2064
00a91810 0019fe48 2032 Free
00a92000 5ba88bd8 4096
00a93000 0019fe48 8192 Free
00a95000 5ba88bd8 4096
...
total 1892 objects
Statistics:
MT Count TotalSize Class Name
5ba7607c 1 12 System.Security.Permissions.HostProtectionResource
5ba75d54 1 12 System.Security.Permissions.SecurityPermissionFlag
5ba61f18 1 12 System.Collections.CaseInsensitiveComparer
...
0015cde8 6 10260 Free
5ba57bf8 318 18136 System.String
...
In this example, “DumpHeap” shows that there are 3 small free slots (they appear as special “Free” objects) at the beginning of the heap, followed by some objects with a size of 68 bytes. More interestingly, the statistics show that there are 10,260 bytes of Free objects (free space among live objects) and 18,136 bytes of strings in total in the heap. If you find that the Free objects take a very big percentage of the heap, the heap is fragmented (in Whidbey, “DumpHeap” will do more analysis of heap fragmentation). In this case, you want to check the objects near the free space to see what they are and who holds their roots. You could do that using “DumpObj” and “GCRoot”:
0:000>!dumpobj 00a92000
Name: System.Byte[]
MethodTable 0x00992c3c
EEClass 0x00992bc4
Size 4096(0x1000) bytes
Array: Rank 1, Type System.Byte
Element Type: System.Byte
0:000>!gcroot 00a92000
Scan Thread 0 (728)
Scan Thread 1 (730)
ESP:88cf548:Root:05066b48(System.IO.MemoryStream)->00a92000 (System.Byte[])
ESP:88cf568:Root:05066b48(System.IO.MemoryStream)->00a92000 (System.Byte[])
...
Scan HandleTable 9b130
Scan HandleTable 9ff18
HANDLE(Pinned):d41250:Root: 00a92000 (System.Byte[])
This shows that the object at address 00a92000 is a byte array; it's rooted by local variables in thread 1 (to be precise, !GCRoot's output for roots on the stack can't be trusted) and by a pinned handle.
And the command "ObjSize" lists all handles including pinned ones:
0:000>!objsize
...
HANDLE(Pinned):d41250: sizeof(00a92000) = 4096 ( 0x1000) bytes (System.Byte[])
HANDLE(Pinned):d41254: sizeof(00a95000) = 4096 ( 0x1000) bytes (System.Byte[])
HANDLE(Pinned):d41258: sizeof(00ac8b5b0) = 16 ( 0x10) bytes (System.Byte[])
...
Using those Sos commands, you could get a clear picture if the heap is fragmented and how. I believe Michael will have more blogs about details of Sos.
Solution
In Everett a lot of work was done in GC to recognize fragmentation caused by pinning and alleviate the situation, and more work has already been done in Whidbey. So hopefully, the problem won't show up in Whidbey. But besides changes in the platform, user code could do something to avoid the issue too. From the above analysis, we could tell:
- If the pinned objects are allocated around the same time, the free slots between each two objects would be smaller, and the situation is better.
- If pinning happens on older objects, it could cause fewer problems, because older objects live at the bottom of the heap but most of the free space is generated at the top of the heap.
- The shorter the time objects are pinned, the more easily GC can compact the heap.
So if pinning becomes an issue which causes OOM for a .NET application, instead of creating a new object to pin every time, developers could consider preallocating the to-be-pinned objects and reusing them. That way those objects would live close to each other in the older part of the GC heap and the heap won't be fragmented that much. For example, if an application keeps pinning 1K buffers (consider the socket server case), we could use a buffer pool like this to get the buffers:
using System.Collections;

public class BufferPool
{
    private const int INITIAL_POOL_SIZE = 512; // initial size of the pool
    private const int BUFFER_SIZE = 1024;      // size of the buffers

    // pool of buffers
    private Queue m_FreeBuffers;

    // singleton instance
    private static BufferPool m_Instance = new BufferPool();

    // singleton property
    public static BufferPool Instance
    {
        get { return m_Instance; }
    }

    protected BufferPool()
    {
        // preallocate all buffers together so they sit close to each other in the heap
        m_FreeBuffers = new Queue(INITIAL_POOL_SIZE);
        for (int i = 0; i < INITIAL_POOL_SIZE; i++)
        {
            m_FreeBuffers.Enqueue(new byte[BUFFER_SIZE]);
        }
    }

    // check out a buffer; all buffers are BUFFER_SIZE bytes, so size is currently ignored
    public byte[] Checkout(uint size)
    {
        // take the lock before reading Count: Queue is not safe for unsynchronized concurrent access
        lock (m_FreeBuffers)
        {
            if (m_FreeBuffers.Count > 0)
                return (byte[])m_FreeBuffers.Dequeue();
        }
        // instead of creating a new buffer,
        // blocking and waiting or refusing the request may be better
        return new byte[BUFFER_SIZE];
    }

    // check in a buffer
    public void Checkin(byte[] buffer)
    {
        lock (m_FreeBuffers)
        {
            m_FreeBuffers.Enqueue(buffer);
        }
    }
}
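Pooling addresses where the pinned buffers live; the third point above (keep pins short) is up to the caller. The sketch below, under stated assumptions, pins a preallocated buffer only for the duration of the call that needs a fixed address. The ShortPinDemo class, the single static buffer, and Marshal.WriteByte standing in for the unmanaged call are all hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ShortPinDemo
{
    // Allocated once up front, so it tends to age into the older part of
    // the heap before it is ever pinned (the 1K size is an assumption).
    private static readonly byte[] s_buffer = new byte[1024];

    public static byte FillFirstByte(byte value)
    {
        GCHandle handle = GCHandle.Alloc(s_buffer, GCHandleType.Pinned);
        try
        {
            // Stand-in for the unmanaged call that would fill the buffer.
            Marshal.WriteByte(handle.AddrOfPinnedObject(), value);
        }
        finally
        {
            // Unpin immediately so GC is free to compact around the buffer again.
            handle.Free();
        }
        return s_buffer[0];
    }

    public static void Main()
    {
        Console.WriteLine(FillFirstByte(7));
    }
}
```

In the socket server case the library controls the pin, so the application can only influence which buffer gets pinned; that is where a pool like the one above helps.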
This posting is provided "AS IS" with no warranties, and confers no rights. Use of included samples is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm