In a previous post, I outlined to all you profiler writers how to modify your profiler so it can attach to running processes, and what sorts of limitations your profiler will have when it attaches. In this post, I answer the question, “My profiler is attached. What should it do next?”
A profiler that loads on startup of an application has the option to know the entire history of that application. By requesting the appropriate callback events, the profiler can know all the classes and modules that have loaded, functions that have JITted, objects that have been allocated, etc. However, a profiler that loads by attaching to an already-running application is a bit like Dorothy who lands in the middle of Oz and has no idea what’s going on. She doesn’t have the luxury of arriving at the beginning of time, and watching everyone from the moment of their birth. She runs into people after they’re fully grown, and is expected to deal gracefully—often by making friends with them. It would not be socially acceptable for Dorothy to encounter an access violation upon meeting someone new.
Drawing by Magdalena Hermawan
There are two fundamental ways your profiler can catch up on the current state of an application:

1. Lazy catch-up: look up whatever information your profiler needs about each ID the first time the profiler encounters that ID.
2. Enumeration: at attach time, use the catch-up enumeration APIs to iterate through all IDs of a given kind that are currently loaded.
Lazy catch-up is fairly self-explanatory. For example, if your sampling profiler encounters an IP in a FunctionID you’ve never seen before, just look up whatever info you need about that FunctionID the first time you encounter it, rather than assuming you’d already built up a cache when the function was first JITted. And if you discover that FunctionID resides in a module you’ve never seen before, then just look up whatever info you need about that ModuleID at that point, rather than assuming you already have a complete cache of all modules. Many of you are already doing something like this today if you support sampling against regular NGENd images (since you don’t get JIT notifications of those functions anyway).
Enumeration, on the other hand, has some caveats and is worthwhile to describe in more detail.
Some kinds of IDs have new enumerator methods as part of the profiling API. In particular:

- ICorProfilerInfo3::EnumModules returns an enumerator over the currently-loaded ModuleIDs.
- ICorProfilerInfo3::EnumJITedFunctions returns an enumerator over the FunctionIDs that have been JIT-compiled.
Your profiler calls these methods, and they return a standard enumerator you use to iterate through all of the currently-loaded IDs of that type. It’s worth noting that EnumJITedFunctions only enumerates FunctionIDs for which you would receive JITCompilationStarted/Finished events, and will not include FunctionIDs from NGENd modules.
The primary caveat with using these enumerators is that you’re iterating through a snapshot of the IDs while the process is live and running. (Imagine Dorothy looking through a copy of birth records while babies are still being born in Oz; those babies aren’t in the copy of the records Dorothy is reading.) This means there are races your profiler needs to be resilient to.
As you may recall, once your profiler is attached to the process, the CLR calls InitializeForAttach() on your profiler. After your profiler returns from InitializeForAttach(), the CLR turns on callbacks into your profiler. So if your profiler requested COR_PRF_MONITOR_MODULE_LOADS (by calling SetEventMask() at some point inside your implementation of InitializeForAttach), then as modules start loading and unloading after InitializeForAttach() returns, your profiler will receive the corresponding events. The thing is, “after InitializeForAttach() returns” is a vague phrase. And modules can load or unload at totally arbitrary times with respect to the timing of when your profiler attaches and calls EnumModules(). The thing to avoid here is a hole: a ModuleID your profiler does not find in the enumeration, and for which your profiler receives no ModuleLoad event. This can happen if your profiler calls the enumeration API too soon (i.e., before CLR has enabled event callbacks for your profiler).
Bad timeline (loading; enumerating too soon):

1. Profiler attaches; CLR calls InitializeForAttach()
2. Profiler calls EnumModules() from inside InitializeForAttach() (too soon; the snapshot is taken now)
3. Module starts to load (so its ModuleID is not in the snapshot)
4. CLR marks the ModuleID as enumerable
5. CLR would issue the ModuleLoadFinished event, but event callbacks are not yet enabled, so the profiler hears nothing
6. InitializeForAttach() returns; CLR enables event callbacks
The problem is that the profiler calls EnumModules too early. If your profiler only calls EnumModules after the CLR enables events, you’re assured of seeing each ModuleID via either EnumModules or a ModuleLoad event. In the scenario above, the profiler might as well never have enumerated at all: it is never notified of the ModuleID, and will first come across that ModuleID “in action” later on. It gets even worse for modules that unload:
Bad timeline (unloading; enumerating too soon):

1. Profiler attaches; CLR calls InitializeForAttach()
2. Profiler calls EnumModules() from inside InitializeForAttach() (too soon; the snapshot includes the ModuleID)
3. Module starts to unload
4. CLR marks the ModuleID as no longer enumerable
5. CLR would issue the ModuleUnloadStarted event, but event callbacks are not yet enabled, so the profiler hears nothing
6. InitializeForAttach() returns; CLR enables event callbacks
In the above case, the profiler discovers a ModuleID via EnumModules, but has no idea that the module is now in the process of unloading. So the profiler might query information about the stale ModuleID, potentially causing an AV. Again, this happens because the profiler called the enumeration API too soon (i.e., before the CLR enabled event callbacks).
The solution is for the profiler to call enumeration APIs only after events have been enabled. Since events are enabled at some point “after InitializeForAttach() returns”, it was necessary for the CLR to provide a new API to notify the profiler that event callbacks have actually been enabled: ICorProfilerCallback3::ProfilerAttachComplete(). The best place for your profiler to call the enumeration APIs is inside its implementation of ProfilerAttachComplete. Since events are enabled just before the CLR calls ProfilerAttachComplete, your profiler is assured that events are enabled by the time it calls the enumeration API (from inside ProfilerAttachComplete). This eliminates any potential holes in catch-up information your profiler queries.
When your profiler calls the Enum* methods, the CLR creates a snapshot of all “enumerable” IDs of the specified type, and gives your profiler an enumerator over that snapshot. In the CLR we had a choice: an ID could become “enumerable” either before or after the corresponding load-finished event (or JITCompilationFinished event) would normally be issued. Consider for a moment the option we rejected: making IDs enumerable only after the event. That would have led to holes. A profiler could have attached and grabbed an enumeration in the middle, and never been notified about the ID.
Bad timeline (loading):

1. Module starts to load
2. CLR issues the ModuleLoadFinished event (no profiler attached yet, so no one receives it)
3. Profiler attaches; CLR calls InitializeForAttach()
4. CLR enables event callbacks and calls ProfilerAttachComplete()
5. Profiler calls EnumModules() (ModuleID not yet enumerable, so it is absent from the snapshot)
6. CLR marks the ModuleID as enumerable
Because 2 comes before 6, it’s possible for a profiler to attach and grab an enumeration in the middle, and thus never hear about a ModuleID (even though the profiler avoided Race #1 from the previous section). Again, an even worse problem occurs for module unloading. Suppose the CLR were to change an ID’s enumerable status to false after sending the unload event. That would also lead to holes:
Bad timeline (unloading):

1. Module starts to unload
2. CLR begins unload work; the ModuleID is still enumerable
3. CLR issues the ModuleUnloadStarted event (no profiler attached yet, so no one receives it)
4. Profiler attaches; CLR calls InitializeForAttach()
5. CLR enables event callbacks and calls ProfilerAttachComplete()
6. Profiler calls EnumModules() (ModuleID still enumerable, so it appears in the snapshot)
7. CLR marks the ModuleID as no longer enumerable
Because 3 comes before 7, a profiler could attach in the middle, grab an enumeration, discover the ModuleID via the enumeration, and have no idea that the module was in the process of unloading. If the profiler were to use that ModuleID later on, an AV could result. The above led to the following golden rule: any change in an ID’s enumerable status happens before the corresponding event is issued.
In other words, an ID becomes enumerable before the LoadFinished (or JITCompilationFinished) event, and an ID ceases to be enumerable before the UnloadStarted event. Or you can think of it as, “the event is always last.” This eliminates any potential holes. To be even more explicit, here’s the enumerability vs. event ordering:

- Load: the ID becomes enumerable first; the LoadFinished (or JITCompilationFinished) event comes after.
- Unload: the ID ceases to be enumerable first; the UnloadStarted event comes after.
If an ID’s module is loaded, the profiler will discover the ID via the enumerator, a LoadFinished event, or both. If the module is unloading, the profiler will find the ID absent from the enumerator, see an UnloadStarted event, or both. In all cases, the event is more recent, so the profiler should always trust an event over an enumeration that was generated prior. (More on that last point later.)
The astute reader will notice that what we’ve done here is trade one race for another. We’ve eliminated holes, but the cost is that the profiler must deal with duplicates. For example:
Good timeline (loading with duplicate):

1. Module starts to load
2. CLR marks the ModuleID as enumerable
3. Profiler attaches; CLR calls InitializeForAttach()
4. CLR enables event callbacks and calls ProfilerAttachComplete()
5. Profiler calls EnumModules() (ModuleID appears in the snapshot)
6. CLR issues the ModuleLoadFinished event (profiler receives it: the duplicate)
At first it might seem a little strange. The enumerator contains the ModuleID, so the profiler sees that the module is loaded. But then the profiler receives a ModuleLoadFinished event, which might seem odd, since the enumerator implied the module was already loaded. This is what I mean by “duplicate”—the profiler is notified of a ModuleID twice (once via the enumeration, and once via the event). The profiler will need to be resilient to this. Although it’s a bit awkward, it’s better than the alternative of a hole, since the profiler would have no way to know the hole occurred. Unloading has a similar situation:
Good timeline (unloading with duplicate):

1. Module starts to unload
2. CLR marks the ModuleID as no longer enumerable
3. Profiler attaches; CLR calls InitializeForAttach()
4. CLR enables event callbacks
5. CLR calls ProfilerAttachComplete()
6. Profiler calls EnumModules() (ModuleID is absent from the snapshot)
7. CLR issues the ModuleUnloadStarted event (profiler receives it)
In step 6, the profiler does not see the unloading ModuleID (since it’s no longer enumerable). But in step 7 the profiler is notified that the ModuleID is unloading. Perhaps it’s a bit awkward that the profiler would be told that a seemingly nonexistent ModuleID is unloading. But again, this is better than the alternative, where a profiler finds an unloading ID in the enumeration, and is never told that the ModuleID got unloaded. One more case that’s worthwhile to bring out occurs when we move the profiler attach a bit earlier in the sequence.
Good timeline (unloading without duplicate):

1. Profiler attaches; CLR calls InitializeForAttach()
2. CLR enables event callbacks
3. CLR calls ProfilerAttachComplete()
4. Module starts to unload
5. Profiler calls EnumModules() (ModuleID still enumerable, so it appears in the snapshot)
6. CLR marks the ModuleID as no longer enumerable
7. CLR issues the ModuleUnloadStarted event (profiler receives it)
Here the profiler discovers in step 5 that the ModuleID exists (the ModuleID is still enumerable at that point), but almost immediately afterward discovers in step 7 that the module is unloading. As stated above, events are more recent and should always take precedence over enumerations that were generated earlier. This can get a bit tricky, though, because the profiler generates an enumeration before it iterates over it. In the sequence above, the enumeration is generated in step 5, but the profiler could be iterating through that enumeration for quite some time, and might not come across the unloading ModuleID until after step 7 (multiple threads means fun for everyone!). For this reason, it’s important for the profiler to give precedence to events that occur after the enumeration was generated, even though iteration over that enumeration might finish later.
If you’re writing a memory profiler, you likely have code that responds to the various GC events. In order for your memory profiler to attach to a running process, it needs to deal gracefully with the fact that it does not yet have a cache of objects on the heap. One straightforward way for the profiler to deal with this is to force a GC at attach time.
Remember that your profiler attaches at a completely arbitrary point during process execution, possibly while a GC is already in progress. This means that, once the profiler has enabled callback events, it may start seeing GC callbacks (e.g., MovedReferences, ObjectReferences) from the middle of that GC, without seeing a GarbageCollectionStarted() first. Your profiler should be resilient to this, preferably by ignoring all GC callbacks until the first GarbageCollectionStarted() callback it sees (or, if it induces its own collection, the first GarbageCollectionStarted() callback after its call to ForceGC()).
It may be beneficial to program your profiler such that, upon attaching to the process, the profiler induces a first full GC automatically—call this the “catch-up” GC. This will allow your profiler to use events like RootReferences2 and ObjectReferences during that initial “catch-up” GC, in order to build up its cache of objects from scratch. After that initial catch-up GC, your profiler should then be able to deal with successive GCs the usual way.
It’s worth reiterating a limitation I stated in the first attach post (linked above): the ObjectAllocated() callback is unavailable to profilers that attach to running processes. Therefore, any logic your profiler has that assumes it gets all the ObjectAllocated() callbacks will need to be addressed. Any objects newly allocated since the last GC may still be unknown to your profiler until it comes across their references via GC callbacks during the next GC (unless your profiler comes across those objects in other ways—example: as parameters to methods you hook with the Enter/Leave/Tailcall probes).
OK, that about covers the first steps your profiler should take once it attaches to a running process. It will either need to use lazy catch-up or the catch-up enumerations (or, quite likely, a combination of both). When using the enumerations, be careful to avoid holes (by calling the enumeration methods from inside ProfilerAttachComplete()), and be resilient to receiving information duplicated across the enumeration and the load / unload events. For memory profilers, be wary of GCs already in progress at the time your profiler attaches, and consider inducing your own GC at attach-time to build your initial cache of GC objects.