When you introduce a new feature like IntelliTrace you’re bound to get a lot of incorrect information floating around about exactly what the feature is and how it works. In particular, I see lots of confusion about exactly what data IntelliTrace collects while running. While I’ve mentioned what data IntelliTrace collects on and off in various blog posts, I figured it would be smart to codify all the info into one blog post.
When customers first hear about IntelliTrace, what is usually conjured to mind is the ability to step backward through their code, checking what happened previously with all the full features and information of normal live debugging. It would be wonderful if we could fully deliver on that vision with IntelliTrace, but the need to keep both execution-time overhead and log file size down while still providing useful information prevents that. If your vision of IntelliTrace was collecting the whole world of data and being able to step back through it, then the sections below might be a bit of a disappointment. But we feel that the choices we made really give the best balance between speed and size on one side and collection of valuable data on the other for just about all users.
So now that we’ve set the expectation that IntelliTrace is not going to collect all the data that you have access to in the live debugger, what exactly is it collecting? The answer depends on whether we are collecting data at an IntelliTrace event, at a debugger stopping event, or at a method entry or exit in calls mode. While the details of what is collected differ in each case, we do collect some data in common regardless of the mode. In particular, we always collect system information when first starting collection, module load and unload events, and thread start and end events. With the module and thread events we are able to keep the Modules and Threads debugger windows correctly updated when you move back in your program’s execution.
Another place we always collect data, regardless of what mode we are running in, is at debugger stopping points such as breakpoints. At these points we collect all basic data types (and all basic data types one level off of objects) that are examined or evaluated in the debugger. This is very handy when you examine a value, take a step forward and see that the value has changed, but didn’t make a note of the previous value. Since you examined it (causing IntelliTrace to collect the data at that stopping point), just take a step back in time to see the variable at its previous value. In the example below I’ve taken a few steps forward in the debugger (notice the step events in the flat list on the right), then jumped back in time to one of the previous debugger steps. In the Locals window you can see the variables that were collected at that point. They will also show up correctly when hovering over those items for DataTips or pinnable DataTips.
When you hit an IntelliTrace event during your program’s execution, the data that has been specifically configured for that event will be collected. Inside the collectionplan.xml file, IntelliTrace events can specify either the collection of basic local variables via DataQuery elements or provide classes that implement IProgrammableDataQuery to perform more complex data retrieval. The upshot of this is that at IntelliTrace events we only collect a small amount of data that is custom tuned to be relevant to the specific event being examined. If you move back in time to an IntelliTrace event, you will most likely just see the [IntelliTrace data has not been collected] message when mousing over any other local variables.
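As a rough illustration of the idea, a collection plan entry that grabs a single argument at an event might look something like the sketch below. The element and attribute names here are illustrative only, not the exact collectionplan.xml schema, so treat this as the shape of the mechanism rather than a copy-paste recipe.

```xml
<!-- Hypothetical sketch (not the real collectionplan.xml schema): an event
     bound to Environment.GetEnvironmentVariable that captures just the
     variable-name argument via a simple DataQuery. -->
<DiagnosticEventSpecification>
  <ShortDescription>Environment variable "{0}" accessed</ShortDescription>
  <Binding>
    <TypeName>System.Environment</TypeName>
    <MethodName>GetEnvironmentVariable</MethodName>
    <DataQueries>
      <!-- Capture only argument 0 (the name), capped in size -->
      <DataQuery index="0" maxSize="256" type="String" name="variable" />
    </DataQueries>
  </Binding>
</DiagnosticEventSpecification>
```

For anything richer than simple locals and arguments, the event would instead point at a class implementing IProgrammableDataQuery, which can compute and return arbitrary values at the event point.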
This highly guided data collection is intended to keep the overhead of running in events-only mode as low as possible. By default IntelliTrace is always running in this mode for managed applications, so even minor degradations in performance can have a really big effect. It’s important to know that, while not officially supported, you can create your own custom IntelliTrace events for richer debugging of your own applications. For these events you can use the same DataQuery / IProgrammableDataQuery system to collect just the data known to be most important to you at those points.
Below I’ve shown an example of IntelliTrace set back to an event at which an environment variable was accessed. In this case the event has been configured to collect the name of the variable being accessed, which appears both in the event and in the event item in the Autos window.
When you have a little performance overhead to spare and want to collect much deeper IntelliTrace data, you can jump into the options pages and turn on calls mode. In this mode, in addition to data collection at IntelliTrace events, data is also collected at function entry and exit points. At function entry points we collect all basic types (and basic types one level off of objects) for all the parameters passed into the function, and we apply the same principle to the function’s return values. By capturing parameters and return values you can often treat the function itself as a black box and, at the lowest cost in terms of collection, tell which function is spitting out the bad data that’s leading to a crash or other error.
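To make the black-box idea concrete, here’s a minimal sketch in Python (purely a conceptual analogy; IntelliTrace itself hooks the .NET profiling API, not decorators) that records primitive arguments on entry, primitive fields one level off of object arguments, and the return value on exit:

```python
import functools

PRIMITIVES = (int, float, bool, str, type(None))

trace_log = []  # recorded (event, function, data) tuples


def collect(value):
    """Record primitives directly; for objects, keep primitive fields one level deep."""
    if isinstance(value, PRIMITIVES):
        return value
    return {k: v for k, v in vars(value).items() if isinstance(v, PRIMITIVES)}


def trace_calls(func):
    """Log collected parameters at entry and the collected return value at exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace_log.append(("enter", func.__name__, [collect(a) for a in args]))
        result = func(*args, **kwargs)
        trace_log.append(("exit", func.__name__, collect(result)))
        return result
    return wrapper


@trace_calls
def compute_discount(price, rate):
    return price * (1 - rate)


class Gizmo:
    def __init__(self):
        self.name = "sprocket"   # primitive: collected
        self.weight = 3.5        # primitive: collected
        self.parts = [1, 2, 3]   # non-primitive: skipped (more than one level deep)


@trace_calls
def describe(gizmo):
    return gizmo.name


compute_discount(100, 0.25)
describe(Gizmo())
```

Looking only at the enter/exit records, you can see which call produced a bad value without tracing the body of the function at all, which is exactly the trade-off calls mode makes.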
Pictured below is IntelliTrace stopped and moved back from a breakpoint to a function entry point. In the Autos window you can see that all the primitive data types off of the GizmoManager object that was passed in as a parameter have been collected and are available to view.
First off, I think IntelliTrace is a great first step.
What I'd really like to see is what you described in the first paragraph of this article. Full-fidelity historical debugging.
Please let me decide how much of a performance hit I am willing to take in the name of debugging. When I have an issue I need ALL of the information available. Let me turn on/off local variables, etc. Let me decide how big an iTrace file is too big. If you are concerned about folks using it without full understanding, make it harder to find and put a nice warning on it.
Give me the information that I need to make an informed decision, but leave the decision up to me.
The issue is more that a solution like IntelliTrace is just really different at a structural level than a solution that would provide full-fidelity debugging. It's not just a matter of making sure customers have full understanding; there are plain old limitations in the current way that we log data. There is no hidden dial to eleven that we just don't want customers to use; it simply doesn't exist.
To get that you need something more like instruction-level tracing and replay; I think something like Valgrind uses a solution of that type. And that's really an entirely different technology type than IntelliTrace is, with a whole different set of advantages and disadvantages.
I'm glad that you think IntelliTrace is a good first step though. I've found, via my own dogfooding, that on top of the default coverage the ability to roll custom events can get you a pretty solid recording of a debug session with really reasonable overhead.
First, thanks for all the great articles. They've been very interesting and enlightening.
I definitely agree with Steve. See for instance Bil Lewis' Historical Debugger, the Whyline debugger, and Prashant Deva's "Silver Bullet" debugger (all for Java, and all record EVERYTHING); they show some techniques and ideas that are extremely useful, and that would be impossible to implement with IntelliTrace.
Specifically, when I'm debugging, I find myself constantly asking: what was the value of this variable before? How was this data changed after method XXX was called? IntelliTrace's current implementation offers _some_ data in the way of answering these questions, but there's a fog around it - what if the key to solving the bug quickly is somewhere within the lines of method XXX (not just its entry and exit points)? What if it is deeply nested within some variable's containment hierarchy (more than two levels off)?
The problem with events as a way of blowing off the fog of missing data, IMHO, is that I don't know in advance what data is going to be bad when I reproduce the bug - if I knew that, it would mean I know what the bug is, and I could proceed to solve it.
Can you share with us, on a technical level, the perceived advantages and disadvantages of recording EVERYTHING versus the method you guys went with using the .NET profiling API? Is it just the performance trade-off? I think this shouldn't in itself become a dealbreaker, because it can be mitigated:
In other words, I want to record EVERYTHING, and I know that I can't. I can then attack the problem by:
1. Adding more cores (compress the data in memory before writing to disk, thus reducing IO)
2. Writing the IntelliTrace file to a separate, dedicated, fast storage device.
3. Adding more RAM - allowing a larger buffer of trace data in memory.
4. Limiting the tracing to just the classes and modules that I really need.
- Omer Raviv
Omer, sorry your comment got flagged as spam for some reason. I re-added it on Monday.
Some comments on a few of the other tools that you mentioned.
First off, Whyline is actually a good example of why IntelliTrace made the trade-offs that we made. The original Alice prototype of Whyline was abandoned when it couldn't scale to (in the author's own words) "the types of programs used by professional developers." The current Java prototypes, which as far as I can tell have still not been released in any usable form, have many more limitations than the original prototype. For one, they don't support live debugging and recording at the same time, so you have to run and complete an entire debug session before opening the recording up as a Whyline session, due to performance overhead. This is a scenario that we fully support in IntelliTrace, as we'd like to give the user the option to step back in time if they see an issue in a current F5 session without having to stop, restart in IntelliTrace mode, and lose all their normal debugging tools. Also, they don't collect everything; they only collect specific output primitives such as graphical, textual and exception output. So while for those elements they will have more information than IntelliTrace (mainly assignment via non-function calls), there is a much broader set of information that IntelliTrace collects on.
Bil Lewis' Omniscient Debugger is an interesting tool. As far as I can tell it also doesn't collect everything, but it does add collection on variable assignments as well as on method calls. From Bil's PDF it appears to have some of the same limitations as IntelliTrace on large objects, and it appears to mainly collect primitives and specific primitive types off of larger object types. The trade-off for collecting on variable assignments seems to be increased slowdown; in the early paper that I'm looking at, pretty massive slowdowns were shown when debugging non-trivial applications.
For Silver Bullet, all that I was able to find was a Google presentation video, which I don't really consider enough to offer any kind of competitive analysis. If you have a link to an actual tech paper or prototype I'd be glad to examine it.
My point with the above is not to disparage these tools (I've not used any of them, but they all seem quite interesting) but to point out that none of them are actually "recording everything." All of them are making different trade-offs about what they choose to collect and when they choose to collect it, and in each case the architecture of the tool is built around those choices. For example, IntelliTrace doesn't support collecting at variable assignment points like Bil Lewis' debugger does. In our prototypes we found that collecting at method entries and exits (especially with managed code) and at event points (something that I've not seen supported in any of the tools above) did much better at giving good data with a usable overhead. And thus our logging, collection and UI elements moving forward have all been based around those choices.
I was with Steve until I discovered the "collecting at method enters/exits" feature. That totally helps debugging crashes post-mortem. Thank you.
New question: Let's say I've got a customer that is reporting crashes to us. We can't repro. Could I theoretically ship the IntelliTrace command-line tool with our app and have our app send out .iTrace files from the crash? That would be brilliant.
Thanks for your answer, and cheers on the 'Pinnable Data-tips' blog post - an interesting and thoughtful read, as always.
I have another question regarding what the IntelliTrace debugger collects - does it collect (or can I somehow force it to collect) object IDs? I.e., if I'm looking at the parameter of a method at a certain point in time, and then jump to a different point in time, can I check whether I'm looking at the same instance? Can I make a retrospective Object ID?
I actually wasn't sure about the Managed ID, so I just went and checked. Sadly it looks like these don't show up in historical mode, so we're not collecting them. Since they are not "variables" I'm also thinking that there is no way to add them via PDQs or DataQueries. I can see the utility in this suggestion though, so I'll see if I can get it logged on this side.