If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

Troubleshooting appdomain restarts and other issues with ETW tracing

Troubleshooting appdomain restarts and other issues with ETW tracing

  • Comments 4

You learn something new every day…

I was working on an issue with appdomain restarts caused by page re-compilations because aspx pages were created and modified from the code.  You can get lifetime event information such as shutdown reasons, compilation information etc. in the eventlog  by enabling Application Life Time Events in your web.config (See this post for more info about this and application restarts in general)

We were seeing a lot of the following messages

EventID: 1305
Event code: 1001 Event message: Application is starting

EventID: 1305
Event code: 1002 Event message: Application is shutting down. Reason: Maximum number of recompilations was reached

EventID: 1305
Event code: 1003 Event message: Application compilation is starting

So we know that the application is restarting due to the number of re-compiles but we don’t know what pages we are talking about.

I ran it by my colleague Doug and he introduced me to ETW tracing.

ETW stands for Event Tracing for Windows. In short processes spit out “debug” information (event traces) that you can listen to.  You can even instrument your own code to spit out such trace information.  ETW is meant to be really lightweight and is a really nice tool for troubleshooting a lot of issues.  

IIS and ASP.NET are both heavily instrumented so there is a lot of information that you can get by listening to these. To make a long story short, the reason for these eventlog entries is that batch compilation was turned off, so every time someone hits a new page that no-one has browsed to before, it gets compiled and emits this event.

Logging ETW traces

Doug provided me with a set of cmd files to start and stop ASP.NET and IIS tracing (attached to this post) as well as to parse the output to csv files that can be opened in Excel.  If you want to look at this in more detail you can download Chapter 11 of Troubleshooting IIS 6.0.

Step 1:  Configuring the providers you want to trace  

You can list the providers available on your system using

logman.exe query providers

This will give you a list of GUIDs for these providers and you add them to a text file in the format

GUID  flags  level  #provider   (# here denotes a comment)

In our case, the events we want to trace are ASP.NET Events and IIS:WWW Server

#ASP.NET Events                
{AFF081FE-0247-4275-9C4E-021F3DC1DA35}  0xFFFFFFFF    0x05

#IIS: WWW Server
{3a2a4e84-4c21-4981-ae10-3fda0d9b0f83}     0xFFFFFFFE     0x05

There are a few other IIS providers (SSL etc. but I’ll leave those out for now)

In the zip file this file is called aspnet_trace.txt

Step 2: Starting event tracing

To start listening for events you run the following command

logman start aspnet_trace –pf aspnet_trace.txt –ct perf –o “c:\mylogdir\aspnet_trace_output.etl” –bs 64 –nb 200 400 –ets

If we break this down a bit we have

start start collecting
aspnet_trace the name of the collection
-pf aspnet_trace.txt file with event trace providers
-ct perf session clock type = perf
-o … output path
-bs 64 event trace buffer size = 64k
-nb 200 400 number of event trace session buffers min=200 max=400
-ets send commands directly without scheduling

In the zip file you will find aspnet_trace_start.cmd so to start tracing, simply run aspnet_trace_start from a command prompt.

Step 3: Reproduce your issue

Step 4: Stop event tracing

To stop event tracing, simply run the command

logman stop aspnet_trace –ets

or run aspnet_trace_stop.cmd located in the zip file.

After tracing has finished you will find an aspnet_trace_output.etl file containing all the traces in the directory where the cmd files are located.

Parsing ETL files with LogParser

The final step before we can view all our traces is to parse it into a csv file and this is done using LogParser which you can download from here.

Logparser really deserves a post of its own. It is a really useful tool that allows you to parse IIS logs, eventlogs, ETW logs, netmon logs etc. etc. and query them using sql style syntax.  For example, you can use it to parse out all eventlog events where the message contains “application starting”, or all iis requests where the status is 500.   Really really useful if you have a lot of data to go through.  Doug has written a few posts about it that you might want to have a peak at.

In this case we simply use it to convert the ETW logs to a csv file so we can open it in Excel.

"C:\Program Files (x86)\Log Parser 2.2\logparser.exe" -i:ETW "select * into aspnet_trace_output.csv from aspnet_trace_output.etl"

The cmd file is aspnet_trace_convert.cmd.  Just make sure before you run it that you put your logparser path (as shown below) in lieu of lp in the cmd file.

Once you have run

aspnet_trace_convert

you will find aspnet_trace_output.csv in the directory, and we are now ready to dissect the trace.

Sample data

As I mentioned earlier, ETW for IIS and ASP.NET gives you a lot of information.  If you have an authentication issue you can use it to trace and see who was being authenticated or not authenticated.  The ASP.NET tracing displays events where you enter and leave PageLoad, PageInit etc. and all the different asp.net modules. It also displays when you queue and dequeue asp.net requests, when you compile pages and any and all of the lifetime events.

Just to illustrate how we can narrow down the pages causing the restarts, I have set numRecompilesBeforeAppRestart=”3” so that the application will restart after 3 aspx file modifications for the purposes of the demo (default is 15), and batch=”true” to turn batch compilation on (which is the default)

<compilation numRecompilesBeforeAppRestart="3" batch=”true”>

Here I have made a number of requests and after each request to links.aspx I have modified the page (by entering a space) to make it recompile.

To sort out the interesting bits I have only filtered out the following aspnetrequest events: 

Start – logged when a request is started (shows the url and querystring) 
AppDomainEnter – logged when the request enters the domain (shows the name of the domain)
WebEventRaiseStart – logged when a web event is raised (contains the event id)
CompileLeave – logged when compilation is finished (if batch is turned off this shows the page that is compiled)

Each event logged also contains a contextid which allows you to follow the requests through the asp.net subsystem.  I have colored each request with a different color.

Row EventTypeName UserData
1 Start ConnID=24132632|ContextId={00000000-0000-0000-1700-0060010000fe}|Method=GET|Path=/BuggyBits/CompanyInformation.aspx|QueryString=
2 AppDomainEnter ConnID=24132632|ContextId={00000000-0000-0000-1700-0060010000fe}|DomainId=/LM/W3SVC/1/Root/BuggyBits-1-128703740477482149
3 WebEventRaiseStart ConnID=24132632|ContextId={00000000-0000-0000-1700-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1001|EventDetailCode=0
4 Start ConnID=24140680|ContextId={00000000-0000-0000-1800-0060010000fe}|Method=GET|Path=/BuggyBits/Links.aspx|QueryString=
5 AppDomainEnter ConnID=24140680|ContextId={00000000-0000-0000-1800-0060010000fe}|DomainId=/LM/W3SVC/1/Root/BuggyBits-1-128703740477482149
6 WebEventRaiseStart ConnID=24140680|ContextId={00000000-0000-0000-1800-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1003|EventDetailCode=0
7 CompileLeave ConnID=24140680|ContextId={00000000-0000-0000-1800-0060010000fe}|Files=Batch compilation: 20 files.|Status=success
8 WebEventRaiseStart ConnID=24140680|ContextId={00000000-0000-0000-1800-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1004|EventDetailCode=0
9 Start ConnID=24148728|ContextId={00000000-0000-0000-1900-0060010000fe}|Method=GET|Path=/BuggyBits/Links.aspx|QueryString=
10 AppDomainEnter ConnID=24148728|ContextId={00000000-0000-0000-1900-0060010000fe}|DomainId=/LM/W3SVC/1/Root/BuggyBits-1-128703740477482149
11 WebEventRaiseStart ConnID=24148728|ContextId={00000000-0000-0000-1900-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1003|EventDetailCode=0
12 CompileLeave ConnID=24148728|ContextId={00000000-0000-0000-1900-0060010000fe}|Files=Batch compilation: 20 files.|Status=success
13 WebEventRaiseStart ConnID=24148728|ContextId={00000000-0000-0000-1900-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1004|EventDetailCode=0
14 Start ConnID=24156776|ContextId={00000000-0000-0000-1a00-0060010000fe}|Method=GET|Path=/BuggyBits/Links.aspx|QueryString=
15 AppDomainEnter ConnID=24156776|ContextId={00000000-0000-0000-1a00-0060010000fe}|DomainId=/LM/W3SVC/1/Root/BuggyBits-2-128703740829979893
16 WebEventRaiseStart ConnID=24156776|ContextId={00000000-0000-0000-1a00-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1001|EventDetailCode=0
17 WebEventRaiseStart ConnID=24156776|ContextId={00000000-0000-0000-1a00-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1003|EventDetailCode=0
18 CompileLeave ConnID=24156776|ContextId={00000000-0000-0000-1a00-0060010000fe}|Files=Batch compilation: 20 files.|Status=success
19 WebEventRaiseStart ConnID=24156776|ContextId={00000000-0000-0000-1a00-0060010000fe}|EventClassName=System.Web.Management.WebApplicationLifetimeEvent|EventCode=1004|EventDetailCode=0

The event codes for WebApplicationLifeTimeEvents can be found here on Erik Reitan’s blog.

Summarizing the events we have

-- Modifying web.config (restarting the appdomain)
1. Request for /BuggyBits/CompanyInformation.aspx
2. The request enters a newly created appdomain /LM/W3SVC/1/Root/BuggyBits-1…
3. Event 1001 raised = ApplicationStart
-- Modifying links.aspx by entering a space
4. Request for /BuggyBits/Links.aspx
5. This request is executed in the same (existing domain) /LM/W3SVC/1/Root/BuggyBits-1…
6. Event 1003 raised = ApplicationCompilationStart (to compile the changes in Links.aspx)
7. CompileLeave . Batch compilation of the changes
8. Event 1004 raised = ApplicationCompilationEnd
--- Modifying links.aspx again by entering a space
9. Request for /BuggyBits/Links.aspx. 
10. This request is executed in the same (existing domain) /LM/W3SVC/1/Root/BuggyBits-1…
11. Event 1003 raised = ApplicationCompilationStart (to compile the changes in Links.aspx)
12. CompileLeave . Batch compilation of the changes
13. Event 1004 raised = ApplicationCompilationEnd
--- 3rd modification of /BuggyBits/Links.aspx (causing an appdomain restart)
14. Request for /BuggyBits/Links.aspx. 
15. This request is executed in a new appdomain /LM/W3SVC/1/Root/BuggyBits-2
16. Event 1001 raised = ApplicationStart 
17. Event 1003 raised = ApplicationCompilationStart (to compile the changes in Links.aspx)
18. CompileLeave . Batch compilation of the changes
19. Event 1004 raised = ApplicationCompilationEnd

Using this tracing we can follow the events and in this case see that it was changes in Links.aspx that caused the appdomain restarts. 

If batch compilation is turned on we should only see compilation when changes occur or new pages are added.  However if batch compilation is turned off we will see compilation every time we hit pages that haven’t been hit/compiled earlier, but still the last compilation before a restart should give a clue to which page updates cause the restart. 

Final thoughts

I am sure that you can find lots of other good uses for ETW tracing, if you have written posts about where you use it, feel free to link to it in the comments.  

Really what I wanted to show here was just how easy it is to set up and use so that you have a step-by-step for doing it when you need it.

Have fun,

Tess

Attachment: aspnet_trace.zip
  • PingBack from http://www.tmao.info/troubleshooting-appdomain-restarts-and-other-issues-with-etw-tracing/

  • Tess,

    I have heard some great things about microsoft CPR team and how they have some of the smartest people like you. So just wondering how often do you come across (not so interesting) issues?

    For example, in my previous company i was volunteering for critical problem resolution team and a customer complained that our product has stopped working after they applied the hot fix. Basically, this application has a .net front end,activex control with c++ library to talk to scanner device.

    When we asked support for data I saw "connection error message" and support had no idea what was wrong and I just happen to ask do they see the green light ? Did they power on scanner :-)

  • My team does developer support which means that most of our customers are developers, so they have already tried a lot of things before calling us.  So i would say, most issues we get are interesting.

  • HI Tess,

    Thank for great post.

    We successfully found the pages causing to app domain recycle.

    After that what is the next step to stop app domain recycle suddenly.do we need to do changes any changes in pages which are causing app domain recycle.

    Please provide steps to continue to your great post.

    Thanks,

    Bob

Page 1 of 1 (4 items)
Leave a Comment
  • Please add 6 and 5 and type the answer here:
  • Post