Posts
  • PFE - Developer Notes for the Field

    Finally... JavaScript source line info in a dump

    • 0 Comments

    For many years, if your C/C++/.NET application had a performance or stability issue (e.g., unresponsiveness, crash, high cpu consumption, etc.), one of the popular - and effective - ways to troubleshoot the problem was to dump the process when the problem symptom surfaced.  And when you open the user mode dump with windbg or cdb, you get a stack trace for each thread in the process.

    • If you have matching private symbols for the native (e.g., non-.NET) assemblies, you'll get source line information for those frames.
    • For frames involving managed assemblies, source line information hasn't been available since the 1.x days.  But you can use commands from SOS such as !u to get close to the line of source executing when the dump was written.

    But what if your process is executing JavaScript and you want to understand what line of JavaScript is executing?  For the sake of argument, let's assume you're dealing with IE9 or later where the re-vamped JavaScript engine lives in jscript9.dll.  Matching symbols isn't much help in determining what line of JavaScript code was executing, or even which .js file.  Even if you have private symbols - which, as a Microsoft employee, I have access to - it isn't a straightforward exercise to extract this information.  The following is an example stack of what is revealed.  As you can see, there are no pointers to JavaScript source.

    ChildEBP RetAddr 
    000b8e7c 7542790d user32!NtUserGetMessage+0x15
    000b8e98 6f8b8377 user32!GetMessageW+0x33
    000b8f14 6f7f88ca mshtml!InternalShowModalDialog+0x28a
    000baf94 6f7f8a6b mshtml!CWindow::ShowHTMLDialogHelper+0x49c
    000bb00c 6f8a0883 mshtml!CWindow::showModalDialog+0x60
    000bb080 6f54ab59 mshtml!Method_VARIANTp_BSTR_o0oVARIANTp_o0oVARIANTp+0x13d
    000bb104 6f436fe3 mshtml!CBase::ContextInvokeEx+0x84c
    000bb130 6f4397af mshtml!CSelectionObject::InvokeEx+0x2b
    000bb184 6f439c29 mshtml!DispatchInvokeCollection+0x1ab
    000bd214 6f458a64 mshtml!CWindow::InvokeEx+0x5ca
    000bd240 6f458664 mshtml!CBase::VersionedInvokeEx+0x37
    000bd280 6f439142 mshtml!CBase::PrivateInvokeEx+0x82
    000bd2ec 6f439204 mshtml!COmWindowProxy::InvokeEx+0x270
    000bd314 6f458a64 mshtml!COmWindowProxy::subInvokeEx+0x26
    000bd340 6f4392a9 mshtml!CBase::VersionedInvokeEx+0x37
    000bd380 580acbb7 mshtml!PlainInvokeEx+0x95
    000bd3c8 580b2eeb jscript9!HostDispatch::CallInvokeEx+0x106
    000bd3f0 580b2e35 jscript9!HostDispatch::InvokeMarshaled+0x4d
    000bd4cc 580b2c50 jscript9!HostDispatch::InvokeByDispId+0x408
    000bd4e8 580b2c24 jscript9!DispMemberProxy::DefaultInvoke+0x22
    000bd500 580685fe jscript9!DispMemberProxy::DefaultInvoke+0x20
    000bd540 580b375a jscript9!Js::JavascriptFunction::CallFunction+0xc4
    000bd564 580b3703 jscript9!Js::InterpreterStackFrame::OP_CallFld+0x58
    000bd594 580ccadb jscript9!Js::InterpreterStackFrame::Process+0x7ae
    000bd5c4 580cca50 jscript9!Js::InterpreterStackFrame::ProcessThunk+0x65
    000bd6b0 580685fe jscript9!Js::InterpreterStackFrame::InterpreterThunk+0x21f
    000bd6f0 580b375a jscript9!Js::JavascriptFunction::CallFunction+0xc4
    000bd714 580b3796 jscript9!Js::InterpreterStackFrame::OP_CallFld+0x58
    000bd744 580ccadb jscript9!Js::InterpreterStackFrame::Process+0x7c7
    000bd774 580cca50 jscript9!Js::InterpreterStackFrame::ProcessThunk+0x65
    000bd838 580685fe jscript9!Js::InterpreterStackFrame::InterpreterThunk+0x21f
    000bd86c 58068523 jscript9!Js::JavascriptFunction::CallFunction+0xc4
    000bd8d0 5806845a jscript9!Js::JavascriptFunction::CallRootFunction+0xb6
    000bd90c 580683e6 jscript9!ScriptSite::CallRootFunction+0x4f
    000bd934 58093522 jscript9!ScriptSite::Execute+0x63
    000bd9c0 580933bb jscript9!ScriptEngine::ExecutePendingScripts+0x319
    000bda64 58094c46 jscript9!ScriptEngine::ParseScriptTextCore+0x33c
    000bdab0 6f4f0108 jscript9!ScriptEngine::ParseScriptText+0x67
    000bdae8 6f4afb95 mshtml!CActiveScriptHolder::ParseScriptText+0xc7
    000bdb58 6f4f108f mshtml!CScriptCollection::ParseScriptText+0x2d2
    000bdc3c 6f4f0dde mshtml!CScriptData::CommitCode+0x4ea
    000bdc8c 6f331d9e mshtml!CScriptData::Execute+0x1fc
    000bdcb8 6f4aff93 mshtml!CScriptData::ScriptData_OnEnterTree+0x240

    So if you have a random issue involving JavaScript that can't be reproduced on demand, and you can't repro outside of production, your options of getting a stack of the problem are limited to a dump or a profiler.  But in neither case were you able to easily get source line information for the JavaScript portion of the code.  At least, until now.

     

    When you use the latest debugger engine in the Windows 8.1 SDK to create and view a dump of IE11 on Windows 8.1, your stack will show you frames you never saw before.  The numbers at the end of the highlighted frames represent column & row numbers:

    ChildEBP RetAddr 
    (Inline) -------- jscript9!Js::TaggedInt::IsPair+0x9
    0cc6ebb0 074600b9 jscript9!Js::SSE2::JavascriptMath::Modulus_InPlace+0x11
    WARNING: Frame IP not in any known module. Following frames may be wrong.
    0cc6ec30 64bd4094 js!UTIL.powerLoop1 [http://pmav.eu/stuff/javascript-webworkers/assets/js/javascript-webworkers-v1.4.0.js @ 367,17]
    0cc6ef78 64b46548 jscript9!Js::InterpreterStackFrame::Process+0x34b1
    0cc6f09c 05d10fc1 jscript9!Js::InterpreterStackFrame::InterpreterThunk<1>+0x1e8
    0cc6f0a8 64b4669e js!UTIL.power [http://pmav.eu/stuff/javascript-webworkers/assets/js/javascript-webworkers-v1.4.0.js @ 367,17]
    0cc6f3f8 64b46548 jscript9!Js::InterpreterStackFrame::Process+0xbd7
    0cc6f53c 05d10fc9 jscript9!Js::InterpreterStackFrame::InterpreterThunk<1>+0x1e8
    0cc6f548 64b40685 js!onmessage [http://pmav.eu/stuff/javascript-webworkers/assets/js/javascript-webworkers-v1.4.0.js @ 323,17]
    0cc6f590 64b4100e jscript9!Js::JavascriptFunction::CallFunction<1>+0x88
    0cc6f5fc 64b40f60 jscript9!Js::JavascriptFunction::CallRootFunction+0x93
    0cc6f644 64b40ee7 jscript9!ScriptSite::CallRootFunction+0x42
    0cc6f66c 64b4993c jscript9!ScriptSite::Execute+0x6c
    0cc6f6c8 64b49878 jscript9!ScriptEngineBase::ExecuteInternal<0>+0xbb
    0cc6f6e0 6732b78f jscript9!ScriptEngineBase::Execute+0x1c
    0cc6f794 6732b67c mshtml!CListenerDispatch::InvokeVar+0x102
    0cc6f7c0 6732b21e mshtml!CListenerDispatch::Invoke+0x61
    0cc6f858 6732b385 mshtml!CEventMgr::_InvokeListeners+0x1a2
    0cc6f9cc 67245ecc mshtml!CEventMgr::Dispatch+0x35a
    0cc6fa08 6721eb22 mshtml!CMessagePort::HandlePostMessage+0x207
    0cc6fa10 6721ed2c mshtml!CMessagePort::HandleNotification+0x2e

    <clip>

     

    If you're not having success seeing these JavaScript frames in your dumps, here's a quick checklist I compiled during my testing.

    • This is only supported on Windows 8.1.
    • This is supported for any process that loads jscript9.dll. This includes processes which host the browser control for IE11.  Note I only tested iexplore.exe.
    • You'll need to create *and* view the dump with the debugger engine (used by windbg & cdb) that comes with the 8.1 SDK noted in the link I pasted above.  Older versions of the debuggers may not work.
    • Dumps created with DebugDiag v2.0 U1 do not currently yield the JS frames.  But I've had luck replacing the dbghelp.dll that ships with DebugDiag with the build that ships with the 8.1 SDK.  The next release of DebugDiag is scheduled to include a build of dbghelp.dll that will reveal these JS frames.
    • If you use adplus_old.vbs to create the dump, you may consider making an edit in the file.  After you make a copy of the original .vbs, replace all calls to ".dump -u /ma" with "dump /ma".
    • JS frames will not resolve in a live debug.  This is currently only supported in a post-mortem dump.
    • Be sure the process you're dumping has JavaScript executing at the time of the dump.  Otherwise, you're unlikely to see any of the JS frames.
    • As always, it's a best practice to keep the architectures of the debugger and target aligned:
      • Dump a 32-bit process with a 32-bit debugger, and open the dump with a 32-bit debugger.
      • Dump a 64-bit process with a 64-bit debugger, and open the dump with a 64-bit debugger.
      • With DebugDiag, install the 32-bit version on a 32-bit OS and install the 64-bit version on a 64-bit OS.  In the latter case, DebugDiag automagically determines the debugger bitness to use against the target process.
    • If you've followed all the steps above and are still having problems getting the JS frames to display, ensure the following registry keys have these values. 
      • [HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\MiniDumpAuxiliaryDlls, C:\Windows\system32\jscript9.dll = [REG_SZ] C:\Windows\System32\jscript9diag.dll
      • [HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\MiniDumpAuxiliaryDlls], C:\Windows\system32\jscript9.dll = [REG_SZ] C:\Windows\SysWOW64\jscript9diag.dll

     

    While the post-mortem debugging experience with JavaScript still isn't as mature as what we've enjoyed for years with .NET or C/C++, the ability to get the source file and line information removes the first significant blocker - now you can easily see what was executing when the process was dumped.

  • PFE - Developer Notes for the Field

    Extracting Pictures from a Byte Array in a Memory Dump

    • 0 Comments

     

    I ran into a case recently where there was a large amount of memory being used in a 32 bit IIS process leading to an out of memory exception.  The steps to getting the correct data are documented all over the place.  Here’s the link to Tess’ site that I learned on: http://blogs.msdn.com/b/tess/archive/2008/02/15/net-debugging-demos-lab-3-memory.aspx

     

    I’ll skip the setup and go straight to the debug of the high memory usage dump.  To start with, as always when I’m debugging, we start with what we know:

    • Managed application running in IIS
    • 32 bit
    • Higher than expected memory usage

     

    Our next step is to list what we need to know:

    • How much memory is being used?
    • Manged or Native?
    • Leak or heavy usage?
    • What can we do about it?

     

    The first thing to do is figure out where the memory usage is, and whether it’s managed or Native.

     

    0:000> !address -summary

    --- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    Free                                    280          355ac000 ( 853.672 Mb)           41.68%
    <unknown>                     830          34808000 ( 840.031 Mb)  70.34%   41.02%
    Image                                 1013          12479000 ( 292.473 Mb)  24.49%   14.28%
    Heap                                    218           3189000 (  49.535 Mb)   4.15%    2.42%
    Stack                                   129            ac0000 (  10.750 Mb)   0.90%    0.52%
    Other                                    26            14e000 (   1.305 Mb)   0.11%    0.06%
    TEB                                      43             2b000 ( 172.000 kb)   0.01%    0.01%
    PEB                                       1              1000 (   4.000 kb)   0.00%    0.00%

    --- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    MEM_PRIVATE                             892          36629000 ( 870.160 Mb)  72.86%   42.49%
    MEM_IMAGE                              1314          13df7000 ( 317.965 Mb)  26.62%   15.53%
    MEM_MAPPED                               54            624000 (   6.141 Mb)   0.51%    0.30%

    --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    MEM_COMMIT                             1823          369e6000 ( 873.898 Mb)  73.17%   42.67%
    MEM_FREE                                280          355ac000 ( 853.672 Mb)           41.68%
    MEM_RESERVE                             437          1405e000 ( 320.367 Mb)  26.83%   15.64%

    The memory usage we’re concerned about is any committed memory as that will usually be close to the Private Bytes that value in PerfMon.  This is shown in the State Summary section as MEM_COMMIT and is currently at approximately 874 MB.  We can tell a few other things from the results above by comparing State, Type, and Usage.    For one, generally the majority of Image memory will be of state MEM_COMMIT.  Right off the bat, we can extract about 292 MB from MEM_COMMIT leaving us with about 582 MB to place still.   If we look at the Usage summary, we can see that 840 MB is from <unknown>.  That memory usage most often comes from Virtual Allocations.  It’s also where any managed memory will be.  Going back to our MEM_COMMIT memory, it seems likely that the majority of the 58 2MB will be handled fall under this bucket.  We don’t see much memory at all in Heap, so we can conclude that the memory is either allocated by managed memory, or by a native component making use of a managed allocator.

    The next step is to look at managed memory and see if we have around 500 MB of memory usage.  We can do that by loading psscor and running !clrusage.

    0:000> .load psscor2
    0:000> !clrusage
    Native Heap for aspnet_filter: 0x01220000
    Native Heap for .NET: 0x01fd0000
    Number of GC Heaps: 8
    Heap Size  0x62fb198(103,788,952)
    Heap Size  0x303c5b8(50,578,872)
    Heap Size  0x41e9e58(69,115,480)
    Heap Size  0x2ca3010(46,805,008)
    Heap Size  0x3f8209c(66,592,924)
    Heap Size  0x25b3b68(39,533,416)
    Heap Size  0x20b5a18(34,298,392)
    Heap Size  0x17a65a8(24,798,632)
    May contain Reserved segments, run with -v switch to see them.
    ------------------------------
    GC Heap Size  0x19f5617c(435,511,676)
    Total Commit Size  1f9f8000 (505 MB)
    Total Reserved Size  11608000 (278 MB)
    Initial reservation type: All at once
    Initial Allocation Size: 18000000 (402,653,184) (384 MB)
    Reserved Memory Size: 31000000 (822,083,584) (784 MB)
    Reserved Memory Limit Size: 31000000 (822,083,584) (784 MB)

    The managed committed memory is 505MB.  This lines up with what we were expecting to see if the memory pressure came from managed memory.  That means we’re investigating a managed memory issue.  We can now move on to seeing what is taking up all of the managed memory by using !dumpheap -stat

    0:000> !dumpheap -stat
    Using our cache to search the heap.
    0x7aa50c98          1            12 System.Diagnostics.CounterCreationDataCollection
    0x7aa40c38          1            12 System.ComponentModel.Design.RuntimeLicenseContext
    0x7a5eed4c          1            12 System.ComponentModel.WeakHashtable+WeakKeyComparer
    0x7a5eeb38          1            12 System.ComponentModel.BooleanConverter

    <snipped for brevity>

    0x000e8600        521    24,483,528      Free
    0x02af0c6c    119,732    74,368,372 System.String
    0x02af36dc     11,634   313,028,160 System.Byte[]
    Total 672,044 objects, Total size: 435,460,280

     

    Right away, those Byte arrays stand out.  It should be stated, that we usually expect to see Byte arrays at the bottom of the list, but as there are 313 MB of them here, it’s definitely something we want to look into.

    There are a couple ways to proceed.  We could dump all byte arrays and look for a pattern in the sizes, or we could look for large byte arrays.  We’ll start with large byte arrays.  By large, we mean anything that would have ended up in the large object heap.  Since we know LoH objects are all greater than or equal to 85,000 bytes, we can specify a minimum size when we view the heap to limit to these large objects.

    0:000> !dumpheap -mt 0x02af36dc -min 85000 -stat
    Using our cache to search the heap.
       Address         MT     Size  Gen
    Statistics:
            MT    Count    TotalSize Class Name
    0x02af36dc      139  305,074,736 System.Byte[]
    Total 139 objects, Total size: 305,074,736

    This shows us we have 139 Byte Arrays in the large object heap that account for 290.94 MB of our managed space.  Since there were over 11 thousand byte arrays total and all together they accounted for 298 MB, it’s fair to say the LoH Byte Arrays should be the focus of our investigation.  Our next step then is to dump out the Byte Arrays and begin looking to see what they are and if they’re still rooted.

    0:000> !dumpheap -mt 0x02af36dc -min 85000 
    Using our cache to search the heap.
       Address         MT     Size  Gen
    0x202fb838 0x02af36dc 2,077,524 3 System.Byte[]
    0x205dc320 0x02af36dc 3,704,228 3 System.Byte[]
    0x209648d8 0x02af36dc 4,990,036 3 System.Byte[]
    0x20e26d40 0x02af36dc  125,592 3 System.Byte[]
    0x3b1f0038 0x02af36dc 3,449,224 3 System.Byte[]
    0x3b53a1d0 0x02af36dc  109,140 3 System.Byte[]
    0x3b554c38 0x02af36dc 2,625,400 3 System.Byte[]

    <snipped for brevity>

     

    I ran !gcroot on several of the address listed, but did not find any roots.  As it turns out, the dump I was looking at was taken on a terminate process breakpoint and many of the managed structures were already in the process of being torn down.  Since we couldn’t find any roots explaining why the Byte Arrays were still around, we have to look at the Arrays themselves to figure out what they are and hopefully tie them back to some source code.

    A quick db on one of the Byte Arrays shows

    0:000> db 0x202fb838
    202fb838  dc 36 af 02 46 b3 1f 00-ff d8 ff e1 63 1d 45 78  .6..F.......c.Ex
    202fb848  69 66 00 00 4d 4d 00 2a-00 00 00 08 00 0a 01 0f  if..MM.*........
    202fb858  00 02 00 00 00 16 00 00-01 b2 01 10 00 02 00 00  ................
    202fb868  00 25 00 00 01 c8 01 12-00 03 00 00 00 01 00 01  .%..............
    202fb878  00 00 01 1a 00 05 00 00-00 01 00 00 01 ee 01 1b  ................
    202fb888  00 05 00 00 00 01 00 00-01 f6 01 28 00 03 00 00  ...........(....
    202fb898  00 01 00 02 00 00 01 32-00 02 00 00 00 14 00 00  .......2........
    202fb8a8  01 fe 02 13 00 03 00 00-00 01 00 01 00 00 47 46  ..............GF

    Most of what we see is unhelpful, but we see the string “Exif” suggesting this might be a digital photo.  Dumping further into the array we see another interesting string:

    202fb9f8 00 00 00 00 00 00 45 41 53 54 4d 41 4e 20 4b 4f  ......EASTMAN KO
    202fba08 44 41 4b 20 43 4f 4d 50 41 4e 59 00 4b 4f 44 41  DAK COMPANY.KODA
    202fba18 4b 20 45 41 53 59 53 48 41 52 45 20 56 31 30 37  K EASYSHARE V107
    202fba28 33 20 44 49 47 49 54 41 4c 20 43 41 4d 45 52 41  3 DIGITAL CAMERA

    That certainly supports the idea that we have a digital photo taken with a digital camera (a Kodak Easyshare to be precise).  Checking a few of the other byte arrays, we can see the same pattern.

    0:000> db 0x202fb838
    202fb838  dc 36 af 02 46 b3 1f 00-ff d8 ff e1 63 1d 45 78  .6..F.......c.Ex
    202fb848  69 66 00 00 4d 4d 00 2a-00 00 00 08 00 0a 01 0f  if..MM.*........

    0:000> db 0x3c478e80
    3c478e80  dc 36 af 02 1c c6 49 00-ff d8 ff e0 00 10 4a 46  .6....I.......JF
    3c478e90  49 46 00 01 01 01 00 60-00 60 00 00 ff e1 19 78  IF.....`.`.....x
    3c478ea0  45 78 69 66 00 00 49 49-2a 00 08 00 00 00 11 00  Exif..II*.......

    0:000> db 0x3a1f0038
    3a1f0038  dc 36 af 02 0a ff 32 00-ff d8 ff e0 00 10 4a 46  .6....2.......JF
    3a1f0048  49 46 00 01 01 01 00 48-00 48 00 00 ff e1 23 27  IF.....H.H....#'
    3a1f0058  45 78 69 66 00 00 49 49-2a 00 08 00 00 00 10 00  Exif..II*.......

    At this point, we’ll assume they were all digital photos.  We can revise that assumption if source review doesn’t agree with us.  At this point, we really need to look at the source to see why the pictures are staying around.  I don’t have the source available as this was given to me as a dump taken during shutdown, so exporting the modules didn’t work.  The only thing left to really get from the dump is the pictures themselves.  All that’s left to do is export a few of the byte arrays to disk so we can open them and maybe narrow down what we need to look at in the source (are these employee photos? a family outing? product photos?)

    We’ll use the writemem command in windbg and simply write the byte array from the address of the array plus 8 to avoid the metadata.

    0:000> !do 0x202fb838
    Name: System.Byte[]
    MethodTable: 02af36dc
    EEClass: 028aeb7c
    Size: 2077522(0x1fb352) bytes
    GC Generation: 3
    Array: Rank 1, Number of elements 2077510, Type Byte
    Element Type: System.Byte
    Fields:
    None
    0:000> .writemem f:\output.jpg 0x202fb838+8 L?0x1fb352
    Writing 1fb352 bytes........................................

    Then we can open it up and view our result.  I dumped several out, all of which were pictures of kids on what looked to be a field trip.  That helped narrow down the user activity and focus the source review for the resolution.  If the source code didn’t provide a resolution, our next step would have been to take another dump when the memory was high but the process wasn’t shutting down.  This should have shown us how the Byte Arrays were rooted at which point we would have been able to handle them.

     

    In this case, I used the following command to export all of the pictures at once:

    .foreach (var {!dumpheap -mt 0x02af36dc -min 85000 -short}) {.writemem f:\Pictures\${var}.jpg ${var}+8 L?poi(${var}+4)}

     

    -Jarrod

  • PFE - Developer Notes for the Field

    Troubleshooting Event ID 5010 — IIS Application Pool Availability

    • 2 Comments

    With this event you will get a message stating, “A process serving application pool '%1' failed to respond to a ping. The process id was '%2'.”  When an Event ID 5010  appears in your system log and WAS is the source, you can be assured of two things.  The first is that a worker process failed to respond to a ping from the Windows Process Activation Service (WAS).  The second is that WAS attempted to shut down the offending process.  Generally, this means that either the worker process hung or it did not have enough resources to respond to the ping request.  The resolution in the documentation for this Event is listed as:

    Diagnose an unresponsive worker process

    A worker process that fails to respond may exhibit one of the following symptoms:

    • Ping failure.
    • Communication failure with WAS.
    • Worker process startup time-out.
    • Worker process shutdown time-out.
    • Worker process orphan action failure.

     

    There are some links provided as well that provide guidance on troubleshooting memory leaks, long response times, and high CPU usage in the worker process.   In each of those 3 cases, there is a deterministic approach with which you can take a memory dump and investigate the issue.  However, in practice there are times when the worker process may simply hang without displaying any of those behaviors.  So the question becomes, how do we know when the worker process is hung?  The answer is, WAS knows, so we just have to ask WAS.

    I’m going to go through the steps of finding the breakpoints and getting the needed information because the methodology can be used elsewhere.  For example, though the steps at the bottom of the page work for IIS 7, 7.5, and 8, I haven’t tested them with IIS 6 or below.  If you’re currently experiencing this issue and just want to know how to get the memory dump of the Worker Process, feel free to skip down to the bottom.

    When working with an issue like this, our first step is always to clarify what we know and what we need to know.

    What we know:

    • WAS knows when a worker process hangs
    • WAS knows enough about the worker process to shut it down if it appears hung

     

    What we need to know:

    • How to tell when WAS thinks a worker process is hung
    • How to identify the hung process
    • How to create a dump of the worker process before it’s terminated by WAS

     

    To get to the answers we need, we’ll investigate the WAS process with WinDbg.  Our goal will be to find a method call that signifies a failed ping, set a breakpoint on it, then identify the associated worker process at the time of failure.  We’ll start by attaching the debugger to the WAS process and reloading symbols.  Symbols are important here because we’ll need them to investigate the method calls.  Once we have the debugger attached, we can take a look at the loaded modules in an effort to make some guesses at where we should look.  We’ll do this by running lm, for list modules.  We can append 1m and sm to only list the module names and sort them alphabetically.

    0:000> lm 1m sm
    ADVAPI32
    bcryptPrimitives
    clbcatq
    combase
    CRYPT32
    CRYPTBASE
    CRYPTSP
    DPAPI
    GDI32
    HTTPAPI
    IISRES
    iisutil
    iisw3adm
    KERNEL32
    KERNELBASE
    ktmw32
    logoncli
    mlang
    MSASN1
    msvcrt
    nativerd
    NSI
    ntdll
    ntmarta
    ole32
    OLEAUT32
    pcwum
    profapi
    RPCRT4
    rsaenh
    sechost
    Secur32
    SSPICLI
    svchost
    USER32
    USERENV
    w3ctrlps
    W3TP
    WS2_32
    XmlLite

    Now that we have the modules, we can make some guesses as to which ones might be related to the failed ping request processing.  It seems likely that the module responsible will either be one of the IIS processes or a process with w3 in the name.  Next in the debugger we can search for method calls in those modules that could likely be related.  We can use the x command to search for the methods using wild cards.

    0:000> x *iis*!*ping*
    000007fb`45efb2d4 iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem (<no parameter info>)
    000007fb`45f2c330 iisw3adm!g_aIISWPSiteMappings = <no type information>
    000007fb`45f247dc iisw3adm!AddDomainMapping (<no parameter info>)
    000007fb`45f13fb0 iisw3adm!CONFIG_CS_PATH_MAPPER::SetConfigFileMappings (<no parameter info>)
    000007fb`45f2c4d0 iisw3adm!g_aIISWPGlobalMappings = <no type information>
    000007fb`45efdde4 iisw3adm!MESSAGING_HANDLER::HandlePingReply (<no parameter info>)
    000007fb`45efc59c iisw3adm!PingResponseTimerExpiredCallback (<no parameter info>)
    000007fb`45f24918 iisw3adm!EnsureLSAMapping (<no parameter info>)
    000007fb`45f2c490 iisw3adm!g_aIISULGlobalMappings = <no type information>
    000007fb`45f2c2d0 iisw3adm!aIISULSiteMappings = <no type information>
    000007fb`45f059f4 iisw3adm!PERF_MANAGER::ProcessPerfCounterPingFired (<no parameter info>)
    000007fb`45f2c168 iisw3adm!g_cIISWPSiteMappings = <no type information>
    000007fb`45efb64c iisw3adm!WORKER_PROCESS::CancelPingResponseTimer (<no parameter info>)
    000007fb`45f2d3cc iisw3adm!g_fWASStoppingInProgress = <no type information>
    000007fb`45f06be0 iisw3adm!PerfCountPing (<no parameter info>)
    000007fb`45f31dc0 iisw3adm!_imp_LsaManageSidNameMapping = <no type information>
    000007fb`45ed98a0 iisw3adm!CheckWASIsStopping (<no parameter info>)
    000007fb`45f31eb0 iisw3adm!_imp_CreateFileMappingW = <no type information>
    000007fb`45f31eb8 iisw3adm!_imp_OpenFileMappingW = <no type information>
    000007fb`45f2c164 iisw3adm!cIISULSiteMappings = <no type information>
    000007fb`45f2c484 iisw3adm!g_cIISULGlobalMappings = <no type information>
    000007fb`45efb164 iisw3adm!WORKER_PROCESS::SendPingWorkItem (<no parameter info>)
    000007fb`45efc51c iisw3adm!SendPingTimerCallback (<no parameter info>)
    000007fb`45f31df0 iisw3adm!_imp_CreateFileMappingA = <no type information>
    000007fb`45f31df8 iisw3adm!_imp_OpenFileMappingA = <no type information>
    000007fb`45f2c488 iisw3adm!g_cIISWPGlobalMappings = <no type information>

    The first method call returned looks very promising and is a good place to start.  We can set a breakpoint on that line item, tell the process to continue, then intentionally cause a failed ping request to see if the breakpoint trips.

     

    0:000> bp iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem
    0:000> g

    To cause a failed ping request, we can suspend a worker process with a tool such as Process Explorer.

    image

     

    After the time specified in the Application Pool’s configuration, the WAS process should identify the worker process as hung at attempt to shut it down.  If our breakpoint is correct, the debugger, still attached to the WAS process, should break in.  As it turns out, that is exactly what happens in this case.

    Breakpoint 0 hit
    iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem:
    000007fb`45efb2d4 48895c2410      mov     qword ptr [rsp+10h],rbx ss:000000cd`a86cfba8=705541a9cd000000

    The breakpoint has been hit, and if we check Process Explorer, we can see the process is still alive.  Further, we know because of the calling convention that we’re dealing with a WORKER_PROCESS object.  At this point, we could absolutely write a script to dump all of the worker processes on the machine and we would know we got a dump of the hung process.  If however, we’re working on a machine that has a lot of worker processes, it may be more beneficial to find the actual worker process being referred to and just dump that object.  The WORKER_PROCESS object will almost undoubtedly contain a PID for the worker process with which we could use with a script to dump just that worker process.  We can investigate the current thread and try to find the worker process object.  We can then look to see if there is an offset from the object that points to the PID.

    We’ll start by dumping the call stack.

    0:002> k
    Child-SP          RetAddr           Call Site
    000000cd`a86cfb98 000007fb`45ef7385 iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem
    000000cd`a86cfba0 000007fb`45ede8e3 iisw3adm!WORKER_PROCESS::ExecuteWorkItem+0x41
    000000cd`a86cfbf0 000007fb`45ede442 iisw3adm!WORK_ITEM::Execute+0x13
    000000cd`a86cfc30 000007fb`45edb129 iisw3adm!WORK_QUEUE::ProcessWorkItem+0xa2
    000000cd`a86cfc70 000007fb`45edafb7 iisw3adm!WEB_ADMIN_SERVICE::MainWorkerThread+0x35
    000000cd`a86cfcb0 000007fb`45ed9aec iisw3adm!WEB_ADMIN_SERVICE::ExecuteService+0xdf
    000000cd`a86cfcf0 000007fb`4e8f167e iisw3adm!WASServiceThreadProc+0x94
    000000cd`a86cfd60 000007fb`50533501 KERNEL32!BaseThreadInitThunk+0x1a
    000000cd`a86cfd90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

     

    This gives us an idea of the calling order, but the public symbols we’re using won’t give us local variables and in this instance they won’t tell us if and what parameters are passed.  x86 Calling conventions dictate that in a 64 bit process, the first 4 parameters are stored in the CPU registers RCX, RDX, R8, then R9 in that order.  Any parameters in addition to that will be stored on the stack.  If this were a 32 bit process, the parameters would most likely all be on the call stack.  Knowing this, we can start by searching the call stack and the 4 registers mentioned for the PID we identified.  In fact, any of the non null registers displayed with the r command are probably a good place to look, excluding RIP which will point to code.

    0:002> r
    rax=0000000000000000 rbx=0000000000000000 rcx=000000cda910e520
    rdx=000000cda94159d0 rsi=0000000000000001 rdi=000000cda910e520
    rip=000007fb45efb2d4 rsp=000000cda86cfb98 rbp=000000cda8709ec0
      r8=000000cda86cfb28  r9=000000cda86cfc10 r10=0000000000000000
    r11=0000000000000246 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0         nv up ei pl zr na po nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
    iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem:
    000007fb`45efb2d4 48895c2410      mov     qword ptr [rsp+10h],rbx ss:000000cd`a86cfba8=705d41a9cd000000


    We can see that RAX and RBX are null, but RCX and RDX are not.  We’ll start with the call stack then move on to the other registers. For the call stack, we’ll search between 100 bytes under the top of the stack pointer (RSP) and up to the stack base (RBP).  Remember, stacks call stacks start at a higher address and grow down as they move up the stack.  We’re searching for a word value because that’s plenty of space for the PID to be stored in and searching for our known PID, 8012.

    0:002> s -[1]w @rsp-100 @rbp 0n8012

    This returned no results.  So, we don’t have the PID on the call stack.  Next we’ll search from RCX to 300 Bytes greater than RCX.

    0:002> s -[1]w @rcx L300 0n8012
    0x000000cd`a910e568
    0x000000cd`a910e56c

    In this case we have two results.  We can see how far from RCX the first result is.

    0:002> ?0x000000cd`a910e568-@rcx
    Evaluate expression: 72 = 00000000`00000048

    We get an offset of 72 bytes, which is very close.  We can do a couple things to verify that we can use this offset in the future.  First, we can check to see if we have any symbol information near RCX.

    0:002> dps @rcx
    000000cd`a910e520  000007fb`45ecc2b8 iisw3adm!WORKER_PROCESS::`vftable'
    000000cd`a910e528  000000cd`43525057
    000000cd`a910e530  000000cd`a90f4998
    000000cd`a910e538  000000cd`a90f4998
    000000cd`a910e540  00000000`00000005
    000000cd`a910e548  000000cd`a9390db0
    000000cd`a910e550  00000000`00000006
    000000cd`a910e558  000000cd`a90f4930
    000000cd`a910e560  000000cd`a90f4b40
    000000cd`a910e568  00001f4c`00001f4c
    000000cd`a910e570  00000000`0000031c
    000000cd`a910e578  00000000`00000000
    000000cd`a910e580  000000cd`a7f7ed20
    000000cd`a910e588  00000000`00000001
    000000cd`a910e590  000000cd`00000000
    000000cd`a910e598  00000000`00000000

     

    It seems RCX is a pointer to a WORKER_PROCESS virtual function table.  This means it is essentially the address for the WORKER_PROCESS object.  It also means our offset of RCX+72 is probably correct and could be used consistently.  We can be sure with one last check.  We need to let the debugger continue, create and pause another worker process, let the debugger break in at our earlier breakpoint, then validate that RCX+72 points to the new PID.

    With the worker process paused, we note our new PID is 4356.  In Hex that comes to 1104, so that’s what we’re looking for.

    0:002> g
    Breakpoint 0 hit
    iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem:
    000007fb`45efb2d4 48895c2410      mov     qword ptr [rsp+10h],rbx ss:000000cd`a86cfba8=705d41a9cd000000
    0:002> dw @rcx+0n72 l1
    000000cd`a910e568  1104

     

    Clearly, our offset works as it points to the correct new PID.  This means we can use our breakpoint and this offset to create a script to dump the hung worker process.  Below, I’ve listed the steps to script this with DebugDiag, but it could easily be adapted to work with ADPlus or even CDB and WinDbg directly.

    (On a side note, review of the disassembly for the PingResponseTimerExpiredWorkItem showed that it did not have any parameters.  RCX was clearly not a parameter, but it’s a common enough register that it is still good to check.  It definitely paid off in this instance.)

     

    Steps to get a memory dump using DebugDiag from the faulting worker process when a 5010 is raised.

    The assumption is made that the production machine with the issue does not have internet access as this is a common practice.  If it’s not the case and internet access is available, you can complete Step 1 on the production machine.

     

    Step 1: Get Symbols for WAS Service

    1. Open DebugDiag on a production server
    2. Navigate to the Processes tab
    3. Sort by the Service Column Descending and identify the line containing the process for the WAS service
    4. Right click the WAS service and choose create full memory dump
    5. Zip the dump and copy it to a machine with external internet access
    6. On the new machine use the LoadSymbols script found here: http://blogs.msdn.com/b/puneetgupta/archive/2010/05/14/debugdiag-script-to-load-all-symbols-in-a-dump-file.aspx to collect symbols for the WAS service
    7. On the analysis tab of DebugDiag, highlight the line for LoadSymbols
    8. Click Add Data File
    9. Navigate to the dump from the prod server and click ok
    10. Now click Start Analysis
    11. By default, DebugDiag will save the symbol files to c:\symcache.  Copy the contents of that folder to the same path on the production machine.  (If you’ve changed the symbol settings in DebugDiag previously, you must make sure you’re pointing to the Microsoft symbol server and make note of your local path)

     

    Step 2: Set up DebugDiag on Production machine

    1. On the Rules tab, click Add Rule and select a Crash Rule
    2. Select “A specific NT Service” and click Next
    3. Highlight the WAS service and click Next
    4. Click Breakpoints then Add Breakpoint
    5. In Breakpoint Expression enter: iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem
    6. Under Action Type select Custom
    7. Add the following script in the window that pops up:
    Dim w3wpProcessId
    rcx = CausingThread.Register("rcx")
    w3wpProcessId = Debugger.ReadDWORD(rcx + 72)
    WriteToLog "w3wp process id = " & w3wpProcessId
    Debugger.CreateDumpForProcessID w3wpProcessID,"Failed Ping",False

     

    1. Set the Action Limit to 5 then click ok
    2. Click Save & Close
    3. Click Next all the way through.  You can specify the location of the dump file on one of the screens before you click finish.
    4. Check the log for svchost under c:\program files\DebugDiag and make sure it shows that the breakpoint was set.  If it was not set, try restarting the Debug Diagnostic Service.  The log should have something like the following:

     

    [9/30/2012 2:46:53 PM] Initializing control script
    [9/30/2012 2:46:53 PM] Clearing any existing breakpoints
    [9/30/2012 2:46:53 PM]
    [9/30/2012 2:46:53 PM] Attempting to set unmanaged breakpoint at iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem
    [9/30/2012 2:46:53 PM] bpID = 0
    [9/30/2012 2:46:53 AM] Current Breakpoint List(BL)
      0 e 000007fb`45efb2d4     0001 (0001)  0:**** iisw3adm!WORKER_PROCESS::PingResponseTimerExpiredWorkItem

     

    You can test the above set up by suspending a worker process with Process Explorer and making sure a memory dump is created.  That’s all there is to it.  Of course, when you have the memory dump, that’s just the beginning as you will have to debug that dump to identify the cause of the hang.

     

     

    -Jarrod

  • PFE - Developer Notes for the Field

    Caller Information – a new concept in Visual Studio 2012

    • 2 Comments

    With the recent release of Visual Studio 2012, there are many new features and updates to explore.  One of the new features is the concept of Caller Information, available in both C# and VB.  Caller Information allows you to obtain certain information about the caller to a method.  More specifically, it allows you to get the name of the calling method as well as the source file information and line number.

     

    Many customers currently use catch blocks to document the name of the current method and maybe even the source file from which an exception is thrown.  But Caller Information provides a way to get this information (and more – source line info), even if no exception occurs.   The way it works is that you include attributes as optional parameters in your function’s signature.  There are three of these, and they’re defined in the System.Runtime.CompilerOptions namespace.  You should supply a default value to whichever ones you include in your function’s signature.

    CallerFilePath – The full path and filename (during compile time) of the caller.

    CallerLineNumber – The line number (in the source code) where the caller calls the callee.

    CallerMembername – The method or property name of the caller.

     

    Caller Information can be of particular value for profiling or tracing tools.  However, you don’t need to write a profiler in order to take advantage of this new feature.  One example where an application could make use of Caller Information would be if you have a function (say, MyFunc) which is called from many different functions.  Let’s say MyFunc takes an argument or two, and occasionally MyFunc gets an exception when using the arguments – for example, a NullReferenceException.  But the fact that MyFunc doesn’t reproduce this exception consistently may make it difficult to determine what the code path is that results in the exception.

    If MyFunc captured the calling function, you’d have a much better idea of how to reproduce the problem and, therefore, find root cause and a resolution.

     

    Here’s a demonstration of how to use CallerInformation.  I created a console application with the following code. (Sincere apologies as to the format shown below.  I struggled for a while to remove the individual boxes around each block of code, but doing so only removed all the indentation, which made the code more difficult to read).

    using System.Runtime.CompilerServices;

    namespace CallerInfoSample
    {
    class Program
    {
    static void Main(string[] args)
    {
    string str = "";
    str = Func1();

    Console.WriteLine(str);
    Console.ReadLine();
    }

    static string Func1([CallerFilePath] string sFilePath = "", [CallerLineNumber] int iLineNbr = 0, [CallerMemberName] string sMemberName = "")
    {
    string s = "";
    s = "Before calling Func2, Func1 reports the line number where " + sMemberName + " called the current function is " + sFilePath + ", Line " + iLineNbr + ".";
    Console.WriteLine(s);
    Console.WriteLine("=====");

    bool b = Func2();

    s = "After calling Func2, Func1 reports the line number where " + sMemberName + " called the current function is still " + iLineNbr + " since we haven't finished executing Func1.";
    Console.WriteLine("Func2 returns " + b.ToString() + ".");
    Console.WriteLine(s);
    Console.WriteLine("=====");

    return "Done Processing";
    }

    static bool Func2([CallerFilePath] string sFilePath = "", [CallerLineNumber] int iLineNbr = 0, [CallerMemberName] string sMemberName = "")
    {
    string s = "";
    s = "Before calling Func3, Func2 reports the line number where " + sMemberName + " called the current function is " + sFilePath + ", Line " + iLineNbr + ".";
    Console.WriteLine(s);
    Console.WriteLine("=====");

    int j = 0;
    int i = Func3(j);

    return true;
    }

    static int Func3(int i, [CallerFilePath] string sFilePath = "", [CallerLineNumber] int iLineNbr = 0, [CallerMemberName] string sMemberName = "")
    {
    string s = "";
    s = "Before returning to Func2, Func3 reports the line number where " + sMemberName + " called the current function is " + sFilePath + ", Line " + iLineNbr + ".";
    Console.WriteLine(s);
    Console.WriteLine("=====");

    return (i + 1);
    }
    }
    }

    It results in the following output:

    Before calling Func2, Func1 reports the line number where Main called the current function is c:\Samples\CallerInfoSample\CallerInfoSample\Program.cs, Line 16.
    =====
    Before calling Func3, Func2 reports the line number where Func1 called the current function is c:\Samples\CallerInfoSample\CallerInfoSample\Program.cs, Line 29.
    =====
    Before returning to Func2, Func3 reports the line number where Func2 called the current function is c:\Samples\CallerInfoSample\CallerInfoSample\Program.cs, Line 47.
    =====
    Func2 returns True.
    After calling Func2, Func1 reports the line number where Main called the current function is still 16 since we haven't finished executing Func1.
    =====
    Done Processing
     

    For more information on Caller Information, see MSDN at http://msdn.microsoft.com/en-us/library/hh534540.aspx.

    NOTE: I could only make use of Caller information when I configured VS2012 to compile against 4.5 of the .NET Framework; the functionality isn't there in 4.0.

  • PFE - Developer Notes for the Field

    Exploratory Testing in Visual Studio 2012

    • 3 Comments

    With the recent release of Visual Studio 2012, there are many new features and updates to explore.  One of the new features is Exploratory Testing, which allows you to test your software without a pre-defined test or script.  It’s perfect for simply “exploring” your application to see what issues you may encounter.  Exploratory Testing comes with the Test Professional, Premium, and Ultimate editions of VS 2012.

    Technically, Exploratory Testing is part of Microsoft Test Manager (MTM).  In VS2010, one could achieve exploratory testing by filing an exploratory bug in Test Runner.  From there, users/testers could execute unscripted actions until a bug was found.  But in VS2012, exploratory testing is pulled into MTM along with your other tests, plans, suites, tracks, work items, etc. You can learn about some of the other new features in MTM 2012 in MSDN Magazine.  In order to use Exploratory Testing in MTM, you must be connected to Team Foundation Server (TFS) 2012.  If you don’t have access to a TFS 2012 instance, you can try out TFS in the cloud free, for a limited time, at http://tfspreview.com.  The instructions at that site are simple and include everything you need to create an account, project, upload source, automate builds, etc. 

    NOTE: At the time of this writing, no information was available as to how long tfspreview.com would be available (or at least, available for free).  While the end goal of TFS Azure is to have all the features of TFS, you may find that not all features of TFS - such as Reporting Services, AD Federation, etc. - are available at tfspreview.com.

     

    With exploratory testing, you can literally browse through your application at your leisure, until you find a bug or some other part of the software that should be considered for change – editing text in a label, background color change, adding or removing a feature, etc.  And the whole time you’re “exploring” the application, MTM records your actions.  By default, actions you execute in MTM or Office applications are not recorded, though you can change this in your test plan properties (Testing Center, Plan, Properties).  Exploratory testing in MTM allows you to do the following:

    • Pause and resume recording of your exploratory test case
    • Enter comments and screenshots for your exploratory test.  This can help clarify recorded activities in your test.
    • Create a bug
    • Generate a Test Case for the bug you just created

     

    Exploratory testing can be used on web applications, windows applications, and if it’s a client/server application, some steps on the server side can also be recorded.  More details on this, as well as the steps to run an exploratory test can be found at http://msdn.microsoft.com/en-us/library/tfs/hh191621(v=vs.110).aspx.  While I have no desire to copy what can already be found on MSDN, I also know that sometimes links change, content is moved, etc.  Therefore, I’ll repeat some of those steps in this blog entry.

    1. Open MTM.
      1. You can perform exploratory testing with no existing work items.  To do this, simply right-click a Test Plan or Test Suite and select Explore.

    b. If you do have an existing Work Item, you can perform exploratory testing on a specific work item by going to Test, Do Exploratory Testing.  Then right-click on your work item of choice and select Explore.

     

    2. When you select Explore, the Exploratory Testing window opens, waiting for you to Start your test.  Once clicked, MTM records your actions.

     

    3. You can pause or end the test, and this pauses or ends recording of your actions. At any time you can enter comments and/or add screen captures in the Exploratory Testing window.  Additionally, if you run into a bug you can click the Create Bug button during recording, and fill out details of your newfound bug.  When you save your new bug, you’ll have the option to create a Test Case for the bug you just created.

     

    These are just the steps to get you started, and there are many other options for saving Test Cases, assigning & editing bugs, and more.  Exploratory testing is a great option when you don’t have a defined test, a defined script as to how to navigate directly to a specific bug, or even when you don’t have a specific idea of what problem you seek to uncover in your application. 

  • PFE - Developer Notes for the Field

    Using DebugDiag's LeakTrack with ProcDump's Reflected Process Dumps

    • 1 Comments

    DebugDiag and Procdump are two of my favorite tools.  They're both incredibly useful and I rely on them heavily to gather MemoryDumps when debugging production issues.  They have a myriad of overlapping features, but they each have (at least) one feature that makes them stand apart from each other:

    • DebugDiag has the LeakTrack feature.  This lets users inject a module that tracks callstacks for allocations and stores them to heuristically identify the cause of native memory leaks in an application.  I use the feature whenever I have a native memory leak and find it indispensable.
    • ProcDump has the ability to reflect a process before creating a memory dump.  This substantially reduces the amount of time the process is interrupted when a memory dump is taken.  When dealing with production issues, this is an extremely efficacious feature.

    The catch is, DebugDiag currently can't create reflected process dumps, and ProcDump doesn't have a way to inject the LeakTrack dll to track allocations.  We can get around this by working with both tools.  Assuming both tools are installed on the server with a leaky application, the process is as follows:

    1. Identify the application that has a native memory leak: http://blogs.microsoft.co.il/blogs/sasha/archive/2008/07/13/is-it-a-managed-or-a-native-memory-leak.aspx
    2. Use DebugDiag to inject the LeakTrack dll and begin tracking allocations
    3. Set up ProcDump to create a dump of the reflected process when a memory threshold is breached

    Step 1 is easy and documented elsewhere.  Step 2 is easy if we have direct GUI access to the machine.  We can simply go to the Processes tab in DebugDiag, right click the process, and chose "Start Monitory for Leaks."  For the sake of argument, let's say we want to script the whole process.  We can do that by scripting DebugDiag and ProcDump to do the individual tasks we've set out for them.  Once we have the PID of the troubled process, we can use a script to inject the LeakTrack dll into the process.  The following vbs script will, when used by DebugDiag, start the native leak tracking process:

    Sub Debugger_OnInitialBreakpoint()
       Debugger.Write "Injecting LeakTrack" & vbLf
       Debugger.InjectLeakTrack
       Debugger.Write "Detaching2" & vbLf
       Debugger.Detach
    End Sub

    The above code can be saved into a vbs file.  For this post we'll assume it was saved to c:\scripts\InjectLeakTrack.vbs.  We'll also assume our PID was 1234

    With the PID known and the script created, we can launch DebugDiag from a command line as such (if you installed DebugDiag elsewhere, you'll have to adjust the install location): C:\PROGRA~1\DEBUGD~1\DbgHost.exe -script "c:\scripts\InjectLeakTrack.vbs" -attach 1234

    This will cause DbgHost, the debugger process DebugDiag uses, to attach to the process with PID 1234 and run the script we provided.  The script simply injects the LeakTrack dll and detaches the debugger.  Now that LeakTrack is injected, we can launch ProcDump and tell it to reflect the process and create a memory dump when we've crossed a memory threshold.  Let's say we know our application has leaked if it ever crosses 600MB.  We can kick off ProcDump as follows: procdump 1234 C:\Dumps\Over600.dmp -accepteula -r -m 600 -ma

    ProcDump will then monitor the process and create a reflected process dump when the committed bytes of the process breach 600 MB.  This memory dump can be analyzed by DebugDiag on the Advanced Analysis tab, and the offending callstacks can be identified.

     

    -Jarrod

     

     

  • PFE - Developer Notes for the Field

    The Art of Guessing

    • 2 Comments
    Often when debugging, we’re presented with a problem scenario where our application specific knowledge is limited.  This could mean we don’t have access to the source code or even that we do have access to the source code but simply don’t have time to read millions of lines of code to completely grasp the situation at hand.

    In this type of scenario, it’s generally not immediately clear what steps we should take for a resolution.  This means we often have to guess.  Guessing is somewhat of a misnomer in this situation.  I’m not advocating random guessing by any means, but rather taking a logical approach to guessing.  When selecting an action based on guess work, you should be able to explain each step along the way.  This means we need a way to guess well.

     The key to guessing well is leveraging what we do know about a system to in an attempt to make some logical assumptions to fill in the gaps of information that we don’t know.   These are still guesses, but they’re logical guesses based on reason and sound thinking.

    Below is a simple case study of a guess the number game written in Silverlight.  The goal of the case study is not to demonstrate how to hack a guess the number game, but rather to illustrate the reasoning behind the guesswork needed to beat the game without relying on…well, guessing. 

    The game works by prompting the user to guess a number and click a button.  If the user guesses incorrectly, they see the following screen:

    Our goal is to figure out what number to enter in order to guess correctly.  To start with, we want to aggregate what we know about the application:

    1. The user enters a number into a text box and clicks a button.
    2. The number is likely evaluated against something in the application and if they match, we expect to get a message indicating success.  Currently we are trying randomly and only getting failures.

    Next we have to figure out what we want to know.

    1. What happens if the user enters the correct value?
    2. How was the application created and where is it hosted?
    3. How can we figure out what the correct value is?

    With these questions, the logical first step is to talk to one of the developers that support the project. We’ll assume we got the following back:

    1. What happens if the user enters the correct value? The documentation says the label at the bottom should read, “You chose wisely.”
    2. How was the application created and where is it hosted? All I know is that it’s a Silverlight application that is viewed through the browser.
    3. How can we figure out what the correct value is?  I have no idea.  The code is all on Tommy’s computer and Tommy is on vacation in Jamaica.

    Ok, we have a couple of good answers now.  We’ve expanded what we know about the application.  Now it’s time to take some action.  Since the developer told us this is a Silverlight application hosted in a browser we also know that to debug this we can connect a debugger to the browser.

    So, we hook up to the browser, and now we have to decide what to look for.   We don’t know much about how the application works.  We don’t know where the application stores the correct value.  We don’t know enough about the application to know what method does the comparison of values.  We don’t even know if the comparison code is hosted locally or if a call is made to a server.

    That’s a lot of things we don’t know, so let’s go back to what we do know.  We know that to kick off the comparison operation the user has to click a button.  We also know this is a Silverlight application.  We can guess that the button will be a Silverlight button.

    This means we have a first step.  We can use the debugger to search for the button in the managed heap.

    First step is to load SOS for Silverlight:

    0:038> .loadby sos coreclr

    With SOS loaded, we can look on the managed heap for a button object.  We might not know going into to it exactly what type is needed to create a Silverlight button, but we can guess that it probably contains the word “Button.”

    0:038> !dumpheap -stat -type Button

    Statistics:

          MT    Count    TotalSize Class Name

    50878ec8        1           32 MS.Internal.FXCallbackDelegates+PasswordBox_AttachRevealButtonListenerDelegate

    50878c1c        1           32 MS.Internal.FXCallbackDelegates+TextBox_AttachDeleteButtonClickHandlerDelegate

    508ad1c0        1          128 System.Windows.Controls.Button

    Total 3 objects

     

    System.Windows.Controls.Button looks promising.  There is only 1 of them and it ‘feels’ right.  Next step is to find the address for that type:

    0:038> !dumpheap -mt 508ad1c0       

     Address       MT     Size

    0c862790 508ad1c0      128

    Now we have an address, so clearly the next step is to dump the object:

    0:038> !do 0c862790

    Name:        System.Windows.Controls.Button

    MethodTable: 508ad1c0

    EEClass:     504f3fd4

    Size:        128(0x80) bytes

    File:        C:\Program Files (x86)\Microsoft Silverlight\5.0.61118.0\System.Windows.dll

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    50898494  4000499        4 ...eObjectSafeHandle  0 instance 0c86281c m_nativePtr

    508986f4  400049a        8 ... System.Windows]]  0 instance 0c862ad4 _valueTable

    50e36b40  400049b       24       System.Boolean  1 instance        0 _propagateInheritanceChanged

    50894fa4  400049c        c ....DependencyObject  0 instance 00000000 _inheritanceParent

    508989fc  400049d       10 ...reTypeEventHelper  0 instance 0c862810 _coreTypeEventHelper

    508777b8  400049e       14 ...angedEventHandler  0 instance 00000000 DPChanged

    50e139ec  400049f       18  System.EventHandler  0 instance 00000000 _inheritaneContextChanged

    50875ab4  40004a1       1c ...bject, mscorlib]]  0 instance 0c8628a8 _treeChildren

    508735b0  40004a2       20 ...rnal.IManagedPeer  0 instance 0c836118 _treeParent

    50e42e50  40004a0       10 ...flection.Assembly  0   shared   static _executingAssembly

    … 

    50e36b40  4001120       64       System.Boolean  1 instance        1 _isLoaded

    50e36b40  4001121       65       System.Boolean  1 instance        0 _isMouseCaptured

    50e36b40  4001122       66       System.Boolean  1 instance        0 _isSpaceOrEnterKeyDown

    50e36b40  4001123       67       System.Boolean  1 instance        0 _isMouseLeftButtonDown

    508aa2a4  4001124       6c System.Windows.Point  1 instance 0c8627fc _mousePosition

    5089acf4  4001125       58 ...mation.Storyboard  0 instance 00000000 _currentState

    50e36b40  4001126       68       System.Boolean  1 instance        0 _suspendStateChanges

    50e36b40  4001127       69       System.Boolean  1 instance        0 _isSuspendingIsEnabled

    5089eb88  4001128       5c ...tArgs, mscorlib]]  0 instance 00000000 _weakCanExecuteChangedListener

    50873a68  4001129       60 ...outedEventHandler  0 instance 0c862fb0 Click

    508987dc  400111a      e80 ...ependencyProperty  0   shared   static ClickModeProperty

        >> Domain:Value  0b0b0cd0:NotInit  0b0807c0:0c84dc6c <<

    508987dc  400111b      e84 ...ependencyProperty  0   shared   static IsFocusedProperty

        >> Domain:Value  0b0b0cd0:NotInit  0b0807c0:0c84dd4c <<

    508987dc  400111c      e88 ...ependencyProperty  0   shared   static IsMouseOverProperty

        >> Domain:Value  0b0b0cd0:NotInit  0b0807c0:0c84dd88 <<

    508987dc  400111d      e8c ...ependencyProperty  0   shared   static IsPressedProperty

        >> Domain:Value  0b0b0cd0:NotInit  0b0807c0:0c84ddf4 <<

    508987dc  400111e      e90 ...ependencyProperty  0   shared   static CommandProperty

        >> Domain:Value  0b0b0cd0:NotInit  0b0807c0:0c84df18 <<

    508987dc  400111f      e94 ...ependencyProperty  0   shared   static CommandParameterProperty

        >> Domain:Value  0b0b0cd0:NotInit  0b0807c0:0c84df78 <<

    5088fe30  400112a      e98 ...tArgs, mscorlib]]  0   shared   static CS$<>9__CachedAnonymousMethodDelegate5

        >> Domain:Value  0b0b0cd0:NotInit  0b0807c0:00000000 <<

     

    Some of the above output was trimmed for brevity.  We have some more information.  We might not have the internals of the System.Windows.Controls.Button memorized, but we do know enough to know that there is very likely a click event.  We can guess that event will have the word click in it.  Looking in the output above, we find this line:

    50873a68  4001129       60 ...outedEventHandler  0 instance 0c862fb0 Click

     

    Dumping this object, we find:

    0:038> !do 0c862fb0

    Name:        System.Windows.RoutedEventHandler

    MethodTable: 50873a68

    EEClass:     504ddb60

    Size:        32(0x20) bytes

    File:        C:\Program Files (x86)\Microsoft Silverlight\5.0.61118.0\System.Windows.dll

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    50e36684  400002d        4        System.Object  0 instance 0c8163e8 _target

    50e36684  400002e        8        System.Object  0 instance 00000000 _methodBase

    50e3dfd8  400002f        c        System.IntPtr  1 instance  803c430 _methodPtr

    50e3dfd8  4000030       10        System.IntPtr  1 instance        0 _methodPtrAux

    50e36684  4000031       14        System.Object  0 instance 00000000 _invocationList

    50e3dfd8  4000032       18        System.IntPtr  1 instance        0 _invocationCount

     

    Again, we have an object whose internals may not be familiar to us, so we have to guess.   _target and _methodPtr both seem like good guesses.  We’ll start by dumping _target:

    0:038> !do 0c8163e8

    Name:        GuessTheNumber.MainPage

    MethodTable: 080356b8

    EEClass:     0835374c

    Size:        120(0x78) bytes

    File:        GuessTheNumber, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    50898494  4000499        4 ...eObjectSafeHandle  0 instance 0c81646c m_nativePtr

    508986f4  400049a        8 ... System.Windows]]  0 instance 00000000 _valueTable

    50e36b40  400049b       24       System.Boolean  1 instance        0 _propagateInheritanceChanged

    50894fa4  400049c        c ....DependencyObject  0 instance 00000000 _inheritanceParent

    508989fc  400049d       10 ...reTypeEventHelper  0 instance 0c816460 _coreTypeEventHelper

    508777b8  400049e       14 ...angedEventHandler  0 instance 00000000 DPChanged

    50e139ec  400049f       18  System.EventHandler  0 instance 00000000 _inheritaneContextChanged

    50875ab4  40004a1       1c ...bject, mscorlib]]  0 instance 0c8637c8 _treeChildren

    508735b0  40004a2       20 ...rnal.IManagedPeer  0 instance 00000000 _treeParent

    50e42e50  40004a0       10 ...flection.Assembly  0   shared   static _executingAssembly

    08035848  4000002       54 ...er.MyRandomObject  0 instance 0c8164a8 randomObject

    5089f2f4  4000003       58 ...ows.Controls.Grid  0 instance 0c836118 LayoutRoot

    08035b18  4000004       5c ...ws.Controls.Label  0 instance 0c81dd78 label1

    08035b18  4000005       60 ...ws.Controls.Label  0 instance 0c836730 label2

    50897f28  4000006       64 ....Controls.TextBox  0 instance 0c83779c guessTextBox

    508ad1c0  4000007       68 ...s.Controls.Button  0 instance 0c862790 submitButton

    08035b18  4000008       6c ...ws.Controls.Label  0 instance 0c863124 resultLabel

    50e36b40  4000009       70       System.Boolean  1 instance        1 _contentLoaded

     

    The abbreviated content above looks promising.  The controls listed as members of the object look promising.  We still don’t have a click event though.  We’re left guessing one more time.  This time, we’ll guess that the click method is associated with the MethodTable for GuessTheNumber.MainPage. Based on this guess, we’ll dump the method descriptors for the MethodTable:

    0:038> !dumpmt -md 080356b8

    EEClass:         0835374c

    Module:          080346e4

    Name:            GuessTheNumber.MainPage

    mdToken:         02000003

    File:            GuessTheNumber, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null

    BaseSize:        0x78

    ComponentSize:   0x0

    Slots in VTable: 63

    Number of IFaces in IFaceMap: 4

    --------------------------------------

    MethodDesc Table

       Entry MethodDe    JIT Name

    50c32330 50a1b150 PreJIT System.Object.ToString()

    50c32350 50a1b158 PreJIT System.Object.Equals(System.Object)

    50c323c0 50a1b178 PreJIT System.Object.GetHashCode()

    50c323d0 50a1b190 PreJIT System.Object.Finalize()

    5074cd34 50508d44 PreJIT

    50785800 50514cbc PreJIT System.Windows.Controls.Control.OnManipulationCompleted(System.Windows.Input.ManipulationCompletedEventArgs)

    50785828 50514cc4 PreJIT System.Windows.Controls.Control.OnTap(System.Windows.Input.GestureEventArgs)

    50785850 50514ccc PreJIT System.Windows.Controls.Control.OnDoubleTap(System.Windows.Input.GestureEventArgs)

    50785878 50514cd4 PreJIT System.Windows.Controls.Control.OnHold(System.Windows.Input.GestureEventArgs)

    08370240 0803566c    JIT GuessTheNumber.MainPage..ctor()

    0803c071 08035674   NONE GuessTheNumber.MainPage.submitButton_Click(System.Object, System.Windows.RoutedEventArgs)

    083703d0 08035680    JIT GuessTheNumber.MainPage.InitializeComponent() 

    Looks like we’re on to something.  The submitButton_Click event looks promising.  We can guess that is the method we’ve been looking for and dump out the IL:

    0:038> !dumpil 08035674  

    ilAddr = 083302c8

    IL_0000: nop

    IL_0001: ldarg.0

    IL_0002: ldfld GuessTheNumber.MainPage::guessTextBox

    IL_0007: callvirt System.Windows.Controls.TextBox::get_Text

    IL_000c: ldloca.s VAR OR ARG 0

    IL_000e: call System.Int32::TryParse

    IL_0013: pop

    IL_0014: ldloc.0

    IL_0015: ldarg.0

    IL_0016: ldfld GuessTheNumber.MainPage::randomObject

    IL_001b: callvirt GuessTheNumber.MyRandomObject::get_randomValue

    IL_0020: ceq

    IL_0022: ldc.i4.0

    IL_0023: ceq

    IL_0025: stloc.1

    IL_0026: ldloc.1

    IL_0027: brtrue.s IL_003c

    IL_0029: ldarg.0

    IL_002a: ldfld GuessTheNumber.MainPage::resultLabel

    IL_002f: ldstr "You chose wisely."

    IL_0034: callvirt System.Windows.Controls.ContentControl::set_Content

    IL_0039: nop

    IL_003a: br.s IL_0058

    IL_003c: ldarg.0

    IL_003d: ldfld GuessTheNumber.MainPage::resultLabel

    IL_0042: ldstr "Your guess of {0} was incorrect."

    IL_0047: ldloc.0

    IL_0048: box System.Int32

    IL_004d: call System.String::Format

    IL_0052: callvirt System.Windows.Controls.ContentControl::set_Content

    IL_0057: nop

    IL_0058: ret

    IL isn’t always easy to follow without practice, but the above makes it clear we are in the right place since we can see both, “You chose wisely,” and “Your guess of {0} was incorrect.” To make the code easier to read, we can extract the module and decompile the IL into C#

    0:038> !dumpmodule 080346e4

    Name:       GuessTheNumber, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null

    Attributes: PEFile

    Assembly:   0b0c2310

    LoaderHeap:              00000000

    TypeDefToMethodTableMap: 08030124

    TypeRefToMethodTableMap: 0803013c

    MethodDefToDescMap:      0803020c

    FieldDefToDescMap:       08030248

    MemberRefToDescMap:      0803027c

    FileReferencesMap:       0803035c

    AssemblyReferencesMap:   08030360

    MetaData start address:  08330d00 (4512 bytes)

    0:038> lmm GuessTheNumber

    start    end        module name

    08330000 08332600   GuessTheNumber C (no symbols)          

    0:038> !savemodule 08330000 d:\GuessTheNumberModule.dll

    3 sections in file

    section 0 - VA=2000, VASize=1da4, FileAddr=200, FileSize=1e00

    section 1 - VA=4000, VASize=370, FileAddr=2000, FileSize=400

    section 2 - VA=6000, VASize=c, FileAddr=2400, FileSize=200

     

    Using a decompiler such as Redgate’s reflector, we see the following C#:

    private void submitButton_Click(object sender, RoutedEventArgs e)

    {

        int num;

        int.TryParse(this.guessTextBox.get_Text(), out num);

        if (num == this.randomObject.randomValue)

        {

            this.resultLabel.set_Content("You chose wisely.");

        }

        else

        {

            this.resultLabel.set_Content(string.Format("Your guess of {0} was incorrect.", num));

        }

    }

    Now it’s clear what’s going on here.  The value entered by the user into the text box is parsed into an int then compared to another int stored in this.randomObject.randomValue.  All that’s left for us to do is to find the current value for randomValue.  We know that in this case, this refers to a MainPage object.  Interestingly, that’s the exact type we dumped previously as the _target from the event handler that we found.  Reviewing our previous output, we find:

    08035848  4000002       54 ...er.MyRandomObject  0 instance 0c8164a8 randomObject

    Dumping the object we see:

    0:038> !do 0c8164a8

    Name:        GuessTheNumber.MyRandomObject

    MethodTable: 08035848

    EEClass:     08353800

    Size:        12(0xc) bytes

    File:        GuessTheNumber, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null

    Fields:

          MT    Field   Offset                 Type VT     Attr    Value Name

    50e371d4  4000001        4         System.Int32  1 instance      174 <randomValue>k__BackingField

    We see a member variable named randomValue which is exactly what we were looking for.  Using the value current value of 174, we’ll try another guess in the application:

     

    Good guess.

    To recap, we were tasked with troubleshooting an application that we knew very little about.  We asked a few questions, used a bit of prior knowledge, and a handful of reason driven guesses to get us to a solution.

    Although it’s unlikely you will ever need to cheat at a Guess the Number game in a production environment, the approach here is applicable to any situation in which you are approaching with limited domain specific knowledge.

    Guess wisely my friends.

     

    -Jarrod

     

  • PFE - Developer Notes for the Field

    PInvoke Error in .NET 4: Array size control parameter index is out of range

    So in a code-base I was working in yesterday, we use PInvoke to call out to the Performance Data Helper (PDH) API’s to collect performance information for machines without using Perfmon.  One of those PInvoke calls looked like this: /* PDH_STATUS...
  • PFE - Developer Notes for the Field

    Starting a new Chapter in the Career of Mike Huguet

    A couple of months ago an opportunity unexpectedly made its way into my life that would offer something different for my career.  At first I ignored it, but after a while I started thinking about the possibilities before me.  There were a couple of job...
  • PFE - Developer Notes for the Field

    Research for the Week-Memory Management Part 1

    Do you know what happens when you create a new instance of a class?  Why do we need a value type and a reference type?  Do you know what is meant when the .NET framework is referred to as a “managed” platform?  What is managed versus unmanaged code? ...
  • PFE - Developer Notes for the Field

    Research for the Week–How IIS Works

    In the world of custom web application development a critical component of our success is the hosting environment and services.  We can choose to deploy our web applications using 3rd party hosting providers such as GoDaddy.com, Rackspace, DiscountASP...
  • PFE - Developer Notes for the Field

    Getting more information from the Word error box when troubleshooting OpenXML / WordML issues

    So, many apologies for dropping off the face of the blogosphere lately.  Fortunately (or unfortunately, depending on your perspective), I’ve been really busy at work.  I’ve been working on some really cool things that I hope I’ll be able to...
  • PFE - Developer Notes for the Field

    Upcoming Speaking Schedule for 2011 Q2

    I have stepped it up a notch and am speaking at numerous user groups and technical conferences in the South East in the coming months.  These include .NET & SharePoint user groups as well as SQL Saturday and SharePoint Saturday all day conferences...
  • PFE - Developer Notes for the Field

    Silverlight Error - “Failed to create a system.double from text ‘xx’”

    I encountered the error “Failed to create a system.double from text ‘50’” when trying to develop a SketchFlow project.  After doing some digging I found that it was just that I had a space after my number “50.”  After removing the space, SketchFlow was...
  • PFE - Developer Notes for the Field

    SketchFlow Build Error - “The Name ‘InitializeComponent’ does not exist in the current context”

    This is my first real go round with Expression Blend and SketchFlow and it is taking quite a while to get used to it.  After several iterations of names and screen dragging, dropping in the SketchFlow Map, and adding a few elements to some screens, I...
  • PFE - Developer Notes for the Field

    Visual Studio 2010 Type or namespace ‘xyz’ does not exist in…

    It pains me to write this post as I feel like an idiot for having wasted my time on this “problem.”  Hopefully in posting this, I can keep some other poor lost soul working at 4 AM in the morning from spending wasteful minutes scratching his head and...
  • PFE - Developer Notes for the Field

    The (potentially) dark side of parallelism…

    Read a great blog entry by Scott Hanselman recently talking about the parallel dilemma that I’m sure we’ll see folks face in the future with the (old/new) Parallel classes.  I wanted to add a few things to this discussion as he focused on the...
  • PFE - Developer Notes for the Field

    Yield Return…a little known but incredible language feature

    “We shall neither fail nor falter; we shall not weaken or tire…give us the tools and we will finish the job.” – Winston Churchill I don’t often blog about specific language features but over the past few weeks I’ve spoken to a few folks that...
  • PFE - Developer Notes for the Field

    Linking your pivot collections the fun and easy way

    In my previous post , I added to my series of entries on making sense of your ASP.NET event log error messages.  Note that this is entry #4 in this series.  The previous three entries can be found here: Part 1:  Parsing ASP...
  • PFE - Developer Notes for the Field

    Pivoting ASP.NET event log error messages

    Unless you’ve been hiding under the proverbial rock, you’ve probably seen the recent Pivot hoopla .  If you’re not familiar with it, it’s a way to visualize a large amount of data in a nice filterable format.  The nice thing about it is that...
  • PFE - Developer Notes for the Field

    Visualizing Azure performance counters…

    I’ve blogged before about how some things are different in the cloud, namely Azure .  That post dealt with finding out why your Azure Role crashed by using some of the logging facilities and storage available in Azure.  That’s kind of a worst...
  • PFE - Developer Notes for the Field

    Don’t guess when it comes to performance…a RegEx story.

    Many times, when I work with a customer, it’s because they’ve tried to accomplish something and need a little extra help.  Often, this falls into the application optimization area.  For example, a few years ago, I had a customer that was...
  • PFE - Developer Notes for the Field

    Optimizing your life: E-Mail Edition

    As with most people, I hate to do the same thing more than once.  In-fact, I had a job in college where I audited expense reports and then had to enter them into an application so they could be paid.  The auditing of the expense report was...
  • PFE - Developer Notes for the Field

    Exposing any class as a WCF endpoint…

    Often, I’ll have a customer that wants to utilize a new feature in the .NET framework but they don’t necessarily know where to start or they have a large investment in another area and they’re afraid of the amount of rework required to get something...
  • PFE - Developer Notes for the Field

    Parsing ASP.NET event log error messages for fun and profit

    Sometimes a customer will ask me to look at their site and make some recommendations on what can be improved.  One of the many things I’ll look at is their event logs.  One of the nice things about ASP.NET is that when you encounter an unhandled...
Page 1 of 4 (76 items) 1234