Today's journey debugging into DCOM process creation code
I love the feeling of euphoria after you finish debugging a difficult problem. My difficult problem of the day was figuring out why one of the test suites that I own was failing. The test works by running a macro in Visual Studio. The problem was that my macro was failing to read an environment variable. Here is how I debugged it:
Step #1: Determine what process macros run in. To figure this out, I put a message box in my macro, and saw what process the message box came from. It turns out that the answer to this is vsmsvr.exe.
Step #2: Check the process environment of that process. To do this, I attached a debugger, did a Debug->Break. Then I went to the immediate window, and evaluated '@env=0'. This uses a new Whidbey feature of dumping the process environment.
Step #3: Determine how the process is being created. For this I ran ‘tlist.exe -t' this prints out a tree of the current processes. I found vsmsvr.exe, and went up one level. It turns out that its parent process is winlogon.exe.
Step #4: Look at the process creation code in winlogon. I attached a debugger to winlogon. I have never had a chance to actually debug winlogon before, so this was something of a thrill. I then set a breakpoint on kernel32!CreateProcessW. I ran my scenario again, but my breakpoint wasn’t hit. So I deleted my breakpoint, and set breakpoints on the process creation functions from ntdll instead -- {,,ntdll}_NtCreateProcessEx@36 and {,,ntdll}_ZwCreateProcess@32. Then I reran my scenario again. It turns out that this code works by calling CreateProcessAsUser:
ntdll.dll!_ZwCreateProcessEx@36()
kernel32.dll!CreateProcessInternalW()
advapi32.dll!CreateProcessAsUserW()
wlnotify.dll!ProcessExecRequest()
wlnotify.dll!ExecServerThread()
kernel32.dll!BaseThreadStart()
Step #5: Look at the arguments to CreateProcessAsUser. I was debugging on the machine that worked, so I noticed that the environment was being passed along. I looked into the code for wlnotify.dll!ProcessExecRequest, and noticed that it was waiting on a named pipe and calling process created based on a request from the pipe. Therefore, I wanted to know who was sending the request.
Step #6: Find client of winlogon request. I started by using procexp.exe from www.sysinternals.com. I knew the handle of the pipe, so using that I found its name in procexp.exe. But that turned out to be unhelpful. Next I looked at some of the other local variables with the locals window (you need full symbols for this one). I noticed that one was a handle to a process. I looked up that handle in procexp.exe. It was the svchost.exe process that was hosting the RPC service.
Step #7: Debug into svchost.exe. I then attached a debugger to svchost.exe. I found the bellow thread that was clearly making the request.
ntdll.dll!_ZwReadFile@36()
kernel32.dll!ReadFile()
rpcss.dll!CreateRemoteSessionProcess()
rpcss.dll!CClsidData::LaunchActivatorServer()
rpcss.dll!CServerTableEntry::StartServerAndWait()
rpcss.dll!Activation()
rpcss.dll!ActivateFromProperties()
rpcss.dll!CScmActivator::CreateInstance()
rpcss.dll!ActivationPropertiesIn::DelegateCreateInstance()
rpcss.dll!ActivateFromPropertiesPreamble()
rpcss.dll!PerformScmStage()
rpcss.dll!SCMActivatorCreateInstance()
rpcrt4.dll!_Invoke@12()
Step #8: Try and figure out where the environment was coming from. I looked at the bottom frame (SCMActivatorCreateInstance). It turns out that this frame took the process environment as a member of a structure.
Step #9: Jump to a conclusion that turned out to be correct. At this point I made a guess. I knew that NT didn’t like it if the process environment got too big. I also knew that the team that runs the lab used big configuration scripts that puts a lot of stuff in the process environment. So I tried removing some of the crap, and it turns out that I was correct.
Lessons learned:
- Don’t fill your process environment with too much stuff
- DCOM appids that are set to run as ‘launching user’ do get the process environment of their parent process