
ThreadPool.BindHandle

I mentioned that we can use ThreadPool.BindHandle to implement asynchronous IO. Here are roughly the steps necessary to make it happen:

1.       Create an overlapped file handle

            SafeFileHandle handle = CreateFile(
                                filename,
                                Win32.GENERIC_READ_ACCESS,
                                Win32.FILE_SHARE_READ | Win32.FILE_SHARE_WRITE | Win32.FILE_SHARE_DELETE,
                                IntPtr.Zero,                    // no security attributes
                                Win32.OPEN_EXISTING,
                                Win32.FILE_FLAG_OVERLAPPED,     // required for asynchronous IO
                                new SafeFileHandle(IntPtr.Zero, false));

        [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
        private static extern SafeFileHandle CreateFile(
           string lpFileName,
           uint dwDesiredAccess,
           uint dwShareMode,
           //SECURITY_ATTRIBUTES lpSecurityAttributes,
           IntPtr lpSecurityAttributes,
           uint dwCreationDisposition,
           int dwFlagsAndAttributes,
           SafeFileHandle hTemplateFile);

2.       Bind the handle to the thread pool.

            if (!ThreadPool.BindHandle(handle))
            {
                Console.WriteLine("Failed to bind the handle to the thread pool.");
                return;
            }

3.       Prepare your asynchronous IO callback.

                byte[] bytes = new byte[0x8000];

                IOCompletionCallback iocomplete = delegate(uint errorCode, uint numBytes, NativeOverlapped* _overlapped)
                {
                    unsafe
                    {
                        try
                        {
                            if (errorCode == Win32.ERROR_HANDLE_EOF)
                                Console.WriteLine("End of file in callback.");

                            // Report real errors only; end of file is handled above.
                            if (errorCode != 0 && errorCode != Win32.ERROR_HANDLE_EOF)
                            {
                                Console.WriteLine("Error {0} when reading file.", errorCode);
                            }
                            Console.WriteLine("Read {0} bytes.", numBytes);
                        }
                        finally
                        {
                            // Free the NativeOverlapped that was passed to this callback.
                            Overlapped.Free(_overlapped);
                        }
                    }
                };

 

4.       Create a NativeOverlapped* pointer.

                    Overlapped overlapped = new Overlapped();

                    NativeOverlapped* pOverlapped = overlapped.Pack(iocomplete, bytes);

                    pOverlapped->OffsetLow = (int)offset;

5.       Call the asynchronous IO API and pass the NativeOverlapped * to it.

                    fixed (byte* p = bytes)
                    {
                        r = ReadFile(handle, p, bytes.Length, IntPtr.Zero, pOverlapped);
                        if (r == 0)
                        {
                            r = Marshal.GetLastWin32Error();
                            if (r == Win32.ERROR_HANDLE_EOF)
                            {
                                Console.WriteLine("Done.");
                                // No completion is queued for a synchronous failure, so free it here.
                                Overlapped.Free(pOverlapped);
                                break;
                            }

                            if (r != Win32.ERROR_IO_PENDING)
                            {
                                Console.WriteLine("Failed to read file. LastError is {0}", r);
                                Overlapped.Free(pOverlapped);
                                return;
                            }
                        }
                    }

        [DllImport("KERNEL32.dll", SetLastError = true)]
        unsafe internal static extern int ReadFile(
            SafeFileHandle handle,
            byte* bytes,
            int numBytesToRead,
            IntPtr numBytesRead_mustBeZero,
            NativeOverlapped* overlapped);

 

Your IO callback will be invoked on a CLR thread pool thread when the IO completes.

 

So when should you use ThreadPool.BindHandle? The answer is almost never. The .NET Framework's FileStream class internally uses ThreadPool.BindHandle to implement async IO. You should always use FileStream if possible.
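
For comparison, here is a minimal sketch of the FileStream route. The class name AsyncReadDemo and the method ReadFirstChunk are made up for this illustration, and the code blocks on an event only to keep the demo short; real code would continue its work from the callback instead.

```csharp
using System;
using System.IO;
using System.Threading;

static class AsyncReadDemo
{
    // Hypothetical helper: reads up to buffer.Length bytes from the start of
    // the file with FileStream's asynchronous API and returns the byte count.
    public static int ReadFirstChunk(string path, byte[] buffer)
    {
        // The last argument (useAsync = true) asks FileStream to open the file for overlapped IO.
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, 4096, true))
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            int bytesRead = 0;
            Exception error = null;

            fs.BeginRead(buffer, 0, buffer.Length, delegate(IAsyncResult ar)
            {
                // Runs when the read completes; always signal the waiting thread.
                try { bytesRead = fs.EndRead(ar); }
                catch (Exception e) { error = e; }
                finally { done.Set(); }
            }, null);

            done.WaitOne();   // blocking here only keeps the demo simple
            if (error != null) throw error;
            return bytesRead;
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[100]);
            byte[] buffer = new byte[0x8000];
            Console.WriteLine("Read {0} bytes.", ReadFirstChunk(path, buffer));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

There is no P/Invoke, no unsafe code, and no NativeOverlapped to free by hand, which is the main reason to prefer this route.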

Posted by junfeng | 2 Comments

ThreadPool.UnsafeQueueNativeOverlapped

CLR’s thread pool has two pools of threads. The first pool is used by ThreadPool.QueueUserWorkItem. The second pool is an IoCompletionPort thread pool used by ThreadPool.BindHandle and ThreadPool.UnsafeQueueNativeOverlapped.
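
A quick way to see that the two pools are sized separately is to ask the ThreadPool for its limits. This is only a small sketch; the PoolInfo class and its method names are invented for the example, and the numbers printed depend on the machine and runtime version.

```csharp
using System;
using System.Threading;

static class PoolInfo
{
    // Maximum number of worker threads (the pool behind QueueUserWorkItem).
    public static int MaxWorkerThreads()
    {
        int worker, io;
        ThreadPool.GetMaxThreads(out worker, out io);
        return worker;
    }

    // Maximum number of I/O completion port threads (the pool behind BindHandle).
    public static int MaxIoThreads()
    {
        int worker, io;
        ThreadPool.GetMaxThreads(out worker, out io);
        return io;
    }

    static void Main()
    {
        int freeWorker, freeIo;
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);
        Console.WriteLine("Worker threads: {0} max, {1} available", MaxWorkerThreads(), freeWorker);
        Console.WriteLine("I/O threads:    {0} max, {1} available", MaxIoThreads(), freeIo);
    }
}
```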

ThreadPool.BindHandle is used by CLR to implement asynchronous IO. For example, FileStream uses it to implement BeginRead/BeginWrite. Developers can take advantage of it too. We will talk about that in a separate article.

ThreadPool.UnsafeQueueNativeOverlapped can be used to queue a non-IO work item to the IoCompletionPort thread pool, much as ThreadPool.QueueUserWorkItem queues work to the first pool.

Why would you want to use ThreadPool.UnsafeQueueNativeOverlapped instead of ThreadPool.QueueUserWorkItem?

During our development, we discovered an inefficiency in ThreadPool.QueueUserWorkItem. If the number of work items alternates between high and low, some of the threads may busy-wait, artificially increasing the CPU usage of the application.

If your workload has the same pattern and you observe unexpectedly high CPU usage, you can try ThreadPool.UnsafeQueueNativeOverlapped.

The following is an example of how to use ThreadPool.UnsafeQueueNativeOverlapped.

using System;
using System.Runtime.InteropServices;
using System.Threading;

namespace UQNO
{
    internal class AsyncHelper
    {
        WaitCallback callback;
        object state;

        internal AsyncHelper(WaitCallback callback, object state)
        {
            this.callback = callback;
            this.state = state;
        }

        unsafe internal void Callback(uint errorCode, uint numBytes, NativeOverlapped* _overlapped)
        {
            try
            {
                this.callback(this.state);
            }
            finally
            {
                Overlapped.Free(_overlapped);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            ManualResetEvent wait = new ManualResetEvent(false);

            WaitCallback callback = delegate(object state)
            {
                Console.WriteLine("callback is executed in thread id {0} name {1}", Thread.CurrentThread.ManagedThreadId, Thread.CurrentThread.Name);
                ManualResetEvent _wait = (ManualResetEvent)state;
                _wait.Set();
            };

            AsyncHelper ah = new AsyncHelper(callback, wait);

            unsafe
            {
                Overlapped overlapped = new Overlapped();
                NativeOverlapped* pOverlapped = overlapped.Pack(ah.Callback, null);
                ThreadPool.UnsafeQueueNativeOverlapped(pOverlapped);
                wait.WaitOne();
            }
        }
    }
}

Posted by junfeng | 3 Comments

Conversion between System.String and char *

We can convert a char * to System.String with System.String’s constructor

string str = new string((char*)p);

And for the reverse:

fixed(char *p = str){}

Why do we care about conversion between System.String and char *?

According to this article, this is the fastest way to marshal strings across the managed/native boundary.
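
The two conversions can be put together in a small round trip. This is just a sketch: the StringPointerDemo class and its methods are invented for illustration, and the code must be compiled with /unsafe.

```csharp
using System;

static class StringPointerDemo
{
    // Pins the string, takes a char* to its data, and builds a new string from that pointer.
    public static unsafe string RoundTrip(string s)
    {
        fixed (char* p = s)
        {
            // p is valid only inside this block; the string constructor
            // copies characters up to the first null terminator.
            return new string(p);
        }
    }

    // Counts characters by walking the pointer until the terminating null,
    // the way native code would read the buffer.
    public static unsafe int CountChars(string s)
    {
        fixed (char* p = s)
        {
            int n = 0;
            while (p[n] != '\0') n++;
            return n;
        }
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip("hello"));
        Console.WriteLine(CountChars("hello"));
    }
}
```

Both helpers stop at the first null character, so a string with embedded nulls would come back truncated.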

Posted by junfeng | 3 Comments

XXX is not a valid Win32 application

If you have used your Vista SP1-based computer for an extended period, you may experience problems starting large applications, for example Office 2007 applications. Specifically, you may receive the message “XXX is not a valid Win32 application”.

If you do experience this problem, you can install hotfix KB952709.

http://support.microsoft.com/kb/952709/

Posted by junfeng | 1 Comments

Managed ThreadPool vs Win32 ThreadPool (pre-Vista)

The following is a conversation between me and a CLR dev. The conversation is very informative so I quote it here.

From: 
Sent:
To:
Subject: RE: ThreadPool.QueueUserWorkItem

 

There might be some confusion here around the meaning of the term "I/O Thread."  In the Windows thread pool (the old one, not the new Vista thread pool), an "I/O thread" is one that processes APCs queued by other threads, or by I/O initiated from the I/O threads.  The "non-I/O" threads get their work from a completion port, either as a result of QueueUserWorkItem, or I/O initiated on a handle bound to the threadpool with BindIoCompletionCallback.  So they are both geared toward processing I/O completions, but they just use different mechanisms.

 

In the managed ThreadPool, we use the terms "worker thread" and "I/O thread."  In our case, an I/O thread is one that waits on a completion port; i.e., it's exactly equivalent to Windows' non-I/O thread.  How confusing!  Our "worker threads" wait on a simple user-space work queue, and never enter an alertable state (unless user code does so), and so explicitly do not process APCs.  Managed "worker threads" have no equivalent in the Windows thread pool, just as Windows "I/O threads" have no managed equivalent.

 

The managed QueueUserWorkItem queues work to the "worker threads" only.  UnsafeQueueNativeOverlapped queues to the I/O threads, as do completions on handles that have been bound to the ThreadPool via BindHandle.

 

Why don't we support APCs as a completion mechanism?  APCs are really not a good general-purpose completion mechanism for user code.  Managing the reentrancy introduced by APCs is nearly impossible; any time you block on a lock, for example, some arbitrary I/O completion might take over your thread.  And they don't scale well, except in certain very constrained scenarios, because there's no load-balancing of completions across threads.  You can, of course, implement your own load balancing, but you'll never do better in user-space than the kernel does with completion ports.  So we provide a rich async I/O infrastructure based on completion ports, and nothing else.

 

From: 
Sent:
To:
Subject: RE: ThreadPool.QueueUserWorkItem

 

With ThreadPool.QueueUserWorkItem, the callback is not called on an I/O thread. You might be looking for ThreadPool.UnsafeQueueNativeOverlapped.

http://msdn.microsoft.com/en-us/library/system.threading.threadpool.unsafequeuenativeoverlapped.aspx


From: Junfeng Zhang
Sent:
To:
Subject: ThreadPool.QueueUserWorkItem

The native kernel32 QueueUserWorkItem has a flag that indicates whether to schedule the callback on an I/O thread. That flag is missing from the managed ThreadPool.QueueUserWorkItem.

 

Why is that? Is it because the callback is always scheduled on an I/O thread?

 

 

Posted by junfeng | 5 Comments

Managed Watson Dump

Starting with .NET Framework 2.0, Dr. Watson is able to generate dumps that are compatible with the .NET Framework. This means that dumps with heap data generated by Dr. Watson contain information about the managed heap, so they can be analyzed with sos.dll.

An update to Dr. Watson is necessary to make this happen. Dumps generated by older versions of Dr. Watson are not compatible with the .NET Framework.

To see if a dump is compatible with the .NET Framework, make sure the minidump's version is equal to or higher than 6400.

 

0:003> .dumpdebug
----- User Mini Dump Analysis

MINIDUMP_HEADER:
Version         A793 (6407)
NumberOfStreams 8
Flags           120
                0020 MiniDumpWithUnloadedModules
                0100 MiniDumpWithProcessThreadData

 

Posted by junfeng | 1 Comments

Inspect a 32 bit Process Dump Generated by a 64 bit Debugger

64-bit Windows can run both 32-bit and 64-bit processes. For debugging, though, you want to use a 32-bit debugger to debug a 32-bit process, and a 64-bit debugger for a 64-bit process. Otherwise it won’t be pretty.

Occasionally, I receive a 32-bit process dump generated by a 64-bit debugger.

When I load such a dump in the debugger, this is what it looks like:

0:020> .sympath SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

  20  Id: 1994.15f4 Suspend: 1 Teb: 00000000`7efa4000 Unfrozen
RetAddr           : Args to Child                                                           : Call Site
00000000`78b84191 : 00000023`7d61c918 00000000`00000023 00000000`00000202 00000000`0518fffc : wow64cpu!CpupSyscallStub+0x9
00000000`6b006a5a : 00000000`00000003 00000000`00000000 00000000`00000000 00000000`051cf7d0 : wow64cpu!Thunk2ArgNSpNSpReloadState+0x21
00000000`6b005e0d : 00000000`051cfd00 00000000`051cf1d0 00000000`051cf7d0 00000000`00000000 : wow64!RunCpuSimulation+0xa
00000000`77f109f0 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`7efdf000 : wow64!Wow64LdrpInitialize+0x2ed
00000000`77ef30a5 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!LdrpInitialize+0x2aa
00000000`7d4d1504 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!KiUserApcDispatcher+0x15
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadStartThunk
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 0001002f`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 0001002f`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 0001002f`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 0001002f`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
0001002f`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x1002f`00000000
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0

I do not know if you can make sense of it. I cannot.

Fortunately, the Windows debugger allows you to change the target processor type. Here is how it works:

0:020> .effmach x86
Effective machine: x86 compatible (x86)
0:020:x86> .load wow64exts

  20  Id: 1994.15f4 Suspend: 1 Teb: 00000000`7efa4000 Unfrozen
ChildEBP          RetAddr           Args to Child                                        
0518ff0c 7d4d0ec5 00000000 0518ff4c 7d4d8bfa ntdll_7d600000!ZwDelayExecution+0x15
0518ff74 7d4d14ef 0000ea60 00000000 0518ffac kernel32!SleepEx+0x68
0518ff84 776bbb0f 0000ea60 083e6ca8 776bbab4 kernel32!Sleep+0xf
0518ff90 776bbab4 00000000 00000000 083e6ca8 ole32!CROIDTable::WorkerThreadLoop+0x14
0518ffac 776b1704 00000000 0518ffec 7d4dfe21 ole32!CRpcThread::WorkerLoop+0x26
0518ffb8 7d4dfe21 083e6ca8 00000000 00000000 ole32!CRpcThreadCache::RpcWorkerThreadEntry+0x20
0518ffec 00000000 776b16e4 083e6ca8 00000000 kernel32!BaseThreadStart+0x34

Much better, isn’t it?

For best results, though, please use a 32-bit debugger to generate the dump for a 32-bit process.

Posted by junfeng | 2 Comments

Getting a Full Memory Dump for a Process

To diagnose a problem for a remote customer, sometimes the easiest way is to have the customer generate a full memory dump for the process, and share the memory dump.

In Vista, Task Manager can generate a full memory dump from the Processes tab.


In Windows XP, this functionality does not exist. However, Windows XP ships with a debugger, ntsd.exe, in the box. We can use it to generate a full memory dump.

C:\temp>ntsd -p 316
...
0:002> .dump /f c:\temp\foo.dmp
Creating c:\temp\foo.dmp - user full dump
0:002>

Posted by junfeng | 0 Comments

Event Handles “leak”

During our stress run, we saw our process’s handle count increase steadily up to a certain point and then stabilize. However, the number of handles was high, and most of them were Event handles. We were concerned about this, so we investigated.

It turns out the Event handles come from the use of Monitor.

When there is contention on the lock object, the CLR internally creates an Event handle, presumably to facilitate thread scheduling. The event handle is not cleaned up until the object is garbage collected.

We were using Monitor in a lot of places, and we had lock contention, which triggered the CLR to allocate a lot of Event handles.

So if you have a lot of long-lived objects, be careful about how you use Monitor.
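
One way to look for this effect yourself is to force contention on many long-lived lock objects and compare the process handle count before and after. The sketch below is an experiment, not a proof: MonitorHandleDemo and ContendOn are names invented here, and whether the count actually grows depends on the CLR version and platform.

```csharp
using System;
using System.Threading;

static class MonitorHandleDemo
{
    // Forces one contended acquisition on each of 'count' distinct lock objects
    // and returns the objects so the caller can keep them alive.
    public static object[] ContendOn(int count)
    {
        object[] locks = new object[count];
        for (int i = 0; i < count; i++)
        {
            object gate = new object();
            locks[i] = gate;

            Monitor.Enter(gate);
            Thread t = new Thread(delegate() { lock (gate) { } });
            try
            {
                t.Start();
                // Wait (bounded) until the second thread is blocked on the lock we hold.
                for (int spin = 0; spin < 1000 &&
                     (t.ThreadState & System.Threading.ThreadState.WaitSleepJoin) == 0; spin++)
                {
                    Thread.Sleep(1);
                }
            }
            finally
            {
                Monitor.Exit(gate);
            }
            t.Join();
        }
        return locks;
    }

    static void Main()
    {
        int before = System.Diagnostics.Process.GetCurrentProcess().HandleCount;
        object[] locks = ContendOn(200);
        int after = System.Diagnostics.Process.GetCurrentProcess().HandleCount;
        Console.WriteLine("Handles before: {0}, after: {1}", before, after);
        GC.KeepAlive(locks);   // keep the lock objects reachable until after the measurement
    }
}
```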

Posted by junfeng | 3 Comments

Use !htrace to debug handle leak

The WinDbg debugger’s !htrace extension is very handy for debugging handle leaks.

The process essentially boils down to the following simple steps:

1.       Enable trace

2.       Take a snapshot

3.       Run scenario

4.       Show the diff

In step 4, !htrace shows all the extra handles opened since the last snapshot, along with the call stack if available. This greatly helps in working out which handles are leaked, and by whom.

Like any other resource leak detection tool, there will be false positives. You need to understand what is a real leak, and what is just a transient allocation.
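
In a user-mode session, the four steps map onto commands roughly as follows. This is only a sketch of the sequence, using the options from the syntax reference below plus the debugger's g (go) command; output is omitted because it depends entirely on the target process.

```
!htrace -enable
!htrace -snapshot
g
(exercise the scenario, then break back into the debugger)
!htrace -diff
```

Since -enable already takes a first snapshot, the explicit -snapshot matters mainly when you want to reset the baseline just before the scenario starts.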

 

 

!htrace

The !htrace extension displays stack trace information for one or more handles.

Syntax

User-Mode Syntax

!htrace [Handle [Max_Traces]] 
!htrace -enable [Max_Traces]
!htrace -snapshot
!htrace -diff
!htrace -disable
!htrace -? 

Kernel-Mode Syntax

!htrace [Handle [Process [Max_Traces]]] 
!htrace -? 

Parameters

Handle

Specifies the handle whose stack trace will be displayed. If Handle is 0 or omitted, stack traces for all handles in the process will be displayed.

Process

(Kernel mode only) Specifies the process whose handles will be displayed. If Process is 0 or omitted, then the current process is used. In user mode, the current process is always used.

Max_Traces

Specifies the maximum number of stack traces to display. In user mode, if this parameter is omitted, then all the stack traces for the target process will be displayed.

-enable

(User mode only) Enables handle tracing and takes the first snapshot of the handle information to use as the initial state by the -diff option.

-snapshot

(User mode only) Takes a snapshot of the current handle information to use as the initial state by the -diff option.

-diff