By letting the operating system do the heavy lifting! But first, what inspired this article?

I was doing some work, and I came across a class called IOThreadPoolTimer or something. It was a fancy timer, where they were going to all sorts of trouble to make their timer super lightweight, without any memory leaks, without any thread leaks, and so on, and apparently they achieved this by running on the IO thread pool… Well anyway, it seemed very complicated, with a lot of code: a TimerManager, two separate timer types, a custom queue class, and, oh my goodness, all sorts of perf optimizations.

My immediate reaction was: if you really want to optimize perf, wouldn’t the kernel developers, who are used to doing very clever things, do a much better job of writing a timer queue? And wouldn’t the performant thing to do in managed code be to create as few managed objects as possible and leverage that clever kernel code?

I then wondered, why are they not just using System.Threading.Timer or some other .NET timer class? Surely these are very optimized? Well, this IOThreadPoolTimer appeared to be, at its core, a wrapper around System.Threading.Timer. I dug into System.Threading.Timer and saw that it had plenty of complexity of its own: queuing, managing, and many small timer objects. Oh my goodness all over again. And System.Timers.Timer is the same, only worse (another wrapper over System.Threading.Timer). And System.Windows.Forms.Timer and DispatcherTimer are uninteresting to me because they are tied to a UI thread.

So… I decided it was time to do the learning exercise (i.e. fun). I wrote a timer which just uses the native thread pool timer APIs introduced in Windows Vista (CreateThreadpoolTimer, etc.). Of course I used SafeHandle for managing the timer handles safely. In the process I got to answer a previously unanswered question on Stack Overflow.

In the end it was only about 50 lines of code, and didn’t take that long.

I then wondered if what I had done was really a good idea. So I searched and came across a blog post by Eric Eilebrecht, where he mentions that the thread pool was revamped in Windows Vista. MSDN mentions this too (Thread Pool Architecture):

“The original thread pool has been completely rearchitected in Windows Vista. The new thread pool is improved because it provides a single worker thread type (supports both I/O and non-I/O), does not use a timer thread, provides a single timer queue, and provides a dedicated persistent thread. It also provides clean-up groups, higher performance, multiple pools per process that are scheduled independently, and a new thread pool API.”

Now to me, I don’t really understand why APCs are such a big deal. Maybe because I never use them. APCs are basically evil – have the kernel do lots of calling back into user land, with context switch overheads, at unpredictable times, which can turn into very weird reentrancy from user code’s point of view? How did anyone ever think that was a good idea?

So, here is an idea. Assuming we are not using APCs, and assuming we are running on Vista or something more modern (seems reasonable to me as I mostly write server apps…) shouldn’t we just use the regular Vista thread pool?

And in that case, maybe we should also use the Vista thread pool’s built-in native timer API? In which case maybe my learning exercise was actually a pretty good idea?

Well, I’m still not sure!

So… does anyone know a good benchmark for performance testing a timer class?!

In the meantime I came up with a few basic tests to run and compare it to System.Threading.Timer. Each test uses a simple app that creates a timer recurring at the minimum interval (1ms?) and prints out a line on each event.

a) when you set a period, how many timer events actually fire versus what you’d ideally expect? (are you losing events, or are they occurring more slowly than you expect due to the perf overhead of the timer itself? this ignores race conditions in incrementing the counter)

b) how many threads are running at the end of the scenario? and how many threads exited? (according to VS debugger)

c) how many total samples will the CPU profiler take while the app is running? (since this app isn’t exactly CPU bound, we can just try to estimate what % of the CPU it uses)

d) how much memory does it use?
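A minimal sketch of how measurements (a), (b) and (d) could be scripted against System.Threading.Timer. This is not the harness that produced the numbers in this post: the class name `TimerBenchmark`, the 10-second duration, and the use of `Process` counters instead of the VS debugger and profiler are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimerBenchmark
{
    // Runs a periodic System.Threading.Timer for `duration` and returns
    // how many callbacks fired before the timer was disposed.
    public static long CountCallbacks(TimeSpan period, TimeSpan duration)
    {
        long count = 0;
        // Interlocked avoids lost updates when callbacks overlap on different threads.
        using (new Timer(_ => Interlocked.Increment(ref count), null, TimeSpan.Zero, period))
        {
            Thread.Sleep(duration);
        }
        // A callback already in flight may still increment after this read.
        return Interlocked.Read(ref count);
    }

    public static void Main()
    {
        TimeSpan period = TimeSpan.FromMilliseconds(1);   // the "minimum interval" from the text
        TimeSpan duration = TimeSpan.FromSeconds(10);     // arbitrary choice (assumption)

        long fired = CountCallbacks(period, duration);
        double ideal = duration.TotalMilliseconds / period.TotalMilliseconds;

        using (Process process = Process.GetCurrentProcess())
        {
            Console.WriteLine("callbacks:     {0} (ideal ~{1:F0})", fired, ideal);
            Console.WriteLine("threads:       {0}", process.Threads.Count);
            Console.WriteLine("private bytes: {0}", process.PrivateMemorySize64);
        }
    }
}
```

Swapping the `Timer` for another timer type inside `CountCallbacks` keeps the rest of the measurement identical, which is the point of the comparison.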

 

Here are the results:

 

| | New timer | System.Threading.Timer |
| --- | --- | --- |
| # callbacks (release) | 9901 | 634 |
| # threads (debug) | About 500! Ohno! | 7 (not counting 1 destroyed) |
| # cpu samples (release) | 780 samples | 0 samples! |
| # memory overhead - 1 timer (debug) | 1-2MB | 0 (baseline) |

Whoopsies! It turns out this lightweight idea was a total performance disaster, because the Vista thread pool creates up to 500 threads! Ouch. What is going on?

MSDN says “Do not queue too many items too quickly in a process with other components using the default thread pool. There is one default thread pool per process, including Svchost.exe. By default, each thread pool has a maximum of 500 worker threads. The thread pool attempts to create more worker threads when the number of worker threads in the ready/running state must be less than the number of processors.”

What if I change my callback to not call System.Console.WriteLine, I wonder?

Suddenly the numbers are drastically improved!

 

 

| | New timer (no WriteLine!) | System.Threading.Timer |
| --- | --- | --- |
| # callbacks (release) | 9904 | 634 |
| # threads (debug) | 7 or sometimes a couple more (not counting 1 destroyed) | 7 (not counting 1 destroyed) |
| # cpu samples (release) | 5 samples | 0 samples! |
| # memory overhead - 1 timer (debug) | +a couple extra thread stacks | 0 (baseline) |

Also, 4 of the 5 CPU samples were in GCHandle.FromIntPtr, which maybe points at ideas for optimizing the timer a little further. I played around with a ConcurrentDictionary of IntPtrs to see if it’s any better than GCHandles. It seems to be slightly faster, but you give up the ability that GCHandles give you to make timers act as weak references.
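To make the trade-off concrete, here is a rough sketch of the two lookup strategies side by side. The class `CallbackLookup` and its method names are hypothetical; only the GCHandle and ConcurrentDictionary calls are real framework APIs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;

public static class CallbackLookup
{
    static readonly ConcurrentDictionary<IntPtr, Action> table =
        new ConcurrentDictionary<IntPtr, Action>();

    // Option 1: GCHandle. The IntPtr passed to native code *is* the handle.
    // A weak handle lets the callback's owner be collected while registered.
    public static IntPtr RegisterWithGCHandle(Action callback, bool weak)
    {
        GCHandleType type = weak ? GCHandleType.Weak : GCHandleType.Normal;
        return GCHandle.ToIntPtr(GCHandle.Alloc(callback, type));
    }

    public static Action ResolveGCHandle(IntPtr context)
    {
        // Target is null once a weakly referenced object has been collected.
        return (Action)GCHandle.FromIntPtr(context).Target;
    }

    public static void ReleaseGCHandle(IntPtr context)
    {
        GCHandle.FromIntPtr(context).Free();
    }

    // Option 2: dictionary keyed by a native pointer (e.g. the timer handle).
    // Entries are strong references, so nothing is collected until removed.
    public static void RegisterInTable(IntPtr key, Action callback)
    {
        table[key] = callback;
    }

    public static Action ResolveFromTable(IntPtr key)
    {
        Action action;
        return table.TryGetValue(key, out action) ? action : null;
    }

    public static void ReleaseFromTable(IntPtr key)
    {
        Action ignored;
        table.TryRemove(key, out ignored);
    }

    public static void Main()
    {
        IntPtr context = RegisterWithGCHandle(() => Console.WriteLine("via GCHandle"), false);
        ResolveGCHandle(context)();
        ReleaseGCHandle(context);

        IntPtr key = new IntPtr(42); // stand-in for a native timer handle
        RegisterInTable(key, () => Console.WriteLine("via dictionary"));
        ResolveFromTable(key)();
        ReleaseFromTable(key);
    }
}
```

The dictionary version has no equivalent of `GCHandleType.Weak`, which is the capability given up by switching.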

Anyway there’s a much more interesting puzzle. How come System.Console.WriteLine is such a total performance disaster when used with the thread pool timer API? Is it something specific that WriteLine does? Or is it just that WriteLine is expensive? Maybe because it’s a synchronous API? I have no idea. Let’s try some things out.

System.Threading.Thread.Sleep(1) resulted in 12 threads.
for (int j = 0; j < 10000; j++) { k += j-5000; } resulted in 8 threads, with only 26 CPU samples.
for (int j = 0; j < 300000; j++) { k += j-15000; } resulted in 9 threads, with 1,422 CPU samples.

We’re seeing that we can use more CPU than WriteLine does, or more wall-clock time on the thread (by sleeping), and yet still require fewer worker threads? What am I missing? Oh yeah, I was counting the number of threads by running under the debugger. That might actually be a bad idea, because starting a thread under the debugger could have a lot of extra overhead.

Going back to Console.WriteLine: for a release build, running without the debugger, I see… 90-odd threads! Sigh. I’m not exactly sure what the moral of the story is here, but I think we can draw one conclusion: if you care about perf, don’t use Console.WriteLine in your server app!
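One possible way to keep output out of the timer callback is to enqueue lines and let a single dedicated thread write them. This is a sketch only: the `QueuedLog` class is hypothetical, and it has not been measured against the thread counts above, so whether it avoids the thread growth is an open question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class QueuedLog : IDisposable
{
    private readonly BlockingCollection<string> lines = new BlockingCollection<string>();
    private readonly Action<string> sink;
    private readonly Thread writer;

    // `sink` is whatever finally consumes a line, e.g. Console.WriteLine.
    public QueuedLog(Action<string> sink)
    {
        this.sink = sink;
        writer = new Thread(Drain) { IsBackground = true };
        writer.Start();
    }

    // Intended to be called from a timer callback: it only enqueues.
    // Throws InvalidOperationException if called after Dispose.
    public void Write(string line)
    {
        lines.Add(line);
    }

    // Runs on the dedicated writer thread until CompleteAdding is called
    // and the queue has been emptied.
    private void Drain()
    {
        foreach (string line in lines.GetConsumingEnumerable())
        {
            sink(line);
        }
    }

    public void Dispose()
    {
        lines.CompleteAdding();
        writer.Join();
        lines.Dispose();
    }
}

public static class QueuedLogDemo
{
    public static void Main()
    {
        using (var log = new QueuedLog(Console.WriteLine))
        {
            for (int i = 0; i < 3; i++)
            {
                log.Write("tick " + i);
            }
        } // Dispose waits for the queued lines to be written
    }
}
```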

OK, now that you’ve read the rambling, you’re probably like ‘show me the codez’! Here it is, minus the boring bits (namespaces). You will be using it at your own risk, given that I don’t really understand why it created so many threads so easily.

 

    using System;
    using System.Collections.Concurrent;
    using System.ComponentModel;
    using System.Runtime.ConstrainedExecution;
    using System.Runtime.InteropServices;
    using System.Security;

    /// <summary>
    /// Timer that receives periodic event callbacks on a worker thread.
    ///
    /// Note:
    /// As you would likely hope, if the TIMER gets GC'ed or disposed,
    /// YOU WILL ALSO STOP RECEIVING CALLBACKS... but not necessarily
    /// in a race-free way. It is possible for your callback function
    /// to be called after the timer is disposed/finalized.
    ///
    /// Note #2:
    /// Your callback can be called back from multiple timer events from a single timer
    /// in multiple threads simultaneously.
    /// Use a lock() to prevent unexpected re-entrancy if you use the timer period.
    ///
    /// Note #3:
    /// Reentrancy will not occur if you use a timer period of zero
    /// </summary>
    public sealed class ThreadPoolTimer : SafeHandle
    {
        // Matches the native timer callback signature (instance, context, timer).
        delegate void TimerCallback(IntPtr pCallbackInstance, IntPtr context, IntPtr ptpTimer);

        // Kept in a static field so the delegate is never garbage collected
        // while native code can still call it.
        static readonly TimerCallback globalTimerCallback = new TimerCallback(OnTimer);

        // Maps native timer handles to the user's callbacks.
        static readonly ConcurrentDictionary<IntPtr, Action> lookup =
            new ConcurrentDictionary<IntPtr, Action>();

        // Used by the P/Invoke marshaller to wrap the handle returned
        // from CreateThreadpoolTimer.
        public ThreadPoolTimer()
            : base(IntPtr.Zero, true)
        {
        }

        public ThreadPoolTimer(IntPtr preexistingHandle, bool ownsHandle)
            : base(IntPtr.Zero, ownsHandle)
        {
            base.SetHandle(preexistingHandle);
        }

        public static ThreadPoolTimer Create(Action userCallback)
        {
            var ret = CreateThreadpoolTimer(globalTimerCallback, IntPtr.Zero, IntPtr.Zero);
            if (ret.IsInvalid)
            {
                // The native call returns NULL on failure.
                throw new Win32Exception(Marshal.GetLastWin32Error());
            }
            if (!lookup.TryAdd(ret.handle, userCallback))
            {
                throw new InvalidOperationException("Timer intptr value conflict should never happen");
            }
            return ret;
        }

        public override bool IsInvalid
        {
            get { return this.handle == IntPtr.Zero; }
        }

        protected override bool ReleaseHandle()
        {
            CloseThreadpoolTimer(this.handle);
            return true;
        }

        protected override void Dispose(bool disposing)
        {
            // Stop dispatching to the user's callback before the handle is released.
            Action ignored;
            lookup.TryRemove(this.handle, out ignored);
            base.Dispose(disposing);
        }

        // Requires compiling with /unsafe because of the FILETIME pointer.
        public unsafe void SetTimer(DateTime expiryTime, uint msPeriod, uint acceptableMsDelay)
        {
            // A positive FILETIME value is an absolute UTC due time.
            long ft = expiryTime.ToFileTimeUtc();
            SetThreadpoolTimer(this, &ft, msPeriod, acceptableMsDelay);
        }

        private static void OnTimer(IntPtr pCallbackInstance, IntPtr context, IntPtr ptpTimer)
        {
            // The entry is missing if the timer was already disposed; drop the event.
            Action action;
            if (lookup.TryGetValue(ptpTimer, out action))
            {
                action.Invoke();
            }
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        private static extern ThreadPoolTimer CreateThreadpoolTimer(
            TimerCallback callback, IntPtr context, IntPtr pCallbackEnvironment);

        [DllImport("kernel32.dll")]
        private unsafe static extern void SetThreadpoolTimer(
            ThreadPoolTimer timer, long* dueTime, uint msPeriod, uint msWindowLength);

        [SuppressUnmanagedCodeSecurity]
        [DllImport("kernel32.dll")]
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
        private static extern void CloseThreadpoolTimer(IntPtr timer);
    }
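Note #2 in the class comment suggests a lock() to deal with overlapping callbacks. A lock serializes the ticks, which means pool threads can pile up waiting. An alternative, sketched below, is to drop a tick when the previous one is still running. The `NonReentrant` helper is hypothetical and not part of the timer; it swaps the suggested lock for an Interlocked flag.

```csharp
using System;
using System.Threading;

public static class NonReentrant
{
    // Returns a wrapper around `callback` that skips an invocation
    // if a previous invocation has not finished yet.
    public static Action Wrap(Action callback)
    {
        int running = 0; // 0 = idle, 1 = a callback is executing

        return () =>
        {
            // Atomically claim the flag; bail out if someone else holds it.
            if (Interlocked.CompareExchange(ref running, 1, 0) != 0)
            {
                return; // overlapping tick: skip rather than block a pool thread
            }
            try
            {
                callback();
            }
            finally
            {
                Interlocked.Exchange(ref running, 0);
            }
        };
    }
}

public static class NonReentrantDemo
{
    public static void Main()
    {
        int ticks = 0;
        Action guarded = NonReentrant.Wrap(() => Console.WriteLine("tick {0}", ++ticks));
        guarded();
        guarded();
    }
}
```

With the timer above, the wrapped delegate would be passed in place of the raw one, as in `ThreadPoolTimer.Create(NonReentrant.Wrap(OnTick))`, where `OnTick` stands for your own callback. Skipped ticks are lost, so this only fits work where missing an event is acceptable.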

I’ve also published a NuGet package of this in case you want to try it out.

Install-Package UltimateTimer

Please tell me if it works terribly in practice; I’m interested to learn more from this experiment.