I was surprised and dismayed to read a recent article in Embedded Systems Programming (http://www.embedded.com/showArticle.jhtml?articleID=159902113) that gets so many things wrong about the GetTickCount API and portrays Windows CE so negatively.

Apparently I’m behind on my reading, since the same author also wrote a previous article on the same subject, to which Mike Hall wrote a rebuttal on his own blog: http://blogs.msdn.com/mikehall/archive/2005/01/28/362498.aspx  Reading the more recent article makes me want to write my own rebuttal.

GetTickCount is a pretty simple API from the caller’s perspective.  Every millisecond the tick count increments, and you can use GetTickCount to retrieve the number of milliseconds since boot.  Since GetTickCount returns only a 32-bit number, the counter wraps after about 49.7 days (2^32 milliseconds).  This is documented behavior, and to use GetTickCount properly you have to understand it.

Typically GetTickCount is used to time the duration between two events, in which case you’re generally safe if you subtract two values; unsigned 32-bit subtraction is safe in the presence of rollover.  (E.g., if you get a tick count of 0xFFFFFF00 before the rollover and a tick count of 0x200 after the rollover, subtraction gives you a difference of 0x300, as expected.)  The only time subtraction can get you into hot water is if there’s a chance the time delta will exceed 49 days, because you may end up needing a difference that’s larger than you can represent in 32 bits.  In that case GetTickCount is the wrong API for you, and you would probably need to implement something using GetSystemTime and SystemTimeToFileTime.

Applications that use GetTickCount get into trouble if they subtract over such a long time period, or if they do something besides subtraction with the tick count.  For example,

   if (GetTickCount() > MyTickValue) { ... }

will also get you into trouble.  If I were to guess, that’s where I’d say most applications probably go wrong using this API.
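
To make the difference concrete, here is a minimal sketch of the rollover-safe pattern in portable C.  The helper names (elapsed_ms, timed_out) are mine, not part of any Windows API, and the tick values are passed in as plain 32-bit numbers; in real code they would come from GetTickCount.

```c
#include <stdint.h>

/* Elapsed milliseconds between two tick counts (for example, two
   GetTickCount results).  Unsigned 32-bit subtraction wraps modulo 2^32,
   so the answer is right across one rollover as long as the real
   interval is shorter than about 49.7 days. */
uint32_t elapsed_ms(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}

/* Rollover-safe replacement for "if (now > deadline)": compare the
   elapsed time against the timeout instead of comparing raw ticks. */
int timed_out(uint32_t start, uint32_t now, uint32_t timeout_ms)
{
    return elapsed_ms(start, now) >= timeout_ms;
}
```

With start = 0xFFFFFF00 and a timeout of 0x300, a precomputed deadline of start + timeout wraps around to 0x200, so a raw "now > deadline" test already fires at 0xFFFFFF01, one millisecond after the start.  Comparing the elapsed time does not have that problem.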

To help catch such errors in applications and drivers, Windows CE does a little thing on debug builds – it initializes the tick count such that it rolls over 3 minutes after boot.  The author talks about our 3-minute rollover as if it’s indicative of a problem in the OS, when really it’s just a meager attempt to help catch bugs in applications.  It’s not much help really, if you ask me, but it might help catch a bug or two.  I’d love to improve on it, but it’s tough to arrange for the timer to roll over at a really useful time for testing your application.  For example you might think we could create an IOCTL you could call, to set the timer at run-time.  But making the timer jump at run-time could mess things up.  Suddenly drivers and applications would think 47 days have passed, maybe network drivers time out, all your appointments fire, who knows…  I’m making things up since I don’t really know how networking or appointments are implemented, but you get the idea.  If you really want to be careful about testing your application or driver that uses GetTickCount, probably the best thing to do is create a wrapper that your code uses to call GetTickCount, and arrange for your wrapper to manipulate the times.  I’d love to see suggestions, and if you can come up with something good we can do in the OS, hey, maybe we will take your advice.
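
As a rough illustration of the wrapper idea, here is one way it might look.  Everything named here (SetTickOffsetForTest, AdjustTick, OffsetForRollover) is hypothetical and made up for this sketch; the only assumption is that your code calls the wrapper instead of calling GetTickCount directly.

```c
#include <stdint.h>

/* Hypothetical test hook: extra milliseconds added to every tick value.
   Leave it at 0 in production so the wrapper is a pass-through. */
static uint32_t g_tick_offset = 0;

void SetTickOffsetForTest(uint32_t offset)
{
    g_tick_offset = offset;
}

/* Apply the offset to a raw tick value.  On a real device the wrapper
   your code calls would be something like:
       uint32_t MyGetTickCount(void) { return AdjustTick(GetTickCount()); } */
uint32_t AdjustTick(uint32_t raw_tick)
{
    return (uint32_t)(raw_tick + g_tick_offset);
}

/* Offset that makes the adjusted count roll over to 0 exactly
   ms_until_rollover milliseconds after the raw count reads now. */
uint32_t OffsetForRollover(uint32_t now, uint32_t ms_until_rollover)
{
    return (uint32_t)(0u - ms_until_rollover - now);
}
```

Because only your own code sees the shifted values, the rest of the system keeps its real notion of time, which avoids the side effects of making the OS timer jump.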

So now I’ll describe how GetTickCount is implemented internally.  First off, the function is technically owned completely by the OEM, though we provide as many implementations as we can manage, so that OEMs can use those.  The implementation varies per CPU and per OAL, but in general, there’s a 32-bit counter, CurMSec, that is incremented once per millisecond by a timer interrupt.  That millisecond timer interrupt is also used for other things, like scheduling threads.

To conserve power, when the kernel has no threads to schedule, the system goes into an idle state (implemented by the OEM’s OEMIdle function) where the timer interrupt is extended to a longer period.  That allows the CPU to spend a longer time in a low-power state.  The idle period is ended when an interrupt fires (the extended timer interrupt or some other interrupt).  When the system leaves the idle state, OEMIdle updates CurMSec with the amount of time that the system was idle, using whatever timer the hardware has.
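
The bookkeeping described above can be sketched roughly as follows.  This is a simplified simulation, not actual OAL code: CurMSec is the counter named above, but TimerTickIsr, IdleWakeUpdate, the free-running hardware counter, and counts_per_ms are all assumptions made for illustration.

```c
#include <stdint.h>

uint32_t CurMSec = 0;                  /* milliseconds since boot */
static uint32_t g_leftover_counts = 0; /* hardware counts not yet worth 1 ms */

/* Normal operation: the timer interrupt fires once per millisecond. */
void TimerTickIsr(void)
{
    CurMSec += 1;
}

/* Leaving idle: credit CurMSec with the time spent idle, measured with a
   hypothetical free-running 32-bit hardware counter that advances
   counts_per_ms times per millisecond. */
void IdleWakeUpdate(uint32_t counts_before, uint32_t counts_after,
                    uint32_t counts_per_ms)
{
    uint32_t idle_counts;
    uint64_t total;

    if (counts_per_ms == 0) {
        return; /* invalid configuration; nothing sensible to add */
    }

    /* Unsigned subtraction handles one wrap of the hardware counter. */
    idle_counts = (uint32_t)(counts_after - counts_before);

    /* Carry the remainder forward so fractions of a millisecond are not
       lost on every wake-up. */
    total = (uint64_t)idle_counts + g_leftover_counts;
    CurMSec += (uint32_t)(total / counts_per_ms);
    g_leftover_counts = (uint32_t)(total % counts_per_ms);
}
```

The remainder handling is there because dropping a partial millisecond on every wake-up would make the clock drift over time.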

All the magic is really in the OEM’s implementation of OEMIdle.  If you want some code to look at, see the CE 5.0 help article.

Now, back to the article.  Here are my responses:

  • One of the things that floors me about this article is that it claims that GetTickCount counts downward, not upward.  That is just plain wrong.  I don’t know what gave the author that impression but the counter starts out at 0 and goes up.  If you see any documentation or code that claims otherwise, please tell me and we’ll get it corrected.
  • The author also asks whether the counter “sticks” at the same value after rollover, which it doesn’t.  He seems to imply that it does though.
  • GetTickCount also works just fine on 16-bit and 64-bit CPUs, though the author implies that it doesn’t.
  • The article brings up cases where counters are non-monotonic, where the counter jumps backwards by a few ticks.  If that happens, it means the timer is not implemented correctly by the OEM.  Most likely something in OEMIdle is not right.  We do provide documentation of how to implement a timer (here’s some), standard implementations for different CPUs, and tests to verify that the timer is implemented correctly.  But I’m sure some people would argue that we don’t do enough to help OEMs get this right, and maybe they’re right.  Let’s discuss it.  What else would you like to see?  What trouble have you had?
  • All the other examples of badness that the author uses come from desktop Windows as far as I can tell, and they are problems with applications that use GetTickCount improperly, not with Windows itself.
  • You could argue that this is too complicated, that GetTickCount should just read the timer hardware straight, and scale to milliseconds.  I think the main reason it was done this way was to standardize between hardware with a count-compare type timer and hardware with a count-down timer.  It also avoids doing division to convert to milliseconds, especially 64-bit division since 64-bit timers are common.
  • The author actually suggests using a stopwatch to measure elapsed time during a performance test.  Even if that were accurate enough, it’s a pain and it’s not automatable.  How could you run tests regularly to make sure that nothing got worse?  I fully support the author’s idea that you should loop many times, so that the performance test runs long enough that you could time it with a stopwatch.  But that’s for reasons of repeatability, to get rid of variance.  I don’t believe a stopwatch is the right answer.  For timing code run-times, GetTickCount and QueryPerformanceCounter have satisfied every need I’ve seen so far.
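
For the performance-test case in the last point, the arithmetic is simple enough to automate.  Here is a minimal sketch; the function name is made up, and the two tick values are assumed to be GetTickCount readings taken immediately before and after the loop.

```c
#include <stdint.h>

/* Average cost of one iteration, in milliseconds, given tick counts read
   before and after a loop that ran the code under test many times.  The
   unsigned subtraction stays correct if the counter rolled over once
   during the run. */
double average_ms_per_iteration(uint32_t start_tick, uint32_t end_tick,
                                uint32_t iterations)
{
    if (iterations == 0) {
        return 0.0; /* nothing was measured */
    }
    return (double)(uint32_t)(end_tick - start_tick) / (double)iterations;
}

/* Intended use on a device (not compiled here):
       DWORD start = GetTickCount();
       for (i = 0; i < iterations; i++) { CodeUnderTest(); }
       DWORD end = GetTickCount();
       double avg = average_ms_per_iteration(start, end, iterations);
   CodeUnderTest is a placeholder for whatever you are measuring. */
```

Running enough iterations that the total takes several seconds keeps the 1 ms resolution of the tick count from dominating the result, and the number can be logged by a nightly test run instead of read off a stopwatch.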

I guess the thing I disliked the most was that the author took a list of implementation complications, application bugs from desktop Windows, incorrect information and “what if’s,” and turned it into a negative portrayal of Windows CE.  Maybe I am just too sensitive about the product I pour so much energy into.  Too many people assume that Microsoft developers don’t care, when the truth is very much the opposite.

Oh man has this discussion gotten long.  Have I ever got just a few words to say about something?  Oh well.  Write back if you have opinions to add or if you think I got any technical details wrong.

Sue