Larry Osterman's WebLog

Confessions of an Old Fogey

One in a million redux



I mentioned my “one in a million is next Tuesday” post to my wife the other day, and she asked me, “So did you include the bit about the PC’s clock?”

And it hit me that I hadn’t.  Doh!  So here it is.  It’s kinda fascinating actually.

Time on a PC is kept by counting clock interrupts.  Every PC contains a crystal that drives a clock chip, which interrupts the CPU approximately every 10 milliseconds.  So NT increments the system time by 10 milliseconds every time it receives one of these interrupts.
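Conceptually, it’s something like the following sketch (the names here are hypothetical and this is not actual NT kernel code; internally NT keeps time in 100-nanosecond units, which is also what FILETIME uses):

    /* Hypothetical sketch of tick-counted timekeeping, not actual NT kernel code. */
    #define TICK_INCREMENT_100NS (10 * 10000)   /* 10 ms expressed in 100-ns units */

    static volatile unsigned long long g_systemTime100ns;   /* time since boot, in 100-ns units */

    /* Imagine this being called once per clock interrupt, roughly every 10 ms. */
    void ClockInterruptHandler(void)
    {
        g_systemTime100ns += TICK_INCREMENT_100NS;
    }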

But the problem is that the crystals used in these systems can have a failure rate as high as 100ppm – in other words, 100 times out of every million clock ticks, the clock chip won’t actually generate an interrupt.  For most applications this isn’t a significant problem – instead of context switching every 10 milliseconds, every once in a while the system goes 20 milliseconds between context switches.

But for time, this is an utter disaster.  Given a 10 millisecond timer, there are 8,640,000 clock ticks per day.  If 100 per million clock ticks are missed, then that means that the system misses 864000 clock ticks, which is about 864 seconds.  That’s over fourteen minutes per day!

Now, in practice, the amount of drift is actually much lower, but still it can be quite significant.

So how does NT fix this?  Well, back in NT 3.1, once an hour, NT would interrogate the on-board real time clock chip (the hardware that keeps your date and time up-to-date even when your computer is powered off).  If the system time differed from the real time clock chip, then it would simply reset the system time to match the time on the RTC.  Which meant that time could jump forward or backwards significantly – so it was possible for the assert to fire in the following code:

            FILETIME time1, time2;
            GetSystemTimeAsFileTime(&time1);    // read the current system time
            GetSystemTimeAsFileTime(&time2);    // read it again immediately afterwards
            // time2 should never be earlier than time1.
            ASSERT(CompareFileTime(&time1, &time2) <= 0);

Clearly this was an unacceptable situation, so something had to be done to fix it.  The fix (in NT 3.5) was to change how time was accounted for in the system.  In the old system, every clock interrupt bumped the time by 10 milliseconds.  With the change, when the system read the time from the RTC, instead of applying the new time immediately, it calculated an adjustment to the 10 millisecond amount.  If the clock was behind, each tick might count as 11 or 12 milliseconds.  If the clock was ahead, each tick might count for 8 or 9 milliseconds.
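To make the mechanism concrete, here’s a toy simulation of the idea (the 50 millisecond error and the 11 millisecond catch-up value are made up for illustration; the real kernel works in 100-nanosecond units and computes the adjustment from the RTC reading):

    #include <stdio.h>

    int main(void)
    {
        /* Everything in milliseconds to keep the sketch readable. */
        long long softwareClock = 0;     /* the tick-driven system time                  */
        long long rtcClock      = 50;    /* the RTC is 50 ms ahead of the software clock */
        const int nominalTick   = 10;    /* normally each interrupt counts as 10 ms      */
        const int catchUpTick   = 11;    /* while behind, each interrupt counts as 11 ms */

        for (int tick = 1; tick <= 60; tick++) {
            softwareClock += (softwareClock < rtcClock) ? catchUpTick : nominalTick;
            rtcClock      += nominalTick;    /* real time advances 10 ms per tick */
        }

        /* After about 50 ticks the software clock has caught up and the two stay in lockstep. */
        printf("software clock = %lld ms, rtc = %lld ms\n", softwareClock, rtcClock);
        return 0;
    }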

This is actually pretty cool (ok, I think it’s amazingly clever), but again, there can be problems.  What if you’re using the current time and some high performance counter (like QueryPerformanceCounter)?  Then the clock drift will cause your measurements to be skewed from the real time measurements.  We actually ran into this problem in the SCP project – our clock tests were showing that the clock on the SCP chips was drifting, but we couldn’t see why it was happening – it turned out that the SCP chip clock wasn’t drifting, it was the PC’s clock that was drifting.
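You can see this skew for yourself by timing the same interval two ways and comparing.  This is just a rough sketch (the ten second Sleep is arbitrary), but when the system clock is being rate-adjusted the two elapsed-time figures won’t quite agree:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LARGE_INTEGER freq, qpcStart, qpcEnd;
        FILETIME ftStart, ftEnd;
        ULARGE_INTEGER t1, t2;

        QueryPerformanceFrequency(&freq);
        GetSystemTimeAsFileTime(&ftStart);
        QueryPerformanceCounter(&qpcStart);

        Sleep(10 * 1000);                        /* wait ten seconds */

        QueryPerformanceCounter(&qpcEnd);
        GetSystemTimeAsFileTime(&ftEnd);

        t1.LowPart = ftStart.dwLowDateTime;  t1.HighPart = ftStart.dwHighDateTime;
        t2.LowPart = ftEnd.dwLowDateTime;    t2.HighPart = ftEnd.dwHighDateTime;

        double qpcSeconds    = (double)(qpcEnd.QuadPart - qpcStart.QuadPart) / (double)freq.QuadPart;
        double systemSeconds = (double)(t2.QuadPart - t1.QuadPart) / 1e7;    /* FILETIME is in 100-ns units */

        printf("QPC elapsed:          %.6f s\n", qpcSeconds);
        printf("System clock elapsed: %.6f s\n", systemSeconds);
        return 0;
    }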

To allow people to compensate for this drift, a new API was added: GetSystemTimeAdjustment.  The GetSystemTimeAdjustment API allows you to determine the clock interrupt frequency (that’s the lpTimeIncrement parameter), and the adjustment that’s applied to each tick (that’s the lpTimeAdjustment parameter).
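For example, a minimal program that just dumps the current values might look like this (the labels and formatting are mine):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD adjustment = 0;     /* 100-ns units added to the clock at each interrupt */
        DWORD increment  = 0;     /* 100-ns units between clock interrupts             */
        BOOL  disabled   = FALSE; /* TRUE means no adjustment is currently in effect   */

        if (GetSystemTimeAdjustment(&adjustment, &increment, &disabled)) {
            printf("Clock interrupt period: %lu x 100ns (%.3f ms)\n", increment, increment / 10000.0);
            printf("Per-tick adjustment:    %lu x 100ns\n", adjustment);
            printf("Adjustment disabled:    %s\n", disabled ? "yes" : "no");
        }
        return 0;
    }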

Edit: Fixed the result of CompareFileTime.

 

  • Am I right in thinking that the system time adjustment is also altered by the W32time service, to keep in sync with the master clock?

    The master clock is your domain controller for any workstations; your domain controller(s) ought to be configured to use an external source. See net time /setsntp for more details, and the w32tm program to force synchronisation.
  • Sort-of. The W32time service provides another authoritative time source (actually it's more trusted than the RTC clock). And the clock skew algorithms will skew the clock time to match the W32time service's time.

    There's also some logic in the system to deal with the case where there is a radical shift in time (the user sets the date on their PC to Jan 1, 2001, and then W32time comes along and adjusts it to Apr 2, 2004), but I don't know how that works.
  • Wait a second. 100 per million out of 8.64 million is about 864 ticks missed, which is just 8.64 seconds per day.
  • Ilya is right!
  • Ilya's absolutely right, and I feel MUCH better actually - my memory was that there was an 8 second clock drift but I kept on doing the math and coming up with 800 seconds - it seemed wrong but... That's why the text above says "Now, in practice, the amount of drift is actually much lower" :)

    Either way, 8 seconds a day is unacceptable on a system that is going to stay up for 6 months at a time. Which is why the NT guys solved the problem.
  • Speaking of high-performance counters, I notice that it takes way more time to QueryPerformanceCounter than to timeGetTime or GetTickCount. It’s kind of funny that the more precisely you try to measure time, the more the measurement affects the result, exactly conforming to the Uncertainty Principle :)
  • A good point. Part of the reason for that is that GetTickCount (and timeGetTime) just read from a variable in the PEB (Process Environment Block). So does GetSystemTimeAsFileTime. But, they're not accurate (ok, GetTickCount is, sort-of).

    QueryPerformanceCounter reads the high frequency CPU performance counters - so it has to actually read these off the CPU, and that takes time - especially since the numbers must be accurate for MP machines.
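    For what it's worth, a rough way to see the difference in call cost is something like the following sketch (the loop count is arbitrary and the numbers vary a great deal from machine to machine):

        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            const int loops = 1000000;
            LARGE_INTEGER freq, start, end, qpc;
            DWORD tick = 0;

            QueryPerformanceFrequency(&freq);

            QueryPerformanceCounter(&start);
            for (int i = 0; i < loops; i++)
                tick += GetTickCount();          /* cheap: just reads a value kept in memory */
            QueryPerformanceCounter(&end);
            printf("GetTickCount:            %.1f ns/call\n",
                   (end.QuadPart - start.QuadPart) * 1e9 / freq.QuadPart / loops);

            QueryPerformanceCounter(&start);
            for (int i = 0; i < loops; i++)
                QueryPerformanceCounter(&qpc);   /* has to go out to the timer hardware */
            QueryPerformanceCounter(&end);
            printf("QueryPerformanceCounter: %.1f ns/call\n",
                   (end.QuadPart - start.QuadPart) * 1e9 / freq.QuadPart / loops);

            return (int)(tick & 1);              /* keep the compiler from discarding the loops */
        }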