One of our automation scripts was failing last week.  The test it was performing is very easy to describe: it pasted text on a page and verified the text was pasted.  Then it called Undo to undo the paste and verified the text was indeed gone.  And it started failing about 31% of the time.

 

I had a healthy dose of skepticism that Undo was broken.  I figured if CTRL+Z ever quit working, everyone on the team would immediately get deluged with email and bug reports since that is such a core piece of functionality.  Instead, I suspected a problem with our test code.  Let's go over the algorithm the test used.

 

  1. Get the XML of a page
  2. Paste the content on the page.  In this case, a simple string of text.
  3. Get the XML of the resulting page
  4. Verify the text was pasted
  5. Undo the operation
  6. Get the XML of the page
  7. Verify the text had been removed

 

Step 7 was the step that was failing.  The first thing I did was set this test to run in a debugger and examine the XML I got after step 6 each time to see if the text had been removed.  In each of my trials, it had been removed, but was failing in the lab.  This gave me a clue. It was passing each time I ran the script in the debugger and failing when running outside a debugger - this made me think of the biggest difference between the two scenarios, and some sort of timing issue became my prime suspect.  Applications run very slow (relatively) in a debugger, and since I was breaking into the code after step 6, I was really throwing off the logic of the implementation of the algorithm.

 

I started looking at the code which retrieves the XML of the page in step 6.  It was a straight up call to GetPageContent(), but had a little optimization built into it.  It checked the Last Modified Time on the XML object it already had for the page against the Last Modified Time of the page in OneNote.  If the times matched, the code assumed the two pages were the same, so it did not update the full page content.  Here's the XML from a section:

 

<one:Section name="General" ID="{3134C044-CFEA-0AF3-2602-96F4CC75CD54}{1}{B0}" path="\\johngui-xp\local\General.One" lastModifiedTime="2008-11-05T22:39:43.000Z" color="#D5A4BB" xmlns:one="http://schemas.microsoft.com/office/onenote/2007/onenote">

 

Notice that "lastModifiedSection" attribute:  lastModifiedTime="2008-11-05T22:39:43.000Z

 

It's in UTC, and was last modified at 10:39:43.000.  Those final three zeroes are the decimal fraction of a second that the event occurred, so this means the last time anyone modified this section happened to be exactly on the second mark (or so I thought).

 

A little more investigation showed the time stamp across every object always ended in .000 - in other words, OneNote always saves the last Modified Time in our XML to the nearest second.  So, in some cases, the XML could have changed and the code we used to retrieve it would not think it had changed at all.  The timing would go something like this:

 

 

Actual Time

What the test did

lastModifiedTime "stamped" in XML

1

8:22:01.003

Paste the text in the page

8:22:01.000

2

8:22:01:012

Get the XML from the page

8:22:01.000

3

8:22:01.101

Verify the XML is expected

8:22:01.000

4

8:22:01.288

Undo the operation

8:22:01.000

5

8:22:01:414

Check to see if the page has changed based on the time stamp

8:22:01.000

 

So on step 5, the code looked at the last Modified Time of 8:22:01.000 and compared it to the time from step 2 to see if the page in OneNote was newer than what it had.  Since the times were the same, it assumed it was up to date.  The changes made to the page in step 4 were missing, and the test failed.

 

We tried a fix to enforce getting new XML each time we needed to do a verification, essentially bypassing the time check we had implemented.  That proved surprisingly problematic, so a developer hopped in and removed all the code which was controlling if we needed to update the XML.  Now the script is passing 100% of the time again.

 

Questions, comments, concerns and criticisms always welcome (and believe me, there are enough loose ends with this particular problem to discuss for quite a while),

John