So the issue here is setting a batch job to run with a recurrence of 1 minute in AX2009 - a server bound batch job. The job itself takes just a few seconds but the batch job isn't completing for a few minutes, and so the next run doesn't happen after 1 minute, it happens after several minutes, which is not what is needed at all.

 

Initially I'd like to explain what is happening in this situation and then based on that I will explain the solution:

 

What is actually happening in the background:

If we take an example batch job, it's not important specifically what it is, just something which runs in under 1 minute. It is sent to batch with a recurrence of 1 minute starting from 12:00:00 and continuing forever.

 

There is a background thread on an AOS which checks, once every 60 seconds for new batches to process. It does this by calling batchRun.serverGetTask(), it is also important to note that each batch thread configured in the AX configuration, will trigger this same process, whenever they finish their work and need a new task to pick up - they call into batchRun.serverGetNextTask(), which calls straight back into batchRun.serverGetTask() as before. I mention this because the polling for new tasks is once every 60 seconds if there are current no batch threads running, but on a server with lots of batch tasks to process, each batch thread will be picking up tasks as it becomes free, this could be more or less than 60 seconds, but the threads will work continuously until there are no more tasks ready, when there are no more tasks then we're only polling every 60 seconds for a new task.

 

This is the first part that might cause a delay - if we use our simple example above, and say that there are currently no other batch processes running on the AOS, then once this job is sent to the queue it will wait for the AOS to do it's poll after 60 seconds before it begins to be processed. So if it's sent to the queue at 12:00:00 then in the worst case it would not be picked up to begin processing until 12:00:59, due to this 60 second interval.

 

The next part that may cause a delay is the time taken for the job to run itself, in this example let's say it takes 20 seconds. So now we're at 12:01:18.

 

Finally, there is another background thread on the AOS which runs at a 60 second interval which calls batchRun.serverProcessFinishedJobs(). This will not be called by the batch threads themselves, only by the AOS master session. You will see in the serverProcessFinishedJobs() method that there is also a check to the BatchGlobal table so that in a multi-AOS environment we still do not call this more than once every 60 seconds. Our batch finishes at 12:01:18, but due to the 60 second timing of this process in the worst case it may not be processed until 12:02:17. It's this method that sets the batchJob status to finished.

 

This method also sets the next recurrence, this will be the original start time, in our case 12:00:00, compared with the time now, 12:02:17, it will add the interval until the new start time is greater than the time now. This would cause the next recurrence to be 12:03:00. This ensures the next recurrence is not in the past. SO the next run is now at 12:03:00 but we would have expected it to be 12:01:00 based on our recurrence of 1 minute, so it's gone wrong.

 

 

How we can work with it:

So we have seen above that in the worst case the job might wait for 59 seconds to get picked up, it will take some seconds for the task itself to process, perhaps 20, and it may take up to 59 seconds to get set to finished. This gives us a combined time of "1:58+task running time" in the slowest case. So we can avoid this threshold with some more creative configuration. Instead of just 1 we create 5 staggered batch jobs for the same process:

 

- Job 1 is configured to start at 12:01:00, and run every 5 minutes

- Job 2 is configured to start at 12:02:00, and run every 5 minutes

- Job 3 is configured to start at 12:03:00, and run every 5 minutes

- Job 4 is configured to start at 12:04:00, and run every 5 minutes

- Job 5 is configured to start at 12:05:00, and run every 5 minutes

 

The result of these jobs together gives us a regular interval of 1 minute for an instance of this task to run, and the 5 minutes interval means that the potential "1:58+task running time" does not affect the recurrence of the job, as we have some space in the 5 minutes to absorb this extra time. Working in this way ensures that the 1 minute recurrence is preserved.

 

In addition to this I made a small change to Classes\BatchRun (see attached XPO - you need to rename back to *.xpo) to increase the frequency that we check for finished jobs. In the serverGetTask() method I add an extra call to serverProcessFinishedJobs() and within the serverProcessFinishedJobs() method I modified the check to the batchGlobal table to enable checking once every 30 seconds instead of once every 60 seconds. There can be a performance overhead with this, so if you want to try it then your testing is important. It is for performance reasons that in standard we check only every 60 seconds. You can of course fine tune the interval to something between 1-60 seconds through your testing.

 

To give an example of the timings before and after the changes, the origStartDateTime is the scheduled start time, the startDateTime is the actual start time, and the endDateTime is the logged end time. These are taken from Tables\BatchJobHistory.

 

Before:

 

StartDateTime

EndDateTime

OrigStartDateTime

30/03/2011 09:14:09

30/03/2011 09:16:09

30/03/2011 09:12:34

30/03/2011 09:17:21

30/03/2011 09:18:09

30/03/2011 09:16:34

30/03/2011 09:19:33

30/03/2011 09:20:09

30/03/2011 09:18:34

30/03/2011 09:20:42

30/03/2011 09:22:09

30/03/2011 09:20:34

30/03/2011 09:22:52

30/03/2011 09:24:09

30/03/2011 09:22:34

30/03/2011 09:25:00

30/03/2011 09:26:09

30/03/2011 09:24:34

30/03/2011 09:27:09

30/03/2011 09:29:09

30/03/2011 09:26:34

30/03/2011 09:30:18

30/03/2011 09:31:09

30/03/2011 09:29:34

 

 

After:

 

StartDateTime

EndDateTime

OrigStartDateTime

31/03/2011 10:58:35

31/03/2011 10:59:35

31/03/2011 10:58:00

31/03/2011 10:59:45

31/03/2011 11:00:35

31/03/2011 10:59:00

31/03/2011 11:00:53

31/03/2011 11:01:35

31/03/2011 11:00:00

31/03/2011 11:01:01

31/03/2011 11:01:35

31/03/2011 11:01:00

31/03/2011 11:02:09

31/03/2011 11:02:35

31/03/2011 11:02:00

31/03/2011 11:03:17

31/03/2011 11:04:25

31/03/2011 11:03:00

31/03/2011 11:04:25

31/03/2011 11:05:32

31/03/2011 11:04:00

31/03/2011 11:05:32

31/03/2011 11:06:39

31/03/2011 11:05:00

 

You will notice in the "after" example that two threads get the same end time as due to specific timing of the threads in this test run they were picked up by the same instance of serverProcessFinishedTasks(), but even with this anomaly the timings are still regular approximately 1 minute intervals.

--author:  Tariq Bell
--editor: Tariq Bell
--date: 14/04/2011