In this article I am going to explain how batch processing works in AX2009. I don't mean how to set up a batch group or any of the other things you will find in the manual; I mean what each AOS is doing in the background to decide how and when to pick up batches, process them and complete them. Understanding this background can help in advanced batch troubleshooting or development scenarios.
Batch processing changed in AX2009. The AOSes now run batch processes directly, so if you want to see what is happening with a batch process it can be more difficult than in AX3 or AX4, because there is no batch client sitting there running for you to look at.
What happens now is that each AOS has a dedicated thread which checks for batches. Essentially all this thread does is call Classes\BatchRun.serverGetTask() once every 60 seconds (the interval is not configurable), and if there is any work for that AOS to do, the AOS picks up a task from there.
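The polling behaviour can be modelled in a few lines. This is a minimal Python sketch, not AOS code: the real loop runs inside the AOS kernel and calls X++. `simulate_batch_thread` and `get_task` are hypothetical names (`get_task` stands in for serverGetTask()), the clock is simulated in seconds, and the sketch assumes at most one task is picked up per poll. It shows the practical consequence of a fixed 60-second interval: a task that becomes due between polls waits for the next poll.

```python
from typing import Callable, List, Optional, Tuple

POLL_INTERVAL_SECONDS = 60  # fixed interval described above; not configurable in AX2009


def simulate_batch_thread(
    get_task: Callable[[int], Optional[str]],
    run_for_seconds: int,
) -> List[Tuple[int, str]]:
    """Model one AOS batch thread on a simulated clock.

    get_task is a hypothetical stand-in for BatchRun.serverGetTask(): it takes
    the current simulated time in seconds and returns a task id or None.
    Returns (poll_time, task_id) for every poll that picked up a task.
    """
    picked: List[Tuple[int, str]] = []
    for now in range(0, run_for_seconds + 1, POLL_INTERVAL_SECONDS):
        task = get_task(now)
        if task is not None:
            picked.append((now, task))
    return picked


if __name__ == "__main__":
    # Hypothetical queue: one task that becomes due 130 seconds in.
    due = {"report-1": 130}

    def fake_get_task(now: int) -> Optional[str]:
        for task_id, due_at in list(due.items()):
            if due_at <= now:
                del due[task_id]
                return task_id
        return None

    print(simulate_batch_thread(fake_get_task, 300))  # [(180, 'report-1')]
```

The task became due at 130 s but was only picked up by the poll at 180 s, which is the up-to-60-second lag to expect when watching for a batch to start.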
I'll give an example of an end-to-end batch process to show what happens where and when:
- A user sends a report to batch. It goes into the batch queue in BATCHJOB (the header) and BATCH (the batch tasks).
- Once every 60 seconds, each AOS that has been configured for batch processing (in Administration > Setup > Server configuration) calls the X++ method Classes\BatchRun.serverGetTask().
- The logic in serverGetTask() is exposed in X++, so anyone can see what happens; this is the main place where the AOS decides what to pick up for batch processing. Basically it checks whether there are any tasks in the BATCH table waiting for this AOS, based on the batch groups that this AOS is configured to process and on the time the batch job is due to start. For example, if a job is scheduled to run at 21:00 each day, it won't be picked up until 21:00, even though the AOS polls every 60 seconds. There are a few stages to this method:
1. First we check whether there is a task (a task is a record in the BATCH table) ready for us. The query for this looks like this:
select firstonly pessimisticlock RecId, CreatedBy, ExecutedBy, StartDateTime, Status,
        SessionIdx, SessionLoginDateTime, Company, ServerId, Info
    from batch
    where batch.Status == BatchStatus::Ready
       && batch.RunType == BatchRunType::Server
       && (Session::isServer() || batch.CreatedBy == user)
    join Language from userInfo
        where userInfo.Id == batch.CreatedBy
           && userInfo.Enable == true
    exists join batchServerGroup
        where batchServerGroup.ServerId == serverId
           && batch.GroupId == batchServerGroup.GroupId;
2. If a task is returned in step 1, there is nothing more to do and we start processing that task. If no task is returned, we look to see whether any batch jobs need to be started. The query for this looks like this:
update_recordset batchJob
    setting Status = BatchStatus::Executing,
            StartDateTime = thisDate
    where batchJob.Status == BatchStatus::Waiting
       && batchJob.OrigStartDateTime <= thisDate
    exists join batch
        where batch.BatchJobId == batchJob.RecId
    exists join batchServerGroup
        where batch.GroupId == batchServerGroup.GroupId
           && batchServerGroup.ServerId == serverId;
3. After step 2 we run Classes\BatchRun.serverProcessDependencies(). Something interesting happens here: it uses a table called BatchGlobal as a focal point. Because several AOSes might be running batch processing in the same environment, for some operations we look in this table to see whether another AOS has already done something, to decide whether the current AOS needs to do it as well. For dependencies we just make sure that another AOS is not doing this in the same second. If we continue, we run queries that set more tasks (again, tasks are just records in the BATCH table) ready for processing. The one for tasks with the AND constraint type is below. You can see how it moves BATCH records from Waiting to Ready, but only for tasks whose job is executing and which have no constraint that is still unmet, i.e. no task they depend on that hasn't ended yet or that ended with a different status from the one expected:
update_recordset batch
    setting Status = BatchStatus::Ready
    where batch.Status == BatchStatus::Waiting
       && batch.ConstraintType == BatchConstraintType::And
    exists join batchJob
        where batchJob.Status == BatchStatus::Executing
           && batch.BatchJobId == batchJob.RecId
    notexists join constraintsAnd
    exists join batchDependsAnd
        where (constraintsAnd.DependsOnBatchId == batchDependsAnd.RecId
           && constraintsAnd.BatchId == batch.RecId
           && ((batchDependsAnd.Status != BatchStatus::Finished
                && batchDependsAnd.Status != BatchStatus::Error)
            || (constraintsAnd.ExpectedStatus == BatchDependencyStatus::Finished
                && batchDependsAnd.Status == BatchStatus::Error)
            || (constraintsAnd.ExpectedStatus == BatchDependencyStatus::Error
                && batchDependsAnd.Status == BatchStatus::Finished)));
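To make the three stages easier to follow, here is a simplified in-memory model in Python. It is a sketch under stated assumptions, not the X++ in BatchRun: the class and function names are hypothetical, the stage 1 checks on RunType, the creating user and `Session::isServer()` are left out, the BatchGlobal "same second" guard is not modelled, and stage 1 is assumed to mark the task as Executing (the real query selects the session and server fields, presumably so it can stamp them). A task promoted to Ready in stage 3 is picked up on a later poll here; whether the real method re-queries straight away is not covered above. `constraint_unmet` mirrors the three OR'd conditions inside the notexists join.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class Status(Enum):  # subset of BatchStatus used here
    WAITING = "Waiting"
    READY = "Ready"
    EXECUTING = "Executing"
    FINISHED = "Finished"
    ERROR = "Error"


class Expected(Enum):  # stands in for BatchDependencyStatus
    FINISHED = "Finished"
    ERROR = "Error"
    FINISHED_OR_ERROR = "FinishedOrError"


@dataclass
class Job:  # one BatchJob header
    rec_id: int
    status: Status
    orig_start: int  # OrigStartDateTime, as simulated seconds


@dataclass
class Task:  # one BATCH record
    rec_id: int
    job_id: int
    group_id: str
    status: Status
    # (DependsOnBatchId, ExpectedStatus) pairs; AND constraint type assumed
    constraints: List[Tuple[int, Expected]] = field(default_factory=list)


def constraint_unmet(dep: Task, expected: Expected) -> bool:
    """True if this constraint still blocks the task (the notexists join conditions)."""
    if dep.status not in (Status.FINISHED, Status.ERROR):
        return True  # the dependency has not ended yet
    if expected is Expected.FINISHED and dep.status is Status.ERROR:
        return True
    if expected is Expected.ERROR and dep.status is Status.FINISHED:
        return True
    return False


def server_get_task(jobs: Dict[int, Job], tasks: Dict[int, Task],
                    server_groups: Set[str], now: int) -> Optional[Task]:
    # Stage 1: a Ready task in one of this AOS's batch groups.
    for t in tasks.values():
        if t.status is Status.READY and t.group_id in server_groups:
            t.status = Status.EXECUTING
            return t
    # Stage 2: start due Waiting jobs that have a task in one of our groups.
    for j in jobs.values():
        if (j.status is Status.WAITING and j.orig_start <= now
                and any(t.job_id == j.rec_id and t.group_id in server_groups
                        for t in tasks.values())):
            j.status = Status.EXECUTING
    # Stage 3: promote Waiting tasks of Executing jobs with no unmet constraint.
    for t in tasks.values():
        if (t.status is Status.WAITING
                and jobs[t.job_id].status is Status.EXECUTING
                and not any(constraint_unmet(tasks[d], e) for d, e in t.constraints)):
            t.status = Status.READY
    return None  # nothing to run on this poll
```

With a job whose second task depends on the first finishing, the first poll after OrigStartDateTime starts the job and readies only the first task; the second task becomes Ready only on a poll after the first one has been set to Finished.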
- So, for the report we sent to batch: if in steps 1-3 above we found its record was ready to process and picked it up, what happens next inside the AOS kernel is that we start a worker session. This can be thought of a bit like a client session without a client; it has its own session ID, and you'll see that ID recorded against the record in the BATCH table. From this point the AOS calls BatchRun.runJobStatic() and actually runs the batch process; this is just normal X++ running the process. When runJobStatic() completes, we call BatchRun.serverFinishTask(), which just sets the status of the record in BATCH to either Finished or Error (if it failed for some reason).
- Now our batch task (the record in the BATCH table) is finished, but the header for this batch, the Tables\BatchJob record, is not set to Finished yet. For this part there is another background process running every 60 seconds on each AOS, which calls BatchRun.serverProcessFinishedJobs(). Looking at this X++ method, we can see that it uses the BatchGlobal table again, to make sure that across all AOSes finished jobs are checked at most once every 60 seconds. If 60 seconds have passed, it runs a whole series of queries (too many to copy here, but you can see them in the method itself) to create the batch history (in various tables), set the BatchJob record to Finished, and delete the completed tasks and constraints.
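The finishing side can be sketched the same way. Again this is a hypothetical Python model: `run_task` stands in for runJobStatic() followed by serverFinishTask(), `BatchGlobalModel` is a made-up stand-in for the shared timestamp kept in BatchGlobal, and the batch history is reduced to a list. In a real multi-AOS environment the check-and-set on BatchGlobal happens in the database; here everything runs in one process. Following the description above, a job is set to Finished once all its tasks are Finished or Error; how the real method treats jobs that contain failed tasks is not covered here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

EXECUTING, FINISHED, ERROR = "Executing", "Finished", "Error"


@dataclass
class BatchGlobalModel:
    """Hypothetical stand-in for the shared BatchGlobal row."""
    last_finished_jobs_check: float = float("-inf")


def run_task(task: dict, body: Callable[[], None]) -> None:
    """Run the task body in the worker session, then record the outcome."""
    try:
        body()                      # the "normal X++" of the batch class
        task["status"] = FINISHED   # serverFinishTask(): success
    except Exception:
        task["status"] = ERROR      # serverFinishTask(): failure


def process_finished_jobs(glob: BatchGlobalModel, jobs: Dict[int, dict],
                          tasks: List[dict], history: List[Tuple[int, List[str]]],
                          now: float, interval: float = 60) -> bool:
    """Return False if any AOS already checked within the last `interval` seconds."""
    if now - glob.last_finished_jobs_check < interval:
        return False
    glob.last_finished_jobs_check = now
    for job_id, job in jobs.items():
        job_tasks = [t for t in tasks if t["job_id"] == job_id]
        if (job["status"] == EXECUTING and job_tasks
                and all(t["status"] in (FINISHED, ERROR) for t in job_tasks)):
            history.append((job_id, [t["status"] for t in job_tasks]))  # batch history
            job["status"] = FINISHED                                      # header done
            tasks[:] = [t for t in tasks if t["job_id"] != job_id]        # delete tasks
    return True
```

Two AOSes sharing one `BatchGlobalModel` will find that only the first call in any 60-second window does any work, which is the point of the BatchGlobal check.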
There are a couple of other background things that happen in the AOS kernel for batch processing:
1. Every 5 minutes the AOS calls BatchRun.serverCleanUpDeadTasks(). Again the BatchGlobal table is used, so that this runs only once every 5 minutes across all AOSes. It sets tasks back to Ready if the worker session (created when the AOS starts processing a task, as described earlier) is no longer a valid session. Basically, if a task fails with an X++ exception or something similar, the worker session ends, and if you have configured the batch task to allow some retries, it is this method that resets the task so that it can be retried.
2. Every 5 minutes each AOS also re-reads its server settings, to see whether it should still process the same batch groups, whether it should still be a batch server at all, and so on.
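The dead-task cleanup can be modelled in the same style. This Python sketch assumes a hypothetical `retries_left` counter per task and a set of currently valid session IDs; neither name comes from AX. It shares a 5-minute gate between callers the way BatchGlobal is described as doing. The text doesn't say what happens once retries are exhausted; marking the task as Error is purely an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Set

CLEANUP_INTERVAL_SECONDS = 5 * 60  # "every 5 minutes" across all AOSes


@dataclass
class WorkerTask:
    rec_id: int
    status: str        # e.g. "Executing", "Ready", "Error"
    session_id: int    # worker session ID recorded on the BATCH record
    retries_left: int  # hypothetical: retries still allowed for this task


@dataclass
class CleanupGate:
    """Hypothetical stand-in for the BatchGlobal timestamp for this cleanup."""
    last_run: float = float("-inf")


def clean_up_dead_tasks(gate: CleanupGate, tasks: List[WorkerTask],
                        live_sessions: Set[int], now: float) -> List[int]:
    """Return the rec_ids reset to Ready on this call."""
    if now - gate.last_run < CLEANUP_INTERVAL_SECONDS:
        return []  # another caller ran the cleanup less than 5 minutes ago
    gate.last_run = now
    reset: List[int] = []
    for t in tasks:
        if t.status != "Executing" or t.session_id in live_sessions:
            continue  # not running, or its worker session is still alive
        if t.retries_left > 0:
            t.retries_left -= 1
            t.status = "Ready"  # eligible for serverGetTask() again
            reset.append(t.rec_id)
        else:
            t.status = "Error"  # assumption: no retries left
    return reset
```

A task whose worker session has died is only reset on the next cleanup window, so with the 60-second polling on top, a retry can take several minutes to start.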