The Microsoft Dynamics CRM Blog
News and views from the Microsoft Dynamics CRM Team

Performance Tuning - Batch Process

“We are facing big trouble with a batch process that we developed using the SDK… The SDK “Create” and “Update” methods are causing delays, and we are seeing very slow throughput of just 15 records per second...”

I have been working very closely with a team facing a similar issue, so I thought I would share my experience in solving their problem.

I started by reading some of the performance optimization techniques published by the CRM team. I am sure anyone who reads them will agree that the CRM team did a great job of documenting these techniques to help customers facing situations like the one above.

Here is a list of recommended articles:

The challenge we faced was to bring the throughput close to 30 records per second, double the existing rate. After consulting the CRM optimization techniques, we did some research on SQL tuning and reviewed the overall architecture of the application running the batch process. From that research and analysis we came up with the following performance improvement plan:

1.  First, we proposed keeping data in a “Ready-To-Be-Imported” state in the staging database. All pre-processing tasks (identifying appropriate references, setting correct owners, setting picklist values, and so on) were performed against the staging database.

2.  We decided to disable all the callouts registered in the system for the duration of the batch process. A quick prototype showed us that callouts can reduce throughput by almost 50% - 60%.
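In CRM 3.0, callouts are registered in the callout.config.xml file in the server’s assembly folder, so disabling them for the batch window can be as simple as commenting out the relevant entries. The fragment below is a sketch from memory of that file’s shape; the assembly and class names are placeholders, so check it against your own deployment before relying on it, and remember to restore the entries once the batch completes:

```xml
<callout.config version="3.0" xmlns="http://schemas.microsoft.com/crm/2006/callout/">
  <!-- Entries commented out for the duration of the batch run: -->
  <!--
  <callout entity="account" event="PostCreate">
    <subscription assembly="MyCallouts.dll" class="MyCallouts.AccountHandler"/>
  </callout>
  -->
</callout.config>
```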

3.  The next huge improvement came from pre-assigning GUIDs to the records to be created (we did this in the staging database). We generated sequential GUIDs for all the records in the batch.

Since the GUIDs were used in indexed columns (AccountId, etc.), we did some bit twiddling so that SQL Server would sort them sequentially, and we observed an improvement of ~50% after this change.

Sample code for the bit twiddling we did (UuidCreateSequential is a native Win32 function in rpcrt4.dll, so it has to be declared via P/Invoke):

using System;
using System.Runtime.InteropServices;

class SequentialGuidGenerator
{
    // Native Win32 function that generates sequential UUIDs.
    [DllImport("rpcrt4.dll", SetLastError = true)]
    private static extern int UuidCreateSequential(out Guid guid);

    public static Guid NewSequentialGuid()
    {
        Guid tmpGuid;
        int ret = UuidCreateSequential(out tmpGuid);

        // Now twiddle the bits so that SQL Server will sort them
        // sequentially (not the same sort order as UuidCreateSequential).
        byte[] guidData = tmpGuid.ToByteArray();
        byte b;

        // Low timestamp field (4 bytes)
        b = guidData[2];
        guidData[2] = guidData[1];
        guidData[1] = b;

        b = guidData[3];
        guidData[3] = guidData[0];
        guidData[0] = b;

        // Mid timestamp field (2 bytes)
        b = guidData[4];
        guidData[4] = guidData[5];
        guidData[5] = b;

        // High timestamp field (2 bytes)
        b = guidData[6];
        guidData[6] = guidData[7];
        guidData[7] = b;

        return new Guid(guidData);
    }
}

4.  Next in the pipeline was multi-threading the batch processing step. As recommended in the “Performance Best Practices” article (and confirmed by a quick experiment), we settled on 4 threads (the optimal number in our tests) making web-service calls.
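A minimal sketch of the threading pattern, with a placeholder comment where the actual SDK call would go (the queue of record IDs, the way it is filled, and the `crmService` proxy name are all illustrative, not from the original batch code):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class BatchRunner
{
    // Shared queue of pre-processed records pulled from the staging
    // database; each worker drains it and calls the CRM web service.
    private static readonly Queue<Guid> pending = new Queue<Guid>();
    private static readonly object gate = new object();

    private static void Worker()
    {
        while (true)
        {
            Guid recordId;
            lock (gate)
            {
                if (pending.Count == 0) return;   // queue drained, exit
                recordId = pending.Dequeue();
            }
            // crmService.Create(...) / crmService.Update(...) would go
            // here; 'crmService' stands in for a CrmService proxy.
        }
    }

    static void Main()
    {
        // Assume 'pending' has been filled from the staging database.
        Thread[] workers = new Thread[4];   // 4 threads was the sweet spot
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = new Thread(Worker);
            workers[i].Start();
        }
        foreach (Thread t in workers)
            t.Join();
    }
}
```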

5.  This was followed by some SQL optimization. Here are some of the SQL resources we found most useful in this problem space:

    a.  Importing and Exporting Bulk Data

    b.  Optimizing Bulk Import Performance

    c.  SQL Server 2000 Incremental Bulk Load Case Study

    d.  Storage Top 10 Best Practices

As you can see, there is a huge amount of information on this subject, so we tried the few techniques that were most applicable in our case:

  1. Reviewing indexes (both on the server and on the client staging database). We found that creating a few covering indexes paid off, but you need to be very careful when defining new (custom) indexes: they may affect upgrade scenarios and, more importantly, non-optimal indexes can actually degrade performance.
  2. Additionally, as recommended in one of the articles above, we kept the SQL transaction logs on a separate drive.
  3. In one case we even dropped indexes online and re-created them after the batch process completed. Be very diligent in choosing this option, as it can have a huge impact if not done correctly.

With the techniques stated above, plus a few more from the articles referenced in this blog, we were finally able to achieve a throughput of ~40 records per second, well beyond the team’s target. This is not the highest you can achieve through the Dynamics CRM 3.0 SDK; following the prescribed guidelines can yield even better throughput, and I know teams who are getting it (in this case we only picked the few techniques needed to hit our immediate targets).

The most important thing we learned from this experience is that in a batch process the return on an optimization is magnified: even the smallest per-record improvement, multiplied across the whole batch, turns out to be very significant.

Rohit Bhatia

  • You may NOT reorganize the bytes of a GUID and preserve its guarantee of global uniqueness.  THIS IS VERY BAD ADVICE. What you are doing can very easily result in duplicate GUID assignments.  Please don't do that.

  • Nice!

  • A batch running at 30 to 40 records per second is not an improvement.  

    That is still slow. If you run your enhanced batch program over 800,000 contact records, your batch will take more than 5 hours!  

    Moreover, input volumes of less than a million records (read, insert, update) should not take that long, unless I am running the batch on a desktop PC.  

  • Was there any response regarding processing large volumes of records and the performance of this?
