My name is Ratna Rekha Koukuntla and I am a Senior SDET in Microsoft's Sales and Marketing IT (SMIT) organization. I work on a data warehousing application involving multiple ETL jobs and packages. Because our customers need monthly releases, we moved to an Agile software development lifecycle, and since then we have been experimenting with ways to turn tests around quickly, particularly in the area of performance testing.

In this post I describe the approach we have taken to auto-baseline and monitor the performance of our production environment.

Below are some release challenges we were facing on our project:

  • A significant amount of time was spent on performance testing during the QA phase.
  • ETL timings were changing unpredictably in the production environment.
  • Code changes, in both new and legacy code, were causing ETL time and volume changes.

 

To solve the challenges mentioned above, we took the approach of auto-baselining the performance of the production environment and monitoring the data over time.

Below is an outline of the approach we have taken to generate informational charts related to performance.

To start, we stored information about ETL package and stored procedure execution times in the database. The stored information includes IDs, start/end times, volume information, and pass/fail details.
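As a rough sketch, the information kept per run might look like the record below. This is illustrative only; the field names and types are assumptions, not our actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ExecutionLog:
    """One row per ETL package or stored procedure run (illustrative fields)."""
    object_id: int          # ID of the ETL package or stored procedure
    object_name: str
    start_time: datetime
    end_time: datetime
    row_volume: int         # rows processed by this run
    passed: bool            # pass/fail detail

    @property
    def duration_minutes(self) -> float:
        """Execution time derived from the start/end timestamps."""
        return (self.end_time - self.start_time).total_seconds() / 60.0

# Example record: a run that started at midnight and finished 100 minutes later.
run = ExecutionLog(1, "ETL Process 1",
                   datetime(2014, 1, 1, 0, 0),
                   datetime(2014, 1, 1, 1, 40),
                   500_000, True)
print(run.duration_minutes)  # 100.0
```

Storing start/end timestamps rather than a precomputed duration keeps the raw data available for any later analysis.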


A test engine reads the information above and, for each object, auto-baselines the historical data (last month's averages in our case) and compares the current information against that history. As a team we have defined thresholds that serve as an alerting mechanism. The engine also sends an email based on the information collected.
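The baseline-and-compare step can be sketched as follows. The 20% threshold here is a hypothetical value for illustration, not the threshold our team actually uses.

```python
from statistics import mean

# Hypothetical threshold: alert when a run exceeds its baseline by more than 20%.
THRESHOLD_PCT = 20.0

def baseline(history_minutes):
    """Auto-baseline: average of the previous month's run times."""
    return mean(history_minutes)

def check(current_minutes, history_minutes):
    """Compare the current run against its baseline; return an alert string or None."""
    base = baseline(history_minutes)
    variance_pct = (current_minutes - base) / base * 100.0
    if variance_pct > THRESHOLD_PCT:
        return f"ALERT: {variance_pct:.0f}% slower than baseline ({base:.0f} min)"
    return None

print(check(130, [100, 95, 105]))  # baseline is 100 min, so a 130-min run alerts
print(check(105, [100, 95, 105]))  # within threshold -> None
```

In the real engine this comparison runs per object and the alerts are rolled up into the weekly email rather than printed.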


The email is scheduled weekly and contains the information below (note: graphs in the email are plotted using the Excel API).

  • Graphs of ETL job execution timings compared against the earlier months' benchmark data.
  • Information related to the ETL jobs' success rate.
  • Top stored procedures that took more time compared to earlier months, with graphs that show the variances quickly.
  • Top stored procedures that improved in performance.
  • Top tables that increased/decreased in volume.
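The "top" lists above can be derived by ranking each object's current-month average against the previous month's average. A minimal sketch (procedure names and timings are made up for illustration):

```python
def top_movers(avg_current, avg_previous, n=3):
    """Rank stored procedures by change in average minutes vs. the previous month.

    Returns (regressed, improved): up to n procedures that slowed down most
    and up to n that sped up most.
    """
    deltas = {
        name: avg_current[name] - avg_previous[name]
        for name in avg_current if name in avg_previous
    }
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    regressed = [name for name, d in ranked[:n] if d > 0]
    improved = [name for name, d in reversed(ranked) if d < 0][:n]
    return regressed, improved

# Hypothetical average execution times in minutes.
current = {"usp_LoadFacts": 45, "usp_LoadDims": 12, "usp_Aggregate": 30}
previous = {"usp_LoadFacts": 30, "usp_LoadDims": 15, "usp_Aggregate": 29}
print(top_movers(current, previous))
```

The same ranking applied to row counts instead of timings gives the top tables that increased or decreased in volume.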

Some benefits we have seen with this approach:


When we did this work manually, it took at least 8 hours per release (once monthly). Additionally, there was a chance of human error, which sometimes forced us to re-run the tests.


The 40 hours we invested in automating this effort are paying great dividends: we now know which stored procedures require optimization and need to be scoped into the next release. Manual effort is now focused on analyzing the data and logging bugs accordingly.


Additionally, the tool helps identify other issues, such as ETL failures occurring in production.

Sample ETL Analysis Email

Information related to ETL job success rate:

| Process Name | Average Time (Current Month) | Average Time (Previous Quarter, excludes failed runs) (Month A – Month B) | Passed | Failed / Cancelled | Execution Time Variance |
| --- | --- | --- | --- | --- | --- |
| ETL Process 1 | 100 minutes | 100 to 125 minutes | 30 times | 2 times | 75 minutes |

ETL job trend using execution timings (the graph compares current data with the past months' benchmark data):

 

Stored procedures with improved performance:

 

 

Stored procedures that require attention: