In Part 1 of this series I covered testing SQL queries using the Microsoft OLE DB Provider for DB2 using a linked server. In Part 2 I showed how you can test performance with SSIS. However, in many cases, customers may be using C# or VB.NET, writing web services or stand-alone applications to query DB2 for data. So, let’s begin.
First I wanted to again test different settings for the Rowset Cache Size option on the parameter. But, since these queries were going to be smaller, I was wondering if using Connection Pooling would help. The Microsoft OLE DB Provider that comes with Host Integration Server (and the SQL Feature Pack) implements its own connection pooling internally.
When Connection Pooling is enabled, the data provider does not immediately issue a close to the DB2 database when it receives a close from the consumer (application). It waits before closing the connection (this is controlled by the “Connection Timeout” property, which defaults to 30 seconds). When a new Open() is issued, for the SAME connection string, this helps speed up reconnecting as some frames do not have to be repeated to the host system (security for instance). This is most beneficial when you are making many small queries to the host, one behind the other. It may not be as beneficial when queries take many seconds or minutes to complete.
So, to begin with, I decided to limit my query to 5,000 rows. I also wanted to make the DB2 server do a bit of work, so I decided to use the following query (with the fetch first it would return over 2 million rows):
const string DB2QueryString = "select * from schema.table where amount BETWEEN 5.00 AND 10.00 FETCH FIRST 5000 ROWS ONLY";
I created my initial connection string with the Data Access Tool, and then copied this into my C# application, after striping out the “Connection Pooling” and “Rowset Cache Size” options, and I ended up with this:
const string cnStringDB2OLEDB = @"Provider=DB2OLEDB;User ID=USERNAME;Password=password;Initial Catalog=DB2400;Network Transport Library=TCP;Host CCSID=37;PC Code Page=1252;Network Address=DB2400;Network Port=446;Package Collection=SCHEMA;Default Schema=SCHEMA;Process Binary as Character=False;Units of Work=RUW;Default Qualifier=SCHEMA;DBMS Platform=DB2/AS400;Persist Security Info=True;Defer Prepare=False;Derive Parameters=False;";
At this point I was almost ready, but I needed to decide how many times I wanted my query to run, and how many different settings of Rowset I wanted to use. I decided I wanted to the query to run 10 times for each connection, and increment the Rowset by 30 each time, starting at 0 and finishing at 300.
// number of times to run each query
const int iQueryIteration = 10;
// rowset cache size increments from 0 to 300 using this value
const int iCacheInc = 30;
From various other tests, I know that simulating working with the incoming data makes a difference, so I setup a variable that can be adjusted as necessary, which would cause the data to actually be read into a variable later. For my purposes, I left it at true.
// set bReadValues to true to simulate working with the data coming in
bool bReadValues = true;
Now, at this point, everything is setup to begin. The main procedure controls building the connection string, and then calling another procedure to make the calls.
The 1st thing that the Main procedure does is calls a procedure to create a database in SQL Server. Then it builds the connection string as follows:
// multiple tests at this point
// test with pooling off, then on
// each test to be done with different settings of rowset
// to gauge performance
int iParamTest = 1;
while (iParamTest < 3)
{
string poolString = "";
switch (iParamTest)
case 1:
poolString = "Connection Pooling=False;";
break;
case 2:
poolString = "Connection Pooling=True;";
}
iRowSet = 0;
while (iRowSet < 301)
myconstr = "";
rowString = "Rowset Cache Size=" + iRowSet.ToString() + ";";
// build final connection string here
myconstr = cnStringDB2OLEDB + poolString + rowString;
After the connection string is built, it runs the query, 10 times as follows:
for (int iquery = 1; iquery < iQueryIteration + 1; iquery++)
testno++;
// start timer watch
watch.Start();
rows = RunDbConnectionTest(myconstr, DB2QueryString);
watch.Stop();
Console.WriteLine("Test # {0}, ParamString: {1}, Rowset: {2}, Elapsed Time {3}", testno, poolString, iRowSet, (int)watch.Elapsed.TotalMilliseconds);
UpdateSQLPerfTable(testno, poolString, iRowSet, /*(int)*/watch.Elapsed.TotalMilliseconds);
watch.Reset();
if (rows == -1) // error happened, exit
Environment.Exit(rows);
iRowSet = iRowSet + iCacheInc;
The UpdateSQLPerfTable() call logs the times into SQL Server for each query. Then the watch is reset. If an error was to happen (rows returns = -1), the program will exit. The loops will then repeat until all tests are run.
The RunDbConnectionTest() procedure builds the OleDbConnection object, creates a command object, and a data reader, and the command is executed. At this point the data is read in (by assigning each column to a string value). After all columns and rows are read in, the objects created are closed and disposed of accordingly. Any errors are caught in Try/Catch statements, which will return a -1 to the main procedure.
Once the data is in SQL Server, you can query against the table to see the differences at each setting. I ran a pivot query against the table, with the following results (times in seconds):
rowset
Connection Pooling=False;
Connection Pooling=True;
0
2.05654372
1.28185158
30
1.41009238
0.87079308
60
1.44409547
0.87172373
90
1.46562513
1.10562903
120
1.3914555
1.05005861
150
1.38745682
0.88520811
180
1.44412404
0.84566784
210
1.41111907
0.87019778
240
1.41200047
0.83981306
270
1.42906141
0.82953586
300
1.44249608
1.1492816
So, just looking at the averages, you can see there is a difference when using connection pooling. But looking at the raw numbers gives a better idea. My first query was to look at rowset 0:
TestNumber
Pooling
Seconds
1
2.65
2
2.44
3
1.78
4
2.50
5
1.94
6
1.91
7
1.83
8
9
1.84
10
111
1.89
112
1.31
113
1.21
114
1.20
115
1.19
116
117
118
119
Now, looking at the above, you can see that times without connection pooling were higher than when using connection pooling.
Next, I wanted to see if setting the Rowset Cache Size made any difference with and without connection pooling. Using a query to get the minimum of the duration across all the results, I was able to determine that in my test it was test #208, with a rowset of 270. I then issued a query to pull out the tests results using “rowset = 270”:
SELECT TestNumber, ParamString AS Pooling, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
WHERE (rowset = 270)
ORDER BY TestNumber, Pooling
This query gives me the following results:
91
1.39
92
1.41
93
94
1.40
95
96
97
1.46
98
1.52
99
100
1.48
201
1.30
202
0.87
203
0.78
204
0.77
205
0.76
206
207
208
0.75
209
In the above, with connection pooling turned off, the times for each query were between 1.39 and 1.52 seconds. However, with connection pooling, the initial query (TestNumber 201) took 1.30 seconds, and subsequent queries were significantly faster, and in most cases below .8 seconds. The savings in time (about 0.4 to 0.5 seconds) would be a benefit as more and more queries were being ran, and with this particular query, you can run 2 queries with connection pooling (after the initial connection) almost as fast as 1 query without connection pooling.
I ran another pivot query which summed all the times over 30 iterations of the query, and came back with this chart (in seconds), which shows the times savings a bit better:
55.7799701
36.5997885
42.7464447
23.7765142
42.1101308
24.9684609
42.5139897
23.7936823
41.9289347
24.7713371
42.5943495
24.1104076
42.2610654
23.8675952
42.2377875
24.3038118
43.31585
24.105081
42.7937661
25.1228602
42.2332467
23.6615868
And again, with 100 iterations of the query at each setting, using this SQL pivot query:
USE [DB2TestDatabases]
go
declare @comstr nvarchar(4000) -- command string
declare @success int -- variable for executing SQL commands
declare @params nvarchar(255) --
declare @tempstr nvarchar(1500)
declare param_cursor CURSOR FOR
select distinct ParamString
from [dbo].CSPerfTrials
set @tempstr = ''
OPEN param_cursor
FETCH NEXT FROM param_cursor into @params
WHILE @@FETCH_STATUS = 0
BEGIN
set @tempstr = @tempstr + '[' + @params + ']'
if @@FETCH_STATUS = 0
begin
set @tempstr = @tempstr + ', '
end
END
CLOSE param_cursor
DEALLOCATE param_cursor
set @comstr = 'select rowset, ' + @tempstr + ' from (select rowset, ParamString, durationms/1000 as durationms '
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( sum(durationms) FOR ParamString IN ('
set @comstr = @comstr + @tempstr + ')) AS p'
exec @success=sp_executesql @comstr
121.6649042
190.0007958
78.565167
144.2463663
78.1296505
141.0669099
77.8703857
142.5785544
78.3527013
142.036522
78.0573694
141.4086889
77.8508053
145.7663853
78.8497313
141.7899574
77.4381023
146.7418084
78.2401951
141.6710446
78.3692842
142.6525498
Executing 100 queries with connection pooling takes almost half the time it takes without, when used in conjunction with RowSet Cache Size, with this particular query.
For my final test, I wanted to see what impact this would have on a large table, where the results were not limited. I used this query (select count(*) from schema.table where amount BETWEEN 100.00 AND 110.00), takes 51 seconds from SQL to return the count of 1296 rows. So a good test to pull in those 1296 rows and read the data, 100 times for each connection string (100 * 11 * 2) for 2,200 reads against the table. I thought this would more accurately demonstrate trying to pull in a small amount of data across a large table.
After the tests were run, I ran a query to sum up the times for each setting, with these results (times are in seconds):
3721.2146623
4052.1730781
3720.9639454
3976.7106773
3712.2296273
4068.358878
3724.0935948
4026.1043348
3675.8706398
4084.5275822
3718.429541
3970.0594041
3689.3583401
4035.6269915
3785.033983
4010.1079362
3703.0019921
4007.4356928
3689.1961373
3973.2904402
3685.1553973
3944.8446177
So, I was thinking wow. Some savings, 200 – 300 seconds total. But, before you get excited, remember this was only across 100 queries. So, in reality, the savings is only 2 to 3 seconds for each query. Changing the query to show the average confirms this:
37.212146623
40.521730781
37.209639454
39.767106773
37.122296273
40.68358878
37.240935948
40.261043348
36.758706398
40.845275822
37.18429541
39.700594041
36.893583401
40.356269915
37.85033983
40.101079362
37.030019921
40.074356928
36.891961373
39.732904402
36.851553973
39.448446177
So, the savings, on average, is only 2 – 3 seconds for each query when accessing a large table, in my particular case.
As the above charts show, using Rowset by itself helps, but when you combine that with pooling, on small queries, a significant time savings can be achieved. Since the numbers are somewhat close once Rowset is implemented here, you should see significant improvement with a setting of at least 30 for the RowSet Cache Size. Also, when querying against a large table, implementing pooling can save a few seconds, but you may not see as much of a performance impact as you would against ‘smaller’ queries where data can be retrieved much faster.
One thing to note here – all the times above were generated on a system that had little overhead against a DB2 server with minimal resources being used. Your results will vary, depending on hardware, network speed and utilization, loads on DB2, loads on the servers running the tests. But, this test and the previous 2 should give an indication of which settings may be most beneficial in your environment.
The test hardware being used in all 3 tests were identical. Windows 2003 SP2 + SQL Server 2005 (with all service packs and fixes applied for both), 2GB of memory on a 100MB network. The queries were run against DB2/400 V5R4, which was a couple of hops away from the test servers.
I hope this series of articles will help in your performance testing. If you have any questions, please do not hesitate to ask.
The code for this particular test is below.
using System;
using System.Collections;
using System.Text;
using System.Data;
using System.Data.OleDb;
using System.Data.Common;
using System.Data.SqlClient;
using System.Diagnostics;
namespace DB2PerfTest
class Program
static bool bReadValues = true;
// connection string is missing rowset cache size and connection pooling options
// below query returns 3,581,029 rows without the fetch first statement
// limiting to 5000 as it is not realistic that many C# applications will be handling
// millions of rows at one time
static void Main(string[] args)
int rows = 0;
string rowString = "";
string myconstr = "";
int iRowSet = 0;
CreateSQLPerfTable();
Stopwatch watch = new Stopwatch();
int testno = 0;
iParamTest++;
} // iparamtest loop
} // end main
static int RunDbConnectionTest(string cnString, string cmString)
int result = 0;
try // outer try
OleDbConnection cn = new OleDbConnection(cnString);
cn.Open();
try
{ // inner try
OleDbCommand cm = cn.CreateCommand();
cm.Connection = cn;
cm.CommandText = cmString;
if (bReadValues)
OleDbDataReader dr = cm.ExecuteReader();
while (dr.Read())
int cnt = dr.FieldCount;
for (int i = 0; i < cnt; i++)
// simulate actually doing something with the data so it is 'read'
string msg = dr.GetValue(i).ToString();
finally
dr.Close();
cm.Dispose();
} // bReadValues
else // not reading data, just execute
result = cm.ExecuteNonQuery();
catch (Exception ex)
System.Console.WriteLine("");
System.Console.WriteLine(ex.Message);
result = -1;
} // !bReadValues
} // inner try
cn.Close();
cn.Dispose();
} // outer try
catch (Exception e)
System.Console.WriteLine(e.Message);
return result;
static void CreateSQLPerfTable()
SqlConnection mySqlConnection = new SqlConnection("Persist Security Info=True;Integrated Security=SSPI;database=DB2TestDatabases;server=(local);pooling=false");
SqlCommand mySqlCommand = mySqlConnection.CreateCommand();
mySqlConnection.Open();
// drop perf table
Console.WriteLine("Dropping CSPerfTrials table");
mySqlCommand.CommandText =
"IF EXISTS (SELECT * FROM sys.objects " +
"WHERE object_id = OBJECT_ID(N'[dbo].[CSPerfTrials]') " +
"AND type in (N'U')) " +
"DROP TABLE [dbo].[CSPerfTrials]";
int result = mySqlCommand.ExecuteNonQuery();
Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
// create new table
"CREATE TABLE dbo.CSPerfTrials" +
"( TestNumber int NULL" +
", ParamString Nvarchar(2000) NULL" +
", rowset int NULL" +
", durationMs float NULL" +
")";
Console.WriteLine("Creating dbo.CSPerfTrials table");
result = mySqlCommand.ExecuteNonQuery();
mySqlConnection.Close();
static void UpdateSQLPerfTable(int testno, string ParamString, int rowset, double durationMS)
// added pooling=false to the sql connection string to resolve an error condition
// may not be needed when in all cases but on 'fast' reads this appears to be needed
Console.WriteLine("Connection state is " +
mySqlConnection.State.ToString());
Console.WriteLine("inserting data into SQL table dbo.CSPerfTrials");
mySqlCommand.CommandText = @"insert into dbo.CSPerfTrials " +
"(TestNumber" +
",ParamString" +
",rowset" +
",durationMS"
+ ") values (" +
testno.ToString() + ",'" +
ParamString + "'," +
rowset.ToString() + "," +
durationMS.ToString() + ")";
The SQL Queries I ran to give me the pivot charts is below. Modifying the “avg(durationms” to “sum(durationms” will give you the total times for each test run:
-- SSISPerf Pivot report
-- ***********************
-- Builds a Pivot report
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( avg(durationms) FOR ParamString IN ('