Logging all managed exceptions --- CDB.exe -pn LoadGenWin.exe -cf clr.txt --- You'd log all managed exceptions when something goes wrong.
· CDB.exe -pn LoadGenWin.exe
· .logopen /t C:\clr.log
· .loadby sos mscorwks; !eeversion; sxe -c "!clrstack; !pe; gc" clr
Snapping suspicious managed exceptions (1st chance) --- CDB.exe -pn LoadGenWin.exe -cf 1st.txt --- You have to edit the script to name the suspicious exceptions you want to snap.
· .loadby sos mscorwks; !eeversion; sxi -c "!soe Microsoft.Mapi.MapiExceptionNetworkError 5; !soe System.NullReferenceException 6; !soe System.ArgumentNullException 7; !soe System.OutOfMemoryException 8; .if @@((@$t5==0) && (@$t6 == 0) && (@$t7 == 0) && (@$t8 == 0)) { !pe -nested; gc } .else { !threads; !clrstack -a; !dso; !pe -nested; .dump /ma /u c:\\1st.dmp; gc }" clr; gc
Snapping unhandled managed exceptions (2nd chance) --- CDB.exe -pn LoadGenWin.exe -cf 2nd.txt --- You'd snap unhandled exceptions when the process crashes.
· .loadby sos mscorwks; !eeversion; sxi -c "!threads; !clrstack -a; !dso; !pe -nested; .dump /ma /u c:\\2nd.dmp; qd" clr; gc
Snapping Process Exit --- CDB.exe -pn LoadGenWin.exe -cf exit.txt --- You'd take a snap when the process suddenly exits on its own.
· .loadby sos mscorwks; !eeversion; sxi -c "!dumpheap -stat -min 100; ~*e!clrstack -a; !threads; !dso; !pe; .dump /ma /u C:\\exit.dmp" epr; gc
Snapping Process Hang --- CDB.exe -pn LoadGenWin.exe -cf hang.txt --- You'd take multiple snaps when the process hangs.
· .dump /ma /u C:\hang.dmp; .loadby sos mscorwks; !eeversion; !dumpheap -stat -min 100; !threads; !clrstack -a; !dso; !pe; qd
JetstressCmd has an undocumented /report command --- which means this function is unsupported.
I couldn't publish a neat command interface for this function due to a tight schedule. The program should be able to find all the arguments itself, such as machine name, process name, process id, and start and end times. But it currently requires you to specify many arguments about the host machine.
D:\Jetstress\JetstressCmd.exe /c JetstressConfig.xml /report "ROSWELL; JetstressWin; 0; Performance_2007_5_30_10_13_54.blg; 5/30/2007 10:15:49 AM; 5/30/2007 12:15:51 PM"
5/31/2007 12:59:31 AM -- Command Line: D:\Jetstress\JetstressCmd.exe /c JetstressConfig.xml /report "ROSWELL; JetstressWin; 0; Performance_2007_5_30_10_13_54.blg; 5/30/2007 10:15:49 AM; 5/30/2007 12:15:51 PM"
5/31/2007 12:59:31 AM -- Database read latency thresholds: (average: 0.02 seconds/read, maximum: 0.05 seconds/read).
5/31/2007 12:59:31 AM -- Log write latency thresholds: (average: 0.01 seconds/write, maximum: 0.05 seconds/write).
5/31/2007 12:59:32 AM -- Creating test report ...
5/31/2007 12:59:37 AM -- Volume F: has 0.0000 for Avg. Disk sec/Read.
5/31/2007 12:59:37 AM -- Volume G: has 0.0000 for Avg. Disk sec/Read.
5/31/2007 12:59:37 AM -- Volume H: has 0.0000 for Avg. Disk sec/Write.
5/31/2007 12:59:37 AM -- Volume H: has 0.0000 for Avg. Disk sec/Read.
5/31/2007 12:59:37 AM -- Volume I: has 0.0000 for Avg. Disk sec/Write.
5/31/2007 12:59:37 AM -- Volume I: has 0.0000 for Avg. Disk sec/Read.
5/31/2007 12:59:37 AM -- Test has 0 Maximum Database Page Fault Stalls/sec.
5/31/2007 12:59:37 AM -- Test has 0 Database Page Fault Stalls/sec samples higher than 0.
5/31/2007 12:59:37 AM -- Performance_2007_5_30_10_13_54.xml has 479 samples queried.
ROSWELL is the machine name that is part of performance counters and instances.
JetstressWin is the process name that is part of Jet database performance counter instance names. You have to specify JetstressCmd if JetstressCmd has the performance log generated.
0 is the process id that is part of Jet database performance counter instance names. You don’t have to specify the process id unless you used NAS (network attached storage).
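To make the layout of the /report argument easier to see, here is a minimal Python sketch that splits the semicolon-separated string into its six fields. It is only an illustration, not Jetstress code: the names ReportArgs and parse_report_argument are hypothetical, and the timestamp format is an assumption inferred from the sample command line above.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed from the sample timestamps, e.g. "5/30/2007 10:15:49 AM".
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


@dataclass
class ReportArgs:
    """Hypothetical container for the six /report fields."""
    machine: str       # part of performance counter and instance names
    process: str       # JetstressWin or JetstressCmd
    process_id: int    # 0 unless NAS was used
    perf_log: str      # the .blg performance log
    start: datetime
    end: datetime


def parse_report_argument(text: str) -> ReportArgs:
    """Split the quoted /report string on ';' and convert each field."""
    parts = [p.strip() for p in text.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    machine, process, pid, perf_log, start, end = parts
    return ReportArgs(
        machine,
        process,
        int(pid),
        perf_log,
        datetime.strptime(start, TIME_FORMAT),
        datetime.strptime(end, TIME_FORMAT),
    )


args = parse_report_argument(
    "ROSWELL; JetstressWin; 0; Performance_2007_5_30_10_13_54.blg; "
    "5/30/2007 10:15:49 AM; 5/30/2007 12:15:51 PM"
)
print(args.machine, args.process, args.process_id)
print("report window:", args.end - args.start)
```

The start and end times bound the part of the performance log that the report covers, so the difference between them is the reported test duration.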
NOTE: You may see an unstable/uneven latency slope failure if Jetstress cannot see a stable latency slope within 10 minutes.
The tuning process takes many tuning cycles until it succeeds or fails in tuning goals. It starts with the initial parameter values (which can be loaded from a previous successful test configuration file).
In each tuning cycle, there are two internal states: transient state and stable state. The transient state may vary from a few seconds to a few minutes, depending on the disk sub-system.
In the transient state, it collects 30 samples of database read latencies and switches to the stable state when it sees the slope of latency change is less than 5 milliseconds per sample.
In the stable state, it collects 120 samples of database and log latencies and then moves onto next tuning cycle with appropriate parameter value changes.
The whole tuning phase succeeds when the tuning goals are met. It fails if the tuning goals cannot be met within an hour.
It also fails if a stable latency slope is not seen within 10 minutes in any tuning cycle.
The following pseudocode sets 'evenLatencies' to false when it sees a change greater than 0.5 ms per sample.
double slope = getSlope(getDatabaseDiskReadAverageLatencies(instanceId));
if (usesNasDevice) // Database Performance Counter
{
if (slope > 0.5) evenLatencies = false; // 0.5 ms. per sample
}
else // Logical Disk Performance Counter
{
if (slope > 0.0005) { evenLatencies = false; } // 0.5 ms. per sample
}
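The pseudocode above does not show how getSlope works. As a rough sketch only, the Python below assumes it is an ordinary least-squares slope over the sample index; that choice, and the names get_slope and has_even_latencies, are my assumptions for illustration and not the actual Jetstress implementation. The two thresholds mirror the pseudocode: the database counter is in milliseconds and the logical disk counter is in seconds.

```python
def get_slope(samples):
    """Least-squares slope of the samples against their index (assumed method)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def has_even_latencies(samples, uses_nas_device):
    """Return False when latency rises by more than 0.5 ms per sample."""
    # Database counter reports milliseconds; logical disk counter reports seconds.
    threshold = 0.5 if uses_nas_device else 0.0005
    return get_slope(samples) <= threshold


# Logical disk samples in seconds: flat at 12 ms, then rising 1 ms per sample.
flat = [0.012] * 30
rising = [0.010 + 0.001 * i for i in range(30)]
print(has_even_latencies(flat, uses_nas_device=False))
print(has_even_latencies(rising, uses_nas_device=False))
```

Like the pseudocode, this only checks for a rising slope; a falling latency trend does not clear the flag.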
Jetstress quits a test run prematurely while attaching multiple databases at the same time.
Internally, there is an unhandled JET_errDatabaseSignInUse error due to a race condition.
This is a known issue --- I will make the fix available in the July web release.
Here is an example of the log output in the text window:
3/28/2007 2:32:01 PM -- Jetstress testing begins ...
3/28/2007 2:32:01 PM -- Prepare testing begins ...
3/28/2007 2:32:03 PM -- Attaching databases ...
3/28/2007 2:32:03 PM -- Prepare testing ends.
3/28/2007 2:32:03 PM -- Dispatching transactions begins ...
3/28/2007 2:32:03 PM -- Database cache settings: (minimum: 256.0 MB, maximum 2.0 GB)
3/28/2007 2:32:03 PM -- Database flush thresholds: (start: 20.5 MB, maximum 41.0 MB)
3/28/2007 2:41:01 PM -- Jetstress testing ends.
Performance counter thresholds are updated. There are some differences from the Exchange 2003 Performance Troubleshooting Guide.
Database Disks:
· Average Disk sec/Read (or I/O Database Reads Average Latency): The average should be below 20 ms. Spikes (maximum values) should not be higher than 50 ms; more than 6 violations fail the test.
Transactional Log Disks:
· Average Disk sec/Write (or I/O Log Writes Average Latency): The average should be below 10 ms. Spikes (maximum values) should not be higher than 50 ms; more than 6 violations fail the test.
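As a rough illustration of the two rules above, this Python sketch checks a list of latency samples (in seconds) against an average limit and a spike limit, and fails when more than 6 samples exceed the spike limit. The function name check_latency is hypothetical, and how Jetstress actually combines the average and the violation count is an assumption here.

```python
def check_latency(samples_sec, average_limit, spike_limit=0.050, max_violations=6):
    """Return (passed, average, violations) for a list of latency samples."""
    average = sum(samples_sec) / len(samples_sec)
    # A violation is a sample higher than the spike (maximum) limit.
    violations = sum(1 for s in samples_sec if s > spike_limit)
    passed = average < average_limit and violations <= max_violations
    return passed, average, violations


# Database reads: average below 20 ms; log writes would use average_limit=0.010.
six_spikes = [0.010] * 100 + [0.060] * 6
seven_spikes = [0.010] * 100 + [0.060] * 7
print(check_latency(six_spikes, average_limit=0.020))
print(check_latency(seven_spikes, average_limit=0.020))
```

With six spikes the run still passes; the seventh spike is "more than 6 violations" and fails it even though the average stays well under 20 ms.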
Here are the latest performance counter thresholds:
| Disk Sub-system Thresholds | Performance | Stress | Troubleshooting Guide |
| Data Disk Read Latency (average) | 20 ms | 20 ms |
| Data Disk Read Latency (maximum) | 50 ms | 100 ms | 50 ms |
| Data Disk Write Latency (average) | (n/a) | 20 ms |
| Data Disk Write Latency (maximum) | 50 ms |
| Log Disk Read Latency (average) | (n/a) | 5 ms |
| Log Disk Read Latency (minimum) | 50 ms |
| Log Disk Write Latency (average) | 10 ms | 10 ms |
| Log Disk Write Latency (maximum) | 50 ms | 100 ms | 50 ms |
| Host Computer Thresholds | Performance/Stress |
| % Processor Time (maximum) | 90% |
| % Processor Time (average) | 80% |
| Available Memory Bytes (minimum) | 50 MB |
| Page Table Entries (minimum) | 5000 |
| Transition Pages Repurposed/Sec (average) | 100 |
| Memory Pages Per Sec (average) | 100 |
| Pool Nonpaged Bytes (maximum) | 75 MB |
| Pool Paged Bytes (maximum) | 180 MB |
| Database Page Fault Stalls Per Sec (average) | 1.0 |
Reference: Troubleshooting Exchange Server 2003 Performance
This posting is provided "AS IS" with no warranties, and confers no rights.
There is a known issue:
Symptom: If Standards and formats is set to German (Germany) or Dutch (Belgium), all numbers in European format show as zeros in all test reports, as follows:
| LogicalDisk | Avg. Disk sec/Read | Avg. Disk sec/Write | Disk Reads/sec | Disk Writes/sec | Avg. Disk Bytes/Write |
| Database (F:) | 0 | 0 | 0 | 0 | (n/a) |
| Database (J:) | 0 | 0 | 0 | 0 | (n/a) |
| Log (E:) | 0 | 0 | 0 | 0 | 0 |
| Log (I:) | 0 | 0 | 0 | 0 | 0 |
Workaround: Download Stylesheet.xsl to the test output directory. Open the test report xml files (not the html files) so that Stylesheet.xsl renders them properly as follows:
| LogicalDisk | Avg. Disk sec/Read | Avg. Disk sec/Write | Disk Reads/sec | Disk Writes/sec | Avg. Disk Bytes/Write |
| Database (F:) | 0,0179525684841602 | 0,0181200286925999 | 734,781140788265 | 475,885350221015 | (n/a) |
| Database (J:) | 0,0162713383315612 | 0,0161009084228538 | 725,950885410535 | 472,687259736909 | (n/a) |
| Log (E:) | 0,00978239983279157 | 0,000545113645105628 | 1,2681714921609 | 331,411668761336 | 3584,35984241272 |
| Log (I:) | 0,00729845620356779 | 0,000446559196223516 | 1,26356959361992 | 331,678115958897 | 3577,9008088978 |
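The values in the xml reports use a decimal comma, which is why a reader that expects a decimal point gets nothing useful out of them. If you want to post-process these numbers yourself, a small Python sketch like the following may help. It is not part of Jetstress; parse_decimal_comma is a hypothetical helper, and it assumes the values carry no thousands separators, as in the table above.

```python
def parse_decimal_comma(text):
    """Parse a number written with a decimal comma, e.g. '0,0179'."""
    # Assumes no thousands separators, as in the report values above.
    return float(text.strip().replace(",", "."))


raw = "0,0179525684841602"  # Avg. Disk sec/Read for Database (F:)

# A parser that expects a decimal point rejects the value outright.
try:
    float(raw)
    naive_ok = True
except ValueError:
    naive_ok = False

latency_sec = parse_decimal_comma(raw)
print("naive parse worked:", naive_ok)
print("latency in ms:", round(latency_sec * 1000, 2))
```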
This is a known issue in Jetstress.
This is so benign that it never affects your testing in terms of correctness, performance, and security.
Please, train yourself to ignore this benign error until I can make the fix available in a later web release:
Event Type: Error
Event Source: ESE
Event Category: Logging/Recovery
Event ID: 215
Date: 3/28/2007
Time: 10:47:26 PM
User: N/A
Computer: (machine-name)
Description:
JetstressWin (2760) Instance2760.4: The backup has been stopped because it was halted by the client or the connection with the client failed.
Reference: KB 810333: You may receive the "ESE event ID 215 the backup has been stopped because it was halted by the client"
Jetstress streaming backup generates I/O on the disk sub-system comparable to that of the Exchange backup. It uses the Exchange Database Engine functions:
· JetBeginExternalBackupInstance
· JetOpenFileInstance
· JetReadFileInstance
· JetCloseFileInstance
· JetEndExternalBackupInstance2 --- this call always logs the event ID 215.
External backup reports the error because the backup instance is still in the ‘backupStateDatabases’ state when the backup ends.
if ( m_fBackupStatus == backupStateDatabases )
{
// if backup client calls BackupEnd without error before logs are read, force the backup as "with error"
So the user always sees event ID 215. The fix must be to use Surrogate backup instead of External backup.
Question:
I have uneven distribution of users across storage groups. Storage group 1 has 700 GB with 739 users and storage group 2 has 350 GB with 261 users.
Can I run multiple instances of JetstressWin.exe or JetstressCmd.exe (one instance per storage group) for database files of different sizes that I have manually prepared?
Answer:
I discourage running multiple instances of Jetstress --- that can lead to conflicting database performance counter names. I encourage even distribution of load generation across storage groups.
Jetstress distributes load evenly by default. However, you can set a different IOPS share for each storage group in the configuration file.
The following configuration has two things to note:
· Storage group 1 has four database files on J:\ and storage group 2 has two database files on K:\ --- every database file has the same size when they are prepared.
· Each storage group has an IOPS bias number --- each database transaction randomly picks a database instance (also called a storage group) based on the IOPS bias. The distribution will be about 73.9% of transactions on storage group 1 and 26.1% on storage group 2.
<StorageGroups>
<StorageGroup IopsBias="739">
<DatabasePaths>
<Path>J:\</Path>
<Path>J:\</Path>
<Path>J:\</Path>
<Path>J:\</Path>
</DatabasePaths>
<LogPath>E:\</LogPath>
</StorageGroup>
<StorageGroup IopsBias="261">
<DatabasePaths>
<Path>K:\</Path>
<Path>K:\</Path>
</DatabasePaths>
<LogPath>F:\</LogPath>
</StorageGroup>
</StorageGroups>
Jetstress logs batch transactions stats at the end of a test as follows.
The following log text shows that Jetstress has run 132505 and 46791 batch transactions (74% and 26%) on storage group 1 and 2.
3/16/2007 12:39:43 PM -- JetInterop batch transaction stats: 132505, and 46791.
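To see how the bias numbers turn into the percentages quoted above, here is a small Python sketch of weighted random selection. It is only a model of the behavior described, not the Jetstress code; pick_storage_group and expected_shares are hypothetical names.

```python
import random


def expected_shares(biases):
    """Fraction of transactions each storage group should receive."""
    total = sum(biases)
    return [b / total for b in biases]


def pick_storage_group(biases, rng):
    """Pick a storage group index at random, weighted by its IOPS bias."""
    return rng.choices(range(len(biases)), weights=biases, k=1)[0]


biases = [739, 261]  # IopsBias values from the configuration above
print("expected shares:", expected_shares(biases))

# Shares observed in the log line above (132505 and 46791 batch transactions).
counts = [132505, 46791]
observed = [c / sum(counts) for c in counts]
print("observed shares:", [round(s, 3) for s in observed])

# A quick simulation of 10000 weighted picks.
rng = random.Random(0)
picks = [pick_storage_group(biases, rng) for _ in range(10000)]
print("simulated share of group 1:", picks.count(0) / len(picks))
```

The observed shares from the log match the configured bias to within a fraction of a percent, which is what you would expect from random picks over this many transactions.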
--- Excerpted from Jetstress CHM help documentation ---
There are options to size the test databases using the percentage of the maximum storage capacity, and target I/O throughput (IOPS) by the percentage of the maximum throughput capacity of the disk subsystem.
Jetstress reserves 25% of the initial database file size for its future growth during test runs. For example, if you decide to size database 100% of the storage capacity of 100 GB, Jetstress creates initial databases of 80 GB and reserves 25% of 80 GB (20 GB) for the database file growth.
--- Here is an example of a main success scenario ---
If you choose 100% storage capacity for 100 GB LUN, the initial database size will be 80 GB and the reserved space for the database file growth will be 20 GB (which is 25% of the initial database file size).
If you choose an 80% throughput percentage, in the first tuning phase Jetstress determines the maximum throughput, say 2000 IOPS. In the second tuning phase, it determines the database transaction cycle rate/interval for 1600 IOPS (which is 80% of 2000 IOPS). Then, if the tuning phases succeed, it sustains the disk sub-system at 1600 IOPS for the test duration, say 2 hours (default).
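The arithmetic in this scenario can be sketched in a few lines of Python. The function names are hypothetical and the formulas are simply my reading of the description above: the reserve is 25% of the initial database size, so the initial size is the chosen capacity divided by 1.25.

```python
def plan_database_size(lun_gb, capacity_percent, growth_reserve=0.25):
    """Return (initial_gb, reserved_gb) for a LUN and a capacity percentage."""
    usable_gb = lun_gb * capacity_percent / 100
    # initial + growth_reserve * initial == usable
    initial_gb = usable_gb / (1 + growth_reserve)
    return initial_gb, usable_gb - initial_gb


def target_iops(max_iops, throughput_percent):
    """IOPS that the test tries to sustain after tuning."""
    return max_iops * throughput_percent / 100


initial_gb, reserved_gb = plan_database_size(lun_gb=100, capacity_percent=100)
print(initial_gb, reserved_gb)
print(target_iops(max_iops=2000, throughput_percent=80))
```

For the 100 GB LUN at 100% capacity this gives an 80 GB initial database with 20 GB reserved, and 80% of 2000 IOPS gives the 1600 IOPS target.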
ADPlus hang-mode creates a directory in the current directory (or the output directory if specified) called Hang_Mode__Date_mm-dd-yyyy__Time_HH-MM-SS. This means you can continue to run the same ADPlus command line repeatedly without losing your data.
1. Install Debugging Tools for Windows 64-bit Version --- Download 6.6.7.6 , or
Install Debugging Tools for Windows 32-bit Version --- Download 6.6.7.6
2. (optional) Execute this adplus crash-mode command to take a full memory dump of process exits (normal and abnormal exits):
D:\Debuggers\adplus.vbs -c CRASH-BreakOnExitProcess.xml -pn JetstressWin.exe
3. Execute this adplus hang-mode command to take a full memory dump of process hangs:
D:\Debuggers\adplus.vbs -hang -pn JetstressWin.exe , or
D:\Debuggers\adplus.vbs -c Hang-Dump.xml -pn JetstressWin.exe
NOTE: I assume that you have Debugging Tools installed at D:\Debuggers.
The Hang-Dump.xml has custom actions to log unmanaged handles, managed threads, managed stacks, and managed objects (over 100 bytes in size).
If you wanted to be hard-core, you could use debugger commands to take snaps for 2nd chance managed exception (clr) and process exit (epr) events.
cdb.exe -pn JetstressWin.exe
.logopen /t D:\Debuggers\Jetstress.log
sxi -c ".loadby sos mscorwks; !eeversion; !threads; !clrstack; !pe; gc" -c2 "!threads; ~*e!clrstack; .dump /ma /u D:\\Debuggers\\Crash.dmp; gn" clr
sxi -c ".loadby sos mscorwks; !eeversion; !handle 0 f; !threads; ~*e!clrstack; !dumpheap -stat -min 100; .dump /ma /u D:\\Debuggers\\ProcessExit.dmp" epr
gc
... ... ... as time goes by ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Press {Ctrl} + {Break} to break the process, and then you could use debugger commands to take snaps for process hangs.
.loadby sos mscorwks; !eeversion; !handle 0 f; !threads; ~*e!clrstack; !dumpheap -stat -min 100; .dump /ma /u D:\\Debuggers\\ProcessHang.dmp; gc
NOTE: I assume that you want to keep .log and .dmp files at D:\Debuggers.
KB 286350: How to use ADPlus to troubleshoot "hangs" and "crashes"
1. Jetstress or Load Generator would send ‘offline’ Watson reports to the admin queue --- which is located at “%SystemRoot%\PCHEALTH\ERRORREP\QSignoff”--- you can send them through a file transfer service on your own.
2. The customer uses eventvwr.msc to see the application event generated.
3. The customer makes a note of the bucketing parameters --- P7 has a hash code of the stack trace (eb22 in the following image) --- this parameter makes it easier to find this bucket among a flood of error reports.

4. The customer logs on as an administrator and deletes a registry value named “LastQueuePesterTime” which is located at [HKEY_CURRENT_USER\Software\Microsoft\PCHealth\ErrorReporting\DW].
6. The customer logs off and then logs on as an administrator.
7. The customer takes a coffee break and waits for Windows Error Reporting to pop up within 4 minutes (a.k.a. QueuePesterUI).
Exchange Server Jetstress (08.01.0075)
March 7, 2007. Verify the performance and stability of the disk subsystem before putting the Exchange server into a production environment.
Here is a list of issues that are all fixed in this March release.
Months ago, I got a storage vendor's feedback that his job would become a lot easier if he could add performance counters for his storage hardware to the performance log. I decided to provide him with a convenient way to add new performance counters (but no way to suppress existing performance counters).
When you save your configuration to an xml file, the file has a list of performance counters as follows:
<PerfLibs>
<Include Name="host">
<CounterPath>\LogicalDisk(*)\*</CounterPath>
<CounterPath>\Memory\*</CounterPath>
<CounterPath>\Processor(*)\*</CounterPath>
<CounterPath>\Process(*)\*</CounterPath>
<CounterPath>\System\*</CounterPath>
<CounterPath>\PhysicalDisk(*)\*</CounterPath>
<CounterPath>\Network Interface(*)\*</CounterPath>
</Include>
</PerfLibs>
You can define additional performance counters in an xml config file from which you will open the configuration later on. The following example shows how to add .NET CLR Memory counters for JetstressWin process and all Windows Kernel objects.
<PerfLibs>
<Include Name="host">
<CounterPath>\LogicalDisk(*)\*</CounterPath>
<CounterPath>\Memory\*</CounterPath>
<CounterPath>\Processor(*)\*</CounterPath>
<CounterPath>\Process(*)\*</CounterPath>
<CounterPath>\System\*</CounterPath>
<CounterPath>\PhysicalDisk(*)\*</CounterPath>
<CounterPath>\Network Interface(*)\*</CounterPath>
<CounterPath>\.NET CLR Memory(JetstressWin)\*</CounterPath>
<CounterPath>\Objects\*</CounterPath>
</Include>
</PerfLibs>
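If you maintain several configuration files, you may prefer to add the counter paths with a script rather than by hand. The Python sketch below does that with the standard xml.etree.ElementTree module. It treats the PerfLibs fragment as a standalone document for brevity, which is an assumption (in a saved configuration it is only one part of the file), and add_counter_paths is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A shortened PerfLibs fragment, treated as a standalone document here.
CONFIG = r"""<PerfLibs>
  <Include Name="host">
    <CounterPath>\LogicalDisk(*)\*</CounterPath>
    <CounterPath>\Memory\*</CounterPath>
  </Include>
</PerfLibs>"""


def add_counter_paths(root, new_paths, include_name="host"):
    """Append CounterPath elements that are not already listed."""
    include = root.find(f"./Include[@Name='{include_name}']")
    if include is None:
        raise ValueError(f"no Include element named {include_name!r}")
    existing = {cp.text for cp in include.findall("CounterPath")}
    for path in new_paths:
        if path not in existing:
            ET.SubElement(include, "CounterPath").text = path
            existing.add(path)
    return include


root = ET.fromstring(CONFIG)
add_counter_paths(
    root,
    [r"\.NET CLR Memory(JetstressWin)\*", r"\Objects\*", r"\Memory\*"],
)
paths = [cp.text for cp in root.iter("CounterPath")]
print("\n".join(paths))
```

The duplicate \Memory\* entry is skipped, so running the script twice leaves the list unchanged.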
NOTE: This feature is undocumented and not fully supported. But, it can help you quite a bit sometimes. I believe customer interaction leads to effective designs and quality improvements.
Here is a list of issues that are all fixed in the March release (08.01.0075 build):
- Exchange 2003 database engine (ese.dll) should have 500 MB and 900 MB database cache size min and max.
- DataGridView: System.InvalidOperationException: This operation cannot be performed while an auto-filled column is being resized.
- Help: the minimum cache size would be 128 MB for 4 storage groups.
- Should not tune to throughput percentage when tuning is suppressed.
- Makes more sense to have a label: “Suppress tuning and use thread counts (per-storage group)”.
- WATSON: Invalid Jet interop operation. Source: JetEndSession. Error: JET_errSuccess, Successful Operation.
- Should not allow putting logs and databases from the same SG on the same LUN.
- HTM report is failing tests based on max database disk latencies when logs and db's are placed on the same volume.
- Warning about making sure aggregate database size is large enough for an accurate test.
- No report html created for stress tests that are longer than 6 hours.
Exchange Load Generator (08.01.0061)
February 9, 2007. Perform benchmarking, pre-deployment validation, and stress testing tasks that introduce various types of workloads into a test (non-production) Exchange messaging system. Simulate the delivery of multiple MAPI client messaging requests to an Exchange server.