*Please note that the storage calculator has been updated since this blog was released and so some of the calculations in the blog may need to be altered. I'll post the updates as soon as I can...
The DPM 2007 Storage Calculator for Exchange Server was released last year, and the blog that accompanied its release explained a great deal about what the results actually mean. I thought I would attempt to deconstruct some of the calculations to understand better where the results come from. For the purposes of this blog I am going to focus on the two main factors that concern most large-scale DPM designs: the number of DPM servers that should be deployed and the storage capacity required for each DPM server. The calculator produces numerous other results, including the number of storage groups per protection group and RAM and processor core sizing guidelines, but I will not go into these here. I am not going to make any recommendations about how to size DPM either, but I hope this article will be of use when you are involved in discussions about designing for DPM...
I'm going to start by running through some of the data that the calculator uses, with the value names taken from the calculator itself:
Storage Capacity Requirement for each DPM Server
This is the difficult bit, but the calculations for storage capacity and server numbers are dependent on each other, so we might as well start here. The capacity requirement is based on a series of calculations as follows:
Or, to put it in a single formula:
Capacity required per DPM Server = (((Replica Size per SG + Recovery Point Volume per SG) * No. of storage groups per Exchange Server) * No. of Exchange Servers)/No. of DPM Servers
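To make the formula above a little more concrete, here is a minimal Python sketch of it. The function and parameter names are my own and do not come from the calculator; the figures are the made-up ones used in the example later in this post. It simply shows how the per-server capacity falls as the total is spread across more DPM servers.

```python
def capacity_per_dpm_server_gb(replica_gb, recovery_point_gb,
                               sgs_per_exchange_server, exchange_servers,
                               dpm_servers):
    """Capacity per DPM server in GB (names are illustrative, not the calculator's)."""
    # Space needed for one storage group: replica plus recovery point volume.
    per_sg_gb = replica_gb + recovery_point_gb
    # Scale up to every storage group on every Exchange server.
    total_gb = per_sg_gb * sgs_per_exchange_server * exchange_servers
    # Spread the total evenly across the DPM servers.
    return total_gb / dpm_servers


for dpm_servers in (1, 2, 3):
    gb = capacity_per_dpm_server_gb(53, 183, 50, 5, dpm_servers)
    print(dpm_servers, round(gb, 1))
```

The sketch assumes the load is split evenly across the DPM servers, which is what the division in the formula implies.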
So the basis of the capacity calculation is an understanding of the 'Replica Size' requirement and the 'Recovery Point Volume Size' requirement at a single storage group level. The replica size is the amount of space required for a single mailbox database plus changes. The recovery point volume size is based on the number of transaction logs generated on an Exchange Server and the time for which the logs are kept (the retention range).
...so for replica size, think database; for recovery point, think transaction logs.
The formula used by the DPM storage calculator to understand the 'Replica Size' is as follows:
Replica Size = ExSGSize*(1+((2 * CalcNumExSGTLogs*varExLogSize)/ExSGSize)/1024)/0.85 * (1 + DPMStorageSafetyNet)
To put it a little more simply:
Replica Size = Single database disk space requirement*(1+((2 * Avg no. of TL's generated per storage group per day in MB)/Single database disk space requirement)/1024)/0.85 + safety buffer
So if we break this out into the different parts as follows:
So in summary we determine the rate of change to the database based upon the number of transaction logs generated per storage group each day. To account for the difference between database changes in terms of database pages and corresponding disk blocks versus the numbers of transaction logs generated we add approximately 18% (or more accurately divide by 0.85). The replica size is the result of this calculation added to the original database disk space required per storage group value.
(Note that the retention range of data on disk is not included in the calculations. In other words, whether the data is retained for 7 days or 14 days makes no difference to the capacity requirements of the replica volume.)
So to express the formula even more simply:
Replica Size = 1GB + (database changes per day/0.85) + % safety buffer
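As a rough check on the replica size formula, the sketch below evaluates the calculator expression exactly as it is written above. The parameter names are my own stand-ins for ExSGSize, CalcNumExSGTLogs, varExLogSize and DPMStorageSafetyNet, and the input values are invented purely for illustration. I am assuming the database size is in GB and the log size in MB, which is what the division by 1024 suggests.

```python
def replica_size_gb(db_size_gb, logs_per_sg_per_day, log_size_mb, safety_net):
    """Replica size in GB, following the calculator expression term by term.

    Assumed mapping: db_size_gb -> ExSGSize, logs_per_sg_per_day -> CalcNumExSGTLogs,
    log_size_mb -> varExLogSize, safety_net -> DPMStorageSafetyNet (e.g. 0.1 for 10%).
    """
    # Twice the daily transaction log volume for the storage group, in MB.
    log_volume_mb = 2 * logs_per_sg_per_day * log_size_mb
    # Express that change as a fraction of the database size (MB -> GB via 1024).
    change_ratio = (log_volume_mb / db_size_gb) / 1024
    # Grow the database by the change, divide by 0.85, then apply the safety net.
    return db_size_gb * (1 + change_ratio) / 0.85 * (1 + safety_net)


# Invented inputs: 40GB database, 1000 logs per day of 1MB each, 10% safety net.
print(round(replica_size_gb(40, 1000, 1, 0.10), 2))
```

One thing worth noting: read with ordinary operator precedence, the expression divides the database size as well as the log-derived change by 0.85, and the safety net is applied as a multiplier. Retention range does not appear anywhere in the function, which matches the note above.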
The formula used by the storage calculator to understand the 'Recovery Point Volume Size' is simpler and is as follows:
Recovery Point Volume Size = 3 * DPMRetentionRange * ((CalcNumExSGTLogs*varExLogSize)/1024) * (1 + DPMStorageSafetyNet) + 1
Recovery Point Volume Size = 3 * Retention Range * (Avg no. of TL's generated per storage group per day in MB/1024) + safety buffer + 1
So in summary, to determine the size of the recovery point volume you multiply the retention range (in days) by 3, to allow a tolerance for DPM failure and for days when the number of transaction logs generated is relatively high; multiply this value by the average volume of transaction logs generated per storage group per day; then apply the safety buffer and add 1GB.
Recovery Point Volume Size = 3 * Retention Range * Avg no. of TL's generated per storage group per day in GB + safety buffer + 1GB
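Here is the recovery point volume formula as a small Python sketch. Again the parameter names are mine rather than the calculator's (they stand in for DPMRetentionRange, CalcNumExSGTLogs, varExLogSize and DPMStorageSafetyNet), and the inputs are invented so that the arithmetic is easy to follow.

```python
def recovery_point_volume_gb(retention_days, logs_per_sg_per_day,
                             log_size_mb, safety_net):
    """Recovery point volume size in GB, following the calculator expression."""
    # Daily transaction log volume for the storage group, converted from MB to GB.
    daily_log_gb = logs_per_sg_per_day * log_size_mb / 1024
    # Factor of 3, scaled by retention and the safety net, plus a fixed 1GB.
    return 3 * retention_days * daily_log_gb * (1 + safety_net) + 1


# Invented inputs: 1024 logs per day of 1MB each (1GB per day), no safety net.
print(recovery_point_volume_gb(7, 1024, 1, 0.0))   # 22.0
print(recovery_point_volume_gb(14, 1024, 1, 0.0))  # 43.0
```

Unlike the replica volume, doubling the retention range here very nearly doubles the result; only the fixed 1GB stays the same.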
Required Number of DPM Servers
The formula used by the storage calculator to understand the number of servers that you require is very simple. It is as follows:
No. of DPM Servers = MAX(ROUNDUP((NumExSvrs*SG)/250, 0), ROUNDUP(TotalDPMStoragePerExSvr * NumExSvrs/varMaxStorageOnDPM, 0))
In plain text the result is the greater of 2 values:
The two values that form the basis of this calculation are recommendations from the DPM product group following their own testing of the product. I believe DPM will scale beyond these guidelines but any design that does exceed these recommended maximums must be validated by appropriate testing for both performance and manageability.
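The server count formula can be sketched in Python as follows. The names are my own, not the calculator's. I am using math.ceil in place of the spreadsheet's ROUNDUP(x, 0), which gives the same result for the positive values involved here. The 250 limit comes from the formula itself; the storage limit is passed in because varMaxStorageOnDPM is a calculator variable, and the 40TB figure converted at 1024GB per TB is my assumption for the illustration.

```python
import math


def dpm_servers_required(exchange_servers, sgs_per_server,
                         storage_per_exchange_server_gb,
                         max_storage_per_dpm_gb, max_sgs_per_dpm=250):
    """Number of DPM servers: the greater of the storage-group and storage limits."""
    # Servers needed so that no DPM server protects more than max_sgs_per_dpm SGs.
    by_storage_groups = math.ceil(exchange_servers * sgs_per_server / max_sgs_per_dpm)
    # Servers needed so that no DPM server holds more than the storage maximum.
    by_storage = math.ceil(
        storage_per_exchange_server_gb * exchange_servers / max_storage_per_dpm_gb
    )
    return max(by_storage_groups, by_storage)


# Invented inputs: 5 Exchange servers, 50 SGs each, 11,800GB of DPM storage per
# Exchange server, and an assumed 40TB (40 * 1024 GB) maximum per DPM server.
print(dpm_servers_required(5, 50, 11800, 40 * 1024))  # 2
```

In this illustration the storage-group limit alone would allow a single server; it is the storage limit that pushes the answer to 2.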
An Example Calculation:
So just to illustrate these calculations I am going to make up some figures and work out the number of DPM Servers that would be required based upon these calculations and the storage capacity per DPM Server.
Example data:
Capacity required per DPM Server = (((Replica Size per SG + Recovery Point Volume per SG) * No. of storage groups per Exchange Server) * No. of Exchange Servers)/No. of DPM Servers => (((53GB + 183GB) * 50) * 5)/No. of DPM Servers => 59,000GB/No. of DPM Servers
No. of DPM Servers = the greater of...
(No. of Storage Groups per Exchange Mailbox Role Server * No. of Exchange Servers)/250 => (50 * 5)/250 = 1
AND,
(Total Storage Capacity to protect all Exchange Servers/40TB) => (((53GB + 183GB) * 50) * 5)/40TB = 2
No. of DPM Servers = 2
Capacity required per DPM Server = 59,000GB/2 = 29,500GB or 29.5TB
So in this example the requirement is for 2 DPM Servers each with 29.5TB of storage capacity.
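Because the two results depend on each other, the order of the steps matters: work out the total storage first, derive the server count from it, and only then divide. The sketch below chains those steps for the example figures. The function name and constants are my own, and treating 40TB as 40 * 1024 GB is an assumption (using 40,000GB gives the same server count here).

```python
import math

MAX_SGS_PER_DPM = 250               # storage-group guideline used in the example
MAX_STORAGE_PER_DPM_GB = 40 * 1024  # 40TB guideline; the GB conversion is assumed


def size_dpm(replica_gb, recovery_point_gb, sgs_per_server, exchange_servers):
    """Return (number of DPM servers, capacity per DPM server in GB)."""
    # Step 1: total storage needed to protect every storage group.
    total_gb = (replica_gb + recovery_point_gb) * sgs_per_server * exchange_servers
    # Step 2: server count is the greater of the two rounded-up limits.
    by_storage_groups = math.ceil(sgs_per_server * exchange_servers / MAX_SGS_PER_DPM)
    by_storage = math.ceil(total_gb / MAX_STORAGE_PER_DPM_GB)
    servers = max(by_storage_groups, by_storage)
    # Step 3: spread the total evenly across those servers.
    return servers, total_gb / servers


servers, per_server_gb = size_dpm(53, 183, 50, 5)
print(servers, per_server_gb)  # 2 29500.0
```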
Some Conclusions
Consultants and architects designing DPM solutions will use the DPM Storage Calculator as a quick and accurate way of calculating their DPM requirements that is aligned with the recommendations and guidelines from Microsoft and the DPM product group. I hope this blog makes it a bit easier to understand where these calculations come from. Of course, whichever method you use to size your DPM deployments, it is vital that your design is verified with appropriate testing...